# Stage 1: Define the Problem Statement and Solve it Yourself

**Problem Statement**: Most budgeting apps still require manual entry of income and expenses, which heavily limits their practical utility. This is likely because transactions are recorded in many different formats (e.g. a physical receipt with double line spacing in Courier, or a bank statement PDF with single line spacing in Times New Roman), which makes automated transaction recording very difficult to program.

**Proposed Solution (data flow from solving it myself)**: Picture of transactions -> Isolate the transactions, attaching to each any additions/deductions not yet applied and any extra specifications from the user's text request -> Apply the additions/deductions to the transactions -> Surface the transactions in the app

Scope: receipt images only (for now)

# Stage 2: Define the Steps and Build a System that has them

A 3-Step System:

1. An **LLM Tool** that processes the image  
   **Input:** images of receipts  
   **Output:** structured outputs with the optional fields:
   - Transaction Amount
   - Transaction Date
   - Transaction Type (Outgoing/Incoming)
   - Transaction Category (Food/Transportation/Rent/Debt Payments/etc)
   - Transaction Recipient (Outgoing)
   - Transaction Sender (Incoming)
   - Transaction Description
   - Not Yet Applied Additions/Deductions


2. A **Code Tool** that applies the Additions/Deductions  
   **Input:** structured outputs from the LLM Tool  
   **Output:** the same structured output, with the additions/deductions applied to the Transaction Amount and the additions/deductions field dropped (the current implementation keeps the original fields and adds the result as `final_amount`)  

3. A **Code Tool** that surfaces the transactions on the App UI  
   **Input:** structured outputs from the Code Tool in step 2  
   **Output:** a UI component
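
The contract for step 2 can be illustrated on a single transaction. The following is a minimal sketch rather than the notebook's implementation: `apply_and_drop` is a hypothetical helper, the sample values are made up, and it assumes percentages are taken on the running amount and rounded half-up to 2 decimal places.

```python
from decimal import Decimal, ROUND_HALF_UP

ADJ_FIELD = "not_yet_applied_additions_deductions"


def apply_and_drop(tx: dict) -> dict:
    """Hypothetical helper: fold pending adjustments into the amount, then drop the field."""
    cent = Decimal("0.01")
    amount = Decimal(str(tx["transaction_amount"]))
    for adj in tx.get(ADJ_FIELD) or []:
        if adj.get("amount") is not None:
            # Fixed amount adjustment
            delta = Decimal(str(adj["amount"]))
        else:
            # Percentage adjustment, taken on the running amount
            delta = amount * Decimal(str(adj["percent"])) / Decimal(100)
        delta = delta.quantize(cent, rounding=ROUND_HALF_UP)
        amount += delta if adj["kind"] == "addition" else -delta
    result = {k: v for k, v in tx.items() if k != ADJ_FIELD}
    result["transaction_amount"] = str(amount.quantize(cent, rounding=ROUND_HALF_UP))
    return result


# Made-up sample transaction for illustration.
sample = {
    "transaction_amount": 20.00,
    "transaction_description": "Lunch",
    "not_yet_applied_additions_deductions": [
        {"kind": "addition", "description": "service charge", "percent": 10},
        {"kind": "deduction", "description": "voucher", "amount": 5},
    ],
}
result = apply_and_drop(sample)
print(result)  # transaction_amount becomes "17.00"; the adjustments field is gone
```

Here 20.00 plus 10% gives 22.00, and the 5.00 voucher brings it to 17.00.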


In [16]:
import json
from pathlib import Path
from typing import Optional, List, Literal, Union, Any

import google.genai as genai
from google.genai import types
from pydantic import BaseModel, condecimal
from datetime import date
from dotenv import load_dotenv


# ---------- Domain models (kept at module scope so they can be reused/imported) ----------

Money = condecimal(max_digits=12, decimal_places=2, ge=0)


class AdjustmentBase(BaseModel):
    kind: Literal["addition", "deduction"]
    description: Optional[str] = None


class PercentAdjustment(AdjustmentBase):
    percent: int


class AmountAdjustment(AdjustmentBase):
    amount: Money


Adjustment = Union[PercentAdjustment, AmountAdjustment]


class Transaction(BaseModel):
    transaction_amount: Optional[Money] = None
    transaction_date: Optional[date] = None
    transaction_type: Optional[Literal["outgoing", "incoming"]] = None
    transaction_category: Optional[str] = None
    transaction_recipient: Optional[str] = None
    transaction_sender: Optional[str] = None
    transaction_description: Optional[str] = None
    not_yet_applied_additions_deductions: Optional[List[Adjustment]] = None


# ---------- Helper functions ----------

def load_environment(dotenv_path: Optional[str] = "../../.env") -> None:
    """
    Load environment variables from the provided .env path if it exists.
    Falls back to default environment if the file is not present.
    """
    if dotenv_path and Path(dotenv_path).exists():
        load_dotenv(dotenv_path=dotenv_path)
    else:
        load_dotenv()  # searches for a .env file automatically; no-op if none is found


def read_image_bytes(image_path: str) -> bytes:
    """
    Read an image from disk and return its bytes.
    Raises FileNotFoundError with a helpful message if missing.
    """
    p = Path(image_path)
    if not p.exists():
        raise FileNotFoundError(f"Could not locate the image file at {image_path}")
    return p.read_bytes()


def generate_structured_transactions(
    *,
    client: genai.Client,
    image_bytes: bytes,
    user_text: str,
    system_instruction: Optional[str] = None,
    model: str = "gemini-2.5-flash",
    image_mime_type: str = "image/jpeg",
) -> Any:
    """
    Call the model with the receipt image and prompt, requesting a structured response.

    Returns the raw response object from the SDK (so you can inspect metadata if desired).
    """
    default_sys_instruction = """You are a transaction recording agent. You are given an image of a receipt and an optional user request.
    You need to extract the transactions from the receipt image per the provided schema, while adhering to the user request, if given.
    You must only list the information as it is written in the receipt image, without any additional changes. Hence, for any additions/deductions not already directly applied to the raw transaction amounts (e.g. taxes/service charges), ensure that they are recorded as not yet applied additions/deductions, and if it is a percentage that is not yet written, extract only the percentage, not the implicitly resulting amount."""
    sys_instruction = system_instruction if system_instruction else default_sys_instruction

    response = client.models.generate_content(
        model=model,
        contents=[
            user_text,
            types.Part.from_bytes(
                data=image_bytes,
                mime_type=image_mime_type,
            ),
        ],
        config={
            "response_mime_type": "application/json",
            # Ask Gemini to emit a list of Transaction objects as JSON
            "response_schema": list[Transaction],
            "system_instruction": sys_instruction,
        },
    )
    return response

# ---------- Orchestrator / example entry-point ----------

def extract_transactions_from_receipt(
    image_path: str,
    user_text: str,
    *,
    dotenv_path: Optional[str] = "../../.env",
    model: str = "gemini-2.5-flash",
    system_instruction: Optional[str] = None,
    image_mime_type: str = "image/jpeg",
) -> Any:
    """
    High-level convenience wrapper that:
    1) Loads environment variables
    2) Reads the image
    3) Calls Gemini for structured extraction

    Returns the raw SDK response; the caller parses the JSON itself,
    e.g. with json.loads(response.text).
    """
    load_environment(dotenv_path)
    image_bytes = read_image_bytes(image_path)
    client = genai.Client()
    response = generate_structured_transactions(
        client=client,
        image_bytes=image_bytes,
        user_text=user_text,
        system_instruction=system_instruction,
        model=model,
        image_mime_type=image_mime_type,
    )
    return response



# Example user request & inputs
user_request = "I only ate the Omurice, so only record the transaction for that."
image_path = "test_images/food_receipt_1.jpg"

data = extract_transactions_from_receipt(
    image_path=image_path,
    user_text=user_request,
    dotenv_path="../../.env",
    model="gemini-2.5-flash",
    image_mime_type="image/jpeg",
)
data = json.loads(data.text)
print(json.dumps(data, indent=2, ensure_ascii=False))



[
  {
    "transaction_amount": 29.9,
    "transaction_date": "2025-03-29",
    "transaction_type": "outgoing",
    "transaction_category": "food",
    "transaction_recipient": "Monas Group Sdn Bhd",
    "transaction_description": "Omurice",
    "not_yet_applied_additions_deductions": [
      {
        "kind": "addition",
        "description": "SERVICE CHARGE",
        "percent": 10
      }
    ]
  }
]
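
Note that `json.loads` turns `29.9` into a binary float. One optional way to keep amounts exact from the start is the standard library's `parse_float` hook, which parses JSON numbers with a fractional part directly into `Decimal`. A minimal sketch, where `raw` is a hand-written stand-in for `response.text`:

```python
import json
from decimal import Decimal

# Stand-in for response.text; shortened from the output above.
raw = '[{"transaction_amount": 29.9, "transaction_description": "Omurice"}]'

as_float = json.loads(raw)
as_decimal = json.loads(raw, parse_float=Decimal)

print(type(as_float[0]["transaction_amount"]).__name__)    # float
print(type(as_decimal[0]["transaction_amount"]).__name__)  # Decimal
print(as_decimal[0]["transaction_amount"] * 3)              # 89.7
```

`json.dumps` does not serialize `Decimal` by default, so printing such data would need something like `default=str`.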


In [7]:
# --- concise adjustments (running subtotal, 2dp HALF_UP) ---
from decimal import Decimal, ROUND_HALF_UP

TWOPLACES = Decimal("0.01")
def q(x: Decimal) -> Decimal:
    """Round to 2 decimal places, half up."""
    return x.quantize(TWOPLACES, rounding=ROUND_HALF_UP)

def apply_adjustments(data: list[dict]) -> list[dict]:
    out = []
    for tx in data:
        base = q(Decimal(str(tx["transaction_amount"])))
        subtotal = base
        # Bookkeeping only: collected below but not included in the returned output.
        total_adds = Decimal("0")
        total_deds = Decimal("0")
        applied = []

        for adj in (tx.get("not_yet_applied_additions_deductions") or []):
            kind = adj["kind"]  # "addition" | "deduction"
            desc = adj.get("description")

            if adj.get("amount") is not None:
                amt = q(Decimal(str(adj["amount"])))
                applied_on = None
                typ = "amount"
            else:
                pct = Decimal(str(adj["percent"])) / Decimal(100)
                applied_on = q(subtotal)  # running base for %
                amt = q(applied_on * pct)
                typ = "percent"

            delta = amt if kind == "addition" else -amt
            subtotal = q(subtotal + delta)
            if delta >= 0:
                total_adds += amt
            else:
                total_deds += amt.copy_abs()

            applied.append({
                "kind": kind,
                "type": typ,
                "value": str(adj.get("amount", adj.get("percent"))),
                "applied_on": None if applied_on is None else str(applied_on),
                "amount_delta": str(delta),
                "description": desc,
            })

        out.append({
            **tx,
            "final_amount": str(subtotal),
        })
    return out

# usage:
applied_results = apply_adjustments(data)
print(json.dumps(applied_results, indent=2))


[
  {
    "transaction_amount": 29.9,
    "transaction_date": "2025-03-29",
    "transaction_type": "outgoing",
    "transaction_category": "Food",
    "transaction_recipient": "Monas Group Sdn Bhd",
    "transaction_description": "Omurice",
    "not_yet_applied_additions_deductions": [
      {
        "kind": "addition",
        "description": "SERVICE CHARGE",
        "percent": 10
      }
    ],
    "final_amount": "32.89"
  }
]
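
The `final_amount` here is 29.90 plus 10% of 29.90 (2.99), giving 32.89. When a transaction carries more than one percentage adjustment, the running subtotal means each percentage is taken on the amount after the earlier adjustments. A minimal sketch of that difference, using illustrative values (a 10% charge followed by a 6% charge on 100.00) that do not come from the receipts; `pct_of` is a hypothetical helper:

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def pct_of(base: Decimal, percent: int) -> Decimal:
    """Percentage of `base`, rounded half-up to 2 decimal places."""
    return (base * percent / Decimal(100)).quantize(CENT, rounding=ROUND_HALF_UP)


base = Decimal("100.00")
percents = [10, 6]  # illustrative values only

# Running subtotal: each percentage applies to the adjusted amount so far.
running = base
for p in percents:
    running += pct_of(running, p)

# Flat alternative: every percentage applies to the original base.
flat = base + sum(pct_of(base, p) for p in percents)

print(running)  # 116.60
print(flat)     # 116.00
```

Which of the two is correct depends on how the receipt itself computes its charges, so the order of adjustments in the extracted list matters.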


In [8]:
import pandas as pd
df = pd.DataFrame(applied_results)
df

| | transaction_amount | transaction_date | transaction_type | transaction_category | transaction_recipient | transaction_description | not_yet_applied_additions_deductions | final_amount |
|---|---|---|---|---|---|---|---|---|
| 0 | 29.9 | 2025-03-29 | outgoing | Food | Monas Group Sdn Bhd | Omurice | [{'kind': 'addition', 'description': 'SERVICE ... | 32.89 |


# Stage 3: Define the Qualities and Optimise the System to have them

Qualities:
1. Accuracy of the information should be >90%
2. Transaction Category should be 'Relevant'
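
One possible way to make the first quality measurable is field-level accuracy against hand-labelled receipts: the share of expected fields whose extracted value matches the label exactly. The sketch below assumes that definition; `field_accuracy`, the field list, and the sample records are all hypothetical.

```python
FIELDS = ["transaction_amount", "transaction_date", "transaction_description"]


def field_accuracy(predicted: list[dict], expected: list[dict], fields=FIELDS) -> float:
    """Share of (transaction, field) pairs where the prediction equals the label.

    Transactions are paired by position; a missing prediction counts as wrong.
    """
    total = len(expected) * len(fields)
    if total == 0:
        return 1.0
    correct = 0
    for i, label in enumerate(expected):
        pred = predicted[i] if i < len(predicted) else {}
        correct += sum(pred.get(f) == label.get(f) for f in fields)
    return correct / total


# Hypothetical labels and predictions for illustration.
expected = [
    {"transaction_amount": "10.00", "transaction_date": "2025-01-01", "transaction_description": "Tea"},
    {"transaction_amount": "4.50", "transaction_date": "2025-01-01", "transaction_description": "Bun"},
]
predicted = [
    {"transaction_amount": "10.00", "transaction_date": "2025-01-01", "transaction_description": "Tea"},
    {"transaction_amount": "4.80", "transaction_date": "2025-01-01", "transaction_description": "Bun"},
]

score = field_accuracy(predicted, expected)
print(f"{score:.1%}")  # 83.3%
print(score > 0.90)    # False
```

Exact matching is strict for free-text fields such as the description, so a looser comparison may be more appropriate there.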

In [17]:
# Example user request & inputs
user_request = "Record all transactions separately."
image_path = "test_images/food_receipt_1.jpg"

data = extract_transactions_from_receipt(
    image_path=image_path,
    user_text=user_request,
    dotenv_path="../../.env",
    model="gemini-2.5-flash",
    image_mime_type="image/jpeg",
)
data = json.loads(data.text)
applied_results = apply_adjustments(data)
# print(json.dumps(applied_results, indent=2))
df = pd.DataFrame(applied_results)
df

| | transaction_amount | transaction_date | transaction_type | transaction_category | transaction_recipient | transaction_description | not_yet_applied_additions_deductions | final_amount |
|---|---|---|---|---|---|---|---|---|
| 0 | 28.0 | 2025-03-29 | outgoing | food | Monas Group Sdn Bhd | Smoked Duck Mayo Pizza | [{'kind': 'addition', 'description': 'SERVICE ... | 30.8 |
| 1 | 24.9 | 2025-03-29 | outgoing | food | Monas Group Sdn Bhd | Shashuka | [{'kind': 'addition', 'description': 'SERVICE ... | 27.39 |
| 2 | 29.9 | 2025-03-29 | outgoing | food | Monas Group Sdn Bhd | Omurice (Original Japanese Curry, Chicken Katsu) | [{'kind': 'addition', 'description': 'SERVICE ... | 32.89 |
| 3 | 33.9 | 2025-03-29 | outgoing | food | Monas Group Sdn Bhd | Tomyum Mentaiko Squid Ink Pasta | [{'kind': 'addition', 'description': 'SERVICE ... | 37.29 |
| 4 | 14.9 | 2025-03-29 | outgoing | beverages | Monas Group Sdn Bhd | Classic Matcha Latte (COLD) | [{'kind': 'addition', 'description': 'SERVICE ... | 16.39 |
| 5 | 19.9 | 2025-03-29 | outgoing | dessert | Monas Group Sdn Bhd | Tiramisu | [{'kind': 'addition', 'description': 'SERVICE ... | 21.89 |
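
The category field is free text, so labels vary in case and granularity between runs ("Food" earlier, "food", "beverages" and "dessert" here). If "Relevant" is taken to mean that every category belongs to a fixed list, one option is to normalize labels after extraction. A sketch under that assumption: the canonical set reuses the example categories from Stage 2, while the alias table and `normalize_category` are hypothetical.

```python
CANONICAL = {"food", "transportation", "rent", "debt payments"}

# Hypothetical alias table: finer-grained labels folded into a canonical one.
ALIASES = {
    "beverage": "food",
    "beverages": "food",
    "dessert": "food",
}


def normalize_category(raw):
    """Map a free-text category onto the canonical set, or 'other' if unknown."""
    if not raw:
        return "other"
    key = raw.strip().lower()
    key = ALIASES.get(key, key)
    return key if key in CANONICAL else "other"


for label in ["Food", "food", "beverages", "dessert", "parking", None]:
    print(repr(label), "->", normalize_category(label))
```

Restricting `transaction_category` to a fixed set of values in the schema itself would be another option, assuming the model honours that constraint.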


In [21]:
# Example user request & inputs
user_request = "Record all transactions separately."
image_path = "test_images/food_receipt_2.jpg"

data = extract_transactions_from_receipt(
    image_path=image_path,
    user_text=user_request,
    dotenv_path="../../.env",
    model="gemini-2.5-flash",
    image_mime_type="image/jpeg",
)
data = json.loads(data.text)
applied_results = apply_adjustments(data)
# print(json.dumps(applied_results, indent=2))
df = pd.DataFrame(applied_results)
df

| | transaction_amount | transaction_date | transaction_type | transaction_category | transaction_recipient | transaction_description | not_yet_applied_additions_deductions | final_amount |
|---|---|---|---|---|---|---|---|---|
| 0 | 30.86 | 2025-01-18 | outgoing | food | ICHIBAN RAMEN | Red Spicy Ichiban Ramen | [{'kind': 'addition', 'description': 'ServiceC... | 51.89 |
| 1 | 8.44 | 2025-01-18 | outgoing | food | ICHIBAN RAMEN | RAO(Chicken Gyoza) | [{'kind': 'addition', 'description': 'ServiceC... | 25.74 |
| 2 | 0.86 | 2025-01-18 | outgoing | beverage | ICHIBAN RAMEN | Plain Water Cold | [{'kind': 'addition', 'description': 'ServiceC... | 16.91 |
| 3 | 2.93 | 2025-01-18 | outgoing | food | ICHIBAN RAMEN | Egg | [{'kind': 'addition', 'description': 'ServiceC... | 19.32 |
| 4 | 2.93 | 2025-01-18 | outgoing | beverage | ICHIBAN RAMEN | Green Tea Hot | [{'kind': 'addition', 'description': 'ServiceC... | 19.32 |
| 5 | 29.14 | 2025-01-18 | outgoing | food | ICHIBAN RAMEN | Ichiban Ramen M | [{'kind': 'addition', 'description': 'ServiceC... | 49.88 |
| 6 | 17.16 | 2025-01-18 | outgoing | food | ICHIBAN RAMEN | Karaage Ramen M | [{'kind': 'addition', 'description': 'ServiceC... | 35.92 |
| 7 | 3.45 | 2025-01-18 | outgoing | food | ICHIBAN RAMEN | Ajitama Egg | [{'kind': 'addition', 'description': 'ServiceC... | 19.93 |
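
A cheap sanity check on results like these is to compare the sum of `final_amount` with the grand total printed on the receipt. A minimal sketch, assuming the printed total is available (typed in by hand or extracted separately); `reconcile` and its tolerance are hypothetical, and the amounts are made up for illustration.

```python
from decimal import Decimal


def reconcile(final_amounts, printed_total, tolerance=Decimal("0.05")):
    """Return (matches, difference) between summed line items and the printed total.

    A small tolerance absorbs per-line rounding.
    """
    total = sum((Decimal(str(a)) for a in final_amounts), Decimal("0"))
    diff = total - Decimal(str(printed_total))
    return abs(diff) <= tolerance, diff


# Made-up values for illustration only.
print(reconcile(["11.00", "5.50"], "16.50"))  # (True, Decimal('0.00'))
print(reconcile(["11.00", "5.50"], "12.00"))  # (False, Decimal('4.50'))
```

A large positive difference could indicate, for example, that a receipt-level charge was attached to every line item instead of being applied once.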
