# Budget Analysis V1

This is my second attempt at viewing my spending data. In `v0`, we explored how to deduplicate the data across `*.csv` files for when I download multiple, possibly overlapping, files. First, I'm going to consolidate all the AI-generated code into the block below.

In [3]:
import finance_cleaner as fc

df = fc.clean_transactions("../data/transactions_*.csv")
df.to_csv('../data/finance_cleaner_output.csv')


=== READING IN FILES ===

✅ Read in frame transactions_2025_06_13.csv with shape (1988, 9)
✅ Read in frame transactions_dummy_to_test_duplicates.csv with shape (1988, 9)

=== CONVERTING TYPES ===

✅ Converted date string to_datetime
✅ Converted boolean strings to boolean
✅ Converted boolean strings to boolean
✅ Converted amount strings to numeric

=== ADDING ROW HASHES ===

✅ Created columns IntraKey, CrossKey, and RowID
✅ Set RowID as index

=== COALESCING DUPLICATES on key='CrossKey' ===

  Merging 2 rows for group '0a85a'. RowIDs: ['0a85a_1', '0a85a_2']
  Merging 2 rows for group '0dd4d'. RowIDs: ['0dd4d_1', '0dd4d_2']
  Merging 3 rows for group '0f0d5'. RowIDs: ['0f0d5_1', '0f0d5_2', '0f0d5_3']
  ... (remaining rows hidden)

✅ Total rows after de-duplication: 3885

=== REMOVING DUPLICATES on key='IntraKey' ===

  Removing 1939 duplicate rows:
  RowID: 47 | Date: 2025-02-10 00:00:00 | Desc: FID BKG SVC LLC DES MONEYLINE INDN LEN G HUANG CO PPD | Amt: -200.0
  RowID: 82 | Date: 2024
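
The log above mentions row hashes and two de-duplication passes. The internals of `finance_cleaner` are not shown in this notebook, so the following is only a sketch of one common way to deduplicate overlapping exports, not the module's actual implementation. The helper names (`add_row_keys`, `dedupe_across_files`) and the columns `content_hash`, `occurrence`, and `row_key` are hypothetical. The idea is that identical rows inside one file may be legitimate (two identical purchases on the same day), so each row is keyed by its content hash plus its occurrence number within its own file. The same row appearing in a second file then gets the same key and can be dropped.

```python
import hashlib

import pandas as pd


def add_row_keys(frame: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Attach a content hash and a per-file occurrence counter to each row.

    All names here are illustrative assumptions, not finance_cleaner's API.
    """
    out = frame.copy()
    # Join the chosen columns into one string per row, then hash it.
    joined = out[cols].astype(str).apply(lambda row: "|".join(row), axis=1)
    out["content_hash"] = joined.map(
        lambda s: hashlib.sha1(s.encode("utf-8")).hexdigest()[:8]
    )
    # Number repeated rows within this file: 1, 2, 3, ...
    out["occurrence"] = out.groupby("content_hash").cumcount() + 1
    out["row_key"] = out["content_hash"] + "_" + out["occurrence"].astype(str)
    return out


def dedupe_across_files(frames: list[pd.DataFrame], cols: list[str]) -> pd.DataFrame:
    """Combine several exports, keeping one copy of each (hash, occurrence) pair."""
    keyed = [add_row_keys(f, cols) for f in frames]
    combined = pd.concat(keyed, ignore_index=True)
    return combined.drop_duplicates(subset="row_key").reset_index(drop=True)
```

With this scheme, two identical rows in a single file survive, while a second file that repeats them adds nothing new.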

# LLM Integration

Let's do some setup so that we can use LLMs to parse and understand our CSV.

In [6]:
from openrouter import OpenRouterClient

# Set up the client
client = OpenRouterClient()

# Get first 5 rows as string
first_5_rows = df.head(5).to_string()

# Query the LLM with a system prompt that asks for a brief summary
response = client.quick_query(
    first_5_rows,
    system_message="Provide a brief summary of this transaction data, with short sentences and bullet points"
)

print("\n=== Budget Analysis ===")
print(response)


=== Budget Analysis ===
Here's a summary of the transaction data:

*   The data contains financial transactions.
*   Includes information like amount, account, category, and date.
*   Shows descriptions of the transactions.
*   Mentions the institution where the transaction occurred.
*   Some transactions are transfers from Ally Bank to a person.
*   Venmo transactions involve energy, rideshares, and income.



# More Nuanced Category Understanding

With this in mind, we can aim for a more nuanced way of organizing the data. Rather than hardcoding regex rules, I want to provide a list of plain-language rules that the LLM can apply when categorizing transactions.
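
As a rough sketch of what plain-language rules could look like in practice, the block below builds a prompt from a list of rules and asks a model for a single category. Everything in it is an assumption for illustration: the rules, the category names, and the helpers `build_category_prompt` and `categorize` are made up, and `ask_llm` is just any function that takes a prompt string and returns the model's reply.

```python
from typing import Callable, Sequence

# Hypothetical example rules, written the way I would explain them to a person.
RULES = [
    "Transfers between my own accounts are not spending; label them 'Transfer'.",
    "Venmo payments that mention a rideshare are 'Transport'.",
    "Venmo payments that mention energy or electricity are 'Utilities'.",
    "Positive amounts from an employer are 'Income'.",
]

# Hypothetical category list; 'Other' is the fallback.
CATEGORIES = ["Transfer", "Transport", "Utilities", "Income", "Other"]


def build_category_prompt(
    description: str,
    amount: float,
    rules: Sequence[str] = RULES,
    categories: Sequence[str] = CATEGORIES,
) -> str:
    """Turn the rules and one transaction into a single prompt string."""
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return (
        "Categorize the transaction using these rules:\n"
        f"{rule_lines}\n\n"
        f"Allowed categories: {', '.join(categories)}\n"
        f"Transaction: {description} | Amount: {amount}\n"
        "Reply with the category name only."
    )


def categorize(
    description: str,
    amount: float,
    ask_llm: Callable[[str], str],
    rules: Sequence[str] = RULES,
    categories: Sequence[str] = CATEGORIES,
) -> str:
    """Ask the model for a category; fall back to 'Other' on unexpected replies."""
    prompt = build_category_prompt(description, amount, rules, categories)
    answer = ask_llm(prompt).strip()
    return answer if answer in categories else "Other"
```

In this notebook, `ask_llm` could presumably be a small wrapper around the `client.quick_query` call used earlier, for example `lambda prompt: client.quick_query(prompt, system_message="You label bank transactions.")`. Passing the model call in as a function also makes the logic easy to test with a fake model.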