FinLens AI 🔍📱

FinLens AI is an end-to-end machine learning web application that turns raw Indonesian transaction text and receipt screenshots into structured financial records and insights.

It combines an NLP classification pipeline (TF-IDF + Logistic Regression), a custom regex parser for Indonesian amounts and merchants, and client-side OCR (Tesseract.js). The interface aims for the polished look and feel of modern startup products such as Vercel, Linear, and Raycast.

✨ Key Features

🧠 Machine Learning Classifier: NLP category classification using a trained scikit-learn model, with a hybrid priors rule engine that can override the model's prediction to improve precision.
🧾 Unified Ledger: Integrates expenses (classified with ML) and income (manually recorded sources) into a single chronological ledger timeline.
📈 Balance Engine: Real-time balance tracking, starting from an initial user-defined amount with automated net balance calculations.
📆 Multi-Month Filters: Ability to navigate and filter transaction logs, budgets, and net cashflow metrics month-by-month.
🎨 Premium Light & Dark Themes: Editorial-style UI with instant theme switching, custom animations, and no hydration flash on page load.
👤 Personal Profiles: Highly customizable user profile modal containing a 32-emoji avatar grid, bio tagline, and automatic persistent sync.
🔔 Micro-interactions (Toast System): Delightful sliding animations with real-time feedback for entries, deletes, updates, and errors.
🚀 Cloud Run & Docker Ready: Optimized multi-stage Dockerfiles utilizing Next.js standalone output for extremely lightweight and cost-effective deployments on Google Cloud Run.
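
The ledger, balance, and month-filter features above boil down to simple arithmetic over dated entries. The following is a minimal Python sketch of that idea, not the app's actual code (the real logic lives in the Next.js client). The names `LedgerEntry`, `net_balance`, and `entries_for_month` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class LedgerEntry:
    """One row of the unified ledger (hypothetical shape)."""
    day: date
    kind: str    # "income" or "expense"
    amount: int  # whole rupiah


def net_balance(initial: int, entries: Iterable[LedgerEntry]) -> int:
    """Start from the user-defined initial amount, add income, subtract expenses."""
    total = initial
    for entry in entries:
        total += entry.amount if entry.kind == "income" else -entry.amount
    return total


def entries_for_month(entries: Iterable[LedgerEntry], year: int, month: int) -> List[LedgerEntry]:
    """Return the entries of one calendar month in chronological order."""
    picked = [e for e in entries if e.day.year == year and e.day.month == month]
    return sorted(picked, key=lambda e: e.day)


ledger = [
    LedgerEntry(date(2024, 1, 3), "expense", 75_000),
    LedgerEntry(date(2024, 1, 1), "income", 5_000_000),
    LedgerEntry(date(2024, 2, 2), "expense", 20_000),
]
january = entries_for_month(ledger, 2024, 1)
```

Here `net_balance(1_000_000, ledger)` covers the whole ledger, while `net_balance(0, january)` gives the net cashflow for a single month.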

🏗️ Architecture Design

Below is the text-based architecture mapping the transaction classification and extraction flow:

                                    +------------------------------+
                                    |   USER INPUT TRANSACTION     |
                                    +--------------+---------------+
                                                   |
                            +----------------------+----------------------+
                            |                                             |
                            v                                             v
                 [ Manual Text Input ]                           [ Receipt Screenshot ]
                            |                                             |
                            |                                             v
                            |                                     [ Tesseract.js OCR ]
                            |                                             |
                            |                                             v
                            |                                     [ Editable Textarea ]
                            |                                             |
                            +----------------------+----------------------+
                                                   |
                                                   v
                                         +--------------------+
                                         |    POST /predict   |
                                         +----------+---------+
                                                   |
                      +-----------------------------+-----------------------------+
                      |                                                           |
                      v                                                           v
         +------------+-------------+                                +------------+-------------+
         |   ML Model Classifier    |                                |   Regex Parsing Engine    |
         +------------+-------------+                                +------------+-------------+
                      |                                                           |
        [ clean_text() Normalization ]                                 [ extract_merchant() ]
                      |                                             (Matches 100+ Indonesian brands)
                      v                                                           |
          [ TfidfVectorizer (NLP) ]                                               v
                      |                                                 [ extract_amount() ]
                      v                                            (Parses Rp 20.000, 50rb, 10k etc.)
          [ Logistic Regression ]                                                 |
                      |                                                           |
                      v                                                           v
          (Category + Confidence %)                                      (Merchant & Sum integers)
                      |                                                           |
                      +-----------------------------+-----------------------------+
                                                   |
                                                   v
                                      +------------+------------+
                                      |   Structured AI Output  |
                                      +------------+------------+
                                                   |
                                                   v
                                     [ LocalStorage Transaction ]
                                                   |
                                                   v
                                     [ Recharts Dashboard Insights ]

🧠 Machine Learning Pipeline

Our Machine Learning component is a custom Indonesian Natural Language Processing (NLP) classifier:

Text Preprocessing (clean_text):
- Converts text to lower case.
- Standardizes currency prefixes (rp, idr) to a clean 'rupiah' identifier.
- Expands thousands suffixes into full numbers (50rb -> 50000, 120k -> 120000).
- Removes Indonesian thousand delimiters (dots) inside numbers, e.g., 25.000 -> 25000.
- Replaces all digit-only numbers with a generic 'amount' token to prevent the classifier from over-fitting on specific numerical figures rather than semantic words.
- Scrubs punctuation and removes excess whitespace.
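
The preprocessing steps above could be sketched with the standard `re` module as follows. This is an illustrative reconstruction from the listed rules, not the project's actual `clean_text`. The exact regular expressions and the order of the substitutions are assumptions.

```python
import re


def clean_text(text: str) -> str:
    """Normalize a raw transaction description before vectorization."""
    text = text.lower()
    # Currency prefixes directly before a number become the token 'rupiah'.
    text = re.sub(r"\b(?:rp|idr)\.?\s*(?=\d)", "rupiah ", text)
    # Thousands suffixes: 50rb -> 50000, 120k -> 120000.
    text = re.sub(r"(\d+)(?:rb|k)\b", r"\g<1>000", text)
    # Dots used as thousands separators: 25.000 -> 25000.
    text = re.sub(r"(?<=\d)\.(?=\d{3}\b)", "", text)
    # Any remaining digit-only token becomes the generic 'amount' token.
    text = re.sub(r"\b\d+\b", "amount", text)
    # Punctuation becomes a space, then runs of whitespace are collapsed.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())
```

With these rules, an input such as "Makan siang di McD Rp 75.000 via QRIS" keeps its words but loses the specific figure, so the classifier sees the currency and amount tokens rather than the number itself.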
Feature Extraction (TfidfVectorizer):
- Extracts character/word patterns using Term Frequency-Inverse Document Frequency.
- Extends extraction to bi-grams (ngram_range=(1,2)) to capture multi-word semantic cues (e.g. "kopi kenangan", "kartu kredit").
- Drops terms that appear in fewer than two training documents (min_df=2) to reduce noise from rare words.
Classification Model (LogisticRegression):
- Trains with balanced class weights (class_weight='balanced') to compensate for differences in category frequency.
- Outputs a category label along with a probability-based confidence score (predict_proba).
- Achieved 97.30% test accuracy on our dataset.
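
A minimal sketch of how the vectorizer and classifier described above might be wired together with scikit-learn's `Pipeline`. The toy texts and labels are invented for illustration and are not the project's dataset; they are written as they might look after normalization. `max_iter=1000` and the step names are assumptions as well.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy, already-normalized examples (invented for illustration only).
texts = [
    "makan siang nasi goreng rupiah amount",
    "makan malam ayam goreng rupiah amount",
    "kopi kenangan rupiah amount",
    "beli kopi kenangan amount",
    "naik ojek online rupiah amount",
    "ojek online ke kantor amount",
    "tiket kereta ke kantor rupiah amount",
    "naik kereta amount",
]
labels = ["food"] * 4 + ["transport"] * 4

model = Pipeline([
    # Unigrams and bigrams; terms seen in fewer than two documents are dropped.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
    # Balanced class weights compensate for uneven category frequencies.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
model.fit(texts, labels)

sample = "kopi kenangan amount"
category = model.predict([sample])[0]
probabilities = model.predict_proba([sample])[0]
confidence = float(probabilities.max())  # probability of the predicted class
print(category, round(confidence, 3))
```

Keeping both steps in one `Pipeline` means a single object can be saved with joblib and reloaded by the API, so the vectorizer vocabulary and the classifier weights cannot drift apart.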

🛠️ Extraction Engines

While the ML model specializes in mapping semantic text to category buckets (food, transport, shopping, bills, entertainment, healthcare), our core FastAPI Parsing Engine extracts structured attributes:

Indonesian Amount Extractor (extract_amount):
- Expands the suffix multipliers rb and k (e.g., 25rb -> 25000, 120k -> 120000).
- Recognizes currency prefixes (rp., idr) followed by digits.
- Removes thousands separators (25.000 -> 25000).
- Falls back to standalone figures that lie within expected transaction bounds (Rp 500 to Rp 100 million).
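
One way the amount rules above could fit together is sketched below. It is a reconstruction from the listed behavior, not the project's actual `extract_amount`. The rule priority (suffix, then prefix, then fallback), the regular expressions, and the constant names are assumptions, and dots are always treated as thousands separators.

```python
import re
from typing import Optional

MIN_AMOUNT = 500          # Rp 500
MAX_AMOUNT = 100_000_000  # Rp 100 million

# Digits followed by 'rb' or 'k', not preceded by another digit or separator.
_SUFFIX = re.compile(r"(?<![\d.,])(\d+)(?:rb|k)\b", re.IGNORECASE)
# 'rp', 'rp.' or 'idr' followed by digits with optional dot separators.
_PREFIX = re.compile(r"\b(?:rp|idr)\.?\s*(\d[\d.]*)", re.IGNORECASE)
# Any standalone figure, used only as a fallback.
_NUMBER = re.compile(r"\b\d[\d.]*\b")


def _to_int(digits: str) -> int:
    """Drop dots (assumed to be thousands separators) and convert."""
    return int(digits.replace(".", ""))


def extract_amount(text: str) -> Optional[int]:
    """Return the transaction amount in whole rupiah, or None if not found."""
    suffix = _SUFFIX.search(text)
    if suffix:
        return int(suffix.group(1)) * 1000
    prefix = _PREFIX.search(text)
    if prefix:
        return _to_int(prefix.group(1))
    # Fallback: the first bare number inside the plausible range.
    for match in _NUMBER.finditer(text):
        value = _to_int(match.group(0))
        if MIN_AMOUNT <= value <= MAX_AMOUNT:
            return value
    return None
```

The bounds check is applied only to the fallback here, since a bare number such as the "2" in "beli 2 kopi" is more likely a quantity than a price.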
Brand Merchant Matcher (extract_merchant):
- Features a case-insensitive search index matching 100+ Indonesian companies spanning transit, e-commerce, dining, entertainment, utilities, banks, and healthcare providers.
- Tries longer names first to avoid partial matches (e.g., matching "Mie Gacoan" before "Gacoan", or "ShopeePay" before "Shopee").
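
The longest-first matching idea can be sketched as below. The merchant list is a small illustrative subset, not the project's index of 100+ brands, and building a single alternation regex is an assumption about the implementation.

```python
import re
from typing import Optional

# Illustrative subset only; the real index is much larger.
MERCHANTS = ["Gacoan", "Mie Gacoan", "Shopee", "ShopeePay", "McD", "Kopi Kenangan"]

# Longest names first, so "ShopeePay" is tried before "Shopee" at the same position.
_BY_LENGTH = sorted(MERCHANTS, key=len, reverse=True)
_PATTERN = re.compile("|".join(re.escape(name) for name in _BY_LENGTH), re.IGNORECASE)
# Map the lowercased match back to the canonical brand spelling.
_CANONICAL = {name.lower(): name for name in MERCHANTS}


def extract_merchant(text: str) -> Optional[str]:
    """Return the canonical name of the first known merchant in the text."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return _CANONICAL[match.group(0).lower()]
```

Because regex alternation prefers earlier alternatives at a given position, the sort order is what prevents "Shopee" from shadowing "ShopeePay".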

🚀 Setup & Execution Instructions

Follow these steps to train the model, then start the FastAPI backend (served by Uvicorn) and the Next.js frontend.

1. Train the ML Model

Before launching the backend API, run the training script to produce model.joblib.

# 1. Navigate to the ML directory
cd ml

# 2. Spin up a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 3. Install core packages
pip install --upgrade pip
pip install -r requirements.txt

# 4. Run the training pipeline (writes model.joblib)
python train_model.py

To check the saved pipeline's predictions, you can optionally run:

python evaluate_model.py

2. Launch the FastAPI Backend

With ml/model.joblib created, you can now launch our inference API server.

# 1. Open a new terminal and navigate to the backend directory
cd backend

# 2. Spin up a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 3. Install backend packages
pip install --upgrade pip
pip install -r requirements.txt

# 4. Launch the Uvicorn live server
uvicorn main:app --reload --port 8000

The API Swagger docs are hosted at: http://localhost:8000/docs

3. Spin Up the Next.js Frontend

Start the Next.js dashboard client.

# 1. Open a new terminal and navigate to the frontend directory
cd frontend

# 2. Install NPM dependencies
npm install

# 3. Boot up the local hot-reloaded development client
npm run dev

Open your web browser and navigate to: http://localhost:3000

📡 Example API Contract

POST `/predict`

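Sends a raw transaction description to the inference API, which returns the predicted category, a confidence score, and the extracted merchant and amount.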

Request Payload

{
  "text": "Makan siang di McD Rp 75.000 via QRIS"
}

Response Payload

{
  "text": "Makan siang di McD Rp 75.000 via QRIS",
  "category": "food",
  "confidence": 0.985429188048291,
  "merchant": "McD",
  "amount": 75000,
  "explanation": "This transaction is categorized as Food & Dining because it contains keywords or merchants associated with dining, food delivery, or beverages. We identified merchant 'McD' and amount of Rp 75.000 from the transaction description."
}
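
A small client for this contract might look like the following sketch, using only the Python standard library. The `Prediction` dataclass and the helper names are hypothetical, the URL assumes the local Uvicorn server from the setup steps, and treating `merchant` and `amount` as nullable is an assumption about the API.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional

API_URL = "http://localhost:8000/predict"  # assumes the local dev server


@dataclass
class Prediction:
    """Fields mirror the example response payload."""
    text: str
    category: str
    confidence: float
    merchant: Optional[str]  # assumed to be null when nothing matches
    amount: Optional[int]    # assumed to be null when nothing matches
    explanation: str


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build the POST request carrying the JSON body {"text": ...}."""
    body = json.dumps({"text": text}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_response(raw: bytes) -> Prediction:
    """Decode a JSON response body into a Prediction."""
    return Prediction(**json.loads(raw))


def predict(text: str, url: str = API_URL) -> Prediction:
    """Send one transaction description to the API and return the parsed result."""
    with urllib.request.urlopen(build_request(text, url), timeout=10) as response:
        return parse_response(response.read())
```

With the backend running, `predict("Makan siang di McD Rp 75.000 via QRIS")` would return a `Prediction` whose fields match the response payload shown above.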

☁️ Production Cloud Deployment

FinLens AI can be deployed to Google Cloud Run using the multi-stage Dockerfiles included in the repository.

For complete, detailed instructions on how to build, deploy, configure CORS, and scale both your Next.js and FastAPI services on GCP, please consult the Cloud Run Deployment Blueprint.
