rizkyhamdana/FinLensAI

FinLens AI 🔍📱

FinLens AI is an end-to-end machine learning web application that turns raw, messy Indonesian financial transaction text and screenshots into clean, structured financial insights.

It combines a real NLP classification pipeline (TF-IDF + Logistic Regression), a custom regex parser for Indonesian amounts and merchants, and client-side OCR (Tesseract.js). The interface borrows the polished look and feel of startup products such as Vercel, Linear, and Raycast.


✨ Key Features

  • 🧠 Machine Learning Classifier: NLP category classification using a trained scikit-learn model, with a hybrid rule engine of priors that overrides the model for higher precision.
  • 🧾 Unified Ledger: Combines expenses (classified with ML) and income (manually recorded sources) in a single chronological ledger timeline.
  • 📈 Balance Engine: Real-time balance tracking that starts from a user-defined initial amount and calculates the net balance automatically.
  • 📆 Multi-Month Filters: Navigate and filter transaction logs, budgets, and net cashflow metrics month by month.
  • 🎨 Premium Light & Dark Themes: Editorial-style UI with instant theme switching, custom animations, and no hydration flash on page load.
  • 👀 Personal Profiles: Customizable user profile modal with a 32-emoji avatar grid, a bio tagline, and automatic persistent sync.
  • 🔔 Micro-interactions (Toast System): Sliding toast animations that give real-time feedback for entries, deletes, updates, and errors.
  • 🚀 Cloud Run & Docker Ready: Multi-stage Dockerfiles that use Next.js standalone output for lightweight, cost-effective deployments on Google Cloud Run.

🏗️ Architecture Design

The text diagram below maps the transaction classification and extraction flow:

                                    +------------------------------+
                                    |   USER INPUT TRANSACTION     |
                                    +--------------+---------------+
                                                   |
                            +----------------------+----------------------+
                            |                                             |
                            v                                             v
                 [ Manual Text Input ]                           [ Receipt Screenshot ]
                            |                                             |
                            |                                             v
                            |                                     [ Tesseract.js OCR ]
                            |                                             |
                            |                                             v
                            |                                     [ Editable Textarea ]
                            |                                             |
                            +----------------------+----------------------+
                                                   |
                                                   v
                                         +--------------------+
                                         |    POST /predict   |
                                         +----------+---------+
                                                   |
                      +-----------------------------+-----------------------------+
                      |                                                           |
                      v                                                           v
         +------------+-------------+                                +------------+-------------+
         |   ML Model Classifier    |                                |   Regex Parsing Engine    |
         +------------+-------------+                                +------------+-------------+
                      |                                                           |
        [ clean_text() Normalization ]                                 [ extract_merchant() ]
                      |                                             (Matches 100+ Indonesian brands)
                      v                                                           |
          [ TfidfVectorizer (NLP) ]                                               v
                      |                                                 [ extract_amount() ]
                      v                                            (Parses Rp 20.000, 50rb, 10k etc.)
          [ Logistic Regression ]                                                 |
                      |                                                           |
                      v                                                           v
          (Category + Confidence %)                                      (Merchant & Sum integers)
                      |                                                           |
                      +-----------------------------+-----------------------------+
                                                   |
                                                   v
                                      +------------+------------+
                                      |   Structured AI Output  |
                                      +------------+------------+
                                                   |
                                                   v
                                     [ LocalStorage Transaction ]
                                                   |
                                                   v
                                     [ Recharts Dashboard Insights ]
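
As a rough illustration of the merge step in the diagram, the sketch below runs both branches on the same text and combines their results into one record. The function name `build_prediction` and the three callables are hypothetical placeholders, not the project's actual code; the output keys follow the API contract shown later in this README (the `explanation` field is left out for brevity).

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures for the two branches of the diagram.
Classifier = Callable[[str], Tuple[str, float]]  # text -> (category, confidence)
MerchantFinder = Callable[[str], Optional[str]]  # text -> merchant or None
AmountFinder = Callable[[str], Optional[int]]    # text -> amount in rupiah or None


def build_prediction(
    text: str,
    classify: Classifier,
    find_merchant: MerchantFinder,
    find_amount: AmountFinder,
) -> dict:
    """Run the ML branch and the regex branch, then merge their outputs."""
    category, confidence = classify(text)  # ML model branch
    return {
        "text": text,
        "category": category,
        "confidence": confidence,
        "merchant": find_merchant(text),   # regex branch: merchant
        "amount": find_amount(text),       # regex branch: amount
    }
```

Passing the branches in as callables keeps the merge logic independent of how the model and the parsers are implemented, which also makes it easy to exercise with stubs.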

🧠 Machine Learning Pipeline

Our Machine Learning component is a custom Indonesian Natural Language Processing (NLP) classifier:

  1. Text Preprocessing (clean_text):

    • Converts text to lower case.
    • Standardizes currency prefixes (rp, idr) to a clean 'rupiah' identifier.
    • Expands thousands-abbreviation suffixes (50rb -> 50000, 120k -> 120000).
    • Removes Indonesian thousand delimiters (dots) inside numbers, e.g., 25.000 -> 25000.
    • Replaces all digit-only numbers with a generic 'amount' token to prevent the classifier from over-fitting on specific numerical figures rather than semantic words.
    • Scrubs punctuation and removes excess whitespace.
  2. Feature Extraction (TfidfVectorizer):

    • Extracts character/word patterns using Term Frequency-Inverse Document Frequency.
    • Extends extraction to bi-grams (ngram_range=(1,2)) to capture multi-word semantic cues (e.g. "kopi kenangan", "kartu kredit").
    • Drops terms that appear in fewer than two training documents (min_df=2) to reduce noise and keep the vocabulary stable.
  3. Classification Model (LogisticRegression):

    • Trains with balanced class weights (class_weight='balanced') to compensate for category frequency differences.
    • Outputs category labels along with probability-based confidence scores (predict_proba).
    • Achieved a test accuracy of 97.30% on our dataset.
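
The preprocessing steps listed in item 1 can be sketched with the standard `re` module. This is a minimal approximation that assumes the steps run in the order listed; the exact regular expressions are illustrative guesses, and the real `clean_text` in the repository may differ.

```python
import re


def clean_text(text: str) -> str:
    """Illustrative normalization following the steps described above."""
    text = text.lower()
    # Currency prefixes (rp, rp., idr) directly before digits -> 'rupiah'.
    text = re.sub(r"\b(?:rp|idr)\.?\s*(?=\d)", "rupiah ", text)
    # Thousands suffixes: 50rb -> 50000, 120k -> 120000.
    text = re.sub(r"\b(\d+)(?:rb|k)\b", lambda m: str(int(m.group(1)) * 1000), text)
    # Indonesian thousand separators: 25.000 -> 25000.
    text = re.sub(r"(?<=\d)\.(?=\d{3}\b)", "", text)
    # Digit-only tokens -> generic 'amount' token.
    text = re.sub(r"\b\d+\b", "amount", text)
    # Remove punctuation, then collapse whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_text("Makan siang di McD Rp 75.000 via QRIS"))
    # makan siang di mcd rupiah amount via qris
```

Replacing every figure with the same `amount` token means "Rp 75.000" and "50rb" look identical to the vectorizer, so the classifier has to rely on the surrounding words.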

🛠️ Extraction Engines

While the ML model specializes in mapping semantic text to category buckets (food, transport, shopping, bills, entertainment, healthcare), our core FastAPI Parsing Engine extracts structured attributes:

  1. Indonesian Amount Extractor (extract_amount):

    • Identifies suffix modifiers (rb, k -> e.g., 25rb = 25000, 120k = 120000).
    • Identifies prefix indicators (rp., idr followed by digits).
    • Removes thousand separators (25.000 -> 25000).
    • Falls back to standalone figures that lie within expected transaction bounds (Rp 500 to Rp 100 million).
  2. Brand Merchant Matcher (extract_merchant):

    • Features a case-insensitive search index matching 100+ Indonesian companies spanning transit, e-commerce, dining, entertainment, utilities, banks, and healthcare providers.
    • Tries longer names first to avoid conflicts (e.g., matching "Mie Gacoan" before "Gacoan", or "ShopeePay" before "Shopee").
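
The two extractors can be approximated as follows. This is a simplified sketch, not the repository's implementation: the regular expressions, the order in which the rules are tried (suffix, then prefix, then fallback), and the five-entry `KNOWN_MERCHANTS` list are assumptions made for illustration, and the real index covers 100+ brands.

```python
import re
from typing import Optional, Sequence

# Hypothetical tiny sample of the merchant index.
KNOWN_MERCHANTS = ["Gacoan", "Mie Gacoan", "Shopee", "ShopeePay", "McD"]

_SUFFIX = re.compile(r"\b(\d+)(?:rb|k)\b", re.IGNORECASE)
_PREFIX = re.compile(r"\b(?:rp|idr)\.?\s*(\d{1,3}(?:\.\d{3})+|\d+)", re.IGNORECASE)
_NUMBER = re.compile(r"\b(\d{1,3}(?:\.\d{3})+|\d+)\b")

MIN_AMOUNT = 500            # Rp 500
MAX_AMOUNT = 100_000_000    # Rp 100 million


def _to_int(figure: str) -> int:
    """Drop thousand separators: '25.000' -> 25000."""
    return int(figure.replace(".", ""))


def extract_amount(text: str) -> Optional[int]:
    match = _SUFFIX.search(text)            # 25rb, 120k
    if match:
        return int(match.group(1)) * 1000
    match = _PREFIX.search(text)            # Rp 75.000, IDR 20000
    if match:
        return _to_int(match.group(1))
    for match in _NUMBER.finditer(text):    # fallback: plausible standalone figure
        value = _to_int(match.group(1))
        if MIN_AMOUNT <= value <= MAX_AMOUNT:
            return value
    return None


def extract_merchant(text: str, merchants: Sequence[str] = KNOWN_MERCHANTS) -> Optional[str]:
    # Longest names first, so "Mie Gacoan" wins over "Gacoan".
    for name in sorted(merchants, key=len, reverse=True):
        if re.search(rf"\b{re.escape(name)}\b", text, re.IGNORECASE):
            return name
    return None
```

For the request used in the API contract below, `extract_amount` would take the prefix path ("Rp 75.000") and `extract_merchant` would return "McD".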

🚀 Setup & Execution Instructions

Follow these instructions to spin up the entire machine learning pipeline, uvicorn backend, and Next.js frontend.

1. Train the ML Model

Before launching the backend API, run the training script to produce model.joblib.

# 1. Navigate to the ML directory
cd ml

# 2. Spin up a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 3. Install core packages
pip install --upgrade pip
pip install -r requirements.txt

# 4. Execute the training pipeline (compiles model.joblib)
python train_model.py

To verify your saved pipeline predictions, you can optionally run:

python evaluate_model.py

2. Launch the FastAPI Backend

With ml/model.joblib created, you can now launch our inference API server.

# 1. Open a new terminal and navigate to the backend directory
cd backend

# 2. Spin up a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 3. Install backend packages
pip install --upgrade pip
pip install -r requirements.txt

# 4. Launch the Uvicorn live server
uvicorn main:app --reload --port 8000

The API Swagger docs are hosted at: http://localhost:8000/docs


3. Spin Up the Next.js Frontend

Start the Next.js dashboard client.

# 1. Open a new terminal and navigate to the frontend directory
cd frontend

# 2. Install NPM dependencies
npm install

# 3. Boot up the local hot-reloaded development client
npm run dev

Open your web browser and navigate to: http://localhost:3000


📡 Example API Contract

POST /predict

Sends raw transaction text to the inference API and returns the predicted category together with the extracted merchant and amount.

Request Payload

{
  "text": "Makan siang di McD Rp 75.000 via QRIS"
}

Response Payload

{
  "text": "Makan siang di McD Rp 75.000 via QRIS",
  "category": "food",
  "confidence": 0.985429188048291,
  "merchant": "McD",
  "amount": 75000,
  "explanation": "This transaction is categorized as Food & Dining because it contains keywords or merchants associated with dining, food delivery, or beverages. We identified merchant 'McD' and amount of Rp 75.000 from the transaction description."
}
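
A small Python client for this endpoint might look like the sketch below, using only the standard library. The `Prediction` dataclass and the helper names are hypothetical; the URL assumes the local Uvicorn server from the setup steps, and treating `merchant` and `amount` as possibly null is an assumption, since the contract above only shows a fully populated response.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional

API_URL = "http://localhost:8000/predict"  # local dev server from the setup section


@dataclass
class Prediction:
    text: str
    category: str
    confidence: float
    merchant: Optional[str]  # assumed nullable when no brand is matched
    amount: Optional[int]    # assumed nullable when no figure is found
    explanation: str


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build the POST /predict request with a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(raw: bytes) -> Prediction:
    """Decode the JSON response into a typed record."""
    payload = json.loads(raw)
    return Prediction(
        text=payload["text"],
        category=payload["category"],
        confidence=float(payload["confidence"]),
        merchant=payload.get("merchant"),
        amount=payload.get("amount"),
        explanation=payload.get("explanation", ""),
    )


def predict(text: str, url: str = API_URL) -> Prediction:
    """Send the text to the running backend and return the parsed result."""
    with urllib.request.urlopen(build_request(text, url)) as response:
        return parse_prediction(response.read())
```

With the backend running, `predict("Makan siang di McD Rp 75.000 via QRIS")` should return a record shaped like the response payload above.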

☁️ Production Cloud Deployment

FinLens AI is production-ready for Google Cloud Run using its built-in Docker support.

For complete, detailed instructions on how to build, deploy, configure CORS, and scale both your Next.js and FastAPI services on GCP, please consult the Cloud Run Deployment Blueprint.

About

AI-powered financial insights platform for Indonesian transactions. Features real-time NLP classification (TF-IDF + Logistic Regression), Tesseract.js OCR, and automated merchant/amount extraction. Built with Next.js, FastAPI, and Scikit-Learn to turn messy financial logs into structured data. 🚀🔍📱
