FinLens AI is a modern, end-to-end machine learning web application that turns messy, raw Indonesian financial transaction text and screenshots into clean, structured financial insights.
It combines a real NLP classification pipeline (TF-IDF + Logistic Regression), a custom Indonesian amount and merchant regex parser, and client-side OCR (Tesseract.js), with a user interface modeled on the polish of startup products such as Vercel, Linear, and Raycast.
- Machine Learning Classifier: Real NLP category classification using a trained scikit-learn model, with a hybrid priors rule engine that can override the model for higher precision.
- Unified Ledger: Expenses (classified with ML) and income (manual source records) integrated into a single chronological ledger timeline.
- Balance Engine: Real-time balance tracking, starting from an initial user-defined amount with automated net balance calculations.
- Multi-Month Filters: Navigate and filter transaction logs, budgets, and net cashflow metrics month by month.
- Premium Light & Dark Themes: Editorial-style UI with instant theme switching, custom animations, and no hydration flash on page load.
- Personal Profiles: Customizable user profile modal containing a 32-emoji avatar grid, bio tagline, and automatic persistent sync.
- Micro-interactions (Toast System): Sliding toast animations with real-time feedback for entries, deletes, updates, and errors.
- Cloud Run & Docker Ready: Optimized multi-stage Dockerfiles using the Next.js `standalone` output for lightweight, cost-effective deployments on Google Cloud Run.
Below is the text-based architecture mapping the transaction classification and extraction flow:

                                  +------------------------------+
                                  |    USER INPUT TRANSACTION    |
                                  +--------------+---------------+
                                                 |
                          +----------------------+----------------------+
                          |                                             |
                          v                                             v
                [ Manual Text Input ]                        [ Receipt Screenshot ]
                          |                                             |
                          |                                             v
                          |                                   [ Tesseract.js OCR ]
                          |                                             |
                          |                                             v
                          |                                   [ Editable Textarea ]
                          |                                             |
                          +----------------------+----------------------+
                                                 |
                                                 v
                                      +--------------------+
                                      |   POST /predict    |
                                      +----------+---------+
                                                 |
                   +-----------------------------+-----------------------------+
                   |                                                           |
                   v                                                           v
      +------------+-------------+                                +------------+-------------+
      |   ML Model Classifier    |                                |   Regex Parsing Engine   |
      +------------+-------------+                                +------------+-------------+
                   |                                                           |
    [ clean_text() Normalization ]                                  [ extract_merchant() ]
                   |                                           (Matches 100+ Indonesian brands)
                   v                                                           |
       [ TfidfVectorizer (NLP) ]                                               v
                   |                                                 [ extract_amount() ]
                   v                                          (Parses Rp 20.000, 50rb, 10k etc.)
        [ Logistic Regression ]                                                |
                   |                                                           |
                   v                                                           v
       (Category + Confidence %)                                   (Merchant & Sum integers)
                   |                                                           |
                   +-----------------------------+-----------------------------+
                                                 |
                                                 v
                                    +------------+------------+
                                    |  Structured AI Output   |
                                    +------------+------------+
                                                 |
                                                 v
                                   [ LocalStorage Transaction ]
                                                 |
                                                 v
                                  [ Recharts Dashboard Insights ]
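The fan-out and merge around `POST /predict` can be sketched in a few lines of Python. This is only an illustration of the data flow in the diagram, not the project's actual handler: the `Prediction` dataclass, the `predict` function signature, and the stub branches are hypothetical names introduced here, and the `explanation` field of the real response is left out.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Prediction:
    """Hypothetical container for the merged output of both branches."""
    text: str
    category: str
    confidence: float
    merchant: Optional[str]
    amount: Optional[int]


def predict(
    text: str,
    classify: Callable[[str], Tuple[str, float]],
    extract_merchant: Callable[[str], Optional[str]],
    extract_amount: Callable[[str], Optional[int]],
) -> dict:
    """Run the ML branch and the regex branch on the same text, then merge."""
    category, confidence = classify(text)   # ML branch: label + probability
    merchant = extract_merchant(text)       # regex branch: brand lookup
    amount = extract_amount(text)           # regex branch: amount parsing
    return asdict(Prediction(text, category, confidence, merchant, amount))


if __name__ == "__main__":
    # Stub branches, used only to show how the two outputs are combined.
    print(
        predict(
            "Makan siang di McD Rp 75.000 via QRIS",
            classify=lambda t: ("food", 0.98),
            extract_merchant=lambda t: "McD",
            extract_amount=lambda t: 75000,
        )
    )
```

Passing the branches in as callables keeps the merge step independent of how the classifier or the parsers are implemented, which makes it easy to test with stubs.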
Our machine learning component is a custom Indonesian Natural Language Processing (NLP) classifier:
- Text Preprocessing (`clean_text`):
  - Converts text to lower case.
  - Standardizes currency prefixes (`rp`, `idr`) to a clean `'rupiah'` identifier.
  - Expands thousands-abbreviation suffixes (converting `50rb` or `120k` to `50000` or `120000` respectively).
  - Removes Indonesian thousand delimiters (dots) inside numbers, e.g., `25.000` -> `25000`.
  - Replaces all digit-only numbers with a generic `'amount'` token so the classifier does not overfit on specific figures instead of semantic words.
  - Scrubs punctuation and removes excess whitespace.
- Feature Extraction (`TfidfVectorizer`):
  - Extracts character/word patterns using Term Frequency-Inverse Document Frequency.
  - Extends extraction to bi-grams (`ngram_range=(1,2)`) to capture multi-word semantic cues (e.g. `"kopi kenangan"`, `"kartu kredit"`).
  - Drops terms that appear in fewer than two training documents (`min_df=2`) to keep the vocabulary stable and reduce noise.
- Classification Model (`LogisticRegression`):
  - Trains with balanced class weights (`class_weight='balanced'`) to compensate for category frequency differences.
  - Outputs category labels along with probability confidence scores (`predict_proba`).
  - Achieved a 97.30% testing accuracy score on our dataset.
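The preprocessing and model steps listed above can be approximated as follows. This is a minimal sketch under stated assumptions, not the contents of `train_model.py`: the regular expressions are one plausible reading of the described rules, `build_pipeline` is a hypothetical helper name, and `max_iter=1000` is an assumed setting that the README does not mention.

```python
import re


def clean_text(text: str) -> str:
    """Normalize a raw transaction string before vectorization (sketch)."""
    text = text.lower()
    # Currency prefixes (rp, rp., idr) directly before a number -> 'rupiah'.
    text = re.sub(r"\b(?:rp|idr)\.?\s*(?=\d)", "rupiah ", text)
    # Thousands suffixes: 50rb -> 50000, 120k -> 120000.
    text = re.sub(r"\b(\d+)(?:rb|k)\b", r"\g<1>000", text)
    # Indonesian thousand separators: 25.000 -> 25000.
    text = re.sub(r"(?<=\d)\.(?=\d{3}\b)", "", text)
    # Any remaining digit-only token becomes the generic 'amount' token.
    text = re.sub(r"\b\d+\b", "amount", text)
    # Replace punctuation with spaces, then collapse whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_pipeline():
    """Assemble a TF-IDF + Logistic Regression pipeline (hypothetical helper)."""
    # Imported lazily so clean_text stays usable without scikit-learn installed.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    return make_pipeline(
        # Passing clean_text as the preprocessor replaces the default lowercasing.
        TfidfVectorizer(preprocessor=clean_text, ngram_range=(1, 2), min_df=2),
        # max_iter=1000 is an assumption, chosen only to avoid convergence warnings.
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )
```

After fitting such a pipeline on labeled texts, `predict_proba` returns one probability per category, and the largest value can serve as the confidence score.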
While the ML model maps semantic text to category buckets (food, transport, shopping, bills, entertainment, healthcare), our core FastAPI Parsing Engine extracts structured attributes:
- Indonesian Amount Extractor (`extract_amount`):
  - Identifies suffix modifiers (`rb`, `k` -> e.g., `25rb` = `25000`, `120k` = `120000`).
  - Identifies prefix indicators (`rp.`, `idr` followed by digits).
  - Strips thousand separator markers (`25.000` -> `25000`).
  - Falls back to standalone figures within expected transaction bounds (Rp 500 to Rp 100,000,000).
- Brand Merchant Matcher (`extract_merchant`):
  - Features a case-insensitive search index matching 100+ Indonesian companies spanning transit, e-commerce, dining, entertainment, utilities, banks, and healthcare providers.
  - Tries longer names first to avoid conflicts (e.g., matching `"Mie Gacoan"` before `"Gacoan"`, or `"ShopeePay"` before `"Shopee"`).
Follow these instructions to spin up the entire machine learning pipeline, the FastAPI backend (served by Uvicorn), and the Next.js frontend.
Before launching the backend API, run the training script to generate model.joblib.
# 1. Navigate to the ML directory
cd ml
# 2. Spin up a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate
# 3. Install core packages
pip install --upgrade pip
pip install -r requirements.txt
# 4. Execute the training pipeline (writes model.joblib)
python train_model.py
To verify your saved pipeline predictions, you can optionally run:
python evaluate_model.py
With ml/model.joblib created, you can now launch our inference API server.
# 1. Open a new terminal and navigate to the backend directory
cd backend
# 2. Spin up a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate
# 3. Install backend packages
pip install --upgrade pip
pip install -r requirements.txt
# 4. Launch the Uvicorn live server
uvicorn main:app --reload --port 8000
The API Swagger docs are hosted at: http://localhost:8000/docs
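Once the server is running, the `/predict` endpoint can be smoke-tested from Python using only the standard library. The sketch below assumes the default port from the command above and the `{"text": ...}` request body documented in this README; the helper names `build_predict_request` and `predict` are hypothetical.

```python
import json
import urllib.request

# Assumes the backend was started locally on port 8000 as shown above.
API_URL = "http://localhost:8000/predict"


def build_predict_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying one raw transaction string."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_predict_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend up, calling `predict("Makan siang di McD Rp 75.000 via QRIS")` should return a dictionary with the category, confidence, merchant, and amount fields.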
Start the Next.js dashboard client.
# 1. Open a new terminal and navigate to the frontend directory
cd frontend
# 2. Install NPM dependencies
npm install
# 3. Boot up the local hot-reloaded development client
npm run dev
Open your web browser and navigate to: http://localhost:3000
POST /predict sends raw transaction text to our inference API backend.
Request body:
{
  "text": "Makan siang di McD Rp 75.000 via QRIS"
}
Response:
{
  "text": "Makan siang di McD Rp 75.000 via QRIS",
  "category": "food",
  "confidence": 0.985429188048291,
  "merchant": "McD",
  "amount": 75000,
  "explanation": "This transaction is categorized as Food & Dining because it contains keywords or merchants associated with dining, food delivery, or beverages. We identified merchant 'McD' and amount of Rp 75.000 from the transaction description."
}
FinLens AI is production-ready for Google Cloud Run using its built-in Docker support.
For complete, detailed instructions on how to build, deploy, configure CORS, and scale both your Next.js and FastAPI services on GCP, please consult the Cloud Run Deployment Blueprint.