A fully client-side, zero-dependency AI system for predicting warehouse routing destinations from parcel tracking IDs — built entirely in a single HTML file.
- Overview
- Features
- How It Works — The Three-Layer Cascade Engine
- AI Model — Deep Dive
- Multi-Device Session System
- User Interface
- Getting Started
- CSV Training Data Format
- Tech Stack
- Architecture Diagram
- Browser Compatibility
- License
Warehouse Routing Intelligence is a single-file web application that uses a three-layer machine learning cascade to predict which warehouse a parcel belongs to based solely on its tracking ID (and optionally its merchant name). It is designed for warehouse scanning workflows where fast, accurate routing decisions are critical.
The entire AI pipeline — training, inference, serialization, and UI — runs 100% in the browser. No server, no backend, no external ML library. You bring a CSV of historical routing data; WRI learns from it and starts classifying in real-time.
- 🧠 Three-layer AI cascade — rule engine → merchant lookup → gradient-boosted tree ensemble
- 📋 Train from CSV — upload any CSV with `trackingId`, `merchant`, `warehouse` columns to train a model on the fly
- 💾 Save & load models — export trained models as JSON files and reload them instantly
- 📡 Multi-device real-time sessions — link a desktop hub to mobile scanner devices over peer-to-peer WebRTC (via PeerJS)
- 📱 Mobile & desktop modes — responsive UI with a dedicated mobile scanning view and fullscreen result overlay
- 🔗 QR code join — generate a scannable QR code so mobile devices can auto-join a session
- 🌗 Dark/light theme — toggleable via floating action button
- 📊 Live analytics — per-warehouse and per-merchant scan counters with progress bars
- 🔁 Offline inference — once the page has loaded, training and prediction need no internet connection; only multi-device sessions do
Every tracking ID lookup passes through up to three decision layers. The engine returns as soon as any layer produces a confident result, falling through to the next only when needed.
              Input ID
                  │
                  ▼
    ┌─────────────────────────────┐
    │  Layer 1: Rule Engine       │  ~65% coverage, ~98% accuracy
    │  (deterministic patterns)   │──────────────────► Result (if certain)
    └─────────────┬───────────────┘
                  │ uncertain / not covered
                  ▼
    ┌─────────────────────────────┐
    │  Layer 2: Merchant Lookup   │  Pure merchants → certain
    │  (lookup table from CSV)    │──────────────────► Result (if pure merchant)
    └─────────────┬───────────────┘
                  │ mixed merchant or unknown
                  ▼
    ┌─────────────────────────────┐
    │  Layer 3: GBT Ensemble      │  Handles ambiguous POZ1 vs POZ2 splits
    │  (trained on feature vecs)  │──────────────────► Result + confidence
    └─────────────────────────────┘
                  │ GBT + soft merchant signal blend
                  ▼
    Final prediction: { warehouse, confidence, resolvedBy }
Covers approximately 65% of all IDs with roughly 98% accuracy using hard-coded structural patterns derived from data analysis.
20-digit numeric IDs (DHL/GLS format)
- Inspects the sort code at positions 8–11 and 4–7
- Known BER3 sort codes: `7577`, `5249`, `5192`, `5139`, `7137`, `5140`, `9213`, `7824`, `0609`
- Known POZ sort codes: `0734`, `5320`, `7760`, `8803`, `8830`, `9247`, `8859`, `8852` (passed to GBT for POZ1 vs POZ2 disambiguation)
- Prefix matching on first 10 characters for additional certainty
11-digit numeric IDs
- Prefix `1155` → BER3 (99% confidence)
- Prefix `115` → BER3 (88% confidence)
International format ([2 letters][digits][2 letters], e.g. CD123456DE)
- Prefixes `CD`, `LE`, `CB`, `UE`, `CM` → BER3 (96% confidence)
- Prefixes `CA`, `UF` → passed to GBT (POZ split)
(Y) internal system codes
- `(Y)#[A-Z]{2}...` (without `SRDE`) → BER3 (98% confidence)
- Contains `SRDE` → POZ, passed to GBT
- Other `(Y)[A-Z0-9]` codes → BER3 (85% confidence)
Explicit prefix rules
- `LE\d` → BER3 (96%)
- `SCC` → BER3 (99%)
- `CD3` → BER3 (99%)
- `9L3` → POZ1 (99%)
If the rule engine returns certain: false, the ID falls through to Layer 2.
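As an illustration of how such deterministic rules might be expressed, the sketch below encodes only the 11-digit and international-format rules listed above. The function name `applyRules`, the return shape, the `hint` field, and the choice to mark every rule that has a stated confidence as `certain` are assumptions made for this example, not the application's actual code.

```javascript
// Hypothetical sketch of a small subset of the Layer 1 rules.
const INTL_BER3_PREFIXES = new Set(["CD", "LE", "CB", "UE", "CM"]);
const INTL_POZ_PREFIXES = new Set(["CA", "UF"]);

function applyRules(id) {
  // 11-digit numeric IDs: check the longer, more specific prefix first.
  if (/^\d{11}$/.test(id)) {
    if (id.startsWith("1155")) return { warehouse: "BER3", confidence: 0.99, certain: true };
    if (id.startsWith("115")) return { warehouse: "BER3", confidence: 0.88, certain: true };
  }
  // International format: two letters, digits, two letters (e.g. CD123456DE).
  if (/^[A-Z]{2}\d+[A-Z]{2}$/.test(id)) {
    const prefix = id.slice(0, 2);
    if (INTL_BER3_PREFIXES.has(prefix)) {
      return { warehouse: "BER3", confidence: 0.96, certain: true };
    }
    if (INTL_POZ_PREFIXES.has(prefix)) {
      // POZ1 vs POZ2 is left to the GBT layer; only a hint is returned (assumed field).
      return { warehouse: null, confidence: 0, certain: false, hint: "POZ" };
    }
  }
  // No rule matched: fall through to Layer 2.
  return { warehouse: null, confidence: 0, certain: false };
}
```

IDs that match no rule, or that only carry a POZ hint, come back with `certain: false` and would continue to the merchant lookup.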
Built at training time from the CSV data. Merchants are categorized as:
- Pure merchants — 100% of their historical parcels routed to a single warehouse. 162 out of 190 merchants fall into this bucket. Lookups are deterministic (confidence = 1.0).
- Mixed merchants — split routing history, typically between POZ variants. 28 merchants fall here. Their data becomes a soft signal blended with Layer 3.
When a pure merchant match is found, the result is returned immediately. Mixed merchant data is passed as a weighted prior to the GBT model in Layer 3.
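A minimal sketch of how the pure/mixed classification and the soft merchant prior could be computed from training rows. The helper names (`buildMerchantTable`, `merchantPrior`, `blendWithMerchant`) are hypothetical. The 0.75 and 0.35 weights and the final softmax come from the `hybridPredict` flow described later in this document; exactly how the application combines them is an assumption.

```javascript
// rows: [{ merchant, warehouse }], one entry per training parcel.
function buildMerchantTable(rows) {
  const merchantStats = {};
  for (const { merchant, warehouse } of rows) {
    if (!merchant) continue; // blank merchants carry no signal
    if (!merchantStats[merchant]) merchantStats[merchant] = {};
    merchantStats[merchant][warehouse] = (merchantStats[merchant][warehouse] || 0) + 1;
  }
  const pureMerchants = {};
  const mixedMerchants = {};
  for (const [merchant, counts] of Object.entries(merchantStats)) {
    const warehouses = Object.keys(counts);
    if (warehouses.length === 1) pureMerchants[merchant] = warehouses[0];
    else mixedMerchants[merchant] = true;
  }
  return { merchantStats, pureMerchants, mixedMerchants };
}

// Turns a mixed merchant's counts into a probability vector ordered like `classes`.
function merchantPrior(counts, classes) {
  const total = classes.reduce((sum, c) => sum + (counts[c] || 0), 0);
  return classes.map((c) => (total ? (counts[c] || 0) / total : 0));
}

// Weighted blend of GBT and merchant probabilities, followed by a softmax.
function blendWithMerchant(gbtProbs, merchantProbs, wGbt = 0.75, wMerchant = 0.35) {
  const scores = gbtProbs.map((p, i) => wGbt * p + wMerchant * merchantProbs[i]);
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```

Because the blend ends in a softmax, the two weights do not need to sum to 1; they only set how strongly each source pulls the final ranking.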
A pure-JavaScript GBT implementation with no external ML dependencies. This layer handles the hardest cases — particularly disambiguating POZ1 vs POZ2 routing where IDs share similar structural patterns.
Architecture:
- Multi-class classifier (one tree ensemble per output class, OvR style)
- Ensemble of `N` boosting rounds, each round adds one tree per class
- Leaf values represent soft probability distributions (not hard labels)
- Final prediction uses softmax over summed logits across all trees
- Learning rate: `0.25`
- Training uses a 15% validation split for accuracy reporting
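The "softmax over summed logits" step can be sketched as follows. This example assumes each boosting round has already been evaluated into one raw score per class, and that the learning rate is applied when the scores are accumulated; `predictFromRounds` and its parameters are illustrative names, not the application's API.

```javascript
// Numerically stable softmax: subtract the maximum before exponentiating.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((z) => Math.exp(z - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// roundScores[r][k] is the score the class-k tree of round r gives the sample.
function predictFromRounds(roundScores, nClasses, learningRate = 0.25) {
  const logits = new Array(nClasses).fill(0);
  for (const scores of roundScores) {
    for (let k = 0; k < nClasses; k++) {
      logits[k] += learningRate * scores[k];
    }
  }
  return softmax(logits);
}
```

With no rounds at all the logits stay at zero and the result is a uniform distribution, which is a reasonable behaviour for an untrained ensemble.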
Tree construction:
- Gini impurity criterion for split selection
- Random feature subsampling per node: `max(4, floor(sqrt(n_features)))` features considered per split
- Up to 10 candidate split thresholds per feature (midpoints between unique values)
- Configurable `maxDepth` and `minSamples` per leaf
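The split search described above could look roughly like this for a single feature. It is a sketch under assumptions: labels are class indices, samples with a value less than or equal to the threshold go left, and when more than 10 midpoints exist they are subsampled evenly. All function names are hypothetical.

```javascript
// Gini impurity of a list of class indices.
function gini(labels, nClasses) {
  if (labels.length === 0) return 0;
  const counts = new Array(nClasses).fill(0);
  for (const y of labels) counts[y]++;
  return 1 - counts.reduce((s, c) => s + (c / labels.length) ** 2, 0);
}

// Midpoints between sorted unique values, capped at maxCandidates.
function candidateThresholds(values, maxCandidates = 10) {
  const uniq = [...new Set(values)].sort((a, b) => a - b);
  const mids = [];
  for (let i = 1; i < uniq.length; i++) mids.push((uniq[i - 1] + uniq[i]) / 2);
  if (mids.length <= maxCandidates) return mids;
  const step = mids.length / maxCandidates;
  return Array.from({ length: maxCandidates }, (_, i) => mids[Math.floor(i * step)]);
}

// Best threshold for one feature by size-weighted Gini impurity of the children.
function bestSplit(X, y, featureIdx, nClasses) {
  let best = null;
  for (const t of candidateThresholds(X.map((row) => row[featureIdx]))) {
    const left = [];
    const right = [];
    X.forEach((row, i) => (row[featureIdx] <= t ? left : right).push(y[i]));
    const score =
      (left.length * gini(left, nClasses) + right.length * gini(right, nClasses)) / y.length;
    if (best === null || score < best.score) best = { threshold: t, score };
  }
  return best;
}

// Number of features sampled per node, as stated in the list above.
function featuresPerSplit(nFeatures) {
  return Math.max(4, Math.floor(Math.sqrt(nFeatures)));
}
```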
Each tracking ID is converted to a fixed-length feature vector (`Float32Array`) of ~42 dimensions (the groups listed below sum to 44):
| Group | Features | Description |
|---|---|---|
| Format type | 6 | One-hot: NUM20, NUM11, INTL, Y_HASH_ALPHA, Y_HASH_NUM, Y_OTHER |
| ID length | 1 | Normalised to [0, 1] over range 0–30 |
| Positional char codes | 14 | Character codes at positions 0–13, normalised by 127 |
| 4-char window hashes | 4 | Rolling hash of 4-char windows at positions 0, 4, 8, 12 |
| Prefix hashes | 5 | Hash of first 4, 6, 8, 10, 12 characters |
| Suffix hashes | 2 | Hash of last 2 and last 4 characters |
| Character ratios | 3 | Digit ratio, alpha ratio, special-char ratio |
| Positional bigrams | 9 | Binary features for known discriminative position+bigram combos |
Hashing function (polynomial rolling hash, masked to 16 bits):
h = 0
for each char c: h = (h * 31 + charCode(c)) & 0xFFFF
normalized = h / 65535
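The pseudocode above translates directly to JavaScript. The mask `0xFFFF` keeps only the low 16 bits, so the normalised value always lies in [0, 1]. The names `hash16` and `prefixSuffixHashes` are made up for this sketch; the prefix and suffix lengths follow the feature table.

```javascript
// Polynomial rolling hash (multiplier 31), masked to 16 bits and scaled to [0, 1].
function hash16(text) {
  let h = 0;
  for (let i = 0; i < text.length; i++) {
    h = (h * 31 + text.charCodeAt(i)) & 0xffff;
  }
  return h / 65535;
}

// Five prefix hashes (first 4, 6, 8, 10, 12 chars) and two suffix hashes (last 2, 4 chars).
function prefixSuffixHashes(id) {
  const prefixes = [4, 6, 8, 10, 12].map((n) => hash16(id.slice(0, n)));
  const suffixes = [2, 4].map((n) => hash16(id.slice(-n)));
  return [...prefixes, ...suffixes];
}
```

For IDs shorter than a requested prefix or suffix length, `slice` simply returns the whole string, so short IDs produce repeated hash values rather than errors.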
Discriminative positional bigrams used:
0:00, 4:04, 8:07, 8:05, 8:08, 0:LE, 0:CD, 0:CA, 0:CB
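One way to turn that list into nine binary features is sketched below. It assumes each entry reads `position:bigram`, where the position is a 0-based index and the bigram is the two characters starting there; `bigramFeatures` is a hypothetical name.

```javascript
const POSITIONAL_BIGRAMS = [
  "0:00", "4:04", "8:07", "8:05", "8:08", "0:LE", "0:CD", "0:CA", "0:CB",
];

// Returns one 0/1 flag per known position+bigram combination.
function bigramFeatures(id) {
  return POSITIONAL_BIGRAMS.map((spec) => {
    const [pos, bigram] = spec.split(":");
    const start = Number(pos);
    return id.slice(start, start + 2) === bigram ? 1 : 0;
  });
}
```

For the sample ID `00340434756781234567` only the first two flags fire (`00` at position 0 and `04` at position 4), while `LE123456789DE` sets only the `0:LE` flag.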
Training is triggered by uploading a CSV file. The full pipeline runs asynchronously in the browser:
- CSV parsing — extracts `{ trackingId → { merchant, warehouse } }` mapping
- Stat building — constructs prefix maps, n-gram stats (2–4), Markov suffix stats, length stats, positional bigram stats
- Merchant table building — classifies each merchant as pure or mixed
- Feature matrix construction — calls `extractFeatures(id)` for every training ID
- Label encoding — maps warehouse strings to class indices
- Train/validation split — 15% held out for accuracy estimation
- GBT training loop — iterates boosting rounds, fitting one shallow tree per class per round on the residuals
- Validation accuracy — computed on held-out split and displayed in the UI
- Auto-save — completed model is serialised and downloaded as `warehouse_model.json`
A progress bar in the sidebar tracks training completion in real-time.
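Two of the pipeline steps, label encoding and the 15% hold-out split, are small enough to sketch in full. The function names, the alphabetical class ordering, and the shuffle-based split are assumptions for illustration; the application may order classes or pick the validation rows differently.

```javascript
// Maps warehouse strings to class indices (classes sorted alphabetically here).
function encodeLabels(warehouses) {
  const classes = [...new Set(warehouses)].sort();
  const index = new Map(classes.map((c, i) => [c, i]));
  return { classes, labels: warehouses.map((w) => index.get(w)) };
}

// Shuffles row indices (Fisher-Yates) and holds out a fraction for validation.
function trainValSplit(n, valFraction = 0.15, rng = Math.random) {
  const idx = Array.from({ length: n }, (_, i) => i);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  const nVal = Math.round(n * valFraction);
  return { val: idx.slice(0, nVal), train: idx.slice(nVal) };
}
```

Passing a seeded `rng` instead of `Math.random` would make the split reproducible between training runs.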
Trained models are saved as JSON with the following schema:
    {
      "version": 2,
      "savedAt": "<ISO timestamp>",
      "gbtClasses": ["BER3", "POZ1", "POZ2"],
      "gbtAccuracy": 0.94,
      "gbt": [ /* serialised tree ensemble */ ],
      "merchantStats": { "<merchant>": { "<warehouse>": <count> } },
      "pureMerchants": { "<merchant>": "<warehouse>" },
      "mixedMerchants": { "<merchant>": true },
      "learnedPatterns": { /* prefix map */ },
      "nGramStats": { /* n-gram stats */ },
      "markovStats": { /* suffix stats */ },
      "lengthStats": { /* length→warehouse counts */ },
      "positionalStats": { /* positional bigram stats */ },
      "ruleStats": { /* prefix and positional window stats */ }
    }

Tree nodes are compactly serialised:
- Internal node: `{ l: 0, f: <featIdx>, t: <threshold>, L: <leftNode>, R: <rightNode> }`
- Leaf node: `{ l: 1, v: <probabilityArray> }`
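Evaluating a tree stored in this compact form only needs a short loop. The sketch assumes that a feature value less than or equal to the threshold `t` follows the `L` branch; the source does not state which side the comparison favours, and `evalTree` is a hypothetical name.

```javascript
// Walks a serialised tree until a leaf (l === 1) is reached and returns its value.
function evalTree(node, features) {
  while (node.l === 0) {
    node = features[node.f] <= node.t ? node.L : node.R;
  }
  return node.v;
}

// Example: a one-split tree over feature 0.
const exampleTree = {
  l: 0, f: 0, t: 0.5,
  L: { l: 1, v: [0.9, 0.1] },
  R: { l: 1, v: [0.2, 0.8] },
};
const leftLeaf = evalTree(exampleTree, Float32Array.of(0.3));  // left leaf value
const rightLeaf = evalTree(exampleTree, Float32Array.of(0.7)); // right leaf value
```

The loop avoids recursion, so deep trees cannot exhaust the call stack, and it works with plain arrays as well as `Float32Array` feature vectors.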
Version compatibility:
- `version >= 2` loads the full GBT ensemble
- `version 1` (legacy) restores only stat layers (no GBT); merchant tables are rebuilt from raw stats
    hybridPredict(id, merchantHint):
      1. Exact match in training data?        → return immediately, confidence 1.0
      2. Rule engine returns certain result?  → return, rule confidence
      3. Merchant is pure?                    → return, confidence 1.0
      4. GBT model available?
           a. Merchant has soft signal?
              → blend: GBT probs × 0.75 + merchant probs × 0.35, softmax
           b. No merchant signal?
              → return GBT result directly
      5. Merchant soft signal only (no GBT)?  → return, confidence × 0.7
      6. No signal at all?                    → return "Unknown", confidence 0

WRI uses PeerJS (WebRTC data channels via `0.peerjs.com`) to connect multiple devices in real time.
Session roles:
- Host (desktop) — creates a 6-character alphanumeric session code, acts as the relay hub
- Scanner (mobile) — joins by entering the code or scanning a QR code
Session message protocol:
| Message type | Direction | Payload |
|---|---|---|
| `hello` | Scanner → Host | `{ label, deviceId }` |
| `welcome` | Host → Scanner | `{ sessionCode }` |
| `scan` | Any → Host → All | `{ id, wh, conf, merchant, deviceId, deviceLabel, ts }` |
| `history` | Host → new Scanner | `{ scans: last 20 }` |
| `bye` | Scanner → Host | — |
QR code join flow:
The QR code encodes a URL with ?join=<CODE>. Opening that URL on a mobile device automatically switches to mobile view, pre-fills the code, and initiates the join handshake.
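Reading the code back out of such a URL can be done with the standard `URL` API. The 6-character alphanumeric check mirrors the session code format described above; the function name and the example host are placeholders.

```javascript
// Returns the session code from a ?join=<CODE> URL, or null if absent or malformed.
function joinCodeFromUrl(href) {
  const code = new URL(href).searchParams.get("join");
  return code !== null && /^[A-Za-z0-9]{6}$/.test(code) ? code : null;
}

// Hypothetical URL, as a QR code might encode it.
const scannedCode = joinCodeFromUrl("https://example.com/index.html?join=AB12CD");
```

In the browser the same function could be called with `window.location.href` on page load to decide whether to show the join screen pre-filled.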
Session capacity: Up to 60 scans are retained in the shared session history.
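The host side of the protocol can be modelled without any networking, which makes the relay logic easier to follow. In this sketch a connection is any object with a `send(message)` method (PeerJS data connections expose one); the `type` field on messages, the uppercase code alphabet, and the decision to echo a scan back to its sender are assumptions.

```javascript
const HISTORY_LIMIT = 60;   // scans retained in the shared history
const HISTORY_ON_JOIN = 20; // scans replayed to a newly joined scanner

// Generates a 6-character alphanumeric session code.
function generateSessionCode(rng = Math.random) {
  const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
  let code = "";
  for (let i = 0; i < 6; i++) code += alphabet[Math.floor(rng() * alphabet.length)];
  return code;
}

function createHost(sessionCode) {
  const scans = [];
  const peers = new Map(); // connection -> { label, deviceId }
  return {
    scans,
    peers,
    // Handles one incoming message from connection `conn`.
    handle(msg, conn) {
      switch (msg.type) {
        case "hello":
          peers.set(conn, { label: msg.label, deviceId: msg.deviceId });
          conn.send({ type: "welcome", sessionCode });
          conn.send({ type: "history", scans: scans.slice(-HISTORY_ON_JOIN) });
          break;
        case "scan":
          scans.push(msg);
          if (scans.length > HISTORY_LIMIT) scans.shift(); // drop the oldest scan
          for (const peer of peers.keys()) peer.send(msg); // relay to every scanner
          break;
        case "bye":
          peers.delete(conn);
          break;
      }
    },
  };
}
```

Keying the peer table by connection rather than by `deviceId` lets the host process `bye`, which carries no payload.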
Desktop layout (two-column dashboard):
- Sidebar — model state panel, train/load controls, session controls, live device list, per-warehouse stats, inline QR code panel
- Main area — tracking ID lookup bar, result panel (warehouse badge + confidence + layer details), recent scans table, merchant summary table
Mobile layout:
- Fullscreen result overlay with large warehouse name (colour-coded), confidence bar, and auto-dismiss timer ring
- Counter strip showing BER / POZ / Total
- Mobile join screen on first load
Confidence colour coding:
| Badge | Threshold |
|---|---|
| HIGH (green) | ≥ 80% |
| MED (amber) | ≥ 55% |
| LOW (orange) | ≥ 35% |
| VERY LOW (red) | < 35% |
Warehouse colour coding:
- 🟢 `BER*` — green
- 🔴 `POZ*` — red
- 🟡 `Unknown` — amber
- 🔵 Other — blue
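Both colour schemes reduce to simple lookups. The sketch below uses the thresholds and prefixes from the two lists above, with confidence expressed as a fraction between 0 and 1; the function names and the returned strings are illustrative.

```javascript
// Maps a confidence in [0, 1] to the badge label from the threshold table.
function confidenceBadge(confidence) {
  if (confidence >= 0.8) return "HIGH";
  if (confidence >= 0.55) return "MED";
  if (confidence >= 0.35) return "LOW";
  return "VERY LOW";
}

// Maps a warehouse name to its display colour.
function warehouseColour(warehouse) {
  if (warehouse.startsWith("BER")) return "green";
  if (warehouse.startsWith("POZ")) return "red";
  if (warehouse === "Unknown") return "amber";
  return "blue";
}
```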
Just open index.html in any modern browser. No build step, no npm, no server.
    # Option A: direct file open
    open index.html

    # Option B: simple local server (avoids any browser file:// quirks)
    python3 -m http.server 8080
    # then visit http://localhost:8080

Create a CSV file with at least three columns:

    trackingId,merchant,warehouse
    00340434756781234567,Acme GmbH,BER3
    00340435073219876543,Beta Store,POZ1
    LE123456789DE,Gamma Ltd,BER3
The header row is auto-detected (any header containing trackingid, tracking_id, or id).
- Click ▶ Train in the sidebar to expand the train panel
- Upload your CSV file
- Watch the progress bar — training runs in the browser
- The model auto-saves as `warehouse_model.json` when done
Click Load model and select a previously saved warehouse_model.json.
Type or scan (via barcode scanner keyboard-wedge) a tracking ID into the lookup bar and press Enter or click Lookup.
- On desktop: click + New session to get a 6-character code
- On mobile: open the same `index.html`, enter the code, or scan the QR code shown in the sidebar
| Column | Required | Notes |
|---|---|---|
| `trackingId` | ✅ | Raw tracking ID string, any format |
| `merchant` | ✅ (can be blank) | Merchant/seller name — used for Layer 2 |
| `warehouse` | ✅ | Target label, e.g. `BER3`, `POZ1`, `POZ2` |
- Quoted fields and embedded commas are handled correctly
- Header row is optional but recommended
- Minimum 5 rows required to train GBT; fewer rows train stat layers only
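A small parser that satisfies these notes might look like the following. It is a sketch, not the application's `parseCsv`: it assumes the column order `trackingId, merchant, warehouse`, detects a header by looking only at the first cell, and does not support line breaks inside quoted fields.

```javascript
// Splits one CSV line, honouring double-quoted fields and "" escapes.
function parseCsvLine(line) {
  const fields = [];
  let cur = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { cur += '"'; i++; } // escaped quote
      else if (ch === '"') inQuotes = false;
      else cur += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ",") { fields.push(cur); cur = ""; }
    else cur += ch;
  }
  fields.push(cur);
  return fields;
}

// Parses CSV text into rows, skipping an optional header line.
function parseCsv(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const firstCell = parseCsvLine(lines[0])[0].trim().toLowerCase();
  const hasHeader =
    firstCell.includes("trackingid") || firstCell.includes("tracking_id") || firstCell === "id";
  return lines.slice(hasHeader ? 1 : 0).map((line) => {
    const [trackingId = "", merchant = "", warehouse = ""] = parseCsvLine(line);
    return { trackingId: trackingId.trim(), merchant: merchant.trim(), warehouse: warehouse.trim() };
  });
}
```

Matching `id` only as an exact cell value, rather than as a substring, avoids misreading a data row as a header when a tracking ID happens to contain the letters "id".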
| Component | Technology |
|---|---|
| UI framework | Vanilla HTML/CSS/JS (no framework) |
| Fonts | JetBrains Mono, Syne (Google Fonts) |
| P2P networking | PeerJS 1.5.4 over WebRTC |
| QR code generation | qrcodejs 1.0.0 |
| ML engine | Custom pure-JS GBT implementation |
| Storage | In-browser (Blob + FileReader); model exported as .json |
| Runtime dependencies | None for the ML engine and UI; sessions and QR codes use PeerJS and qrcodejs (listed above) |
    ┌─────────────────────────────────────────────────────────────┐
    │                         index.html                          │
    │                                                             │
    │  ┌──────────────┐    ┌──────────────────────────────────┐   │
    │  │  CSV Upload  │───►│  Training Pipeline               │   │
    │  └──────────────┘    │  parseCsv → buildRuleStats       │   │
    │                      │  buildMerchantStats              │   │
    │  ┌──────────────┐    │  buildNGramStats                 │   │
    │  │  Model Load  │    │  trainGbtModel → saveModel       │   │
    │  │  (.json)     │    └──────────────┬───────────────────┘   │
    │  └──────┬───────┘                   │                       │
    │         │            ┌──────────────▼───────────────────┐   │
    │         └───────────►│  In-Memory Model State           │   │
    │                      │  ruleStats, merchantStats        │   │
    │                      │  pureMerchants, mixedMerchants   │   │
    │                      │  gbtModel[], gbtClasses[]        │   │
    │                      └──────────────┬───────────────────┘   │
    │                                     │                       │
    │  ┌──────────────┐    ┌──────────────▼───────────────────┐   │
    │  │  Tracking ID │───►│  hybridPredict(id, merchant)     │   │
    │  │  Input       │    │  Layer 1 → Layer 2 → Layer 3     │   │
    │  └──────────────┘    └──────────────┬───────────────────┘   │
    │                                     │                       │
    │  ┌──────────────┐    ┌──────────────▼───────────────────┐   │
    │  │  Result UI   │◄───│  { warehouse, confidence,        │   │
    │  │  + Session   │    │    resolvedBy, layerDetails }    │   │
    │  │  Broadcast   │    └──────────────────────────────────┘   │
    │  └──────────────┘                                           │
    │                                                             │
    │  ┌───────────────────────────────────────────────────────┐  │
    │  │             PeerJS Session Layer (WebRTC)             │  │
    │  │   Host ◄──────────────────────────────► Scanner(s)    │  │
    │  └───────────────────────────────────────────────────────┘  │
    └─────────────────────────────────────────────────────────────┘
| Browser | Support |
|---|---|
| Chrome / Edge (v90+) | ✅ Full support |
| Firefox (v88+) | ✅ Full support |
| Safari (v15+) | ✅ Full support |
| Mobile Chrome/Safari | ✅ Full support |
WebRTC (PeerJS sessions) requires an internet connection to the STUN/signalling server on first connect, but all ML inference works fully offline.
MIT License — free to use, modify, and distribute.