Decision Tree Cultivator: build, reshape, and validate CART decision trees for classification and regression, in the browser.
Arborist is a browser-based interactive tool for building decision trees with a unique bonsai workflow: grow a tree from data, then reshape it manually with domain knowledge. It is designed for geoscientists, geometallurgists, and anyone who needs transparent, auditable classification and regression rules, with live holdout validation that keeps you honest about what your edits actually do to the model.
Single HTML file. Works offline. USB-stickable.
To run it locally:
- Download `index.html`
- Open it in any modern browser
- Load a CSV (or pick an example dataset from File → Load …)
- Click 🌱 Grow Tree
That's it. No install, no server, no internet required.
- Live holdout validation with stratified, drillhole-grouped, and spatial partition strategies. Train-vs-test scores update on every bonsai edit (the reviewer-response feature).
- Spatial exclusion buffer: when XYZ columns are assigned, the validation panel splits the test set into "leaky" (within the buffer of any training sample) and "isolated" (beyond it). The difference (Δ) between leaky and isolated accuracy is the autocorrelation-leakage diagnostic.
- Feature importance panel: bar chart of weighted Gini decrease per feature.
- 3D scatter panel: point cloud of samples in space, coloured by predicted / true / misclassification, via @gcu/dee.
- Dockview-based panel layout: every panel is dockable, splittable, tabbable, and floatable. Long-form dialogs (SQL import, Leapfrog export, Help) open as floating panels so they don't block the tree.
- Column roles: Target / X / Y / Z / Drillhole ID assigned in the Configuration panel, with auto-detect for common names (`x`/`east`/`xcoord`, `dhid`/`hole_id`, etc.). Roles drive partitioning strategies and the 3D scatter.
- mimic-io JSON export: sklearn-shaped parallel arrays, loadable by the Python shim without scikit-learn as a dependency.
- Classification using Gini impurity with exhaustive threshold search
- Regression using variance reduction (MSE) with R² metric
- Auto-detection of mode from target column type
- Configurable max depth, min leaf size, min split size
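As an illustration of the classification criterion listed above, here is a minimal sketch of a Gini-based threshold search for one numeric feature. It is not Arborist's actual code: the function names, the parallel-array inputs, and the `minLeaf` parameter are assumptions made for this example. Candidate thresholds are the midpoints between consecutive distinct sorted values.

```javascript
// Gini impurity of a class-count object: 1 - sum(p_k^2).
function gini(counts, total) {
  if (total === 0) return 0;
  let sumSq = 0;
  for (const c of Object.values(counts)) {
    const p = c / total;
    sumSq += p * p;
  }
  return 1 - sumSq;
}

// Try every midpoint between consecutive distinct values of one feature.
// `values` and `labels` are hypothetical parallel arrays (one entry per sample).
function bestGiniSplit(values, labels, minLeaf = 1) {
  const n = values.length;
  const order = values.map((_, i) => i).sort((a, b) => values[a] - values[b]);

  // Start with every sample on the right, then move them left one at a time.
  const left = {};
  const right = {};
  for (const y of labels) right[y] = (right[y] || 0) + 1;
  const parentGini = gini(right, n);

  let best = null;
  for (let k = 0; k < n - 1; k++) {
    const y = labels[order[k]];
    left[y] = (left[y] || 0) + 1;
    right[y] -= 1;

    const nLeft = k + 1;
    const nRight = n - nLeft;
    const v = values[order[k]];
    const next = values[order[k + 1]];
    // No threshold fits between equal values; also respect the leaf-size limit.
    if (v === next || nLeft < minLeaf || nRight < minLeaf) continue;

    const weighted =
      (nLeft / n) * gini(left, nLeft) + (nRight / n) * gini(right, nRight);
    const gain = parentGini - weighted;
    if (best === null || gain > best.gain) {
      best = { threshold: (v + next) / 2, gain, nLeft, nRight };
    }
  }
  return best; // null when no valid split exists
}

// Toy data: a copper grade that separates waste from ore.
const split = bestGiniSplit(
  [0.25, 0.5, 1.0, 1.5, 2.5, 3.0],
  ["waste", "waste", "waste", "ore", "ore", "ore"]
);
console.log(split);
```

Sorting once and updating the class counts incrementally keeps the cost per feature at one sort plus one linear pass.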
The core differentiator. After growing a data-driven tree, reshape it interactively:
- Prune to Leaf: collapse a subtree, simplifying the model
- 🌿 Regrow from Leaf: let CART find the best split for a leaf
- ⚡ Force Split: manually split on any feature at any threshold, with an interactive impurity chart for visual threshold selection
- Top Splits: view the 6 best data-driven splits ranked by gain; click to apply
- Undo stack (30 levels) and full reset to original tree
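One way to picture how pruning and a bounded undo stack fit together is the sketch below. It is an assumption-laden illustration, not the tool's implementation: the node shape (`prediction`, `feature`, `threshold`, `left`, `right`), `createHistory`, and `pruneToLeaf` are hypothetical names, and snapshots are taken with a simple JSON deep copy.

```javascript
const MAX_UNDO = 30; // the undo depth stated in the list above

// Bounded history of tree snapshots (hypothetical helper).
function createHistory(limit = MAX_UNDO) {
  const stack = [];
  return {
    // Save a deep copy of the tree before an edit; drop the oldest past the limit.
    push(tree) {
      stack.push(JSON.parse(JSON.stringify(tree)));
      if (stack.length > limit) stack.shift();
    },
    // Return the most recent snapshot, or null when there is nothing to undo.
    undo() {
      return stack.length > 0 ? stack.pop() : null;
    },
    get depth() {
      return stack.length;
    },
  };
}

// Collapse a subtree in place: the node keeps its prediction and loses its split.
function pruneToLeaf(node) {
  delete node.feature;
  delete node.threshold;
  delete node.left;
  delete node.right;
  return node;
}

const history = createHistory();
let tree = {
  prediction: "ore",
  feature: "Cu",
  threshold: 0.3,
  left: { prediction: "waste" },
  right: { prediction: "ore" },
};

history.push(tree);          // snapshot before the edit
pruneToLeaf(tree);           // the root is now a single leaf predicting "ore"
const isLeafAfterPrune = tree.left === undefined;
tree = history.undo();       // the original two-leaf tree is restored
console.log(isLeafAfterPrune, tree.left.prediction);
```

Whole-tree snapshots are simple and safe for small trees; a design that stores only the edited subtree would use less memory but needs more bookkeeping.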
- Holdout train/test split, pinned at session start. The tree is built on train; metrics always come from test. Click ↻ Resample to re-roll with a new seed.
- Strategies: Random, Stratified (by class), Drillhole-grouped (whole holes stay together).
- Spatial buffer: when X/Y/Z roles are assigned, set a buffer distance to split the test set into leaky vs isolated. Reports the percentile distribution of test-to-train min distances so you can pick a sensible buffer.
- Metrics: train-vs-test accuracy, Cohen's κ, per-class precision/recall, confusion matrix heatmap, R² and RMSE for regression.
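The spatial buffer described above can be sketched roughly as follows. This is a brute-force illustration under stated assumptions, not the code in `spatial.js`: all function names are hypothetical, a test sample at exactly the buffer distance is treated as leaky, and the percentile uses linear interpolation, which is only one of several common conventions.

```javascript
// Euclidean distance between two [x, y, z] points.
function dist(a, b) {
  return Math.hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2]);
}

// Minimum distance from each test point to any training point.
// Brute force, O(nTest * nTrain); a spatial index would scale better.
function minDistances(testPts, trainPts) {
  return testPts.map((t) =>
    trainPts.reduce((m, p) => Math.min(m, dist(t, p)), Infinity)
  );
}

// Test indices within the buffer are "leaky"; the rest are "isolated".
function splitByBuffer(minDist, buffer) {
  const leaky = [];
  const isolated = [];
  minDist.forEach((d, i) => (d <= buffer ? leaky : isolated).push(i));
  return { leaky, isolated };
}

// Accuracy over a subset of test indices (NaN when the subset is empty).
function accuracy(indices, yTrue, yPred) {
  if (indices.length === 0) return NaN;
  const hits = indices.filter((i) => yTrue[i] === yPred[i]).length;
  return hits / indices.length;
}

// Percentile with linear interpolation between order statistics.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

// Toy example: two training samples, four test samples.
const trainXYZ = [[0, 0, 0], [10, 0, 0]];
const testXYZ = [[1, 0, 0], [5, 0, 0], [10, 3, 4], [40, 0, 0]];
const yTrue = ["oxide", "oxide", "fresh", "fresh"];
const yPred = ["oxide", "oxide", "fresh", "oxide"];

const minDist = minDistances(testXYZ, trainXYZ);
const parts = splitByBuffer(minDist, 6);
const accLeaky = accuracy(parts.leaky, yTrue, yPred);
const accIsolated = accuracy(parts.isolated, yTrue, yPred);
console.log({ medianMinDist: percentile(minDist, 50), accLeaky, accIsolated });
```

A large gap between `accLeaky` and `accIsolated` suggests the test score is inflated by samples that sit next to training data.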
Click any node to see:
- Class distribution bar (classification) or value histogram (regression)
- Gini / variance, sample count, prediction, confidence
- Split details with child statistics and gain
- Bonsai action buttons contextual to node type
When nothing is selected, the inspector shows the root node by default, so it always surfaces something useful.
Sample point cloud in space (uses Three.js + @gcu/dee). Colour by predicted class / true class / misclassification. Orbit / pan / zoom. Available whenever at least one X/Y/Z role is assigned.
Bar chart of weighted Gini decrease per feature, normalised to 100 %. Reacts to bonsai edits.
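A minimal sketch of how weighted Gini decrease could be accumulated per feature and normalised to 100% is shown below. The node shape (`counts`, `feature`, `left`, `right`) and the function names are assumptions for this example; the tool's internal representation may differ.

```javascript
// Sample count and Gini impurity from a class-count object.
function nodeStats(counts) {
  const n = Object.values(counts).reduce((a, b) => a + b, 0);
  const sumSq = Object.values(counts).reduce((a, c) => a + (c / n) ** 2, 0);
  return { n, gini: n > 0 ? 1 - sumSq : 0 };
}

// Sum the weighted impurity decrease of every split, grouped by feature,
// then rescale so the importances add up to 100.
function featureImportance(root) {
  const total = nodeStats(root.counts).n;
  const raw = {};

  (function visit(node) {
    if (!node.left || !node.right) return; // leaf: contributes nothing
    const p = nodeStats(node.counts);
    const l = nodeStats(node.left.counts);
    const r = nodeStats(node.right.counts);
    const decrease = (p.n * p.gini - l.n * l.gini - r.n * r.gini) / total;
    raw[node.feature] = (raw[node.feature] || 0) + decrease;
    visit(node.left);
    visit(node.right);
  })(root);

  const sum = Object.values(raw).reduce((a, b) => a + b, 0);
  const pct = {};
  for (const [feature, value] of Object.entries(raw)) {
    pct[feature] = sum > 0 ? (100 * value) / sum : 0;
  }
  return pct;
}

// Toy tree: split on Cu first, then on Fe in the right branch.
const exampleTree = {
  counts: { ore: 50, waste: 50 },
  feature: "Cu",
  left: { counts: { waste: 40 } },
  right: {
    counts: { ore: 50, waste: 10 },
    feature: "Fe",
    left: { counts: { ore: 50 } },
    right: { counts: { waste: 10 } },
  },
};

const importance = featureImportance(exampleTree);
console.log(importance); // Cu takes about two thirds, Fe about one third
```

Because the importances are recomputed from the current tree, any prune, regrow, or forced split changes them immediately, which matches the panel reacting to bonsai edits.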
Click the column type badges (num # / cat) in the Dataset panel to toggle a column between numeric and categorical. This is essential for coded variables like lithology codes, zone IDs, or drillhole numbers that parse as numeric but should split categorically.
- Rules: human-readable IF/AND/THEN pseudocode
- Python: nested `if/else` function, paste into any script
- Excel IF(): nested `=IF()` formula for spreadsheets
- SQL CASE WHEN: for block model software (Vulcan, Surpac, Datamine, Deswik)
- CSV: full dataset with predictions, confidence, and leaf IDs appended
- mimic-io JSON: sklearn-compatible parallel arrays; loadable via the Python shim
- Leapfrog .lfcalc: calculation file importable into Leapfrog Geo
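To make the rule-style exports concrete, the sketch below turns a small tree into a SQL `CASE WHEN` expression by walking each root-to-leaf path. It is a simplified illustration, not the exporter in `export.js`: the node shape, the function names, the `domain` alias, and the convention that the left branch means `feature <= threshold` are all assumptions, and categorical splits are not handled.

```javascript
// Collect one rule per leaf: the conditions along its path plus its prediction.
function leafRules(node, path = []) {
  if (!node.left || !node.right) {
    return [{ conditions: path, prediction: node.prediction }];
  }
  return [
    ...leafRules(node.left, [...path, `${node.feature} <= ${node.threshold}`]),
    ...leafRules(node.right, [...path, `${node.feature} > ${node.threshold}`]),
  ];
}

// Render the rules as a CASE expression; single quotes in labels are doubled.
function toSqlCase(root, alias = "domain") {
  const lines = leafRules(root).map((rule) => {
    const cond = rule.conditions.length > 0 ? rule.conditions.join(" AND ") : "1 = 1";
    const label = String(rule.prediction).replace(/'/g, "''");
    return `  WHEN ${cond} THEN '${label}'`;
  });
  return ["CASE", ...lines, `END AS ${alias}`].join("\n");
}

// Toy tree with numeric splits only.
const domainTree = {
  feature: "Cu",
  threshold: 0.3,
  left: { prediction: "waste" },
  right: {
    feature: "Fe",
    threshold: 12,
    left: { prediction: "oxide" },
    right: { prediction: "fresh" },
  },
};

const sql = toSqlCase(domainTree);
console.log(sql);
// CASE
//   WHEN Cu <= 0.3 THEN 'waste'
//   WHEN Cu > 0.3 AND Fe <= 12 THEN 'oxide'
//   WHEN Cu > 0.3 AND Fe > 12 THEN 'fresh'
// END AS domain
```

The same list of per-leaf rules could feed the other text formats, since nested `if/else` or `=IF()` output differs only in how the conditions are rendered.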
Paste a CASE WHEN block from Minitab, legacy block models, or any SQL source. Arborist parses the conditions and reconstructs a binary tree. If data is loaded, accuracy is computed instantly; you can then reshape the tree with the bonsai tools and re-export the improved version.
- Save/Load to browser IndexedDB (no fixed size cap; limited only by the browser's storage quota)
- Export/Import as JSON files for sharing between machines
- Projects store: CSV data, tree structure, config, column type overrides, and tree mode
An interactive multi-step tutorial is accessible from the splash screen or Help → Guided Workshop. Each step has action buttons that drive the real UI: loading data, selecting nodes, and applying operations. It covers CART theory, Gini impurity mathematics, variance reduction, and the full bonsai workflow with geological context.
Geometallurgical domaining: define processing domains (oxide/transition/fresh, ore types, metallurgical zones) from geochemical data, with expert control over boundary placement and honest holdout-with-spatial-buffer validation.
Resource estimation domains: build estimation domains where each leaf defines a stationary zone for variogram modeling and kriging.
Ore/waste classification: generate auditable rules for grade control that can be exported as SQL directly into block model software.
Legacy model auditing: import existing domain rules (SQL), evaluate against new data, refine with bonsai tools, and re-export updated rules.
Regulatory documentation: the transparent IF/THEN rule format produces domain definitions that are straightforward to document in JORC Table 1 (Section 3) or NI 43-101 technical reports, where Competent Persons / Qualified Persons must justify estimation domain boundaries.
- Language: Vanilla JavaScript, single HTML file
- Build: Zero-dep concat of `src/*.js` + vendored libs into `index.html`. Run `node build.js`.
- Vendored: dockview-core 5.2.0, three.js r160 + OrbitControls, @gcu/dee, pollywog (all MIT)
- Storage: IndexedDB for projects (no localStorage)
- Algorithm complexity: O(n · m · log n) per tree level, n = samples, m = features
- Tested on: Chrome, Firefox, Safari, Edge (any browser with ES2020+ support)
arborist/
  index.html            # built output, single-file app, served by GitHub Pages
  build.js              # concat build (zero deps)
  src/
    index.html          # shell with <!-- INLINE: --> tokens
    state.js            # global state + pub/sub channels
    columns.js          # column-role assignment (target, X/Y/Z, dhid)
    validation.js       # partitioning + metrics
    spatial.js          # distances + exclusion buffer
    csv.js, cart.js, bonsai.js, export.js, persistence.js, ...
    panels/             # dockview panel modules
      dataset.js, config.js, tree.js, inspector.js, rules.js,
      validation.js, importance.js, scatter.js, workshop.js, help.js
    vendor/             # dockview-core, three.min.js, dee.js, ...
    style.css
    app.js, menubar.js
  docs/
    SPEC-v2.md          # v2 specification
  python-shim/          # arborist_mimic.py loader for mimic-io JSON
Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984). Classification and Regression Trees. Wadsworth & Brooks/Cole, Monterey, CA. ISBN 978-0-412-04841-8.
Pedregosa, F. et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, pp. 2825–2830.
The CART implementation follows Breiman et al. (1984). The incremental sweep for optimal numeric thresholds is inspired by scikit-learn's DecisionTreeClassifier. No code was used from either source; Arborist is a clean-room implementation in vanilla JavaScript.
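The idea of an incremental sweep can be illustrated for the regression case with the sketch below. It is a generic example written for this README, not a copy of Arborist's or scikit-learn's code; `bestVarianceSplit` and its inputs are hypothetical. After one sort, running sums of the target and its square are moved from the right side to the left, so each candidate threshold is scored in constant time.

```javascript
// Sum of squared errors around the mean, from a sum, a sum of squares, and a count.
function sse(sum, sumSq, count) {
  return count > 0 ? sumSq - (sum * sum) / count : 0;
}

// Best variance-reducing threshold for one numeric feature.
// `values` and `targets` are hypothetical parallel arrays (one entry per sample).
function bestVarianceSplit(values, targets) {
  const n = values.length;
  const order = values.map((_, i) => i).sort((a, b) => values[a] - values[b]);

  // All samples start on the right side.
  let sumR = 0;
  let sqR = 0;
  for (const y of targets) {
    sumR += y;
    sqR += y * y;
  }
  const parentSse = sse(sumR, sqR, n);

  let sumL = 0;
  let sqL = 0;
  let best = null;
  for (let k = 0; k < n - 1; k++) {
    // Move one sample from right to left and update the running sums.
    const y = targets[order[k]];
    sumL += y;
    sqL += y * y;
    sumR -= y;
    sqR -= y * y;

    const v = values[order[k]];
    const next = values[order[k + 1]];
    if (v === next) continue; // no threshold fits between equal values

    const nLeft = k + 1;
    const childSse = sse(sumL, sqL, nLeft) + sse(sumR, sqR, n - nLeft);
    const reduction = (parentSse - childSse) / n; // drop in mean squared error
    if (best === null || reduction > best.reduction) {
      best = { threshold: (v + next) / 2, reduction };
    }
  }
  return best; // null when the feature is constant
}

// Toy data: the target jumps from 10 to 20 between x = 2 and x = 3.
const regSplit = bestVarianceSplit([1, 2, 3, 4], [10, 10, 20, 20]);
console.log(regSplit);
```

The sum-of-squares shortcut is compact but can lose precision when targets are large relative to their spread; a production implementation might centre the targets first.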
WA: Works in an Airplane. Fully offline, single HTML file, zero network calls at runtime. Deployable on air-gapped mine site laptops, field camp tablets, or opened from a USB stick.
MIT © 2026 Arthur Endlein Correia, Jéssica da Matta
- Arthur Endlein Correia (@endarthur)
- Jéssica da Matta
Geoscientific Chaos Union
