Quant_Model_Testbench is a lightweight experimentation framework for systematically evaluating machine learning models across feature subsets and hyperparameter combinations.
Instead of manually trying different model configurations, the testbench automates experiment generation, execution, and logging. Results are stored incrementally so experiments can be analyzed later and promising configurations can be refined through deeper searches.
The repository currently demonstrates the framework using the Titanic survival prediction dataset, but the testbench itself is dataset-agnostic and can be applied to any structured dataset.
Machine learning experimentation often becomes disorganized:
- repeated manual testing
- inconsistent experiment tracking
- hyperparameter tuning done ad-hoc
- results scattered across notebooks
Quant_Model_Testbench addresses this by providing a simple system that:
- enumerates feature combinations
- tests hyperparameter grids
- logs structured experiment results
- supports iterative model refinement
The goal is to make model experimentation systematic, reproducible, and analyzable.
The testbench explores model performance along two primary axes.
Different combinations of dataset features are tested to determine which subsets contain the strongest predictive signal.
Example feature combinations:
[Pclass, Sex]
[Pclass, Sex, Fare]
[Sex, Age, Fare, Parch]
Each model is evaluated across different hyperparameter settings.
Example:
RandomForestClassifier
├── n_estimators = [10, 50, 100]
├── criterion = [gini, entropy]
Together these axes generate many experiment configurations, which the testbench evaluates automatically.
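The cross-product of feature subsets and hyperparameter grids can be enumerated with the standard library. This is a minimal sketch, not the repository's actual code; `feature_pool`, `hyper_grid`, and `enumerate_configs` are illustrative names mirroring the examples above:

```python
from itertools import combinations, product

# Illustrative feature pool and grid, matching the Titanic examples above.
feature_pool = ["Pclass", "Sex", "Age", "Fare", "Parch"]
hyper_grid = {"n_estimators": [10, 50, 100], "criterion": ["gini", "entropy"]}

def enumerate_configs(features, grid, min_size=2):
    """Yield (feature_subset, hyperparameters) pairs for every combination."""
    keys = list(grid)
    for size in range(min_size, len(features) + 1):
        for subset in combinations(features, size):
            for values in product(*(grid[k] for k in keys)):
                yield list(subset), dict(zip(keys, values))

# 26 subsets of size 2..5 from 5 features, times 6 grid points = 156 configs.
configs = list(enumerate_configs(feature_pool, hyper_grid))
```

Even this small pool yields 156 configurations, which is why automated execution and logging matter.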
The system separates dataset handling, feature exploration, model execution, and experiment logging.
┌────────────────────┐
│ Input Dataset │
│ (CSV / Pandas) │
└─────────┬──────────┘
│
▼
┌────────────────────┐
│ Feature Pool │
│ Feature Subsetting │
└─────────┬──────────┘
│
▼
┌────────────────────┐
│ Model Testbench │
│ Experiment Engine │
└─────────┬──────────┘
│
┌──────────────┴──────────────┐
▼ ▼
Hyperparameter Generator Model Execution
(grid / combos) (sklearn models)
│ │
└──────────────┬──────────────┘
▼
┌────────────────────┐
│ Metric Engine │
│ ACC, AUC, F1, MAE │
└─────────┬──────────┘
▼
┌────────────────────┐
│ Experiment Log │
│ CSV + JSONL store │
└─────────┬──────────┘
▼
┌────────────────────┐
│ Result Analysis │
│ Best model ranking │
└────────────────────┘
The framework supports two experimentation modes.
Quick mode performs a broad exploration of the search space.
Characteristics:
- tests many feature subsets
- uses limited hyperparameter combinations
- runs relatively fast
Purpose:
Identify promising feature sets and models.
Full mode performs deep hyperparameter searches.
Characteristics:
- selected feature subsets are locked
- full hyperparameter grids are explored
- focuses on optimizing promising models
Purpose:
Find the best configuration for the most promising models discovered during quick mode.
The current implementation supports the following scikit-learn models:
- DecisionTreeClassifier
- DecisionTreeRegressor
- RandomForestClassifier
- RandomForestRegressor
Both classification and regression approaches are supported.
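One way such a testbench can turn a logged model name and hyperparameter dict back into an estimator is a simple registry. The `MODEL_REGISTRY` mapping and `build_model` helper below are a sketch under assumed names, not the repository's implementation:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Illustrative registry: maps the model names used in logs to sklearn classes.
MODEL_REGISTRY = {
    "DecisionTreeClassifier": DecisionTreeClassifier,
    "DecisionTreeRegressor": DecisionTreeRegressor,
    "RandomForestClassifier": RandomForestClassifier,
    "RandomForestRegressor": RandomForestRegressor,
}

def build_model(name, hyper):
    """Instantiate a supported sklearn model from its name and hyperparameters."""
    return MODEL_REGISTRY[name](**hyper)

model = build_model("RandomForestClassifier",
                    {"n_estimators": 100, "criterion": "entropy"})
```

A registry like this keeps experiment records model-agnostic: adding a new estimator only requires one new dictionary entry.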
Experiments can be ranked using several metrics:
| Metric | Description |
|---|---|
| MAE | Mean Absolute Error |
| LL | Log Loss |
| ACC | Accuracy |
| AUC | ROC AUC Score |
| F1 | F1 Score |
The ranking metric can be selected interactively during result analysis.
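All five metrics are available in `sklearn.metrics`; a sketch of how one experiment could be scored (the `score` helper is illustrative, not the repository's API):

```python
from sklearn.metrics import (accuracy_score, roc_auc_score, f1_score,
                             mean_absolute_error, log_loss)

def score(y_true, y_pred, y_prob):
    """Compute the testbench's ranking metrics for one classification run.

    y_pred holds hard class labels; y_prob holds the predicted probability
    of the positive class (needed for AUC and log loss).
    """
    return {
        "ACC": accuracy_score(y_true, y_pred),
        "AUC": roc_auc_score(y_true, y_prob),
        "F1": f1_score(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "LL": log_loss(y_true, y_prob),
    }

metrics = score([1, 0, 1, 1], [1, 0, 0, 1], [0.9, 0.2, 0.4, 0.8])
```

Note that AUC and log loss require probabilities rather than hard predictions, so a model runner needs to capture both `predict` and `predict_proba` outputs.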
All experiment results are written incrementally to both:
- CSV files for quick inspection
- JSONL files for structured experiment records
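Incremental JSONL writing amounts to appending one JSON object per line, so a crash mid-run loses at most the experiment in flight. A minimal sketch (the `log_experiment` name is illustrative):

```python
import json

def log_experiment(path, entry):
    """Append one experiment record as a single JSON line.

    Opening in append mode means earlier records are never rewritten,
    so results accumulate safely as the run progresses.
    """
    with open(path, "a") as fh:
        fh.write(json.dumps(entry) + "\n")
```

Reading the log back is the mirror image: parse each line independently with `json.loads`.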
Each experiment entry includes:
- model type
- feature subset
- hyperparameters
- evaluation metrics
Example log entry:
{
"model": "RandomForestClassifier",
"features": ["Pclass", "Sex", "Parch", "Fare"],
"hyper": {
"n_estimators": 100,
"criterion": "entropy"
},
"metrics": {
"ACC": 0.8659,
"AUC": 0.8612,
"F1": 0.8286,
"MAE": 0.1341
}
}

A quick experiment run produced 250 model configurations.

Top configurations ranked by AUC:
| Rank | Model | Key Features | AUC | Accuracy |
|---|---|---|---|---|
| 1 | RandomForestClassifier | Pclass, Sex, Fare, Parch | ~0.86 | ~0.86 |
| 2 | RandomForestClassifier | Passenger class + demographics | ~0.85 | ~0.86 |
| 3 | DecisionTreeClassifier | Sex, Fare, Pclass | ~0.84 | ~0.84 |
Observed strong predictive features:
- Sex
- Pclass
- Fare
- Parch
These features capture key demographic and socioeconomic signals associated with survival outcomes.
Experiment results are organized into timestamped directories.
output/
│
├── quick_feature_combos/
│ └── <timestamp>/
│ ├── q.csv
│ └── q.jsonl
│
├── quick_features_full_hypers_combos/
│ └── <timestamp>/
│ ├── qf.csv
│ └── qf.jsonl
│
└── full_features_full_hypers_combos/
└── <timestamp>/
├── ff.csv
└── ff.jsonl
This prevents overwriting previous experiment results and allows long-term experiment tracking.
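Creating a fresh timestamped directory per run is a one-liner with `pathlib`; a sketch under assumed names (`make_run_dir` is illustrative, and the timestamp format is an assumption):

```python
from datetime import datetime
from pathlib import Path

def make_run_dir(base="output", mode="quick_feature_combos"):
    """Create output/<mode>/<timestamp>/ so earlier runs are never overwritten."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = Path(base) / mode / stamp
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

Because the timestamp is part of the path, repeated runs in the same mode land in sibling directories rather than clobbering each other.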
git clone https://github.com/yourusername/Quant_Model_Testbench
cd Quant_Model_Testbench
./setup.py
This script creates a virtual environment and installs required dependencies.
python main.py
The CLI will guide you through:
- starting a new experiment
- selecting quick or full test modes
- ranking models by evaluation metrics
- running deeper hyperparameter searches
$ python main.py
Proceed with a fresh Model Testbench instead of analyzing past results? (Y/n)
> Y
Of the two available test modes - "quick" and "full", would you like to proceed with "quick"? (Y/n)
> Y
After experiments complete:
Top Models according to AUC
------------------------------------------------
1 | RandomForestClassifier
Features: [Pclass, Sex, Parch, Fare]
2 | RandomForestClassifier
Features: [Pclass, Sex, Age, Fare]
3 | DecisionTreeClassifier
Features: [Sex, Fare, Pclass]
The user can then select a configuration for deeper testing.
The repository demonstrates the testbench using the Titanic dataset from the Kaggle competition:
Titanic – Machine Learning from Disaster
Expected dataset location:
kaggle_data/train.csv
However, the framework can ingest any structured CSV dataset with a defined prediction target.
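Dataset-agnostic ingestion reduces to splitting one named target column off a pandas frame. A sketch, assuming pandas and a `load_dataset` helper name that is illustrative:

```python
import pandas as pd

def load_dataset(csv_path, target):
    """Load a structured CSV and split it into features X and target y."""
    df = pd.read_csv(csv_path)
    X = df.drop(columns=[target])  # every non-target column joins the feature pool
    y = df[target]
    return X, y
```

For the bundled example this would be `load_dataset("kaggle_data/train.csv", "Survived")`, but any CSV with a defined target column works the same way.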
Quant_Model_Testbench
│
├── main.py
│
├── src/
│ └── test_utils.py
│
├── kaggle_data/
│ └── train.csv
│
├── output/
│
├── data_xplore.py
│
└── README.md
The intended experimentation cycle:
1. Load dataset
2. Run quick feature sweep
3. Identify top feature sets
4. Select promising model
5. Lock feature subset
6. Run full hyperparameter grid
7. Evaluate best configuration
8. Iterate or deploy
This workflow helps prevent:
- ad-hoc tuning
- lost experiment configurations
- unreproducible results
Quant_Model_Testbench focuses on:
- Every experiment is logged and recoverable.
- Feature and hyperparameter combinations are generated systematically.
- Broad exploration first, followed by focused optimization.
Possible extensions include:
- additional models (XGBoost, LightGBM, CatBoost)
- experiment parallelization
- cross-validation integration
- automated feature importance analysis
- visualization dashboards
- experiment comparison tools
MIT License