Quant_Model_Testbench

Quant_Model_Testbench is a lightweight experimentation framework for systematically evaluating machine learning models across feature subsets and hyperparameter combinations.

Instead of manually trying different model configurations, the testbench automates experiment generation, execution, and logging. Results are stored incrementally so experiments can be analyzed later and promising configurations can be refined through deeper searches.

The repository currently demonstrates the framework using the Titanic survival prediction dataset, but the testbench itself is dataset-agnostic and can be applied to any structured dataset.


Motivation

Machine learning experimentation often becomes disorganized:

  • repeated manual testing
  • inconsistent experiment tracking
  • hyperparameter tuning done ad-hoc
  • results scattered across notebooks

Quant_Model_Testbench addresses this by providing a simple system that:

  • enumerates feature combinations
  • tests hyperparameter grids
  • logs structured experiment results
  • supports iterative model refinement

The goal is to make model experimentation systematic, reproducible, and analyzable.


Core Idea

The testbench explores model performance along two primary axes.

Feature Subsets

Different combinations of dataset features are tested to determine which subsets contain the strongest predictive signal.

Example feature combinations:

[Pclass, Sex]

[Pclass, Sex, Fare]

[Sex, Age, Fare, Parch]

Hyperparameter Combinations

Each model is evaluated across different hyperparameter settings.

Example:

RandomForestClassifier
├── n_estimators = [10, 50, 100]
├── criterion = [gini, entropy]

Together, these two axes generate many experiment configurations, which the testbench evaluates automatically.
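The cross-product of feature subsets and hyperparameter settings can be sketched with the standard library's `itertools`. The pool, grid, and function names below are illustrative, not the repository's actual identifiers:

```python
from itertools import combinations, product

# Hypothetical feature pool and hyperparameter grid; the repository's
# actual pool and grids may differ.
feature_pool = ["Pclass", "Sex", "Age", "Fare", "Parch"]
hyper_grid = {
    "n_estimators": [10, 50, 100],
    "criterion": ["gini", "entropy"],
}

def enumerate_experiments(pool, grid, min_features=2, max_features=3):
    """Yield (feature_subset, hyperparams) pairs for every combination."""
    keys = list(grid)
    for k in range(min_features, max_features + 1):
        for subset in combinations(pool, k):
            for values in product(*(grid[key] for key in keys)):
                yield list(subset), dict(zip(keys, values))

experiments = list(enumerate_experiments(feature_pool, hyper_grid))
# 20 feature subsets x 6 hyperparameter combinations = 120 experiments
```

Even this small pool and grid yields 120 configurations, which is why automated generation and logging matter.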


Architecture Overview

The system separates dataset handling, feature exploration, model execution, and experiment logging.

                     ┌────────────────────┐
                     │   Input Dataset    │
                     │   (CSV / Pandas)   │
                     └─────────┬──────────┘
                               │
                               ▼
                     ┌────────────────────┐
                     │    Feature Pool    │
                     │ Feature Subsetting │
                     └─────────┬──────────┘
                               │
                               ▼
                     ┌────────────────────┐
                     │ Model Testbench    │
                     │ Experiment Engine  │
                     └─────────┬──────────┘
                               │
                ┌──────────────┴──────────────┐
                ▼                             ▼
      Hyperparameter Generator        Model Execution
         (grid / combos)              (sklearn models)
                │                             │
                └──────────────┬──────────────┘
                               ▼
                     ┌────────────────────┐
                     │   Metric Engine    │
                     │ ACC, AUC, F1, MAE  │
                     └─────────┬──────────┘
                               ▼
                     ┌────────────────────┐
                     │   Experiment Log   │
                     │  CSV + JSONL store │
                     └─────────┬──────────┘
                               ▼
                     ┌────────────────────┐
                     │  Result Analysis   │
                     │ Best model ranking │
                     └────────────────────┘

Experiment Modes

The framework supports two experimentation modes.

Quick Mode

Quick mode performs a broad exploration of the search space.

Characteristics:

  • tests many feature subsets
  • uses limited hyperparameter combinations
  • runs relatively fast

Purpose:

Identify promising feature sets and models.


Full Mode

Full mode performs deep hyperparameter searches.

Characteristics:

  • selected feature subsets are locked
  • full hyperparameter grids are explored
  • focuses on optimizing promising models

Purpose:

Find the best configuration for the most promising models discovered during quick mode.


Supported Models

The current implementation supports the following scikit-learn models:

  • DecisionTreeClassifier
  • DecisionTreeRegressor
  • RandomForestClassifier
  • RandomForestRegressor

Both classification and regression approaches are supported.
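One common way to wire these four models into an experiment engine is a name-to-constructor registry; the registry and `build_model` helper below are a hypothetical sketch, not the repository's actual code:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Hypothetical registry mapping model names (as they appear in logs)
# to scikit-learn constructors.
MODEL_REGISTRY = {
    "DecisionTreeClassifier": DecisionTreeClassifier,
    "DecisionTreeRegressor": DecisionTreeRegressor,
    "RandomForestClassifier": RandomForestClassifier,
    "RandomForestRegressor": RandomForestRegressor,
}

def build_model(name, **hyper):
    """Instantiate a registered model with the given hyperparameters."""
    return MODEL_REGISTRY[name](**hyper)
```

A registry like this keeps log entries and model construction in sync: the `"model"` string in each experiment record is enough to rebuild the estimator.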


Evaluation Metrics

Experiments can be ranked using several metrics:

Metric | Description
-------|--------------------
MAE    | Mean Absolute Error
LL     | Log Loss
ACC    | Accuracy
AUC    | ROC AUC Score
F1     | F1 Score

The ranking metric can be selected interactively during result analysis.
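All five metrics are available in `sklearn.metrics`. A minimal sketch of computing them for one classification experiment (the `score_classifier` helper is an assumption, not the repository's API):

```python
from sklearn.metrics import (
    accuracy_score, roc_auc_score, f1_score, log_loss, mean_absolute_error,
)

def score_classifier(y_true, y_pred, y_prob):
    """Compute the testbench's ranking metrics for one experiment.

    y_pred: hard class predictions; y_prob: positive-class probabilities.
    """
    return {
        "ACC": accuracy_score(y_true, y_pred),
        "AUC": roc_auc_score(y_true, y_prob),
        "F1": f1_score(y_true, y_pred),
        "LL": log_loss(y_true, y_prob),
        "MAE": mean_absolute_error(y_true, y_pred),
    }

metrics = score_classifier(
    y_true=[0, 1, 1, 0],
    y_pred=[0, 1, 0, 0],
    y_prob=[0.2, 0.9, 0.4, 0.1],
)
```

Note that AUC and Log Loss need probability estimates, while Accuracy, F1, and MAE operate on hard predictions.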


Experiment Logging

All experiment results are written incrementally to both:

  • CSV files for quick inspection
  • JSONL files for structured experiment records

Each experiment entry includes:

  • model type
  • feature subset
  • hyperparameters
  • evaluation metrics

Example log entry:

{
  "model": "RandomForestClassifier",
  "features": ["Pclass", "Sex", "Parch", "Fare"],
  "hyper": {
    "n_estimators": 100,
    "criterion": "entropy"
  },
  "metrics": {
    "ACC": 0.8659,
    "AUC": 0.8612,
    "F1": 0.8286,
    "MAE": 0.1341
  }
}
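Incremental dual-format logging like this can be done with the standard library alone. The helper below is a sketch under assumed file names and a flattened CSV schema; the repository's actual columns may differ:

```python
import csv
import json
from pathlib import Path

def log_experiment(out_dir, entry):
    """Append one experiment record to both the JSONL and CSV stores.

    JSONL keeps the full nested record; the CSV flattens it for
    quick inspection. Appending after every experiment means a
    crashed run still leaves all completed results on disk.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Structured record: one JSON object per line.
    with (out / "q.jsonl").open("a") as f:
        f.write(json.dumps(entry) + "\n")

    # Flattened record for spreadsheet-style inspection.
    row = {
        "model": entry["model"],
        "features": "|".join(entry["features"]),
        **entry["hyper"],
        **entry["metrics"],
    }
    csv_path = out / "q.csv"
    write_header = not csv_path.exists()
    with csv_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Append-mode writes are what make the logging incremental: each experiment is durable as soon as it finishes.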

Example Results (Quick Sweep)

A quick experiment run produced 250 model configurations.

Top configurations ranked by AUC:

Rank | Model                  | Key Features                   | AUC   | Accuracy
-----|------------------------|--------------------------------|-------|---------
1    | RandomForestClassifier | Pclass, Sex, Fare, Parch       | ~0.86 | ~0.86
2    | RandomForestClassifier | Passenger class + demographics | ~0.85 | ~0.86
3    | DecisionTreeClassifier | Sex, Fare, Pclass              | ~0.84 | ~0.84

Observed strong predictive features:

  • Sex
  • Pclass
  • Fare
  • Parch

These features capture key demographic and socioeconomic signals associated with survival outcomes.


Output Directory Structure

Experiment results are organized into timestamped directories.

output/
│
├── quick_feature_combos/
│   └── <timestamp>/
│       ├── q.csv
│       └── q.jsonl
│
├── quick_features_full_hypers_combos/
│   └── <timestamp>/
│       ├── qf.csv
│       └── qf.jsonl
│
└── full_features_full_hypers_combos/
    └── <timestamp>/
        ├── ff.csv
        └── ff.jsonl

This prevents overwriting previous experiment results and allows long-term experiment tracking.
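Creating a fresh timestamped run directory is a small amount of standard-library code. The timestamp format and `make_run_dir` helper below are illustrative assumptions:

```python
from datetime import datetime
from pathlib import Path

def make_run_dir(base="output", mode="quick_feature_combos"):
    """Create a fresh timestamped directory for one experiment run.

    Because each run gets its own timestamp, earlier results are
    never overwritten and remain available for later analysis.
    """
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = Path(base) / mode / stamp
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

Each run then writes its CSV and JSONL files into the returned directory.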


Running the Testbench

Clone the repository

git clone https://github.com/joelclouds/Quant_Model_Testbench
cd Quant_Model_Testbench

Set up the environment

./setup.py

This script creates a virtual environment and installs required dependencies.


Run experiments

python main.py

The CLI will guide you through:

  • starting a new experiment
  • selecting quick or full test modes
  • ranking models by evaluation metrics
  • running deeper hyperparameter searches

Example CLI Workflow

$ python main.py

Proceed with a fresh Model Testbench instead of analyzing past results? (Y/n)
> Y

Of the two available test modes - "quick" and "full", would you like to proceed with "quick"? (Y/n)
> Y

After experiments complete:

Top Models according to AUC
------------------------------------------------

1 | RandomForestClassifier
    Features: [Pclass, Sex, Parch, Fare]

2 | RandomForestClassifier
    Features: [Pclass, Sex, Age, Fare]

3 | DecisionTreeClassifier
    Features: [Sex, Fare, Pclass]

The user can then select a configuration for deeper testing.


Dataset

The repository demonstrates the testbench using the Titanic dataset from the Kaggle competition:

Titanic – Machine Learning from Disaster

Expected dataset location:

kaggle_data/train.csv

However, the framework can ingest any structured CSV dataset with a defined prediction target.
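Dataset-agnostic ingestion amounts to loading a CSV and splitting off the caller-supplied target column. A minimal sketch (the `load_dataset` helper is an assumption; `Survived` is the Titanic target):

```python
import pandas as pd

def load_dataset(path, target):
    """Load a structured CSV and split it into features X and target y.

    `target` names the prediction column, e.g. "Survived" for the
    Titanic dataset; every other column joins the feature pool.
    """
    df = pd.read_csv(path)
    X = df.drop(columns=[target])
    y = df[target]
    return X, y
```

Since nothing here is Titanic-specific, pointing the loader at a different CSV and target column is all that is needed to reuse the testbench.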


Project Structure

Quant_Model_Testbench
│
├── main.py
│
├── src/
│   └── test_utils.py
│
├── kaggle_data/
│   └── train.csv
│
├── output/
│
├── data_xplore.py
│
└── README.md

Research Workflow

The intended experimentation cycle:

1. Load dataset
2. Run quick feature sweep
3. Identify top feature sets
4. Select promising model
5. Lock feature subset
6. Run full hyperparameter grid
7. Evaluate best configuration
8. Iterate or deploy

This workflow helps prevent:

  • ad-hoc tuning
  • lost experiment configurations
  • unreproducible results

Design Goals

Quant_Model_Testbench focuses on:

Reproducibility

Every experiment is logged and recoverable.

Structured Exploration

Feature and hyperparameter combinations are generated systematically.

Incremental Research

Broad exploration first, followed by focused optimization.


Future Improvements

Possible extensions include:

  • additional models (XGBoost, LightGBM, CatBoost)
  • experiment parallelization
  • cross-validation integration
  • automated feature importance analysis
  • visualization dashboards
  • experiment comparison tools

License

MIT License
