ALeRT: Active Learning Regression Toolbox

Welcome to the ALeRT (Active Learning Regression Toolbox) repository! This project provides a flexible framework for training and running different machine learning regression models on hydrodynamic features of mixing devices. ALeRT supports hyperparameter tuning, k-fold cross-validation, and active learning techniques that generate new data points for augmented model training.

Project Structure

Here's an overview of the file and folder structure in this repository:

.
├── base_model.py
├── best_models
│   ├── Decision_Tree
│   ├── K_Nearest_Neighbours
│   ├── MLP_Branched_Network
│   ├── Multi_Layer_Perceptron
│   ├── Random_Forest
│   ├── Support_Vector_Machine
│   └── XGBoost
├── config
│   └── config_paths.ini
├── csv_data
│   └── sp_geom
│       ├── dt
│       └── ini
├── data_utils.py
├── DOE
│   └── sp_geom
│       ├── dt
│       └── ini
├── figs
│   └── sp_geom
│       ├── dt
│       ├── ini
│       └── random
├── input_data
│   └── sp_geom
│       └── ini
├── input.py
├── model_lib.py
├── models
│   ├── Decision_Tree
│   │   └── hyperparam_tune
│   ├── K_Nearest_Neighbours
│   │   └── hyperparam_tune
│   ├── MLP_Branched_Network
│   │   └── hyperparam_tune
│   ├── Multi_Layer_Perceptron
│   │   └── hyperparam_tune
│   ├── Random_Forest
│   │   └── hyperparam_tune
│   ├── Support_Vector_Machine
│   │   └── hyperparam_tune
│   └── XGBoost
│       └── hyperparam_tune
├── model_utils.py
├── paths.py
├── pca_models
│   └── sp_geom
│       ├── dt
│       ├── ini
│       └── random
├── reg_train.py
├── resample
│   └── sp_geom
│       ├── dt
│       │   └── log_rules
│       │       ├── dt_rules_1.log
│       │       └── dt_rules_2.log
│       ├── gsx
│       │   └── log_rules
│       │       └── gsx_rules.log
│       └── random
├── run_augmodel.py
└── sampling.py

Usage

To use this repository, follow these steps:

Install Dependencies: Ensure you have all the necessary Python packages installed. If there is a requirements.txt file, use pip install -r requirements.txt.
Generate Features and Targets: Run input.py to process the CSV data and the labels from the DOE files into the features, targets, and test sets used for the regression. At this stage you can also select which targets to include and apply dimensionality reduction via Principal Component Analysis (PCA).
Train and Evaluate Models: Run reg_train.py to select one of the available regression models, then train and evaluate it, optionally with hyperparameter tuning and k-fold cross-validation.
Generate New Data Points: Use sampling.py to apply active learning techniques (uncertainty exploration with a decision tree, or greedy sampling on the inputs) and generate rules for the new data points that should be added to the existing database.
Re-train with Augmented Data: Execute run_augmodel.py to re-train the regression models with the new data points obtained.
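
To illustrate the greedy sampling on the inputs mentioned in the Generate New Data Points step, here is a minimal sketch of the general technique. It is not the code in sampling.py: the function name `greedy_sampling_inputs`, its arguments, and the use of Euclidean distance are assumptions made for illustration. Each round selects the candidate point that lies farthest from its nearest already-known point, so new samples fill the least explored regions of the input space.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def greedy_sampling_inputs(
    existing: List[Point], candidates: List[Point], n_new: int
) -> List[int]:
    """Return indices of `candidates` picked by greedy sampling on the inputs.

    Hypothetical helper, not part of ALeRT. Each round picks the candidate
    whose distance to its nearest known point (existing or already picked)
    is largest.
    """
    if not existing:
        raise ValueError("at least one existing point is required")

    # Distance from every candidate to its nearest existing point.
    nearest = [min(math.dist(c, e) for e in existing) for c in candidates]

    chosen: List[int] = []
    for _ in range(min(n_new, len(candidates))):
        remaining = [i for i in range(len(candidates)) if i not in chosen]
        # The most isolated remaining candidate is selected next.
        best = max(remaining, key=lambda i: nearest[i])
        chosen.append(best)
        # The new pick now counts as a known point for later rounds.
        for i, c in enumerate(candidates):
            nearest[i] = min(nearest[i], math.dist(c, candidates[best]))
    return chosen


if __name__ == "__main__":
    known = [(0.0, 0.0)]
    pool = [(1.0, 0.0), (5.0, 0.0), (3.0, 0.0)]
    print(greedy_sampling_inputs(known, pool, n_new=2))
```

Because the selection depends only on distances between inputs, features with very different ranges should be scaled first, otherwise the largest-valued feature dominates the choice.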

Workflow

The typical workflow for ALeRT involves the following steps:

Initial Data Preparation: Run input.py to set up the initial data for training.
Model Training: Use reg_train.py to train and evaluate models. This script also supports hyperparameter tuning and k-fold cross-validation.
Active Learning: Execute sampling.py to identify, through active learning techniques, the new data points that should be obtained.
Model Re-training: Use run_augmodel.py to re-train the models with the newly acquired data points.
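
The workflow above can be sketched as a small driver script. This is only an illustration under stated assumptions: it assumes the four scripts are launched from the repository root without command-line arguments, which this README does not confirm, and some of them may prompt for input interactively. The names `BEFORE_NEW_DATA`, `AFTER_NEW_DATA`, `build_commands`, and `run_stage` are hypothetical.

```python
import subprocess
import sys
from typing import List

# Script names come from this README; running them with no arguments is an assumption.
BEFORE_NEW_DATA = ["input.py", "reg_train.py", "sampling.py"]
AFTER_NEW_DATA = ["run_augmodel.py"]


def build_commands(scripts: List[str], python: str = sys.executable) -> List[List[str]]:
    """Build one interpreter command per script, preserving the given order."""
    return [[python, script] for script in scripts]


def run_stage(scripts: List[str], dry_run: bool = True) -> List[List[str]]:
    """Print each command in order and, unless dry_run is set, execute it."""
    commands = build_commands(scripts)
    for cmd in commands:
        print("would run:" if dry_run else "running:", " ".join(cmd))
        if not dry_run:
            # check=True stops the stage as soon as one script exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default so that nothing is executed by accident.
    run_stage(BEFORE_NEW_DATA, dry_run=True)
    run_stage(AFTER_NEW_DATA, dry_run=True)
```

The split into two stages reflects that the data points proposed by sampling.py have to be obtained and added to the database before run_augmodel.py can re-train the models.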

Modules Overview

base_model.py: Contains the backbone for all regression models (abstract parent classes).
data_utils.py: Provides utilities for data management, such as loading, pre-processing, scaling, saving, and augmenting, as well as carrying out PCA dimensionality reduction.
model_lib.py: Uses base_model.py to build and manage the different regression models. Available models are: Decision Tree, XGBoost, Random Forest, K-Nearest Neighbours, Support Vector Machine, Multi-Layer Perceptron, and a custom-built branched MLP (MLP Branched Network).
model_utils.py: Contains utility functions for model cross-validation, hyperparameter tuning and evaluation.
input.py: Script to generate the train and test pickle files that later scripts read as features and targets.
reg_train.py: Script to train and evaluate the regression models.
sampling.py: Script for active learning and generation of new data points.
run_augmodel.py: Script to re-train regression models with augmented data.
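
As a rough picture of how an abstract parent class and a cross-validation utility can fit together, here is a minimal sketch using only the standard library. It does not reproduce base_model.py or model_utils.py: the class names `BaseRegressor` and `MeanRegressor`, the functions `kfold_indices` and `cross_val_mse`, and the choice of mean squared error are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from statistics import fmean
from typing import List, Sequence, Tuple


class BaseRegressor(ABC):
    """Hypothetical parent class: every concrete model must define fit and predict."""

    @abstractmethod
    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> None:
        ...

    @abstractmethod
    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        ...


class MeanRegressor(BaseRegressor):
    """Toy model that always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = fmean(y)

    def predict(self, X):
        return [self.mean_ for _ in X]


def kfold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous (train, validation) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # The first n_samples % k folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if not start <= i < stop]
        folds.append((train, val))
        start = stop
    return folds


def cross_val_mse(model: BaseRegressor, X, y, k: int) -> float:
    """Average the validation mean squared error of `model` over k folds."""
    scores = []
    for train, val in kfold_indices(len(y), k):
        model.fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in val])
        scores.append(fmean([(p - y[i]) ** 2 for p, i in zip(preds, val)]))
    return fmean(scores)


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0.0, 0.0, 2.0, 2.0]
    print(cross_val_mse(MeanRegressor(), X, y, k=2))
```

Because the evaluation code only relies on the `fit` and `predict` methods declared in the parent class, any model that inherits from it can be passed to the same cross-validation helper. The folds here are contiguous and unshuffled, so ordered data would need shuffling first.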

License

This project is licensed under the MIT License. See the LICENSE file for more information.

Contact

If you have any questions or suggestions, feel free to open an issue or contact the repository owner at j.valdes20@imperial.ac.uk.
