Modular ML Experiment Framework

Objective

Build a modular machine learning engineering framework on top of MLflow that can orchestrate training and hyperparameter tuning experiments

on various ML methods
using various cleaning and feature extraction pipelines.

The modularity results from the ease with which the data scientist can add, remove, or swap out the various models, hyperparameter sets, or feature extraction pipelines by simply reconfiguring a YAML file.

The framework then compares the best tuned model of each ML method and serves the overall winner through a REST API.

Such a framework could be useful to data scientists because of the ease with which the data science pipeline could be set up. It could also be useful to ML engineers who can orchestrate a periodic training and deployment cycle.

Details
The user can supply through a YAML file the following:

URIs of the dataset and the MLflow tracking server
names of training/cross-validation functions for various models stored in a designated file
names of cleaning and feature extraction scripts for various models
training parameters/hyperparameters for various models
the metric on which to pick best model, etc.
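
The schema of Specs.yaml is not shown here, so the following is only a sketch of what such a specification might look like once a YAML loader has parsed it into a Python dictionary. Every key name, URI, and function name below is a hypothetical placeholder, not the project's actual layout. The small validator illustrates one way to catch an incomplete model entry before any experiment starts.

```python
# Hypothetical shape of a parsed Specs.yaml; all keys and values are illustrative.
SPECS = {
    "data_uri": "data/temperature.csv",        # assumed dataset location
    "tracking_uri": "http://localhost:5000",   # assumed MLflow tracking server
    "metric": "rmse",                          # metric used to pick the best model
    "models": {
        "ridge": {
            "train_fn": "cross_validate_ridge",    # name of a function in a designated file
            "feature_fn": "multivariate_features",  # name of a cleaning/feature function
            "hyperparameters": {"alpha": [0.1, 1.0, 10.0]},
        },
        "prophet": {
            "train_fn": "train_prophet",
            "feature_fn": "univariate_features",
            "hyperparameters": {"changepoint_prior_scale": [0.01, 0.1]},
        },
    },
}

REQUIRED_TOP_LEVEL_KEYS = ("data_uri", "tracking_uri", "metric", "models")
REQUIRED_MODEL_KEYS = {"train_fn", "feature_fn", "hyperparameters"}


def validate_specs(specs):
    """Return a list of human-readable problems; an empty list means the spec looks usable."""
    problems = []
    for key in REQUIRED_TOP_LEVEL_KEYS:
        if key not in specs:
            problems.append(f"missing top-level key: {key}")
    for name, model in specs.get("models", {}).items():
        for key in sorted(REQUIRED_MODEL_KEYS - set(model)):
            problems.append(f"model '{name}' is missing '{key}'")
    return problems
```

With a layout like this, adding or removing a method amounts to adding or removing one entry under "models".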

The framework then uses the supplied specs to run MLflow experiments on the various models by

applying the chosen data cleaning and feature engineering pipelines to each model and
training/cross-validating the various models using their respective functions and hyperparameters.

At the end of all the runs, it picks the best model, which can be served through a REST API endpoint at the user's discretion.
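
The run-and-select loop described above can be sketched as follows. This is a minimal illustration using only the standard library, not the code in get_best_model.py: the names `expand_grid` and `run_experiments`, the shape of the `methods` dictionary, and the toy scoring function are all assumptions made for the example, and the point where a real run would be logged to MLflow is marked only by a comment.

```python
from itertools import product


def expand_grid(grid):
    """Yield every combination of hyperparameter values as a dict."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def run_experiments(data, methods, lower_is_better=True):
    """Run every method over its hyperparameter grid and return (best_run, all_runs).

    `methods` maps a method name to a dict with three hypothetical entries:
    'features' (callable), 'train' (callable returning a score), 'grid' (dict).
    """
    runs = []
    for name, spec in methods.items():
        features = spec["features"](data)  # method-specific cleaning / feature pipeline
        for params in expand_grid(spec["grid"]):
            score = spec["train"](features, **params)
            # A real implementation would log params and score to MLflow here.
            runs.append({"method": name, "params": params, "score": score})
    if not runs:
        raise ValueError("no runs were executed; check the methods specification")
    pick = min if lower_is_better else max
    return pick(runs, key=lambda run: run["score"]), runs


def mean_error(features, offset):
    """Toy 'training' function: distance between the feature mean and an offset."""
    return abs(sum(features) / len(features) - offset)


methods = {
    "shift": {"features": lambda d: d, "train": mean_error, "grid": {"offset": [0, 2, 5]}},
    "scale": {"features": lambda d: [2 * x for x in d], "train": mean_error, "grid": {"offset": [1, 3]}},
}
best, runs = run_experiments([1.0, 2.0, 3.0], methods)
```

Keeping each method's pipeline, training function, and grid in one entry is what lets a method be swapped without touching the loop itself.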

The framework is modular because, to add another method type, all the data scientist has to do is add

the relevant cleaning, feature extraction, and training functions to the relevant files,
their details to the Specs.yaml file, and
the necessary import statements.
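
One way the function names in the spec file could be turned into callables is a lookup on the module that holds them. The sketch below is an assumption about how this might be done, not the project's implementation: `resolve` is a hypothetical helper, and the module is a stand-in built in memory so the example runs on its own. In a real project the module would come from a normal import or from `importlib.import_module`.

```python
import types


def resolve(name, module):
    """Look up a callable on `module` by the name given in the spec file."""
    fn = getattr(module, name, None)
    if not callable(fn):
        raise ValueError(f"'{name}' is not a callable in module '{module.__name__}'")
    return fn


# Stand-in for a designated file of training functions (hypothetical contents).
training = types.ModuleType("training")


def cross_validate_mean(values, shrink=1.0):
    """Toy 'model': the score is the shrunken mean of the inputs."""
    return shrink * sum(values) / len(values)


training.cross_validate_mean = cross_validate_mean

# The string would normally come from the spec file.
train_fn = resolve("cross_validate_mean", training)
```

Failing early with a clear error when a name is misspelled in the spec is cheaper than discovering it partway through a long tuning run.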

Example use and Current status

The project is under development and in its current state represents the framework customised for a specific task: to find the best temperature forecasting model for a given dataset. I've picked this task because it's one that can be approached using various methods:

regression using various conventional machine learning and neural network tools, and
time series forecasting using various libraries such as Prophet or Darts

Thus, this dataset is a good example of how the framework can orchestrate MLflow experiments in a modular fashion using the various machine learning methods.

To get a sense of how the eventual framework will work, please install the dependencies as listed in requirements.txt and run the train.py file. It will

read the specifications for the various models provided in Specs.yaml
engineer features using multivariate_fe.py, created for this specific problem
run the respective cross-validation or training functions for the various models in get_best_model.py
pick the best model of the best method.

Key additional features to be implemented

a command line tool that can set up the basic file structure with instructions on how to populate the files
ability to upsert tracking data to any URI, including one on any cloud service
running MLflow experiments for the various methods in parallel through multiprocessing
generation of performance and explainability charts for each model type
facility to package a project created on the framework as a Docker container so that it can be part of a CI/CD pipeline for periodic retraining.

