E2E ML project using open-source MLOps tools

Problem Description and Dataset

This dataset contains 10,000 records, each of which corresponds to a different bank's user. The target is Exited, a binary variable that describes whether the user decided to leave the bank. There are row and customer identifiers, four columns describing personal information about the user (surname, location, gender and age), and some other columns containing information related to the loan (such as credit score, the current balance in the user's account and whether they are an active member among others).

Dataset source: https://www.kaggle.com/datasets/filippoo/deep-learning-az-ann

Use Case

The objective is to train an ML model that returns the probability of a customer churning. This is a binary classification task, therefore F1-score is a good metric to evaluate the performance of this dataset as it weights recall and precision equally, and a good retrieval algorithm will maximize both precision and recall simultaneously.

Setup

Python 3.8+ is required to run code from this repo.

$ git clone https://github.com/alex000kim/open-source-mlops-e2e
$ cd open-source-mlops-e2e
$ virtualenv -p python3 .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt

What's inside

$ tree 
.
├── Churn_Modelling_France.csv # raw data file
├── Churn_Modelling_Spain.csv # raw data file
├── LICENSE
├── README.md # this file
├── TODO.md # describes next steps
├── TrainChurnModel.ipynb # jupyter notebook with model training code
├── clf-model.joblib # saved model
└── feat_imp.csv # table with feature importance values

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.devcontainer		.devcontainer
.vscode		.vscode
data/raw		data/raw
models		models
.gitignore		.gitignore
.gitpod.yml		.gitpod.yml
Churn_Modelling_France.csv		Churn_Modelling_France.csv
Churn_Modelling_Spain.csv		Churn_Modelling_Spain.csv
LICENSE		LICENSE
README.md		README.md
TrainChurnModel.ipynb		TrainChurnModel.ipynb
clf-model.joblib		clf-model.joblib
feat_imp.csv		feat_imp.csv
requirements.txt		requirements.txt

License

myselfHimanshu/open-source-mlops-e2e

Folders and files

Latest commit

History

Repository files navigation

E2E ML project using open-source MLOps tools

Problem Description and Dataset

Use Case

Setup

What's inside

About

Resources

License

Stars

Watchers

Forks

Languages