Uses PyCaret to automate machine learning workflows (deployed on Streamlit)

ongaunjie1/pycaret_automl_streamlit

Uses pandas-profiling and PyCaret for a faster model selection process

This repository showcases the simplest way to integrate PyCaret into a Streamlit app, using only PyCaret's default settings.

  • Refer to the last section below for an example of running PyCaret in a notebook with additional features such as fine-tuning, model evaluation and model saving.

What does pandas-profiling do?

  • Pandas-Profiling is a Python library that provides a simple and efficient way to perform exploratory data analysis (EDA) on a Pandas DataFrame. The library generates a comprehensive HTML report with various statistical and visual insights into the structure and characteristics of the dataset.
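As a rough illustration of the report generation described above, the sketch below wraps `ProfileReport` in a small function. The helper names `default_report_path` and `write_profile_report` are hypothetical and not taken from this repository; the sketch also assumes that newer installs expose the library as `ydata_profiling` (the package's current name) and falls back to `pandas_profiling` otherwise.

```python
from pathlib import Path


def default_report_path(dataset_path):
    """Hypothetical helper: name the HTML report after the dataset file."""
    return Path(f"{Path(dataset_path).stem}_profile.html")


def write_profile_report(df, output_path, title="Dataset profile"):
    """Write an HTML EDA report for a pandas DataFrame and return its path."""
    # Imported lazily so this module can be loaded without the package installed.
    try:
        from ydata_profiling import ProfileReport  # current package name
    except ImportError:
        from pandas_profiling import ProfileReport  # older package name

    report = ProfileReport(df, title=title)
    output_path = Path(output_path)
    report.to_file(output_path)  # writes the HTML report to disk
    return output_path
```

A typical call might be `write_profile_report(df, default_report_path("data/churn.csv"))`, where `df` is a DataFrame you have already loaded.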

What does PyCaret do?

  • PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle and makes you more productive.

App features

  • Uses pandas-profiling to generate an HTML report with insights from a dataset.
  • Uses PyCaret to quickly compare different algorithms and generate a table ranking each algorithm by its metrics.
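The model comparison feature could be sketched roughly as follows, using PyCaret's default settings. This is not the app's actual code: `PROBLEM_MODULES`, `pycaret_module_name` and `rank_models` are hypothetical names introduced for illustration. The PyCaret calls themselves (`setup`, `compare_models`, `pull`) are available in both `pycaret.classification` and `pycaret.regression`.

```python
import importlib

# Hypothetical mapping from the problem type chosen in the UI to a PyCaret module.
PROBLEM_MODULES = {
    "classification": "pycaret.classification",
    "regression": "pycaret.regression",
}


def pycaret_module_name(problem):
    """Return the PyCaret module path for a problem type (case-insensitive)."""
    key = problem.strip().lower()
    if key not in PROBLEM_MODULES:
        raise ValueError(f"Unsupported problem type: {problem!r}")
    return PROBLEM_MODULES[key]


def rank_models(df, target, problem):
    """Compare PyCaret's default models and return (best_model, ranking_table)."""
    # Imported lazily so the helper above works without PyCaret installed.
    module = importlib.import_module(pycaret_module_name(problem))
    module.setup(data=df, target=target, verbose=False)  # default pre-processing only
    best_model = module.compare_models()  # trains and cross-validates each candidate
    ranking = module.pull()  # the comparison grid as a pandas DataFrame
    return best_model, ranking
```

Resolving the module by name keeps a single code path for both problem types, since the two modules share the same function names.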

Important Note:

  1. The app supports classification and regression problems.
  2. The purpose of the app is to quickly gauge how different types of models perform on your dataset, which allows for a quicker model selection process. (It only uses PyCaret's basic pre-processing steps and does not fine-tune models.)
  3. To further modify PyCaret's settings, refer to the PyCaret documentation.
  4. The app might take a long time to run on the Streamlit Community Cloud due to the limited resources available.
  5. PyCaret is CPU intensive, so make sure your CPU is fast enough. Otherwise, you can run PyCaret on a GPU for faster performance.
  6. Simply add use_gpu=True to the setup call, as in classification_setup(df, target=chosen_target, verbose=False, use_gpu=True), to use the GPU instead of the CPU.
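A minimal sketch of the GPU switch from note 6, assuming that `classification_setup` is the app's alias for `pycaret.classification.setup`. The helper names `build_setup_kwargs` and `run_setup` are hypothetical. Whether training actually runs on a GPU depends on GPU-enabled libraries being installed; see the PyCaret documentation for details.

```python
def build_setup_kwargs(target, use_gpu=False):
    """Collect the keyword arguments passed to PyCaret's setup()."""
    return {"target": target, "verbose": False, "use_gpu": use_gpu}


def run_setup(df, target, use_gpu=False):
    """Initialise a PyCaret classification experiment, optionally on GPU."""
    # Imported lazily so build_setup_kwargs works without PyCaret installed.
    from pycaret.classification import setup as classification_setup

    return classification_setup(df, **build_setup_kwargs(target, use_gpu))
```

Keeping the arguments in one place makes it easy to expose `use_gpu` as a checkbox or configuration value later.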

Other alternatives:

  • Run the code locally on your computer for faster performance
  • Deploy the Streamlit app on a paid cloud service for faster performance

Further improvement:

  • Add more of PyCaret's functionality to the app, such as additional data pre-processing options and model fine-tuning.

Docker Image

  • Pull command: docker pull ongaunjie1/automl-app:latest
  • Run command: docker run -d -p 8501:8501 ongaunjie1/automl-app:latest

How to use the AutoML app

  • Step 1: Upload your dataset

image

  • Step 2: Select Profiling

image

  • Step 3: Select ML problem (Classification or Regression) and select target variable

image

  • Step 4: Run the modelling and review the output

image

A more in-depth use-case of PyCaret

List of modules for different machine learning problems

image

Example: Predicting employee churn (you can find the Colab notebook, employee_churn.ipynb, in this repository)

Model comparison table

image

Create the best performing model from the comparison table

image

Fine-tuning the best model

image

Show best model's params

image
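Since the screenshots are not reproduced here, the create, tune and inspect steps above might look roughly like the sketch below. It is not copied from employee_churn.ipynb: the function name `create_and_tune`, the default model ID `"rf"` and the `session_id` value are assumptions for illustration. In practice you would pass the ID of the top-ranked model from the comparison table.

```python
def create_and_tune(df, target, model_id="rf", n_iter=10):
    """Train one model by ID, tune it, and return (tuned_model, params)."""
    # Imported lazily so this module can be loaded without PyCaret installed.
    from pycaret.classification import setup, create_model, tune_model

    # session_id fixes the random seed so that runs are repeatable.
    setup(data=df, target=target, session_id=123, verbose=False)

    # create_model trains and cross-validates a single estimator by its ID.
    base_model = create_model(model_id, verbose=False)

    # tune_model searches over hyperparameters for n_iter iterations.
    tuned_model = tune_model(base_model, n_iter=n_iter, verbose=False)

    # The tuned estimator is scikit-learn compatible, so get_params() works.
    return tuned_model, tuned_model.get_params()
```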

Evaluate model: Click on the buttons to see different evaluation plots

image

Or plot them individually

image

Perform prediction on test dataset generated by PyCaret

image

Saving the model

image
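The evaluation, prediction and saving steps above can be sketched in the same spirit. This assumes `setup()` has already been run in the current session and that `model` is an estimator trained by PyCaret. The function name `evaluate_and_save` and the default file name are hypothetical. In a notebook, `evaluate_model(model)` shows the interactive buttons, while `plot_model` draws one plot at a time.

```python
def evaluate_and_save(model, model_name="employee_churn_model",
                      plots=("confusion_matrix", "auc")):
    """Plot diagnostics, score the hold-out set, and save the model."""
    # Imported lazily so this module can be loaded without PyCaret installed.
    from pycaret.classification import plot_model, predict_model, save_model

    # Draw each requested evaluation plot individually.
    for plot_name in plots:
        plot_model(model, plot=plot_name)

    # Without a data argument, predict_model scores the hold-out (test) split
    # that setup() created.
    holdout_predictions = predict_model(model)

    # Writes the model and its pre-processing pipeline to <model_name>.pkl.
    save_model(model, model_name)
    return holdout_predictions
```

A saved model can later be restored with `load_model(model_name)` from the same PyCaret module.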

Note: PyCaret offers more data preprocessing and transformation functions than this example uses.
