- https://github.com/pycaret/pycaret
- PyCaret's documentation: https://pycaret.gitbook.io/docs/get-started/modules
- Streamlit app deployed on Streamlit Community Cloud: https://automlapp-pycaret.streamlit.app/
This repository showcases the simplest possible integration of PyCaret into a Streamlit app, using only PyCaret's default settings.
- Refer to the last section below for an example of running PyCaret in a notebook with additional features such as fine-tuning, model evaluation and model saving.
- Pandas-Profiling is a Python library that provides a simple and efficient way to perform exploratory data analysis (EDA) on a Pandas DataFrame. The library generates a comprehensive HTML report with various statistical and visual insights into the structure and characteristics of the dataset.
- PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that considerably speeds up the experiment cycle and makes you more productive.
- Uses pandas-profiling to generate an HTML report with insights into the dataset.
- Uses PyCaret to quickly compare different algorithms and generate a table that ranks each algorithm by its evaluation metrics.
- Supports both classification and regression problems.
- The purpose of the app is to quickly gauge how different types of models perform on your dataset, which speeds up model selection. *(It only uses PyCaret's basic pre-processing steps and does not fine-tune the models.)*
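The two core features above can be sketched roughly as follows. This is a minimal sketch assuming PyCaret's functional API (`setup`, `compare_models`, `pull`) and the `ProfileReport` class from `pandas_profiling` (newer releases are published as `ydata_profiling`). The helper names and the `PYCARET_MODULES` mapping are hypothetical and are not taken from this repository's app code. Libraries are imported lazily, inside the functions, so the module loads even before they are installed.

```python
import importlib

# Hypothetical mapping from the app's "ML problem" choice to a PyCaret module.
PYCARET_MODULES = {
    "Classification": "pycaret.classification",
    "Regression": "pycaret.regression",
}


def pycaret_module_for(problem: str) -> str:
    """Return the PyCaret module path for a problem type, or raise ValueError."""
    try:
        return PYCARET_MODULES[problem]
    except KeyError:
        raise ValueError(f"Unsupported ML problem: {problem!r}") from None


def profile_to_html(df, path: str = "report.html") -> str:
    """Write a pandas-profiling EDA report for df to an HTML file (hypothetical helper)."""
    from pandas_profiling import ProfileReport  # lazy import

    ProfileReport(df, title="Dataset profile").to_file(path)
    return path


def compare_default_models(df, target: str, problem: str):
    """Run PyCaret with default settings and return (best_model, leaderboard)."""
    pc = importlib.import_module(pycaret_module_for(problem))  # lazy import
    pc.setup(df, target=target, verbose=False)  # default pre-processing only
    best_model = pc.compare_models()  # cross-validates PyCaret's default set of models
    leaderboard = pc.pull()  # the ranking table from compare_models, as a DataFrame
    return best_model, leaderboard
```

By default, `compare_models()` sorts the table by Accuracy for classification and by R2 for regression; its `sort` argument changes the ranking metric.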
- To further modify PyCaret's settings, refer to PyCaret's documentation.
- The app may take a long time to run on Streamlit Community Cloud because of the limited resources available there.
- PyCaret is CPU-intensive, so make sure your CPU is fast enough; otherwise, you can run PyCaret on a GPU for faster performance.
- Add `use_gpu=True` to the setup call, e.g. `classification_setup(df, target=chosen_target, verbose=False, use_gpu=True)`, to train on the GPU instead of the CPU (only algorithms with GPU support use it; the rest fall back to the CPU).
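PyCaret's `setup()` documents three values for `use_gpu`: `False` (CPU only, the default), `True` (GPU for algorithms that support it, such as XGBoost and CatBoost, with CPU fallback for the rest) and `"force"` (GPU-enabled algorithms only, raising an error when the GPU cannot be used). The sketch below validates the flag before calling `setup()`; the `setup_with_gpu` helper and its `module` parameter are hypothetical and not part of the app.

```python
import importlib

# Values PyCaret's setup() documents for use_gpu:
#   False   -> CPU only (the default)
#   True    -> GPU for algorithms that support it, CPU fallback otherwise
#   "force" -> GPU-enabled algorithms only; raises if the GPU cannot be used
ALLOWED_USE_GPU = (False, True, "force")


def setup_with_gpu(df, target: str, module: str = "pycaret.classification", use_gpu=True):
    """Hypothetical helper: call PyCaret's setup() with a validated use_gpu flag."""
    if use_gpu not in ALLOWED_USE_GPU:
        raise ValueError(f"use_gpu must be one of {ALLOWED_USE_GPU}, got {use_gpu!r}")
    pc = importlib.import_module(module)  # imported only after validation
    return pc.setup(df, target=target, verbose=False, use_gpu=use_gpu)
```

With the defaults, `setup_with_gpu(df, target=chosen_target)` makes the same call as the classification example in the list above; pass `module="pycaret.regression"` for regression problems.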
- Run the code locally on your computer for a faster performance
- Deploy the streamlit app on a paid cloud service for a faster performance
- Add more of PyCaret's functionality to the app, such as additional data pre-processing options and model fine-tuning.
- Pull command: `docker pull ongaunjie1/automl-app:latest`
- Run command: `docker run -d -p 8501:8501 ongaunjie1/automl-app:latest`
- Step 1: Upload your dataset
- Step 2: Select Profiling
- Step 3: Select ML problem (Classification or Regression) and select target variable
- Step 4: Run the modelling and review the output
Example: Predicting employee churn (the Colab notebook is available in this repository as `employee_churn.ipynb`)
- For PyCaret's full set of pre-processing options, see https://pycaret.gitbook.io/docs/get-started/preprocessing
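The notebook goes beyond the app's defaults with tuning, evaluation and saving. A minimal sketch of that kind of workflow, assuming PyCaret's classification module, is shown below. The function name, the `session_id` value and the default output file name are hypothetical, and `target` must be set to whatever column the churn dataset actually uses.

```python
def tune_evaluate_save(df, target: str, model_name: str = "employee_churn_model"):
    """Hypothetical helper: tune, evaluate and save the best classifier."""
    # Imported lazily so this module can be loaded without PyCaret installed.
    from pycaret.classification import (
        compare_models,
        finalize_model,
        plot_model,
        predict_model,
        save_model,
        setup,
        tune_model,
    )

    setup(df, target=target, session_id=42, verbose=False)  # fixed seed for reproducibility
    best = compare_models()  # rank models with default settings, as the app does
    tuned = tune_model(best)  # hyperparameter search (random search by default)
    plot_model(tuned, plot="confusion_matrix")  # visual check on the hold-out set
    holdout_predictions = predict_model(tuned)  # scores the hold-out set and displays metrics
    final = finalize_model(tuned)  # refit on the full dataset, hold-out included
    save_model(final, model_name)  # writes <model_name>.pkl
    return final, holdout_predictions
```

Finalizing after evaluation keeps the hold-out metrics honest while still letting the saved model learn from all available rows.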