<a href="https://colab.research.google.com/github/subhashpolisetti/Automated-ML-with-PyCaret/blob/main/PyCaret_Gradio_Decision_Tree_App_california_housing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyCaret and Gradio Integration

**PyCaret** is a low-code machine learning library that simplifies the process of training and deploying models, while **Gradio** is an open-source library that enables the creation of interactive web interfaces for machine learning applications. When combined, they allow users to easily build, showcase, and share machine learning models with user-friendly interfaces.

## Key Points

- **Integration**:
  - PyCaret can quickly train models using its setup functions and AutoML capabilities, while Gradio provides an easy way to create interactive interfaces for these models.

## Workflow

1. **Model Training**:
   - Use PyCaret to load data, preprocess it, and train models with just a few lines of code.
  
2. **Prediction Function**:
   - Define a function that takes user inputs and returns model predictions.
  
3. **Gradio Interface**:
   - Create an interactive interface using Gradio, specifying input types (e.g., numbers, text, images) and output formats.
  
4. **Launch**:
   - Run the Gradio interface to allow users to interact with the model in real-time.
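
Steps 2 to 4 can be sketched by hand as follows. This is a minimal illustration, not the code this notebook runs (the notebook uses PyCaret's `create_app` instead). The names `FEATURES`, `make_row`, `predict_value`, and `build_interface` are hypothetical, and the sketch assumes the trained model follows the scikit-learn `predict` convention; if the experiment applies preprocessing that matters, PyCaret's `predict_model` would be the safer way to score new data.

```python
from typing import Any, Dict, Sequence

# Feature columns of the California housing data, in the order printed by
# housing_data.head() in this notebook.
FEATURES = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms",
    "Population", "AveOccup", "Latitude", "Longitude",
]


def make_row(values: Sequence[float]) -> Dict[str, float]:
    """Pair raw user inputs with feature names to form one input record."""
    if len(values) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} values, got {len(values)}")
    return {name: float(value) for name, value in zip(FEATURES, values)}


def predict_value(model: Any, *values: float) -> float:
    """Step 2: turn user inputs into a one-row DataFrame and score it.

    Assumption: `model` exposes a scikit-learn style `predict` method.
    """
    import pandas as pd  # imported lazily; only needed at prediction time

    row = pd.DataFrame([make_row(values)])
    return float(model.predict(row)[0])


def build_interface(model: Any):
    """Step 3: one numeric input box per feature, one numeric output."""
    import gradio as gr  # imported lazily; only needed to build the UI

    return gr.Interface(
        fn=lambda *values: predict_value(model, *values),
        inputs=[gr.Number(label=name) for name in FEATURES],
        outputs=gr.Number(label="Predicted MedianHouseValue"),
    )
```

For step 4, the interface could be started with `build_interface(decision_tree_model).launch(share=True)`, where `share=True` requests a temporary public link of the kind shown in the output of the final cell.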

## Benefits

- **Ease of Use**:
  - Minimal coding required to create robust machine learning applications.

- **Real-Time Interaction**:
  - Users can input data and see predictions immediately.

- **Sharing**:
  - Generate links to share applications with others easily.

## Example Use Case

Build a housing price predictor from the California housing dataset: train a model with PyCaret, then serve it through Gradio so users can enter housing features and get an immediate price prediction.

This combination of PyCaret and Gradio is ideal for data scientists and developers looking to prototype and showcase machine learning models efficiently.



In [21]:
# Install the full version of PyCaret along with all optional dependencies
!pip install pycaret[full]



In [1]:
from sklearn.datasets import fetch_california_housing
import pandas as pd

# Load the California housing dataset from sklearn
california_housing = fetch_california_housing()

# Convert the dataset into a pandas DataFrame
housing_data = pd.DataFrame(california_housing.data, columns=california_housing.feature_names)

# Add the target variable to the DataFrame
housing_data['MedianHouseValue'] = california_housing.target

# Display the first few rows of the dataset
print(housing_data.head())


   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  \
0  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88   
1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86   
2  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85   
3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85   
4  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85   

   Longitude  MedianHouseValue  
0    -122.23             4.526  
1    -122.22             3.585  
2    -122.24             3.521  
3    -122.25             3.413  
4    -122.25             3.422  


In [2]:
# Import all necessary functions from the PyCaret regression module
from pycaret.regression import *

# Set up the PyCaret regression experiment.
# setup() initializes the environment, preprocesses the data, and registers
# 'MedianHouseValue' as the target; it also displays a summary table of its settings.
# 'session_id' fixes the random seed so results are reproducible.
regression_experiment = setup(data=housing_data, target='MedianHouseValue', session_id=123)

# Printing the returned object only shows its repr (a RegressionExperiment instance);
# the preprocessing summary is the table displayed by setup() itself.
print(regression_experiment)


|   | Description | Value |
|---|---|---|
| 0 | Session id | 123 |
| 1 | Target | MedianHouseValue |
| 2 | Target type | Regression |
| 3 | Original data shape | (20640, 9) |
| 4 | Transformed data shape | (20640, 9) |
| 5 | Transformed train set shape | (14447, 9) |
| 6 | Transformed test set shape | (6193, 9) |
| 7 | Numeric features | 8 |
| 8 | Preprocess | True |
| 9 | Imputation type | simple |


<pycaret.regression.oop.RegressionExperiment object at 0x7fe33845f100>


In [5]:
# Create a decision tree model using PyCaret
# The 'create_model' function initializes a machine learning model specified by the input string.
# Here, 'dt' indicates that we are creating a Decision Tree model.
decision_tree_model = create_model('dt')


| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 0.4403 | 0.4617 | 0.6795 | 0.68 | 0.1978 | 0.2413 |
| 1 | 0.4536 | 0.4773 | 0.6909 | 0.6453 | 0.2027 | 0.2376 |
| 2 | 0.462 | 0.534 | 0.7308 | 0.5993 | 0.2144 | 0.2443 |
| 3 | 0.4739 | 0.577 | 0.7596 | 0.5919 | 0.2195 | 0.2527 |
| 4 | 0.4589 | 0.5248 | 0.7244 | 0.6219 | 0.2128 | 0.2444 |
| 5 | 0.4828 | 0.5692 | 0.7545 | 0.5297 | 0.22 | 0.2675 |
| 6 | 0.4458 | 0.4685 | 0.6845 | 0.6437 | 0.2013 | 0.2387 |
| 7 | 0.4819 | 0.5678 | 0.7536 | 0.5711 | 0.2284 | 0.2779 |
| 8 | 0.4667 | 0.5257 | 0.725 | 0.5926 | 0.215 | 0.2574 |
| 9 | 0.4739 | 0.5527 | 0.7434 | 0.5694 | 0.2125 | 0.2502 |
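
The captured output lists only the ten per-fold rows, so the overall picture has to be read off by hand. As a rough sketch, the fold scores can be averaged with the standard library. The RMSE and R2 values below are copied from the fold table; the variable names are illustrative, and the conversion to dollars relies on the scikit-learn California housing target being expressed in units of $100,000.

```python
from statistics import mean, pstdev

# Per-fold scores copied from the create_model('dt') output.
rmse = [0.6795, 0.6909, 0.7308, 0.7596, 0.7244,
        0.7545, 0.6845, 0.7536, 0.7250, 0.7434]
r2 = [0.6800, 0.6453, 0.5993, 0.5919, 0.6219,
      0.5297, 0.6437, 0.5711, 0.5926, 0.5694]

mean_rmse = mean(rmse)
mean_r2 = mean(r2)

# Spread across folds (population standard deviation, chosen here for
# illustration) hints at how stable the tree is between data splits.
std_rmse = pstdev(rmse)
std_r2 = pstdev(r2)

# The target is in units of $100,000, so scale RMSE to dollars.
rmse_dollars = mean_rmse * 100_000

print(f"RMSE: {mean_rmse:.4f} +/- {std_rmse:.4f} (about ${rmse_dollars:,.0f})")
print(f"R2:   {mean_r2:.4f} +/- {std_r2:.4f}")
```

On these numbers the untuned decision tree averages an RMSE of about 0.72 (roughly $72,000) and an R2 of about 0.60 across the folds.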



In [7]:

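# Create a basic Gradio inference app for the trained decision tree
# using PyCaret's create_app helper.
create_app(decision_tree_model)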

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://6d85fd03a5bd1ce7af.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


