<a href="https://colab.research.google.com/github/psrana/Machine-Learning-using-PyCaret/blob/main/01_Regression_without_Results_PyCaret.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---
# **PyCaret for Regression**
---
- PyCaret is a library that bundles many machine learning algorithms.
- Only three lines of code are required to compare 20 ML models.
- PyCaret is available for:
    - Classification
    - Regression
    - Clustering
---

### **Self-learning resources**
1. Tutorials on PyCaret: **<a href="https://pycaret.readthedocs.io/en/latest/tutorials.html" target="_blank"> Click Here</a>**

2. Documentation on PyCaret regression: **<a href="https://pycaret.readthedocs.io/en/latest/api/regression.html" target="_blank"> Click Here </a>**

---

### **In this tutorial we will learn:**

- Getting Data
- Setting up Environment
- Create Model
- Tune Model
- Plot Model
- Finalize Model
- Predict Model
- Save / Load Model
---



### **(a) Install PyCaret**

In [None]:
!pip install pycaret &> /dev/null
print("PyCaret installed successfully!!")

### **(b) Get the PyCaret version**

In [None]:
from pycaret.utils import version
version()

---
# **1. Regression: Basics**
---
### **1.1 Get the list of datasets available in pycaret (Total Datasets = 55)**




In [None]:
from pycaret.datasets import get_data
dataSets = get_data('index')

---
### **1.2 Get the "boston" dataset (Step-I)**
---

In [None]:
bostonDataSet = get_data("boston")    # SN is 46

---
### **1.3 Parameter setting for all models (Step-II)**
---

In [None]:
from pycaret.regression import *
s = setup(data =  bostonDataSet, target='medv')

# Other Parameters:
# train_size = 0.7
# data_split_shuffle = False
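
The two optional parameters above control how the rows are divided into training and test sets. As a rough illustration in plain Python (not PyCaret's internal code; the helper name `ordered_split` is hypothetical and PyCaret's exact rounding may differ), an unshuffled 70/30 split simply keeps the first 70% of rows for training and the remaining rows for testing:

```python
# Illustrative sketch only: shows the idea behind train_size=0.7 combined
# with data_split_shuffle=False. ordered_split is a hypothetical helper.

def ordered_split(rows, train_size=0.7):
    """Split rows into train and test parts without shuffling."""
    cut = round(len(rows) * train_size)   # number of training rows
    return rows[:cut], rows[cut:]

rows = list(range(10))                    # stand-in for 10 dataset rows
train_rows, test_rows = ordered_split(rows, train_size=0.7)

print("train:", train_rows)
print("test: ", test_rows)
```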

---
### **1.4 Run all models (Step-III)**
---

In [None]:
cm = compare_models()
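
Conceptually, `compare_models()` scores every candidate with k-fold cross-validation and ranks the results (by R2 for regression, by default). The sketch below imitates that idea in plain Python with two hypothetical toy "models" and made-up data; it uses 5 interleaved folds for brevity, so it is a simplification rather than a reproduction of what PyCaret does internally.

```python
# Illustrative sketch in plain Python. The two "models" and the toy data are
# hypothetical stand-ins, not PyCaret estimators or the Boston dataset.
from statistics import mean

def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Least-squares straight line through the training points."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ma = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - ma) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def cross_val_r2(fit, xs, ys, folds=5):
    """Mean R2 over `folds` held-out folds (every folds-th row per fold)."""
    scores = []
    for k in range(folds):
        test_idx = set(range(k, len(xs), folds))
        x_tr = [x for i, x in enumerate(xs) if i not in test_idx]
        y_tr = [y for i, y in enumerate(ys) if i not in test_idx]
        x_te = [xs[i] for i in sorted(test_idx)]
        y_te = [ys[i] for i in sorted(test_idx)]
        model = fit(x_tr, y_tr)
        scores.append(r2_score(y_te, [model(x) for x in x_te]))
    return mean(scores)

# Toy data: a nearly linear relationship with a small alternating offset
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]

candidates = {"mean baseline": fit_mean, "straight line": fit_line}
leaderboard = sorted(
    ((name, cross_val_r2(fit, xs, ys)) for name, fit in candidates.items()),
    key=lambda item: item[1],
    reverse=True,                         # best (highest) R2 first
)
for name, score in leaderboard:
    print(f"{name}: mean R2 = {score:.3f}")
```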

---
### **1.5 "Three lines of code" for model comparison for "Boston" dataset**
---



In [None]:
from pycaret.datasets import get_data
from pycaret.regression import *

bostonDataSet = get_data("boston", verbose=False)
setup(data = bostonDataSet, target='medv', verbose=False, train_size = 0.7, data_split_shuffle = False)
cm = compare_models()

---
### **1.6 "Three lines of code" for model comparison for "Insurance" dataset**
---



In [None]:
from pycaret.datasets import get_data
from pycaret.regression import *

insuranceDataSet = get_data("insurance", verbose=False)
setup(data = insuranceDataSet, target='charges', verbose=False, data_split_shuffle = False)
cm = compare_models()

---
# **2. Regression: working with user dataset**
---


In [None]:
from pycaret.regression import *
import pandas as pd

# First upload your own CSV file to Colab, then adjust the file path and target column below

# myDataSet = pd.read_csv("/content/bostonDataSet.csv")        # Uncomment and execute
# s = setup(data = myDataSet, target='medv', verbose=False)    # Uncomment and execute
# cm = compare_models()                                        # Uncomment and execute

---
# **3. Regression: Apply "Data Preprocessing"**
---

### **3.1 Model performance using "Normalization"**

In [None]:
setup(data = bostonDataSet, target = 'medv',
      normalize = True, normalize_method = 'zscore', verbose=False, data_split_shuffle = False)
cm = compare_models()

# Re-run the code with different parameters
# normalize_method = {zscore, minmax, maxabs, robust}
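
To make the listed methods more concrete, here is a minimal plain-Python sketch of three of them applied to a single made-up column. It only illustrates the standard formulas (z-score with the population standard deviation, min-max scaling to [0, 1], and division by the largest absolute value); the function names and data are hypothetical and this is not PyCaret's own implementation.

```python
# Illustrative sketch of three scaling formulas on one hypothetical column.
from statistics import mean, pstdev

def zscore(values):
    """Subtract the mean, divide by the (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def minmax(values):
    """Rescale so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def maxabs(values):
    """Divide by the largest absolute value, keeping the sign."""
    peak = max(abs(v) for v in values)
    return [v / peak for v in values]

column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up feature values

print("zscore:", zscore(column))
print("minmax:", minmax(column))
print("maxabs:", maxabs(column))
```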

---
#### **Explore more parameters of "setup()" on pycaret**
---
- Explore setup() parameters in **Step 1.3**
- **<a href="https://pycaret.readthedocs.io/en/latest/api/regression.html" target="_blank"> Click Here</a>** for more

---
### **3.2 Model performance using "Feature Selection"**
---

In [None]:
setup(data = bostonDataSet, target = 'medv',
      feature_selection = True, feature_selection_method = 'classic',
      n_features_to_select = 0.2, verbose=False, data_split_shuffle = False)
cm = compare_models()


# Re-run the code with different parameters
# feature_selection_method = {classic, univariate, sequential}
# n_features_to_select = {0.1, 0.2, 0.3, 0.4, 0.5, ..... }

---
### **3.3 Model performance using "Outlier Removal"**
---

In [None]:
setup(data = bostonDataSet, target = 'medv',
      remove_outliers = True, outliers_method = "iforest", outliers_threshold = 0.05,
      verbose=False, data_split_shuffle = False)
cm = compare_models()

# Re-run the code with different parameters
# outliers_threshold = {0.04, 0.05, 0.06, 0.07, 0.08, ....}
# outliers_method = {iforest, ee, lof}

---
### **3.4 Model performance using "Transformation"**
---

In [None]:
setup(data = bostonDataSet, target = 'medv',
      transformation = True, transformation_method = 'yeo-johnson',
      verbose=False, data_split_shuffle = False)
cm = compare_models()

---
### **3.5 Model performance using "PCA"**
---

In [None]:
setup(data = bostonDataSet, target = 'medv',
      pca = True, pca_method = 'linear', verbose=False, data_split_shuffle = False)
cm = compare_models()

# Re-run the code with different parameters
# pca_method = {linear, kernel, incremental}

---
### **3.6 Model performance using "Outlier Removal" + "Normalization"**
---

In [None]:
setup(data = bostonDataSet, target = 'medv',
      remove_outliers = True, outliers_threshold = 0.05,
      normalize = True, normalize_method = 'zscore', verbose=False, data_split_shuffle = False)
cm = compare_models()

---
### **3.7 Model performance using "Outlier Removal" +  "Normalization" + "Transformation"**
---

In [None]:
setup(data = bostonDataSet, target = 'medv',
      remove_outliers = True, outliers_threshold = 0.05,
      normalize = True, normalize_method = 'zscore',
      transformation = True, transformation_method = 'yeo-johnson',
      verbose=False, data_split_shuffle = False)
cm = compare_models()

---
### **3.8 Explore more parameters of "setup()" on pycaret**
---
- Explore setup() parameters in **Step 1.3**
- **<a href="https://pycaret.readthedocs.io/en/latest/api/regression.html" target="_blank"> Click Here</a>** for more

---
# **4. Regression: More Operations**
---
### **4.1 Build a single model - "RandomForest"**

In [None]:
from pycaret.datasets import get_data
from pycaret.regression import *

bostonDataSet = get_data("boston", verbose=False)    # SN is 46
setup(data =  bostonDataSet, target='medv', verbose=False)

rfModel = create_model('rf')
# Explore more parameters

---
### **4.2 Other available regression models**
---
-	'ada' - AdaBoost Regressor
-	'br' - Bayesian Ridge
-	'dt' - Decision Tree Regressor
-	'en'	- Elastic Net
-	'et' - Extra Trees Regressor
-	'gbr' - Gradient Boosting Regressor
-	'huber' - Huber Regressor
-	'knn' - K Neighbors Regressor
-	'llar' - Lasso Least Angle Regression
-	'lasso' - Lasso Regression
-	'lar' - Least Angle Regression
-	'lightgbm'	- Light Gradient Boosting Machine
-	'lr' - Linear Regression
-	'omp' - Orthogonal Matching Pursuit
-	'par' - Passive Aggressive Regressor
-	'rf' - Random Forest Regressor
-	'ridge' - Ridge Regression

---
### **4.3 Explore more parameters of "create_model()" on pycaret**
---

**<a href="https://pycaret.readthedocs.io/en/latest/api/regression.html#pycaret.regression.create_model" target="_blank"> Click Here</a>**

---
### **4.4 Make prediction on the "new unseen dataset"**
---
#### **Get the "new unseen dataset"**



In [None]:
# Select top 10 rows from boston dataset
newDataSet = get_data("boston").iloc[:10]

#### **Make prediction on "new unseen dataset"**

In [None]:
newPredictions = predict_model(rfModel, data = newDataSet)
newPredictions

---
### **4.5 "Scatter plot" between actual and predicted**
---

In [None]:
import matplotlib.pyplot as plt

predicted = newPredictions.iloc[:,-1]     # Last column
actual = newPredictions.iloc[:,-2]        # 2nd last column

plt.scatter(actual, predicted)            # x-axis: actual, y-axis: predicted
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('Actual vs Predicted')
plt.savefig("result-scatter-plot.jpg", dpi=300)
plt.show()
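
The scatter plot gives a visual impression of the fit; the same pair of columns can also be summarized as numbers. The sketch below computes the mean absolute error (MAE) and root mean squared error (RMSE) in plain Python. The two short lists are made-up values standing in for the `actual` and `predicted` columns, so the printed numbers say nothing about the real model.

```python
# Illustrative sketch: MAE and RMSE for hypothetical actual/predicted values.
from math import sqrt

def mean_absolute_error(actual, predicted):
    """Average size of the errors, ignoring their sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared error (penalizes large misses)."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual_values = [20.0, 22.0, 30.0, 35.0]      # made-up targets
predicted_values = [21.0, 22.0, 29.0, 37.0]   # made-up predictions

mae = mean_absolute_error(actual_values, predicted_values)
rmse = root_mean_squared_error(actual_values, predicted_values)
print("MAE: ", mae)
print("RMSE:", rmse)
```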

---
### **4.6 Download the "Scatter plot"**
---

In [None]:
from google.colab import files
files.download('result-scatter-plot.jpg')

---
### **4.7 "Save" and "Download" the prediction result**
---

In [None]:
from google.colab import files

newPredictions.to_csv("NewPredictions.csv", index=False)
files.download('NewPredictions.csv')

---
### **4.8 "Save" the trained model**
---

In [None]:
sm = save_model(rfModel, 'rfModelFile')

---
### **4.9 Download the "trained model file" to user local system**
---

In [None]:
from google.colab import files
files.download('rfModelFile.pkl')

---
### **4.10  "Upload the trained model" --> "Load the model"  --> "Make the prediction" on "new unseen dataset"**
---
### **4.10.1 Upload the  "Trained Model"**


In [None]:
from google.colab import files
files.upload()

---
### **4.10.2 Load the "Model"**
---

In [None]:
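# Colab adds " (1)" to the name when a file called rfModelFile.pkl already exists;
# change the name below to match the file you actually uploaded (without ".pkl")
rfModel = load_model('rfModelFile (1)')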

---
### **4.10.3 Make the prediction on "new unseen dataset"**
---

In [None]:
newPredictions = predict_model(rfModel, data = newDataSet)
newPredictions

---
# **5. Plot the trained model**
---
**The following plots can be generated for a trained model:**

- Prediction Error Plot    - 'error'
- Learning Curve           - 'learning'
- Validation Curve         - 'vc'
- Feature Importance       - 'feature'
- Model Hyperparameter     - 'parameter'

---
### **5.1 Create RandomForest model or any other model**
---

In [None]:
rfModel = create_model('rf')

---
### **5.2 Plot the "error"**
---

In [None]:
plot_model(rfModel, plot='error')

---
### **5.3 Plot the "learning curve"**
---

In [None]:
plot_model(rfModel, plot='learning')

---
### **5.4 Plot the "validation curve"**
---

In [None]:
plot_model(rfModel, plot='vc')

---
### **5.5 Get the model "parameters"**
---

In [None]:
plot_model(rfModel, plot='parameter')

---
### **5.6 Explore more parameters of "plot_model()" on pycaret**
---
**<a href="https://pycaret.readthedocs.io/en/latest/api/regression.html#pycaret.regression.plot_model" target="_blank"> Click Here </a>**

---
# **6. Feature Importance**
---
### **6.1 Feature Importance using "Random Forest"**


In [None]:
rfModel = create_model('rf', verbose=False)
plot_model(rfModel, plot='feature')

---
### **6.2 Feature Importance using "Extra Trees Regressor"**
---

In [None]:
etModel = create_model('et', verbose=False)
plot_model(etModel, plot='feature')

---
### **6.3 Feature Importance using "Decision Tree"**
---

In [None]:
dtModel = create_model('dt', verbose=False)
plot_model(dtModel, plot='feature')

---
# **7. Tune/Optimize the model performance**
---
### **7.1 Train "Decision Tree" with default parameters**


In [None]:
dtModel = create_model('dt')

#### **Get the "parameters" of Decision Tree**

In [None]:
plot_model(dtModel, plot='parameter')

---
### **7.2 Tune "Decision Tree" model**
---

In [None]:
dtModelTuned = tune_model(dtModel, n_iter=200)
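
By default, `tune_model()` performs a random search over a hyperparameter grid, and `n_iter` sets how many random combinations are tried (the default is 10). The plain-Python sketch below shows why a larger `n_iter` can help. The search space and the scoring function are hypothetical stand-ins chosen for illustration, not the grid PyCaret uses for a decision tree.

```python
# Illustrative sketch of random search; the space and toy_score are hypothetical.
import random

def random_search(objective, space, n_iter, seed=0):
    """Try n_iter random combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(options) for name, options in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

space = {
    "max_depth": list(range(1, 17)),
    "min_samples_leaf": list(range(1, 11)),
}

def toy_score(params):
    """Made-up score that peaks (at 0) for max_depth=6, min_samples_leaf=3."""
    return -((params["max_depth"] - 6) ** 2) - (params["min_samples_leaf"] - 3) ** 2

few = random_search(toy_score, space, n_iter=10)
many = random_search(toy_score, space, n_iter=200)
print("n_iter=10 :", few)
print("n_iter=200:", many)
```

Because both runs share the same seed, the 200-iteration run sees every combination the 10-iteration run sees and more, so its best score can only be equal or better. The cost is a proportionally longer run time.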

#### **Get the "tuned parameters" of Decision Tree**

In [None]:
plot_model(dtModelTuned, plot='parameter')

---
### **7.3 Explore more parameters of "tune_model()" on pycaret**
---
**<a href="https://pycaret.readthedocs.io/en/latest/api/regression.html#pycaret.regression.tune_model" target="_blank"> Click Here </a>**

---
# **8. AutoML - Advanced Machine Learning**
---

- Select n Best Models:
  - Ensemble, Stacking, Bagging, Blending
  - Auto tune the best n models

**<a href="https://pycaret.readthedocs.io/en/latest/api/regression.html#pycaret.regression.automl" target="_blank">Click Here</a>**
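
Of the techniques listed above, blending is the simplest to picture: for regression it amounts to averaging the predictions of several trained models. The following plain-Python sketch shows only that averaging step, with three made-up prediction lists standing in for real models; it is not PyCaret code.

```python
# Illustrative sketch: blending as a row-by-row average of model predictions.
# The three prediction lists are hypothetical values, not real model output.

def blend(predictions_per_model):
    """Average the predictions of all models for each row."""
    return [sum(row) / len(row) for row in zip(*predictions_per_model)]

model_a = [20.0, 30.0, 40.0]
model_b = [22.0, 28.0, 46.0]
model_c = [24.0, 32.0, 40.0]

blended = blend([model_a, model_b, model_c])
print(blended)
```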


---
# **9. Deploy the model on AWS / Azure**
---
**<a href="https://pycaret.readthedocs.io/en/latest/api/regression.html#pycaret.regression.deploy_model" target="_blank">Click Here</a>**
