<a href="https://colab.research.google.com/github/shorya-ag/ML_Classification/blob/main/PyCaret_for_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---
# **PyCaret for Classification**
---
- PyCaret bundles many machine learning algorithms in one library.
- Only three lines of code are required to compare 20 ML models.
- PyCaret is available for:
    - Classification
    - Regression
    - Clustering

---

### **Self-learning resources**
1. Tutorials on PyCaret: **<a href="https://pycaret.readthedocs.io/en/latest/tutorials.html" target="_blank"> Click Here</a>**

2. Documentation on PyCaret classification: **<a href="https://pycaret.readthedocs.io/en/latest/api/classification.html" target="_blank"> Click Here </a>**

---

### **In this tutorial we will learn:**

- Getting Data
- Setting up Environment
- Create Model
- Tune Model
- Plot Model
- Finalize Model
- Predict Model
- Save / Load Model
---



### **(a) Install PyCaret**

In [1]:
!pip install pycaret &> /dev/null
print("PyCaret installed successfully!")

PyCaret installed successfully!


### **(b) Get the PyCaret version**

In [2]:
from pycaret.utils import version
version()

'3.0.2'

---
# **1. Classification: Basics**
---
### **1.1 Get the list of datasets available in pycaret (Total Datasets = 55)**




In [3]:
from pycaret.datasets import get_data
dataSets = get_data('index')

Unnamed: 0,Dataset,Data Types,Default Task,Target Variable 1,Target Variable 2,# Instances,# Attributes,Missing Values
0,anomaly,Multivariate,Anomaly Detection,,,1000,10,N
1,france,Multivariate,Association Rule Mining,InvoiceNo,Description,8557,8,N
2,germany,Multivariate,Association Rule Mining,InvoiceNo,Description,9495,8,N
3,bank,Multivariate,Classification (Binary),deposit,,45211,17,N
4,blood,Multivariate,Classification (Binary),Class,,748,5,N
5,cancer,Multivariate,Classification (Binary),Class,,683,10,N
6,credit,Multivariate,Classification (Binary),default,,24000,24,N
7,diabetes,Multivariate,Classification (Binary),Class variable,,768,9,N
8,electrical_grid,Multivariate,Classification (Binary),stabf,,10000,14,N
9,employee,Multivariate,Classification (Binary),left,,14999,10,N


---
### **1.2 Get the "diabetes" dataset (Step-I)**
---

In [4]:
diabetesDataSet = get_data("diabetes")    # row 7 in the dataset index above

Unnamed: 0,Number of times pregnant,Plasma glucose concentration a 2 hours in an oral glucose tolerance test,Diastolic blood pressure (mm Hg),Triceps skin fold thickness (mm),2-Hour serum insulin (mu U/ml),Body mass index (weight in kg/(height in m)^2),Diabetes pedigree function,Age (years),Class variable
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


---
### **1.3 Parameter setting for all models (Step-II)**
---

In [5]:
from pycaret.classification import *
s = setup(data=diabetesDataSet, target='Class variable')

# Other Parameters:
# train_size = 0.7
# data_split_shuffle = False

Unnamed: 0,Description,Value
0,Session id,8746
1,Target,Class variable
2,Target type,Binary
3,Original data shape,"(768, 9)"
4,Transformed data shape,"(768, 9)"
5,Transformed train set shape,"(537, 9)"
6,Transformed test set shape,"(231, 9)"
7,Numeric features,8
8,Preprocess,True
9,Imputation type,simple
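
The train and test shapes in the summary above follow from the default `train_size = 0.7`. The arithmetic below is a minimal check, assuming the training count is rounded down; the variable names are illustrative and not PyCaret internals.

```python
import math

n_rows = 768        # rows in the diabetes dataset ("Original data shape")
train_size = 0.7    # default fraction listed under "Other Parameters"

n_train = math.floor(train_size * n_rows)  # assumption: round down
n_test = n_rows - n_train                  # the remaining rows form the test set

print(n_train, n_test)
```

This reproduces the 537 and 231 rows reported as the transformed train and test set shapes. The `Session id` row is the random seed for the run; passing a fixed `session_id` to `setup()` should make repeated runs comparable.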


---
### **1.4 Run all models (Step-III)**
---

In [6]:
cm = compare_models()

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
lr,Logistic Regression,0.7672,0.8213,0.5497,0.7251,0.6189,0.4569,0.4703,0.56
ridge,Ridge Classifier,0.7616,0.0,0.5284,0.7202,0.603,0.4396,0.4546,0.087
lda,Linear Discriminant Analysis,0.7616,0.8259,0.5389,0.717,0.6082,0.443,0.457,0.071
rf,Random Forest Classifier,0.756,0.8007,0.5708,0.6848,0.6172,0.4411,0.4485,0.937
et,Extra Trees Classifier,0.7525,0.7962,0.5404,0.6932,0.6016,0.4268,0.4373,0.568
qda,Quadratic Discriminant Analysis,0.7487,0.8127,0.5605,0.6772,0.607,0.4255,0.434,0.194
gbc,Gradient Boosting Classifier,0.745,0.7967,0.6088,0.6484,0.6238,0.4317,0.4354,0.312
nb,Naive Bayes,0.7411,0.8022,0.5863,0.6446,0.6062,0.4157,0.4217,0.209
xgboost,Extreme Gradient Boosting,0.7357,0.7665,0.5874,0.6393,0.6036,0.4075,0.415,0.238
lightgbm,Light Gradient Boosting Machine,0.7321,0.7791,0.5725,0.636,0.5946,0.3971,0.4046,0.205
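
Most columns in the comparison table are derived from the four confusion-matrix counts, with the reported values averaged over cross-validation folds. As a minimal sketch with made-up counts (not taken from the runs above), the hypothetical helper `binary_metrics` computes Accuracy, Recall, Prec., F1, Kappa, and MCC from their standard definitions.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Return common binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (no empty class or prediction column).
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for agreement expected by chance
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - expected) / (1 - expected)
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "Accuracy": accuracy,
        "Recall": recall,
        "Prec.": precision,
        "F1": f1,
        "Kappa": kappa,
        "MCC": mcc,
    }

# Made-up counts for illustration only
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

AUC is not derived from these counts because it needs predicted scores. The 0.0 shown for the Ridge Classifier most likely means the score was unavailable for that estimator, not that the model ranks examples badly.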



---
### **1.5 "Three lines of code" for model comparison on the "Diabetes" dataset**
---



In [7]:
from pycaret.datasets import get_data
from pycaret.classification import *

diabetesDataSet = get_data("diabetes")
setup(data=diabetesDataSet, target='Class variable')
cm = compare_models()

Unnamed: 0,Number of times pregnant,Plasma glucose concentration a 2 hours in an oral glucose tolerance test,Diastolic blood pressure (mm Hg),Triceps skin fold thickness (mm),2-Hour serum insulin (mu U/ml),Body mass index (weight in kg/(height in m)^2),Diabetes pedigree function,Age (years),Class variable
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


Unnamed: 0,Description,Value
0,Session id,7650
1,Target,Class variable
2,Target type,Binary
3,Original data shape,"(768, 9)"
4,Transformed data shape,"(768, 9)"
5,Transformed train set shape,"(537, 9)"
6,Transformed test set shape,"(231, 9)"
7,Numeric features,8
8,Preprocess,True
9,Imputation type,simple


Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
et,Extra Trees Classifier,0.7653,0.8242,0.5561,0.7096,0.6208,0.455,0.464,0.598
gbc,Gradient Boosting Classifier,0.7635,0.8282,0.6158,0.684,0.6444,0.4687,0.4731,0.32
ridge,Ridge Classifier,0.7616,0.0,0.5404,0.7154,0.6096,0.4439,0.4569,0.1
lr,Logistic Regression,0.7597,0.8069,0.5512,0.7029,0.6121,0.4431,0.4537,0.091
lda,Linear Discriminant Analysis,0.7597,0.8062,0.5512,0.7008,0.6119,0.4428,0.4527,0.132
rf,Random Forest Classifier,0.756,0.8197,0.5564,0.6859,0.6124,0.4377,0.4442,0.625
xgboost,Extreme Gradient Boosting,0.7391,0.7912,0.6199,0.6278,0.621,0.4228,0.425,0.166
lightgbm,Light Gradient Boosting Machine,0.7318,0.7943,0.583,0.6237,0.5984,0.3986,0.4022,0.119
nb,Naive Bayes,0.7206,0.7897,0.5351,0.6171,0.5691,0.3652,0.3696,0.07
ada,Ada Boost Classifier,0.7151,0.7669,0.4927,0.6196,0.544,0.3421,0.3498,0.275



---
### **1.6 "Three lines of code" for model comparison on the "Cancer" dataset**
---



In [8]:
from pycaret.datasets import get_data
from pycaret.classification import *

cancerDataSet = get_data("cancer")
setup(data = cancerDataSet, target='Class')
cm = compare_models()

Unnamed: 0,Class,age,menopause,tumor-size,inv-nodes,node-caps,deg-malig,breast,breast-quad,irradiat
0,0,5,1,1,1,2,1,3,1,1
1,0,5,4,4,5,7,10,3,2,1
2,0,3,1,1,1,2,2,3,1,1
3,0,6,8,8,1,3,4,3,7,1
4,0,4,1,1,3,2,1,3,1,1


Unnamed: 0,Description,Value
0,Session id,967
1,Target,Class
2,Target type,Binary
3,Original data shape,"(683, 10)"
4,Transformed data shape,"(683, 10)"
5,Transformed train set shape,"(478, 10)"
6,Transformed test set shape,"(205, 10)"
7,Numeric features,9
8,Preprocess,True
9,Imputation type,simple


Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
et,Extra Trees Classifier,0.977,0.9968,0.9761,0.961,0.9678,0.95,0.9509,0.41
knn,K Neighbors Classifier,0.975,0.9936,0.9643,0.9659,0.9643,0.945,0.946,0.155
ada,Ada Boost Classifier,0.9749,0.9963,0.9699,0.9603,0.9643,0.9449,0.9459,0.292
rf,Random Forest Classifier,0.9728,0.9938,0.9636,0.9606,0.9612,0.9402,0.9413,0.444
lightgbm,Light Gradient Boosting Machine,0.9728,0.9964,0.9757,0.9501,0.9618,0.9407,0.9421,0.114
gbc,Gradient Boosting Classifier,0.9686,0.9945,0.9577,0.954,0.9551,0.931,0.9319,0.496
xgboost,Extreme Gradient Boosting,0.9686,0.9944,0.9636,0.9489,0.9555,0.9313,0.9321,0.113
nb,Naive Bayes,0.9665,0.9871,0.9941,0.9187,0.9544,0.928,0.9304,0.133
lr,Logistic Regression,0.9645,0.9965,0.946,0.9534,0.9489,0.9217,0.9225,0.089
ridge,Ridge Classifier,0.9624,0.0,0.9283,0.9648,0.945,0.9164,0.9182,0.062



---
# **2. Classification: working with user dataset**
---
### **2.1 Download the "diabetes" dataset to local system**
---


In [9]:
diabetesDataSet.to_csv("diabetesDataSet.csv", index=False)

from google.colab import files
files.download('diabetesDataSet.csv')


---
### **2.2 Uploading "user file" from user system**
---

In [None]:
from google.colab import files
files.upload()

---
### **2.3 "Read" the uploaded file**
---

In [None]:
import pandas as pd
myDataSet = pd.read_csv('diabetesDataSet (1).csv')  # use the file name reported by files.upload()
myDataSet.head()

---
### **2.4 "Compare" the model performance**
---

In [12]:
from pycaret.classification import *

setup(data = myDataSet, target='Class variable')
cm = compare_models()

Unnamed: 0,Description,Value
0,Session id,1234
1,Target,Class variable
2,Target type,Binary
3,Original data shape,"(768, 9)"
4,Transformed data shape,"(768, 9)"
5,Transformed train set shape,"(537, 9)"
6,Transformed test set shape,"(231, 9)"
7,Numeric features,8
8,Preprocess,True
9,Imputation type,simple


Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
ridge,Ridge Classifier,0.7673,0.0,0.5246,0.742,0.607,0.4504,0.4685,0.146
lda,Linear Discriminant Analysis,0.7654,0.8289,0.5298,0.7298,0.6075,0.448,0.4633,0.091
et,Extra Trees Classifier,0.7635,0.8205,0.5673,0.7177,0.6241,0.4561,0.4693,0.455
lr,Logistic Regression,0.7616,0.8273,0.5351,0.7148,0.6048,0.4413,0.4549,0.451
rf,Random Forest Classifier,0.7597,0.8219,0.588,0.692,0.6273,0.4534,0.4623,0.698
gbc,Gradient Boosting Classifier,0.7503,0.8372,0.5827,0.6776,0.6117,0.4326,0.4443,0.348
nb,Naive Bayes,0.743,0.7996,0.5728,0.6489,0.6035,0.4165,0.421,0.178
ada,Ada Boost Classifier,0.7394,0.7805,0.5898,0.6513,0.6095,0.4165,0.4246,0.521
lightgbm,Light Gradient Boosting Machine,0.7374,0.8115,0.5944,0.6426,0.6091,0.4143,0.4211,0.328
qda,Quadratic Discriminant Analysis,0.7335,0.7946,0.5287,0.6443,0.5757,0.386,0.3927,0.174



---
### **2.5 "Three lines of code" for model comparison on a "user dataset"**

##### Use this when working in **"Anaconda/Jupyter notebook"** on a local machine
---

In [13]:
from pycaret.classification import *
import pandas as pd

# Uncomment the lines below and replace the file name and target column with your own
#myDataSet = pd.read_csv("myData.csv")
#s = setup(data = myDataSet, target='cancer')
#cm = compare_models()

---
# **3. Classification: Apply "Data Preprocessing"**
---

### **3.1 Model performance using "Normalization"**

In [14]:
setup(data=diabetesDataSet, target='Class variable',
      normalize = True, normalize_method = 'zscore')
cm = compare_models()

#normalize_method = {zscore, minmax, maxabs, robust}

Unnamed: 0,Description,Value
0,Session id,7281
1,Target,Class variable
2,Target type,Binary
3,Original data shape,"(768, 9)"
4,Transformed data shape,"(768, 9)"
5,Transformed train set shape,"(537, 9)"
6,Transformed test set shape,"(231, 9)"
7,Numeric features,8
8,Preprocess,True
9,Imputation type,simple


Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
ridge,Ridge Classifier,0.7898,0.0,0.5836,0.7657,0.6537,0.5088,0.5233,0.171
lr,Logistic Regression,0.7878,0.8411,0.5833,0.7587,0.6529,0.5054,0.5184,0.115
gbc,Gradient Boosting Classifier,0.7878,0.8535,0.6371,0.7237,0.6685,0.5161,0.5235,0.62
lda,Linear Discriminant Analysis,0.7842,0.8427,0.5836,0.7445,0.6476,0.4974,0.5087,0.118
nb,Naive Bayes,0.7767,0.8337,0.6161,0.7166,0.651,0.4908,0.5004,0.112
et,Extra Trees Classifier,0.7748,0.8337,0.5836,0.7252,0.6397,0.4801,0.4901,0.488
lightgbm,Light Gradient Boosting Machine,0.7747,0.8344,0.6526,0.6877,0.6667,0.4975,0.5001,0.288
xgboost,Extreme Gradient Boosting,0.7746,0.825,0.6687,0.6837,0.673,0.5017,0.5044,0.205
ada,Ada Boost Classifier,0.773,0.8209,0.6006,0.7123,0.6429,0.4803,0.4893,0.316
rf,Random Forest Classifier,0.771,0.841,0.6158,0.7092,0.6469,0.4806,0.4907,0.571
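
The four `normalize_method` options correspond to standard scaling formulas. The sketch below applies each one by hand to a single column, assuming the usual definitions (population standard deviation for z-score, median and interquartile range for robust). It is an illustration with hypothetical helper names, not PyCaret's implementation.

```python
import statistics

def zscore(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def minmax(values):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def maxabs(values):
    """Divide by the largest absolute value; the sign and zeros are preserved."""
    largest = max(abs(v) for v in values)
    return [v / largest for v in values]

def robust(values):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]

# Plasma glucose values from the five sample rows shown in section 1.2
glucose = [148, 85, 183, 89, 137]

for scaler in (zscore, minmax, maxabs, robust):
    print(scaler.__name__, [round(v, 3) for v in scaler(glucose)])
```

Robust scaling is the least affected by extreme values, since the median and interquartile range barely move when a single reading is very large.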



---
### **3.2 Model performance using "Feature Selection"**
---

In [16]:
setup(data=diabetesDataSet, target='Class variable',
      feature_selection = True, feature_selection_method = 'classic')
cm = compare_models()

#feature_selection_method = {classic, univariate, sequential}

Unnamed: 0,Description,Value
0,Session id,8517
1,Target,Class variable
2,Target type,Binary
3,Original data shape,"(768, 9)"
4,Transformed data shape,"(768, 2)"
5,Transformed train set shape,"(537, 2)"
6,Transformed test set shape,"(231, 2)"
7,Numeric features,8
8,Preprocess,True
9,Imputation type,simple


Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
nb,Naive Bayes,0.6536,0.6063,0.1383,0.5072,0.2138,0.0806,0.1092,0.583
qda,Quadratic Discriminant Analysis,0.6536,0.606,0.1383,0.5072,0.2138,0.0806,0.1092,0.271
dummy,Dummy Classifier,0.6518,0.5,0.0,0.0,0.0,0.0,0.0,0.221
ridge,Ridge Classifier,0.6462,0.0,0.0795,0.425,0.1272,0.0344,0.0562,0.139
lr,Logistic Regression,0.6443,0.6344,0.0848,0.4273,0.1324,0.0342,0.0567,1.373
lda,Linear Discriminant Analysis,0.6443,0.6344,0.0848,0.5056,0.1359,0.0343,0.0644,0.235
ada,Ada Boost Classifier,0.6331,0.6009,0.1547,0.4656,0.2243,0.0511,0.0714,0.364
gbc,Gradient Boosting Classifier,0.6275,0.5943,0.2289,0.428,0.2975,0.0766,0.0827,0.374
knn,K Neighbors Classifier,0.6164,0.5736,0.3041,0.4363,0.3559,0.0946,0.0993,0.431
lightgbm,Light Gradient Boosting Machine,0.6163,0.6081,0.2661,0.419,0.3211,0.0751,0.0791,0.206
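
The transformed shape `(768, 2)` in the summary for this run means that only one feature plus the target survived selection, which likely explains why most models barely beat the Dummy Classifier. A plausible reading, assuming `setup()` keeps a default fraction of 0.2 of the features through its `n_features_to_select` parameter and rounds down (check the documentation for the installed version):

```python
n_features = 8            # "Numeric features" in the setup summary
fraction_to_keep = 0.2    # assumption: default value of n_features_to_select

n_kept = int(fraction_to_keep * n_features)   # 1.6 is truncated to 1
transformed_columns = n_kept + 1              # kept features plus the target column

print(n_kept, transformed_columns)
```

If that assumption holds, passing a larger value such as `n_features_to_select=0.8` to `setup()` would be a reasonable next experiment.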



---
### **3.3 Model performance using "Outlier Removal"**
---

In [17]:
setup(data=diabetesDataSet, target='Class variable',
      remove_outliers = True, outliers_threshold = 0.05)
cm = compare_models()

Unnamed: 0,Description,Value
0,Session id,4089
1,Target,Class variable
2,Target type,Binary
3,Original data shape,"(768, 9)"
4,Transformed data shape,"(741, 9)"
5,Transformed train set shape,"(510, 9)"
6,Transformed test set shape,"(231, 9)"
7,Numeric features,8
8,Preprocess,True
9,Imputation type,simple


Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
lr,Logistic Regression,0.7709,0.8362,0.5661,0.7267,0.6326,0.4699,0.4803,0.529
lda,Linear Discriminant Analysis,0.769,0.8369,0.5661,0.7215,0.6308,0.4662,0.4762,0.138
ridge,Ridge Classifier,0.7635,0.0,0.5395,0.7266,0.6149,0.449,0.4628,0.175
rf,Random Forest Classifier,0.7635,0.8201,0.5561,0.7139,0.6204,0.4529,0.4636,0.531
et,Extra Trees Classifier,0.756,0.8117,0.5292,0.7021,0.6016,0.4308,0.4411,0.501
xgboost,Extreme Gradient Boosting,0.7544,0.7956,0.5883,0.6749,0.6242,0.4437,0.4492,0.458
qda,Quadratic Discriminant Analysis,0.7524,0.8052,0.6149,0.6671,0.6346,0.4485,0.4536,0.13
gbc,Gradient Boosting Classifier,0.7504,0.8035,0.5716,0.6682,0.6142,0.4316,0.4358,0.495
lightgbm,Light Gradient Boosting Machine,0.7486,0.8036,0.583,0.6646,0.6187,0.4326,0.4365,0.194
nb,Naive Bayes,0.7412,0.817,0.6257,0.6343,0.6274,0.4296,0.4317,0.129
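
The setup summary for this run shows the effect of `outliers_threshold = 0.05` directly: the training set shrinks while the test set keeps all 231 rows. The check below uses only the shapes reported in this notebook.

```python
train_before = 537   # training rows in the earlier runs without outlier removal
train_after = 510    # "Transformed train set shape" in this run
test_rows = 231      # "Transformed test set shape" (unchanged)

removed = train_before - train_after
removed_fraction = removed / train_before

print(removed, round(removed_fraction, 3), train_after + test_rows)
```

About 5% of the training rows are dropped, and the total of 741 rows matches the transformed data shape. The test set appears to be left untouched, so the hold-out evaluation still reflects unfiltered data.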



---
### **3.4 Model performance using "Transformation"**
---

In [None]:
setup(data=diabetesDataSet, target='Class variable',
      transformation = True, transformation_method = 'yeo-johnson')
cm = compare_models()

Unnamed: 0,Description,Value
0,Session id,691
1,Target,Class variable
2,Target type,Binary
3,Original data shape,"(768, 9)"
4,Transformed data shape,"(768, 9)"
5,Transformed train set shape,"(537, 9)"
6,Transformed test set shape,"(231, 9)"
7,Numeric features,8
8,Preprocess,True
9,Imputation type,simple


Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
nb,Naive Bayes,0.7594,0.8173,0.5392,0.7291,0.6107,0.4421,0.4591,0.302
ridge,Ridge Classifier,0.7538,0.0,0.4851,0.7272,0.5774,0.4134,0.4332,0.149
lr,Logistic Regression,0.7501,0.8077,0.5064,0.7066,0.5859,0.4133,0.4282,0.262
rf,Random Forest Classifier,0.7408,0.8016,0.5605,0.6632,0.6025,0.4126,0.4198,0.675
qda,Quadratic Discriminant Analysis,0.739,0.807,0.5661,0.6492,0.6028,0.4101,0.4137,0.24
ada,Ada Boost Classifier,0.7279,0.7739,0.5556,0.6293,0.587,0.3858,0.3897,0.376
knn,K Neighbors Classifier,0.6922,0.6919,0.4684,0.5799,0.5131,0.2928,0.2994,0.2
dt,Decision Tree Classifier,0.6886,0.654,0.5395,0.5646,0.5487,0.3127,0.3143,0.175
svm,SVM - Linear Kernel,0.5081,0.0,0.4541,0.3919,0.2829,-0.0034,0.0156,0.145
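
The `yeo-johnson` option refers to a power transform that makes skewed features look more Gaussian and, unlike Box-Cox, accepts zero and negative values. The sketch below implements the standard formula for one value and a given exponent `lmbda`. In practice the exponent is usually estimated per feature from the data; here it is passed in by hand as an assumption.

```python
import math

def yeo_johnson(x, lmbda):
    """Apply the Yeo-Johnson power transform to a single value."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)                  # log(x + 1)
        return ((x + 1) ** lmbda - 1) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)                    # -log(-x + 1)
    return -(((-x + 1) ** (2 - lmbda)) - 1) / (2 - lmbda)

# Insulin values from the sample rows shown in section 1.2
insulin = [0, 94, 168]

for lmbda in (0, 0.5, 1):
    print(lmbda, [round(yeo_johnson(v, lmbda), 3) for v in insulin])
```

With `lmbda = 1` the transform leaves values unchanged, and smaller exponents compress large values more strongly.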


---
### **3.5 Model performance using "PCA"**
---

In [None]:
setup(data=diabetesDataSet, target='Class variable',
      pca = True, pca_method = 'linear')
cm = compare_models()
#pca_method = {linear, kernel, incremental}

---
### **3.6 Model performance using "Outlier Removal" + "Normalization"**
---

In [None]:
setup(data=diabetesDataSet, target='Class variable',
      remove_outliers = True, outliers_threshold = 0.05,
      normalize = True, normalize_method = 'zscore')
cm = compare_models()

---
### **3.7 Model performance using "Outlier Removal" +  "Normalization" + "Transformation"**
---

In [None]:
setup(data=diabetesDataSet, target='Class variable',
      remove_outliers = True, outliers_threshold = 0.05,
      normalize = True, normalize_method = 'zscore',
      transformation = True, transformation_method = 'yeo-johnson')
cm = compare_models()

---
### **3.8 Explore more parameters of "setup()" on pycaret**
---
- Explore the setup() parameters introduced in **Step 1.3**
- **<a href="https://pycaret.readthedocs.io/en/latest/api/classification.html" target="_blank"> Click Here</a>** for more

---
# **4. Classification: More Operations**
---
### **4.1 Build a single model - "RandomForest"**

In [None]:
from pycaret.datasets import get_data
from pycaret.classification import *

diabetesDataSet = get_data("diabetes")
setup(data=diabetesDataSet, target='Class variable')

rfModel = create_model('rf')
# Explore more parameters

---
### **4.2 Other available classification models**
---
-	'ada' -	Ada Boost Classifier
-	'dt' -	Decision Tree Classifier
-	'et' -	Extra Trees Classifier
-	'gbc' -	Gradient Boosting Classifier
-	'knn' -	K Neighbors Classifier
-	'lightgbm' -	Light Gradient Boosting Machine
-	'lda' -	Linear Discriminant Analysis
-	'lr' -	Logistic Regression
-	'nb' -	Naive Bayes
-	'qda' -	Quadratic Discriminant Analysis
-	'rf' -	Random Forest Classifier
-	'ridge' -	Ridge Classifier
-	'svm' -	SVM - Linear Kernel

---
### **4.3 Explore more parameters of "create_model()" on pycaret**
---

**<a href="https://pycaret.readthedocs.io/en/latest/api/classification.html#pycaret.classification.create_model" target="_blank"> Click Here</a>**

---
### **4.4 Make prediction on the "new unseen dataset"**
---
#### **Get the "new unseen dataset"**



In [None]:
# Select the first 10 rows of the diabetes dataset.
# For illustration only: these rows come from the same data used for training.
newDataSet = get_data("diabetes").iloc[:10]

#### **Make prediction on "new unseen dataset"**

In [None]:
newPredictions = predict_model(rfModel, data = newDataSet)
newPredictions

---
### **4.5 "Save" and "Download" the prediction result**
---

In [None]:
newPredictions.to_csv("NewPredictions.csv", index=False)

from google.colab import files
files.download('NewPredictions.csv')

---
### **4.6 "Save" the trained model**
---

In [None]:
sm = save_model(rfModel, 'rfModelFile')

---
### **4.7 Download the "trained model file" to user local system**
---

In [None]:
from google.colab import files
files.download('rfModelFile.pkl')

---
### **4.8  "Upload the trained model" --> "Load the model"  --> "Make the prediction" on "new unseen dataset"**
---
### **4.8.1 Upload the  "Trained Model"**


In [None]:
from google.colab import files
files.upload()

---
### **4.8.2 Load the "Model"**
---

In [None]:
rfModel = load_model('rfModelFile (1)')  # uploaded file name without the .pkl extension

---
### **4.8.3 Make the prediction on "new unseen dataset"**
---

In [None]:
newPredictions = predict_model(rfModel, data = newDataSet)
newPredictions

---
# **5. Plot the trained model**
---
**The following plots can be generated for a trained model**
*   Area Under the Curve         - 'auc'
*   Discrimination Threshold     - 'threshold'
*   Precision Recall Curve       - 'pr'
*   Confusion Matrix             - 'confusion_matrix'
*   Class Prediction Error       - 'error'
*   Classification Report        - 'class_report'
*   Decision Boundary            - 'boundary'
*   Recursive Feat. Selection    - 'rfe'
*   Learning Curve               - 'learning'
*   Manifold Learning            - 'manifold'
*   Calibration Curve            - 'calibration'
*   Validation Curve             - 'vc'
*   Dimension Learning           - 'dimension'
*   Feature Importance           - 'feature'
*   Model Hyperparameter         - 'parameter'

---
### **5.1 Create RandomForest model or any other model**
---

In [None]:
rfModel = create_model('rf')

---
### **5.2 Create "Confusion Matrix"**
---

In [None]:
plot_model(rfModel, plot='confusion_matrix')
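
The confusion matrix plot counts how often each actual class receives each predicted class. As a minimal sketch of the counting itself, the hypothetical helper below tallies the four cells for a binary problem; the predictions are made up for illustration.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Tally true/false positives and negatives for a binary problem."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts

y_true = [1, 0, 1, 0, 1]   # "Class variable" of the five sample rows in section 1.2
y_pred = [1, 0, 0, 0, 1]   # made-up predictions, not model output

counts = confusion_counts(y_true, y_pred)
# Rows are actual classes (0, then 1); columns are predicted classes (0, then 1)
print(counts["tn"], counts["fp"])
print(counts["fn"], counts["tp"])
```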

---
### **5.3 Plot the "learning curve"**
---

In [None]:
plot_model(rfModel, plot='learning')

---
### **5.4 Plot the "AUC Curve" (Area Under the Curve)**
---

In [None]:
plot_model(rfModel, plot='auc')
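
The area under the ROC curve can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it from that pairwise definition using made-up scores; the helper name `roc_auc` is hypothetical and the quadratic loop is only suitable for small inputs.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]   # made-up predicted probabilities for class 1

print(roc_auc(y_true, scores))   # prints 0.75
```

Three of the four positive/negative pairs are ordered correctly, which gives 0.75. A value of 0.5 corresponds to random ranking.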

---
### **5.5 Plot the "Decision Boundary"**
---

In [None]:
plot_model(rfModel, plot='boundary')

---
### **5.6 Get the model "parameters"**
---

In [None]:
plot_model(rfModel, plot='parameter')

---
### **5.7 Explore more parameters of "plot_model()" on pycaret**
---
**<a href="https://pycaret.readthedocs.io/en/latest/api/classification.html#pycaret.classification.plot_model" target="_blank"> Click Here </a>**

---
# **6. Feature Importance**
---
### **6.1 Feature Importance using "Random Forest"**


In [None]:
rfModel = create_model('rf', verbose=False)
plot_model(rfModel, plot='feature')

---
### **6.2 Feature Importance using "Extra Trees Classifier"**
---

In [None]:
etModel = create_model('et', verbose=False)
plot_model(etModel, plot='feature')

---
### **6.3 Feature Importance using "Decision Tree"**
---

In [None]:
dtModel = create_model('dt', verbose=False)
plot_model(dtModel, plot='feature')

---
# **7. Tune/Optimize the model performance**
---
### **7.1 Train "Decision Tree" with default parameters**


In [None]:
dtModel = create_model('dt')

#### **Get the "parameters" of Decision Tree**

In [None]:
plot_model(dtModel, plot='parameter')

---
### **7.2 Tune "Decision Tree" model**
---

In [None]:
dtModelTuned = tune_model(dtModel, n_iter=50)
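
`n_iter` sets how many candidate hyperparameter combinations are tried. Assuming a random search strategy, the idea can be sketched in plain Python as follows. The parameter grid and the `toy_score` function are hypothetical stand-ins; a real tuner would score each candidate with cross-validation.

```python
import random

def random_search(score_fn, param_space, n_iter, seed=0):
    """Sample n_iter random combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        candidate = {name: rng.choice(values) for name, values in param_space.items()}
        score = score_fn(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score

# Hypothetical search space for a decision tree
param_space = {
    "max_depth": [2, 4, 6, 8, 10],
    "min_samples_leaf": [1, 2, 5, 10],
}

def toy_score(params):
    """Stand-in for cross-validated accuracy; peaks at max_depth=6, min_samples_leaf=5."""
    return (1.0
            - 0.01 * abs(params["max_depth"] - 6)
            - 0.01 * abs(params["min_samples_leaf"] - 5))

best_params, best_score = random_search(toy_score, param_space, n_iter=50)
print(best_params, round(best_score, 3))
```

A larger `n_iter` explores more of the space at the cost of longer training time, and random search does not guarantee that the best combination is found.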

#### **Get the "tuned parameters" of Decision Tree**

In [None]:
plot_model(dtModelTuned, plot='parameter')

---
### **7.3 Explore more parameters of "tune_model()" on pycaret**
---
**<a href="https://pycaret.readthedocs.io/en/latest/api/classification.html#pycaret.classification.tune_model" target="_blank"> Click Here </a>**

---
# **8. AutoML - Advanced Machine Learning**
---

- Select n Best Models:
  - Ensemble, Stacking, Bagging, Blending
  - Auto tune the best n models

**<a href="https://pycaret.readthedocs.io/en/latest/api/classification.html#pycaret.classification.automl" target="_blank">Click Here</a>**
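
Blending combines the predictions of several trained models. One common form is hard (majority) voting, sketched below with made-up predictions from three hypothetical models. PyCaret's own blending may instead average predicted probabilities (soft voting) when the models support it.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority vote per row."""
    blended = []
    for row_votes in zip(*predictions_per_model):
        label, _ = Counter(row_votes).most_common(1)[0]
        blended.append(label)
    return blended

# Made-up predictions for five rows; an odd number of models avoids ties
rf_preds = [1, 0, 1, 0, 1]
et_preds = [1, 0, 0, 0, 1]
lr_preds = [0, 0, 1, 1, 1]

print(majority_vote([rf_preds, et_preds, lr_preds]))   # prints [1, 0, 1, 0, 1]
```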


---
# **9. Deploy the model on AWS / Azure**
---
**<a href="https://pycaret.readthedocs.io/en/latest/api/classification.html#pycaret.classification.deploy_model" target="_blank">Click Here</a>**