![Co-learning Lounge](https://s3.ap-south-1.amazonaws.com/townscript-production/images/2545d2c7-a6e8-486e-97e6-737c42cef670.jpg)
Thanks to the Co-learning Lounge for encouraging the creation of learning content on [PyCaret](https://pycaret.org/) using a Kaggle playground problem.

You can find the most up-to-date and comprehensive learning material in their community.

Join and follow the [Co-learning Lounge](https://linktr.ee/colearninglounge) for more.

**Data Dictionary**:
* survival - Survival (0 = No; 1 = Yes)
* class - Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
* name - Name
* sex - Sex
* age - Age
* sibsp - Number of Siblings/Spouses Aboard
* parch - Number of Parents/Children Aboard
* ticket - Ticket Number
* fare - Passenger Fare
* cabin - Cabin
* embarked - Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)

## Import Libraries

In [None]:
import numpy as np # linear algebra
import pandas as pd 

Let us quickly install PyCaret and build a model.

In [None]:
!pip3 install pycaret

In [None]:
train = pd.read_csv("/kaggle/input/titanic/train.csv")
test = pd.read_csv("/kaggle/input/titanic/test.csv")

**Setup**

* setup() infers the data types and creates the transformation pipeline that prepares the data for modeling and deployment.
* Initializing setup() performs basic preprocessing tasks for the rest of the modeling steps, such as ignoring ID and date columns, imputing missing values, encoding categorical variables, and splitting the dataset into train and test sets. It can also handle data imbalance, feature selection, binning, and more. When you run setup(), it first asks you to confirm the inferred data types; once you press Enter, it creates the preprocessing environment. The call below passes silent = True, which skips this confirmation.
* It takes two mandatory parameters: the DataFrame and the name of the target column.
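
To make two of the preprocessing steps above concrete, here is a minimal pure-Python sketch of mean imputation and one-hot encoding. It is not PyCaret's implementation: the rows are made-up values shaped like the Titanic data, the helper names `impute_numeric` and `one_hot` are hypothetical, and PyCaret's actual defaults may differ.

```python
from statistics import mean

# Hypothetical rows shaped like the Titanic data (values are made up).
rows = [
    {"Age": 22.0, "Sex": "male", "Embarked": "S"},
    {"Age": None, "Sex": "female", "Embarked": "C"},
    {"Age": 26.0, "Sex": "female", "Embarked": None},
]

def impute_numeric(rows, column):
    """Replace missing numeric values with the mean of the observed ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def one_hot(rows, column, missing="unknown"):
    """Expand a categorical column into 0/1 indicator columns."""
    levels = sorted({r[column] if r[column] is not None else missing for r in rows})
    encoded = []
    for r in rows:
        value = r[column] if r[column] is not None else missing
        new_row = {k: v for k, v in r.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = int(value == level)
        encoded.append(new_row)
    return encoded

# Impute the numeric column first, then encode each categorical column.
prepared = impute_numeric(rows, "Age")
for col in ("Sex", "Embarked"):
    prepared = one_hot(prepared, col)
```

After these steps every row contains only numeric values, which is the form most estimators expect.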


In [None]:
from pycaret import classification
# Ignore identifier-like columns; silent=True skips the data-type confirmation prompt
classification_setup = classification.setup(data=train, target='Survived', ignore_features=['Ticket', 'Name', 'PassengerId'], silent=True, session_id=42)

Now that the necessary preprocessing is done, let’s create a classification model.

**Compare Models**

* The compare_models function trains all the models available in the library using stratified cross-validation and returns a score grid of all models across k folds (default = 10).
* The scoring metrics used are Accuracy, AUC, Recall, Precision, F1, Kappa, and MCC. The mean and standard deviation of the scores across the folds are also returned.
* You can blacklist (omit certain models from the comparison) or whitelist (run only certain models in the comparison) by passing model IDs as a list of strings, e.g.
  whitelist = compare_models(whitelist = ['dt','rf','xgboost'])
  blacklist = compare_models(blacklist = ['catboost', 'svm'])
* The best model is returned according to the sort parameter (default = Accuracy).
* You can also select the top N models by passing the n_select parameter (default = 1).
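
To show what most of the columns in the score grid measure, here is a minimal pure-Python sketch that computes Accuracy, Precision, Recall, F1, Kappa, and MCC for binary labels. The function name `classification_metrics` and the label vectors are hypothetical. AUC is left out because it needs predicted scores rather than hard labels.

```python
from math import sqrt

def classification_metrics(y_true, y_pred):
    """Compute binary classification metrics from true and predicted 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Cohen's kappa: agreement beyond what the label frequencies give by chance.
    expected = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    # Matthews correlation coefficient.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"Accuracy": accuracy, "Precision": precision, "Recall": recall,
            "F1": f1, "Kappa": kappa, "MCC": mcc}

# Made-up labels: 3 true positives, 1 false negative, 4 true negatives, 2 false positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

Kappa and MCC both correct for class imbalance, which is why they can rank models differently from plain accuracy.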

In [None]:
classification.compare_models()

In [None]:
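from pycaret.classification import *
# models() returns a DataFrame of the available models, indexed by model ID
models()
# compare only the ensemble-type models
compare_models(whitelist = models(type='ensemble').index.tolist())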

models() returns a pandas DataFrame listing all the ready-to-use models available in the library.

With a single function call, compare_models compares the classification models in a few seconds and displays the sorted score grid.

Note: The Ridge classifier appears to give higher accuracy than the other classifiers.

**Create Model**

* Let’s create an individual model; the output shows the evaluation metrics for each of the 10 cross-validation folds, along with their mean and standard deviation.
* The create_model function requires just one parameter: the model abbreviation as a string.

In [None]:
lgb_classifier = classification.create_model('lightgbm')

* The score grid above shows the model's results for each fold, along with their mean and standard deviation.
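
The idea behind a per-fold score grid can be sketched in a few lines of pure Python: samples are split into folds that preserve the class ratio, one score is produced per fold, and the scores are summarized. This is only an illustration. The helper `stratified_folds`, the labels, and the fold scores are hypothetical, and using the population standard deviation is an assumption about how the summary is computed.

```python
from collections import defaultdict
from statistics import mean, pstdev

def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    # Deal the indices of each class round-robin across the folds.
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Made-up labels: six samples of class 0 and four of class 1.
labels = [0] * 6 + [1] * 4
folds = stratified_folds(labels, k=2)

# Hypothetical per-fold accuracies, summarized the way a score grid would be.
fold_scores = [0.80, 0.82, 0.78]
score_mean = mean(fold_scores)
score_std = pstdev(fold_scores)
```

Each fold here ends up with three samples of class 0 and two of class 1, matching the 60/40 ratio of the full label list.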

**Hyperparameter Tuning**

* Depending on the evaluation metric(s) we are interested in, PyCaret helps us zoom in on the top-performing model, which we can then tune further through its hyperparameters.
* The tune_model() function tunes the hyperparameters of a model. In the calls below it is given a trained model object (the one returned by create_model), and custom_grid supplies the candidate values to search.
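
One common way to search a grid of candidate values is random search: draw a fixed number of random combinations and keep the best-scoring one. The pure-Python sketch below illustrates that idea only. It assumes, rather than confirms, that this resembles what happens behind tune_model. The function `random_search` and the scoring function `toy_score` are hypothetical; `toy_score` stands in for a cross-validated metric.

```python
import random

def random_search(param_grid, score_fn, n_iter=10, seed=42):
    """Sample n_iter random parameter combinations and return the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Pick one candidate value per parameter.
        candidate = {name: rng.choice(values) for name, values in param_grid.items()}
        score = score_fn(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score

def toy_score(params):
    """Hypothetical stand-in for a cross-validated score (higher is better)."""
    return -abs(params["learning_rate"] - 0.05) - abs(params["max_depth"] - 4) / 100

grid = {"learning_rate": [0.15, 0.1, 0.05, 0.01], "max_depth": [2, 4, 6, 8]}
best_params, best_score = random_search(grid, toy_score, n_iter=50)
```

Because only `n_iter` combinations are tried, the search cost stays fixed even when the grid is large, at the price of possibly missing the best combination.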

In [None]:
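# Candidate values for the search. Note: max_features, min_samples_split,
# min_samples_leaf and criterion are scikit-learn tree parameters rather than
# LightGBM ones, so LightGBM may ignore them or warn about them.
params = {'learning_rate':[0.15,0.1,0.05,0.01,0.005,0.001],
          'n_estimators':[100,250,500,750,1000,1250,1500,1750],
          'max_depth': np.random.randint(1, int(len(train.columns)*.85),20),
          'max_features': np.random.randint(1, len(train.columns),20),
          'min_samples_split':[2,4,6,8,10,20,40,60,100], 
          'min_samples_leaf':[1,3,5,7,9],
          'criterion': ["gini", "entropy"]}

tune_lgb = classification.tune_model(lgb_classifier, custom_grid = params)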

In [None]:
# Tune the Ridge classifier's regularization strength (alpha)
params = {'alpha':[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]}
tune_ridge = classification.tune_model(create_model('ridge'), custom_grid = params, n_iter=50, fold=50)

In PyCaret, we can create bagging, boosting, blending, and stacking ensemble models with just one line of code.

**Ensemble Model**

In [None]:
# bagging ensemble of the tuned LightGBM model
bagging = classification.ensemble_model(tune_lgb, method= 'Bagging')

**Blend Models**

Blending combines different machine learning models and, for classification, uses a majority vote or the average of the predicted probabilities to predict the final outcome.
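
The two voting schemes can disagree, as this minimal pure-Python sketch shows. The probabilities are made up and the helper names `hard_vote` and `soft_vote` are hypothetical; this is an illustration of the idea, not PyCaret's code.

```python
from collections import Counter

def hard_vote(predictions):
    """predictions: one list of 0/1 labels per model. Returns the majority label per sample."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]

def soft_vote(probabilities, threshold=0.5):
    """probabilities: one list of P(class 1) per model. Averages, then thresholds."""
    return [int(sum(sample) / len(sample) >= threshold) for sample in zip(*probabilities)]

# Made-up predicted probabilities of class 1 from three models on three samples.
model_probs = [
    [0.90, 0.40, 0.20],
    [0.60, 0.45, 0.10],
    [0.20, 0.90, 0.30],
]
# Each model's own hard labels, using a 0.5 cut-off.
model_labels = [[int(p >= 0.5) for p in probs] for probs in model_probs]

hard_result = hard_vote(model_labels)  # [1, 0, 0]
soft_result = soft_vote(model_probs)   # [1, 1, 0]
```

On the second sample two models vote 0, so the hard vote is 0, but the third model is confident enough (0.90) to pull the average above 0.5, so the soft vote is 1.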

In [None]:
from pycaret.classification import blend_models
# blending all models
blend_all = blend_models(method='hard')

**Stack Models**

Stacking is an ensembling method that uses meta-learning. The idea behind stacking is to build a meta-model that generates the final prediction using the prediction of multiple base estimators.
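
As a rough illustration of meta-learning, the pure-Python sketch below trains a deliberately simple meta-model: a lookup table that maps each combination of base-model predictions to the most common true label seen for it. Real stacking typically uses a proper estimator as the meta-model. The helpers `fit_meta_model` and `predict_meta` and all values are hypothetical.

```python
from collections import Counter, defaultdict

def fit_meta_model(base_predictions, y_true):
    """Learn, for each tuple of base-model outputs, the most common true label."""
    votes = defaultdict(Counter)
    for features, label in zip(zip(*base_predictions), y_true):
        votes[features][label] += 1
    return {features: counts.most_common(1)[0][0] for features, counts in votes.items()}

def predict_meta(meta_model, base_predictions, default=0):
    """Predict from base-model outputs; unseen combinations fall back to `default`."""
    return [meta_model.get(features, default) for features in zip(*base_predictions)]

# Made-up held-out predictions from two base models, plus the true labels.
model_a = [1, 1, 0, 0, 1, 0]
model_b = [1, 0, 1, 0, 0, 1]
y_true = [1, 1, 0, 0, 1, 0]

meta_model = fit_meta_model([model_a, model_b], y_true)

# New samples on which the two base models disagree.
stacked = predict_meta(meta_model, [[1, 0], [0, 1]])
```

Here the meta-model learns from the data that model_a is the more reliable of the two, so it follows model_a when the base models disagree, something a plain majority vote between two models could not resolve.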

In [None]:
# create individual models for stacking
ridge_cls = classification.create_model('ridge')
extre_tr = classification.create_model('et')
lgb = classification.create_model('lightgbm')
cat_cls = classification.create_model('catboost')
lg_cls = classification.create_model('lr')


In [None]:
from pycaret.classification import stack_models
# stacking models
stacker = stack_models(estimator_list = [ridge_cls, extre_tr, lgb, cat_cls, lg_cls], method='hard')

**Plot Model**

PyCaret also makes evaluating model performance as easy as building the model, offering a range of plots.

In [None]:
interpret_model(tune_lgb)

In [None]:
from pycaret.classification import *
# Precision-recall curve
plot_model(tune_lgb,plot = 'pr')

In [None]:
# Confusion matrix
plot_model(tune_lgb,plot = 'confusion_matrix')

In [None]:
# Validation Curve
plot_model(tune_lgb, plot = 'vc')

In [None]:
# AUC Curve
plot_model(tune_lgb, plot = 'auc')

In [None]:
# Class prediction error plot
plot_model(tune_lgb, plot = 'error')

**Prediction**

In [None]:
y_pred = predict_model(tune_lgb, data=test)

In [None]:
y_pred

In [None]:
submission = pd.DataFrame({
        "PassengerId": test["PassengerId"],
        "Survived": y_pred['Label']
    })
submission.to_csv("submission.csv", index=False)