### In this notebook we are going to use two AutoML libraries:
1. H2O 
2. PyCaret

**This is a very introductory notebook about AutoML. We will not go into much depth, but it should still give you a good idea of what AutoML does.**

> More about AutoML: Automated machine learning (AutoML) is the process of automating the tasks involved in applying machine learning to real-world problems. AutoML covers the complete pipeline, from the raw dataset to the deployable machine learning model.

### Importing Libraries 

In [None]:
import pandas as pd
import numpy as np

### Reading the training and testing data


In [None]:
train = pd.read_csv('../input/tabular-playground-series-feb-2021/train.csv')
test = pd.read_csv('../input/tabular-playground-series-feb-2021/test.csv')

train.head()

#### Let's look at a short summary of the data


In [None]:
train.describe()

## **Pandas Profiling:** Yet another automated library, this time for EDA
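
To make "automated EDA" a bit more concrete, here is a toy sketch of the kind of per-column statistics a profiling report collects: missing values, distinct values, a few numeric statistics and the most frequent category. It assumes the data is a list of dict rows with `None` marking missing values. `profile_columns` is a hypothetical helper; pandas-profiling itself computes much more (correlations, histograms, warnings).

```python
from collections import Counter


def profile_columns(rows):
    """Toy per-column profile for a list of dict rows (None = missing value)."""
    columns = list(rows[0].keys()) if rows else []
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary = {"missing": len(values) - len(present), "distinct": len(set(present))}
        numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if present and len(numeric) == len(present):
            # Every non-missing value is a number: report simple numeric statistics.
            summary.update(type="numeric", mean=sum(numeric) / len(numeric),
                           min=min(numeric), max=max(numeric))
        else:
            # Otherwise treat the column as categorical and report its most frequent value.
            top = Counter(present).most_common(1)[0][0] if present else None
            summary.update(type="categorical", top=top)
        report[col] = summary
    return report


rows = [
    {"cat0": "A", "cont0": 0.5, "target": 7.0},
    {"cat0": "B", "cont0": None, "target": 8.0},
    {"cat0": "A", "cont0": 1.5, "target": 9.0},
]
report = profile_columns(rows)
print(report["cont0"])
# {'missing': 1, 'distinct': 2, 'type': 'numeric', 'mean': 1.0, 'min': 0.5, 'max': 1.5}
```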


In [None]:
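import pandas_profiling as pp      # newer releases ship as ydata-profiling: from ydata_profiling import ProfileReport
profile = pp.ProfileReport(train)  # build the automated EDA report
profile.to_file("output.html")     # save the report as a standalone HTML file
profile                            # render the report inline in the notebook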

# 1. H2O.ai: Our First AutoML Library

> H2O AI Hybrid Cloud offers an end-to-end platform that democratizes artificial intelligence, enabling every employee, customer, and citizen with sophisticated AI technology and easy-to-use AI applications.

H2O is one of the most advanced AutoML libraries; read more about H2O here: https://www.h2o.ai/

### Importing and Initializing h2o


In [None]:
import h2o
h2o.init()  # start a local H2O cluster, or connect to one that is already running

### Converting the training data into an H2OFrame (h2o_train)
> More about H2OFrame: H2OFrame is similar to pandas' ``DataFrame``, or R's ``data.frame``. One critical distinction is that the
> data is generally not held in memory; instead it is located on a (possibly remote) H2O cluster, and thus
> ``H2OFrame`` represents a mere handle to that data.

In [None]:
h2o_train = h2o.H2OFrame(
    train,
    skipped_columns=None  # None: upload every column of the pandas DataFrame
)

### Dividing the training features and the target feature

In [None]:
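x = h2o_train.columns  # list of all column names
y = 'target'
x.remove(y)     # drop the target column from the list of feature columns
x.remove('id')  # 'id' is only a row identifier, so don't let the models learn from it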

### Setting up the H2OAutoML
> More about it: The Automatic Machine Learning (AutoML) function automates the supervised machine learning model training process.
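
The `max_models` and `max_runtime_secs` settings in the next cell act as a search budget. H2O's documentation states that when both are set, the run stops as soon as either limit is hit. The toy loop below sketches that idea under simplifying assumptions: `budgeted_search`, the candidate names and the `evaluate` callback are hypothetical stand-ins, not H2O internals, and lower scores (like RMSE) are treated as better.

```python
import time


def budgeted_search(candidates, evaluate, max_models=None, max_runtime_secs=None,
                    clock=time.monotonic):
    """Toy AutoML loop: stop at max_models or max_runtime_secs, whichever comes first."""
    start = clock()
    results = []
    for name in candidates:
        if max_models is not None and len(results) >= max_models:
            break  # model budget exhausted
        if max_runtime_secs is not None and clock() - start >= max_runtime_secs:
            break  # time budget exhausted
        results.append((name, evaluate(name)))
    # Leaderboard: lower score (e.g. RMSE) is better, so sort ascending.
    return sorted(results, key=lambda item: item[1])


# Hypothetical cross-validated RMSE for each candidate, used instead of real training.
scores = {"glm": 0.90, "gbm": 0.85, "drf": 0.88, "xgboost": 0.84}
print(budgeted_search(list(scores), scores.get, max_models=2))
# [('gbm', 0.85), ('glm', 0.9)]
```

With a tight budget the strongest candidate (`xgboost` here) is never tried, which is why raising `max_models` or `max_runtime_secs` can improve the leaderboard at the cost of a longer run.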


In [None]:
from h2o.automl import H2OAutoML 
aml = H2OAutoML(
    nfolds=3,                # number of folds for k-fold cross-validation
    max_runtime_secs=2000,   # time budget (seconds); the run stops when this or max_models is reached, whichever comes first
    seed=42,                 # random seed
    stopping_metric='RMSE',  # metric used for early stopping
    sort_metric='RMSE',      # metric used to rank models on the leaderboard
    max_models=40            # maximum number of models (excluding stacked ensembles) H2O will train and validate
)

### Training the AutoML model

In [None]:
%%time
aml.train(x = x, y = y, training_frame = h2o_train) 

### View the H2O aml leaderboard

In [None]:
lb = aml.leaderboard 
lb.head(rows = lb.nrows)

### More details about the leader (best) trained model

In [None]:
aml.leader

## Converting our test data to H2OFrame format and predicting with the best model!

In [None]:
h2o_test = h2o.H2OFrame(test) 
preds = aml.predict(h2o_test)
preds.as_data_frame().values.flatten()

In [None]:
sample_submission = pd.read_csv('../input/tabular-playground-series-feb-2021/sample_submission.csv')
sample_submission['target'] = preds.as_data_frame().values.flatten()
sample_submission.to_csv('h2o_submission.csv', index=False)

# 2. PyCaret: Our Second AutoML Library

> More about PyCaret: PyCaret is an open source, low-code machine learning library in Python that allows you to go from preparing your data to deploying your model within minutes in your choice of notebook environment.

PyCaret is a low-code AutoML library; read more about PyCaret here: https://pycaret.org/

### Installing PyCaret

In [None]:
!pip install pycaret

### Importing the regression module from the PyCaret library

In [None]:
from pycaret.regression import *

## Setting up PyCaret Regression model

In [None]:
exp_reg = setup(
    train,                   # the pandas DataFrame used for training
    target="target",         # name of the target column
    ignore_features=["id"],  # 'id' is only a row identifier, so it should not be used as a feature
    fold=3                   # number of folds for k-fold cross-validation
)

### Comparing all models using *compare_models()*

> More about compare_models(): This function trains and evaluates performance of all estimators available in the 
> model library using cross validation.
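
A rough sketch of what `compare_models()` does, under heavy simplification: fit every candidate on k-1 folds, score it on the held-out fold, average the RMSE and sort. The two candidate models (a mean predictor and a one-feature least-squares line), the unshuffled contiguous folds and all function names here are hypothetical; PyCaret's real model library and fold strategy are far richer.

```python
import math


def kfold_indices(n, k):
    """Split row indices 0..n-1 into k contiguous folds (no shuffling, for simplicity)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x on a single feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def compare(candidates, xs, ys, k=3):
    """Return (name, mean cross-validated RMSE) pairs sorted from best to worst."""
    board = []
    for name, fit in candidates.items():
        fold_scores = []
        for val_idx in kfold_indices(len(xs), k):
            held_out = set(val_idx)
            train_idx = [i for i in range(len(xs)) if i not in held_out]
            model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
            fold_scores.append(rmse([ys[i] for i in val_idx], [model(xs[i]) for i in val_idx]))
        board.append((name, sum(fold_scores) / len(fold_scores)))
    return sorted(board, key=lambda row: row[1])


xs = [0, 1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]
print(compare({"mean": fit_mean, "line": fit_line}, xs, ys))
```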

In [None]:
compare_models(sort = 'RMSE')

##### As we can see, **CatBoost** performs well, so we will create a model with CatBoost!
> More about create_model: Creating a model in any module is as simple as writing create_model. Its only required parameter is the model ID as a string. For supervised modules (classification and regression), this function returns a table with k-fold cross-validated performance metrics along with the trained model object.
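
The fold table that `create_model` prints can be sketched like this: compute a few regression metrics on each validation fold, then append a Mean row and an SD row. This is a hypothetical helper covering only MAE, MSE, RMSE and R2, and it uses the population standard deviation for SD, which is an assumption rather than documented PyCaret behavior.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R2 for one validation fold."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }


def fold_table(fold_results):
    """One row per fold, followed by a Mean row and an SD row (population SD)."""
    rows = [regression_metrics(y_true, y_pred) for y_true, y_pred in fold_results]
    names = list(rows[0])
    mean_row = {m: statistics.mean([r[m] for r in rows]) for m in names}
    sd_row = {m: statistics.pstdev([r[m] for r in rows]) for m in names}
    return rows + [mean_row, sd_row]


# Hypothetical (y_true, y_pred) pairs for two validation folds.
folds = [([1, 2, 3], [1, 2, 3]), ([1, 2, 3], [2, 3, 4])]
for label, row in zip(["Fold 0", "Fold 1", "Mean", "SD"], fold_table(folds)):
    print(label, row)
```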

In [None]:
catboost = create_model("catboost")

# Let's do the prediction!

> More about predict_model: This function takes a trained model object and the dataset to predict on. It will automatically apply the entire transformation pipeline created during the experiment.
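
Why reuse the pipeline? Anything learned during preprocessing (the set of known categories, the mean and standard deviation used for scaling) must come from the training data and be applied unchanged to the test data. `ToyPipeline` below is a hypothetical illustration of that fit/transform split, not PyCaret's actual pipeline, whose steps depend on the options passed to `setup()`.

```python
import statistics


class ToyPipeline:
    """Hypothetical preprocessing: one-hot encode one categorical column, standardize one numeric column."""

    def fit(self, cats, nums):
        # Everything learned here comes from the training data only.
        self.categories = sorted(set(cats))
        self.mean = statistics.mean(nums)
        self.std = statistics.pstdev(nums) or 1.0  # avoid dividing by zero for a constant column
        return self

    def transform(self, cats, nums):
        rows = []
        for cat, num in zip(cats, nums):
            # Categories not seen during fit get an all-zero one-hot vector.
            one_hot = [1.0 if cat == known else 0.0 for known in self.categories]
            rows.append(one_hot + [(num - self.mean) / self.std])
        return rows


pipe = ToyPipeline().fit(["A", "B", "A"], [1.0, 2.0, 3.0])
# Test rows reuse the training categories, mean and standard deviation.
print(pipe.transform(["B", "C"], [2.0, 4.0]))
```

Refitting the scaler or the category list on the test set would silently change the meaning of each feature, which is exactly what reusing the stored pipeline avoids.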

In [None]:
prediction = predict_model(catboost, data=test)

In [None]:
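sample_submission = pd.read_csv('../input/tabular-playground-series-feb-2021/sample_submission.csv')
# PyCaret 2.x names the prediction column 'Label'; PyCaret 3.x names it 'prediction_label'
pred_col = 'prediction_label' if 'prediction_label' in prediction.columns else 'Label'
sample_submission['target'] = prediction[pred_col].values
sample_submission.to_csv('pycaret_submission.csv', index=False)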

Note: we could further improve the performance of our models with automated hyperparameter tuning, but we may do that in another notebook.
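
For reference, automated hyperparameter tuning boils down to a search loop: sample a configuration, fit, score on held-out data, keep the best. PyCaret exposes this as `tune_model`; the sketch below is an independent toy that random-searches the penalty of a one-feature ridge regression on a log scale. The model, the search range, the data and the function names are assumptions made for illustration.

```python
import random


def ridge_fit(xs, ys, lam):
    """Closed-form ridge for y = a + b * x; only the slope b is penalized."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / (sxx + lam)
    return my - b * mx, b


def mse(xs, ys, a, b):
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def random_search(train_split, valid_split, n_iter=20, seed=42):
    """Sample the penalty log-uniformly in [1e-3, 1e3] and keep the best validation MSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        lam = 10 ** rng.uniform(-3, 3)
        a, b = ridge_fit(*train_split, lam)
        score = mse(*valid_split, a, b)
        if best is None or score < best[1]:
            best = (lam, score)
    return best


toy_train = ([0, 1, 2, 3], [1, 3, 5, 7])
toy_valid = ([4, 5], [9, 11])
best_lam, best_score = random_search(toy_train, toy_valid)
print(f"best penalty {best_lam:.4g}, validation MSE {best_score:.4g}")
```

Random search is only one strategy; grid search or Bayesian optimization follow the same sample, fit, score loop with a smarter way of choosing the next configuration.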
    
## End of Notebook