# AutoML using H2O
## Tabular Playground Series - Jan 2021

### Description:
In this notebook we are going to use H2O's AutoML. It is one of the most widely used AutoML libraries and is known for giving very good results. For the sake of demonstration I am going to limit the search to 3 models, but you can always experiment with it and let it train for longer.

The following notebook was inspired by various tutorials and kernels that have used H2O's AutoML to secure good ranks. Personally, I found the results quite satisfactory considering the amount of work and time I had to spend to achieve that score.

## IMPORTING DEPENDENCIES

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
import h2o
from h2o.automl import H2OAutoML
h2o.init()

## IMPORTING DATASET

### H2O has its own data structure (the H2OFrame), so we load the files with h2o.import_file rather than reading them with pandas.

In [None]:
train = h2o.import_file('/kaggle/input/tabular-playground-series-jan-2021/train.csv')
test = h2o.import_file('/kaggle/input/tabular-playground-series-jan-2021/test.csv')

In [None]:
# Let us also read the CSV files with pandas in case we need them later.

train_df = pd.read_csv('/kaggle/input/tabular-playground-series-jan-2021/train.csv')
test_df = pd.read_csv('/kaggle/input/tabular-playground-series-jan-2021/test.csv')

In [None]:
train.describe()

In [None]:
# Prepare the data

y = 'target'
x = train.columns
x.remove(y)
x.remove('id')
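
The cell above removes the target and the id from the column list in place. A minimal sketch of the same selection written as a helper that leaves the input list untouched is shown below. The helper name `select_features` and the toy column names are hypothetical and only serve the illustration.

```python
def select_features(columns, target, exclude=()):
    """Return the predictor names: every column except the target and the excluded ones."""
    dropped = {target, *exclude}
    return [name for name in columns if name not in dropped]


# Hypothetical column names, for illustration only.
columns = ["id", "cont1", "cont2", "target"]
x = select_features(columns, target="target", exclude=["id"])
print(x)  # ['cont1', 'cont2']
```

With the frame loaded earlier, the call would presumably look like select_features(train.columns, 'target', ['id']).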

In [None]:
# max_models and seed can both be changed. The larger max_models is, the longer the search takes.
# The best part is that AutoML also tries out various ensemble models.

aml = H2OAutoML(max_models = 3, seed = 1)
aml.train(x = x, y = y, training_frame = train)
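
For a longer search, one option is to keep the budget settings in a single dict. The sketch below is only an illustration: the values 20 and 600 are assumptions, not tuned recommendations, and `build_automl` is a hypothetical helper. `max_runtime_secs` and `sort_metric` are H2OAutoML parameters. When both `max_models` and `max_runtime_secs` are set, the run stops at whichever limit is reached first.

```python
# Search budget and reproducibility settings, kept in one place.
automl_settings = {
    "max_models": 20,         # assumed value; the cell above uses 3
    "max_runtime_secs": 600,  # assumed time cap, in seconds
    "seed": 1,
    "sort_metric": "RMSE",    # metric used to rank the leaderboard
}


def build_automl(settings):
    """Create an H2OAutoML object from a settings dict (requires h2o to be installed)."""
    # Imported lazily so the settings can be inspected without h2o.
    from h2o.automl import H2OAutoML
    return H2OAutoML(**settings)


# Usage, assuming h2o.init() has run and x, y and train are defined as above:
# aml = build_automl(automl_settings)
# aml.train(x=x, y=y, training_frame=train)
```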

In [None]:
# H2O stores the models in a leaderboard table that lists each model name along with metrics such as rmse, mse, mae and more.
lb = aml.leaderboard
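
To make the leaderboard columns easier to read, here is a small sketch of how mse, rmse and mae are computed for a regression target. It uses the standard definitions on toy numbers and is not H2O's own implementation; the function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Compute mse, rmse and mae from two equally long sequences of numbers."""
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)   # mean squared error
    mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


# Toy values, for illustration only: the errors are -1, 0 and 2.
metrics = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(metrics)
```

Because rmse is the square root of mse, the two always rank models in the same order. mae can rank them differently because it penalises large errors less.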

In [None]:
# Let's have a look at some of the rows in the table.
lb.head()

In [None]:
# To view all the models and their scores, pass rows=lb.nrows to head.
lb.head(rows=lb.nrows)

In [None]:
# Choose the best model, which is the first row of the leaderboard.
model = aml.leader

In [None]:
# Use the leader model to predict on the test dataset. Note that we use the test file imported into H2O, not the pandas dataframe.
preds = model.predict(test)

In [None]:
# Convert the predictions with h2o.as_list (a pandas DataFrame by default), then create our final submission file.
final = h2o.as_list(preds)
final['predict']

In [None]:
sub = pd.DataFrame()
sub['id'] = test_df['id']
sub['target'] = final['predict']
sub.to_csv('submission.csv',index=False)
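
Before uploading, it can be worth checking that the submission lines up with the test set. The sketch below works on plain lists so that it stays independent of H2O and pandas. The helper name `check_submission` and the toy values are hypothetical.

```python
import math


def check_submission(ids, targets, expected_ids):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if len(ids) != len(targets):
        problems.append("ids and targets differ in length")
    if list(ids) != list(expected_ids):
        problems.append("ids do not match the test set order")
    if any(t is None or math.isnan(t) for t in targets):
        problems.append("targets contain missing values")
    return problems


# Toy values, for illustration only.
ok = check_submission([0, 2, 6], [7.9, 8.1, 7.7], expected_ids=[0, 2, 6])
bad = check_submission([0, 2, 6], [7.9, float("nan"), 7.7], expected_ids=[0, 2, 6])
print(ok)   # []
print(bad)  # ['targets contain missing values']
```

With the frames from this notebook, the inputs could be passed as sub['id'].tolist(), sub['target'].tolist() and test_df['id'].tolist().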

In [None]:
# If you are reading this, thanks for dropping by. Please upvote if you find it useful.