![](https://raw.githubusercontent.com/Carl-McBride-Ellis/images_for_kaggle/main/H2O_ai_logo.png)
# H2O.ai Gradient boosting classifier
In this notebook we shall be using the gradient boosting classifier ([`H2OGradientBoostingEstimator`](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/gbm.html)) from [H2O.ai](https://www.h2o.ai/).

To learn more about H2O.ai see:
* [H2O.ai Overview](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/index.html)
* [H2O.ai Tutorials](https://docs.h2o.ai/h2o-tutorials/latest-stable/index.html)

First, import `h2o` and start a local H2O server:

In [None]:
import h2o
h2o.init()

Read in the data as an [H2OFrame](https://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/frame.html), the primary data store for H2O.

Note that 

> "*One of the critical distinction is that the data is generally not held in memory, instead it is located on a (possibly remote) H2O cluster*" 

so the notebook's Python process holds very little of the data in its own memory (the data lives in the H2O cluster), and we can read in the whole `train` and `test` datasets with ease:

In [None]:
train_data = h2o.import_file('../input/tabular-playground-series-oct-2021/train.csv')
test_data  = h2o.import_file('../input/tabular-playground-series-oct-2021/test.csv')

### Take a quick look at the data

In [None]:
train_data

Convert the `target` column of the `train_data` frame to categorical, which tells the estimator that this is a classification problem rather than a regression problem. H2O automatically takes care of any other categorical features.

In [None]:
train_data["target"] = train_data["target"].asfactor()
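
To confirm that the conversion took effect, one could inspect the column with the H2OFrame methods `isfactor()` and `levels()`. The following is a minimal sketch: the helper name `target_summary` is hypothetical (it is not part of H2O), and it assumes it is given an H2OFrame such as `train_data`.

```python
def target_summary(frame, column="target"):
    """Report whether `column` of an H2OFrame is categorical, and its levels."""
    col = frame[column]
    return {
        "is_factor": col.isfactor()[0],  # isfactor() returns one bool per column
        "levels": col.levels()[0],       # levels() returns one list per column
    }
```

Calling `target_summary(train_data)` after the conversion would be expected to report `is_factor` as `True`, with one level per class.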

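We shall be using the [H2O Gradient Boosting Machine](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/gbm.html), here with 5-fold cross-validation, at most 100 trees of maximum depth 9, and early stopping based on AUC: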

In [None]:
from h2o.estimators.gbm import H2OGradientBoostingEstimator

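classifier = H2OGradientBoostingEstimator(
    nfolds=5,               # 5-fold cross-validation
    ntrees=100,             # upper limit on the number of trees
    seed=1,                 # fixed seed for reproducibility
    max_depth=9,            # maximum depth of each tree
    stopping_rounds=5,      # enable early stopping
    stopping_metric="AUC",  # metric monitored for early stopping
)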
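
The `stopping_rounds` and `stopping_metric` settings enable early stopping: training halts once the monitored metric stops improving. The sketch below illustrates the general idea with a moving-average rule in plain Python. It is a simplified illustration and not H2O's actual implementation; the function name `should_stop` and the `tolerance` value are assumptions made for this example.

```python
def should_stop(scores, rounds=5, tolerance=1e-3):
    """Return True once a higher-is-better metric (such as AUC) has plateaued.

    Compares moving averages of length `rounds` over the last 2 * rounds
    scores: the oldest average is the reference, and training stops if none
    of the later averages beats it by the relative `tolerance`.
    """
    if len(scores) < 2 * rounds:
        return False  # not enough scoring history yet
    recent = scores[-2 * rounds:]
    averages = [sum(recent[i:i + rounds]) / rounds for i in range(rounds + 1)]
    reference = averages[0]
    best_later = max(averages[1:])
    return best_later < reference * (1 + tolerance)
```

With a steadily rising AUC history the function keeps returning `False`; once the history flattens out it returns `True`.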

### Train our classifier

In [None]:
classifier.train(training_frame=train_data, y="target")

### Produce our predictions

In [None]:
predictions    = classifier.predict(test_data)
predictions_df = predictions["p1"].as_data_frame()
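
The `p1` column holds the predicted probability of class 1. AUC, the `stopping_metric` chosen for the classifier, depends only on how well these probabilities rank positive examples above negative ones. As a reminder of what the metric measures, here is a minimal pure-Python sketch using the pairwise (Mann-Whitney) definition. The function name `roc_auc` is hypothetical, this is not how H2O computes AUC internally, and the quadratic loop is only suitable for small inputs.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```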

### Insert the predictions into the `sample_submission.csv` provided

In [None]:
import pandas as pd
sample = pd.read_csv('../input/tabular-playground-series-oct-2021/sample_submission.csv')
sample["target"] = predictions_df["p1"]
sample.to_csv('submission.csv', index=False)
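
Because the predictions are assigned to the sample file by row position, it can be worth checking the result before submitting. The sketch below uses only the standard library; the helper name `check_submission` is hypothetical, and it assumes that the `target` column should contain probabilities between 0 and 1.

```python
import csv
import io


def check_submission(csv_text, expected_rows, column="target"):
    """Return a list of problems found in the submission CSV (empty if none)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    for i, row in enumerate(rows):
        try:
            p = float(row.get(column))
        except (TypeError, ValueError):
            problems.append(f"row {i}: {column!r} is missing or not a number")
            continue
        if not 0.0 <= p <= 1.0:  # NaN also fails this comparison
            problems.append(f"row {i}: {column!r} = {p} is outside [0, 1]")
    return problems
```

In this notebook one might call it as `check_submission(open('submission.csv').read(), expected_rows=len(sample))` and expect an empty list.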