Thanks to [Alexander Lyubchenko](https://www.kaggle.com/carpediemamigo) for the dataset (generated with the TSFresh library):
- https://www.kaggle.com/carpediemamigo/ingv-catboost-baseline-tsfresh/data
- https://www.kaggle.com/carpediemamigo/ingv-tsfresh-7730

Since this competition permits the use of automated machine learning tools (“AMLT”), this notebook uses **H2O AutoML** without hyperparameter tuning and keeps the same features as my previous notebook (features after resampling):
- https://www.kaggle.com/patrick0302/ingv-volcanic-eruption-prediction-add-resampling

The notebooks below are also very useful references:
- https://www.kaggle.com/ajcostarino/ingv-volcanic-eruption-prediction-lgbm-baseline
- https://www.kaggle.com/tunguz/lanl-earthquake-with-h2o-automl

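Here are the results so far for different runtimes:

**Note: Even with the same settings (`max_models`, `seed`, and `max_runtime_secs`), the predictions differ somewhat from run to run.** A likely cause is that `max_runtime_secs` is a time budget, so the number of models trained within it can change between runs.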


| Trial | Runtime (mins) | Public score | AutoML validation score | Note |
|------:|---------------:|-------------:|------------------------:|:-----|
|     0 |              1 | ~7.65320e+06 |            ~5.18657e+06 |      |
|     1 |             10 | ~6.16750e+06 |            ~3.74239e+06 |      |
|     2 |             30 | ~6.01172e+06 |            ~3.36789e+06 |      |
|     3 |            120 | ~5.81476e+06 |            ~3.27152e+06 | **Best result !!!** |
|     4 |            360 | ~5.95679e+06 |            ~3.11598e+06 | **Overfitting :(** |
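
The overfitting pattern in the table can be checked with a few lines of arithmetic. This is only a sketch: the numbers are the approximate scores copied from the table, and the variable names are made up for illustration.

```python
# Scores copied from the table: (runtime in minutes, public score, AutoML validation score).
trials = [
    (1, 7.65320e6, 5.18657e6),
    (10, 6.16750e6, 3.74239e6),
    (30, 6.01172e6, 3.36789e6),
    (120, 5.81476e6, 3.27152e6),
    (360, 5.95679e6, 3.11598e6),
]

# Gap between the public score and the validation score for each runtime.
gaps = {runtime: public - validation for runtime, public, validation in trials}

# Runtime with the lowest public score and runtime with the lowest validation score.
best_public = min(trials, key=lambda t: t[1])[0]
best_validation = min(trials, key=lambda t: t[2])[0]

for runtime, gap in gaps.items():
    print(f"{runtime:>4} min  gap = {gap:,.0f}")
print("lowest public score at", best_public, "min")
print("lowest validation score at", best_validation, "min")
```

The validation score keeps improving up to 360 minutes while the public score is lowest at 120 minutes, and the gap between the two is wider for the 360-minute run than for the 120-minute run.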





In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import lightgbm as lgb

# Input data files are available in the read-only "../input/" directory

import os

In [None]:
import h2o
print(h2o.__version__)
from h2o.automl import H2OAutoML

h2o.init(max_mem_size='16G')

In [None]:
train = pd.read_csv('/kaggle/input/predict-volcanic-eruptions-ingv-oe/train.csv')
test = pd.read_csv('/kaggle/input/predict-volcanic-eruptions-ingv-oe/sample_submission.csv')

In [None]:
scaled_feature_df = pd.read_csv('../input/ingv-tsfresh-7730/train.csv', sep = ';', index_col=0)
scaled_feature_df = scaled_feature_df.loc[train['segment_id']]
scaled_test_df = pd.read_csv('../input/ingv-tsfresh-7730/test.csv', sep = ';', index_col=0)
scaled_test_df = scaled_test_df.loc[test['segment_id']]
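
The `.loc[...]` calls above reorder the feature rows to follow the order of `segment_id` in `train` and `test`. That seems to matter later, because the labels are attached to the features by row position rather than by ID. A minimal sketch of the reordering with made-up data (the frames `features` and `labels` are hypothetical):

```python
import pandas as pd

# Hypothetical feature table indexed by segment ID, stored in a different order than the labels.
features = pd.DataFrame({"feat": [0.1, 0.2, 0.3]}, index=[30, 10, 20])
labels = pd.DataFrame({"segment_id": [10, 20, 30], "time_to_eruption": [5, 6, 7]})

# Label-based lookup returns the rows in the order of the requested IDs.
aligned = features.loc[labels["segment_id"]]
print(aligned)
```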

# Use LightGBM to select important features

In [None]:
from sklearn.feature_selection import SelectFromModel

In [None]:
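# Fit LightGBM and let SelectFromModel keep the features that reach its default importance threshold
sfm = SelectFromModel(estimator=lgb.LGBMRegressor())
X = scaled_feature_df.drop('time_to_eruption', axis=1).copy()
# Use integer column names: LightGBM rejects feature names that contain special JSON characters
X.columns = list(np.arange(len(X.columns)))
y = scaled_feature_df['time_to_eruption']
sfm.fit(X, y)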

In [None]:
selected_features = list(scaled_feature_df.drop('time_to_eruption',axis=1).columns[sfm.get_support()])
selected_features

In [None]:
print('Number of selected features: ' + str(len(selected_features)))

In [None]:
scaled_feature_df = scaled_feature_df[selected_features]
scaled_test_df = scaled_test_df[selected_features]
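
For reference, `SelectFromModel` is used here without an explicit `threshold`. For an estimator without an L1 penalty, scikit-learn then falls back to the mean feature importance as the cut-off. The sketch below imitates that rule with NumPy only; the importances and feature names are invented and do not come from the fitted model.

```python
import numpy as np

# Hypothetical importances for five features (not taken from the notebook's model).
importances = np.array([0.0, 10.0, 2.0, 30.0, 8.0])
feature_names = np.array(["f0", "f1", "f2", "f3", "f4"])

# A feature is kept when its importance is at least the mean importance.
threshold = importances.mean()
support = importances >= threshold

selected = feature_names[support].tolist()
print(threshold, selected)
```

If too many or too few features survive, passing a different `threshold` (for example `"median"`) to `SelectFromModel` is one option to experiment with.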

# Create model (H2O AutoML)

In [None]:
train_h2o = h2o.H2OFrame(scaled_feature_df)
train_label_h2o = h2o.H2OFrame(train[['time_to_eruption']])
train_h2o['time_to_eruption'] = train_label_h2o['time_to_eruption']

test_feature_h2o = h2o.H2OFrame(scaled_test_df)

In [None]:
print(train_h2o.shape)
print(test_feature_h2o.shape)

In [None]:
x = test_feature_h2o.columns
y = 'time_to_eruption'

In [None]:
aml = H2OAutoML(max_models=1000, seed=121, stopping_metric='MAE',
                max_runtime_secs=360*60) # set 360 minutes
aml.train(x=x, y=y, training_frame=train_h2o)
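
Regarding the run-to-run differences noted at the top: the H2O AutoML documentation says reproducibility is only expected when the search is bounded by `max_models` instead of `max_runtime_secs` and when Deep Learning is excluded. The sketch below builds the two sets of keyword arguments. `automl_settings` is a hypothetical helper, its default values are placeholders, and `sort_metric="MAE"` is an assumption that is not used in the run above.

```python
def automl_settings(reproducible, max_models=20, runtime_minutes=120, seed=121):
    """Build keyword arguments for H2OAutoML (hypothetical helper, not part of h2o)."""
    settings = {"seed": seed, "stopping_metric": "MAE", "sort_metric": "MAE"}
    if reproducible:
        # Bound the search by model count only and leave out Deep Learning,
        # which is not reproducible by default.
        settings["max_models"] = max_models
        settings["exclude_algos"] = ["DeepLearning"]
    else:
        # A time budget: the number of models trained depends on the available compute.
        settings["max_runtime_secs"] = runtime_minutes * 60
    return settings


settings = automl_settings(reproducible=True)
print(settings)

# Assumed usage inside this notebook:
# aml = H2OAutoML(**settings)
# aml.train(x=x, y=y, training_frame=train_h2o)
```

This trades the fixed time budget for a fixed model count, so the total runtime is no longer capped.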

In [None]:
# View the AutoML Leaderboard
lb = aml.leaderboard
lb.head(rows=lb.nrows)  # Print all rows instead of default (10 rows)

In [None]:
# The leader model is stored here
aml.leader

In [None]:
# If you need to generate predictions on a test set, you can make
# predictions directly on the `"H2OAutoML"` object, or on the leader
# model object directly

preds = aml.predict(test_feature_h2o)

In [None]:
submission = pd.DataFrame()
submission['segment_id'] = test['segment_id']
submission['time_to_eruption'] = preds.as_data_frame().values.flatten()
submission.loc[submission['time_to_eruption']<0, 'time_to_eruption'] = 0 # replace negative predictions with 0, since time to eruption cannot be negative
submission.to_csv('submission_recent.csv', header=True, index=False)
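
The negative-value fix above can also be written with `Series.clip`. A small sketch on made-up predictions (the values are hypothetical):

```python
import pandas as pd

# Hypothetical predictions, including one negative value.
submission = pd.DataFrame({
    "segment_id": [101, 102, 103],
    "time_to_eruption": [-5.0, 0.0, 1.2e7],
})

# Replace negative predictions with 0 and leave all other values unchanged.
submission["time_to_eruption"] = submission["time_to_eruption"].clip(lower=0)
print(submission)
```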

In [None]:
submission