# The Volcano and the Regularized Greedy Forest
This is a demonstration script using the ***Regularized Greedy Forest*** (RGF) regressor (see my notebook ["Introduction to the Regularized Greedy Forest"](https://www.kaggle.com/carlmcbrideellis/introduction-to-the-regularized-greedy-forest)) for the [INGV - Volcanic Eruption Prediction](https://www.kaggle.com/c/predict-volcanic-eruptions-ingv-oe) competition. The RGF can perform as well as XGBoost, and is a very useful estimator to include when creating a [stacking ensemble](https://www.kaggle.com/carlmcbrideellis/stacking-ensemble-using-the-house-prices-data), which combines the predictions of multiple estimators to produce one strong result.

For the input I use the `train.csv` and `test.csv` files produced by the excellent notebook ["INGV Volcanic Eruption Prediction - LGBM Baseline"](https://www.kaggle.com/ajcostarino/ingv-volcanic-eruption-prediction-lgbm-baseline) written by [Adam James](https://www.kaggle.com/ajcostarino). (For completeness I include copies of these files in the **Output** section of this notebook, as they take nearly three hours to produce.)

I have not undertaken any feature selection (for example using the [Boruta-SHAP](https://www.kaggle.com/carlmcbrideellis/feature-selection-using-the-borutashap-package) package), nor have I performed any cross-validation, hyperparameter tuning, *etc.*, so there is *plenty* of room for improvement.

I hope you find the RGF technique useful, and good luck!

In [None]:
import pandas as pd
import numpy as np

In [None]:
train  = pd.read_csv('../input/ingv-lgbm-baseline-the-train-test-csv-files/volcano_train.csv')
test   = pd.read_csv('../input/ingv-lgbm-baseline-the-train-test-csv-files/volcano_test.csv')
sample = pd.read_csv('../input/predict-volcanic-eruptions-ingv-oe/sample_submission.csv')

In [None]:
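# Features: every column except the segment identifier and the target
X_train = train.drop(["segment_id", "time_to_eruption"], axis=1)
# Target: the time remaining until the eruption
y_train = train["time_to_eruption"]
# The test set has no target column, so only the identifier is dropped
X_test = test.drop("segment_id", axis=1)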
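
Because the model is fitted on `X_train` and then applied to `X_test`, both frames need the same feature columns in the same order. The following is a minimal sketch of a sanity check, not part of the original notebook: the helper name `compare_feature_frames` is hypothetical, and small toy frames stand in for the real data.

```python
import pandas as pd


def compare_feature_frames(X_train: pd.DataFrame, X_test: pd.DataFrame) -> dict:
    """Hypothetical helper: report column mismatches and missing values."""
    train_cols = list(X_train.columns)
    test_cols = list(X_test.columns)
    return {
        # Columns the model would be trained on but not find at predict time
        "missing_in_test": [c for c in train_cols if c not in test_cols],
        # Columns present only in the test frame
        "extra_in_test": [c for c in test_cols if c not in train_cols],
        # True only if names and order match exactly
        "same_order": train_cols == test_cols,
        # Total number of NaN cells in each frame
        "train_nan_count": int(X_train.isna().sum().sum()),
        "test_nan_count": int(X_test.isna().sum().sum()),
    }


# Toy data (an assumption for illustration, not the competition features)
demo_train = pd.DataFrame({"f1": [1.0, 2.0], "f2": [3.0, float("nan")]})
demo_test = pd.DataFrame({"f1": [5.0], "f3": [6.0]})

report = compare_feature_frames(demo_train, demo_test)
print(report)
```

In the notebook itself the call would be `compare_feature_frames(X_train, X_test)`; an empty `missing_in_test` list and `same_order == True` are what one would hope to see before fitting.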

In [None]:
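from rgf.sklearn import RGFRegressor

regressor = RGFRegressor(max_leaf=2000,        # stop once the forest has this many leaf nodes
                         algorithm="RGF_Sib",  # min-penalty regularization with sum-to-zero sibling constraints
                         test_interval=100,    # interval, in number of leaf nodes
                         loss="LS",            # square loss
                         verbose=False)

# Fit on the full training set (no validation split), then predict the test set
regressor.fit(X_train, y_train)
predictions = regressor.predict(X_test)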
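
The model above is fitted once on all of the training data, so there is no estimate of how well it generalizes. A rough sketch of K-fold cross-validation follows. It is not part of the original notebook: `kfold_mae` and `MeanBaseline` are hypothetical names, mean absolute error is used as the score by assumption, and a trivial mean predictor on toy data stands in for the RGF so that the block runs on its own.

```python
import numpy as np


class MeanBaseline:
    """Hypothetical stand-in estimator: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def kfold_mae(make_model, X, y, n_splits=5, seed=0):
    """Return one mean-absolute-error score per fold.

    make_model is a zero-argument callable returning a fresh, unfitted
    estimator with fit/predict methods, so no state leaks between folds.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    # Shuffle the row indices once, then cut them into n_splits folds
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    scores = []
    for k in range(n_splits):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[val_idx])
        scores.append(float(np.mean(np.abs(y[val_idx] - pred))))
    return scores


# Toy data (an assumption for illustration, not the competition features)
demo_X = np.arange(40, dtype=float).reshape(20, 2)
demo_y = np.arange(20, dtype=float)
demo_scores = kfold_mae(MeanBaseline, demo_X, demo_y, n_splits=5)
print(demo_scores)
```

With the notebook's data, something like `kfold_mae(lambda: RGFRegressor(max_leaf=2000, algorithm="RGF_Sib", test_interval=100, loss="LS", verbose=False), X_train, y_train)` would refit the RGF once per fold. scikit-learn's `KFold` and `cross_val_score` offer the same functionality and are the more usual choice; the hand-written loop is used here only to keep the sketch dependency-light.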

In [None]:
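# Overwrite the prediction column of the sample submission; this assumes the
# rows of X_test are in the same order as the rows of the sample file
sample.iloc[:,1:] = predictions
sample.to_csv('submission.csv',index=False)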

#### Appendix
Write out a copy of the `train.csv` and `test.csv` files used in this work.

In [None]:
# index=False stops pandas from writing the row index as an extra column
train.to_csv('volcano_train.csv', index=False)
test.to_csv('volcano_test.csv', index=False)
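
One thing to keep in mind when writing these copies: `DataFrame.to_csv` includes the row index by default, and on reading the file back that index shows up as an extra column named `Unnamed: 0`, which would then be treated as a feature unless it is dropped. A small illustration on a toy frame (the column names here are made up for the example), using in-memory buffers rather than files:

```python
import io

import pandas as pd

# Toy frame (an assumption for illustration, not the real data)
demo = pd.DataFrame({"segment_id": [101, 102], "feature_a": [0.5, 1.5]})

# Default behaviour: the row index is written as the first, unnamed column
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# With index=False only the real columns are written
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)
print(cols_without_index)
```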