## Imports

In [1]:
%load_ext lab_black
%load_ext autoreload
%autoreload 2

In [4]:
from supervised.automl import AutoML
from sklearn.model_selection import train_test_split
from pathlib import Path
import joblib
import pandas as pd
import numpy as np

# Fix NumPy's global random seed for reproducibility
np.random.seed(10)
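`np.random.seed(10)` seeds only NumPy's legacy global generator. mljar-supervised's `AutoML` takes its own `random_state` argument, so this line alone does not necessarily make the AutoML runs reproducible. The following minimal sketch shows what the global seed does guarantee. It also shows the `np.random.default_rng` alternative, which keeps the random state local instead of global:

```python
import numpy as np

# Re-seeding the legacy global generator reproduces the same draws
np.random.seed(10)
first = np.random.rand(3)
np.random.seed(10)
second = np.random.rand(3)
print(np.array_equal(first, second))

# A local Generator avoids shared global state entirely
rng = np.random.default_rng(10)
print(rng.random(3))
```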

In [5]:
import sys

sys.path.append("../")  # append parent dir to sys.path

In [6]:
import wind_constants as cst
import main
import feature_engineering as fe

In [7]:
import matplotlib.pyplot as plt
import seaborn as sns

In [8]:
plt.rcParams.update(cst.params)
sns.set_style("white")

In [9]:
pd.set_option("display.max_colwidth", 1500)
pd.set_option("display.width", None)

## Feature Engineering 

In [10]:
data = joblib.load("../data/processed/processed_uncleaned.joblib")

In [13]:
X_train, X_test, y_train, y_test = train_test_split(
    data[cst.FEATURES], data[cst.TARGET], test_size=0.5, shuffle=False
)
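With `shuffle=False`, `train_test_split` preserves row order. The first half of `data` becomes the training set and the second half becomes the test set. If the rows are time-ordered (the notebook does not say so explicitly), this split avoids training on observations that come after the test period. The sketch below shows this on a hypothetical 6-row frame. The column names `wind_speed` and `power` are illustrative only, not the ones in `cst.FEATURES` or `cst.TARGET`:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame standing in for `data`; rows assumed time-ordered
toy = pd.DataFrame(
    {
        "wind_speed": [3.1, 4.0, 5.2, 6.3, 7.1, 8.4],
        "power": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    }
)

X_tr, X_te, y_tr, y_te = train_test_split(
    toy[["wind_speed"]], toy["power"], test_size=0.5, shuffle=False
)

# Without shuffling, the split is a plain cut at the midpoint
print(X_tr.index.tolist())
print(X_te.index.tolist())
```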

## MLJAR Explain mode

In [17]:
automl = AutoML(mode="Explain", explain_level=2)

In [18]:
automl.fit(X_train, y_train)

AutoML directory: AutoML_2
The task is regression with evaluation metric rmse
AutoML will use algorithms: ['Baseline', 'Linear', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will ensemble availabe models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 3 models
1_Baseline rmse 0.17012 trained in 0.11 seconds
2_DecisionTree rmse 0.093554 trained in 6.14 seconds
3_Linear rmse 0.067575 trained in 3.05 seconds
* Step default_algorithms will try to check up to 3 models
4_Default_RandomForest rmse 0.069982 trained in 8.37 seconds
5_Default_Xgboost rmse 0.052356 trained in 10.73 seconds
6_Default_NeuralNetwork rmse 0.056 trained in 13.15 seconds
* Step ensemble will try to check up to 1 model
Ensemble rmse 0.051063 trained in 0.17 seconds
AutoML fit time: 50.47 seconds


AutoML(explain_level=2)
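The RMSE values in this log are mljar's internal validation scores, computed on data carved out of `X_train`, not on `X_test`. To compare the models more easily, the log lines above can be parsed into a small sorted leaderboard. The sketch below does this with a regular expression. The line format it matches is taken from the output above and is an assumption about mljar's log format in general:

```python
import re

import pandas as pd

# Lines copied from the Explain-mode log above
log = """\
1_Baseline rmse 0.17012 trained in 0.11 seconds
2_DecisionTree rmse 0.093554 trained in 6.14 seconds
3_Linear rmse 0.067575 trained in 3.05 seconds
4_Default_RandomForest rmse 0.069982 trained in 8.37 seconds
5_Default_Xgboost rmse 0.052356 trained in 10.73 seconds
6_Default_NeuralNetwork rmse 0.056 trained in 13.15 seconds
Ensemble rmse 0.051063 trained in 0.17 seconds"""

# Match "<model> rmse <score> trained in <seconds> seconds"
pattern = re.compile(r"^(\S+) rmse ([\d.]+) trained in ([\d.]+) seconds$")

rows = [
    {"model": m[1], "rmse": float(m[2]), "seconds": float(m[3])}
    for line in log.splitlines()
    if (m := pattern.match(line))
]
leaderboard = pd.DataFrame(rows).sort_values("rmse").reset_index(drop=True)
print(leaderboard)

# Relative RMSE reduction of the best model over the baseline
baseline = leaderboard.loc[leaderboard["model"] == "1_Baseline", "rmse"].item()
best = leaderboard.loc[0]
print(f"{best['model']}: {1 - best['rmse'] / baseline:.1%} lower RMSE than baseline")
```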

In [16]:
predictions = automl.predict(X_test)
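`predictions` is not evaluated in the notebook. The logged RMSE values come from validation inside `X_train`, so the natural next step is to score on the held-out `X_test`. The minimal sketch below uses hypothetical toy arrays in place of `y_test` and `predictions`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical stand-ins for y_test and predictions
y_true = np.array([0.20, 0.35, 0.50, 0.65])
y_pred = np.array([0.25, 0.30, 0.55, 0.60])

# RMSE = sqrt(mean((y_true - y_pred)^2)); take the square root explicitly
# rather than relying on version-specific keyword arguments
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"hold-out RMSE: {rmse:.4f}")
```

In the notebook, the equivalent call would be `np.sqrt(mean_squared_error(y_test, predictions))`. Because the split used `shuffle=False`, this score reflects performance on the later half of the data.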

## MLJAR Perform mode

In [21]:
automl = AutoML(mode="Perform", explain_level=2)

In [22]:
automl.fit(X_train, y_train)

AutoML directory: AutoML_2
The task is regression with evaluation metric rmse
AutoML will use algorithms: ['Linear', 'Random Forest', 'LightGBM', 'Xgboost', 'CatBoost', 'Neural Network']
AutoML will ensemble availabe models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'not_so_random', 'golden_features', 'insert_random_feature', 'features_selection', 'hill_climbing_1', 'hill_climbing_2', 'ensemble']
* Step simple_algorithms will try to check up to 1 model
1_Linear rmse 0.068233 trained in 22.18 seconds
* Step default_algorithms will try to check up to 5 models
2_Default_RandomForest rmse 0.0697 trained in 52.73 seconds
3_Default_Xgboost rmse 0.051096 trained in 56.98 seconds
4_Default_LightGBM rmse 0.050538 trained in 70.45 seconds
5_Default_CatBoost rmse 0.048951 trained in 25.56 seconds
6_Default_NeuralNetwork rmse 0.056412 trained in 111.81 seconds
* Step not_so_random will try to check up to 20 models
7_Xgboost rmse 0.053575 trained in 419.58 seconds
8_Xgboost rmse 0.05

AutoML(explain_level=2, mode='Perform')
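The Perform log above is cut off mid-run: its last line, `8_Xgboost rmse 0.05`, is incomplete. Only the completed models can therefore be compared. Among them, `5_Default_CatBoost` (0.048951) has a lower RMSE than the Explain-mode ensemble (0.051063).

The gap is small, and it should be read with caution. By default, mljar-supervised validates differently in the two modes: a single train/validation split in Explain mode versus k-fold cross-validation in Perform mode. The two scores are therefore not strictly comparable. The arithmetic below uses only values copied from the two logs:

```python
# Best completed RMSE per run, copied from the two fit logs above
explain_best = ("Ensemble", 0.051063)
perform_best = ("5_Default_CatBoost", 0.048951)

# Relative RMSE reduction of the Perform result vs. the Explain result
rel_gain = (explain_best[1] - perform_best[1]) / explain_best[1]
print(f"{perform_best[0]} vs {explain_best[0]}: {rel_gain:.1%} lower RMSE")
```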