# **AIDI-1010**

# **Contents:**
**Module: Auto-Sklearn**

1.   Installing The Module (auto-sklearn)
2.   Example-A (Classification)
3.   Example-B (Regression)





# **1 - Installing auto-sklearn (Google Colab)**

In [None]:
#1.1 - Install Linux Dependencies & Module (approx 1min)
!sudo apt-get install build-essential swig
!pip install auto-sklearn
!pip install dask distributed
!pip install pipelineprofiler
!pip install scipy==1.7

Note: Restart the runtime, then re-run the same install commands (repeated in cell 1.2 below).

In [None]:
#1.2 - Re-Verify Installation of Linux Dependencies & Module (approx < 30secs)
!sudo apt-get install build-essential swig
!pip install auto-sklearn
!pip install dask distributed
!pip install pipelineprofiler
!pip install scipy==1.7
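
After the runtime restart, it can help to confirm which versions were actually installed. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical, and the package names are taken from the pip commands above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as used in the pip install commands above.
for name in ["auto-sklearn", "dask", "distributed", "pipelineprofiler", "scipy"]:
    version = installed_version(name)
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

If any line reports NOT INSTALLED, re-run the install cell before continuing.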

# **2 - Example-A (Classification)**

In [None]:
#2.1 Loading modules
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
from autosklearn.classification import AutoSklearnClassifier
import PipelineProfiler

In [None]:
#2.2 - Load the data in a dataframe using Pandas

# data-source: https://archive.ics.uci.edu/ml/datasets/wine+quality; P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
# Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

winedf = pd.read_csv('https://raw.githubusercontent.com/sharmaroshan/Wine-Quality-Predictions/master/winequality-red.csv')
winedf

In [None]:
#2.3 - Training/Testing sets; Features/Target; Split dataset

In [None]:
#2.3.a - Create dataset with dataframe values
dataset_wine = winedf.values
print("This is the shape of dataset: \n",dataset_wine.shape)

In [None]:
#2.3.b - Create features/target
ft, target = dataset_wine[:,:-1],dataset_wine[:,-1]
print("These are the features: \n",ft,ft.shape)
print("This is the target: \n",target,target.shape)

In [None]:
#2.3.c - Split dataset into Train/Test sets (80% train, 20% test; fixed random_state for a reproducible split)
X_train, X_test, y_train, y_test = train_test_split(ft, target, test_size=0.2, random_state=1)
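
To illustrate what `test_size` and `random_state` control, here is a minimal pure-Python sketch of a hold-out split. It is not scikit-learn's implementation; the function name `holdout_split` and the toy rows are assumptions made for illustration only.

```python
import random


def holdout_split(rows, test_size=0.2, seed=1):
    """Shuffle row indices with a fixed seed, then hold out a fraction for testing."""
    indices = list(range(len(rows)))
    # A seeded generator gives the same shuffle on every run.
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_size))
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# Toy data: 10 rows, so test_size=0.2 holds out 2 rows.
rows = list(range(10))
train_rows, test_rows = holdout_split(rows, test_size=0.2, seed=1)
print("train size:", len(train_rows), "test size:", len(test_rows))
```

Using the same seed twice returns the same split, which is the role `random_state=1` plays in the cell above.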

In [None]:
#2.4 - Build the classification model

In [None]:
#2.4.a - Instantiate the object
autosk1 = AutoSklearnClassifier(time_left_for_this_task=30,per_run_time_limit=5,n_jobs=1)

#time_left_for_this_task = total time budget for the whole search, in seconds (defaults to 3600, i.e. one hour, if not set)
#per_run_time_limit = time limit for a single model evaluation, in seconds
#n_jobs = number of jobs to run in parallel (more cores can evaluate more models within the same budget)
#ensemble_size = number of models added to the final ensemble (optional; not set in this example)
#initial_configurations_via_meta_learning = number of configurations used to warm-start the search via meta-learning (optional; not set in this example)
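
The two time settings interact, and a quick calculation shows how. The sketch below is plain arithmetic, not an auto-sklearn call: the helper `max_full_length_runs` is hypothetical, and it only gives an upper bound on how many evaluations could each use their whole per-run limit. Any overhead in the search would lower that number, while evaluations that finish early leave room for more.

```python
def max_full_length_runs(time_left_for_this_task, per_run_time_limit, n_jobs=1):
    """Upper bound on evaluations that each consume the full per-run limit."""
    return (time_left_for_this_task // per_run_time_limit) * n_jobs


# Settings used for the classifier in this section: 30 s total, 5 s per run.
print(max_full_length_runs(30, 5, n_jobs=1))

# Settings used for the regressor later in the notebook: 90 s total, 5 s per run.
print(max_full_length_runs(90, 5, n_jobs=1))
```

A small total budget like this is convenient for a demo; a longer budget gives the search more candidates to evaluate.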

In [None]:
#2.4.b - Fit the model with the data (approx 2mins)
autosk1.fit(X_train,y_train)

In [None]:
#2.4.c - Print statistics
print(autosk1.sprint_statistics())

In [None]:
#2.4.d - Print Models
show_models_str = autosk1.show_models()
print(show_models_str)

In [None]:
#2.5 - Use PipelineProfiler to graph the results

In [None]:
#2.5.a - Import the fitted auto-sklearn results into PipelineProfiler
profiler_data = PipelineProfiler.import_autosklearn(autosk1)

In [None]:
#2.5.b - Plot the graph
PipelineProfiler.plot_pipeline_matrix(profiler_data)

In [None]:
#2.6 - Check accuracy of the model with new prediction using Test data
pred1 = autosk1.predict(X_test)
print("Accuracy score",accuracy_score(y_test,pred1))
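
Accuracy is the fraction of test rows whose predicted class equals the true class. The sketch below computes it by hand on made-up labels; the function name `accuracy` and the toy values are assumptions for illustration and are not results from the wine dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy quality labels: 3 of the 5 predictions match.
toy_true = [5, 6, 5, 7, 6]
toy_pred = [5, 6, 6, 7, 5]
print("Accuracy score", accuracy(toy_true, toy_pred))
```

Note that accuracy treats every wrong class the same, so predicting 6 for a true 5 counts as much as predicting 8.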

# **3 - Example-B (Regression)**

In [None]:
#3.1 - Load the modules
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
import pandas as pd
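from autosklearn.regression import AutoSklearnRegressor
# Note: the next import rebinds the name mean_absolute_error to auto-sklearn's scorer object,
# shadowing the sklearn function imported above.
from autosklearn.metrics import mean_absolute_error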
import PipelineProfiler

In [None]:
#3.2 - Load data; Features/Targets; Set dataset; Split dataset

In [None]:
#3.2.a - Load data
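# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older scikit-learn release.
from sklearn.datasets import load_boston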
boston_data = load_boston()

In [None]:
#3.2.b - Features/Targets
ftboston = pd.DataFrame(boston_data.data,columns=boston_data.feature_names)
trgtboston = pd.DataFrame(boston_data.target,columns=['TARGET'])
print("These are the features: \n",ftboston,ftboston.shape)
print("This is the target: \n",trgtboston,trgtboston.shape)

In [None]:
#3.2.c - Set Dataset
bostondf = pd.concat([ftboston,trgtboston],axis=1)
print("This is the shape of the dataset: \n",bostondf.shape)

In [None]:
#3.2.d - Split dataset (train/test)
xtrain,xtest,ytrain,ytest = train_test_split(ftboston,trgtboston,test_size=0.2)

In [None]:
#3.3 - Build regression model

In [None]:
#3.3.a - Instantiate the object, create the model
autosk2 = AutoSklearnRegressor(time_left_for_this_task=90,per_run_time_limit=5,n_jobs=1,metric=mean_absolute_error)
#Note: By default, the regressor's metric is R^2; here it is replaced with MAE (mean absolute error)
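
To make the difference between the two metrics concrete, here is a minimal pure-Python sketch that computes MAE and R^2 by hand. The function names and the toy values are assumptions for illustration; they are not auto-sklearn's scorers and not results from the Boston data.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average absolute difference, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_manual(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy target values and predictions.
toy_true = [20.0, 25.0, 30.0, 35.0]
toy_pred = [22.0, 24.0, 29.0, 38.0]

print("MAE:", mean_absolute_error_manual(toy_true, toy_pred))
print("R^2:", r2_manual(toy_true, toy_pred))
```

MAE is read as "lower is better" and stays in the target's units, while R^2 is unitless and "higher is better", with 1.0 meaning a perfect fit.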

In [None]:
#3.3.b - Fit the data
autosk2.fit(xtrain,ytrain)

In [None]:
#3.3.c - Print statistics
print(autosk2.sprint_statistics())

For a video guide to MAE, see: https://youtu.be/K490SP-_H0U

In [None]:
#3.3.d - Print Models
show_model_sk2 = autosk2.show_models()
print(show_model_sk2)

In [None]:
#3.4 - Use PipelineProfiler to graph the results

In [None]:
#3.4.a - Import the fitted auto-sklearn results into PipelineProfiler
profiler_data = PipelineProfiler.import_autosklearn(autosk2)

In [None]:
#3.4.b - Plot the graph
PipelineProfiler.plot_pipeline_matrix(profiler_data)

In [None]:
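#3.5 - New predictions based on test data
# Import sklearn's function under an alias, because the name mean_absolute_error
# was rebound to auto-sklearn's scorer object in 3.1.
from sklearn.metrics import mean_absolute_error as sk_mean_absolute_error
pred2 = autosk2.predict(xtest)
mae = sk_mean_absolute_error(ytest,pred2)
print("MAE: ",mae)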

Additional examples: https://automl.github.io/auto-sklearn/master/examples/index.html