## auto-sklearn
- requires Linux (I couldn't test it since I'm on Windows, but the API looks simple enough)
- ideal if we want quick results on which models work best, and then do our own feature selection and hyperparameter tuning
- https://automl.github.io/auto-sklearn/master/
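
A minimal sketch of what the "quick results" workflow might look like, untested on my side for the reason above. `AutoSklearnClassifier`, `time_left_for_this_task`, `per_run_time_limit` and `leaderboard()` come from the auto-sklearn docs; the helper name `rank_models`, the 60-second budget and the iris data are placeholders I picked for illustration. The import is guarded so the cell still runs (and just skips the search) where auto-sklearn isn't installed.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

try:
    from autosklearn.classification import AutoSklearnClassifier
except ImportError:  # e.g. on Windows, where auto-sklearn can't be installed
    AutoSklearnClassifier = None


def rank_models(X_train, y_train, time_budget_s=60):
    """Hypothetical helper: return auto-sklearn's leaderboard, or None if unavailable."""
    if AutoSklearnClassifier is None:
        return None
    automl = AutoSklearnClassifier(
        time_left_for_this_task=time_budget_s,   # total search budget in seconds
        per_run_time_limit=time_budget_s // 4,   # cap for any single model fit
        seed=42,
    )
    automl.fit(X_train, y_train)
    return automl.leaderboard()  # DataFrame of the models tried, ranked


iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42)

leaderboard = rank_models(X_train, y_train)
print(leaderboard if leaderboard is not None else "auto-sklearn not installed; skipping")
```

The leaderboard is the part we'd care about here: it shows which model families did well, which we could then tune ourselves.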

## TPOT (Tree-based Pipeline Optimization Tool)
- optimizes the whole pipeline with genetic programming: pre-processing steps, feature selection, and the model
- downside: a run can take from a few hours to a day, but accuracy is very high
- considers the following ML models: naive bayes, tree-based models, logistic regression, linear SVC, k-neighbors classifier
- *it also outputs Python code for best models, so we can use that to further tune hyperparameters, if needed*

In [11]:
#!pip install tpot
#!pip install torch
from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np

In [12]:
# sample code for classification problem
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data.astype(np.float64),
    iris.target.astype(np.float64), train_size=0.75, test_size=0.25, random_state=42)

tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42, scoring='accuracy')
# generations: number of iterations to run the optimization process
# verbosity: how much information TPOT prints while running (0 = none, 1 = minimal, 2 = high, 3 = all)
# scoring: the scoring function can be changed (recall, precision, etc.)
# max_time_mins: sets an upper bound on run time
# more details on parameters that can be changed: https://epistasislab.github.io/tpot/using/

tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

tpot.export('tpot_iris_pipeline.py')

Generation 1 - Current best internal CV score: 0.9727272727272727

Generation 2 - Current best internal CV score: 0.9727272727272727

Generation 3 - Current best internal CV score: 0.9727272727272727

Generation 4 - Current best internal CV score: 0.9731225296442687

Generation 5 - Current best internal CV score: 0.9731225296442687

Best pipeline: RandomForestClassifier(PolynomialFeatures(MultinomialNB(input_matrix, alpha=100.0, fit_prior=True), degree=2, include_bias=False, interaction_only=False), bootstrap=True, criterion=gini, max_features=0.7000000000000001, min_samples_leaf=11, min_samples_split=10, n_estimators=100)
0.9736842105263158


## HyperOpt-sklearn
- a bit more complicated than TPOT, but worthwhile if hyperparameter tuning is the priority, since that is this library's main focus
- also takes from a few hours to a day to run, but accuracy is very high
- we can also specify which classifier to apply to the dataset. One idea is to start with TPOT to pick a model, then use hyperopt-sklearn to tune its hyperparameters (alternatively, use sklearn grid search)
- http://hyperopt.github.io/hyperopt-sklearn/
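
A rough sketch of the grid search alternative, assuming we take the random forest from the TPOT run above as the starting point. The grid values are hypothetical: I just bracketed the settings TPOT reported (`max_features=0.7`, `min_samples_leaf=11`, `min_samples_split=10`). This also drops the `MultinomialNB` and `PolynomialFeatures` stages of the TPOT pipeline to keep things short, so it is a simplification and not a reproduction of that pipeline.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, train_size=0.75, test_size=0.25, random_state=42)

# Hypothetical grid: values on either side of what TPOT reported
param_grid = {
    "max_features": [0.5, 0.7],
    "min_samples_leaf": [5, 11],
    "min_samples_split": [5, 10],
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=42),
    param_grid,
    scoring="accuracy",  # same metric as the TPOT run
    cv=5,
)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)  # accuracy of the refit best model
print(search.best_params_)
print(search.best_score_)
print(test_accuracy)
```

Grid search tries every combination, so it only stays cheap while the grid is small; for a wider search space, hyperopt-sklearn's guided search is probably the better fit.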

In [None]:
#!pip install hpsklearn

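# Sample code below. I kept getting an error, but can troubleshoot if we decide to go with this library.

from pandas import read_csv
from sklearn import preprocessing
# the search below needs these (train_test_split comes from the TPOT cell above)
from hpsklearn import HyperoptEstimator, any_classifier, any_preprocessing
from hyperopt import tpe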
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
dataframe = read_csv(url, header=None)
# split into input and output elements
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
# minimally prepare dataset
X = X.astype('float32')
y = preprocessing.LabelEncoder().fit_transform(y.astype('str'))
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# define search
model = HyperoptEstimator(classifier=any_classifier('cla'), preprocessing=any_preprocessing('pre'), 
                          algo=tpe.suggest, max_evals=50, trial_timeout=30)
model.fit(X_train, y_train)
# summarize performance
acc = model.score(X_test, y_test)
print("Accuracy: %.3f" % acc)
# summarize the best model
print(model.best_model())