In [1]:
import pandas as pd

In [2]:
# reading the data
df_heart=pd.read_csv('../../statistics-and-machine-learning-learning/data/framingham.csv')
#df_heart.replace(np.nan,"NaN")
df_heart.dropna(axis=0,inplace=True)

##separation in X and y
X_heart = df_heart.drop( columns = "TenYearCHD" )
y_heart = df_heart[ "TenYearCHD" ]

In [3]:
## let's start by splitting the data into a train set and a held-out test set
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_heart, y_heart, stratify=y_heart, random_state=94)

# hyperopt-sklearn

We will access hyperopt through [hyperopt-sklearn](https://github.com/hyperopt/hyperopt-sklearn), a library written by some of the original hyperopt authors.

It offers a convenient interface for classical machine-learning algorithms, with many presets corresponding to sklearn routines and parameters.


Unfortunately, the documentation of this library is sparse 
and spread between the [github readme](https://github.com/hyperopt/hyperopt-sklearn), the [github page](http://hyperopt.github.io/hyperopt-sklearn/), their [scipy2014 paper](http://conference.scipy.org.s3-website-us-east-1.amazonaws.com/proceedings/scipy2014/pdfs/komer.pdf), and the `help()` of its classes and functions.

So, through a series of examples, we will try to demonstrate the basic usage of this library, 
as well as how to do some of the less documented things.


## basic usage

The library implements "components" corresponding to sklearn (or sklearn-adjacent) objects, each with a predefined hyper-parameter search space:

[list of available components](https://github.com/hyperopt/hyperopt-sklearn?tab=readme-ov-file#available-components)

In [57]:
## importing 
# * HyperoptEstimator -> basic hpsklearn object which wraps the optimization procedure
# * svc , standard_scaler -> components for the SVM classifier and standard_scaler

from hpsklearn import HyperoptEstimator, svc , standard_scaler 


In [5]:
%%time
estim = HyperoptEstimator(classifier=svc("mySVC"), ## the call to svc takes only a name, and will set up a default search space for its hyper-parameters
                         trial_timeout=120) ## sometimes the fitting/evaluating process gets stuck:
## we set a timeout (in seconds) to prevent being stuck for too long

estim.fit(X_train, y_train)

print(estim.score(X_test, y_test))

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 10.10trial/s, best loss: 0.1657559198542805]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  6.36trial/s, best loss: 0.1657559198542805]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:21<00:00, 21.79s/trial, best loss: 0.1657559198542805]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 14.44trial/s, best loss: 0.1657559198542805]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  7.17trial/s, best loss: 0.16393442622950816]
100%|█████████████████████████████████████████████████████████████████



Important notes about what happened (ie, the "hidden" default parameters):

 * 80% of the data was used as a train set, 20% as validation set
 * the 20% validation set are the last elements of the given data
 * the score being optimized is 1-accuracy(validation set) (hyperopt always tries to minimize)
 * the optimization procedure ran for a fixed number of 10 rounds
 * `HyperoptEstimator` has added a random preprocessing step
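
To make the first three points concrete, here is a minimal sketch of the same kind of evaluation done by hand with plain scikit-learn. It is an illustration only, not hpsklearn's actual code: the helper name `holdout_loss` is hypothetical, the data is a synthetic toy set from `make_classification` rather than the Framingham data, and the exact rounding of the split size is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC


def holdout_loss(model, X, y, valid_size=0.2):
    """Fit on the first rows, return 1 - accuracy on the last `valid_size` fraction."""
    n_valid = int(round(valid_size * len(y)))
    n_train = len(y) - n_valid
    # no shuffling: the validation set is simply the tail of the data
    X_fit, X_valid = X[:n_train], X[n_train:]
    y_fit, y_valid = y[:n_train], y[n_train:]
    model.fit(X_fit, y_fit)
    # hyperopt minimizes, so the accuracy is turned into a loss
    return 1.0 - accuracy_score(y_valid, model.predict(X_valid))


# synthetic, imbalanced toy data standing in for the real dataset
X_toy, y_toy = make_classification(n_samples=500, n_features=8,
                                   weights=[0.85, 0.15], random_state=0)
loss = holdout_loss(SVC(C=1.0, kernel="rbf"), X_toy, y_toy)
print(f"validation loss (1 - accuracy): {loss:.3f}")
```

Because the split is not shuffled, any ordering in the data (by date, by class, ...) ends up in the validation set, which is one reason to prefer the shuffled cross-validation options of `fit()`.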

    

In [7]:
## getting the best model
estim.best_model()

{'learner': SVC(C=0.8661828616263555, coef0=0.32310053227283486,
     decision_function_shape='ovo', degree=5, random_state=np.int64(3),
     shrinking=False, tol=0.0019357643730025579),
 'preprocs': (StandardScaler(with_mean=False),),
 'ex_preprocs': ()}

In [24]:
## the chosen kernel is, unfortunately, not shown in the summary above:
## sklearn only prints parameters that differ from their default, and 'rbf' is the SVC default
## we can fetch it from the object itself
estim.best_model()['learner'].kernel

'rbf'

In [15]:
X_test.iloc[:5,:]

Unnamed: 0,male,age,education,currentSmoker,cigsPerDay,BPMeds,prevalentStroke,prevalentHyp,diabetes,totChol,sysBP,diaBP,BMI,heartRate,glucose
1441,0,45,2.0,0,0.0,0.0,0,0,0,262.0,116.0,66.0,21.56,66.0,76.0
4210,1,50,1.0,0,0.0,0.0,0,0,0,282.0,126.5,88.0,27.3,85.0,87.0
1732,0,52,2.0,0,0.0,0.0,0,0,0,221.0,124.0,69.0,23.37,58.0,81.0
2503,1,43,1.0,1,20.0,0.0,0,0,1,309.0,124.0,85.0,26.91,70.0,215.0
3222,0,44,1.0,0,0.0,0.0,0,0,0,200.0,128.0,82.0,23.24,80.0,73.0


In [16]:
## predicting on new data using the best model
estim.predict( X_test.iloc[:5,:] )



array([0, 0, 0, 0, 0])

In [18]:
## we can investigate the individual trials 
estim.trials.trials

[{'state': 2,
  'tid': 0,
  'spec': None,
  'result': {'loss': 0.1657559198542805,
   'loss_variance': 0.00025233739942982086,
   'status': 'ok',
   'duration': 0.08312249183654785},
  'misc': {'tid': 0,
   'cmd': ('domain_attachment', 'FMinIter_Domain'),
   'workdir': None,
   'idxs': {'mySVC.svc_C': [np.int64(0)],
    'mySVC.svc_coef0': [np.int64(0)],
    'mySVC.svc_decision_function_shape': [np.int64(0)],
    'mySVC.svc_degree': [np.int64(0)],
    'mySVC.svc_gamma': [np.int64(0)],
    'mySVC.svc_kernel': [np.int64(0)],
    'mySVC.svc_random_state': [np.int64(0)],
    'mySVC.svc_shrinking': [np.int64(0)],
    'mySVC.svc_tol': [np.int64(0)],
    'preprocessing': [np.int64(0)],
    'preprocessing.min_max_scaler.clip': [],
    'preprocessing.min_max_scaler.feature_min': [],
    'preprocessing.normalizer.norm': [np.int64(0)],
    'preprocessing.pca.n_components': [],
    'preprocessing.pca.whiten': [],
    'preprocessing.standard_scaler.with_mean': [],
    'preprocessing.standard_scaler.

In [19]:
## trials scores
estim.trials.results

[{'loss': 0.1657559198542805,
  'loss_variance': 0.00025233739942982086,
  'status': 'ok',
  'duration': 0.08312249183654785},
 {'loss': 0.1657559198542805,
  'loss_variance': 0.00025233739942982086,
  'status': 'ok',
  'duration': 0.14423751831054688},
 {'loss': 0.1657559198542805,
  'loss_variance': 0.00025233739942982086,
  'status': 'ok',
  'duration': 21.782816886901855},
 {'loss': 0.1657559198542805,
  'loss_variance': 0.00025233739942982086,
  'status': 'ok',
  'duration': 0.058792829513549805},
 {'loss': 0.16393442622950816,
  'loss_variance': 0.0002501093615443615,
  'status': 'ok',
  'duration': 0.12779521942138672},
 {'loss': 0.26047358834244083,
  'loss_variance': 0.000351509303135864,
  'status': 'ok',
  'duration': 0.09504461288452148},
 {'loss': 0.1657559198542805,
  'loss_variance': 0.00025233739942982086,
  'status': 'ok',
  'duration': 0.05449080467224121},
 {'loss': 0.2622950819672131,
  'loss_variance': 0.0003530955692390986,
  'status': 'ok',
  'duration': 0.155301

In [20]:
## trials tested values:
estim.trials.vals

{'mySVC.svc_C': [np.float64(1.0934211895658286),
  np.float64(1.2775781542665854),
  np.float64(0.8193660529240634),
  np.float64(0.9115521691616473),
  np.float64(0.8661828616263555),
  np.float64(1.4160813887147201),
  np.float64(0.7772917282338683),
  np.float64(0.7387686301816984),
  np.float64(1.2240695778932102),
  np.float64(1.0041042395355118)],
 'mySVC.svc_coef0': [np.float64(0.792540555539598),
  np.float64(0.7456018216312651),
  np.float64(0.4118912728864692),
  np.float64(0.9113288342746465),
  np.float64(0.32310053227283486),
  np.float64(0.5693480904234842),
  np.float64(0.8179969542125467),
  np.float64(9.774540855200797e-05),
  np.float64(0.8748165137647407),
  np.float64(0.42943127507683165)],
 'mySVC.svc_decision_function_shape': [np.int64(0),
  np.int64(1),
  np.int64(0),
  np.int64(0),
  np.int64(0),
  np.int64(0),
  np.int64(0),
  np.int64(1),
  np.int64(0),
  np.int64(1)],
 'mySVC.svc_degree': [np.int64(2),
  np.int64(4),
  np.int64(0),
  np.int64(1),
  np.int64(4

As far as one can tell, the indexes of categorical hyper-parameters correspond to the order in which the categories are listed in the sklearn documentation.

For example, for the SVC kernel if we look at the [sklearn SVC doc](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) we get `{‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’}`.

So in the trials:

```
'mySVC.svc_kernel': [np.int64(1),
  np.int64(1),
  np.int64(0),
  np.int64(0),
  np.int64(2),
...
```

This corresponds to successive trials of poly, poly, linear, linear, and then rbf.
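
The decoding can be written as a small helper. This is a sketch under the assumption stated above (that the indexes follow the order of the sklearn documentation), which is not a documented guarantee of hpsklearn; the names `SKLEARN_KERNEL_ORDER` and `decode_choices` are hypothetical.

```python
# Order taken from the sklearn SVC documentation; that hpsklearn follows
# the same order is an assumption, not a documented guarantee.
SKLEARN_KERNEL_ORDER = ["linear", "poly", "rbf", "sigmoid", "precomputed"]


def decode_choices(indexes, labels):
    """Map the integer indexes stored in the trials to their labels."""
    return [labels[int(i)] for i in indexes]


first_trials = [1, 1, 0, 0, 2]  # the first svc_kernel indexes listed in the text
print(decode_choices(first_trials, SKLEARN_KERNEL_ORDER))
```

With a fitted estimator, the same helper could be applied to `estim.trials.vals['mySVC.svc_kernel']`.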


## adding a preprocessing step

 * preprocessing steps are given in a list (so you can have multiple successive preprocessing steps)
 * use an empty list `[]` if you don't want any pre-processing
 * you can fix the value of any hyper-parameter by giving it as an argument; any hyper-parameter of the original sklearn object is valid

In [25]:
## help of the hpsklearn wrapper
help( standard_scaler )

Help on function standard_scaler in module hpsklearn.components.preprocessing._data:

standard_scaler(name: str, copy: bool = True, with_mean: Union[bool, hyperopt.pyll.base.Apply] = None, with_std: Union[bool, hyperopt.pyll.base.Apply] = None)
    Return a pyll graph with hyperparameters that will construct
    a sklearn.preprocessing.StandardScaler transformer.
    
    Args:
         name: name | str
         copy: perform inplace scaling or on copy | bool
         with_mean: center data before scaling | bool
         with_std: scale data to unit variance | bool



In [26]:
from sklearn.preprocessing import StandardScaler
## help of the sklearn StandardScaler
help( StandardScaler )

Help on class StandardScaler in module sklearn.preprocessing._data:

class StandardScaler(sklearn.base.OneToOneFeatureMixin, sklearn.base.TransformerMixin, sklearn.base.BaseEstimator)
 |  StandardScaler(*, copy=True, with_mean=True, with_std=True)
 |  
 |  Standardize features by removing the mean and scaling to unit variance.
 |  
 |  The standard score of a sample `x` is calculated as:
 |  
 |      z = (x - u) / s
 |  
 |  where `u` is the mean of the training samples or zero if `with_mean=False`,
 |  and `s` is the standard deviation of the training samples or one if
 |  `with_std=False`.
 |  
 |  Centering and scaling happen independently on each feature by computing
 |  the relevant statistics on the samples in the training set. Mean and
 |  standard deviation are then stored to be used on later data using
 |  :meth:`transform`.
 |  
 |  Standardization of a dataset is a common requirement for many
 |  machine learning estimators: they might behave badly if the
 |  individual feat

In [27]:
%%time
estim = HyperoptEstimator(preprocessing=[ standard_scaler('ssc', with_mean=True,with_std=True) ],
                          classifier=svc("mySVC"),
                          max_evals=20, ## increasing the number of trials
                          trial_timeout=120)
    
estim.fit(X_train, y_train)

print(estim.score(X_test, y_test))

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  6.18trial/s, best loss: 0.1657559198542805]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.52trial/s, best loss: 0.1657559198542805]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  6.03trial/s, best loss: 0.1657559198542805]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 13.27trial/s, best loss: 0.1657559198542805]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  4.34trial/s, best loss: 0.1657559198542805]
100%|█████████████████████████████████████████████████████████████████



Your experience may vary, but here fixing the preprocessing sped up the search.

## changing the cross-validation scheme

Remember, the default is an 80% train, 20% validation split.


In [61]:
import hpsklearn  ## needed to access the module itself
hpsklearn.__file__  ## where the library is installed

'/home/wandrille/Installed_software/anaconda3/envs/intermediateML/lib/python3.11/site-packages/hpsklearn/__init__.py'

In [28]:
help(HyperoptEstimator.fit)

Help on function fit in module hpsklearn.estimator.estimator:

fit(self, X, y, EX_list: Union[list, tuple] = None, valid_size: float = 0.2, n_folds: int = None, kfolds_group: Union[list, numpy.ndarray] = None, cv_shuffle: bool = False, warm_start: bool = False, random_state: numpy.random._generator.Generator = Generator(PCG64) at 0x7FA8267D4900) -> None
    Search the space of learners and preprocessing steps for a good
    predictive model of y <- X. Store the best model for predictions.
    
    Args:
        X:
            Input variables
    
        y:
            Output variables
    
        EX_list: list, default is None
            List of exogenous datasets. Each must have the same number of
            samples as X.
    
        valid_size: float, default is 0.2
            The portion of the dataset used as the validation set. If
            cv_shuffle is False, always use the last samples as validation.
    
        n_folds: int, default is None
            When n_folds is

In [29]:
%%time
estim = HyperoptEstimator(preprocessing=[ standard_scaler('ssc', with_mean=True,with_std=True) ],
                          classifier=svc("mySVC"),
                          trial_timeout=120)
    
estim.fit(X_train, y_train , n_folds=3, cv_shuffle = True)
    
print(estim.score(X_test, y_test))

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.46trial/s, best loss: 0.22858184469558873]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  3.30trial/s, best loss: 0.15238789646372586]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [02:00<00:00, 120.11s/trial, best loss: 0.15238789646372586]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00,  2.52trial/s, best loss: 0.15238789646372586]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  2.81trial/s, best loss: 0.15238789646372586]
100%|█████████████████████████████████████████████████████████████████
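
For intuition, a shuffled 3-fold evaluation can be reproduced by hand in plain scikit-learn. This is only an analogue, assuming a simple `KFold` and a synthetic toy dataset; it makes no claim about which splitter hpsklearn uses internally.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# synthetic toy data standing in for the real dataset
X_toy, y_toy = make_classification(n_samples=300, n_features=6, random_state=1)

# 3 shuffled folds: every sample is used for validation exactly once
cv = KFold(n_splits=3, shuffle=True, random_state=0)
model = make_pipeline(StandardScaler(), SVC())

# out-of-fold predictions, one per sample
y_oof = cross_val_predict(model, X_toy, y_toy, cv=cv)
cv_loss = 1.0 - accuracy_score(y_toy, y_oof)
print(y_oof.shape, f"{cv_loss:.3f}")
```

Putting the scaler inside the pipeline means it is refitted on the training part of each fold, so no information from the validation fold leaks into the preprocessing.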



## changing the score to optimize

The default is accuracy, which is far from ideal, in particular when the classes are imbalanced.


> in a regression problem, the default score is $R^2$

What we need is to give `HyperoptEstimator` a function that takes:

 * the true target values
 * the predicted target values
 
and returns a score that needs to be **minimized**.


In [59]:
from sklearn.metrics import balanced_accuracy_score, accuracy_score

# balanced accuracy is nice, 
# we just have to adapt it so that it can be minimized: 
# -1 * balanced_accuracy will work

def balanced_accuracy_loss(y_target, y_prediction):
    print(y_target.shape, y_prediction.shape)  ## looking up shapes
    return -balanced_accuracy_score(y_target, y_prediction)

In [60]:
%%time
estim = HyperoptEstimator(preprocessing=[ standard_scaler('ssc', 
                                                          with_mean=True,
                                                          with_std=True) ],
                          classifier=svc("mySVC",
                                         class_weight='balanced'), ## fixing the balanced class weight scheme
                          loss_fn = balanced_accuracy_loss, ## we give our custom loss function
                          trial_timeout=120)
    
estim.fit(X_train, y_train , n_folds=3, cv_shuffle = True)
    
print(estim.score(X_test, y_test))

(2743,)                                                                                                                                                                                  
(2743,)                                                                                                                                                                                  
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.53trial/s, best loss: -0.6719977362761743]
(2743,)                                                                                                                                                                                  
(2743,)                                                                                                                                                                                  
100%|█████████████████████████████████████████████████████████████████



In [32]:
print("balanced accuracy on test" , balanced_accuracy_score( y_test , estim.predict(X_test) ) )
print("         accuracy on test" , accuracy_score( y_test , estim.predict(X_test) ) )

balanced accuracy on test 0.6628717644441149
         accuracy on test 0.653551912568306




Note that `estim.score()` still gives you the accuracy.

If, instead of the balanced accuracy, we go for a score that needs the predicted probabilities (or scores),
in theory we need to:

 1. set `continuous_loss_fn = True` in `HyperoptEstimator` -> predictions will be made with `.predict_proba()` instead of `.predict()`
 2. make sure that the tested classifier has a `.predict_proba()` method
 
BUT, in practice, it gets more complex.

In [47]:
from sklearn.metrics import roc_auc_score
def roc_auc_loss(y_target, y_prediction ):
    return -roc_auc_score(y_target, y_prediction)

In [48]:
%%time
estim = HyperoptEstimator(preprocessing=[ standard_scaler('ssc', with_mean=True,with_std=True) ],
                          classifier=svc("mySVC",probability=True),
                          loss_fn = roc_auc_loss,
                          continuous_loss_fn = True,
                          trial_timeout=10)
    
estim.fit(X_train, y_train , n_folds=3, cv_shuffle = True)
    
print(estim.score(X_test, y_test))

  0%|                                                                                                                                              | 0/1 [00:00<?, ?trial/s, best loss=?]

job exception: Found input variables with inconsistent numbers of samples: [2743, 5486]



  0%|                                                                                                                                              | 0/1 [00:01<?, ?trial/s, best loss=?]


ValueError: Found input variables with inconsistent numbers of samples: [2743, 5486]

We get an error: `job exception: Found input variables with inconsistent numbers of samples: [2743, 5486]`

It stems from our loss function; let's investigate further:

In [51]:
## incompatible dimensions: [2743, 5486]
y_train.shape

(2743,)

In [52]:
2743*2

5486

**Question:** What do you think happened?

---
<br><br><br><br><br><br><br><br><br><br><br>

scroll below for answer:

<br><br><br><br><br><br><br><br><br><br><br>
---

In [56]:
## have a look at the predict_proba() output from the sklearn_object:
from sklearn.svm import SVC

sklearn_svc = SVC(probability=True)
sklearn_svc.fit(X_train,y_train)
sklearn_svc.predict_proba(X_train)

array([[0.85130893, 0.14869107],
       [0.85236616, 0.14763384],
       [0.85213178, 0.14786822],
       ...,
       [0.85138043, 0.14861957],
       [0.85226136, 0.14773864],
       [0.8526272 , 0.1473728 ]])

`predict_proba()` returns one column per category, and our loss function got twice the expected number of values: `hpsklearn` has flattened the whole output before sending it to the function.

Ideally, we would only have the second column (the probability of belonging to the positive category).

Because of the way the flattening works, these elements now correspond to every second element.

We can check this:

In [62]:
def roc_auc_loss(y_target, y_prediction ):
    print( y_target.shape , y_prediction.shape ) ## looking up shapes
    print( y_prediction[:10] ) ## looking up what's in the predictions
    p0 = y_prediction[::2] ## half of the elements
    p1 = y_prediction[1::2] ## other half of the elements
    print( (p0+p1)[:10] )  ## what do they sum to?
    
    return -roc_auc_score(y_target, y_prediction)

estim = HyperoptEstimator(preprocessing=[ standard_scaler('ssc', with_mean=True,with_std=True) ],
                          classifier=svc("mySVC",probability=True),
                          loss_fn = roc_auc_loss,
                          continuous_loss_fn = True,
                          trial_timeout=10)
    
estim.fit(X_train, y_train , n_folds=3, cv_shuffle = True)

(2743,)                                                                                                                                                                                  
(5486,)                                                                                                                                                                                  
[0.84783455 0.15216545 0.84783541 0.15216459 0.84783465 0.15216535                                                                                                                       
 0.84785148 0.15214852 0.84783282 0.15216718]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]                                                                                                                                                          
  0%|                                                                                                                                              | 0/1 [00:00<?, ?trial/s, best loss=?]

job exception: Found input variables with inconsistent numbers of samples: [2743, 5486]



  0%|                                                                                                                                              | 0/1 [00:00<?, ?trial/s, best loss=?]


ValueError: Found input variables with inconsistent numbers of samples: [2743, 5486]

The probabilities of being of category 0 and category 1 sum to 1.0 --> the slicing is correct: `y_prediction[1::2]` holds the probabilities of category 1.
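
The same slicing can be checked on a tiny hand-made array. This sketch assumes a row-major flattening (numpy's default for `ravel()`), which is consistent with the alternating values printed above; the numbers are made up for illustration.

```python
import numpy as np

# a toy predict_proba output: one row per sample, one column per class
proba = np.array([[0.85, 0.15],
                  [0.30, 0.70],
                  [0.60, 0.40]])

flat = proba.ravel()  # row-major flattening: p0, p1, p0, p1, ...
print(flat)           # 6 values for 3 samples

p0 = flat[::2]   # even positions -> probability of class 0
p1 = flat[1::2]  # odd positions  -> probability of class 1
assert np.allclose(p1, proba[:, 1])
assert np.allclose(p0 + p1, 1.0)
```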

In [63]:
def roc_auc_loss_fixed( y_target, y_prediction ):
    p1 = y_prediction[1::2] ## half of the elements corresponding to proba of being category 1    
    return -roc_auc_score(y_target, p1)


In [64]:
%%time
estim = HyperoptEstimator(preprocessing=[ standard_scaler('ssc', with_mean=True,with_std=True) ],
                          classifier=svc("mySVC",probability=True),
                          loss_fn = roc_auc_loss_fixed,
                          continuous_loss_fn = True,
                          trial_timeout=10)
    
estim.fit(X_train, y_train , n_folds=3, cv_shuffle = True)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.09s/trial, best loss: -0.5224417348356227]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.83s/trial, best loss: -0.611014045377373]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  1.10s/trial, best loss: -0.611014045377373]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00,  1.46trial/s, best loss: -0.611014045377373]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00,  1.04s/trial, best loss: -0.611014045377373]
100%|█████████████████████████████████████████████████████████████████


And now it works. The issue has been reported a number of times on the library's github page.

Perhaps by the time you go through this notebook it will have been solved, but in the meantime you can use our quick fix.
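
If the library behaviour changes, a loss that slices with `[1::2]` would silently break. A slightly more defensive variant is sketched below; it assumes a binary problem and, for the flattened case, a row-major layout. The name `roc_auc_loss_any_shape` is hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def roc_auc_loss_any_shape(y_target, y_prediction):
    """Negative ROC AUC for binary targets, whatever the prediction layout."""
    y_prediction = np.asarray(y_prediction)
    # reshape to (n_samples, n_columns); n_columns is 1 or 2 for binary problems
    scores = y_prediction.reshape(len(y_target), -1)
    # the last column is the score of the positive class in every layout
    return -roc_auc_score(y_target, scores[:, -1])


y_true = np.array([0, 0, 1, 1])
proba = np.array([[0.9, 0.1], [0.6, 0.4], [0.35, 0.65], [0.2, 0.8]])

print(roc_auc_loss_any_shape(y_true, proba))          # 2-D layout
print(roc_auc_loss_any_shape(y_true, proba.ravel()))  # flattened layout
print(roc_auc_loss_any_shape(y_true, proba[:, 1]))    # positive class only
```

All three calls score the same column, so the loss does not depend on how the probabilities were passed in.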

## warm start : continue searching for better solutions

In [65]:
estim.fit(X_train, y_train , n_folds=3, cv_shuffle = True , warm_start=True)

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00,  1.01trial/s, best loss: -0.611014045377373]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:01<00:00,  1.32s/trial, best loss: -0.6119493749035345]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00,  2.01trial/s, best loss: -0.6119493749035345]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:01<00:00,  1.35s/trial, best loss: -0.6119493749035345]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:01<00:00,  1.03s/trial, best loss: -0.6119493749035345]
100%|█████████████████████████████████████████████████████████████████

## specifying your own search space 

The default search spaces are OK, but you can change which hyper-parameters you tune and how you explore them, for example to point the algorithm in a direction you know is more likely to yield good results.

The way it works is that, for each hyper-parameter, you define a **prior distribution** (it is Bayesian after all) using one of hyperopt's functions.

For more detailed documentation, we refer you to the [hyperopt documentation](http://hyperopt.github.io/hyperopt/getting-started/search_spaces/).

In [70]:
from hyperopt import hp
import numpy as np

## kernel chosen between rbf, poly and linear, with probability of 0.5,0.25 and 0.25 resp.
svc_kernel = hp.pchoice("kernel", [(0.50, "rbf"), 
                                   (0.25, "poly"), 
                                   (0.25, "linear")])

## choosing C uniformly in the log space between 10**-5 and 10**1
svc_C = hp.loguniform("C", low=np.log(1e-5), high=np.log(10))

## choosing coef0 using a normal distribution with mean 0 and std dev 1
svc_coef0 = hp.normal("coef0", mu=0 , sigma=1)
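
To get a feel for these priors, the three distributions can be mimicked with plain numpy. This is an illustrative sketch, not hyperopt code: it relies on the documented definition of `hp.loguniform` as `exp(uniform(low, high))`, and the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# hp.loguniform("C", low, high) draws exp(uniform(low, high))
low, high = np.log(1e-5), np.log(10)
C_samples = np.exp(rng.uniform(low, high, size=n))

# hp.pchoice: weighted draw among the options
kernels = rng.choice(["rbf", "poly", "linear"], size=n, p=[0.50, 0.25, 0.25])

# hp.normal("coef0", mu=0, sigma=1)
coef0_samples = rng.normal(loc=0.0, scale=1.0, size=n)

print("C range:", C_samples.min(), C_samples.max())
# 1e-2 is the middle of the range on a log scale, so about half the draws fall below it
print("share of C below 1e-2:", np.mean(C_samples < 1e-2))
print("share of rbf:", np.mean(kernels == "rbf"))
```

The log-uniform prior spends as many trials between 1e-5 and 1e-4 as between 1 and 10, which is usually what you want for a regularization parameter such as `C`.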


In [74]:
%%time
estim = HyperoptEstimator(preprocessing=[ standard_scaler('ssc', 
                                                          with_mean=True,
                                                          with_std=True) ],
                          classifier=svc("mySVC",
                                         probability=True,
                                         kernel =svc_kernel,
                                         C =svc_C,
                                         coef0 =svc_coef0),
                          loss_fn = roc_auc_loss_fixed,
                          continuous_loss_fn = True,
                          trial_timeout=10)

estim.fit(X_train, y_train , n_folds=3, cv_shuffle = True)
    
print(estim.score(X_test, y_test))

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.60trial/s, best loss: -0.5930534547512476]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  1.26trial/s, best loss: -0.5930534547512476]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  2.06trial/s, best loss: -0.6445938159180943]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00,  2.26trial/s, best loss: -0.6445938159180943]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  1.06trial/s, best loss: -0.6445938159180943]
100%|█████████████████████████████████████████████████████████████████



It is possible to nest `hp.choice()` or `hp.pchoice()` to specify more complex search spaces:

In [82]:
from hpsklearn import decision_tree_classifier 


space = hp.choice('classifier', [  decision_tree_classifier('myTree') , 
                                   svc('mySVC' , 
                                       kernel =svc_kernel,
                                       C =svc_C,
                                       coef0 =svc_coef0 ) ] )

estim = HyperoptEstimator(preprocessing=[ standard_scaler('ssc', 
                                                          with_mean=True,
                                                          with_std=True) ],
                          classifier=space,
                          trial_timeout=10)

estim.fit(X_train, y_train )


100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  9.47trial/s, best loss: 0.1657559198542805]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 12.68trial/s, best loss: 0.1657559198542805]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 31.65trial/s, best loss: 0.1657559198542805]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 35.09trial/s, best loss: 0.16393442622950816]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 22.43trial/s, best loss: 0.16393442622950816]
100%|█████████████████████████████████████████████████████████████████

## exercise 

Set up tuning with `HyperoptEstimator` for an XGBoost model.

 * which hyper-parameters are tuned?

In [103]:
from hpsklearn import xgboost_classification

In [104]:
%%time
estim = HyperoptEstimator(preprocessing=[],
                          classifier=xgboost_classification('xgb'),
                          loss_fn = roc_auc_loss_fixed,
                          continuous_loss_fn = True,
                          trial_timeout=120)

estim.fit(X_train, y_train , n_folds=5, cv_shuffle = True)
    
print(estim.score(X_test, y_test))

  0%|                                                                                                                                              | 0/1 [00:00<?, ?trial/s, best loss=?]

Parameters: { "use_label_encoder" } are not used.


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.02s/trial, best loss: -0.7093327159541081]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.98s/trial, best loss: -0.7106672840458919]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  2.12s/trial, best loss: -0.7106672840458919]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00,  3.83s/trial, best loss: -0.7106672840458919]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:13<00:00, 13.22s/trial, best loss: -0.7106672840458919]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:04<00:00,  4.98s/trial, best loss: -0.7106672840458919]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:08<00:00,  8.74s/trial, best loss: -0.7106672840458919]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00,  2.03s/trial, best loss: -0.7106672840458919]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:01<00:00,  1.59s/trial, best loss: -0.7106672840458919]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:02<00:00,  2.32s/trial, best loss: -0.7106672840458919]


0.8491803278688524
CPU times: user 715 ms, sys: 78.1 ms, total: 793 ms
Wall time: 44.4 s


In [None]:
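from sklearn.metrics import roc_auc_score
model = estim.best_model()['learner']  # fitted classifier of the best trial
# ROC AUC on the test set, computed from the predicted probability of the positive class
roc_auc_score( y_test , model.predict_proba(X_test)[:,1] )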
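
The "best loss" reported during the search is negative (about -0.71), while `estim.score` printed 0.849, so the two numbers are probably not the same metric. The sketch below assumes that `roc_auc_loss_fixed` returns the negative ROC AUC (the sign of the reported losses suggests this, but the function is defined in an earlier cell, so this is an assumption), and that `estim.score` reports accuracy for a classifier. The helper names `roc_auc` and `neg_auc_loss` and the toy data are made up for illustration; the code is pure Python, so it shows what the metric measures without relying on any library.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def neg_auc_loss(y_true, scores):
    """Assumed shape of the loss: hyperopt minimizes, so a higher AUC must give a lower loss."""
    return -roc_auc(y_true, scores)


# toy data, for illustration only
y_toy = [0, 0, 1, 1]
p_toy = [0.1, 0.4, 0.35, 0.8]

print(roc_auc(y_toy, p_toy))       # 3 of the 4 (positive, negative) pairs are ordered correctly
print(neg_auc_loss(y_toy, p_toy))  # the same value with the sign flipped
```

Under these assumptions, a best loss of -0.71 corresponds to a cross-validated AUC of about 0.71 on the training folds, which is the number to compare with the test-set `roc_auc_score` above, not with the 0.849 printed by `estim.score`.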

In [None]:
estim.best_model()
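
One possible way to approach the question of the exercise is to compare the hyper-parameters of the best model with those of a freshly constructed model of the same class. The helper `changed_params` below is hypothetical (it is not part of hpsklearn), and the two dictionaries contain made-up values that only illustrate the comparison.

```python
_MISSING = object()  # marks a parameter that is absent from the defaults


def changed_params(defaults, tuned):
    """Return {name: (default, tuned)} for every parameter whose value differs."""
    changed = {}
    for name, value in sorted(tuned.items()):
        default = defaults.get(name, _MISSING)
        if default is _MISSING:
            changed[name] = (None, value)
        elif default != value:
            changed[name] = (default, value)
    return changed


# made-up values, for illustration only
defaults = {"max_depth": 6, "learning_rate": 0.3, "n_estimators": 100}
tuned = {"max_depth": 4, "learning_rate": 0.05, "n_estimators": 100}

changed = changed_params(defaults, tuned)
for name, (before, after) in changed.items():
    print(f"{name}: {before} -> {after}")
```

With the fitted model from the cells above, and assuming the learner follows the scikit-learn estimator interface, the call would look like `changed_params(type(model)().get_params(), model.get_params())`. This only lists parameters whose value differs from the default: a tuned parameter that happens to be sampled at its default value would not appear, so the component's search space remains the authoritative answer.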