## Search Algorithms

So far we have been using Ray Tune's basic (default) variant generator to perform either:

 - Grid Search
 - Random Search
 
However, Ray Tune has an expanding suite of [Search Algorithms](https://docs.ray.io/en/latest/tune-searchalg.html) available:

 - BayesOpt
 - HyperOpt
 - SigOpt
 - Nevergrad
 - Scikit-Optimize
 - Ax
 - BOHB
 
These bolt in functionality from other well-known libraries and make it available to us in Ray Tune.

Depending on the library we use, some customisation is needed. None of it touches our training code; it centres on how we define the `search space` and call `tune.run`.

### Switch to Bayesian Optimisation

Ray's BayesOpt uses the [bayesian-optimization](https://github.com/fmfn/BayesianOptimization) package, which is already installed in our conda environment.





In [None]:
%load_ext autoreload
%autoreload 2

from dependencies import *
from seg_setup_code import *

Rather than loading bayesopt directly, we pull in the Tune search algorithm that wraps it.

In [None]:
from ray.tune.suggest.bayesopt import BayesOptSearch

Next we have copied across the same training function from earlier notebooks.

In [None]:
from sklearn.model_selection import LeavePGroupsOut
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from scipy.signal import medfilt
from filelock import FileLock

def e2e_train_and_test(config, **kwargs):
    
    # threadsafe
#     with FileLock("./data.lock"):
    X, y, groups, X_test, y_test, group_test, well_names = setup(kwargs['filepath'])
    
    # choose your CV strategy
    splitter = LeavePGroupsOut(1)
    
    # run k fold training and validation
    f1_scores = [] # keep hold of all individual scores
    for train_ind, val_ind in splitter.split(X, y, groups=groups):
        pipeline = make_pipeline(RobustScaler(),
                                  XGBClassifier())

        pipeline.set_params(**config)
        pipeline.fit(X[train_ind], y[train_ind])
        
        y_pred = pipeline.predict(X[val_ind])
        
        f1_scores.append(f1_score(y[val_ind], y_pred, average='micro'))
    
        # Clean isolated facies for each well.
        # Note: this runs after the fold has been scored, so it does not affect f1_scores.
        y_pred = medfilt(y_pred, kernel_size=5)
    
    # use Tune's reporter
    tune.track.log(mean_f1_score=np.array(f1_scores).mean(),
                std_f1_score=np.array(f1_scores).std(),
                # and we can actually add any metrics we like
                done=True)

#### Defining the Search Space

In previous examples we have created the search space for our tuning job using Ray Tune's distribution functions.

In [None]:
ray_tuning_config = {
    'xgbclassifier__learning_rate': tune.loguniform(0.001, 0.5),
    'xgbclassifier__max_depth': tune.randint(1, 10),
    'xgbclassifier__min_child_weight': tune.loguniform(0.1, 10),
    'xgbclassifier__n_estimators': tune.randint(5,200),
    'xgbclassifier__colsample_bytree': tune.choice([0.4, 0.6, 0.8, 1.0]),
    'xgbclassifier__lambda': tune.choice([0,1]),
    'xgbclassifier__seed': 42
}
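
As an aside on what these distribution functions mean: `tune.loguniform(0.001, 0.5)` draws values uniformly on a log scale, so small learning rates are sampled far more often than they would be on a linear scale. The cell below is a minimal pure-Python sketch of that idea and not Ray Tune's implementation; `sample_loguniform` is a hypothetical helper written only for illustration.

```python
import math
import random

def sample_loguniform(low, high, rng):
    """Hypothetical helper: draw a value uniformly in log space between low and high."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
samples = [sample_loguniform(0.001, 0.5, rng) for _ in range(1000)]

# On a linear scale half of the draws would fall below the midpoint of the range.
# In log space most of them do, because each decade gets equal weight.
midpoint = (0.001 + 0.5) / 2
share_below_midpoint = sum(s < midpoint for s in samples) / len(samples)

print(f"min={min(samples):.4f} max={max(samples):.4f}")
print(f"share below the linear midpoint: {share_below_midpoint:.2f}")
```

Plain numeric bounds carry no such scaling information, which is worth keeping in mind when a log-scaled parameter is turned into a simple `(low, high)` range.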

But when using BayesOpt we need to change this and specify:
 - the bounds of the parameter space `pbounds`
 - the form / parameters of a UtilityFunction
 - a modified config object
 
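To start with, we convert our tuning config to align with pbounds, which uses simple `(lower, upper)` tuples and only supports *continuous* spaces. Parameters that must be discrete are therefore rounded back to integers by `parse_config` before they reach the training function.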

In [None]:
pbounds = {
    "xgbclassifier__learning_rate": (0.001, 0.5),
    "xgbclassifier__max_depth": (1, 10),                # Needs to be discrete!
    "xgbclassifier__min_child_weight": (0.1, 100),
    "xgbclassifier__n_estimators": (5, 200),            # Needs to be discrete!
    "xgbclassifier__colsample_bytree": (0.4, 1.0),
    "xgbclassifier__lambda": (0, 1)                     # Needs to be true/false 1/0
}

def parse_config(config):
    config["xgbclassifier__max_depth"] = int(round(config["xgbclassifier__max_depth"]))
    config["xgbclassifier__n_estimators"] = int(round(config["xgbclassifier__n_estimators"]))
    config["xgbclassifier__lambda"] = int(round(config["xgbclassifier__lambda"]))
    return config
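
To see what the rounding step does to a raw suggestion, here is a minimal standalone sketch. The suggested values are made up for illustration, and `parse_config_copy` is a hypothetical variant of `parse_config` that returns a new dict instead of modifying its argument.

```python
# Keys that the training pipeline needs as integers.
DISCRETE_KEYS = (
    "xgbclassifier__max_depth",
    "xgbclassifier__n_estimators",
    "xgbclassifier__lambda",
)

def parse_config_copy(config):
    """Hypothetical non-mutating variant of parse_config."""
    parsed = dict(config)  # shallow copy, the caller's dict is left untouched
    for key in DISCRETE_KEYS:
        parsed[key] = int(round(parsed[key]))
    return parsed

# A made-up continuous suggestion of the kind a pbounds-based search produces.
raw_suggestion = {
    "xgbclassifier__learning_rate": 0.137,
    "xgbclassifier__max_depth": 6.7,
    "xgbclassifier__min_child_weight": 3.2,
    "xgbclassifier__n_estimators": 112.4,
    "xgbclassifier__colsample_bytree": 0.83,
    "xgbclassifier__lambda": 0.31,
}

parsed = parse_config_copy(raw_suggestion)
print(parsed["xgbclassifier__max_depth"])     # 7
print(parsed["xgbclassifier__n_estimators"])  # 112
print(parsed["xgbclassifier__lambda"])        # 0
```

One consequence of this approach is that the optimiser still works with the continuous values, so several nearby suggestions can collapse onto the same discrete configuration.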

In [None]:
utility_fn_kwargs={
    "kind": "ucb",
    "kappa": 2.5,
    "xi": 0.0
}
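
For intuition about these settings: as we understand it, the `ucb` (upper confidence bound) utility scores a candidate point as the surrogate model's predicted mean plus `kappa` times its predicted standard deviation, while `xi` only plays a role for the `ei` and `poi` kinds. The cell below is a minimal sketch of that scoring rule in plain Python, not the library's code; the candidate names and their predictions are invented for illustration.

```python
def ucb(mean, std, kappa):
    """Upper confidence bound: reward a high predicted score and high uncertainty."""
    return mean + kappa * std

# Made-up surrogate predictions (mean F1, standard deviation) for three candidates.
candidates = {
    "well explored": (0.60, 0.01),
    "somewhat explored": (0.55, 0.05),
    "unexplored": (0.50, 0.10),
}

def pick(kappa):
    """Return the name of the candidate with the highest UCB score."""
    return max(candidates, key=lambda name: ucb(*candidates[name], kappa))

print(pick(kappa=2.5))  # unexplored: the uncertainty bonus dominates
print(pick(kappa=0.0))  # well explored: pure exploitation of the predicted mean
```

A larger `kappa` pushes the search towards regions the model is unsure about, and a smaller one keeps it close to the best configurations seen so far.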

In [None]:
search_algo = BayesOptSearch(
                pbounds,
                metric="mean_f1_score",
                mode="max",
                utility_kwargs=utility_fn_kwargs)

In [None]:
config = {
    # controls the number of trials
    "num_samples": 100,
    "config": {
        "xgbclassifier__seed": 42,
    },
    "stop": {
        "timesteps_total": 100
    }
}

In [None]:
ray.shutdown()
ray.init(num_cpus=6, num_gpus=0, include_webui=True)

In [None]:
from os import path
filepath = path.abspath('../datasets/seg_2016_facies/la_team_5_data.h5py')

def e2e_seg(config):
    return e2e_train_and_test(config, filepath=filepath)

def e2e_seg_w_discrete(config):  
    return e2e_seg(
        parse_config(config)
    )

analysis = tune.run(e2e_seg_w_discrete,
                    name="seg_facies_bayes",
                    search_alg=search_algo,
                    **config)

In [None]:
from pprint import pprint
print("Best config: ")
pprint(analysis.get_best_config(metric="mean_f1_score"))

In [None]:
df = analysis.dataframe()
top_n_df = df.nlargest(10, "mean_f1_score")

In [None]:
top_n_df.head()

In [None]:
plot_some_tune_results(top_n_df, (0.5, 1.0))

In [None]:
%load_ext tensorboard
from tensorboard import notebook 
%tensorboard --logdir "~/ray_results/seg_facies_bayes"
notebook.display(height=1000)

In [None]:
ray.shutdown()