# Bayes Search CV Example

The `BayesSearchCV` class searches for the set of hyperparameters that produces the best decision engine performance for a given Iguanas pipeline, using cross-validation to reduce the likelihood of overfitting.

The process is as follows:

* Generate k-fold stratified cross-validation datasets.
* For each of the `n_iter` iterations, the Bayesian optimiser chooses a parameter set from the given search spaces. Then, for each training/validation dataset pair:
    * Fit the pipeline on the training set using the chosen parameter set.
    * Apply the fitted pipeline to the validation set to return a prediction.
    * Use the provided `metric` to calculate the score of the prediction.
* Return the parameter set that generated the highest mean score across the validation datasets.
* Refit the pipeline on the full dataset passed to `fit`, using that best parameter set.
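The process above can be sketched in plain Python. This is an illustration, not Iguanas' implementation. It replaces the Bayesian optimiser with a fixed list of candidate parameter sets. The names `stratified_k_folds`, `cv_search` and `fit_predict` are hypothetical.

```python
import random
import statistics


def stratified_k_folds(y, k, seed=0):
    """Split row indices into k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        # Deal each class's indices round-robin so every fold gets a share.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cv_search(X, y, candidate_params, fit_predict, scorer, k=3):
    """Return (best_params, best_mean_score) over the candidate parameter sets.

    `fit_predict(params, X_train, y_train, X_val)` fits a model with `params`
    and returns predictions for X_val; `scorer(y_pred, y_true)` scores them.
    """
    folds = stratified_k_folds(y, k)
    best_params, best_mean = None, float("-inf")
    for params in candidate_params:
        scores = []
        for val_idx in folds:
            val_set = set(val_idx)
            train_idx = [i for i in range(len(y)) if i not in val_set]
            y_pred = fit_predict(
                params,
                [X[i] for i in train_idx],
                [y[i] for i in train_idx],
                [X[i] for i in val_idx],
            )
            scores.append(scorer(y_pred, [y[i] for i in val_idx]))
        mean_score = statistics.mean(scores)
        # Keep the parameter set with the highest mean validation score.
        if mean_score > best_mean:
            best_params, best_mean = params, mean_score
    return best_params, best_mean


# Usage: a toy threshold "model" scored by accuracy.
def threshold_fit_predict(params, X_train, y_train, X_val):
    return [1 if x >= params["threshold"] else 0 for x in X_val]


def accuracy(y_pred, y_true):
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)


X = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
y = [0, 0, 0, 1, 1, 1]
best, score = cv_search(X, y, [{"threshold": 0.9}, {"threshold": 0.5}],
                        threshold_fit_predict, accuracy)
print(best, score)  # {'threshold': 0.5} 1.0
```

`BayesSearchCV` differs in how it picks the next candidate: it uses the scores of earlier trials to do so, rather than iterating over a fixed list.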

In this example, we'll consider the following workflow:

<center><img src="images/workflow_example.png"/></center>

We'll use the `BayesSearchCV` class to optimise the hyperparameters of the steps in this workflow, **aiming to maximise the F1 score of our decision engine.**

---

## Import packages

```python
from iguanas.rule_generation import RuleGeneratorDT
from iguanas.rule_selection import SimpleFilter, CorrelatedFilter, BayesSearchCV
from iguanas.metrics import FScore, JaccardSimilarity
from iguanas.rbs import RBSOptimiser, RBSPipeline
from iguanas.correlation_reduction import AgglomerativeClusteringReducer
from iguanas.pipeline import LinearPipeline
from iguanas.pipeline.class_accessor import ClassAccessor
from iguanas.space import UniformFloat, UniformInteger, Choice

import pandas as pd
from sklearn.model_selection import train_test_split
from category_encoders.one_hot import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
```

## Read in data

Let's read in the famous Titanic data set and split it into training and test sets:

```python
df = pd.read_csv('../../../examples/dummy_data/titanic.csv', index_col='PassengerId')
target_col = 'Survived'
cols_to_drop = ['Name', 'Ticket', 'Cabin']
X = df.drop([target_col] + cols_to_drop, axis=1)
y = df[target_col]
```

```python
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.33,
    random_state=42
)
```

## Data processing

Let's apply the following simple steps to process the data:

* One hot encode categorical variables (accounting for nulls)
* Impute numeric features with -1

```python
# One hot encode categorical columns (fit on the training set only)
encoder = OneHotEncoder(
    use_cat_names=True
)
X_train = encoder.fit_transform(X_train)
X_test = encoder.transform(X_test)

# Impute remaining nulls with -1
X_train.fillna(-1, inplace=True)
X_test.fillna(-1, inplace=True)
```
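The idea behind these two steps can be shown with a small plain-Python sketch. The helpers `fit_one_hot`, `transform_one_hot` and `impute` are hypothetical. They are not how `category_encoders` implements `OneHotEncoder`: that class handles missing and unseen values according to its own settings (such as `handle_missing` and `handle_unknown`). As in the cell above, categories are learned from the training set only and then reused for the test set. A fill value of -1 works as a sentinel here because numeric Titanic features such as `Age` and `Fare` are non-negative.

```python
def fit_one_hot(values):
    """Learn the categories seen in training; missing (None) becomes 'nan'."""
    categories = []
    for v in values:
        key = "nan" if v is None else str(v)
        if key not in categories:
            categories.append(key)
    return categories


def transform_one_hot(values, prefix, categories):
    """Return {f'{prefix}_{category}': [0/1, ...]}; unseen values get all zeros."""
    columns = {f"{prefix}_{c}": [] for c in categories}
    for v in values:
        key = "nan" if v is None else str(v)
        for c in categories:
            columns[f"{prefix}_{c}"].append(1 if key == c else 0)
    return columns


def impute(values, fill=-1):
    """Replace missing numeric values with a sentinel outside the real range."""
    return [fill if v is None else v for v in values]


train_embarked = ["S", "C", None, "S"]
cats = fit_one_hot(train_embarked)  # learned from training data only
print(transform_one_hot(["C", "Q"], "Embarked", cats))
# {'Embarked_S': [0, 0], 'Embarked_C': [1, 0], 'Embarked_nan': [0, 0]}
print(impute([22.0, None, 4.0]))  # [22.0, -1, 4.0]
```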

----

## Set up pipeline

Before we can apply the `BayesSearchCV` class, we need to set up our pipeline. Let's first instantiate the classes we'll be using in our pipeline:

```python
# Optimisation metric
f1 = FScore(beta=1)

# Rule generation
generator = RuleGeneratorDT(
    metric=f1.fit,
    n_total_conditions=4,
    tree_ensemble=RandomForestClassifier(
        n_estimators=10,
        random_state=0
    )
)

# Rule filter (performance-based)
simple_filterer = SimpleFilter(
    threshold=0.1,
    operator='>=',
    metric=f1.fit
)

# Rule filter (correlation-based)
js = JaccardSimilarity()
corr_filterer = CorrelatedFilter(
    correlation_reduction_class=AgglomerativeClusteringReducer(
        threshold=0.9,
        strategy='top_down',
        similarity_function=js.fit,
        metric=f1.fit
    )
)

# Decision engine (to be optimised)
rbs_pipeline = RBSPipeline(
    config=[],
    final_decision=0
)

# Decision engine optimiser
rbs_optimiser = RBSOptimiser(
    pipeline=rbs_pipeline,
    metric=f1.fit,
    pos_pred_rules=ClassAccessor(
        class_tag='corr_filterer',
        class_attribute='rules_to_keep'
    ),
    rules=ClassAccessor(
        class_tag='generator',
        class_attribute='rules'
    ),
    n_iter=10
)
```

**Note:** The arguments passed to the `pos_pred_rules` and `rules` parameters in the `RBSOptimiser` class are `ClassAccessor` objects. This object extracts the specified attribute from the given class in the pipeline. This allows users to pass attributes from earlier steps in the pipeline as parameters of later steps in the pipeline.

In this example, the names of the rules that remain after the `corr_filterer` step are passed to the `pos_pred_rules` parameter of the `RBSOptimiser` class, so that the `RBSOptimiser` knows which rules predict positive cases. Arguably this doesn't need to be specified here, since we only have one type of rule set. However, when you have a mix of rules (some that predict positive cases and some that predict negative cases), you must specify which rules predict which case, using the `pos_pred_rules` and `neg_pred_rules` parameters.

Also, the `rules` attribute created in the `generator` step is passed to the `rules` parameter of the `RBSOptimiser` class. This means the rules remaining after the decision engine optimisation can easily be extracted from the trained pipeline.

Now we can create the steps of our pipeline. Each step should be a tuple of two elements:

1. The first element should be a string which refers to the step.
2. The second element should be the instantiated class which is run as part of the pipeline.
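The following toy sketch shows how tagged steps and a deferred attribute lookup like `ClassAccessor` can fit together. All of its names are hypothetical: `Accessor`, `ToyLinearPipeline`, `ToyGenerator` and `ToyFilter`. It assumes each step exposes its parameters as a `params` dict and has a `fit_transform` method, which is a simplification of the real Iguanas classes. It is not how `LinearPipeline` is implemented.

```python
class Accessor:
    """Hypothetical stand-in for ClassAccessor: a deferred attribute lookup."""

    def __init__(self, class_tag, class_attribute):
        self.class_tag = class_tag
        self.class_attribute = class_attribute

    def resolve(self, fitted_steps):
        return getattr(fitted_steps[self.class_tag], self.class_attribute)


class ToyLinearPipeline:
    """Fits (tag, step) pairs in order, resolving Accessor params on the way."""

    def __init__(self, steps):
        tags = [tag for tag, _ in steps]
        if len(tags) != len(set(tags)):
            raise ValueError("step tags must be unique")
        self.steps = steps

    def fit(self, X):
        fitted = {}
        for tag, step in self.steps:
            # Swap each Accessor for the attribute of an earlier, fitted step.
            kwargs = {
                name: value.resolve(fitted) if isinstance(value, Accessor) else value
                for name, value in step.params.items()
            }
            X = step.fit_transform(X, **kwargs)
            fitted[tag] = step
        return X


class ToyGenerator:
    def __init__(self):
        self.params = {}

    def fit_transform(self, X):
        self.rules = ["Rule_A", "Rule_B", "Rule_C"]  # pretend these were learned
        return X


class ToyFilter:
    def __init__(self, rules, n_keep):
        self.params = {"rules": rules, "n_keep": n_keep}

    def fit_transform(self, X, rules, n_keep):
        self.rules_to_keep = rules[:n_keep]
        return X


gen = ToyGenerator()
flt = ToyFilter(rules=Accessor("generator", "rules"), n_keep=2)
ToyLinearPipeline([("generator", gen), ("filterer", flt)]).fit([0.2, 0.7])
print(flt.rules_to_keep)  # ['Rule_A', 'Rule_B']
```

Resolving the accessor into a separate `kwargs` dict leaves the placeholder in `params` untouched, so the same toy pipeline could be refitted with different data.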

```python
steps = [
    ('generator', generator),
    ('simple_filterer', simple_filterer),
    ('corr_filterer', corr_filterer),
    ('rbs_optimiser', rbs_optimiser)
]
```

Finally, we can instantiate our pipeline:

```python
lp = LinearPipeline(steps=steps)
```

## Define the search space

Now we need to define the search space for each of the relevant parameters of our pipeline. To do this, we create a dictionary, where each key corresponds to the tag used for the relevant pipeline step. Each value should be a dictionary of the parameters (keys) and their search spaces (values). Search spaces should be defined using the classes in the `iguanas.space` module:

```python
search_spaces = {
    'generator': {
        'n_total_conditions': UniformInteger(1, 5),
        'target_feat_corr_types': Choice([
            'Infer',
            None
        ])
    },
    'simple_filterer': {
        'threshold': UniformFloat(0, 1),
    },
    'corr_filterer': {
        'correlation_reduction_class': Choice([
            AgglomerativeClusteringReducer(
                threshold=0.5,
                strategy='top_down',
                similarity_function=js.fit,
                metric=f1.fit
            ),
            AgglomerativeClusteringReducer(
                threshold=0.9,
                strategy='top_down',
                similarity_function=js.fit,
                metric=f1.fit
            )
        ])
    },
}
```

Based on the search spaces above, we'll be optimising the following parameters across the following ranges:

* **generator**
    * `n_total_conditions`: Integers from 1 to 5
    * `target_feat_corr_types`: Either 'Infer' or None.
* **simple_filterer**
    * `threshold`: Floats from 0 to 1
* **corr_filterer**
    * `correlation_reduction_class`: `AgglomerativeClusteringReducer` classes with either a `threshold` of 0.5 or 0.9.
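A minimal sketch of how a nested `search_spaces` dictionary can be sampled and then flattened into `step_tag__parameter` keys, matching the column names that appear later in `cv_results`. The `Toy*` classes are hypothetical stand-ins, not the `iguanas.space` classes. The sketch draws each parameter uniformly at random, whereas `BayesSearchCV` uses earlier trial scores to choose the next parameter set.

```python
import random


class ToyUniformInteger:
    def __init__(self, low, high):
        self.low, self.high = low, high

    def sample(self, rng):
        return rng.randint(self.low, self.high)  # inclusive of both ends


class ToyUniformFloat:
    def __init__(self, low, high):
        self.low, self.high = low, high

    def sample(self, rng):
        return rng.uniform(self.low, self.high)


class ToyChoice:
    def __init__(self, options):
        self.options = list(options)

    def sample(self, rng):
        return rng.choice(self.options)


def sample_params(search_spaces, rng):
    """Draw one value per parameter, keeping the {step_tag: {param: value}} shape."""
    return {
        tag: {name: space.sample(rng) for name, space in params.items()}
        for tag, params in search_spaces.items()
    }


def flatten(params):
    """Flatten to 'step_tag__param' keys, like the cv_results column names."""
    return {f"{tag}__{name}": value
            for tag, step_params in params.items()
            for name, value in step_params.items()}


spaces = {
    "generator": {
        "n_total_conditions": ToyUniformInteger(1, 5),
        "target_feat_corr_types": ToyChoice(["Infer", None]),
    },
    "simple_filterer": {"threshold": ToyUniformFloat(0, 1)},
}
params = sample_params(spaces, random.Random(0))
print(flatten(params))
```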

## Optimise the pipeline hyperparameters

Now that we have our pipeline and search spaces defined, we can instantiate the `BayesSearchCV` class. We'll split our data into 3 cross-validation datasets and try 20 different parameter sets:

```python
bs = BayesSearchCV(
    pipeline=lp,
    search_spaces=search_spaces,
    metric=f1.fit,
    cv=3,
    n_iter=20,
    num_cores=3,
    error_score=0,
    verbose=1
)
```
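The `error_score=0` argument is not explained above. Going by the name and by scikit-learn's parameter of the same name, we assume it works like this: if fitting or scoring the pipeline fails for a parameter set, that fold receives the given score instead of the whole search stopping. That is an assumption, not confirmed Iguanas behaviour. Such a failure is plausible here, since a high `simple_filterer` threshold could remove every rule. A sketch of that behaviour, with the hypothetical helpers `score_folds` and `evaluate_fold`:

```python
def score_folds(evaluate_fold, n_folds, error_score=0.0):
    """Score each fold, substituting `error_score` when evaluation raises."""
    scores = []
    for fold in range(n_folds):
        try:
            scores.append(evaluate_fold(fold))
        except Exception:
            # A failing parameter set gets a fixed score instead of aborting.
            scores.append(error_score)
    return scores


def evaluate_fold(fold):
    """Hypothetical fold evaluation; fold 1 fails, e.g. because no rules survive."""
    if fold == 1:
        raise ValueError("no rules left to optimise")
    return 0.6


print(score_folds(evaluate_fold, 3, error_score=0))  # [0.6, 0, 0.6]
```

With `error_score=0`, a parameter set that fails on any fold is pulled down in the mean-score ranking rather than crashing the search.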

Finally, we can run the `fit` method to optimise the hyperparameters of the pipeline:

```python
bs.fit(X_train, y_train)
```

```
--- Optimising pipeline parameters ---
100%|██████████| 20/20 [00:23<00:00,  1.16s/trial, best loss: -0.6965755602560381]
--- Refitting on entire dataset with best pipeline ---
```


### Outputs

The `fit` method doesn't return anything. See the `Attributes` section in the class docstring for a description of each attribute generated:

```python
bs.best_score
```

```
0.6965755602560381
```
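The progress output during `fit` reported `best loss: -0.6965755602560381`, which is exactly the negative of `best_score`. This suggests the optimiser minimises a loss defined as the negated mean cross-validation score. The output format resembles that of the hyperopt library, whose `fmin` minimises an objective. A small sketch of that relationship, using the hypothetical helpers `to_loss` and `best_from_losses` and made-up scores:

```python
def to_loss(mean_cv_score):
    """A minimising optimiser needs a loss, so a higher score becomes a lower loss."""
    return -mean_cv_score


def best_from_losses(losses_by_trial):
    """Return (best_trial, best_score) from {trial_index: loss}; ties keep the first."""
    best_trial = min(losses_by_trial, key=losses_by_trial.get)
    return best_trial, -losses_by_trial[best_trial]


# Hypothetical mean CV scores for three trials.
losses = {0: to_loss(0.70), 1: to_loss(0.65), 2: to_loss(0.70)}
print(best_from_losses(losses))  # (0, 0.7)
```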

```python
bs.best_params
```

```
{'corr_filterer': {'correlation_reduction_class': AgglomerativeClusteringReducer(threshold=0.5, strategy=top_down, similarity_function=<bound method JaccardSimilarity.fit of JaccardSimilarity>, metric=<bound method FScore.fit of FScore with beta=1>, print_clustermap=False)},
 'generator': {'n_total_conditions': 2.0, 'target_feat_corr_types': 'Infer'},
 'simple_filterer': {'threshold': 0.4860473230215504}}
```

```python
bs.best_index
```

```
0
```

```python
bs.cv_results.head()
```

| | `Params` | `corr_filterer__correlation_reduction_class` | `generator__n_total_conditions` | `generator__target_feat_corr_types` | `simple_filterer__threshold` | `FoldIdx` | `Scores` | `MeanScore` | `StdDevScore` |
|---|---|---|---|---|---|---|---|---|---|
| 0 | {'corr_filterer': {'correlation_reduction_clas... | AgglomerativeClusteringReducer(threshold=0.5, ... | 2.0 | Infer | 0.486047 | [0, 1, 2] | [0.7092198581560283, 0.6842105263157895, 0.696... | 0.696576 | 0.010212 |
| 14 | {'corr_filterer': {'correlation_reduction_clas... | AgglomerativeClusteringReducer(threshold=0.5, ... | 5.0 | Infer | 0.644417 | [0, 1, 2] | [0.7092198581560283, 0.6842105263157895, 0.696... | 0.696576 | 0.010212 |
| 19 | {'corr_filterer': {'correlation_reduction_clas... | AgglomerativeClusteringReducer(threshold=0.5, ... | 2.0 | | 0.497632 | [0, 1, 2] | [0.7092198581560283, 0.6615384615384615, 0.691... | 0.687496 | 0.019695 |
| 13 | {'corr_filterer': {'correlation_reduction_clas... | AgglomerativeClusteringReducer(threshold=0.9, ... | 2.0 | Infer | 0.388521 | [0, 1, 2] | [0.6783625730994152, 0.6153846153846154, 0.682... | 0.658794 | 0.030745 |
| 18 | {'corr_filterer': {'correlation_reduction_clas... | AgglomerativeClusteringReducer(threshold=0.5, ... | 4.0 | | 0.30485 | [0, 1, 2] | [0.6511627906976745, 0.5925925925925927, 0.694... | 0.646224 | 0.041919 |
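A sketch of how the `MeanScore` and `StdDevScore` columns could be derived from the per-fold `Scores`, and how trials could be ranked best-first as in the table. It assumes an arithmetic mean and a population standard deviation (ddof=0, as `numpy.std` computes by default). The helpers `summarise_folds` and `rank_trials` are hypothetical, and the fold scores below are made up.

```python
import statistics


def summarise_folds(scores):
    """Mean and population standard deviation (ddof=0) of per-fold scores."""
    return statistics.mean(scores), statistics.pstdev(scores)


def rank_trials(results):
    """Sort {trial_index: fold_scores} by mean score, best first (stable on ties)."""
    summaries = {trial: summarise_folds(scores) for trial, scores in results.items()}
    return sorted(summaries.items(), key=lambda item: item[1][0], reverse=True)


# Hypothetical per-fold scores for three trials.
results = {0: [0.70, 0.68, 0.70], 1: [0.60, 0.64, 0.62], 2: [0.70, 0.68, 0.70]}
for trial, (mean, std) in rank_trials(results):
    print(trial, round(mean, 4), round(std, 4))
```

Python's `sorted` stays stable with `reverse=True`, so tied trials keep their original order. Trials 0 and 14 in the table above tie in the same way.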


To see the final optimised decision engine configuration and rule set, we first retrieve the parameters of the trained pipeline (stored in the attribute `pipeline_`):

```python
pipeline_params = bs.pipeline_.get_params()
```

Then, to see the final optimised decision engine configuration, we filter to the `config` parameter of the `rbs_optimiser` step:

```python
final_config = pipeline_params['rbs_optimiser']['config']
final_config
```

```
[(1, ['RGDT_Rule_20220301_18'])]
```

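This shows which rules are used to assign a decision of `1` (here, a prediction that the passenger survived). Passengers not flagged by these rules receive the `final_decision` of `0` set in the `RBSPipeline`.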

To see the logic of our final set of rules, we filter to the `rules` parameter of the `rbs_optimiser` step:

```python
final_rules = pipeline_params['rbs_optimiser']['rules']
```

Then extract the `rule_strings` attribute:

```python
final_rules.rule_strings
```

```
{'RGDT_Rule_20220301_18': "(X['Sex_male']==False)"}
```
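Combining the configuration and the rule gives a toy decision engine that applies them to individual passengers. We assume `RBSPipeline` works as follows: it checks config entries in order, gives a record the entry's decision if any of that entry's rules flags it, and falls back to `final_decision` otherwise. The `apply_config` helper is hypothetical, and the rule is written as a Python predicate mirroring the rule string.

```python
def apply_config(records, config, rules, final_decision=0):
    """Toy decision engine: the first (decision, rule_names) entry that fires wins."""
    decisions = []
    for record in records:
        decision = final_decision  # used when no rule in the config fires
        for outcome, rule_names in config:
            if any(rules[name](record) for name in rule_names):
                decision = outcome
                break
        decisions.append(decision)
    return decisions


# The optimised config and rule, with the rule written as a predicate
# mirroring the rule string (X['Sex_male']==False).
config = [(1, ["RGDT_Rule_20220301_18"])]
rules = {"RGDT_Rule_20220301_18": lambda record: record["Sex_male"] == 0}

passengers = [{"Sex_male": 1}, {"Sex_male": 0}]
print(apply_config(passengers, config, rules))  # [0, 1]
```

In other words, the optimised engine predicts survival for every passenger who is not male.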

## Apply the optimised pipeline

We can apply our optimised pipeline to a new data set and make a prediction using the `predict` method:

```python
y_pred_test = bs.predict(X_test)
```

### Outputs

The `predict` method returns the prediction generated by the class in the final step of the pipeline (in this case, the `RBSOptimiser`):

```python
y_pred_test
```

```
PassengerId
710    0
440    0
841    0
721    1
40     1
      ..
716    0
526    0
382    1
141    1
174    0
Length: 295, dtype: int64
```

We can now calculate the F1 score of our optimised pipeline using the test data:

```python
f1_opt = f1.fit(y_pred_test, y_test)
```
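For reference, F1 (the `FScore` metric with `beta=1`) is the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). The plain-Python sketch below, `f1_score`, is a hypothetical helper, not Iguanas' `FScore`. It takes predictions first and true labels second, mirroring the `f1.fit(y_pred_test, y_test)` call above. Its handling of the no-true-positives case is our own choice and may differ from `FScore`.

```python
def f1_score(y_pred, y_true):
    """F1 = 2TP / (2TP + FP + FN), with 1 as the positive class."""
    tp = sum(1 for p, t in zip(y_pred, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(y_pred, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(y_pred, y_true) if p == 0 and t == 1)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    return 2 * tp / (2 * tp + fp + fn)


print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```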

Let's compare this with our original, unoptimised pipeline:

```python
# Fit the original, unoptimised pipeline on the training data
lp.fit(X_train, y_train, None)
y_pred_test_init = lp.predict(X_test)
```

```python
f1_init = f1.fit(y_pred_test_init, y_test)
```

```python
print(f'F1 score of original, unoptimised pipeline: {round(f1_init, 2)}')
print(f'F1 score of optimised pipeline: {round(f1_opt, 2)}')
print(f'Percentage improvement in F1 score: {round(100*(f1_opt-f1_init)/f1_init, 2)}%')
```

```
F1 score of original, unoptimised pipeline: 0.57
F1 score of optimised pipeline: 0.74
Percentage improvement in F1 score: 30.39%
```
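The percentage improvement is computed from the unrounded scores `f1_opt` and `f1_init`, not from the two-decimal values printed above it. Recomputing it from the rounded values therefore gives a slightly different figure. A quick check, using a hypothetical helper `pct_improvement` that mirrors the expression in the cell:

```python
def pct_improvement(new, old):
    """Relative change from `old` to `new`, in percent."""
    return 100 * (new - old) / old


print(round(pct_improvement(0.74, 0.57), 2))  # 29.82
```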


---