# Grid search

**Runs k-fold cross-validation on every combination of the provided input parameters and returns the combination that performs best according to a given scoring function**
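As a reminder of what the k-fold part involves, here is a minimal sketch of splitting sample indices into k folds. The helper `kfold_indices` is hypothetical and written only for illustration; it uses contiguous, unshuffled folds, which may differ from what `grid_search` does internally.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds (hypothetical helper)."""
    # Spread the remainder over the first folds so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for fold_size in fold_sizes:
        stop = start + fold_size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


folds = list(kfold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Each sample lands in exactly one test fold, so every parameter combination is scored on all of the data once, and the k scores can then be averaged.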

In [1]:
import numpy as np
import math

import log_metrics
from logistic_regressor import LogisticRegressor

from grid_search import grid_search

**Generate dataset**

In [2]:
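size = 100

# The first coefficient is the intercept; it multiplies the column of ones in X
coefficients = [0, -1.4, 2.1, -3, 10.4, -8]
X = np.ones((size, len(coefficients)))
for i in range(1, len(coefficients)):
    X[:, i] = np.random.rand(size)  # uniform features in [0, 1)

# Label = sigmoid of the noisy linear predictor, rounded to 0 or 1
to_label = np.vectorize(lambda x: round(1 / (1 + math.exp(-x))))
y = to_label((X * coefficients).sum(axis=1) + np.random.normal(size=size))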
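The labelling step rounds the sigmoid of the noisy linear predictor, which amounts to thresholding that predictor at zero. A small standalone check, using a few hand-picked values rather than the generated data:

```python
import math


def to_label(z):
    """Round the sigmoid of z to the nearest integer, as in the cell above."""
    return round(1 / (1 + math.exp(-z)))


logits = [-3.0, -0.5, 0.0, 0.5, 3.0]
labels = [to_label(z) for z in logits]
print(labels)
```

Note that `math.exp(-z)` raises `OverflowError` for very large negative `z`; the predictor values in this dataset are far too small in magnitude for that to matter.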

**Estimator**
<br>
The estimator does not need to be fitted beforehand; the grid search fits it for each combination

In [3]:
log = LogisticRegressor()

### Grid search

**Dictionary of input parameters to test**
<br>
Keys are the parameter names (they must match the estimator's input arguments) and each value is a list of the values to try for that parameter

In [4]:
args = {'method':['batch'],'learning_rate':[0.2,0.1,0.01],'epochs':[100,200,500]}

**Perform fitting, using all possible combinations:**
<br>
In this example, there are 1x3x3 = 9 combinations, each fitted 3 times (once per CV fold), for 27 fits in total
<br>
Note: the default scoring function is the estimator's own default (total accuracy for `LogisticRegressor`)
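The number of combinations can be checked by enumerating the grid with `itertools.product`. This is a sketch of the counting only; the names `param_grid`, `combinations` and `n_fits` are chosen here for illustration and say nothing about how `grid_search` is implemented.

```python
from itertools import product

# Same grid as the dictionary defined in the previous cell
param_grid = {
    'method': ['batch'],
    'learning_rate': [0.2, 0.1, 0.01],
    'epochs': [100, 200, 500],
}
cv_k = 3

names = list(param_grid)
# One dict per point of the grid, e.g. {'method': 'batch', 'learning_rate': 0.2, 'epochs': 100}
combinations = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]

n_fits = len(combinations) * cv_k
print(len(combinations), n_fits)
```

Since the count is the product of the list lengths, adding one more value to any list increases the number of fits by the product of the other lengths times `cv_k`.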

In [5]:
grid_search(log,X,y,args,cv_k=3)

{'method': 'batch', 'learning_rate': 0.2, 'epochs': 500}

A single value can be given as a 1-element list (as above) or simply as a value:

In [6]:
args = {'method':'batch','learning_rate':[0.2,0.1,0.01],'epochs':[100,200,500]}

grid_search(log,X,y,args,cv_k=3)

{'method': 'batch', 'learning_rate': 0.2, 'epochs': 500}
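One way to support both forms is to wrap bare values in a list before enumerating the grid. The helper `as_list` below is a hypothetical illustration of that idea, not necessarily what `grid_search` does.

```python
def as_list(value):
    """Wrap a bare value in a list; leave lists and tuples as they are (hypothetical helper)."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


args_scalar = {'method': 'batch', 'learning_rate': [0.2, 0.1, 0.01], 'epochs': [100, 200, 500]}
normalized = {name: as_list(value) for name, value in args_scalar.items()}
print(normalized['method'])
```

The explicit type check matters because a string such as `'batch'` is itself iterable; treating every iterable as a list of candidates would split it into single characters.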

### Scoring functions

**Any scoring function that takes `real_y` and `predictions` and returns a score will work**

In [7]:
def rate_of_false_positives(real_y,predictions):
    # Misclassified samples whose true label is 0 (false positives), as a fraction of all samples
    return (real_y[real_y!=predictions]==0).sum()/len(real_y)

`grid_search(log,X,y,args,cv_k=3,scoring=rate_of_false_positives)` will give the wrong result, because the rate of false positives should be minimized, whereas accuracy (the default) should be maximized

**The user can specify whether the output of the scoring function should be maximized or minimized**

In [8]:
grid_search(log,X,y,args,cv_k=3,scoring=rate_of_false_positives,maximize=False)

{'method': 'batch', 'learning_rate': 0.2, 'epochs': 500}
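The effect of `maximize` can be illustrated with a small selection step over mean scores. The function `pick_best` and the scores below are made up for illustration; they are not output from the run above.

```python
def pick_best(mean_scores, maximize=True):
    """Return the key with the highest (or lowest) mean score (hypothetical helper)."""
    chooser = max if maximize else min
    return chooser(mean_scores, key=mean_scores.get)


# Made-up mean false-positive rates per learning rate, for illustration only
fp_rates = {0.2: 0.05, 0.1: 0.08, 0.01: 0.21}

print(pick_best(fp_rates, maximize=False))  # lowest false-positive rate
print(pick_best(fp_rates, maximize=True))   # what maximizing would pick instead
```

With an error-type metric, maximizing selects the worst candidate, which is why `maximize=False` is needed for scoring functions where lower is better.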