# Scikit-learn

## Grid Search

This section works through how grid search is implemented, following [Source code: scikit-learn's model selection](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/model_selection/_search.py).

In [1]:
# Grid search
from itertools import product
from collections.abc import Mapping
from sklearn.model_selection import ParameterGrid

params_grid = {'a': [1, 2], 'b': [True, False]}

# to also support a list of dictionaries, wrap a single dictionary in a list;
# Mapping matches any object that provides the dictionary interface
# (keys, values, items, etc. methods)
# https://docs.python.org/3/library/collections.abc.html
if isinstance(params_grid, Mapping):
    params_grid = [params_grid]
    
for p in params_grid:
    # for reproducibility, always sort the keys of the dictionary;
    # this gives a list of (key, value) tuples
    items = sorted(p.items())
    print('sorted parameters, values: ', items)
    print()
    
    # unpack the list of pairs into two tuples, so what was originally
    # a list of items [('a', [1, 2]), ('b', [True, False])] becomes
    # two tuples ('a', 'b') and ([1, 2], [True, False]): the first holds the
    # parameter names, the second the list of possible values for each parameter
    # http://stackoverflow.com/questions/7558908/unpacking-a-list-tuple-of-pairs-into-two-lists-tuples
    key, value = zip(*items)
    print('parameters: ', key)
    print('values', value)
    print()
    
    # unpack the tuple of value lists to compute the cartesian product
    # [(1, True), (1, False), (2, True), (2, False)], then zip each
    # combination back with the original keys
    print('grid search parameters')
    cartesian = product(*value)
    for v in cartesian:
        params = dict(zip(key, v))
        print(params)

sorted parameters, values:  [('a', [1, 2]), ('b', [True, False])]

parameters:  ('a', 'b')
values ([1, 2], [True, False])

grid search parameters
{'b': True, 'a': 1}
{'b': False, 'a': 1}
{'b': True, 'a': 2}
{'b': False, 'a': 2}


In [2]:
# confirm with scikit-learn's output
list(ParameterGrid(params_grid))

[{'a': 1, 'b': True},
 {'a': 1, 'b': False},
 {'a': 2, 'b': True},
 {'a': 2, 'b': False}]

In [3]:
# making our function
def _get_params_grid(params_grid):
    """
    Create the cartesian product of the parameters (grid search).
    This is a generator that allows looping through all possible
    parameter combinations; note that if we want to extend this to
    cross validation, we'll have to turn it into a list.
    """
    # for reproducibility, always sort the keys of a dictionary
    items = sorted(params_grid.items())
    
    # unpack the parameter names and their ranges of values
    # into separate tuples; then unpack the ranges of values
    # to compute the cartesian product and zip each
    # combination back with the original keys
    key, value = zip(*items)
    cartesian = product(*value)
    for v in cartesian:
        params = dict(zip(key, v))
        yield params

params_grid = {'a': [1, 2], 'b': [True, False]}
params = _get_params_grid(params_grid)
for p in params:
    print(p)

{'b': True, 'a': 1}
{'b': False, 'a': 1}
{'b': True, 'a': 2}
{'b': False, 'a': 2}
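
The first cell wraps a single dictionary in a list so that a list of grids is also supported, while `_get_params_grid` only takes one dictionary. A minimal sketch of how the two ideas could be combined is shown here; `iter_params_grid` is a hypothetical name used for illustration only and is not part of scikit-learn. It also materializes the generator into a list, which is what the docstring hints at when the combinations need to be reused.

```python
from collections.abc import Mapping
from itertools import product


def iter_params_grid(params_grid):
    """Yield one parameter dictionary per combination (illustrative helper)."""
    # accept either a single grid or a list of grids
    if isinstance(params_grid, Mapping):
        params_grid = [params_grid]

    for grid in params_grid:
        # sort the keys so the iteration order is reproducible
        items = sorted(grid.items())
        if not items:
            # an empty grid describes exactly one combination: no parameters
            yield {}
            continue
        keys, values = zip(*items)
        for combination in product(*values):
            yield dict(zip(keys, combination))


grids = [{'a': [1, 2], 'b': [True, False]}, {'a': [3]}]
# a generator is exhausted after one pass, so store the combinations
# in a list when they are needed more than once
combinations = list(iter_params_grid(grids))
print(len(combinations))
for params in combinations:
    print(params)
```

The number of combinations for each grid is the product of the lengths of its value lists, so the two grids here contribute 2 * 2 and 1 combinations respectively.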


- https://zacharyst.com/2016/03/31/parallelize-a-multifunction-argument-in-python/
- https://pythonhosted.org/joblib/parallel.html

In [4]:
from math import sqrt
from joblib import Parallel, delayed
Parallel(n_jobs=2, verbose=1)(delayed(sqrt)(i ** 2) for i in range(10))

[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    0.0s finished


[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
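
The same `Parallel` / `delayed` pattern can be applied to the parameter grid, with one task per parameter combination. The sketch below is only an illustration under stated assumptions: `score` is a toy function standing in for fitting and evaluating a model, `iter_params_grid` is a hypothetical helper mirroring the generator above, and `prefer="threads"` is chosen just to keep the example lightweight.

```python
from itertools import product

from joblib import Parallel, delayed


def iter_params_grid(params_grid):
    """Yield one parameter dictionary per combination (illustrative helper)."""
    keys, values = zip(*sorted(params_grid.items()))
    for combination in product(*values):
        yield dict(zip(keys, combination))


def score(a, b):
    """Toy stand-in for fitting a model and returning its validation score."""
    return a * 10 + (1 if b else 0)


params_grid = {'a': [1, 2], 'b': [True, False]}
# turn the generator into a list so it can be paired with the results afterwards
candidates = list(iter_params_grid(params_grid))

# one task per parameter combination; results come back in submission order
scores = Parallel(n_jobs=2, prefer="threads")(
    delayed(score)(**params) for params in candidates
)

# pick the combination with the highest score
best_score, best_params = max(
    zip(scores, candidates), key=lambda pair: pair[0]
)
print(best_params, best_score)
```

Because the results are returned in the order the tasks were submitted, zipping `scores` with `candidates` recovers which parameters produced which score.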

## Reference

- [Source code: scikit-learn's model selection](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/model_selection/_search.py)