# Model Evaluation
The terms model, class, estimator, and predictor have been used interchangeably, but never introduced formally. A model is any formula or algorithm designed to represent the mechanics of your data. SciKit-Learn implements many machine learning models as Python classes, so they can be instantiated and used as objects. To keep the API clean, most of these classes follow a similar paradigm and interface, as a result of inheriting from a single estimator base class. This is why SciKit-Learn's documentation refers to all of the machine learning methods in the library as estimators. Lastly, when dealing with data that come with attributes you want to learn to predict, as in supervised learning problems, the estimators designed to handle this are called predictors.
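
As a small illustration of this shared interface, the sketch below checks a few estimators against a common base class. The estimators are chosen only as examples, and `BaseEstimator` is assumed here to be the base class the paragraph refers to.

```python
from sklearn.base import BaseEstimator
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# A handful of estimators, chosen only for illustration
estimators = [PCA(), KNeighborsClassifier(), SVC(), DecisionTreeClassifier()]

# Each one derives from the same estimator base class
all_are_estimators = all(isinstance(est, BaseEstimator) for est in estimators)

# Predictors are the estimators that expose .predict()
predictor_names = [type(est).__name__ for est in estimators if hasattr(est, "predict")]

print(all_are_estimators)
print(predictor_names)
```

PCA is an estimator but not a predictor, which is why it drops out of the second list.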

Typically, algorithm choice is dictated by a balance of factors:

- The dimensionality of your data
- The geometric nature of your data
- The types of features used to represent your data
- The number of training samples you have at your disposal
- The required training and prediction speeds needed for your purposes
- The predictive accuracy level desired
- How configurable you need your model to be
- And much more

In the wild, the best process depends on how many samples you have at your disposal and the machine learning algorithms you are using. It is usually one of the following two:

1. Split your data into training, validation, and testing sets.
2. Set up a pipeline, and fit it with your training set.
3. Assess the accuracy of its output using your validation set.
4. Fine-tune this accuracy by adjusting the hyperparameters of your pipeline.
5. When you're comfortable with its accuracy, finally evaluate your pipeline with the testing set.

OR

1. Split your data into training and testing sets.
2. Set up a pipeline with CV and fit / score it with your training set.
3. Fine-tune this accuracy by adjusting the hyperparameters of your pipeline.
4. When you're comfortable with its accuracy, finally evaluate your pipeline with the testing set.
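
The first of the two processes can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the iris dataset, the `StandardScaler` + `SVC` pipeline, the split ratios, and the candidate values of `C` are all illustrative choices rather than anything the process prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 1. Hold out a testing set, then carve a validation set out of the remainder
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# 2. Set up a pipeline; it is only ever fit with the training set
pipeline = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# 3-4. Assess candidate hyperparameters on the validation set and keep the best
best_C, best_val_score = None, -1.0
for C in (0.1, 1, 10):
    pipeline.set_params(svc__C=C)
    val_score = pipeline.fit(X_train, y_train).score(X_val, y_val)
    if val_score > best_val_score:
        best_C, best_val_score = C, val_score

# 5. Refit with the chosen setting and evaluate once on the testing set
pipeline.set_params(svc__C=best_C)
test_score = pipeline.fit(X_train, y_train).score(X_test, y_test)
print(best_C, best_val_score, test_score)
```

The testing set is touched exactly once, at the very end, so the final score is not influenced by the tuning loop.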

## Multiprocessing

### Simple multiprocessing example
```python
import multiprocessing

def worker():
    """worker function"""
    print('Worker')
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
```
### Benchmark several models at the same time
```python
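# Assumes `benchmark` and the models knn, svc, and tr are defined elsewhere,
# and that `multiprocessing` has been imported as in the example above.
knn_job = multiprocessing.Process(target=benchmark, args=(knn, X_train, X_test, y_train, y_test, 'KNeighbors',))
svc_job = multiprocessing.Process(target=benchmark, args=(svc, X_train, X_test, y_train, y_test, 'SVC',))
tr_job = multiprocessing.Process(target=benchmark, args=(tr, X_train, X_test, y_train, y_test, 'Tree',))
knn_job.start()
svc_job.start()
tr_job.start()

# Wait for all three benchmarks to finish
for job in (knn_job, svc_job, tr_job):
    job.join()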
```

## Accuracy
```python 
>>> from sklearn.metrics import accuracy_score

# Returns an array of predictions:
>>> predictions = my_model.predict(data_test)
>>> predictions
[0, 0, 0, 1, 0]

# The actual answers:
>>> label_test
[1, 1, 0, 0, 0]

>>> accuracy_score(label_test, predictions)
0.4

>>> accuracy_score(label_test, predictions, normalize=False)
2

>>> my_model.score(data_test, label_test)
0.4
```
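
To make the number above concrete, here is the same accuracy computed by hand in plain Python, using the predictions and labels from the example. It is only a sketch of what the metric measures, not of how SciKit-Learn implements it.

```python
predictions = [0, 0, 0, 1, 0]
label_test = [1, 1, 0, 0, 0]

# Count the positions where the prediction matches the actual label
correct = sum(1 for truth, pred in zip(label_test, predictions) if truth == pred)

# Accuracy is the fraction of correct predictions
accuracy = correct / len(label_test)

print(correct)   # 2
print(accuracy)  # 0.4
```

The count of correct predictions corresponds to `normalize=False`, and the fraction corresponds to the default behaviour.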

## Recall
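The recall score is the ratio true_positives / (true_positives + false_negatives). The examples in this and the next few sections use the metrics import and the y_true and y_pred lists defined in the Confusion Matrix section below: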
```python
>>> metrics.recall_score(y_true, y_pred, average='weighted')
0.5

>>> metrics.recall_score(y_true, y_pred, average=None)
array([ 1. ,  0. ,  0.5])
```

## Precision
You can also calculate the precision score. It is defined very similarly: true_positives / (true_positives + false_positives). The only difference is the last term in the denominator:
```python
>>> metrics.precision_score(y_true, y_pred, average='weighted')
0.38888888888888884

>>> metrics.precision_score(y_true, y_pred, average=None)
array([ 0.66666667,  0.        ,  0.5       ])
```

## F1 Score
The F1 Score is the harmonic mean of precision and recall. Defined as 2 * (precision * recall) / (precision + recall), its best possible value is 1 and its worst possible value is 0:

```python
>>> metrics.f1_score(y_true, y_pred, average='weighted')
0.43333333333333335

>>> metrics.f1_score(y_true, y_pred, average=None)
array([ 0.8,  0. ,  0.5])
```
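
The per-class numbers in the last three sections can be reproduced by hand from the definitions. The sketch below uses the same y_true and y_pred values as the examples. The helper name `per_class_scores` is made up for this illustration, and a zero denominator is simply treated as a score of 0 here.

```python
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 1, 1, 3, 2, 3]
labels = sorted(set(y_true))

def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, f1) for one label, treating it as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

scores = {label: per_class_scores(y_true, y_pred, label) for label in labels}

# 'weighted' averaging weights each class by its number of true samples
support = {label: y_true.count(label) for label in labels}
weighted_recall = sum(scores[l][1] * support[l] for l in labels) / len(y_true)

for label in labels:
    print(label, scores[label])
print(weighted_recall)
```

Each class has two true samples here, so the weighted average is the plain mean of the per-class values.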

## Full Report
```python
>>> target_names = ['Fruit 1', 'Fruit 2', 'Fruit 3']
>>> print(metrics.classification_report(y_true, y_pred, target_names=target_names))
```

## Confusion Matrix
```python
>>> import sklearn.metrics as metrics
>>> y_true = [1, 1, 2, 2, 3, 3]  # Actual, observed testing dataset values
>>> y_pred = [1, 1, 1, 3, 2, 3]  # Predicted values from your model
```
Here the three label values are encoded data representing cats, dogs, and monkeys. A confusion matrix counts, for each true label (row), how often each label was predicted (column). An important thing to realize is that all of the off-diagonal elements of the matrix correspond to misclassified targets. Given this information, you can derive probabilities relating to how accurate your answers are. You can compute a confusion matrix using SciKit-Learn as follows:

```python
>>> metrics.confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [1, 0, 1],
       [0, 1, 1]])
```
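
As a rough check on how to read the matrix, the sketch below rebuilds it in plain Python and separates the diagonal from the off-diagonal counts. It assumes the rows-are-true, columns-are-predicted layout and is not SciKit-Learn's implementation.

```python
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 1, 1, 3, 2, 3]
labels = sorted(set(y_true) | set(y_pred))
index = {label: i for i, label in enumerate(labels)}

# Rows are true labels, columns are predicted labels
matrix = [[0] * len(labels) for _ in labels]
for truth, pred in zip(y_true, y_pred):
    matrix[index[truth]][index[pred]] += 1

# Diagonal entries are correct predictions; everything else is a misclassification
correct = sum(matrix[i][i] for i in range(len(labels)))
misclassified = len(y_true) - correct
accuracy = correct / len(y_true)

print(matrix)
print(correct, misclassified, accuracy)
```
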
### Visualize Confusion Matrix
```python
>>> import matplotlib.pyplot as plt

>>> columns = ['Cat', 'Dog', 'Monkey']
>>> confusion = metrics.confusion_matrix(y_true, y_pred)

>>> plt.imshow(confusion, cmap=plt.cm.Blues, interpolation='nearest')
>>> plt.xticks([0,1,2], columns, rotation='vertical')
>>> plt.yticks([0,1,2], columns)
>>> plt.colorbar()

>>> plt.show()
```

## Random Split Train / Test Data
```python
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Test how well your model can recall its training data:
>>> model.fit(X_train, y_train).score(X_train, y_train)
0.943262278808

# Test how well your model can predict unseen data
# (score on the test set without refitting on it):
>>> model.score(X_test, y_test)
0.894716422024
```

## Cross validation
SciKit-Learn provides a convenient method for this: cross_val_score(). This method takes as input your model along with your training dataset and performs K-fold cross validation on it. In other words, your training data is first cut into "K" sets. Then "K" versions of your model are trained, each on a different K-1 of the "K" available sets. Each model is evaluated with the one remaining set, its held-out set. If this sounds familiar, it's because the idea resembles the out-of-bag evaluation used in random forest, although K-fold splits the data without replacement instead of bootstrapping.

```python
# 10-Fold Cross Validation on your training data
>>> from sklearn import model_selection as cval
>>> cval.cross_val_score(model, X_train, y_train, cv=10)
array([ 0.93513514,  0.99453552,  0.97237569,  0.98888889,  0.96089385,
        0.98882682,  0.99441341,  0.98876404,  0.97175141,  0.96590909])

>>> cval.cross_val_score(model, X_train, y_train, cv=10).mean()
0.97614938602520218
```
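
The splitting step of K-fold cross validation can be sketched in plain Python. The helper `k_fold_indices` is hypothetical and written only for this illustration. It uses contiguous, unshuffled folds and is not SciKit-Learn's implementation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    indices = list(range(n_samples))
    # The first n_samples % k folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample lands in exactly one test fold, and each of the K models is trained on the other K-1 folds.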

Cross validation allows you to use all the data you provide as both training and testing data, so many resources online will recommend that you skip the extra step of splitting your data into training and testing sets and just feed the lot directly into your cross validator. There are advantages and disadvantages to this. The main advantage is the overall simplicity of your process. The disadvantage is that it is still possible for some information to leak into your training dataset, as we discussed above with the SVC example. This information leak might even occur before you fit your model, for example at the point of transforming your data using isomap or principal component analysis.

For these reasons, the SciKit-Learn documentation recommends you still keep a completely separate testing set to conduct scoring after cross validating your models. Make sure you read through the documentation before going through the knowledge checks, particularly for the cv parameter and the different types of cross-validation iterators. For example, it's absolutely critical that you know the difference between KFold and StratifiedKFold before you start using cross_val_score(), to avoid introducing a serious bug into your machine learning!
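
To illustrate the difference between KFold and StratifiedKFold, the sketch below counts the classes in each test fold of a small toy dataset. The dataset, 12 samples sorted by class, is an assumption chosen to make the effect obvious.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy data: 12 samples sorted by class, 6 of class 0 followed by 6 of class 1
X = np.arange(12).reshape(-1, 1)
y = np.array([0] * 6 + [1] * 6)

def class_counts_per_test_fold(splitter):
    """Return [count of class 0, count of class 1] for each test fold."""
    return [np.bincount(y[test_idx], minlength=2).tolist()
            for _, test_idx in splitter.split(X, y)]

kfold_counts = class_counts_per_test_fold(KFold(n_splits=3))
stratified_counts = class_counts_per_test_fold(StratifiedKFold(n_splits=3))

print(kfold_counts)       # [[4, 0], [2, 2], [0, 4]]
print(stratified_counts)  # [[2, 2], [2, 2], [2, 2]]
```

With unshuffled KFold on sorted labels, the first fold tests only on class 0 after training mostly on class 1. StratifiedKFold keeps the class proportions the same in every fold. According to the SciKit-Learn documentation, passing an integer as cv to cross_val_score() uses StratifiedKFold for classifiers and KFold otherwise.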

## Hyperparameter Tuning

### Naive Best Parameter Search

This piece can be reused for SVC:
```python
import numpy as np
from sklearn.svm import SVC

# Candidate values for the SVC hyperparameters
C_l = np.arange(0.05,2,0.05)
gamma_l = np.arange(0.001,0.1,0.001)

best_score = 0

for C in C_l:
    for gamma in gamma_l:
        model = SVC(kernel='rbf', C=C, gamma=gamma)
        model.fit(X_train, y_train)
        tmp_score = model.score(X_test, y_test)
        if tmp_score > best_score:
            best_score = tmp_score
            print('Score: %f C: %f gamma: %f' % (best_score,C,gamma))
```

The same search, extended to include PCA or Isomap:

```python
from sklearn import manifold
from sklearn.svm import SVC

best_score = 0

# iso_n_components_l, iso_n_neighbors_l, C_l and gamma_l are assumed to be defined already
for n_components in iso_n_components_l:
    for n_neighbors in iso_n_neighbors_l:
        f_eng = manifold.Isomap(n_neighbors=n_neighbors, n_components=n_components)
        #f_eng = PCA(n_components=n_components)
        f_eng.fit(X_train)
        # Keep the originals intact so every iteration transforms the raw data
        X_train_t = f_eng.transform(X_train)
        X_test_t = f_eng.transform(X_test)

        for C in C_l:
            for gamma in gamma_l:
                model = SVC(kernel='rbf', C=C, gamma=gamma)
                model.fit(X_train_t, y_train)
                tmp_score = model.score(X_test_t, y_test)
                if tmp_score > best_score:
                    best_score = tmp_score
                    print('Score: %f C: %f gamma: %f n_components: %d n_neighbors: %d' % (best_score,C,gamma,n_components,n_neighbors))
```

## GridSearchCV
GridSearchCV takes care of your parameter tuning and also tacks on end-to-end cross validation. This results in more precisely tuned parameters than depending on simple model accuracy scores, and is why the algorithm is named Grid-Search-CV.
```python
>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import GridSearchCV

>>> iris = datasets.load_iris()
>>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 5, 10]}
>>> model = svm.SVC()

>>> classifier = GridSearchCV(model, parameters)
>>> classifier.fit(iris.data, iris.target)
# The printed representation varies by SciKit-Learn version; an older release shows:
GridSearchCV(cv=None, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params={}, iid=True, n_jobs=1,
       param_grid={'kernel': ('linear', 'rbf'), 'C': [1, 5, 10]},
       pre_dispatch='2*n_jobs', refit=True, scoring=None, verbose=0)
```

## RandomizedSearchCV
Unlike GridSearchCV, where you can pass a list of grid objects (and so perform multiple grid optimizations consecutively), with RandomizedSearchCV you pass in your parameters as a single dictionary that holds either possible discrete parameter values or distributions over them.
```python
>>> import scipy.stats
>>> from sklearn.model_selection import RandomizedSearchCV

>>> parameter_dist = {
...   'C': scipy.stats.expon(scale=100),
...   'kernel': ['linear'],
...   'gamma': scipy.stats.expon(scale=.1),
... }

>>> classifier = RandomizedSearchCV(model, parameter_dist)
>>> classifier.fit(iris.data, iris.target)
# The printed representation varies by SciKit-Learn version; an older release shows:
RandomizedSearchCV(cv=None, error_score='raise',
          estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
          fit_params={}, iid=True, n_iter=10, n_jobs=1,
          param_distributions={'kernel': ['linear'], 'C': <scipy.stats._distn_infrastructure.rv_frozen object at 0x110345c50>,
 'gamma': <scipy.stats._distn_infrastructure.rv_frozen object at 0x110345d90>},
          pre_dispatch='2*n_jobs', random_state=None, refit=True,
          scoring=None, verbose=0)
```

RandomizedSearchCV also takes an optional n_iter parameter you can use to control the number of parameter settings that are sampled. Regardless of the cross validation search tool you end up using, once the search has been fit, all of the methods exposed by the class are run using the estimator that maximized the score on the held-out data. So in the examples above, any methods called after .fit(), such as .predict(), .score(), or .transform(), are executed and return values as if they were called on the best found estimator directly.
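
Both points can be checked with a short sketch. The parameter distributions, `n_iter=5`, `cv=3`, and `random_state=0` are arbitrary choices for illustration, and the forwarding behaviour shown relies on the default `refit=True`.

```python
import numpy as np
import scipy.stats
from sklearn import datasets, svm
from sklearn.model_selection import RandomizedSearchCV

iris = datasets.load_iris()

parameter_dist = {
    "C": scipy.stats.expon(scale=100),
    "kernel": ["linear", "rbf"],
}

# n_iter controls how many parameter settings are sampled
search = RandomizedSearchCV(svm.SVC(), parameter_dist, n_iter=5, cv=3, random_state=0)
search.fit(iris.data, iris.target)

n_sampled = len(search.cv_results_["params"])

# With refit=True (the default), calls on the search object are forwarded
# to the best estimator found during the search
same_predictions = np.array_equal(
    search.predict(iris.data),
    search.best_estimator_.predict(iris.data),
)

print(n_sampled, search.best_params_, same_predictions)
```
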

# Pipeline

If you don't want to encounter errors, there are a few rules you must abide by while using SciKit-Learn's pipeline:

- Every intermediary model, or step within the pipeline, must be a transformer. That means its class must implement both the .fit() and the .transform() methods. This is rather important, as the output from each step will serve as the input to the subsequent step! Every algorithm you've learned about in this class implements .fit() so you're good there, but not all of them implement .transform(). Be sure to take a look at the SciKit-Learn documentation for each algorithm to learn if it qualifies as a transformer, and make note of that on your course map.
- The very last step in your analysis pipeline only needs to implement the .fit() method, since it will not be feeding data into another step.

```python
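>>> from sklearn import svm
>>> from sklearn.decomposition import PCA
>>> from sklearn.pipeline import Pipeline

>>> svc = svm.SVC(kernel='linear')
>>> pca = PCA(svd_solver='randomized')  # replaces RandomizedPCA, which newer SciKit-Learn releases no longer ship

>>> pipeline = Pipeline([
...   ('pca', pca),
...   ('svc', svc)
... ])
>>> pipeline.set_params(pca__n_components=5, svc__C=1, svc__gamma=0.0001)
>>> pipeline.fit(X, y)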
```
Notice that when you define parameters, you have to lead with the name you specified for that step when you added it to your pipeline, followed by two underscores and the parameter name. This is important because there are many estimators that share the same parameter names within SciKit-Learn's API. Without this, there would be ambiguity.
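
The same naming scheme carries over to the search tools, which makes it possible to tune a whole pipeline at once. The following is a minimal sketch, assuming the iris dataset and an arbitrary small parameter grid.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([("pca", PCA()), ("svc", SVC())])

# Both steps are addressed through the same '<step name>__<parameter>' scheme
pipeline.set_params(pca__n_components=2, svc__C=1)
pca_components = pipeline.named_steps["pca"].n_components
svc_C = pipeline.named_steps["svc"].C

# The same names work as keys of a parameter grid, so the whole pipeline
# (transformer included) is refit inside every cross validation fold
param_grid = {"pca__n_components": [2, 3], "svc__C": [1, 10]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print(pca_components, svc_C, search.best_params_)
```

Because the transformer is fit only on each fold's training portion, this setup also avoids the kind of leak that comes from transforming the full dataset before cross validating.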

Many of the predictors you learned about in the last few chapters don't actually implement .transform()! Due to this, by default, you won't be able to use SVC, Linear Regression, or Decision Trees, etc. as intermediary steps within your pipeline. A very nifty hack you should be aware of to circumvent this is to write your own transformer class, which simply wraps a predictor and disguises it as a transformer:


```python
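from pandas import DataFrame
from sklearn.base import BaseEstimator, TransformerMixin

# BaseEstimator supplies get_params() / set_params(), which pipelines and search tools rely on
class ModelTransformer(TransformerMixin, BaseEstimator):
  def __init__(self, model):
    self.model = model

  def fit(self, *args, **kwargs):
    self.model.fit(*args, **kwargs)
    return self

  def transform(self, X, **transform_params):
    # This is the magic =): the wrapped model's predictions become the transformed data
    return DataFrame(self.model.predict(X))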
```
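
A wrapper like this might be used as an intermediary pipeline step as sketched below. `PredictionTransformer` is a hypothetical variant written for this example: it returns a NumPy column instead of a DataFrame so that the block does not depend on pandas. The decision tree and KNeighbors steps are arbitrary choices.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


class PredictionTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper: exposes a predictor's predictions as features."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y=None, **fit_params):
        self.model.fit(X, y, **fit_params)
        return self

    def transform(self, X):
        # Return a 2-D column so the next step receives (n_samples, 1) data
        return np.asarray(self.model.predict(X)).reshape(-1, 1)


X, y = load_iris(return_X_y=True)

# The decision tree could not be an intermediary step on its own,
# because it has no .transform(); the wrapper provides one
pipeline = Pipeline([
    ("tree", PredictionTransformer(DecisionTreeClassifier(random_state=0))),
    ("knn", KNeighborsClassifier()),
])
pipeline.fit(X, y)
predictions = pipeline.predict(X)
print(predictions.shape)
```

The final KNeighbors step here sees only the tree's predicted label for each sample, so this sketch demonstrates the mechanics rather than a recommended model.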