## Cross-Validation

* e.g.
```python
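from sklearn.model_selection import cross_val_score

# my_pipeline, X and y are assumed to be defined earlier.
# Multiply by -1 because scikit-learn reports the *negative* MAE.
scores = -1 * cross_val_score(my_pipeline, X, y,
                              cv=5,
                              scoring='neg_mean_absolute_error')
avg_score = scores.mean()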
```
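Note: scikit-learn's scoring convention is that a higher number is always better, so error metrics such as MAE are reported in negated form (`neg_mean_absolute_error`). Using *negatives* keeps them consistent with that convention; multiplying by -1 turns the scores back into ordinary, positive MAE values.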
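
To see the sign convention in action, here is a minimal sketch on synthetic data. The dataset from `make_regression`, the plain `LinearRegression` model, and the names `X_demo` / `y_demo` are stand-ins chosen for illustration; they are not part of the original example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data (an assumption, for illustration only)
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)

# Raw scores follow the "higher is better" convention, so they are <= 0
raw_scores = cross_val_score(LinearRegression(), X_demo, y_demo,
                             cv=5, scoring='neg_mean_absolute_error')

# Flipping the sign recovers the usual non-negative MAE for each fold
mae_scores = -1 * raw_scores

print(raw_scores)          # one negative value per fold
print(mae_scores.mean())   # average MAE across the 5 folds
```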

* Format
```python
sklearn.model_selection.cross_val_score(estimator, X, y=None, *, groups=None,
                                        scoring=None, cv=None, n_jobs=None,
                                        verbose=0, fit_params=None,
                                        pre_dispatch='2*n_jobs', error_score=nan)
```
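* estimator : The estimator object used to fit the data; it must implement `fit`.
* scoring : The metric to evaluate, e.g. `'neg_mean_absolute_error'`. More scoring options : https://scikit-learn.org/stable/modules/model_evaluation.html
* cv : Determines the cross-validation splitting strategy. An int sets the number of folds; `None` uses the default 5-fold split; a splitter object (e.g. `KFold`) or an iterable of train/test splits is also accepted.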
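
To make the role of `cv` concrete, the sketch below splits sample indices into k consecutive folds in plain Python, the way an unshuffled k-fold split behaves. The function name `kfold_indices` is hypothetical, and this is an illustration of the idea rather than scikit-learn's own implementation.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for consecutive, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(kfold_indices(n_samples=10, n_splits=5))
for train, test in folds:
    print("validate on", test, "train on", train)
```

Each sample lands in the validation fold exactly once, which is why averaging the per-fold scores uses all of the data for evaluation.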


## Exercise Notes

* A function that returns the average MAE of a machine learning pipeline over 3 cross-validation folds (`cv=3`).
```python
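from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

def get_score(n_estimators):
    """Return the average MAE over 3 CV folds for a forest with n_estimators trees."""
    my_pipeline = Pipeline(steps=[
        ('preprocessor', SimpleImputer()),
        ('model', RandomForestRegressor(n_estimators=n_estimators, random_state=0))])
    # X and y (features and target) are assumed to be defined earlier
    scores = -1 * cross_val_score(my_pipeline, X, y, cv=3,
                                  scoring='neg_mean_absolute_error')
    return scores.mean()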
```

* Store the mean MAE for 50, 100, ..., 350, 400 trees in a dictionary keyed by the number of trees.
```python
results = {i: get_score(i) for i in range(50, 401, 50)}
```
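
Once the dictionary is filled, the best setting is the key with the lowest mean MAE. A minimal sketch, using a hypothetical `toy_results` dictionary whose values are made-up placeholders rather than real scores from the exercise:

```python
# Made-up placeholder values for illustration only (not real results)
toy_results = {50: 3.0, 100: 2.0, 150: 2.5}

# min() over the keys, ranked by their stored score, picks the lowest-MAE setting
n_estimators_best = min(toy_results, key=toy_results.get)
print(n_estimators_best)  # → 100
```

With the real dictionary, `min(results, key=results.get)` would return the tree count to use.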