# Exploring the usage of EstimatorCV Objects

In [13]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)


In [14]:
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Given an external estimator that assigns weights to features
# (e.g., the coefficients of a linear model), the goal of recursive
# feature elimination (RFE) is to select features by recursively
# considering smaller and smaller sets of features.
# https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE
feature_elimination_lr = RFE(LogisticRegression(C=100), n_features_to_select=2)


In [16]:
# Train and measure
feature_elimination_lr.fit(X_train, y_train)
feature_elimination_lr.score(X_test, y_test)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


0.9736842105263158
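
The convergence warning above suggests either raising `max_iter` or scaling the data. The following is a minimal sketch of doing both, not the notebook's original code: the `StandardScaler` pipeline and `max_iter=1000` are assumptions added for illustration. It also reads `support_` and `ranking_` from the fitted `RFE` step to show which two features were kept.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

# Assumption: scaling the inputs and allowing more iterations (1000 is an
# arbitrary choice) gives the solver a better chance of converging.
scaled_rfe = make_pipeline(
    StandardScaler(),
    RFE(LogisticRegression(C=100, max_iter=1000), n_features_to_select=2),
)
scaled_rfe.fit(X_train, y_train)
test_accuracy = scaled_rfe.score(X_test, y_test)

# The fitted RFE step records which features survived the elimination.
rfe_step = scaled_rfe.named_steps["rfe"]
selected = [
    name for name, keep in zip(iris.feature_names, rfe_step.support_) if keep
]
print("Selected features:", selected)
print("Feature ranking (1 = kept):", rfe_step.ranking_)
print("Test accuracy:", test_accuracy)
```

Because the scaler changes the fitted coefficients, the accuracy and the selected features may differ from the unscaled run above.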

# GridSearchCV to find the right number of features
As you may have noticed, we are using RFE to train a model on a reduced number of features.
The question is: how many features (`n_features_to_select`) is the right number?
We can use `GridSearchCV` to find out.

In [17]:
from sklearn.model_selection import GridSearchCV
# Try every possible feature count (iris has 4 features)
param_grid = {'n_features_to_select': range(1, 5)}
grid_search = GridSearchCV(feature_elimination_lr, param_grid, cv=5)
grid_search.fit(X_train, y_train)
grid_search.score(X_test, y_test)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

0.9736842105263158
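
The single test score above does not show how the candidate feature counts compare. As a rough sketch, the per-candidate cross-validation scores can be read from `cv_results_`. The grid search is rebuilt here so the block runs on its own, and `max_iter=1000` is an assumption added to reduce the chance of convergence warnings.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

# Assumption: max_iter=1000 is not in the original cell.
rfe = RFE(LogisticRegression(C=100, max_iter=1000), n_features_to_select=2)
grid_search = GridSearchCV(rfe, {"n_features_to_select": range(1, 5)}, cv=5)
grid_search.fit(X_train, y_train)

# cv_results_ holds one entry per candidate value in the grid.
results = grid_search.cv_results_
for n_features, mean, std in zip(
    results["param_n_features_to_select"],
    results["mean_test_score"],
    results["std_test_score"],
):
    print(f"{n_features} features: {mean:.3f} +/- {std:.3f}")
```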

In [18]:
# Get the best params
grid_search.best_params_

{'n_features_to_select': 4}
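
scikit-learn also provides `RFECV`, an estimator with built-in cross-validation that selects the number of features itself, so no separate `GridSearchCV` is needed. The following is a minimal sketch under the same setup as above, with `max_iter=1000` again added as an assumption. It is not guaranteed to pick the same number of features as the grid search.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

# RFECV runs recursive feature elimination inside cross-validation and
# keeps the feature count with the best mean validation score.
rfecv = RFECV(LogisticRegression(C=100, max_iter=1000), cv=5)
rfecv.fit(X_train, y_train)

print("Number of features chosen:", rfecv.n_features_)
print("Feature mask:", rfecv.support_)
print("Test accuracy:", rfecv.score(X_test, y_test))
```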