In [13]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target
knn = KNeighborsClassifier(n_neighbors=4)

We start by selecting the "best" 3 features from the Iris dataset via Sequential Forward Selection (SFS). Here, we set forward=True and floating=False. By choosing cv=0, we don't perform any cross-validation; therefore, the performance (here: 'accuracy') is computed entirely on the training set.
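To make the greedy procedure concrete, here is a minimal pure-Python sketch of forward selection. It is not mlxtend's implementation: `forward_select`, `toy_score`, and `TOY_WEIGHTS` are hypothetical names, and the weights are made-up numbers chosen only so the example is deterministic. In a real run, the scorer would fit the estimator on the candidate columns and return its accuracy.

```python
from typing import Callable, Dict, Tuple


def forward_select(
    n_features: int,
    k_features: int,
    score_fn: Callable[[Tuple[int, ...]], float],
) -> Dict[int, dict]:
    """Greedy forward selection: each round adds the single feature
    whose inclusion yields the best-scoring subset."""
    selected: Tuple[int, ...] = ()
    history: Dict[int, dict] = {}
    while len(selected) < k_features:
        remaining = [i for i in range(n_features) if i not in selected]
        # One candidate subset per remaining feature.
        candidates = [tuple(sorted(selected + (i,))) for i in remaining]
        selected = max(candidates, key=score_fn)
        history[len(selected)] = {
            "feature_idx": selected,
            "avg_score": score_fn(selected),
            "n_candidates": len(candidates),
        }
    return history


# Hypothetical toy scorer (made-up weights, NOT real Iris accuracies).
TOY_WEIGHTS = [0.1, 0.3, 0.5, 0.6]


def toy_score(subset: Tuple[int, ...]) -> float:
    return sum(TOY_WEIGHTS[i] for i in subset)


history = forward_select(n_features=4, k_features=3, score_fn=toy_score)
for size, info in history.items():
    print(size, info["feature_idx"], info["n_candidates"])
```

With 4 input features, the rounds evaluate 4, 3, and 2 candidate subsets, which is the same pattern as the "Done 4 out of 4", "3 out of 3", and "2 out of 2" lines in the verbose output.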

In [4]:
!pip install xgboost
!pip install mlxtend

Collecting mlxtend
  Downloading mlxtend-0.17.3-py2.py3-none-any.whl (1.3 MB)
     |████████████████████████████████| 1.3 MB 761 kB/s eta 0:00:01
Installing collected packages: mlxtend
Successfully installed mlxtend-0.17.3


In [5]:
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

sfs1 = SFS(knn, 
           k_features=3, 
           forward=True, 
           floating=False, 
           verbose=2,
           scoring='accuracy',
           cv=0)

sfs1 = sfs1.fit(X, y)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    0.0s finished

[2020-11-08 18:59:37] Features: 1/3 -- score: 0.96
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.0s finished

[2020-11-08 18:59:37] Features: 2/3 -- score: 0.9733333333333334
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s finished

[2020-11-08 18:59:37] Features: 3/3 -- score: 0.9733333333333334

In [7]:
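# Alternatively, run the same selection with 3-fold cross-validation (cv=3)
# and use all available CPU cores (n_jobs=-1).
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

sfs1 = SFS(knn,
           k_features=3,
           forward=True,
           floating=False,
           verbose=2,
           scoring='accuracy',
           cv=3,
           n_jobs=-1)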

sfs1 = sfs1.fit(X, y)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    1.1s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    1.1s finished

[2020-11-08 19:02:13] Features: 1/3 -- score: 0.96
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done   3 out of   3 | elapsed:    0.0s finished

[2020-11-08 19:02:13] Features: 2/3 -- score: 0.9733333333333333
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   2 | elapsed:    0.0s finished

[2020-11-08 19:02:13] Features: 3/3 -- score: 0.9733333333333333

Via the subsets_ attribute, we can take a look at the selected feature indices at each step:

In [8]:
sfs1.subsets_

{1: {'feature_idx': (3,),
  'cv_scores': array([0.98, 0.94, 0.96]),
  'avg_score': 0.96,
  'feature_names': ('3',)},
 2: {'feature_idx': (2, 3),
  'cv_scores': array([0.98, 0.96, 0.98]),
  'avg_score': 0.9733333333333333,
  'feature_names': ('2', '3')},
 3: {'feature_idx': (1, 2, 3),
  'cv_scores': array([0.98, 0.98, 0.96]),
  'avg_score': 0.9733333333333333,
  'feature_names': ('1', '2', '3')}}

Note that the 'feature_names' entry is simply a string representation of the 'feature_idx' in this case. Optionally, we can provide custom feature names via the fit method's custom_feature_names parameter:

In [9]:
feature_names = ('sepal length', 'sepal width', 'petal length', 'petal width')
sfs1 = sfs1.fit(X, y, custom_feature_names=feature_names)
sfs1.subsets_

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.1s finished

[2020-11-08 19:05:10] Features: 1/3 -- score: 0.96
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done   3 out of   3 | elapsed:    0.0s finished

[2020-11-08 19:05:10] Features: 2/3 -- score: 0.9733333333333333
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   2 | elapsed:    0.0s finished

[2020-11-08 19:05:10] Features: 3/3 -- score: 0.9733333333333333

{1: {'feature_idx': (3,),
  'cv_scores': array([0.98, 0.94, 0.96]),
  'avg_score': 0.96,
  'feature_names': ('petal width',)},
 2: {'feature_idx': (2, 3),
  'cv_scores': array([0.98, 0.96, 0.98]),
  'avg_score': 0.9733333333333333,
  'feature_names': ('petal length', 'petal width')},
 3: {'feature_idx': (1, 2, 3),
  'cv_scores': array([0.98, 0.98, 0.96]),
  'avg_score': 0.9733333333333333,
  'feature_names': ('sepal width', 'petal length', 'petal width')}}
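The dictionary above can also be inspected programmatically. The following sketch copies the values from the output (plain lists stand in for the NumPy arrays), checks that each avg_score matches the mean of its cv_scores, and then picks the smallest subset that reaches the best average score. The tie-breaking rule is our own choice for illustration, not something SFS does for us here.

```python
import math

# Values copied from the subsets_ output above; lists replace the NumPy arrays.
subsets = {
    1: {"feature_idx": (3,),
        "cv_scores": [0.98, 0.94, 0.96],
        "avg_score": 0.96,
        "feature_names": ("petal width",)},
    2: {"feature_idx": (2, 3),
        "cv_scores": [0.98, 0.96, 0.98],
        "avg_score": 0.9733333333333333,
        "feature_names": ("petal length", "petal width")},
    3: {"feature_idx": (1, 2, 3),
        "cv_scores": [0.98, 0.98, 0.96],
        "avg_score": 0.9733333333333333,
        "feature_names": ("sepal width", "petal length", "petal width")},
}

# Each avg_score should equal the mean of the per-fold scores.
for size, info in subsets.items():
    fold_mean = sum(info["cv_scores"]) / len(info["cv_scores"])
    assert math.isclose(fold_mean, info["avg_score"])

# Prefer the smallest subset among those tied for the best average score.
best_score = max(info["avg_score"] for info in subsets.values())
smallest_best = min(
    size for size, info in subsets.items()
    if math.isclose(info["avg_score"], best_score)
)
print(smallest_best, subsets[smallest_best]["feature_names"])
```

In this run the 2-feature and 3-feature subsets tie, so the sketch reports the 2-feature subset (petal length, petal width).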

Furthermore, we can access the indices of the 3 best features directly via the k_feature_idx_ attribute:

In [10]:
sfs1.k_feature_idx_

(1, 2, 3)
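These indices can be used to reduce a feature matrix to the selected columns. The sketch below does this with plain NumPy indexing on three sample rows in the Iris column order; `X_sample` and `k_feature_idx` are illustrative stand-ins for `X` and `sfs1.k_feature_idx_`. The fitted selector's transform method should give the equivalent result, although we do not run it here.

```python
import numpy as np

# Three sample rows: sepal length, sepal width, petal length, petal width.
X_sample = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
])

# Stand-in for sfs1.k_feature_idx_ from the output above.
k_feature_idx = (1, 2, 3)

# Keep only the selected columns (all rows, columns 1, 2, and 3).
X_selected = X_sample[:, list(k_feature_idx)]
print(X_selected.shape)
```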

Similarly, because we passed custom_feature_names to the fit method, we can obtain the names of these features via the k_feature_names_ attribute:

In [11]:
sfs1.k_feature_names_

('sepal width', 'petal length', 'petal width')

Finally, the prediction score for these 3 features (here, the average accuracy across the 3 cross-validation folds) can be accessed via k_score_:

In [12]:
sfs1.k_score_

0.9733333333333333