
SelectKBest scores every feature against the target and keeps the k features with the highest scores. By changing the `score_func` parameter, we can apply the same method to both classification and regression data.

In [1]:
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2, f_regression
# load_boston was removed in scikit-learn 1.2, so this import needs an older release.
from sklearn.datasets import load_boston
from sklearn.datasets import load_iris
from numpy import array 

### **1) SelectKBest for classification**

In [2]:
iris = load_iris()
x = iris.data
y = iris.target
 
print("Feature data dimension: ", x.shape) 

Feature data dimension:  (150, 4)


- For **classification**, we use **'chi2'** as the **scoring function.** It requires non-negative feature values, which holds for the iris measurements.

- The **number of features to keep** is set by the **k** parameter.
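The choice of scoring function depends on the data as well as the task. The sketch below assumes the features have been standardized first (a step this notebook does not perform), which produces negative values that `chi2` rejects. It then falls back to `f_classif`, the ANOVA F-test scorer, as one possible alternative.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, f_classif
from sklearn.preprocessing import StandardScaler

x, y = load_iris(return_X_y=True)

# Assumption for illustration: standardizing centers each feature at zero,
# so many values become negative.
x_scaled = StandardScaler().fit_transform(x)

try:
    SelectKBest(score_func=chi2, k=3).fit(x_scaled, y)
    chi2_failed = False
except ValueError as err:
    # chi2 only accepts non-negative inputs (counts, frequencies, raw measurements).
    chi2_failed = True
    print("chi2 rejected the scaled data:", err)

# f_classif has no sign restriction, so it works on the scaled data.
z_scaled = SelectKBest(score_func=f_classif, k=3).fit_transform(x_scaled, y)
print("Shape with f_classif:", z_scaled.shape)
```

On the raw iris data used in this notebook, `chi2` works as-is because every measurement is positive.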

In [3]:
select = SelectKBest(score_func=chi2, k=3)
z = select.fit_transform(x,y)
 
print("After selecting best 3 features:", z.shape)

After selecting best 3 features: (150, 3)
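To see what "k highest scores" means in practice, the following sketch computes the `chi2` scores directly and picks the top three by hand, then compares the result with what the selector reports. The variable names `manual_top_k` and `selector_top_k` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

x, y = load_iris(return_X_y=True)
k = 3

# chi2 returns one score and one p-value per feature.
scores, pvalues = chi2(x, y)

# Column indices of the k largest scores, sorted for easy comparison.
manual_top_k = np.sort(np.argsort(scores)[-k:])

# The selector stores the same scores in `scores_` after fitting.
select = SelectKBest(score_func=chi2, k=k).fit(x, y)
selector_top_k = select.get_support(indices=True)

print("chi2 scores:           ", np.round(scores, 2))
print("manual top-k indices:  ", manual_top_k)
print("selector top-k indices:", selector_top_k)
```

Both index lists should agree, since the selector ranks the columns by the same scores.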


- To **identify** the selected features, we call **get_support()**, which returns a boolean mask over the original columns.

In [4]:
# Boolean mask over the columns; named "mask" to avoid shadowing the built-in filter().
mask = select.get_support()
features = array(iris.feature_names)
 
print("All features:", features)
 
print("\n\nSelected best 3:", features[mask])
print("\n\n", z)

All features: ['sepal length (cm)' 'sepal width (cm)' 'petal length (cm)'
 'petal width (cm)']


Selected best 3: ['sepal length (cm)' 'petal length (cm)' 'petal width (cm)']


 [[5.1 1.4 0.2]
 [4.9 1.4 0.2]
 [4.7 1.3 0.2]
 [4.6 1.5 0.2]
 [5.  1.4 0.2]
 [5.4 1.7 0.4]
 [4.6 1.4 0.3]
 [5.  1.5 0.2]
 [4.4 1.4 0.2]
 [4.9 1.5 0.1]
 [5.4 1.5 0.2]
 [4.8 1.6 0.2]
 [4.8 1.4 0.1]
 [4.3 1.1 0.1]
 [5.8 1.2 0.2]
 [5.7 1.5 0.4]
 [5.4 1.3 0.4]
 [5.1 1.4 0.3]
 [5.7 1.7 0.3]
 [5.1 1.5 0.3]
 [5.4 1.7 0.2]
 [5.1 1.5 0.4]
 [4.6 1.  0.2]
 [5.1 1.7 0.5]
 [4.8 1.9 0.2]
 [5.  1.6 0.2]
 [5.  1.6 0.4]
 [5.2 1.5 0.2]
 [5.2 1.4 0.2]
 [4.7 1.6 0.2]
 [4.8 1.6 0.2]
 [5.4 1.5 0.4]
 [5.2 1.5 0.1]
 [5.5 1.4 0.2]
 [4.9 1.5 0.2]
 [5.  1.2 0.2]
 [5.5 1.3 0.2]
 [4.9 1.4 0.1]
 [4.4 1.3 0.2]
 [5.1 1.5 0.2]
 [5.  1.3 0.3]
 [4.5 1.3 0.3]
 [4.4 1.3 0.2]
 [5.  1.6 0.6]
 [5.1 1.9 0.4]
 [4.8 1.4 0.3]
 [5.1 1.6 0.2]
 [4.6 1.4 0.2]
 [5.3 1.5 0.2]
 [5.  1.4 0.2]
 [7.  4.7 1.4]
 [6.4 4.5 1.5]
 [6.9 4.9 1.5]
 [5.5 4.  1.3]
 [6.5 4.6 1.

### **2) SelectKBest for regression**

In [5]:
boston = load_boston()
X = boston.data
y = boston.target

print("Feature data dimension: ", X.shape)

Feature data dimension:  (506, 13)


- For **regression**, we use **'f_regression'** as the **scoring function.** It runs a univariate F-test for a linear relationship between each feature and the target.

- The **target number of features** to select is **8**.

In [7]:
select1 = SelectKBest(score_func=f_regression, k=8)
z1 = select1.fit_transform(X, y) 
 
print("After selecting best 8 features:", z1.shape)

After selecting best 8 features: (506, 8)
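Because `load_boston` is no longer available in scikit-learn 1.2 and later, the sketch below repeats the same regression workflow on `load_diabetes`, a bundled dataset used here purely as a stand-in (it is not part of the original notebook). It also prints the F-score and p-value of every feature so the ranking behind the selection is visible.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression

# Stand-in dataset (an assumption for illustration): 442 samples, 10 features.
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
names = np.array(diabetes.feature_names)

select = SelectKBest(score_func=f_regression, k=8).fit(X, y)
kept = select.get_support()

# List the features from the highest to the lowest F-score.
order = np.argsort(select.scores_)[::-1]
for i in order:
    status = "kept" if kept[i] else "dropped"
    print(f"{names[i]:>4}  F={select.scores_[i]:8.2f}  p={select.pvalues_[i]:.3g}  {status}")
```

The eight rows marked "kept" are the ones `fit_transform` would return, in their original column order.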


In [11]:
filter1 = select1.get_support()
features1 = array(boston.feature_names)
 
print("All features:", features1)
 
print("\n\nSelected best 8:", features1[filter1])
print("\n\n", z1)

All features: ['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']


Selected best 8: ['CRIM' 'INDUS' 'NOX' 'RM' 'RAD' 'TAX' 'PTRATIO' 'LSTAT']


 [[6.3200e-03 2.3100e+00 5.3800e-01 ... 2.9600e+02 1.5300e+01 4.9800e+00]
 [2.7310e-02 7.0700e+00 4.6900e-01 ... 2.4200e+02 1.7800e+01 9.1400e+00]
 [2.7290e-02 7.0700e+00 4.6900e-01 ... 2.4200e+02 1.7800e+01 4.0300e+00]
 ...
 [6.0760e-02 1.1930e+01 5.7300e-01 ... 2.7300e+02 2.1000e+01 5.6400e+00]
 [1.0959e-01 1.1930e+01 5.7300e-01 ... 2.7300e+02 2.1000e+01 6.4800e+00]
 [4.7410e-02 1.1930e+01 5.7300e-01 ... 2.7300e+02 2.1000e+01 7.8800e+00]]
