Forward selection (referred to as forward elimination in this notebook) is a stepwise feature selection technique used in machine learning. It builds a predictive model by adding one feature at a time and keeping only the features that contribute the most. The process works as follows:

1. Start with no features: begin with an empty model that includes no features.
2. Add features iteratively: at each step, add the single feature that improves the model the most according to a chosen criterion (e.g., p-value, AIC, BIC, adjusted R-squared).
3. Evaluate model performance: after each addition, score the model with a performance metric (e.g., accuracy, R-squared, F1 score).
4. Repeat until no improvement: keep adding features one by one until no remaining feature improves the score significantly.
5. Final model: when the process stops, the final model includes only the selected features.
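
The steps above can be sketched as a small greedy loop. This is a minimal illustration rather than what mlxtend does internally: `forward_select`, `score_fn`, and `min_gain` are hypothetical names, and the toy score uses made-up gains for placeholder features `a` to `d`.

```python
from typing import Callable, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    score_fn: Callable[[Tuple[str, ...]], float],
    min_gain: float = 1e-3,
) -> Tuple[Tuple[str, ...], float]:
    """Greedy forward selection over feature names.

    score_fn is a hypothetical callback returning a higher-is-better
    score (for example cross-validated accuracy) for a tuple of names.
    """
    selected: Tuple[str, ...] = ()
    best_score = float("-inf")
    remaining = list(candidates)

    while remaining:
        # Score every one-feature extension of the current subset.
        trials = [(score_fn(selected + (f,)), f) for f in remaining]
        trial_score, trial_feature = max(trials)

        # Stop when the best extension no longer improves the score enough.
        if trial_score - best_score < min_gain:
            break

        selected += (trial_feature,)
        best_score = trial_score
        remaining.remove(trial_feature)

    return selected, best_score


# Toy score with made-up per-feature gains, purely for illustration.
GAINS = {"a": 0.30, "b": 0.10, "c": 0.02, "d": 0.0}


def toy_score(subset: Tuple[str, ...]) -> float:
    return 0.5 + sum(GAINS[f] for f in subset)


features, score = forward_select(list(GAINS), toy_score, min_gain=0.05)
print(features, round(score, 2))
```

In practice `score_fn` would fit and cross-validate a model on the given columns, and the stopping rule could just as well be a fixed number of features, which is how the cells below use `k_features`.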

Forward Elimination

![Image](https://miro.medium.com/v2/resize:fit:1400/1*N105in3SvDixK_mbWpXFZg.png)

In [15]:
import pandas as pd
from mlxtend.feature_selection import SequentialFeatureSelector

In [16]:
# Load the diabetes.csv file
dataset = pd.read_csv('diabetes.csv')
dataset.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [17]:
# Features: every column except the last one; target: the Outcome column
x = dataset.iloc[:, :-1]
y = dataset["Outcome"]

In [18]:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()

In [None]:
# Forward selection (forward=True): start empty, add features until k_features are chosen
fs = SequentialFeatureSelector(lr, k_features=5, forward=True)
fs1 = SequentialFeatureSelector(lr, k_features=4, forward=True)
fs2 = SequentialFeatureSelector(lr, k_features=3, forward=True)
fs3 = SequentialFeatureSelector(lr, k_features=2, forward=True)
# Backward elimination (forward=False): start with all features, drop until k_features remain
bs = SequentialFeatureSelector(lr, k_features=5, forward=False)
bs1 = SequentialFeatureSelector(lr, k_features=4, forward=False)
bs2 = SequentialFeatureSelector(lr, k_features=3, forward=False)
bs3 = SequentialFeatureSelector(lr, k_features=2, forward=False)

# Fit the forward selectors
fs.fit(x, y)
fs1.fit(x, y)
fs2.fit(x, y)
fs3.fit(x, y)

# Fit the backward selectors
bs.fit(x, y)
bs1.fit(x, y)
bs2.fit(x, y)
bs3.fit(x, y)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
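
The warning above suggests two remedies: raise `max_iter` or scale the data. A minimal sketch of both follows, under stated assumptions: it runs on synthetic stand-in data from `make_classification` rather than `diabetes.csv`, and `max_iter=1000` is an arbitrary illustrative value, not a tuned one.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 8 numeric features, binary target.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Standardize the features, then fit logistic regression with a higher
# iteration cap. This applies both suggestions from the warning.
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validated accuracy of the scaled model.
scores = cross_val_score(scaled_lr, X_demo, y_demo, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()
print(f"mean CV accuracy: {mean_accuracy:.3f}")
```

Because the pipeline is itself a scikit-learn estimator, it could presumably be passed to the selector in place of `lr`, though the selected features and scores might then differ from the ones reported in this notebook.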

In [21]:
print(f"fs selected feature: {fs.k_feature_names_}")
print(f"fs1 selected feature: {fs1.k_feature_names_}")
print(f"fs2 selected feature: {fs2.k_feature_names_}")
print(f"fs3 selected feature: {fs3.k_feature_names_}")
print(f"bs selected feature: {bs.k_feature_names_}")
print(f"bs1 selected feature: {bs1.k_feature_names_}")
print(f"bs2 selected feature: {bs2.k_feature_names_}")
print(f"bs3 selected feature: {bs3.k_feature_names_}")

fs selected feature: ('Pregnancies', 'Glucose', 'Insulin', 'BMI', 'Age')
fs1 selected feature: ('Glucose', 'Insulin', 'BMI', 'Age')
fs2 selected feature: ('Glucose', 'BMI', 'Age')
fs3 selected feature: ('Glucose', 'BMI')
bs selected feature: ('Glucose', 'BloodPressure', 'Insulin', 'BMI', 'DiabetesPedigreeFunction')
bs1 selected feature: ('Glucose', 'BloodPressure', 'BMI', 'DiabetesPedigreeFunction')
bs2 selected feature: ('Glucose', 'BMI', 'DiabetesPedigreeFunction')
bs3 selected feature: ('Glucose', 'BMI')


In [26]:
print(f"fs selected feature accuracy: {fs.k_score_}")
print(f"fs1 selected feature accuracy: {fs1.k_score_}")
print(f"fs2 selected feature accuracy: {fs2.k_score_}")
print(f"fs3 selected feature accuracy: {fs3.k_score_}")

print(f"bs selected feature accuracy: {bs.k_score_}")
print(f"bs1 selected feature accuracy: {bs1.k_score_}")
print(f"bs2 selected feature accuracy: {bs2.k_score_}")
print(f"bs3 selected feature accuracy: {bs3.k_score_}")

fs selected feature accuracy: 0.7708768355827178
fs1 selected feature accuracy: 0.7682964094728801
fs2 selected feature accuracy: 0.7683048977166624
fs3 selected feature accuracy: 0.7591206179441474
bs selected feature accuracy: 0.7708683473389355
bs1 selected feature accuracy: 0.7721330956625074
bs2 selected feature accuracy: 0.7682454800101859
bs3 selected feature accuracy: 0.7591206179441474
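
The eight selectors above differ only in direction and subset size, so the same comparison can be written as a loop. The sketch below is an assumption-laden illustration: it swaps in scikit-learn's own `SequentialFeatureSelector` (parameters `n_features_to_select` and `direction`) instead of mlxtend's, and it uses synthetic data, so its numbers are not comparable to the results above.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): 8 numeric features, binary target.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)
model = LogisticRegression(max_iter=1000)

results = {}
for direction in ("forward", "backward"):
    for k in (5, 4, 3, 2):
        sfs = SequentialFeatureSelector(
            model, n_features_to_select=k, direction=direction, cv=5
        )
        sfs.fit(X_demo, y_demo)

        # Column indices of the chosen features.
        cols = sfs.get_support(indices=True).tolist()

        # Re-score the chosen subset with 5-fold cross-validation.
        score = cross_val_score(model, sfs.transform(X_demo), y_demo, cv=5).mean()
        results[(direction, k)] = (cols, score)

for (direction, k), (cols, score) in results.items():
    print(f"{direction:8s} k={k}: columns={cols} accuracy={score:.4f}")
```

Scoring a subset on the same data that was used to choose it tends to be optimistic, so a held-out test set would give a fairer comparison between subset sizes.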


![Image](https://miro.medium.com/v2/resize:fit:1108/1*QLJIAU2bT92WN6Mpln-tqQ.png)