In [1]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

##### Feature Selection


Feature selection is the process of selecting a subset of relevant and informative features from a larger set of available features for use in machine learning algorithms. The aim is to reduce the dimensionality of the data and improve the accuracy and efficiency of the model.



There are several feature selection techniques. Let's take a look at two of the most popular ones.



##### Forward Feature Selection


Forward feature selection starts with an empty set of features and adds one feature per iteration. At each step, every remaining feature is tried together with the features already selected, and the one that gives the best model performance is kept. This process continues until a stopping criterion is met, such as reaching a predefined number of features or a target level of accuracy.



In [5]:
# Create a synthetic dataset for testing
X, y = make_classification(
    n_samples=800,    # total rows
    n_features=10,    # total columns
    n_informative=2,  # informative features
    n_redundant=2,    # linear combinations of the informative features
    random_state=42,
)


In [6]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

In [7]:
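selected_features = []
for i in range(X_train.shape[1]):
    best_acc = 0
    best_feature = None

    # Try each feature that has not been selected yet
    for j in range(X_train.shape[1]):
        if j not in selected_features:
            features = selected_features + [j]

            # Fit on the candidate subset and score it on the held-out split
            model = LogisticRegression()
            model.fit(X_train[:, features], y_train)
            accuracy = model.score(X_test[:, features], y_test)

            # Keep the candidate with the highest accuracy in this round
            if accuracy > best_acc:
                best_acc = accuracy
                best_feature = j

    selected_features.append(best_feature)
    # Note: `accuracy` is the score of the last candidate tried in this round,
    # which is not necessarily `best_acc`
    print("Selected feature (forward):", selected_features, "Score: ", accuracy)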

Selected feature (forward): [9] Score:  0.88125
Selected feature (forward): [9, 0] Score:  0.89375
Selected feature (forward): [9, 0, 2] Score:  0.89375
Selected feature (forward): [9, 0, 2, 8] Score:  0.90625
Selected feature (forward): [9, 0, 2, 8, 1] Score:  0.90625
Selected feature (forward): [9, 0, 2, 8, 1, 4] Score:  0.90625
Selected feature (forward): [9, 0, 2, 8, 1, 4, 5] Score:  0.90625
Selected feature (forward): [9, 0, 2, 8, 1, 4, 5, 3] Score:  0.9
Selected feature (forward): [9, 0, 2, 8, 1, 4, 5, 3, 6] Score:  0.9
Selected feature (forward): [9, 0, 2, 8, 1, 4, 5, 3, 6, 7] Score:  0.9
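
Two details of the loop above are worth a second look: every candidate is scored on `X_test`, so the test split influences which features get picked, and the `print` reports `accuracy` (the last candidate tried) rather than `best_acc`. The following is a minimal sketch of one way to address both, not a definitive implementation. It scores candidates with cross-validation on the training data only and stops once the score no longer improves. The function name `forward_select` and its parameters `max_features` and `min_gain` are hypothetical names chosen for illustration, and `max_iter=1000` is an assumption to avoid convergence warnings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split


def forward_select(X, y, max_features=None, min_gain=0.0, cv=5):
    """Greedy forward selection scored by cross-validated accuracy.

    `max_features` and `min_gain` are illustrative stopping criteria.
    """
    n_total = X.shape[1]
    max_features = n_total if max_features is None else max_features
    selected, history = [], []
    best_so_far = -np.inf

    while len(selected) < max_features:
        round_best_score, round_best_feature = -np.inf, None
        for j in range(n_total):
            if j in selected:
                continue
            cols = selected + [j]
            # Mean accuracy over the CV folds, using training data only
            score = cross_val_score(
                LogisticRegression(max_iter=1000), X[:, cols], y, cv=cv
            ).mean()
            if score > round_best_score:
                round_best_score, round_best_feature = score, j

        # Stop when the best candidate does not improve the score enough
        if round_best_score - best_so_far <= min_gain:
            break
        selected.append(round_best_feature)
        best_so_far = round_best_score
        history.append((list(selected), round_best_score))

    return selected, history


X, y = make_classification(
    n_samples=800, n_features=10, n_informative=2, n_redundant=2, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

selected, history = forward_select(X_train, y_train, max_features=4)
for cols, score in history:
    print("Selected features (forward):", cols, "CV score:", round(score, 4))

# The test split is used once, after selection has finished
final_model = LogisticRegression(max_iter=1000).fit(X_train[:, selected], y_train)
test_accuracy = final_model.score(X_test[:, selected], y_test)
print("Test accuracy:", test_accuracy)
```

Keeping the test split out of the selection loop means the final test accuracy is a fairer estimate of how the chosen subset generalizes.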


##### Backward Feature Selection

Backward feature selection, on the other hand, starts with all available features and removes one feature per iteration. At each step, the model is refit without each remaining feature in turn, and the feature whose removal hurts performance the least is dropped. This process continues until a stopping criterion is met, such as reaching a predefined number of features or a target level of accuracy.
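
Both greedy procedures are also available in scikit-learn as `SequentialFeatureSelector` (added in version 0.24). The sketch below shows the backward direction on the same kind of synthetic data. The choices `n_features_to_select=4`, `cv=5`, and `max_iter=1000` are assumptions made for illustration, not values taken from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=800, n_features=10, n_informative=2, n_redundant=2, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=4,   # illustrative stopping criterion
    direction="backward",     # use "forward" for forward selection
    scoring="accuracy",
    cv=5,                     # candidates are scored by cross-validation
)
selector.fit(X_train, y_train)

kept = selector.get_support(indices=True)      # column indices that survive
X_train_reduced = selector.transform(X_train)  # training data with only those columns
print("Kept features:", kept)
```

Because the selector cross-validates on the data passed to `fit`, the test split stays untouched until the final evaluation.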



In [8]:
# Implement backward feature elimination
selected_features = list(range(X_train.shape[1]))

for i in range(X_train.shape[1] - 1):

    worst_accuracy = 1
    worst_feature = None

    # Refit the model with each remaining feature left out in turn
    for j in selected_features:

        features = selected_features.copy()
        features.remove(j)

        model = LogisticRegression()
        model.fit(X_train[:, features], y_train)
        accuracy = model.score(X_test[:, features], y_test)

        # Track the removal that gives the lowest accuracy in this round
        if accuracy < worst_accuracy:
            worst_accuracy = accuracy
            worst_feature = j

    selected_features.remove(worst_feature)
    # Note: `accuracy` is the score of the last candidate tried in this round
    print("Selected Features (Backward):", selected_features, "Score:", accuracy)

Selected Features (Backward): [1, 2, 3, 4, 5, 6, 7, 8, 9] Score: 0.9
Selected Features (Backward): [2, 3, 4, 5, 6, 7, 8, 9] Score: 0.8875
Selected Features (Backward): [3, 4, 5, 6, 7, 8, 9] Score: 0.8875
Selected Features (Backward): [4, 5, 6, 7, 8, 9] Score: 0.89375
Selected Features (Backward): [5, 6, 7, 8, 9] Score: 0.8875
Selected Features (Backward): [6, 7, 8, 9] Score: 0.8875
Selected Features (Backward): [6, 7, 8] Score: 0.4375
Selected Features (Backward): [6, 8] Score: 0.5
Selected Features (Backward): [8] Score: 0.45625
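
The loop above tracks the lowest accuracy in each round, so it appears to drop the feature whose removal hurts the model the most. That is consistent with the printed score falling to 0.4375 once feature 9 is removed. Backward elimination usually does the opposite and drops the feature the model misses least. The following is a minimal sketch of that variant, scored with cross-validation on the training data. The function name `backward_eliminate` and its parameter `n_keep` are hypothetical names used for illustration, and `max_iter=1000` is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split


def backward_eliminate(X, y, n_keep=1, cv=5):
    """Greedy backward elimination scored by cross-validated accuracy."""
    remaining = list(range(X.shape[1]))
    history = []

    while len(remaining) > n_keep:
        best_score, feature_to_drop = -np.inf, None
        for j in remaining:
            cols = [k for k in remaining if k != j]
            score = cross_val_score(
                LogisticRegression(max_iter=1000), X[:, cols], y, cv=cv
            ).mean()
            # The highest score without j means j is the least useful feature
            if score > best_score:
                best_score, feature_to_drop = score, j

        remaining.remove(feature_to_drop)
        history.append((list(remaining), best_score))

    return remaining, history


X, y = make_classification(
    n_samples=800, n_features=10, n_informative=2, n_redundant=2, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

remaining, history = backward_eliminate(X_train, y_train, n_keep=4)
for cols, score in history:
    print("Selected features (backward):", cols, "CV score:", round(score, 4))

# Evaluate the surviving subset once on the held-out test split
final_model = LogisticRegression(max_iter=1000).fit(X_train[:, remaining], y_train)
test_accuracy = final_model.score(X_test[:, remaining], y_test)
print("Test accuracy:", test_accuracy)
```

Here the score stored in each `history` entry belongs to the feature set printed next to it, since both come from the same winning candidate.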
