# Backward Elimination Algorithm for Feature Selection

**Backward Elimination** is a classic **wrapper** method for feature selection in machine learning. It starts with all available features and, at each step, removes the least significant one (the feature whose removal hurts performance least), continuing until a stopping criterion is met. Because each step is greedy, the final subset is not guaranteed to be the optimal one.

## How Backward Elimination Works:

1. **Start with all features** in the dataset.
2. **Train the model** (e.g., Naïve Bayes classifier) using the current set of features.
3. **Evaluate the model's performance** using a chosen metric (such as accuracy).
4. **Identify the least significant feature** (usually the one whose removal improves or least degrades performance).
5. **Remove that feature** from the feature set.
6. **Repeat steps 2-5** until removing more features does not improve the model or a stopping criterion is met (e.g., a predefined number of features or performance threshold).
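The steps above can be sketched independently of any particular classifier by passing in a scoring function. This is a minimal sketch under stated assumptions: `backward_elimination_generic`, `toy_score` and `USEFUL` are hypothetical names for illustration, and higher scores are assumed to be better.

```python
from typing import Callable, List, Sequence


def backward_elimination_generic(
    features: Sequence[int],
    score: Callable[[List[int]], float],
    min_features: int = 1,
) -> List[int]:
    """Greedily drop features while the score does not decrease.

    `score` is a hypothetical callable mapping a list of feature indices
    to a performance value (higher is better), e.g. validation accuracy.
    """
    selected = list(features)
    best = score(selected)  # steps 1-3: evaluate with all features
    while len(selected) > min_features:
        # Steps 2-4: evaluate every subset with exactly one feature removed
        best_candidate, best_candidate_score = None, float("-inf")
        for feat in selected:
            s = score([f for f in selected if f != feat])
            if s > best_candidate_score:
                best_candidate, best_candidate_score = feat, s
        # Step 6: stop once every possible removal lowers the score
        if best_candidate_score < best:
            break
        selected.remove(best_candidate)  # step 5
        best = best_candidate_score
    return selected


# Toy score: features 0 and 2 are useful, every other feature adds a small penalty
USEFUL = {0, 2}


def toy_score(subset: List[int]) -> float:
    s = set(subset)
    return len(s & USEFUL) - 0.1 * len(s - USEFUL)


print(backward_elimination_generic(range(4), toy_score))  # [0, 2]
```

Ties are allowed to continue the elimination (a removal that leaves the score unchanged is accepted), which favours smaller subsets at equal performance.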

## Advantages:
- Considers feature interactions because it evaluates subsets with the actual classifier.
- Often leads to a smaller, more relevant feature set.

## Disadvantages:
- Computationally expensive because it requires training the model multiple times.
- Can get stuck in local optima: the search is greedy, and a feature that has been removed is never reconsidered.
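To make the cost concrete: if elimination runs all the way down to one feature, step 2 is repeated once per remaining feature at every step, so with `d` starting features the number of model fits is `1 + d + (d - 1) + ... + 2 = d(d + 1) / 2`. The helper below (a hypothetical name) evaluates this worst case; an early stop only reduces it.

```python
def backward_elimination_fits(d: int) -> int:
    """Worst-case number of model fits when eliminating from d features down to 1."""
    baseline = 1  # one fit with all d features
    # With k features remaining, k candidate subsets are evaluated (k = d, ..., 2)
    per_step = sum(range(2, d + 1))
    return baseline + per_step


print(backward_elimination_fits(62))   # 1953
print(backward_elimination_fits(196))  # 19306
```

62 and 196 are the feature counts reported in this notebook after and before variance filtering, so the filter cuts the worst-case number of Naïve Bayes fits by roughly a factor of ten.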

In this project, we will implement **Backward Elimination** manually (without a ready-made wrapper-selection routine) to select features on the TinyMNIST dataset. For classification, we will use the Gaussian Naïve Bayes classifier (`GaussianNB`) from scikit-learn.


```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.feature_selection import VarianceThreshold
```

```python
import os

# Directory holding the TinyMNIST CSV files (adjust to your environment)
DATA_DIR = '/mnt/e/Term 3/Machin-Learning/Projects/07 pro/data'

train_data = np.loadtxt(os.path.join(DATA_DIR, 'trainData.csv'), delimiter=',', dtype=np.float32)
train_labels = np.loadtxt(os.path.join(DATA_DIR, 'trainLabels.csv'), delimiter=',', dtype=np.float32)
test_data = np.loadtxt(os.path.join(DATA_DIR, 'testData.csv'), delimiter=',', dtype=np.float32)
test_labels = np.loadtxt(os.path.join(DATA_DIR, 'testLabels.csv'), delimiter=',', dtype=np.float32)

print(f'Number of training samples: {train_data.shape[0]}')
print(f'Number of features: {train_data.shape[1]}')
```

```text
Number of training samples: 5000
Number of features: 196
```


### 🔍 Variance Threshold Feature Selection

In this section, we apply **feature selection** using the `VarianceThreshold` method from `sklearn.feature_selection`.

The idea is to remove features (columns) that have very low variance, as they provide little to no information for classification.

#### Steps:
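1. **Concatenate** the training and test data vertically using `np.vstack()` so that each feature's variance is computed over the entire dataset. Note that this lets the test set influence which features are kept (the labels are never used, but the test features are); fitting the selector on the training data alone and only transforming the test data avoids this.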
2. **Apply VarianceThreshold** with a threshold of `0.09` (which is `0.90 * (1 - 0.90)`). This means:
   - Any feature with variance **less than 0.09** will be removed.
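   - This threshold assumes binary data. A binary feature that equals 1 with probability `p` has variance `p * (1 - p)`, which peaks at `0.25` for a 50-50 split; a cutoff of `0.09` therefore removes features in which one value occurs in more than about 90% of the samples.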
3. **Transform** the combined data using the fitted selector, which returns only the selected features.
4. **Split** the filtered data back into training and testing sets.

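This reduces dimensionality by discarding near-constant features (such as pixels that are almost always background), which carry little information for classification. The filter ignores the labels, so it removes uninformative features rather than ranking the informative ones.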
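The `p * (1 - p)` reasoning can be checked on a small synthetic binary matrix. This sketch is illustrative only: the four columns are invented, and it fits the selector on the training rows alone before applying the same column mask to held-out rows, which differs from the combined fit used in this notebook.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

n = 20
# Four synthetic binary columns with P(x = 1) = 0.0, 0.95, 0.5 and 0.8
X_train = np.zeros((n, 4), dtype=np.float32)
X_train[:19, 1] = 1  # 95% ones -> variance 0.95 * 0.05 = 0.0475 (removed)
X_train[:10, 2] = 1  # 50% ones -> variance 0.25 (kept)
X_train[:16, 3] = 1  # 80% ones -> variance 0.16 (kept)
# column 0 stays all zeros -> variance 0 (removed)

selector = VarianceThreshold(threshold=0.09)
selector.fit(X_train)                     # variances come from training rows only
print(selector.variances_)                # per-column population variance
print(selector.get_support().tolist())    # [False, False, True, True]

X_test = np.ones((5, 4), dtype=np.float32)
print(selector.transform(X_test).shape)   # (5, 2): the same two columns are kept
```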



```python
# Stack train and test rows so each feature's variance is computed over all samples
all_data = np.vstack((train_data, test_data))
var_selctor = VarianceThreshold(threshold=0.09)
all_data = var_selctor.fit_transform(X=all_data)

# Split the filtered rows back apart (training rows come first in the stack)
train_data_with_sel = all_data[:train_data.shape[0]]
test_data_with_sel = all_data[train_data.shape[0]:]

tr_samples_size, feature_size = train_data_with_sel.shape
te_samples_size, _ = test_data_with_sel.shape

print('Train Data Samples:', tr_samples_size,
      ', Test Data Samples', te_samples_size,
      ', Feature Size(after feature-selection):', feature_size)
```

```text
Train Data Samples: 5000 , Test Data Samples 2500 , Feature Size(after feature-selection): 62
```


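```python
def backward_elimination(train_data, train_labels, test_data, test_labels):
    """Greedy backward elimination wrapped around Gaussian Naive Bayes.

    Returns the surviving column indices of `train_data`, the feature count
    after each step, and the accuracy after each step. Candidate subsets are
    scored on the test set, so the reported accuracies are optimistic.
    """
    # Start from the full feature set
    selected_features = list(range(train_data.shape[1]))
    accuracies = []
    num_features = []

    # Baseline accuracy with all features
    model = GaussianNB()
    model.fit(train_data, train_labels)
    predictions = model.predict(test_data)
    acc = accuracy_score(y_true=test_labels, y_pred=predictions)
    num_features.append(train_data.shape[1])
    best_accuracy = acc
    accuracies.append(acc)

    while len(selected_features) > 1:
        feature_to_remove = None

        # Try dropping each remaining feature and remember the best resulting subset
        for feature in selected_features:
            candidate_features = [f for f in selected_features if f != feature]
            model = GaussianNB()
            model.fit(train_data[:, candidate_features], train_labels)
            predictions = model.predict(test_data[:, candidate_features])
            acc = accuracy_score(test_labels, predictions)

            # Accept only removals that keep accuracy at or above the best so far
            if acc >= best_accuracy:
                best_accuracy = acc
                feature_to_remove = feature

        if feature_to_remove is not None:
            selected_features.remove(feature_to_remove)
            accuracies.append(best_accuracy)
            num_features.append(len(selected_features))
            print(f"Removed feature: {feature_to_remove}, Accuracy: {best_accuracy:.4f}")
        else:
            # Every possible removal would lower accuracy: stop
            break

    return selected_features, num_features, accuracies
```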
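The `backward_elimination` function above scores every candidate subset on the test set, so its final test accuracy is an optimistic estimate. One common alternative, sketched here under assumptions, is to carve a validation split out of the training data, run the same greedy loop against it, and touch the test set only once at the end. `make_classification` stands in for TinyMNIST so the snippet runs on its own; `backward_elimination_val`, `_val_accuracy` and `val_size=0.25` are hypothetical choices, not part of the original project.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def _val_accuracy(X_tr, y_tr, X_val, y_val, cols):
    # Fit Gaussian NB on the chosen columns and score it on the validation split
    model = GaussianNB().fit(X_tr[:, cols], y_tr)
    return accuracy_score(y_val, model.predict(X_val[:, cols]))


def backward_elimination_val(X, y, val_size=0.25, random_state=0):
    """Backward elimination scored on a validation split taken from (X, y)."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=val_size, random_state=random_state, stratify=y
    )
    selected = list(range(X.shape[1]))
    best = _val_accuracy(X_tr, y_tr, X_val, y_val, selected)
    while len(selected) > 1:
        # Validation accuracy after removing each remaining feature
        scores = {
            f: _val_accuracy(X_tr, y_tr, X_val, y_val, [c for c in selected if c != f])
            for f in selected
        }
        feature, acc = max(scores.items(), key=lambda item: item[1])
        if acc < best:  # every removal hurts validation accuracy: stop
            break
        selected.remove(feature)
        best = acc
    return selected, best


# Synthetic stand-in data; the test split is used exactly once, after selection
X, y = make_classification(n_samples=600, n_features=12, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

selected, val_acc = backward_elimination_val(X_train, y_train)
final = GaussianNB().fit(X_train[:, selected], y_train)
test_acc = accuracy_score(y_test, final.predict(X_test[:, selected]))
print(f"kept {len(selected)} features, val acc {val_acc:.3f}, test acc {test_acc:.3f}")
```

The trade-off is that selection sees fewer training rows, but the reported test accuracy is no longer biased by the search itself.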

# Forward Selection Algorithm for Feature Selection

**Forward Selection** is a simple and intuitive **wrapper** method for feature selection. Unlike Backward Elimination, it starts with an empty feature set and adds the most useful remaining feature one at a time until adding more features no longer improves performance or another stopping criterion is met.

## How Forward Selection Works:

1. **Start with an empty feature set.**
2. For each feature not yet selected, **train the model** (e.g., Naïve Bayes classifier) using the current selected features plus this candidate feature.
3. **Evaluate the model's performance** using a chosen metric (such as accuracy).
4. **Select the feature** that improves the model's performance the most.
5. **Add this feature** to the selected feature set.
6. **Repeat steps 2-5** until adding more features does not improve performance or a stopping criterion is met (e.g., maximum number of features).
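The steps above mirror Backward Elimination in reverse and can be sketched with the same kind of pluggable scoring function. As before, `forward_selection_generic`, `toy_score` and `USEFUL` are hypothetical names, higher scores are assumed to be better, and `empty_score` is an assumed score for the empty set (for a classifier, majority-class accuracy is a natural choice).

```python
from typing import Callable, List, Optional, Sequence


def forward_selection_generic(
    features: Sequence[int],
    score: Callable[[List[int]], float],
    empty_score: float = float("-inf"),
    max_features: Optional[int] = None,
) -> List[int]:
    """Greedily add the feature that raises the score the most."""
    selected: List[int] = []
    remaining = list(features)
    best = empty_score
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Steps 2-4: try each unselected feature on top of the current set
        best_feat, best_feat_score = None, float("-inf")
        for feat in remaining:
            s = score(selected + [feat])
            if s > best_feat_score:
                best_feat, best_feat_score = feat, s
        # Step 6: stop when the best addition does not strictly improve the score
        if best_feat_score <= best:
            break
        selected.append(best_feat)  # step 5
        remaining.remove(best_feat)
        best = best_feat_score
    return selected


# Toy score: features 0 and 2 are useful, every other feature adds a small penalty
USEFUL = {0, 2}


def toy_score(subset: List[int]) -> float:
    s = set(subset)
    return len(s & USEFUL) - 0.1 * len(s - USEFUL)


print(forward_selection_generic(range(4), toy_score, empty_score=0.0))  # [0, 2]
```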

## Advantages:
- Cheaper than Backward Elimination when only a small subset of features is needed, because the early iterations train models on just a few features.
- Builds up the model incrementally, which can be easier to interpret.

## Disadvantages:
- May miss features that are useful only in combination, since each candidate is judged by adding it alone to the current set.
- Still computationally expensive for very large feature sets.
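A small synthetic XOR example illustrates the missed-interaction problem. The dataset and the helper `best_possible_accuracy` are hypothetical: the helper predicts the majority label for each distinct value pattern of the chosen columns, which is the best any classifier seeing only those columns can do on these rows.

```python
import numpy as np

# Every (x0, x1) combination repeated 25 times; the label is x0 XOR x1
X = np.repeat(np.array([[0, 0], [0, 1], [1, 0], [1, 1]]), 25, axis=0)
y = X[:, 0] ^ X[:, 1]


def best_possible_accuracy(cols):
    """Accuracy of predicting the majority label for each value pattern of `cols`."""
    patterns = [tuple(row) for row in X[:, cols]]
    correct = 0
    for p in set(patterns):
        labels = y[[i for i, q in enumerate(patterns) if q == p]]
        correct += np.bincount(labels).max()  # majority count within this pattern
    return correct / len(y)


print(best_possible_accuracy([0]))     # 0.5
print(best_possible_accuracy([1]))     # 0.5
print(best_possible_accuracy([0, 1]))  # 1.0
```

Starting from an empty set, neither single feature beats the 50% majority-class baseline, so a forward step that requires an improvement stops before adding either one. Backward Elimination, starting from both, would keep them, since removing either drops accuracy from 1.0 to 0.5.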

In this project, we will implement **Forward Selection** manually (without a ready-made wrapper-selection routine) on the TinyMNIST dataset. For classification, we will use the same Gaussian Naïve Bayes classifier (`GaussianNB`) from scikit-learn.
