# WRAPPER METHOD

Wrapper methods treat feature selection as a search problem.
They use a predictive model (like Logistic Regression, Decision Tree, or SVM) to test different subsets of features and evaluate performance (accuracy, F1 score, etc.).

👉 Unlike filter methods (independent of ML model), wrapper methods depend on the model.

Example to Understand

Suppose you want to predict whether a person has diabetes or not using these features:

- Age
- BMI
- Blood Pressure
- Insulin Level
- Glucose Level

A wrapper method will:

- Take different combinations of these features.
- Train a model (say Logistic Regression).
- Evaluate which combination gives the best accuracy.
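The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses scikit-learn's built-in breast cancer data as a stand-in for the diabetes example, the helper `score_subset` is a hypothetical name, and the two candidate column subsets are arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000, solver="liblinear")

def score_subset(columns):
    """Mean 5-fold cross-validated accuracy using only the given columns."""
    return cross_val_score(model, X[:, columns], y, cv=5, scoring="accuracy").mean()

# Two arbitrary candidate subsets (column indices chosen only for illustration).
candidates = {"columns 0-2": [0, 1, 2], "columns 20-22": [20, 21, 22]}

# Train and evaluate the model once per candidate, then keep the best one.
scores = {name: score_subset(cols) for name, cols in candidates.items()}
best = max(scores, key=scores.get)

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
print("Best candidate:", best)
```

The wrapper methods below differ mainly in how they decide which candidate subsets to score.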

## Main Wrapper Methods

### 1. Forward Selection

Start with no features.

Add one feature at a time (the one that improves model performance the most).

Keep adding until there is no significant improvement.

📌 Example:

Step 1: Start with nothing → add Glucose Level (highest accuracy).

Step 2: Add BMI → accuracy improves.

Step 3: Add Blood Pressure → no improvement → stop.

✅ Final features = {Glucose, BMI}.
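The walk-through above can be reproduced with a small greedy loop. This is a minimal sketch, not a library implementation: `forward_selection` and `toy_score` are hypothetical helpers, and the per-feature gains in `TOY_GAIN` are made-up numbers chosen to mirror the example (in practice the score would come from training and validating a model).

```python
def forward_selection(features, score, tol=1e-3):
    """Greedily add the feature that raises score(subset) the most."""
    selected = []
    best_score = score(selected)
    remaining = list(features)
    while remaining:
        # Score every candidate obtained by adding one remaining feature.
        trial_scores = {f: score(selected + [f]) for f in remaining}
        candidate = max(trial_scores, key=trial_scores.get)
        if trial_scores[candidate] - best_score < tol:
            break  # no meaningful improvement, so stop
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = trial_scores[candidate]
    return selected, best_score

# Hypothetical accuracy gains per feature (illustrative values only).
TOY_GAIN = {"Glucose": 0.10, "BMI": 0.05, "Blood Pressure": 0.0, "Insulin": 0.0, "Age": 0.0}

def toy_score(subset):
    # Stand-in for "train a model on this subset and measure accuracy".
    return 0.60 + sum(TOY_GAIN[f] for f in subset)

selected, final_score = forward_selection(list(TOY_GAIN), toy_score)
print(selected)  # ['Glucose', 'BMI']
```

The `tol` threshold plays the role of "no significant improvement": the loop stops as soon as the best possible addition gains less than that.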

### 2. Backward Elimination

Start with all features.

Remove the least important feature (based on performance drop).

Repeat until removing features no longer improves accuracy.

📌 Example:

Step 1: Start with {Age, BMI, BP, Insulin, Glucose}.

Step 2: Remove Age (least effect).

Step 3: Remove Insulin (least effect).

✅ Final features = {BMI, BP, Glucose}.
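The same idea run in reverse can be sketched as follows. Again this is only an illustration: `backward_elimination` and `toy_score` are hypothetical helpers, and the numbers in `TOY_EFFECT` are invented so that Age and Insulin behave like noise, as in the example above.

```python
def backward_elimination(features, score):
    """Greedily drop the feature whose removal raises score(subset) the most."""
    selected = list(features)
    best_score = score(selected)
    while len(selected) > 1:
        # Score every candidate obtained by removing one feature.
        trial_scores = {f: score([g for g in selected if g != f]) for f in selected}
        to_drop = max(trial_scores, key=trial_scores.get)
        if trial_scores[to_drop] <= best_score + 1e-9:
            break  # every removal hurts (or changes nothing), so stop
        selected.remove(to_drop)
        best_score = trial_scores[to_drop]
    return selected, best_score

# Hypothetical effect of each feature on accuracy (illustrative values only).
TOY_EFFECT = {"Age": -0.010, "BMI": 0.050, "BP": 0.020, "Insulin": -0.005, "Glucose": 0.100}

def toy_score(subset):
    # Stand-in for "train a model on this subset and measure accuracy".
    return 0.60 + sum(TOY_EFFECT[f] for f in subset)

kept, final_score = backward_elimination(list(TOY_EFFECT), toy_score)
print(kept)  # ['BMI', 'BP', 'Glucose']
```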

### 3. Recursive Feature Elimination (RFE)

Uses the model itself (like Logistic Regression or SVM) to rank features by importance.

Removes the least important feature(s) step by step until only the desired number remains.

📌 Example with RFE + Logistic Regression:

Model ranks features → {Glucose > BMI > BP > Insulin > Age}.

Removes Age, then Insulin.

Keeps {Glucose, BMI, BP}.

⚡ RFE is one of the most widely used wrapper methods because it is systematic: the model's own importance ranking decides what to drop at each step.

### 4. Exhaustive Feature Selection (Brute Force)

Tries all possible combinations of features.

Picks the one with best performance.

Finds the best subset for the chosen model and metric, but is computationally expensive (not practical when there are many features).

📌 Example:

With 5 features → 2⁵ − 1 = 31 non-empty subsets to check (2⁵ = 32 if the empty set is counted).

Choose the best-performing subset.
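To see where the subset count comes from, the candidates can simply be enumerated. The sketch below uses only the standard library; 2⁵ = 32 counts the empty set as well, so 31 subsets contain at least one feature and are worth training on.

```python
from itertools import combinations

features = ["Age", "BMI", "Blood Pressure", "Insulin Level", "Glucose Level"]

# Every non-empty subset, from single features up to all five together.
subsets = [
    combo
    for size in range(1, len(features) + 1)
    for combo in combinations(features, size)
]

print(len(subsets))   # 31, i.e. 2**5 - 1
print(subsets[:3])    # the first few single-feature subsets
```

Each extra feature doubles the number of subsets, which is why this approach stops being practical quickly.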

# Code: Wrapper Methods with Breast Cancer Dataset

In [None]:
# Import libraries
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SequentialFeatureSelector, RFE
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
X, y = load_breast_cancer(return_X_y=True)

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Logistic Regression model
model = LogisticRegression(max_iter=5000, solver='liblinear')

In [None]:
X.shape

In [None]:
y.shape

## 1️⃣ Forward Selection

In [None]:
# Forward Selection
forward_selector = SequentialFeatureSelector(
    model, n_features_to_select=5, direction='forward'
)
forward_selector.fit(X_train, y_train)  # fit() only learns which 5 features to keep

# Selected feature indices
print("Forward Selection Features:", forward_selector.get_support(indices=True))

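# Evaluate: reduce both splits to the selected columns, then train and score
X_train_selected = forward_selector.transform(X_train)
X_test_selected = forward_selector.transform(X_test)
model.fit(X_train_selected, y_train)
y_pred = model.predict(X_test_selected)
print("Forward Selection Accuracy:", accuracy_score(y_test, y_pred))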

In [None]:
new_x = forward_selector.transform(X_train)

In [None]:
new_x.shape

In [None]:
X_train.shape

In [None]:
forward_selector.transform(X_test).shape  # reduced test set: (n_samples, 5)

## 🔎 Two Steps: fit() vs transform()

### 1. forward_selector.fit(X_train, y_train)

Here the search process happens.

Forward selection goes like this:

- Start with no features.
- Train Logistic Regression on each single feature → pick the one with the best cross-validated score.
- Try adding each remaining feature to the current set, one at a time, and keep the addition that scores best.
- Repeat until 5 features are chosen.

At the end of fit(), the selector remembers which 5 features were selected.

⚡ Important: Nothing in X_train changes yet; the selector just figures out the "winning subset".

### 2. forward_selector.transform(X_test)

Now that the selector knows the 5 best features, transform() reduces the dataset to only those columns.

If the original data had 30 features → transform() outputs only the selected 5 features.

Example:

`
X_test.shape before: (114, 30)
X_test.shape after transform: (114, 5)
`
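One way to convince yourself that transform() is just column selection is to compare it with indexing by the selector's boolean mask. The sketch below assumes the same dataset and split as the notebook, but asks for only 2 features with `cv=3` so it runs quickly; those two settings are choices made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000, solver="liblinear"),
    n_features_to_select=2,   # fewer than the notebook's 5, only to keep this quick
    direction="forward",
    cv=3,
)
selector.fit(X_train, y_train)

mask = selector.get_support()              # boolean array, one entry per original column
X_test_reduced = selector.transform(X_test)

print(X_test.shape, "->", X_test_reduced.shape)
# transform() keeps exactly the columns where the mask is True.
print(np.array_equal(X_test_reduced, X_test[:, mask]))
```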

### 3. Final Training & Evaluation

`
model.fit(forward_selector.transform(X_train), y_train)
`

We train Logistic Regression again, but now only on the 5 selected features (reduced dataset).

`
y_pred = model.predict(forward_selector.transform(X_test))
`

Predictions are also done using the reduced feature set.

Accuracy is then calculated on this optimized dataset.

🌟 In Short

- fit() → searches and decides which features are best.
- transform() → actually applies that decision to shrink your dataset to the chosen features.

Together = fit_transform() (common shortcut), but we separate them here because we need to fit on X_train and then transform X_test separately.
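If keeping track of "fit on train, transform on test" by hand feels error-prone, scikit-learn's `Pipeline` can do the bookkeeping. The following is a sketch under the same assumptions as the notebook (same dataset and split); the step names `"select"` and `"clf"` are arbitrary, and the selector is limited to 2 features with `cv=3` purely to keep the run short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipe = Pipeline([
    # Step 1: feature search, fitted on the training data only.
    ("select", SequentialFeatureSelector(
        LogisticRegression(max_iter=5000, solver="liblinear"),
        n_features_to_select=2,
        direction="forward",
        cv=3,
    )),
    # Step 2: final classifier trained on the reduced columns.
    ("clf", LogisticRegression(max_iter=5000, solver="liblinear")),
])

pipe.fit(X_train, y_train)        # fit selector, transform X_train, fit classifier
y_pred = pipe.predict(X_test)     # transform X_test with the fitted selector, then predict

selected = pipe.named_steps["select"].get_support(indices=True)
accuracy = accuracy_score(y_test, y_pred)
print("Selected columns:", selected)
print("Accuracy:", accuracy)
```

Because the selector only ever sees `X_train` during `fit`, the test set cannot influence which features are chosen.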

## 2️⃣ Backward Elimination

In [None]:
# Backward Elimination
backward_selector = SequentialFeatureSelector(
    model, n_features_to_select=5, direction='backward'
)
backward_selector.fit(X_train, y_train)  # fit() only learns which 5 features to keep

# Selected feature indices
print("Backward Elimination Features:", backward_selector.get_support(indices=True))

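# Evaluate: reduce both splits to the selected columns, then train and score
X_train_selected = backward_selector.transform(X_train)
X_test_selected = backward_selector.transform(X_test)
model.fit(X_train_selected, y_train)
y_pred = model.predict(X_test_selected)
print("Backward Elimination Accuracy:", accuracy_score(y_test, y_pred))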


## 3️⃣ Recursive Feature Elimination (RFE)

In [None]:
# RFE
rfe = RFE(model, n_features_to_select=5)
rfe.fit(X_train, y_train)  # fit() only learns which 5 features to keep

print("RFE Features:", rfe.get_support(indices=True))

# Evaluate
model.fit(rfe.transform(X_train), y_train)
y_pred = model.predict(rfe.transform(X_test))
print("RFE Accuracy:", accuracy_score(y_test, y_pred))


## 🔎 Code Explanation

### 1. Initializing RFE
`
rfe = RFE(model, n_features_to_select=5)
`

- RFE = Recursive Feature Elimination.
- model = Logistic Regression.
- n_features_to_select=5 → we want to keep only 5 features at the end.

👉 What happens behind the scenes:

- RFE trains the model on all features.
- The model provides a feature importance ranking (coef_ for Logistic Regression or a linear SVM, feature_importances_ for tree-based models).
- RFE removes the least important feature.
- Then it retrains on the reduced set and again removes the least important feature.
- Repeats this until only 5 features remain.

So it's like an iterative elimination tournament.
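The elimination loop described above can be written out by hand to make the steps concrete. This is a rough sketch rather than scikit-learn's actual implementation: `manual_rfe` is a hypothetical helper, and using the squared coefficient as the importance score is an assumption made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

def manual_rfe(X, y, n_keep):
    """Refit, drop the feature with the smallest squared coefficient, repeat."""
    active = list(range(X.shape[1]))     # column indices still in play
    while len(active) > n_keep:
        model = LogisticRegression(max_iter=5000, solver="liblinear")
        model.fit(X[:, active], y)
        # One importance value per active feature (assumed: squared coefficient).
        importance = (model.coef_ ** 2).sum(axis=0)
        weakest = int(np.argmin(importance))
        del active[weakest]              # eliminate the weakest feature
    return active

kept = manual_rfe(X, y, n_keep=5)
print("Manual loop keeps columns:", kept)
```

Coefficient magnitudes depend on the scale of each feature, so standardizing the inputs before running this kind of ranking is a common precaution.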

### 2. Fitting on Training Data
`
rfe.fit(X_train, y_train)
`

Runs the RFE process:

- Train model on all features.
- Drop the least important one.
- Retrain.
- Repeat until 5 features are left.

👉 After this, rfe remembers which 5 features survived.

### 3. Getting the Selected Features
`
print("RFE Features:", rfe.get_support(indices=True))
`

get_support(indices=True) → returns the column indices of the chosen 5 features.

Example (illustrative values, your output may differ): [1, 7, 20, 25, 27].
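Raw indices are hard to read, so it can help to map them back to column names and to look at the full elimination order. The sketch below assumes the same dataset, split and model as the notebook, and relies on the `feature_names` array that `load_breast_cancer()` returns and on the fitted selector's `ranking_` attribute.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

rfe = RFE(LogisticRegression(max_iter=5000, solver="liblinear"), n_features_to_select=5)
rfe.fit(X_train, y_train)

selected_idx = rfe.get_support(indices=True)
selected_names = list(data.feature_names[selected_idx])

print("Indices:", selected_idx)
print("Names:  ", selected_names)
# ranking_ is 1 for every kept feature; larger values were eliminated earlier.
print("Ranking:", rfe.ranking_)
```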

### 4. Training the Final Model
`
model.fit(rfe.transform(X_train), y_train)
`

rfe.transform(X_train) → reduces training data to only those 5 selected features.

Then Logistic Regression is trained on this reduced dataset.

👉 Now the model is simpler, using fewer features.

### 5. Making Predictions & Accuracy
`
y_pred = model.predict(rfe.transform(X_test))
`

`
print("RFE Accuracy:", accuracy_score(y_test, y_pred))
`

rfe.transform(X_test) → keeps only the 5 selected features from the test set.

Predictions are made with those features.

Accuracy is compared against actual labels y_test.

### 🌟 Analogy

Think of RFE like eliminating the weakest players step by step:

Start with the full cricket squad.

Remove the weakest player based on performance.

Play again, see who is now weakest, remove them.

Repeat until only the 5 best players remain.

This is why it's called Recursive Feature Elimination: elimination happens step by step, recursively.

## 4️⃣ Exhaustive Feature Selection (⚠️ Heavy for many features)

In [None]:
!pip install mlxtend

In [None]:
from mlxtend.feature_selection import ExhaustiveFeatureSelector

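# Exhaustive Feature Selection
# Note: with 30 features, subset sizes 3 to 5 mean more than 170,000 subsets,
# each scored with 3-fold CV, so this cell can run for a very long time.
efs = ExhaustiveFeatureSelector(
    model,
    min_features=3,
    max_features=5,
    scoring='accuracy',
    cv=3
)
efs.fit(X_train, y_train)

# best_idx_ is a tuple of column indices for the best-scoring subset
best_columns = list(efs.best_idx_)
print("Exhaustive Selection Features:", best_columns)

# Evaluate
model.fit(X_train[:, best_columns], y_train)
y_pred = model.predict(X_test[:, best_columns])
print("Exhaustive Selection Accuracy:", accuracy_score(y_test, y_pred))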
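Before running the cell above, it is worth estimating how much work it asks for. The arithmetic below uses only the standard library and assumes the settings shown (30 features, subset sizes 3 to 5, 3-fold cross-validation); the fit count is a rough estimate that assumes one model fit per fold per subset.

```python
from math import comb

n_features = 30                      # columns in the breast cancer dataset
min_features, max_features = 3, 5    # subset sizes searched
cv_folds = 3

# Number of distinct subsets of each size.
per_size = {k: comb(n_features, k) for k in range(min_features, max_features + 1)}
total_subsets = sum(per_size.values())
total_fits = total_subsets * cv_folds

print(per_size)                          # {3: 4060, 4: 27405, 5: 142506}
print("Subsets to score:", total_subsets)   # 173971
print("Approximate model fits:", total_fits)  # 521913
```

If that budget is too large, narrowing `min_features`/`max_features` or pre-filtering the columns first shrinks the search considerably.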


| Method                                  | Best Situation to Use                                                                                                              | Why Choose It?                                                                            | When to Avoid                                                                                                |
| --------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ |
| **Forward Selection**                   | Large dataset with **many features**, but you expect only a **few are useful**                                                     | Efficient, starts small and adds gradually. Works well when irrelevant features are many. | If there are **strong feature interactions** (two weak features together might be useful), it may miss them. |
| **Backward Elimination**                | Dataset with a **moderate number of features** (≤ 20) and most features are likely useful                                          | Considers all features from the beginning, so captures feature interactions.              | Very slow if feature count is **large (100+)**, since it starts with everything.                             |
| **Recursive Feature Elimination (RFE)** | You want a **systematic, model-based method**; especially good when model provides feature importance (LogReg, SVM, Decision Tree) | Widely used in practice, balances performance and interpretability, can rank features.    | Computationally expensive if dataset is huge (many features + many rows).                                    |
| **Exhaustive Feature Selection**        | Very **small feature set (<15 features)** and you want the **absolute best subset**                                                | Tries all possible combinations, so it finds the best subset for the chosen model and metric. | Becomes impossible for large feature sets (time grows exponentially: 2^n).                                   |
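The cost differences in the table can be made concrete by counting how many candidates each method has to evaluate. The helpers below are hypothetical and rest on simplifying assumptions: one feature is added or removed per step, cross-validation folds are ignored, and RFE is counted in elimination rounds (one model fit each) rather than scored subsets.

```python
def forward_candidates(n, k):
    """Subsets scored when growing from 0 to k features, one at a time."""
    return sum(n - i for i in range(k))

def backward_candidates(n, k):
    """Subsets scored when shrinking from n to k features, one at a time."""
    return sum(n - i for i in range(n - k))

def rfe_rounds(n, k):
    """Elimination rounds when one feature is dropped per round."""
    return n - k

def exhaustive_candidates(n):
    """All non-empty subsets of n features."""
    return 2 ** n - 1

n, k = 30, 5   # e.g. keep 5 of 30 features
print("Forward:   ", forward_candidates(n, k))     # 140
print("Backward:  ", backward_candidates(n, k))    # 450
print("RFE rounds:", rfe_rounds(n, k))             # 25
print("Exhaustive:", exhaustive_candidates(n))     # 1073741823
```

With few features to keep out of many, forward selection does far less work than backward elimination, and RFE is cheaper still because it fits one model per round instead of scoring every candidate.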
