# Feature Selection

------------------------------------------------------------------------

### **Filter Methods**

### Code for Filter Methods on Iris

``` python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif, chi2
import seaborn as sns
import matplotlib.pyplot as plt

# Load Iris dataset
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Let's check correlation first
corr_matrix = X.corr()
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm")
plt.title("Correlation Matrix")
plt.show()

# 1. ANOVA F-test (numeric features vs. a categorical target)
anova_selector = SelectKBest(score_func=f_classif, k='all')  # Select all features for now
anova_selector.fit(X, y)
anova_scores = pd.DataFrame(anova_selector.scores_, index=X.columns, columns=["ANOVA Score"])
print("ANOVA F-test scores:")
print(anova_scores.sort_values(by="ANOVA Score", ascending=False))

# 2. Chi-Square (meant for non-negative categorical/count features; run here only as an example)
# chi2 needs non-negative numeric input, so bin the feature into integer codes 0-4
X_discretized = pd.cut(X['sepal length (cm)'], bins=5, labels=False)  # Example: discretizing 'sepal length'
chi2_selector = SelectKBest(score_func=chi2, k='all')
chi2_selector.fit(X_discretized.values.reshape(-1, 1), y)
chi2_scores = pd.DataFrame(chi2_selector.scores_, index=["Discretized Sepal Length"], columns=["Chi2 Score"])
print("\nChi-Square scores:")
print(chi2_scores)

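# 3. Correlation (absolute Pearson correlation of each feature with the target)
# Treating the class labels 0/1/2 as numbers is only a rough heuristic
correlation_with_target = X.apply(lambda col: col.corr(pd.Series(y, index=X.index))).abs()
print("\nAbsolute correlation with target:")
print(correlation_with_target.sort_values(ascending=False))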
```

------------------------------------------------------------------------

### Breakdown of the Code:

1.  **Correlation Matrix**: Visualizes pairwise Pearson correlations
    between features. It is meaningful only for numeric (continuous)
    features and helps spot redundant, highly correlated pairs.

2.  **ANOVA F-test**: Scores each numeric feature by how well it
    separates the target classes; a higher F-score means more relevance.

3.  **Chi-Square**: We discretize one of the features for demonstration
    and test its dependence on the target.

    (Note: Chi-square is designed for non-negative **categorical or
    count features**.)

4.  **Correlation with Target**: Measures how strongly each feature
    correlates linearly with the numerically encoded target.
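
The code above only scores features (`k='all'`). As a minimal sketch of the next step, the snippet below keeps the two highest-scoring features by ANOVA F-score. The choice `k=2` and the names `selector`, `X_top`, and `top_features` are illustrative assumptions, not part of the original example.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Keep only the k features with the highest ANOVA F-scores (k=2 is an arbitrary choice)
selector = SelectKBest(score_func=f_classif, k=2)
X_top = selector.fit_transform(X, y)

# get_support() returns a boolean mask over the original columns
top_features = X.columns[selector.get_support()].tolist()
print("Top 2 features by ANOVA F-score:", top_features)
print("Reduced shape:", X_top.shape)
```

`fit_transform` returns a NumPy array by default, so the column names have to be recovered from the mask as shown.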

### **Wrapper Methods**

## 1. Recursive Feature Elimination (RFE)

> “Start with all features, recursively drop the least important one.”

### Use case: Works with any model that exposes `coef_` or `feature_importances_`, such as Logistic Regression, linear SVM, or RandomForest.

``` python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE
import pandas as pd

# Load data
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Logistic Regression model
model = LogisticRegression(max_iter=200)

# RFE with 2 features
rfe = RFE(model, n_features_to_select=2)
rfe.fit(X, y)

# Print results
print("Selected Features (RFE):")
print(X.columns[rfe.support_])
print("Ranking of features:")
print(dict(zip(X.columns, rfe.ranking_)))
```
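
To make the "recursively drop the least important one" idea concrete, here is a hand-rolled sketch of the elimination loop. It is an illustration of the idea, not scikit-learn's actual implementation. Ranking features by the sum of squared coefficients across classes is an assumption made for this sketch, and `remaining` and `eliminated` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

remaining = list(X.columns)
eliminated = []  # features in the order they were dropped

while len(remaining) > 2:
    # Refit on the features that are still in play
    model = LogisticRegression(max_iter=1000)
    model.fit(X[remaining], y)
    # Multiclass coef_ has shape (n_classes, n_features);
    # summing squared coefficients over classes gives one score per feature
    importance = (model.coef_ ** 2).sum(axis=0)
    weakest = remaining[int(np.argmin(importance))]
    eliminated.append(weakest)
    remaining.remove(weakest)

print("Dropped (in order):", eliminated)
print("Kept:", remaining)
```

The key point is that the model is refit after every removal, so the importance of the surviving features is re-estimated each round.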

------------------------------------------------------------------------

## 2. Forward Selection

> “Start with nothing, add features one by one that improve
> performance.”

### Use case: Good when you suspect *few features matter*.

We’ll use `SequentialFeatureSelector` from `mlxtend`:

``` python
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Forward selection
sfs = SFS(LogisticRegression(max_iter=200),
          k_features=2,
          forward=True,
          floating=False,
          scoring='accuracy',
          cv=5)

sfs = sfs.fit(X_train.values, y_train)

print("Selected Features (Forward):", [X.columns[i] for i in sfs.k_feature_idx_])
```

------------------------------------------------------------------------

## 3. Backward Elimination

> “Start with all features, drop the worst one at a time.”

### Same idea as RFE, but features are removed based on the drop in cross-validated score rather than on model coefficients.

``` python
# Backward Selection (change forward=False)
sbs = SFS(LogisticRegression(max_iter=200),
          k_features=2,
          forward=False,
          floating=False,
          scoring='accuracy',
          cv=5)

sbs = sbs.fit(X_train.values, y_train)

print("Selected Features (Backward):", [X.columns[i] for i in sbs.k_feature_idx_])
```
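
If installing `mlxtend` is not an option, scikit-learn ships its own `SequentialFeatureSelector` (available since version 0.24) that covers both directions. The sketch below mirrors the two mlxtend examples above under the same assumed settings (2 features, accuracy, 5-fold CV). The `selected` dictionary is a hypothetical name, and the selected features are not guaranteed to match mlxtend's.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

selected = {}
for direction in ("forward", "backward"):
    selector = SequentialFeatureSelector(
        LogisticRegression(max_iter=1000),
        n_features_to_select=2,
        direction=direction,  # "forward" adds features, "backward" removes them
        scoring="accuracy",
        cv=5,
    )
    selector.fit(X_train, y_train)
    selected[direction] = X.columns[selector.get_support()].tolist()
    print(f"Selected ({direction}):", selected[direction])
```

Forward and backward search can end on different subsets, because each one makes greedy choices from a different starting point.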

------------------------------------------------------------------------

## 4. Exhaustive Search

> “Try every possible combo of features. YES, ALL OF THEM.”

**Slow as heck**: with *n* features there are 2^n - 1 non-empty subsets to score. Good only for small feature sets like Iris (4 features).

``` python
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS

efs = EFS(LogisticRegression(max_iter=200),
          min_features=2,
          max_features=2,
          scoring='accuracy',
          cv=5)

efs = efs.fit(X_train.values, y_train)

print("Best feature combo (Exhaustive):", [X.columns[i] for i in efs.best_idx_])
```
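
To show what "try every possible combo" means without `mlxtend`, here is a brute-force sketch using `itertools.combinations` and `cross_val_score`. Unlike the example above, it is not restricted to subsets of size 2: it scores every non-empty subset. The names `results` and `best_subset` are hypothetical.

```python
from itertools import combinations

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

results = {}
# Enumerate every non-empty subset: 2**4 - 1 = 15 subsets for 4 features
for size in range(1, X.shape[1] + 1):
    for subset in combinations(X.columns, size):
        scores = cross_val_score(
            LogisticRegression(max_iter=1000),
            X[list(subset)],
            y,
            scoring="accuracy",
            cv=5,
        )
        results[subset] = scores.mean()

best_subset = max(results, key=results.get)
print("Subsets evaluated:", len(results))
print("Best subset:", best_subset, "mean accuracy:", round(results[best_subset], 3))
```

Each extra feature doubles the number of subsets, which is why this approach stops being practical beyond a couple of dozen features.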

### **Embedded Methods**

------------------------------------------------------------------------

## 1. **L1 Regularization (Lasso)**

> Shrinks irrelevant feature coefficients to zero — automatic feature
> kill switch 🔪

``` python
from sklearn.linear_model import Lasso
from sklearn.datasets import load_iris
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load dataset
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

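# Convert to binary classification (classes 0 and 1 only) so Lasso can regress on 0/1 labels
binary_mask = y < 2
X_bin = X[binary_mask]
y_bin = y[binary_mask]

# Standardize features so the L1 penalty treats them on the same scale
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_bin)

# Fit Lasso
lasso = Lasso(alpha=0.1)
lasso.fit(X_scaled, y_bin)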

# Coefficients
coef = pd.Series(lasso.coef_, index=X.columns)
print("Lasso Coefficients:")
print(coef)

print("\nSelected Features (non-zero):")
print(coef[coef != 0].index.tolist())
```
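
How many features survive depends heavily on `alpha`. The sketch below sweeps a few values on the binary (class 0 vs. 1) subset to show the "kill switch" getting more aggressive as the penalty grows. The alpha grid and the `kept` dictionary are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

data = load_iris()
binary_mask = data.target < 2  # keep classes 0 and 1 only
X_scaled = StandardScaler().fit_transform(data.data[binary_mask])
y_bin = data.target[binary_mask]

kept = {}
for alpha in (0.001, 0.01, 0.1, 1.0):
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(X_scaled, y_bin)
    # A feature is "selected" when its coefficient is not exactly zero
    kept[alpha] = [name for name, c in zip(data.feature_names, lasso.coef_) if c != 0]
    print(f"alpha={alpha}: {len(kept[alpha])} feature(s) kept -> {kept[alpha]}")
```

In practice `alpha` is usually tuned by cross-validation (for example with `LassoCV`) rather than picked by hand.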

------------------------------------------------------------------------

## 2. **Tree-Based Models (Random Forest)**

> Use built-in feature importance to rank & select features.

``` python
from sklearn.ensemble import RandomForestClassifier

# Fit Random Forest
rf = RandomForestClassifier(random_state=42)  # fixed seed so importances are reproducible
rf.fit(X, y)

# Feature importance
importances = pd.Series(rf.feature_importances_, index=X.columns)
print("Random Forest Feature Importances:")
print(importances.sort_values(ascending=False))

print("\nSelected Features (importance > 0.15):")
print(importances[importances > 0.15].index.tolist())
```
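
Instead of thresholding the importances by hand, scikit-learn's `SelectFromModel` can wrap the forest and apply the cut-off for you. This is a sketch under assumed settings: `threshold="median"` keeps the features whose importance is at or above the median, which is a different rule from the fixed 0.15 cut-off used above.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# SelectFromModel fits the estimator, then keeps features above the threshold
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=42),
    threshold="median",
)
X_selected = selector.fit_transform(X, y)

rf_selected = X.columns[selector.get_support()].tolist()
print("Selected by SelectFromModel:", rf_selected)
print("Reduced shape:", X_selected.shape)
```

Because the selector follows the usual `fit`/`transform` API, it can be dropped into a `Pipeline` ahead of the final model.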

------------------------------------------------------------------------

## 3. **ElasticNet**

> Mix of L1 (sparse) and L2 (smooth) — best of both worlds

``` python
from sklearn.linear_model import ElasticNet

# Fit ElasticNet on the binary subset (X_bin, y_bin: classes 0 and 1 only)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)  # l1_ratio = mix level
enet.fit(X_bin, y_bin)

enet_coef = pd.Series(enet.coef_, index=X.columns)
print("ElasticNet Coefficients:")
print(enet_coef)

print("\nSelected Features (non-zero):")
print(enet_coef[enet_coef != 0].index.tolist())
```

------------------------------------------------------------------------

## 4. **Linear Model Coefficients (with regularization)**

> You can also peek at coefficients directly from a Logistic Regression!
> (Magnitudes are only comparable when the features share a scale.)

``` python
from sklearn.linear_model import LogisticRegression

# Binary again: reuse the class 0 vs. 1 subset (X_bin, y_bin)
logreg = LogisticRegression(penalty='l2', solver='liblinear')
logreg.fit(X_bin, y_bin)

log_coef = pd.Series(logreg.coef_[0], index=X.columns)
print("Logistic Regression Coefficients:")
print(log_coef)

print("\nSelected Features (abs coef > 0.5):")
print(log_coef[log_coef.abs() > 0.5].index.tolist())
```
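
Raw coefficients depend on each feature's units, so a fixed cut-off like 0.5 is easier to interpret after standardizing. The following sketch refits the same kind of model inside a pipeline with `StandardScaler`. The binary subset is rebuilt here so the snippet runs on its own, and `pipe` and `scaled_coef` are hypothetical names.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Binary subset: classes 0 and 1 only
binary_mask = y < 2
X_bin, y_bin = X[binary_mask], y[binary_mask]

# Scale first, then fit; make_pipeline names each step after its lowercased class
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_bin, y_bin)

scaled_coef = pd.Series(pipe.named_steps["logisticregression"].coef_[0], index=X.columns)
print("Coefficient magnitudes on standardized features:")
print(scaled_coef.abs().sort_values(ascending=False))
```

With standardized inputs, each coefficient reflects the effect of a one-standard-deviation change, which makes the ranking across features fairer.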

------------------------------------------------------------------------

### **Hybrid Methods**

## Iris Dataset – Hybrid in Action

``` python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
import pandas as pd

# Load dataset
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# STEP 1: Filter Method (ANOVA)
filter_selector = SelectKBest(score_func=f_classif, k=3)
X_filter = filter_selector.fit_transform(X, y)

# Get selected feature names
mask = filter_selector.get_support()
filtered_features = X.columns[mask]
print("Filter Selected Features:", filtered_features.tolist())

# STEP 2: Wrapper Method (RFE with Logistic Regression)
model = LogisticRegression(max_iter=200)
rfe_selector = RFE(model, n_features_to_select=2)
X_rfe = rfe_selector.fit_transform(X_filter, y)

final_mask = rfe_selector.get_support()
final_features = filtered_features[final_mask]
print("Final Hybrid Selected Features:", final_features.tolist())
```
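
The example above selects features on the full dataset, which is fine for exploration but leaks information if you then report a cross-validated score. One possible way around that, sketched here under assumed settings, is to chain the same two steps and a final classifier in a `Pipeline` so that selection is redone inside every fold. The step names and `cv=5` are illustrative choices.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

hybrid_pipeline = Pipeline([
    # STEP 1: filter down to 3 features by ANOVA F-score
    ("filter", SelectKBest(score_func=f_classif, k=3)),
    # STEP 2: wrapper narrows those 3 down to 2 with RFE
    ("wrapper", RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)),
    # Final model trained on the 2 surviving features
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Both selection steps are refit on the training part of each fold
cv_scores = cross_val_score(hybrid_pipeline, X, y, scoring="accuracy", cv=5)
print("Fold accuracies:", cv_scores.round(3))
print("Mean CV accuracy:", round(cv_scores.mean(), 3))
```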