
# Lab 6: Feature Extraction and Selection Techniques

## Objectives
1. Understand the importance of feature extraction and selection in Data Science.
2. Experiment with encoding methods like one-hot encoding.
3. Learn to create new features based on domain expertise.

---



### What is Feature Extraction?
Feature extraction involves transforming raw data into a format suitable for modeling. It includes methods like:
- Encoding categorical variables
- Creating new features from existing data

### What is Feature Selection?
Feature selection focuses on selecting the most relevant features for training models. This can help improve:
- Model performance
- Computational efficiency


## Differences Between Feature Extraction and Feature Selection

### 1. Definition

#### **Feature Extraction**
- **Purpose**: Creates new features by transforming the original data.
- **Approach**: Combines or derives features from the existing dataset using mathematical or statistical transformations.
- **Outcome**: Generates a new feature space, often in reduced dimensions, while retaining important information.
- **Examples**:
  - Principal Component Analysis (PCA)
  - Creating new features like BMI from weight and height.
  - Text vectorization (e.g., TF-IDF, word embeddings).
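
The BMI example above can be sketched in a few lines of pandas. The column names `weight_kg` and `height_m` and the measurements are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical measurements; the column names are assumptions for this sketch.
people = pd.DataFrame({
    "weight_kg": [60.0, 72.0, 90.0],
    "height_m": [1.60, 1.80, 1.75],
})

# BMI = weight (kg) / height (m)^2: a new feature derived from two existing ones.
people["bmi"] = people["weight_kg"] / people["height_m"] ** 2

print(people.round(2))
```

The original columns stay in place, and the new `bmi` column combines both into a single value a model can use directly.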

#### **Feature Selection**
- **Purpose**: Selects the most relevant features from the original dataset.
- **Approach**: Removes redundant, irrelevant, or noisy features while keeping the important ones.
- **Outcome**: Reduces the dimensionality of the dataset while leaving the retained features unchanged.
- **Examples**:
  - Correlation analysis.
  - Recursive Feature Elimination (RFE).
  - Chi-square test.

---

### 2. Primary Goal

| **Aspect**          | **Feature Extraction**                              | **Feature Selection**                            |
|----------------------|----------------------------------------------------|-------------------------------------------------|
| **Goal**            | Transform data into a new feature space.           | Identify and keep only the most relevant features. |

---

### 3. Changes to Data

| **Aspect**          | **Feature Extraction**                              | **Feature Selection**                            |
|----------------------|----------------------------------------------------|-------------------------------------------------|
| **Feature Space**   | Alters the original feature space (e.g., reduces or combines features). | Keeps a subset of the original features, unchanged. |
| **Data Transformation** | Yes, involves transforming or combining features. | No, simply selects features without modification. |

---

### 4. Techniques Used

| **Feature Extraction**                              | **Feature Selection**                            |
|----------------------------------------------------|-------------------------------------------------|
| Dimensionality reduction (e.g., PCA, t-SNE)         | Filter methods (e.g., correlation, Chi-square)   |
| Signal processing techniques                        | Wrapper methods (e.g., Recursive Feature Elimination) |
| Feature engineering (e.g., creating features using domain knowledge) | Embedded methods (e.g., Lasso regression)        |
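
To make the dimensionality-reduction entry concrete, here is a minimal sketch of PCA computed with NumPy's `np.linalg.svd`. The data matrix is made up for illustration, and a library implementation would normally be used in practice.

```python
import numpy as np

# Made-up data: 6 samples, 3 features; the third is roughly the sum of the first two.
X = np.array([
    [1.0, 2.0, 3.1],
    [2.0, 1.0, 2.9],
    [3.0, 4.0, 7.2],
    [4.0, 3.0, 6.8],
    [5.0, 6.0, 11.1],
    [6.0, 5.0, 10.9],
])

# Center each feature, then take the SVD; rows of Vt are the principal directions.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of the total variance captured by each component.
explained_ratio = S**2 / np.sum(S**2)

# Keep the first two components: each new column mixes all three original features.
X_reduced = X_centered @ Vt[:2].T

print("Explained variance ratio:", explained_ratio.round(3))
print("Reduced shape:", X_reduced.shape)
```

The reduced columns are combinations of the original features rather than a subset of them, which is what separates extraction from selection.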

---

### 5. Complexity

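- **Feature Extraction**: Often more computationally intensive, since new features must be computed from the data (e.g., PCA requires a matrix decomposition).
- **Feature Selection**: Usually cheaper for filter methods, which only score existing features. Wrapper methods such as RFE are an exception because they retrain the model many times.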

---

### 6. Use Cases

#### **Feature Extraction**
- When raw data has complex relationships or is unstructured (e.g., text, images, audio).
- When the goal is to reduce the dimensionality of the data for visualization or modeling.

#### **Feature Selection**
- When the dataset contains irrelevant, redundant, or noisy features.
- When interpretability is important (e.g., identifying which features are most important for predictions).

---

### Analogy

- **Feature Extraction**: Like creating a summary or an abstract from a large document; you retain essential information in a new form.
- **Feature Selection**: Like choosing the most relevant paragraphs from the document without rewriting or modifying the content.

---

### Example in Context

#### Dataset: A housing dataset with columns like `area`, `number_of_rooms`, `location`, and `price`.

- **Feature Extraction**: Create a new feature `price_per_sq_ft` by combining `price` and `area`.
- **Feature Selection**: Use correlation to find that `number_of_rooms` is less correlated with house price compared to `area` and remove it.
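
Both steps can be sketched on a small housing table. All numbers below are made up for illustration, and `corrwith` is just one convenient way to compare features against `price`.

```python
import pandas as pd

# Made-up housing records for illustration only.
houses = pd.DataFrame({
    "area": [800, 1000, 1200, 1500, 2000],  # square feet
    "number_of_rooms": [3, 2, 4, 3, 4],
    "price": [160000, 210000, 240000, 310000, 400000],
})

# Feature extraction: derive a new column from two existing ones.
houses["price_per_sq_ft"] = houses["price"] / houses["area"]

# Feature selection: compare each candidate feature's correlation with price.
corr_with_price = houses[["area", "number_of_rooms"]].corrwith(houses["price"])
print(corr_with_price)

# Drop the candidate with the weakest absolute correlation.
weakest = corr_with_price.abs().idxmin()
houses_selected = houses.drop(columns=[weakest])
print(houses_selected.columns.tolist())
```

If `price` is the value the model is meant to predict, a feature computed from it would leak the answer, so treat `price_per_sq_ft` here as an illustration of the mechanics only.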

---

### Conclusion
In practice, the two techniques are often combined: new features are extracted first, and selection then keeps only those that help the model.


## Encoding Methods in Machine Learning

When dealing with categorical data, encoding methods are used to transform categories into numerical formats suitable for machine learning algorithms. Below are the most commonly used encoding methods:

---

### **1. One-Hot Encoding**
- **Description**: Converts each category into a binary vector.
- **When to Use**: For **nominal data** (no inherent order) with a small number of categories.
- **Example**: 
  - Input: `["Red", "Green", "Blue"]`
  - Output:
    ```
    Red  Green  Blue
    1    0      0
    0    1      0
    0    0      1
    ```

---

### **2. Label Encoding**
- **Description**: Assigns a unique integer to each category.
- **When to Use**: For **ordinal data** with a natural order or ranking. Applied to nominal data such as colors, the integers suggest an order that does not exist.
- **Example**: 
  - Input: `["Low", "Medium", "High"]`
  - Output: `[0, 1, 2]`
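
For ordinal data, an explicit mapping keeps the integer codes aligned with the real ranking. The sketch below assumes a hypothetical `size` column; the category names and their order are made up for illustration.

```python
import pandas as pd

# Hypothetical ordinal column; the category names are assumptions for this sketch.
sizes = pd.DataFrame({"size": ["Medium", "Small", "Large", "Small"]})

# Spell out the order explicitly so the codes reflect the ranking.
size_order = {"Small": 0, "Medium": 1, "Large": 2}
sizes["size_encoded"] = sizes["size"].map(size_order)

print(sizes)
```

Encoders that assign integers automatically usually follow alphabetical or first-seen order, which rarely matches the intended ranking, so writing the mapping by hand is the safer choice for ordinal columns.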

---

### **3. Target Encoding**
- **Description**: Replaces each category with the mean of the target variable for that category.
- **When to Use**: For regression or classification tasks where the relationship between the category and target is important.
- **Example**: 
  - Input: `City = ["A", "B", "C"]`, Target: `[10, 20, 30]`
  - Output: `City A: 10, City B: 20, City C: 30`
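
A minimal pandas sketch of the same idea follows, with each city appearing more than once so that the per-category mean is not trivial. The data and the `df_city` name are made up for illustration.

```python
import pandas as pd

# Made-up data: each city appears twice so the mean is meaningful.
df_city = pd.DataFrame({
    "City": ["A", "B", "A", "C", "B", "C"],
    "Target": [10, 20, 14, 30, 24, 34],
})

# Mean of the target per category.
city_means = df_city.groupby("City")["Target"].mean()

# Replace each category with its mean target value.
df_city["City_encoded"] = df_city["City"].map(city_means)

print(df_city)
```

In a real project the means would typically be computed on the training split only and then applied to validation and test data, so the encoded column does not leak the target.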

---

### **4. Frequency Encoding**
- **Description**: Replaces categories with their frequency of occurrence in the dataset.
- **When to Use**: When frequency provides meaningful information.
- **Example**: 
  - Input: `["A", "B", "B", "C", "A", "A"]`
  - Output: `[3, 2, 2, 1, 3, 3]`
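
The example above can be reproduced with `value_counts` and `map`; this is a minimal sketch and not the only way to do it.

```python
import pandas as pd

values = pd.Series(["A", "B", "B", "C", "A", "A"])

# Count how often each category occurs, then map the counts back onto the column.
counts = values.value_counts()
encoded = values.map(counts)

print(encoded.tolist())  # [3, 2, 2, 1, 3, 3]
```

Categories that happen to share the same count become indistinguishable after this encoding, which is worth checking before relying on it.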

---

### **5. Binary Encoding**
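- **Description**: Combines label encoding and binary conversion. Each category is first mapped to an integer, the integer is written in binary, and each binary digit is stored in its own column.
- **When to Use**: For **high-cardinality categorical data**, since `n` categories need only about `log2(n)` columns instead of the `n` columns that one-hot encoding would create.
- **Example**:
  - Input: `["A", "B", "C", "D"]`
  - Label Encoding: `[0, 1, 2, 3]`
  - Binary Conversion (padded to two digits, one column per digit):
    ```
    0 -> 00
    1 -> 01
    2 -> 10
    3 -> 11
    ```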
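
The two steps can be sketched with plain pandas. The `bit_` column names are an assumption of this sketch, and dedicated encoder libraries may lay out the columns differently.

```python
import pandas as pd

categories = pd.Series(["A", "B", "C", "D", "B"])

# Step 1: label-encode each category as an integer (codes follow sorted order).
codes = categories.astype("category").cat.codes

# Step 2: work out how many binary digits the largest code needs.
n_bits = max(int(codes.max()).bit_length(), 1)

# Step 3: one column per binary digit, most significant digit first.
binary_df = pd.DataFrame({
    f"bit_{i}": (codes // 2**i) % 2 for i in reversed(range(n_bits))
})

print(binary_df)
```

Four categories fit into two columns here, whereas one-hot encoding would need four.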

---

### **Choosing the Right Encoding Method**
- **One-Hot Encoding**: Best for nominal data with few categories.
- **Label Encoding**: Ideal for ordinal data with a natural order.
- **Target Encoding**: Effective in supervised learning tasks, provided the category means are computed on training data only to avoid target leakage.
- **Frequency/Binary Encoding**: Suitable for high-cardinality data.


### One-Hot Encoding
One-hot encoding is used to convert categorical variables into a binary matrix, making them usable in machine learning models.

#### Example: Encoding a `Color` column


```python
import pandas as pd

# Sample dataset
data = {'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red']}
df = pd.DataFrame(data)

# One-hot encoding
encoded_df = pd.get_dummies(df, columns=['Color'])

print("Original DataFrame:")
print(df)

print("\nOne-Hot Encoded DataFrame:")
print(encoded_df)
```

Output:

```
Original DataFrame:
   Color
0    Red
1   Blue
2  Green
3   Blue
4    Red

One-Hot Encoded DataFrame:
   Color_Blue  Color_Green  Color_Red
0       False        False       True
1        True        False      False
2       False         True      False
3        True        False      False
4       False        False       True
```



### Creating New Features
Domain expertise can be used to create meaningful features. For example:
- Calculating a person's age based on their birth year.
- Deriving a "total price" feature by multiplying quantity and unit price.
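
The first idea, deriving an age from a birth year, can be sketched as follows. The records and the fixed `reference_year` are assumptions chosen so the result is reproducible.

```python
import pandas as pd

# Hypothetical records; names and years are made up.
members = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cleo"],
    "Birth_Year": [1990, 2005, 1978],
})

# Fixed reference year (an assumption) so the result does not depend on today's date.
reference_year = 2025
members["Age"] = reference_year - members["Birth_Year"]

print(members)
```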

#### Example: Creating `Total_Sales` from `Quantity` and `Unit_Price`


```python
import pandas as pd

# Sample dataset
sales_data = {'Quantity': [10, 20, 30], 'Unit_Price': [50, 100, 200]}
sales_df = pd.DataFrame(sales_data)

# Creating a new feature
sales_df['Total_Sales'] = sales_df['Quantity'] * sales_df['Unit_Price']

print("Sales Data with New Feature:")
print(sales_df)
```

Output:

```
Sales Data with New Feature:
   Quantity  Unit_Price  Total_Sales
0        10          50          500
1        20         100         2000
2        30         200         6000
```



### Feature Selection
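Correlation helps reveal relationships between features and the target variable, and among the features themselves. Features that correlate strongly with the target are often useful, while features that correlate strongly with each other are largely redundant.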

#### Example: Correlation Matrix


### **Dataset**
Consider a small dataset with 3 features, `Feature1`, `Feature2`, and `Feature3`:

| Feature1 | Feature2 | Feature3 | Target |
|----------|----------|----------|--------|
| 1        | 5        | 1        | 0      |
| 2        | 4        | 2        | 1      |
| 3        | 3        | 3        | 0      |
| 4        | 2        | 4        | 1      |
| 5        | 1        | 5        | 0      |

We will compute the pairwise correlation between the features. When two features are highly correlated, they carry nearly the same information, so one of each such pair can be removed.

### Step 1: Compute Correlation Between Features
We compute the Pearson correlation for every pair of columns with `df.corr()`. If the absolute value of the correlation between two features is greater than 0.9 (that is, the correlation is above 0.9 or below -0.9), we will remove one of them.

### Code Example:

```python
import pandas as pd

# Sample dataset
data = {
    "Feature1": [1, 2, 3, 4, 5],
    "Feature2": [5, 4, 3, 2, 1],
    "Feature3": [1, 2, 3, 4, 5],
    "Target": [0, 1, 0, 1, 0]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Compute the correlation matrix
corr_matrix = df.corr()

# Display correlation matrix
print("Correlation Matrix:")
print(corr_matrix)

# If two features are highly correlated (correlation > 0.9 or < -0.9), we will drop one of them.
```

Output:

```
Correlation Matrix:
          Feature1  Feature2  Feature3  Target
Feature1       1.0      -1.0       1.0     0.0
Feature2      -1.0       1.0      -1.0     0.0
Feature3       1.0      -1.0       1.0     0.0
Target         0.0       0.0       0.0     1.0
```


### Step 2: Select Features
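From the correlation matrix, we can see that `Feature1`, `Feature2`, and `Feature3` are all perfectly correlated with one another (correlation of 1.0 or -1.0), so any one of them carries the same information as the other two. In this toy dataset, none of them is linearly correlated with `Target` (0.0).

`Feature2` has a perfect negative correlation with `Feature1`, while `Feature3` is an exact copy of `Feature1` (correlation 1.0). Let's remove `Feature2` and keep `Feature1` and `Feature3`. Because `Feature1` and `Feature3` are still perfectly correlated, one of them could be removed as well.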


### Step 3: Drop the Highly Correlated Feature

```python
# Remove Feature2 because it's highly correlated with Feature1 and Feature3
df_selected = df.drop('Feature2', axis=1)

print("Dataset after Feature Selection:")
print(df_selected)
```

Output:

```
Dataset after Feature Selection:
   Feature1  Feature3  Target
0         1         1       0
1         2         2       1
2         3         3       0
3         4         4       1
4         5         5       0
```
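
The manual inspection in Steps 2 and 3 can also be automated. The sketch below is one common approach, not the only one: it scans the upper triangle of the absolute correlation matrix and drops every feature that exceeds the 0.9 threshold against an earlier feature. The names `threshold`, `upper`, and `to_drop` are choices made for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Feature1": [1, 2, 3, 4, 5],
    "Feature2": [5, 4, 3, 2, 1],
    "Feature3": [1, 2, 3, 4, 5],
    "Target": [0, 1, 0, 1, 0],
})

threshold = 0.9  # same cut-off as in Step 1

# Absolute correlations between features only (the target is excluded).
features = df.drop(columns=["Target"])
abs_corr = features.corr().abs()

# Keep the upper triangle so each pair is inspected once.
upper = abs_corr.where(np.triu(np.ones(abs_corr.shape, dtype=bool), k=1))

# Drop any feature that is highly correlated with an earlier one.
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
df_reduced = df.drop(columns=to_drop)

print("Dropped:", to_drop)
print(df_reduced)
```

Unlike the manual step, which removes only `Feature2`, this procedure also removes `Feature3`, because it is perfectly correlated with `Feature1`. Which member of a correlated pair survives depends on column order, so the choice is worth reviewing with domain knowledge.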


## Practice Questions: Feature Extraction and Selection

---

### Feature Extraction: Handling Categorical Data with One-Hot Encoding
- **Task:** 
  - You are given a dataset with a categorical column `Color` containing values `Red`, `Blue`, and `Green`.
  - Apply one-hot encoding to the `Color` column and add it as new columns to the dataset.
  
  **Dataset Example:**
  ```python
  data = {'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red']}
  ```


### Feature Extraction: Normalizing Numerical Data

#### Task:
Given the dataset below, normalize the numerical features **Age** and **Salary** using Min-Max scaling.

**Dataset Example:**
```python
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [25, 30, 35, 40, 45],
        'Salary': [50000, 60000, 70000, 80000, 90000]}

# Min-Max scaling function
def min_max_scaling(column):
    min_value = min(column)
    max_value = max(column)
    return [(x - min_value) / (max_value - min_value) for x in column]
 ```


### Feature Selection: Removing Highly Correlated Features
- **Task:** 
  - You are given a dataset with three features, **Feature1**, **Feature2**, and **Feature3**. Compute the correlation matrix and drop one feature that is highly correlated with the others (correlation > 0.9 or < -0.9).
  
- **Dataset Example:**
  ```python
  data = {'Feature1': [1, 2, 3, 4, 5],
          'Feature2': [5, 4, 3, 2, 1],
          'Feature3': [1, 2, 3, 4, 5],
          'Target': [0, 1, 0, 1, 0]}
  ```


### Feature Extraction: Creating New Features from Existing Data
- **Task:** 
  - Given the dataset, create a new feature **AgeGroup** based on the **Age** column, where:
    - If **Age** is less than 30, the **AgeGroup** is "Young".
    - If **Age** is between 30 and 50 (inclusive), the **AgeGroup** is "Adult".
    - If **Age** is greater than 50, the **AgeGroup** is "Senior".
  
- **Dataset Example:**
  ```python
  data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
          'Age': [25, 30, 35, 40, 55]}
  ```


