# **Feature Selection and Extraction**

## Feature Selection
Feature selection is the process of identifying a subset of the most relevant input variables (features) to use when building a machine learning model. Removing redundant or irrelevant features can improve model accuracy, reduce training time, and lower the risk of overfitting.

### Ways to Select Features
- **Covariance**
    Covariance measures the direction of a linear relationship between two variables. It calculates the extent to which two variables vary together from their respective means.

    $$ \text{Cov}(x, y) = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y})}{n-1} $$
    $$ \bar{x}, \bar{y} = \text{means of the x and y features} $$
    $$ n = \text{number of data points} $$

    - Positive Covariance: Both variables tend to increase or decrease together.
    - Negative Covariance: One variable tends to increase as the other decreases.

- **Correlation**
    Correlation is a standardised version of covariance that measures both the direction and strength of a linear relationship. It divides the covariance of two variables by the product of their standard deviations. This "normalises" the result to a fixed range between -1 and +1.

    $$ \text{Pearson correlation of x and y} = \frac{\text{Cov}(x, y)}{\sigma(x)\,\sigma(y)} $$
    $$ \sigma(x), \sigma(y) = \text{standard deviations of x and y} $$
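
The two formulas above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names and the toy data are assumptions made for the example, and the standard deviation is taken as the sample standard deviation (same `n - 1` denominator as the covariance).

```python
import math

def covariance(x, y):
    """Sample covariance, using the n - 1 denominator from the formula above."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def pearson_correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    std_x = math.sqrt(covariance(x, x))  # Cov(x, x) is the variance of x
    std_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (std_x * std_y)

# Hypothetical toy data: y is exactly 2 * x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

cov_xy = covariance(x, y)            # positive: the variables rise together
corr_xy = pearson_correlation(x, y)  # approximately 1 for a perfect linear relationship

# Rescaling x (for example, changing its unit) changes the covariance
# but leaves the correlation unchanged.
x_scaled = [10.0 * v for v in x]
cov_scaled = covariance(x_scaled, y)
corr_scaled = pearson_correlation(x_scaled, y)
```

Comparing `cov_xy` with `cov_scaled` shows why correlation is easier to interpret: the covariance grows tenfold when `x` is rescaled, while the correlation stays the same.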

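If a feature's correlation with the output is close to zero, it has little linear relationship with the target and is usually a candidate for elimination. Covariance can be read the same way, but its magnitude depends on the units of the variables, so correlation is the more reliable measure. Note that a near-zero value only rules out a linear relationship; a feature may still be related to the output in a non-linear way.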
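
As a rough sketch of this filtering rule, the snippet below keeps only the features whose absolute correlation with the target reaches a cutoff. The feature names, the data, and the `THRESHOLD` value of 0.1 are all hypothetical choices for illustration; a suitable cutoff depends on the dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation using sample (n - 1) covariance and standard deviations.

    Assumes neither input is constant; a constant feature has zero standard
    deviation and would cause a division by zero here.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)

# Hypothetical dataset: one informative feature and one unrelated feature
features = {
    "size_sqft": [10.0, 20.0, 30.0, 40.0, 50.0],
    "random_noise": [1.0, -1.0, 0.0, -1.0, 1.0],
}
target = [1.0, 2.0, 3.0, 4.0, 5.0]

THRESHOLD = 0.1  # assumed cutoff, not a universal rule

correlations = {name: pearson(values, target) for name, values in features.items()}

# Use the absolute value: a strong negative correlation is as useful as a positive one.
selected = [name for name, r in correlations.items() if abs(r) >= THRESHOLD]
```

With this toy data, `random_noise` has zero correlation with the target and is dropped, while `size_sqft` is kept.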

## Feature Extraction

Feature Extraction is the process of combining or transforming existing features to create **new, more meaningful features**. Instead of feeding raw inputs directly to the model, we create smarter inputs.

### Example: House Data

Suppose we have two features:

- `Number_of_Rooms`
- `Room_Size`

Now think practically:

If a house has:
- 5 rooms
- Each room is 200 sq.ft

Total usable space = 5 × 200 = **1000 sq.ft**

So instead of giving:

- Number_of_Rooms = 5
- Room_Size = 200

We can create a new feature:

- Total_House_Size = Number_of_Rooms × Room_Size

Now we feed the model a single feature:
- Total_House_Size = 1000
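
The house example can be sketched in Python as follows. The column names come from the example above; the helper name `add_total_house_size` and the second house are hypothetical additions for illustration.

```python
# Each house is described by the two raw features from the example.
houses = [
    {"Number_of_Rooms": 5, "Room_Size": 200},
    {"Number_of_Rooms": 3, "Room_Size": 150},  # hypothetical second house
]

def add_total_house_size(house):
    """Return a copy of the record with the derived Total_House_Size feature."""
    extracted = dict(house)
    extracted["Total_House_Size"] = house["Number_of_Rooms"] * house["Room_Size"]
    return extracted

houses_extracted = [add_total_house_size(h) for h in houses]

# Feed only the derived feature to the model instead of the two raw inputs.
model_inputs = [[h["Total_House_Size"]] for h in houses_extracted]
```

The first house reproduces the calculation above: 5 rooms of 200 sq.ft each give a `Total_House_Size` of 1000.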