# **Feature Engineering**

- In machine learning, model performance depends heavily on how the data is preprocessed and handled, so feature engineering can improve the performance of our model.
- The main goal of feature engineering is to get the best results from the algorithms.

## **Variable Transformation**

Linear regression models assume that residuals are **normally distributed**.

Features and predicted data are often **skewed** (distorted away from the center).

**Data transformation** can solve this issue.

<div align=center> <img src='./skeweddata.jpg'/> </div>

**Statistically**

- **Positively skewed:** Median < Mean
- **Negatively skewed:** Mean < Median
- **Symmetric distribution:** Mean = Median
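
A quick way to check the direction of skew is to compare the mean and the median. The sketch below uses a small made-up sample with a long right tail; the values are illustrative and not taken from any dataset.

```python
import numpy as np

# Hypothetical right-skewed sample: most values are small, one is very large
values = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 20.0])

mean = values.mean()
median = np.median(values)

# Median < mean suggests positive (right) skew
is_positively_skewed = median < mean
print(f"mean={mean:.2f}, median={median:.2f}, positively skewed: {is_positively_skewed}")
```

The single large value pulls the mean up while the median stays near the bulk of the data.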

### **Transformations: Log Features**

**Log transformation** can be useful for linear regression.

$$
    y_\beta(x)\ =\ \beta_0+\beta_1\log(x)
$$

The linear regression model involves linear combinations of features. Since $\log(x)$ is just a transformed feature, the model remains linear in the coefficients.

To apply this transformation, we can use NumPy's `log` function on each column with skewed data.
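
As a minimal sketch of the log transformation, assuming a strictly positive feature column (the arrays below are hypothetical):

```python
import numpy as np

# Hypothetical right-skewed feature column (strictly positive values)
x = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])

# np.log is the natural logarithm; it requires x > 0
x_log = np.log(x)

# np.log1p computes log(1 + x), so it also handles zeros
x_with_zero = np.array([0.0, 9.0, 99.0])
x_log1p = np.log1p(x_with_zero)

print(x_log)
print(x_log1p)
```

When a column contains zeros, `np.log1p` is a common alternative because `np.log(0)` is negative infinity.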

### **Transformations: Polynomial Features**

We can estimate higher-order relationships in the data by adding polynomial features.

$$
    y_\beta(x)\ =\ \beta_0+\beta_1x+\beta_2x^2
$$

This allows us to use the same linear model, because the model is still linear in the coefficients $\beta$.
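
To illustrate why the same linear model still applies, here is a small sketch that builds the columns $1$, $x$, and $x^2$ by hand and fits them with ordinary least squares. The data is hypothetical and generated without noise, so the fit should recover the coefficients used to create it.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2x + 3x^2 (no noise)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 3.0 * x**2

# Design matrix with columns [1, x, x^2]; the model is still linear in beta
X_design = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares fit
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta)  # approximately [1, 2, 3]
```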

### **Polynomial Feature Syntax**

```python
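from sklearn.preprocessing import PolynomialFeatures

# degree=2 (the default) adds squared and pairwise interaction terms
polyFeat = PolynomialFeatures(degree=2)

# X_data is assumed to be a 2-D array of shape (n_samples, n_features)
polyFeat = polyFeat.fit(X_data)
X_poly = polyFeat.transform(X_data)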
```

## **Feature Encoding**

### **Variable Selection**

**Variable selection** involves choosing the set of features to include in the model.

Variables must often be transformed before they can be included in models.

In addition to log and polynomial transformations, this can involve:
- **Encoding:** converting non-numeric features to numeric features.
- **Scaling:** converting the scale of numeric data so that features are comparable.

The correct method of scaling or encoding depends on the type of feature.

### **Feature Encoding: Types of Features**

**Encoding** is often applied to **categorical data**, which takes non-numeric values. There are two primary types:

- **Nominal:** categorical variables that take values in unordered categories (e.g., Red, Blue, Green; True, False)
- **Ordinal:** categorical variables that take values in ordered categories (e.g., High, Medium, Low)

### **Feature Encoding: Approaches**

- **Binary encoding:** converts a two-valued variable to either 0 or 1
- **One-Hot encoding:** converts a variable that takes multiple values into several binary (0 or 1) variables, one per category
- **Ordinal encoding:** converts ordered categories to numerical values that preserve the order
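
The three approaches can be sketched in plain Python to show the mechanics. The column names and values below are hypothetical, and the category order for the ordinal column is an assumption that has to be supplied by hand.

```python
# Hypothetical categorical columns
flags = [True, False, True, True]            # two-valued
colors = ["red", "blue", "green", "blue"]    # nominal: no natural order
levels = ["Low", "High", "Medium", "Low"]    # ordinal: Low < Medium < High

# Binary encoding: map the two values to 0 and 1
flags_encoded = [int(f) for f in flags]

# One-hot encoding: one 0/1 column per category
categories = sorted(set(colors))             # ['blue', 'green', 'red']
colors_one_hot = [[int(c == cat) for cat in categories] for c in colors]

# Ordinal encoding: the order must be given explicitly
level_order = {"Low": 0, "Medium": 1, "High": 2}
levels_encoded = [level_order[v] for v in levels]

print(flags_encoded)
print(colors_one_hot)
print(levels_encoded)
```

For real datasets, scikit-learn provides `OneHotEncoder` and `OrdinalEncoder` in `sklearn.preprocessing`, which serve the same purpose.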

## **Feature Scaling**

## **Why should we use feature scaling?**

Some ML algorithms are sensitive to feature scaling, and others are invariant to it.

**Gradient Descent-Based Algorithms**

Algorithms such as linear regression and logistic regression, when trained with gradient descent, benefit from scaled data: features on very different scales produce uneven gradient steps and slow down convergence.

**Distance-Based Algorithms**

Algorithms such as KNN and Support Vector Machines use distances between data points to determine their similarity. A feature with a large numeric range would otherwise dominate the distance, so feature scaling is important for this type of algorithm.
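
The effect on distances can be seen with a small sketch. The samples, feature names, and the per-feature minimum and maximum below are all hypothetical, chosen only to illustrate the point.

```python
import numpy as np

# Hypothetical samples: [age in years, income in dollars]
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 50_500.0])   # very different age, similar income
c = np.array([26.0, 52_000.0])   # similar age, somewhat different income

# Raw Euclidean distances are dominated by the income column
dist_ab = np.linalg.norm(a - b)
dist_ac = np.linalg.norm(a - c)

# Assumed known range of each feature, used to rescale to [0, 1]
feature_min = np.array([20.0, 0.0])
feature_max = np.array([70.0, 100_000.0])

def rescale(x):
    """Rescale each feature to [0, 1] using the assumed ranges."""
    return (x - feature_min) / (feature_max - feature_min)

scaled_ab = np.linalg.norm(rescale(a) - rescale(b))
scaled_ac = np.linalg.norm(rescale(a) - rescale(c))

print(f"raw:    d(a,b)={dist_ab:.1f}  d(a,c)={dist_ac:.1f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

Before scaling, `b` appears closer to `a` only because income differences are numerically larger than age differences. After scaling, `c` is the closer neighbor.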

## **Normalization**

Normalization is a technique for scaling data to a range between 0 and 1. It is also known as **Min-Max Scaling**.

***Formula:***
$$
 \tilde X= \frac{X - X_{min}}{X_{max} - X_{min} }
$$

- When X is the minimum value in the column, the numerator is 0 and thus $\tilde X$ is 0.
- When X is the maximum value in the column, the numerator is equal to the denominator and thus $\tilde X$ is 1.
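
A minimal NumPy sketch of the formula, applied to a hypothetical feature column:

```python
import numpy as np

# Hypothetical feature column
X = np.array([10.0, 20.0, 30.0, 50.0])

X_min, X_max = X.min(), X.max()

# Min-max scaling; assumes X_max > X_min (a constant column would divide by zero)
X_norm = (X - X_min) / (X_max - X_min)
print(X_norm)
```

The minimum maps to 0 and the maximum maps to 1, as described above. scikit-learn's `MinMaxScaler` applies the same idea column by column.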

## **Standardization**

Standardization is another scaling technique, in which the values are centered around the mean with unit standard deviation.

***Formula:***

$$
    \grave X = \frac{X - \mu }{\sigma}
$$

*Where:*

- **$\mu$** represents the mean of the feature values

- **$\sigma$** represents the standard deviation of the feature values
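
A minimal NumPy sketch of standardization on a hypothetical feature column. It uses the population standard deviation (`ddof=0`, NumPy's default), which is an assumption about which definition is intended.

```python
import numpy as np

# Hypothetical feature column
X = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = X.mean()   # 5.0
std = X.std()     # population standard deviation (ddof=0): 2.0

# Subtract the mean, then divide by the standard deviation
X_std = (X - mean) / std

# The result has mean 0 and standard deviation 1 (up to rounding)
print(X_std.mean(), X_std.std())
```

scikit-learn's `StandardScaler` performs the same computation per column.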

### **When should we use Normalization or Standardization?**

- **Normalization** is useful when the data does not follow a Gaussian distribution, or when the algorithm makes no assumption about the distribution, as with K-Nearest Neighbors and neural networks.
- **Standardization** is useful when the data follows a Gaussian distribution. Because it does not bound values to a fixed range, it is also less affected by outliers than normalization.