# 🚀 Day-11: Feature Engineering

---

## 🌟 What is Feature Engineering?

**Feature Engineering** is the process of **creating new features**, **modifying existing ones**, or **removing irrelevant features** to improve the performance of Machine Learning models.

It helps to:
- Improve the **quality of input data**
- Enhance **model accuracy and performance**
- Reduce **noise and irrelevant information**

---

## 🔥 Why Feature Engineering?

A Machine Learning model is only as good as the data you feed into it.  
**Feature Engineering bridges the gap between raw data and meaningful patterns.**

---

## 🎯 Common Feature Engineering Techniques

| Technique                          | Description                                              |
|------------------------------------|----------------------------------------------------------|
| **Creating New Features**         | Deriving new variables from existing data                |
| **Removing Irrelevant Features**  | Dropping unnecessary columns                             |
| **Feature Transformation**        | Applying mathematical functions (Log, Square, etc.)      |
| **Binning/Bucketing**             | Converting continuous variables into categorical bins    |
| **Date/Time Feature Extraction**  | Extracting year, month, day, etc., from date columns     |
| **Handling Categorical Variables**| Encoding techniques (One-Hot, Label Encoding, etc.)      |
| **Feature Interaction**           | Creating features by combining two or more variables     |
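Two rows of the table, **Binning/Bucketing** and **Feature Interaction**, can be sketched in a few lines of pandas. The `age`, `price` and `quantity` columns, the bin edges and the bin labels below are hypothetical values chosen for illustration, not data from this notebook.

```python
import pandas as pd

# Hypothetical toy data (not from the original notebook)
df = pd.DataFrame({
    'age': [22, 35, 47, 63, 18],
    'price': [100.0, 250.0, 80.0, 400.0, 50.0],
    'quantity': [3, 1, 5, 2, 10],
})

# Binning: map continuous ages into labelled, right-inclusive intervals
# (0, 25] -> young, (25, 45] -> adult, (45, 65] -> middle_aged, (65, 120] -> senior
df['age_group'] = pd.cut(
    df['age'],
    bins=[0, 25, 45, 65, 120],
    labels=['young', 'adult', 'middle_aged', 'senior'],
)

# Feature interaction: combine two existing columns into a new one
df['total_spend'] = df['price'] * df['quantity']

print(df)
```

`pd.cut` returns a categorical column, which can then be one-hot encoded like any other categorical feature. The product is only one kind of interaction; ratios or differences are built the same way.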

---

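```python
# Feature creation: derive an age column from date of birth
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'dob': ['2000-05-15', '1998-08-22', '1995-12-10'],
})

# Convert the dob strings to datetime
df['dob'] = pd.to_datetime(df['dob'])

# Create the new feature "age" (approximate: year difference only,
# ignoring whether the birthday has already passed in 2025)
df['age'] = 2025 - df['dob'].dt.year

df['age']
```

```text
0    25
1    27
2    30
Name: age, dtype: int32
```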
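The cell above only uses the birth year. A minimal sketch of the **Date/Time Feature Extraction** row from the table, reusing the same three birth dates, pulls out further parts through the `.dt` accessor and computes an exact age. The `birth_*` and `age_exact` column names and the reference date `2025-06-01` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'dob': pd.to_datetime(['2000-05-15', '1998-08-22', '1995-12-10']),
})

# Date/time feature extraction via the .dt accessor
df['birth_year'] = df['dob'].dt.year
df['birth_month'] = df['dob'].dt.month
df['birth_weekday'] = df['dob'].dt.dayofweek  # Monday=0 ... Sunday=6

# Exact age relative to an assumed reference date: subtract one year
# when the birthday has not yet occurred by that date
ref = pd.Timestamp('2025-06-01')
had_birthday = (df['dob'].dt.month < ref.month) | (
    (df['dob'].dt.month == ref.month) & (df['dob'].dt.day <= ref.day)
)
df['age_exact'] = ref.year - df['dob'].dt.year - (~had_birthday).astype(int)

print(df)
```

With this reference date, Bob and Charlie have not yet had their 2025 birthdays, so their exact ages are one year lower than the simple year difference.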

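```python
# Feature transformation: min-max scaling to the [0, 1] range
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({'salary': [10000, 20000, 30000, 40000, 50000]})

# Fit the scaler on salary and store the scaled values in a new column
scaler = MinMaxScaler()
df['salary_scaled'] = scaler.fit_transform(df[['salary']])

df
```

|   | salary | salary_scaled |
|---|-------:|--------------:|
| 0 | 10000  | 0.0           |
| 1 | 20000  | 0.25          |
| 2 | 30000  | 0.5           |
| 3 | 40000  | 0.75          |
| 4 | 50000  | 1.0           |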
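The table also lists mathematical transformations such as the logarithm. A minimal sketch on a hypothetical right-skewed `income` column (values invented for illustration) uses `np.log1p`, which computes log(1 + x) so that zero values stay finite; `np.expm1` undoes it.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed incomes (not from the original notebook)
df = pd.DataFrame({'income': [0, 1_000, 10_000, 100_000, 1_000_000]})

# Log transform: log1p(x) = log(1 + x), defined at x = 0
df['income_log'] = np.log1p(df['income'])

# The transform is invertible with expm1(x) = exp(x) - 1
recovered = np.expm1(df['income_log'])

print(df)
```

Unlike min-max scaling, which preserves the relative spacing of values, the log compresses large values far more than small ones, which can help when a feature spans several orders of magnitude.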


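```python
# Feature encoding: one-hot encode the city column
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({'city': ['Hyderabad', 'Delhi', 'Bangalore', 'Mumbai']})

# drop='first' removes the column for the first category in sorted order
# (Bangalore), so Bangalore is encoded as all zeros.
# sparse_output requires scikit-learn >= 1.2
encoder = OneHotEncoder(sparse_output=False, drop='first')
encoded = encoder.fit_transform(df)

# Convert the encoded array back to a DataFrame with readable column names
df_encoded = pd.DataFrame(encoded, columns=encoder.get_feature_names_out(['city']))
df_encoded
```

|   | city_Delhi | city_Hyderabad | city_Mumbai |
|---|-----------:|---------------:|------------:|
| 0 | 0.0        | 1.0            | 0.0         |
| 1 | 1.0        | 0.0            | 0.0         |
| 2 | 0.0        | 0.0            | 0.0         |
| 3 | 0.0        | 0.0            | 1.0         |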
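The table also mentions Label Encoding. scikit-learn's `LabelEncoder` is documented for target labels (`y`), so for an input feature a common substitute is `OrdinalEncoder`, sketched here on a hypothetical `size` column. Passing `categories` explicitly fixes the order; without it, categories are sorted alphabetically.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ordered categorical feature (not from the original notebook)
df = pd.DataFrame({'size': ['small', 'large', 'medium', 'small']})

# Explicit order: small -> 0, medium -> 1, large -> 2
encoder = OrdinalEncoder(categories=[['small', 'medium', 'large']])

# fit_transform returns a 2-D float array; ravel() flattens it to one column
df['size_encoded'] = encoder.fit_transform(df[['size']]).ravel()

print(df)
```

Ordinal codes imply an order, so they suit ordered categories like sizes; for unordered ones such as cities, one-hot encoding avoids inventing a ranking.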


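```python
# Feature selection: remove an irrelevant column
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Salary': [50000, 60000, 70000],
})

# Drop the ID column: a row identifier carries no predictive information
df = df.drop(columns=['ID'])
df
```

|   | Name    | Salary |
|---|---------|-------:|
| 0 | Alice   | 50000  |
| 1 | Bob     | 60000  |
| 2 | Charlie | 70000  |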
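Dropping `ID` relies on domain knowledge. For a simple automatic check, scikit-learn's `VarianceThreshold` removes columns whose variance does not exceed a threshold; with `threshold=0.0` it drops constant columns. The `Country` and `Experience` columns below are hypothetical additions for illustration.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical numeric features; 'Country' is constant (same value in every row)
df = pd.DataFrame({
    'Salary': [50000, 60000, 70000],
    'Country': [1, 1, 1],
    'Experience': [2, 5, 9],
})

# Keep only columns with variance strictly greater than 0
selector = VarianceThreshold(threshold=0.0)
selector.fit(df)

# get_support() returns a boolean mask over the input columns
kept = df.columns[selector.get_support()]
df_selected = df[kept]

print(df_selected)
```

This filter would not catch an `ID` column, since IDs vary in every row, so manual judgment like the cell above is still needed alongside it.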
