* Feature Engineering 
    - A feature is a measurable property of a data point.
    - It can be:
        - Numerical
        - Categorical
        - Text-based

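    - Why is feature engineering important? Well-chosen features can improve model performance, reduce overfitting, and make training more numerically stable.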
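
To make the three feature types listed above concrete, here is a minimal sketch that sorts the columns of a small DataFrame by type. The column names and values (`age`, `city`, `review`) are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical toy records: one column per feature type
people = pd.DataFrame({
    "age": [25, 32, 47],                                     # numerical
    "city": ["Paris", "Tokyo", "Paris"],                     # categorical
    "review": ["great product", "too slow", "works fine"],   # text-based
})

# Mark the categorical column explicitly so it is not treated as free text
people["city"] = people["city"].astype("category")

numerical_cols = people.select_dtypes(include="number").columns.tolist()
categorical_cols = people.select_dtypes(include="category").columns.tolist()
# Whatever is neither numeric nor categorical is treated as text here
text_cols = [c for c in people.columns if c not in numerical_cols + categorical_cols]

print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
print("Text-based:", text_cols)
```

Each type usually needs different handling later on: numerical columns are scaled, categorical columns are encoded, and text columns are converted to numeric representations.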

* The feature engineering process is iterative; these steps are typically repeated in a loop:
    1. Feature Creation
    2. Feature Transformation
    3. Feature Extraction
    4. Feature Selection
    5. Feature Scaling

1. Feature Creation - Deriving new features from existing ones, such as combining height and weight into BMI.

In [None]:
import pandas as pd
import numpy as np

# Sample dataset: height in centimetres, weight in kilograms
data = {'Height': [150, 160, 170, 180], 'Weight': [55, 65, 75, 85]}
df = pd.DataFrame(data)

# Create a new feature: Body Mass Index (BMI) = weight (kg) / height (m)^2
df['BMI'] = df['Weight'] / (df['Height'] / 100) ** 2
print("Feature Creation - BMI:\n", df)
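
A created feature can itself feed further features. As a rough sketch, the BMI values can be binned into coarse groups with `pd.cut`. The cut-offs below are the standard WHO adult ranges, and the column name `BMI_Category` is an assumption introduced for this example.

```python
import pandas as pd

# Same sample data as the BMI example (height in cm, weight in kg)
people = pd.DataFrame({"Height": [150, 160, 170, 180], "Weight": [55, 65, 75, 85]})
people["BMI"] = people["Weight"] / (people["Height"] / 100) ** 2

# Assumed cut-offs: standard WHO adult BMI ranges.
# right=False makes each bin half-open: [low, high)
bins = [0, 18.5, 25, 30, float("inf")]
labels = ["underweight", "normal", "overweight", "obese"]
people["BMI_Category"] = pd.cut(people["BMI"], bins=bins, labels=labels, right=False)

print(people)
```

Binning trades precision for robustness: a model sees four groups instead of a continuous value, which can help when the exact number matters less than the range it falls in.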

2. Feature Transformation - Modifying existing features to improve their suitability for machine learning, such as normalization or encoding.

In [None]:
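from sklearn.preprocessing import OrdinalEncoder

# Categorical feature transformation
categories = ['low', 'medium', 'high']
rng = np.random.default_rng(42)  # fixed seed so the example is reproducible
df['Category'] = rng.choice(categories, size=len(df))

# LabelEncoder is meant for target labels and sorts alphabetically (high=0, low=1, medium=2).
# OrdinalEncoder with an explicit category list keeps the order low < medium < high.
encoder = OrdinalEncoder(categories=[categories])
df['Category_Encoded'] = encoder.fit_transform(df[['Category']]).ravel().astype(int)
print("\nFeature Transformation - Encoded Categories:\n", df)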
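
Integer codes imply an ordering between categories. For categories with no natural order, one-hot encoding is a common alternative. The sketch below uses `pd.get_dummies` on a hypothetical `Color` column that is not part of the dataset above.

```python
import pandas as pd

# Hypothetical nominal feature: colours have no natural order
colors = pd.DataFrame({"Color": ["red", "green", "blue", "green"]})

# One binary column per category; dtype=int gives 0/1 instead of booleans
one_hot = pd.get_dummies(colors, columns=["Color"], dtype=int)
print(one_hot)
```

Each row has exactly one 1 across the new columns, so no category is treated as larger or smaller than another.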

3. Feature Extraction - Deriving useful information from raw data, often reducing dimensionality while retaining key patterns.

In [None]:
from sklearn.decomposition import PCA

# Generating some artificial features
df['Feature1'] = df['Height'] * 0.5
df['Feature2'] = df['Weight'] * 0.3
features = ['Feature1', 'Feature2']
pca = PCA(n_components=1)
# fit_transform returns an array of shape (n_samples, 1); flatten it to fit a single column
df['Extracted_Feature'] = pca.fit_transform(df[features]).ravel()
print("\nFeature Extraction - PCA Reduced Feature: \n", df)
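
One way to judge how much information a PCA reduction keeps is the `explained_variance_ratio_` attribute of a fitted `PCA`. The sketch below rebuilds the two artificial features from the same height and weight values and keeps both components, purely to compare them.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Rebuild the two artificial features from the sample heights and weights
heights = np.array([150, 160, 170, 180])
weights = np.array([55, 65, 75, 85])
features = pd.DataFrame({"Feature1": heights * 0.5, "Feature2": weights * 0.3})

# Keep both components here only to compare how much variance each one explains
pca = PCA(n_components=2)
pca.fit(features)

print("Explained variance ratio:", pca.explained_variance_ratio_)
```

In this toy data the two features rise in lockstep, so the first component explains essentially all of the variance and dropping the second loses almost nothing. Real data is rarely this clean.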

4. Feature Selection - Choosing the most relevant features to improve performance and reduce overfitting.

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Generate a random binary target (illustration only: with a random target,
# the selected features carry no real meaning)
df['Target'] = np.random.default_rng(42).choice([0, 1], size=len(df))
x = df[['Height', 'Weight', 'BMI', 'Feature1', 'Feature2']]
y = df['Target']

# Keep features whose importance is at least the mean importance (the default threshold)
selector = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=42))
selector.fit(x, y)
selected_features = x.columns[selector.get_support()]
print("\nFeature Selection - Selected Features:", list(selected_features))
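
Model-based selection is easier to see when the target actually depends on a feature. The following sketch uses synthetic data (the column names `signal`, `noise_a`, and `noise_b` are hypothetical) in which only one column determines the label, and then ranks the features by random forest importance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: only "signal" determines the target, the other columns are noise
rng = np.random.default_rng(0)
n_samples = 200
X = pd.DataFrame({
    "signal": rng.normal(size=n_samples),
    "noise_a": rng.normal(size=n_samples),
    "noise_b": rng.normal(size=n_samples),
})
y = (X["signal"] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances, sorted from most to least important
importances = pd.Series(forest.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

The informative column should rank first by a wide margin, which is the pattern a selector such as `SelectFromModel` relies on when it applies its threshold.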

5. Feature Scaling - Standardizing feature values to a common scale to ensure balanced model training and better numerical stability.

In [None]:
from sklearn.preprocessing import StandardScaler

# StandardScaler rescales each column to zero mean and unit variance
scaler = StandardScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(x), columns = x.columns)
print("\nFeature Scaling - Standardized Features: \n", df_scaled)
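
Standardization computes z = (x - mean) / std for each column. As a small check, the sketch below compares `StandardScaler` with that formula written out by hand and then applies the fitted scaler to a new, hypothetical sample.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One feature column (heights in cm), shaped (n_samples, 1) as scikit-learn expects
heights = np.array([[150.0], [160.0], [170.0], [180.0]])

scaler = StandardScaler()
scaled = scaler.fit_transform(heights)

# Same computation by hand; np.std uses the population standard deviation (ddof=0),
# which is also what StandardScaler uses
manual = (heights - heights.mean(axis=0)) / heights.std(axis=0)
print("Matches manual formula:", np.allclose(scaled, manual))

# Reuse the fitted mean and std on new data instead of refitting
new_sample = scaler.transform(np.array([[165.0]]))
print("Scaled new sample:", new_sample)
```

Because 165 is exactly the mean of the fitted heights, the new sample maps to 0. Calling `transform` rather than `fit_transform` on new data keeps it on the same scale as the data the scaler was fitted on.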