### Feature Engineering Techniques for the Banking Sector

### 1. **Imputation**

**Scenario:** A bank's dataset has missing values in the `Credit_Score` column, and we need to handle these missing values.

**Example:**

- **Original Data:**
  - `Customer_ID`
  - `Credit_Score`

- **Feature Engineering:**
  - **Imputation:** Fill missing values with the mean of the non-missing `Credit_Score` values (675 for the sample data).

In [1]:
import pandas as pd
from sklearn.impute import SimpleImputer

# Sample DataFrame
df = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 4],
    'Credit_Score': [650, None, 700, None]
})

# Imputation
imputer = SimpleImputer(strategy='mean')
df['Credit_Score'] = imputer.fit_transform(df[['Credit_Score']])
print(df)

   Customer_ID  Credit_Score
0            1         650.0
1            2         675.0
2            3         700.0
3            4         675.0
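
Mean imputation can be pulled toward extreme values, and it hides which rows were originally missing. As a minimal sketch of one alternative, the block below fills with the median and keeps a flag column. The extra customer with a score of 300 and the `Credit_Score_Missing` column name are assumptions added for illustration.

```python
import pandas as pd

# Sample DataFrame (customer 5 is a hypothetical low score added for illustration)
df = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 4, 5],
    'Credit_Score': [650, None, 700, None, 300]
})

# Record which rows were missing before any value is filled in
df['Credit_Score_Missing'] = df['Credit_Score'].isna()

# The median is less sensitive to extreme scores than the mean
median_score = df['Credit_Score'].median()
df['Credit_Score'] = df['Credit_Score'].fillna(median_score)
print(df)
```

Here the median of the observed scores is 650, whereas the mean would be 550 because of the single low score.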


### 2. **Binning**

**Scenario:** A bank wants to categorize `Annual_Income` into income brackets.

**Example:**

- **Original Feature:**
  - `Annual_Income`

- **Feature Engineering:**
  - **Binning:** Create income brackets (e.g., `Low`, `Medium`, `High`, `Very High`).

In [3]:
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'Annual_Income': [30000, 50000, 70000, 90000]
})

# Feature Binning
bins = [0, 40000, 60000, 80000, float('inf')]
labels = ['Low', 'Medium', 'High', 'Very High']
df['Income_Bracket'] = pd.cut(df['Annual_Income'], bins=bins, labels=labels)
print(df)

   Annual_Income Income_Bracket
0          30000            Low
1          50000         Medium
2          70000           High
3          90000      Very High
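
Fixed edges work when the bracket thresholds are known in advance. When they are not, quantile-based bins are one option: `pd.qcut` picks the edges so that each bracket holds roughly the same number of customers. The sketch below assumes a slightly larger sample and a hypothetical `Income_Tercile` column name.

```python
import pandas as pd

# Sample DataFrame (values are illustrative)
df = pd.DataFrame({
    'Annual_Income': [30000, 50000, 70000, 90000, 120000, 45000]
})

# Quantile binning: three brackets with (approximately) equal counts
df['Income_Tercile'] = pd.qcut(df['Annual_Income'], q=3, labels=['Low', 'Medium', 'High'])
print(df)
```

With six rows and three quantile bins, each bracket receives two customers.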


### 3. **Ordinal Encoding**

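**Scenario:** Encode the `Employment_Status` column as integers. Ordinal encoding assumes the categories have a meaningful order; the ranking used here is illustrative only.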

**Example:**

- **Original Feature:**
  - `Employment_Status` (e.g., `Self-Employed`, `Employed`, `Unemployed`)

- **Feature Engineering:**
  - **Ordinal Encoding:** Assign integer values to the categories according to their assumed order.

In [5]:
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'Employment_Status': ['Self-Employed', 'Employed', 'Unemployed']
})

# Ordinal Encoding
ordinal_map = {'Unemployed': 0, 'Employed': 1, 'Self-Employed': 2}
df['Employment_Status_Encoded'] = df['Employment_Status'].map(ordinal_map)
print(df)

  Employment_Status  Employment_Status_Encoded
0     Self-Employed                          2
1          Employed                          1
2        Unemployed                          0
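
Ordinal encoding fits best when the categories have a natural ranking. As a sketch, the block below uses a hypothetical `Credit_Rating` column (not part of the dataset above) and an ordered pandas categorical, so the order is declared once and the integer codes follow from it.

```python
import pandas as pd

# Hypothetical feature with a natural order: Poor < Fair < Good < Excellent
df = pd.DataFrame({
    'Credit_Rating': ['Good', 'Poor', 'Excellent', 'Fair']
})

# Declare the order explicitly; values outside this list would become NaN (code -1)
rating_order = ['Poor', 'Fair', 'Good', 'Excellent']
df['Credit_Rating'] = pd.Categorical(df['Credit_Rating'], categories=rating_order, ordered=True)

# Integer codes follow the declared order
df['Credit_Rating_Encoded'] = df['Credit_Rating'].cat.codes
print(df)
```

Compared with a plain dictionary and `map`, the categorical keeps the ordering attached to the column, so comparisons such as `df['Credit_Rating'] >= 'Good'` also work.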


### 4. **One-Hot Encoding**

**Scenario:** Encode the `Loan_Type` column into binary features.

**Example:**

- **Original Feature:**
  - `Loan_Type` (e.g., `Home Loan`, `Auto Loan`, `Personal Loan`)

- **Feature Engineering:**
  - **One-Hot Encoding:** Convert categorical feature into binary columns.


In [7]:
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'Loan_Type': ['Home Loan', 'Auto Loan', 'Personal Loan']
})

# One-Hot Encoding
df_encoded = pd.get_dummies(df, columns=['Loan_Type'])
print(df_encoded)

   Loan_Type_Auto Loan  Loan_Type_Home Loan  Loan_Type_Personal Loan
0                False                 True                    False
1                 True                False                    False
2                False                False                     True
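
`pd.get_dummies` only creates columns for the categories present in the data it is given, so a later batch with fewer loan types would produce fewer columns. One way to keep the layout consistent, sketched below, is to reindex the new batch against the columns from the original data. The `train` and `new` DataFrames are assumptions for illustration.

```python
import pandas as pd

# Illustrative data: 'new' contains only one of the three loan types
train = pd.DataFrame({'Loan_Type': ['Home Loan', 'Auto Loan', 'Personal Loan']})
new = pd.DataFrame({'Loan_Type': ['Auto Loan', 'Auto Loan']})

# Encode the reference data; dtype=int gives 0/1 instead of True/False
train_encoded = pd.get_dummies(train, columns=['Loan_Type'], dtype=int)

# Encode the new batch, then align it to the reference columns,
# filling categories that did not appear with 0
new_encoded = pd.get_dummies(new, columns=['Loan_Type'], dtype=int)
new_encoded = new_encoded.reindex(columns=train_encoded.columns, fill_value=0)
print(new_encoded)
```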


### 5. **Feature Splitting**

**Scenario:** Split `Transaction_Date` into `Year` and `Month`.

**Example:**

- **Original Feature:**
  - `Transaction_Date`

- **Feature Engineering:**
  - **Feature Splitting:** Extract year and month from date.

In [8]:
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'Transaction_Date': pd.to_datetime(['2024-01-01', '2024-02-15', '2024-03-20'])
})

# Feature Splitting
df['Year'] = df['Transaction_Date'].dt.year
df['Month'] = df['Transaction_Date'].dt.month
print(df)

  Transaction_Date  Year  Month
0       2024-01-01  2024      1
1       2024-02-15  2024      2
2       2024-03-20  2024      3
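
The same `.dt` accessor exposes other date parts that may be useful for transaction data. The sketch below derives the quarter, the day of the week, and a weekend flag. The new column names and the extra Saturday date are assumptions for illustration.

```python
import pandas as pd

# Sample DataFrame (2024-03-23 is an added Saturday for illustration)
df = pd.DataFrame({
    'Transaction_Date': pd.to_datetime(['2024-01-01', '2024-02-15', '2024-03-20', '2024-03-23'])
})

# Additional date parts
df['Quarter'] = df['Transaction_Date'].dt.quarter
df['Day_Of_Week'] = df['Transaction_Date'].dt.dayofweek  # Monday = 0, Sunday = 6
df['Is_Weekend'] = df['Day_Of_Week'] >= 5
print(df)
```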


### 6. **Handling Outliers**

**Scenario:** Identify and handle outliers in the `Transaction_Amount` column.

**Example:**

- **Original Feature:**
  - `Transaction_Amount`

- **Feature Engineering:**
  - **Handling Outliers:** Remove values that fall outside `[Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]` (the IQR method).

In [10]:
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'Transaction_Amount': [100, 200, 300, 10000, 500]
})

# Handling Outliers using IQR
Q1 = df['Transaction_Amount'].quantile(0.25)
Q3 = df['Transaction_Amount'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
# Keep only rows inside [lower_bound, upper_bound] (both ends inclusive)
df_no_outliers = df[df['Transaction_Amount'].between(lower_bound, upper_bound)]
print(df_no_outliers)

   Transaction_Amount
0                 100
1                 200
2                 300
4                 500
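
Dropping rows is not always acceptable, for example when every transaction must stay in the dataset. A possible alternative, sketched below, is to cap values at the same IQR bounds (often called winsorizing) instead of removing them. The `Transaction_Amount_Capped` column name is an assumption.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'Transaction_Amount': [100, 200, 300, 10000, 500]
})

# Same IQR bounds as the removal approach
Q1 = df['Transaction_Amount'].quantile(0.25)
Q3 = df['Transaction_Amount'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Cap values at the bounds instead of dropping the rows
df['Transaction_Amount_Capped'] = df['Transaction_Amount'].clip(lower=lower_bound, upper=upper_bound)
print(df)
```

For this sample the bounds are -250 and 950, so the 10000 transaction is capped at 950 and all five rows are kept.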


### 7. **Transformations**

**Scenario:** Apply transformations to `Transaction_Amount` to handle skewness.

**Example:**

- **Original Feature:**
  - `Transaction_Amount`

- **Feature Engineering:**
  - **Transformation:** Apply a log(1 + x) transformation to reduce right skew; adding 1 keeps zero amounts defined.

In [12]:
import pandas as pd
import numpy as np

# Sample DataFrame
df = pd.DataFrame({
    'Transaction_Amount': [100, 200, 300, 10000, 500]
})

# Feature Transformation: log1p(x) computes log(1 + x) and is defined at x = 0
df['Log_Transaction_Amount'] = np.log1p(df['Transaction_Amount'])
print(df)

   Transaction_Amount  Log_Transaction_Amount
0                 100                4.615121
1                 200                5.303305
2                 300                5.707110
3               10000                9.210440
4                 500                6.216606
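
To check whether the transformation helped, the skewness can be compared before and after. The sketch below uses the pandas `skew` method on the same sample data and recovers the original amounts with `np.expm1`, the inverse of `np.log1p`. The variable names are assumptions.

```python
import pandas as pd
import numpy as np

# Sample DataFrame
df = pd.DataFrame({
    'Transaction_Amount': [100, 200, 300, 10000, 500]
})

# log(1 + x) transformation
df['Log_Transaction_Amount'] = np.log1p(df['Transaction_Amount'])

# Compare skewness before and after (values closer to 0 are more symmetric)
skew_before = df['Transaction_Amount'].skew()
skew_after = df['Log_Transaction_Amount'].skew()
print(f"Skewness before: {skew_before:.3f}, after: {skew_after:.3f}")

# expm1 inverts log1p, recovering amounts on the original scale
recovered = np.expm1(df['Log_Transaction_Amount'])
```

On five rows the skewness estimate is very rough; the comparison is more informative on a realistically sized dataset.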


### 8. **Scaling (Normalization & Standardization)**

**Scenario:** Normalize and standardize features for machine learning models.

**Example:**

- **Original Features:**
  - `Annual_Income`
  - `Total_Debt`

- **Feature Engineering:**
  - **Normalization:** Scale features to the range [0, 1].
  - **Standardization:** Scale features to zero mean and unit variance (`StandardScaler` divides by the population standard deviation).

In [13]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Sample DataFrame
df = pd.DataFrame({
    'Annual_Income': [30000, 50000, 70000],
    'Total_Debt': [10000, 20000, 30000]
})

# Normalization
scaler = MinMaxScaler()
df_normalized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print("Normalized Data:")
print(df_normalized)

# Standardization
scaler = StandardScaler()
df_standardized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print("\nStandardized Data:")
print(df_standardized)

Normalized Data:
   Annual_Income  Total_Debt
0            0.0         0.0
1            0.5         0.5
2            1.0         1.0

Standardized Data:
   Annual_Income  Total_Debt
0      -1.224745   -1.224745
1       0.000000    0.000000
2       1.224745    1.224745
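
In a modeling workflow the scaler is usually fitted on the training data only and then reused on new data, so that statistics from unseen rows do not leak into training. The sketch below assumes a hypothetical split into `train` and `test` DataFrames.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical split: three training customers and one unseen customer
train = pd.DataFrame({
    'Annual_Income': [30000, 50000, 70000],
    'Total_Debt': [10000, 20000, 30000]
})
test = pd.DataFrame({
    'Annual_Income': [90000],
    'Total_Debt': [40000]
})

# Learn the mean and standard deviation from the training data only
scaler = StandardScaler()
train_scaled = pd.DataFrame(scaler.fit_transform(train), columns=train.columns)

# Reuse the training statistics on the unseen data (transform, not fit_transform)
test_scaled = pd.DataFrame(scaler.transform(test), columns=test.columns)
print(test_scaled)
```

The unseen customer lands outside the range of the scaled training values, which is expected: the scaler reports how far the row sits from the training mean.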


### Summary

- **Imputation:** Filling missing values using the mean.
- **Binning:** Categorizing continuous variables into bins.
- **Ordinal Encoding:** Encoding ordinal categorical variables as integers.
- **One-Hot Encoding:** Converting categorical variables into binary columns.
- **Feature Splitting:** Breaking down features into more granular parts.
- **Handling Outliers:** Removing outliers using the IQR method.
- **Transformations:** Applying transformations like log to reduce skewness.
- **Scaling:** Normalizing and standardizing features for model training.

These examples cover common feature engineering techniques for banking and finance data. Applied before training, they turn raw columns into cleaner, model-ready features.