# **What is Feature Engineering?**
 
Feature engineering is the process of creating new features or modifying existing ones to improve the performance of machine learning models. It involves techniques like feature extraction, transformation, encoding, and scaling to make data more useful for predictions.
            
            
# **Why Do We Need Feature Engineering?**

1. **Improves Model Performance** – Good features help models make better predictions.
2. **Reduces Overfitting** – Removing noisy and irrelevant features leaves the model less to memorize.
3. **Handles Missing Data** – Creates meaningful replacements for missing values.
4. **Enables Better Interpretability** – Makes features easier to understand and explain.
5. **Reduces Dimensionality** – Removes redundant or uninformative features, making the model more efficient.
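As a small illustration of the missing-data point above, the following sketch fills gaps with the median and keeps a flag recording which rows were filled. The data and the `AmountWasMissing` column name are hypothetical, and median imputation is only one of several reasonable strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two of the five transaction amounts are missing
df = pd.DataFrame({'TransactionAmount': [500.0, np.nan, 700.0, np.nan, 400.0]})

# Flag which rows were missing before filling them in
df['AmountWasMissing'] = df['TransactionAmount'].isna().astype(int)

# Replace missing values with the median of the observed amounts
median_amount = df['TransactionAmount'].median()
df['TransactionAmount'] = df['TransactionAmount'].fillna(median_amount)

print(df)
```

Keeping the indicator column lets a model learn whether "missing" itself carries information, which the filled-in value alone would hide.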

In [3]:
import pandas as pd

# Sample dataset with 'TransactionDate'
data = {'TransactionDate': ['2025-02-01 14:30:00', '2025-02-02 18:45:00', '2025-02-03 08:15:00']}
df = pd.DataFrame(data)

# Convert 'TransactionDate' to datetime
df['TransactionDate'] = pd.to_datetime(df['TransactionDate'])

# Extract day of the week (Monday=0, Sunday=6)
df['DayOfWeek'] = df['TransactionDate'].dt.dayofweek

# Extract the hour of the day
df['Hour'] = df['TransactionDate'].dt.hour

# Create a weekend flag (1 for Saturday/Sunday, 0 for weekdays)
df['IsWeekend'] = (df['DayOfWeek'] >= 5).astype(int)  # 5 = Saturday, 6 = Sunday

print(df)


      TransactionDate  DayOfWeek  Hour  IsWeekend
0 2025-02-01 14:30:00          5    14          1
1 2025-02-02 18:45:00          6    18          1
2 2025-02-03 08:15:00          0     8          0


In [4]:
df_transactions = pd.DataFrame({
    'UserID': [101, 102, 103, 104, 105],
    'TransactionAmount': [500, 300, 700, 1000, 400]
})

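# Aggregate to one row per user: the mean transaction amount.
# Each user appears only once in this sample, so the mean equals the single amount.
df_user_avg = (
    df_transactions.groupby('UserID')['TransactionAmount']
    .mean()
    .reset_index()
    .rename(columns={'TransactionAmount': 'AvgTransactionAmount'})
)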

# Print the result
print(df_user_avg)

   UserID  AvgTransactionAmount
0     101                 500.0
1     102                 300.0
2     103                 700.0
3     104                1000.0
4     105                 400.0
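Because every user in the sample above has a single transaction, the average simply repeats the amount. The sketch below uses hypothetical data in which users repeat, and attaches the per-user average to each transaction row with `groupby(...).transform('mean')`. The `df_multi` frame and the `AmountVsUserAvg` column are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Hypothetical transactions in which users 101 and 102 appear more than once
df_multi = pd.DataFrame({
    'UserID': [101, 101, 102, 102, 103],
    'TransactionAmount': [500, 300, 700, 1000, 400]
})

# transform('mean') returns one value per original row, so the per-user
# average can be stored directly as a new column
df_multi['AvgTransactionAmount'] = (
    df_multi.groupby('UserID')['TransactionAmount'].transform('mean')
)

# How far each transaction deviates from that user's average amount
df_multi['AmountVsUserAvg'] = (
    df_multi['TransactionAmount'] - df_multi['AvgTransactionAmount']
)

print(df_multi)
```

Using `transform` avoids a separate merge step when the aggregated value is needed alongside the original rows.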


In [8]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

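# Sample data with one categorical column
df = pd.DataFrame({'Product_list': ['electronics', 'clothing', 'clothing', 'grocery']})

# sparse_output=False returns a dense NumPy array (parameter available in scikit-learn 1.2+)
encoder = OneHotEncoder(sparse_output=False)
encoded_features = encoder.fit_transform(df[['Product_list']])

# Build a DataFrame with one 0/1 column per category
df_encoded = pd.DataFrame(encoded_features, columns=encoder.get_feature_names_out(['Product_list']))
print(df_encoded)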


   Product_list_clothing  Product_list_electronics  Product_list_grocery
0                    0.0                       1.0                   0.0
1                    1.0                       0.0                   0.0
2                    1.0                       0.0                   0.0
3                    0.0                       0.0                   1.0
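One practical concern with one-hot encoding is that new data may contain categories that were not present when the encoding was built. The sketch below shows one way to handle this with pandas alone, assuming a hypothetical unseen category `'toys'`: the new rows are encoded and then reindexed onto the training columns. scikit-learn's `OneHotEncoder(handle_unknown='ignore')` produces a similar all-zeros row for unknown categories.

```python
import pandas as pd

train = pd.DataFrame({'Product_list': ['electronics', 'clothing', 'clothing', 'grocery']})
# 'toys' is a hypothetical category that never appeared in the training data
new = pd.DataFrame({'Product_list': ['grocery', 'toys']})

# One 0/1 column per category seen in training
train_encoded = pd.get_dummies(train, columns=['Product_list'], dtype=int)

# Encode the new data, then force it onto the training columns:
# unseen categories are dropped and absent ones are filled with 0
new_encoded = (
    pd.get_dummies(new, columns=['Product_list'], dtype=int)
    .reindex(columns=train_encoded.columns, fill_value=0)
)

print(new_encoded)
```

The row for `'toys'` ends up with zeros in every column, so the model receives an input of the expected shape.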


In [10]:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'TransactionAmount': [100, 2000, 5000, 100000, 2500]
})

# Apply a log transformation; log1p computes log(1 + x), which avoids log(0) for zero amounts
df['LogTransactionAmount'] = np.log1p(df['TransactionAmount'])

# Print the transformed DataFrame
print(df)


   TransactionAmount  LogTransactionAmount
0                100              4.615121
1               2000              7.601402
2               5000              8.517393
3             100000             11.512935
4               2500              7.824446
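The log transformation is reversible, which matters if a model's outputs on the log scale need to be converted back to amounts. A minimal sketch, assuming the same five amounts as above: `np.expm1` undoes `np.log1p`, and the spread of the values shrinks considerably on the log scale.

```python
import numpy as np

amounts = np.array([100, 2000, 5000, 100000, 2500], dtype=float)

log_amounts = np.log1p(amounts)    # forward: log(1 + x)
recovered = np.expm1(log_amounts)  # inverse: exp(y) - 1

print(recovered)

# The raw values differ by a factor of 1000; on the log scale the
# largest and smallest values are less than 7 units apart
print(amounts.max() / amounts.min())
print(log_amounts.max() - log_amounts.min())
```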


In [12]:
from sklearn.preprocessing import MinMaxScaler, StandardScaler
df = pd.DataFrame({
    'TransactionAmount': [100, 2000, 5000, 100000, 2500]
})

# Min-Max scaling (normalization): rescales values to the range [0, 1]
min_max_scaler = MinMaxScaler()
df['NormalizedTransactionAmount'] = min_max_scaler.fit_transform(df[['TransactionAmount']])

# Standard scaling (z-score): centers values at mean 0 with unit variance
standard_scaler = StandardScaler()
df['StandardizedTransactionAmount'] = standard_scaler.fit_transform(df[['TransactionAmount']])

print(df)

   TransactionAmount  NormalizedTransactionAmount  \
0                100                     0.000000   
1               2000                     0.019019   
2               5000                     0.049049   
3             100000                     1.000000   
4               2500                     0.024024   

   StandardizedTransactionAmount  
0                      -0.558466  
1                      -0.509837  
2                      -0.433055  
3                       1.998398  
4                      -0.497040  
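To make the two scalers less of a black box, the sketch below recomputes both columns by hand with NumPy. It assumes the same five amounts as above. Note that `StandardScaler` divides by the population standard deviation, which is also the default (`ddof=0`) for NumPy's `std`.

```python
import numpy as np

x = np.array([100, 2000, 5000, 100000, 2500], dtype=float)

# Min-max scaling: (x - min) / (max - min)
normalized = (x - x.min()) / (x.max() - x.min())

# Z-score: (x - mean) / std, using the population standard deviation (ddof=0)
standardized = (x - x.mean()) / x.std()

print(np.round(normalized, 6))
print(np.round(standardized, 6))
```

Both results show how the single large value (100000) dominates: the other four amounts are squeezed into a narrow band, which is one reason a log transformation is often applied before scaling skewed data.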
