# Feature Engineering

```python
import sys
import os
import pandas as pd

# Dynamically add the src/ folder to the system path
sys.path.append(os.path.abspath(os.path.join("..", "src")))

from preprocess_data import gpa_to_grade_class

# Define the output path
output_path = "../data/processed/engineered_data.csv"

# Load the cleaned dataset
df = pd.read_csv("../data/processed/cleaned_data.csv")
```

## 1) Applying the GPA to GradeClass Transformation

We have already defined the `gpa_to_grade_class()` function in the `preprocess_data.py` module under `src/`. Here, we import it and apply it to convert the continuous `GPA` column into the categorical target `GradeClass`:

- **0: 'A'** (GPA ≥ 3.5)  
- **1: 'B'** (3.0 ≤ GPA < 3.5)  
- **2: 'C'** (2.5 ≤ GPA < 3.0)  
- **3: 'D'** (2.0 ≤ GPA < 2.5)  
- **4: 'F'** (GPA < 2.0)  

This target variable will be used in our classification models.
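To make the mapping concrete, here is a minimal sketch of such a conversion with `np.select`, assuming the cutoffs listed above. The name `gpa_to_grade_class_sketch` is hypothetical, and this is not the project's own `gpa_to_grade_class()`. Note that the preview below maps GPA 2.054 to 2 and 1.288 to 3, which suggests the real function may use different lower cutoffs than the ones listed, so check `src/` before relying on the exact boundaries.

```python
import numpy as np
import pandas as pd


def gpa_to_grade_class_sketch(gpa: pd.Series) -> pd.Series:
    """Map continuous GPA values to GradeClass codes 0 (A) to 4 (F).

    Hypothetical re-implementation using the thresholds listed above;
    it is not the project's gpa_to_grade_class() from src/.
    """
    # np.select picks the first condition that is True, so each
    # condition only needs the lower bound of its grade band.
    conditions = [
        gpa >= 3.5,  # 0: A
        gpa >= 3.0,  # 1: B  (3.0 <= GPA < 3.5)
        gpa >= 2.5,  # 2: C  (2.5 <= GPA < 3.0)
        gpa >= 2.0,  # 3: D  (2.0 <= GPA < 2.5)
    ]
    # Everything else, including NaN GPA (all comparisons False), becomes 4: F.
    codes = np.select(conditions, [0, 1, 2, 3], default=4)
    return pd.Series(codes, index=gpa.index, name="GradeClass")


gpa = pd.Series([3.8, 3.2, 2.7, 2.2, 1.5])
print(gpa_to_grade_class_sketch(gpa).tolist())  # [0, 1, 2, 3, 4]
```

Using only lower bounds keeps the bands contiguous with no gaps or overlaps, since an earlier matching condition always wins.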

```python
# 🔁 Apply the function to the GPA column
df['GradeClass'] = gpa_to_grade_class(df['GPA'])

# ✅ Preview the new target column
df[['GPA', 'GradeClass']].head()
```

|   | GPA | GradeClass |
|---|---|---|
| 0 | 2.929196 | 2 |
| 1 | 3.042915 | 1 |
| 2 | 0.112602 | 4 |
| 3 | 2.054218 | 2 |
| 4 | 1.288061 | 3 |


## 2) Removing GPA

We must remove `GPA` from the training features because the target `GradeClass` was computed directly from it in step 1. Keeping it would leak the target: a model could simply reproduce the grade cutoffs from `GPA` instead of learning patterns in the other features.

```python
# Dropping GPA
df.drop(columns=['GPA'], inplace=True)
print("🗑️ GPA column dropped.")
```

```
🗑️ GPA column dropped.
```
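In a notebook, re-running the cell above raises a `KeyError`, because `GPA` is already gone. A minimal sketch of a re-runnable alternative uses `DataFrame.drop(..., errors="ignore")` and then asserts that no leaky column remains. The helper `drop_leaky_columns` and the toy `Absences` values are hypothetical.

```python
import pandas as pd


def drop_leaky_columns(df: pd.DataFrame, leaky=("GPA",)) -> pd.DataFrame:
    """Return a copy of df without the columns the target was derived from.

    errors="ignore" makes the call safe to re-run: an already-dropped
    column is skipped instead of raising KeyError.
    """
    out = df.drop(columns=list(leaky), errors="ignore")
    # Leakage guard: none of the source columns may remain.
    assert not set(leaky) & set(out.columns), "leaky column still present"
    return out


toy = pd.DataFrame({"GPA": [2.9, 3.0], "Absences": [4, 1], "GradeClass": [2, 1]})
toy = drop_leaky_columns(toy)
toy = drop_leaky_columns(toy)  # second call is a no-op rather than an error
print(toy.columns.tolist())  # ['Absences', 'GradeClass']
```

Returning a new frame instead of using `inplace=True` also keeps the original `df` available if an earlier cell needs to be re-inspected.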


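## 3) Engagement Score

`Engagement` counts how many of the five activities a student takes part in. Assuming each flag column is a 0/1 indicator, the score ranges from 0 to 5.

```python
# Sum the binary activity flags row by row
flags = ['Tutoring', 'Extracurricular', 'Sports', 'Music', 'Volunteering']
df['Engagement'] = df[flags].sum(axis=1)
```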
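A quick check of this feature on a hypothetical toy frame: it first verifies that the flag columns contain only 0 and 1 (the assumption that makes the sum a count) and then applies the same row-wise sum. The values are made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical values); each flag is assumed to be a 0/1 indicator.
flags = ['Tutoring', 'Extracurricular', 'Sports', 'Music', 'Volunteering']
toy = pd.DataFrame(
    [[1, 0, 1, 0, 0],
     [0, 0, 0, 0, 0],
     [1, 1, 1, 1, 1]],
    columns=flags,
)

# Validate the assumption: every flag value must be 0 or 1.
assert toy[flags].isin([0, 1]).all().all(), "non-binary flag values found"

# Row-wise sum counts how many activities each student takes part in (0 to 5).
toy['Engagement'] = toy[flags].sum(axis=1)
print(toy['Engagement'].tolist())  # [2, 0, 5]
```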

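## 4) Family Support Index

`FamilySupport` is an interaction term: the product of the parental education level and the parental support level. Assuming both are non-negative ordinal codes, the product is large only when both factors are high.

```python
df['FamilySupport'] = df['ParentalEducation'] * df['ParentalSupport']
```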
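The behaviour of the product is easiest to see on a few hypothetical rows, assuming higher codes mean more education or more support:

```python
import pandas as pd

# Hypothetical ordinal codes: higher value = more education / more support.
toy = pd.DataFrame({
    'ParentalEducation': [0, 2, 2, 4],
    'ParentalSupport':   [3, 0, 2, 4],
})

# The product acts as an interaction term: it is 0 if either factor is 0
# and is largest only when both factors are high.
toy['FamilySupport'] = toy['ParentalEducation'] * toy['ParentalSupport']
print(toy['FamilySupport'].tolist())  # [0, 0, 4, 16]
```

Multiplying the codes treats them as evenly spaced numbers, so a student at level 4 on both counts weighs four times as much as one at level 2 on both. If that spacing is not meaningful for these columns, keeping the two inputs separate may be preferable.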

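## 5) One-Hot Encoding Remaining Categorical Columns

Any column still stored as text (`object`) or as `category` is expanded into indicator columns. `drop_first=True` drops one level per column, because that level is already implied when all the other indicators are 0. Integer-coded columns such as `GradeClass` are not selected, so the target stays a single column.

```python
# 5) One-hot encode all remaining categorical columns
cat_cols = df.select_dtypes(include=['object', 'category']).columns.tolist()
if cat_cols:
    df = pd.get_dummies(df, columns=cat_cols, drop_first=True)
```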
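The effect of `drop_first=True` is easiest to see on a small, hypothetical frame; the `Color` column below is made up for illustration and does not exist in the dataset.

```python
import pandas as pd

# Hypothetical toy frame with one string column and one integer target.
toy = pd.DataFrame({
    'Color': ['red', 'blue', 'green', 'blue'],
    'GradeClass': [0, 1, 2, 1],
})

cat_cols = toy.select_dtypes(include=['object', 'category']).columns.tolist()
encoded = pd.get_dummies(toy, columns=cat_cols, drop_first=True)

# Levels are sorted (blue, green, red); drop_first removes 'blue',
# which becomes the reference level encoded as all zeros.
print(encoded.columns.tolist())  # ['GradeClass', 'Color_green', 'Color_red']
```

In pandas 2.x the dummy columns have `bool` dtype, so they are written to CSV as `True`/`False`; passing `dtype=int` to `pd.get_dummies` produces 0/1 columns instead.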

## 💾 Save the Engineered Dataset

We save the final engineered DataFrame to the `data/processed/` folder for use in the modeling phase.

```python
# Save the DataFrame
df.to_csv(output_path, index=False)

print(f"✅ Engineered data saved to: {output_path}")
```

```
✅ Engineered data saved to: ../data/processed/engineered_data.csv
```
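`to_csv` fails if the target folder does not exist yet, for example on a fresh clone without `data/processed/`. A minimal sketch of a safer save creates the parent folder first and reloads the file as a round-trip check. The helper `save_engineered` is hypothetical, and the demo writes to a temporary directory rather than the project's `data/` folder.

```python
import os
import tempfile

import pandas as pd


def save_engineered(df: pd.DataFrame, path: str) -> pd.DataFrame:
    """Write df to CSV, creating the parent folder if needed, and reload it."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    df.to_csv(path, index=False)  # index=False: no 'Unnamed: 0' column on reload
    reloaded = pd.read_csv(path)
    # Round-trip check: same shape and column names as what was written.
    assert reloaded.shape == df.shape
    assert reloaded.columns.tolist() == df.columns.tolist()
    return reloaded


# Demo in a temporary directory; the nested 'processed' folder does not exist beforehand.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "processed", "engineered_data.csv")
    demo = pd.DataFrame({"Engagement": [2, 0], "GradeClass": [2, 4]})
    back = save_engineered(demo, path)
    print(back.shape)  # (2, 2)
```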
