# Feature Engineering (Step 8)

In [1]:
import sys
import os

# Dynamically add the src/ folder to the system path
sys.path.append(os.path.abspath(os.path.join("..", "src")))

import pandas as pd

# Load the cleaned dataset
df = pd.read_csv("../data/processed/cleaned_data.csv")

### 🎯 Applying GPA to GradeClass Transformation

We have already defined the `gpa_to_grade_class()` function in our `feature_engineering.py` script. Here, we import and apply it to convert the continuous `GPA` column into a categorical `GradeClass` variable:

- **0: 'A'** (GPA ≥ 3.5)  
- **1: 'B'** (3.0 ≤ GPA < 3.5)  
- **2: 'C'** (2.5 ≤ GPA < 3.0)  
- **3: 'D'** (2.0 ≤ GPA < 2.5)  
- **4: 'F'** (GPA < 2.0)  

This target variable will be used in our classification models. `GPA` is then dropped: because `GradeClass` is derived directly from it, keeping `GPA` as a feature would leak the target into the model.
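The actual `gpa_to_grade_class()` lives in `feature_engineering.py` and is not shown in this notebook. As a rough illustration of the thresholds listed above, one possible vectorized version could look like the sketch below. The name `gpa_to_grade_class_sketch` and the choice of `np.select` are assumptions made for this example, not necessarily how the project's function is written.

```python
import numpy as np
import pandas as pd


def gpa_to_grade_class_sketch(gpa: pd.Series) -> pd.Series:
    """Hypothetical stand-in for gpa_to_grade_class(): map GPA to classes 0-4."""
    # np.select takes the first condition that is True for each element,
    # so the thresholds must be listed from highest to lowest.
    conditions = [gpa >= 3.5, gpa >= 3.0, gpa >= 2.5, gpa >= 2.0]
    choices = [0, 1, 2, 3]  # A, B, C, D
    # Anything below 2.0 falls through to the default, 4 (F).
    # Note: a missing GPA (NaN) fails every comparison and also becomes 4.
    classes = np.select(conditions, choices, default=4)
    return pd.Series(classes, index=gpa.index, name="GradeClass")


# The five GPA values previewed in this notebook
sample = pd.Series([2.929196, 3.042915, 0.112602, 2.054218, 1.288061])
print(gpa_to_grade_class_sketch(sample).tolist())  # [2, 1, 4, 3, 4]
```

If missing GPAs should not silently become class 4, they would need to be handled before the mapping, for example by dropping or imputing them during cleaning.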

In [2]:
# 📥 Import the function from the local feature engineering file
from feature_engineering import gpa_to_grade_class

# 🔁 Apply the function to the GPA column
df['GradeClass'] = gpa_to_grade_class(df['GPA'])

# ✅ Preview the new target column
df[['GPA', 'GradeClass']].head()

|   | GPA | GradeClass |
|---|----------|---|
| 0 | 2.929196 | 2 |
| 1 | 3.042915 | 1 |
| 2 | 0.112602 | 4 |
| 3 | 2.054218 | 3 |
| 4 | 1.288061 | 4 |


In [4]:
# Drop GPA so it cannot leak the target into the features
df.drop(columns=['GPA'], inplace=True)
print("🗑️ GPA column dropped.")

🗑️ GPA column dropped.


### 💾 Save the Engineered Dataset

We save the final engineered DataFrame to the `data/processed/` folder for use in the modeling phase.

In [5]:
# Define the output path
output_path = "../data/processed/engineered_data.csv"

# Save the DataFrame
df.to_csv(output_path, index=False)

print(f"✅ Engineered data saved to: {output_path}")


✅ Engineered data saved to: ../data/processed/engineered_data.csv
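Before moving on to modeling, it can be worth confirming that the saved file reloads cleanly. The sketch below runs the same checks on a small toy DataFrame written to a temporary directory, so it does not touch the project's files. The `StudyTimeWeekly` column and its values are hypothetical placeholders, not taken from the real dataset.

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for the engineered dataset (illustrative values only)
toy = pd.DataFrame(
    {"StudyTimeWeekly": [5.0, 12.5, 3.2], "GradeClass": [2, 1, 4]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "engineered_data.csv")
    # index=False keeps the row index out of the file, so reloading
    # does not produce an extra "Unnamed: 0" column.
    toy.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Checks one might run on the real file at the start of the modeling notebook
assert "GPA" not in reloaded.columns              # source of the target is gone
assert list(reloaded.columns) == list(toy.columns)  # no extra index column
assert reloaded["GradeClass"].between(0, 4).all()   # only the five valid classes
```

The same three assertions could be pointed at `../data/processed/engineered_data.csv` once it has been written.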
