This project shows how to analyze and visualize a synthetic but realistic dataset of 1,000 students using the Python libraries Pandas, NumPy, and Matplotlib.
It covers data cleaning, feature engineering, categorical encoding, and several visualizations that examine student performance.
- Create a synthetic dataset of 1,000 students.
- Clean and preprocess the data (remove duplicates, handle missing values).
- Derive new columns from the existing ones to support the analysis.
- Encode categorical data for numerical processing.
- Visualize performance trends across different departments.
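
The first four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the column names (`student_id`, `department`, `math_score`, `science_score`), the department list, the score distributions, the median imputation, and the pass threshold of 40 are all hypothetical choices made for the example.

```python
import numpy as np
import pandas as pd

# All column names, departments, and distributions below are illustrative
# assumptions, not the project's actual schema.
rng = np.random.default_rng(seed=42)
n_students = 1000
departments = ["CSE", "ECE", "MECH", "CIVIL"]

df = pd.DataFrame({
    "student_id": np.arange(1, n_students + 1),
    "department": rng.choice(departments, size=n_students),
    "math_score": rng.normal(70, 12, size=n_students).clip(0, 100).round(1),
    "science_score": rng.normal(68, 15, size=n_students).clip(0, 100).round(1),
})

# Inject missing values and duplicate rows so the cleaning step has work to do.
missing_idx = rng.choice(n_students, size=50, replace=False)
df.loc[missing_idx, "math_score"] = np.nan
df = pd.concat([df, df.sample(20, random_state=0)], ignore_index=True)

# Cleaning: drop exact duplicate rows, then fill missing scores with the median.
df = df.drop_duplicates().reset_index(drop=True)
df["math_score"] = df["math_score"].fillna(df["math_score"].median())

# Feature engineering: an average score and a pass/fail flag (threshold assumed).
df["avg_score"] = df[["math_score", "science_score"]].mean(axis=1)
df["passed"] = df["avg_score"] >= 40

# Encoding: map each department name to an integer code.
df["department_code"] = df["department"].astype("category").cat.codes
```

Median imputation and integer category codes are only one reasonable option each; dropping incomplete rows or one-hot encoding with `pd.get_dummies` would fit the same outline.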
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
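
With these libraries available, a department-level comparison might be drawn along the following lines. The snippet is a sketch only: it builds a small stand-in DataFrame so that it runs on its own, and the `department` and `avg_score` column names, the department labels, and the choice of a bar chart are assumptions rather than details taken from the project.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data: department labels and the avg_score column are assumptions.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "department": rng.choice(["CSE", "ECE", "MECH"], size=300),
    "avg_score": rng.normal(70, 10, size=300).clip(0, 100),
})

# Mean score per department, sorted so the bars read from lowest to highest.
dept_means = df.groupby("department")["avg_score"].mean().sort_values()

# One bar per department.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(dept_means.index, dept_means.values)
ax.set_xlabel("Department")
ax.set_ylabel("Mean score")
ax.set_title("Average score by department")
fig.tight_layout()
```

Calling `plt.show()` displays the figure interactively, and `fig.savefig(...)` writes it to a file instead.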