# Student Performance Exploratory Data Analysis (EDA)

## Project Overview
This project explores a student performance dataset to identify patterns,
relationships, and insights that can support data-driven decisions in education.

**Internship:** CodeAlpha Data Analytics  
**Author:** Samuel


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


## Dataset Description

The dataset contains academic and behavioral information for 200 students,
including demographics, study habits, attendance, and exam scores.


In [None]:
df = pd.read_csv("student_performance_dataset.csv")
df.head()


## Questions Before Analysis

1. Does study time affect academic performance?
2. How does attendance impact exam scores?
3. Do students with internet access perform better?
4. Are there performance differences by gender?
5. Which subject has the highest average score?


## Data Understanding

In this section, we explore the structure and basic characteristics of the dataset.


In [None]:
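print(df.shape)   # (rows, columns); printed explicitly because a cell only displays its last expression
df.info()         # column dtypes and non-null counts
df.describe()     # summary statistics for the numeric columns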


## Data Quality Checks

We check for missing values and duplicate records to ensure data reliability.


In [None]:
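# Print both results; a notebook cell only displays its last expression.
print(df.isnull().sum())                           # missing values per column
print("Duplicate rows:", df.duplicated().sum())    # fully identical rows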
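
The missing-value and duplicate checks can also be combined into a single report. The sketch below is one possible way to do this, not part of the original analysis: `quality_report` is a hypothetical helper, and the `toy` frame contains made-up values standing in for `df`.

```python
import numpy as np
import pandas as pd


def quality_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing-value counts, percentages, and dtypes."""
    missing = frame.isnull().sum()
    report = pd.DataFrame({
        "missing_count": missing,
        "missing_pct": (missing / len(frame) * 100).round(1),
        "dtype": frame.dtypes.astype(str),
    })
    # Columns with the most missing values come first.
    return report.sort_values("missing_count", ascending=False)


# Toy stand-in for df; the values are illustrative, not from the dataset.
toy = pd.DataFrame({
    "Study_Hours_Per_Week": [10, 12, np.nan, 10],
    "Math_Score": [70, 85, 60, 70],
})

report = quality_report(toy)
n_duplicates = int(toy.duplicated().sum())  # rows identical to an earlier row

print(report)
print("Duplicate rows:", n_duplicates)
```

On the real data, the same call would be `quality_report(df)`.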


## Univariate Analysis

This section analyzes individual variables to understand their distributions.


In [None]:
# dropna() guards against missing values, which plt.hist cannot bin.
plt.hist(df["Study_Hours_Per_Week"].dropna(), bins=10)
plt.xlabel("Study Hours Per Week")
plt.ylabel("Number of Students")
plt.title("Distribution of Study Hours")
plt.show()
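
Question 5 (which subject has the highest average score) is also a single-variable summary. A minimal sketch follows, assuming the score columns share a `_Score` suffix like `Math_Score`. `Science_Score` and `English_Score` are hypothetical column names, and the `toy` values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for df; only Math_Score is a column name known from the notebook.
toy = pd.DataFrame({
    "Math_Score": [70, 80, 90],
    "Science_Score": [60, 70, 80],   # hypothetical column
    "English_Score": [75, 85, 95],   # hypothetical column
})

# Select every column whose name ends in "_Score" (an assumed naming convention).
score_cols = [col for col in toy.columns if col.endswith("_Score")]

# Mean per subject, highest first.
subject_means = toy[score_cols].mean().sort_values(ascending=False)
top_subject = subject_means.idxmax()

print(subject_means)
print("Highest average:", top_subject)
```

Checking `df.columns` first would confirm whether the suffix convention holds in the actual dataset.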


## Bivariate Analysis

This section explores relationships between two variables.


In [None]:
plt.scatter(df["Study_Hours_Per_Week"], df["Math_Score"])
plt.xlabel("Study Hours Per Week")
plt.ylabel("Math Score")
plt.title("Study Hours vs Math Score")
plt.show()
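
Questions 3 and 4 compare groups rather than two numeric variables, so a scatter plot does not apply directly. One way to approach them is a grouped summary, sketched below under the assumption that the dataset has a categorical column such as `Internet_Access`. That name is hypothetical, and the `toy` values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for df; "Internet_Access" is an assumed column name.
toy = pd.DataFrame({
    "Internet_Access": ["Yes", "Yes", "No", "No", "Yes"],
    "Math_Score": [80, 90, 70, 60, 70],
})

# Group size, central tendency, and spread of Math_Score per group.
group_summary = (
    toy.groupby("Internet_Access")["Math_Score"]
    .agg(["count", "mean", "median", "std"])
    .round(2)
)

print(group_summary)
```

Reporting `count` and `std` next to the mean helps judge whether a gap between groups is meaningful or rests on a handful of students. Swapping in a gender column would give the comparison for question 4.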


## Correlation Analysis

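The correlation matrix uses the Pearson coefficient (the pandas default), which measures the strength of linear relationships between numerical variables. Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none.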


In [None]:
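# Fix the colour scale to [-1, 1] so colours are comparable across cells,
# and round the annotations to two decimals for readability.
sns.heatmap(df.select_dtypes(include="number").corr(), annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)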
plt.title("Correlation Heatmap")
plt.show()
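
A heatmap of many variables can be hard to read at a glance. As a complement, the correlations with a single target column can be ranked by strength. The sketch below assumes `Math_Score` is the target. `Attendance_Pct` is a hypothetical column name, and the `toy` values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for df; "Attendance_Pct" is an assumed column name.
toy = pd.DataFrame({
    "Study_Hours_Per_Week": [5, 10, 15, 20, 25],
    "Attendance_Pct": [90, 60, 80, 70, 95],
    "Math_Score": [55, 65, 75, 85, 95],
})

# Pearson correlation matrix of the numeric columns.
corr = toy.select_dtypes(include="number").corr()

# Correlation of every other column with the target, strongest first
# (sorted by absolute value so strong negative correlations are not buried).
target_corr = (
    corr["Math_Score"]
    .drop("Math_Score")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)

print(target_corr)
```

A correlation describes a linear association only. It does not show that one variable causes the other.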


## Insights & Conclusion

- Higher study hours are associated with better performance.
- Higher attendance percentages are associated with higher exam scores.
- Students with internet access perform slightly better on average.
- The data quality checks show no major anomalies that would affect the analysis.
