syogesh160/codealpha-Tasks


📊 Student Performance – Exploratory Data Analysis (EDA)

📌 Project Overview

This project performs Exploratory Data Analysis (EDA) on a Student Performance dataset to understand the factors that influence students’ academic performance. The analysis uses Python, Pandas, Matplotlib, and Seaborn to identify patterns, trends, relationships, and anomalies in the data.

🎯 Objectives

The main objectives of this EDA project are:

To understand the structure and quality of the dataset

To identify key factors affecting student performance

To detect missing values, duplicates, and outliers

To analyze relationships between academic, lifestyle, and performance variables

To validate basic statistical assumptions using visualizations

📂 Dataset Description

The dataset contains information related to students’ academic background and habits.

🔑 Columns Used:

| Column Name | Description | Data Type |
| --- | --- | --- |
| Previous Score | Student’s previous academic score | Numerical |
| Extracurricular Activities | Participation in extracurricular activities (Yes/No) | Categorical |
| Sleep Hours | Average sleep hours per day | Numerical |
| Sample Question Papers Practiced | Number of practice papers solved | Numerical |
| Performance Index | Overall performance score | Numerical (Target Variable) |

🛠️ Tools & Libraries

Python

Pandas

NumPy

Matplotlib

Seaborn

🔍 Exploratory Data Analysis Steps

1️⃣ Data Understanding

Dataset shape (rows & columns)

Data types and column meanings

Summary statistics
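As a minimal sketch of this step, the checks above map onto three Pandas calls: `shape`, `dtypes`, and `describe()`. The small DataFrame here is made-up illustration data that only borrows the column names from the table above; it is not the project's dataset, and how the real file is loaded is not shown in this README.

```python
import pandas as pd

# Made-up sample rows using the README's column names (NOT the real dataset).
df = pd.DataFrame({
    "Previous Score": [99, 82, 51, 52, 75],
    "Extracurricular Activities": ["Yes", "No", "Yes", "Yes", "No"],
    "Sleep Hours": [9, 4, 7, 5, 8],
    "Sample Question Papers Practiced": [1, 2, 2, 2, 5],
    "Performance Index": [91.0, 65.0, 45.0, 36.0, 66.0],
})

# Dataset shape: number of rows and columns.
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")

# Data type of each column (numeric vs. text/categorical).
print(df.dtypes)

# Summary statistics; by default describe() covers only numeric columns.
summary = df.describe()
print(summary)
```

Passing `include="all"` to `describe()` would also summarize the categorical column (count, unique values, most frequent value).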

2️⃣ Data Cleaning

Missing value detection

Duplicate row identification

Data consistency checks
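The cleaning checks listed above could look roughly like the following. This is a sketch on made-up rows, with one missing value, one duplicated row, and one inconsistently typed label planted on purpose; the specific consistency rules (Yes/No labels, sleep between 0 and 24 hours) are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows with deliberate problems (NOT the real dataset).
df = pd.DataFrame({
    "Previous Score": [99, 82, 82, 52, np.nan],
    "Extracurricular Activities": ["Yes", "No", "No", "yes ", "No"],
    "Sleep Hours": [9, 4, 4, 5, 8],
    "Performance Index": [91.0, 65.0, 65.0, 36.0, 66.0],
})

# Missing value detection: count of NaN per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Duplicate row identification: rows identical to an earlier row.
n_duplicates = int(df.duplicated().sum())
print("duplicate rows:", n_duplicates)

# Consistency check 1 (assumed rule): labels should be exactly "Yes" or "No"
# once stray whitespace and casing are normalized.
labels = df["Extracurricular Activities"].str.strip().str.title()
unexpected_labels = set(labels.dropna()) - {"Yes", "No"}
print("unexpected labels:", unexpected_labels)

# Consistency check 2 (assumed rule): daily sleep must lie within 0..24 hours.
sleep_in_range = bool(df["Sleep Hours"].between(0, 24).all())
print("sleep hours plausible:", sleep_in_range)
```

Whether to drop or keep the flagged rows is a separate decision; `df.drop_duplicates()` and `df.dropna()` are the usual follow-ups.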

3️⃣ Univariate Analysis

Distribution of numerical variables using histograms

Frequency analysis of categorical variables
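One possible shape for the univariate step is sketched below: a histogram for a numerical column and a frequency table for the categorical one. The data is made up for illustration, plain Matplotlib stands in for Seaborn to keep the example small, and the output file name is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample (NOT the real dataset).
df = pd.DataFrame({
    "Extracurricular Activities": ["Yes", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
    "Performance Index": [91.0, 65.0, 45.0, 36.0, 66.0, 61.0, 63.0, 42.0],
})

# Histogram of a numerical variable; hist() also returns the per-bin counts.
fig, ax = plt.subplots(figsize=(6, 4))
counts, bin_edges, _ = ax.hist(df["Performance Index"], bins=5, edgecolor="black")
ax.set_xlabel("Performance Index")
ax.set_ylabel("Number of students")
ax.set_title("Distribution of Performance Index")

# Frequency analysis of a categorical variable: absolute counts and shares.
activity_counts = df["Extracurricular Activities"].value_counts()
activity_share = df["Extracurricular Activities"].value_counts(normalize=True)
print(activity_counts)
print(activity_share.round(2))

# Hypothetical file name; in an interactive session plt.show() works as well.
fig.savefig("performance_index_hist.png")
```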

4️⃣ Outlier Detection

Box plots used to detect extreme values
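A box plot flags a point as extreme when it falls beyond the whiskers, which by default in Matplotlib and Seaborn reach 1.5 × IQR past the quartiles. The sketch below applies that same rule numerically so the flagged rows can be listed rather than only seen; the scores are made up, with one low value planted as an outlier.

```python
import pandas as pd

# Made-up scores with one planted extreme value (NOT the real dataset).
scores = pd.Series([62, 65, 66, 68, 70, 71, 73, 75, 12], name="Performance Index")

# Quartiles and interquartile range.
q1 = scores.quantile(0.25)
q3 = scores.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles (the default box plot whisker rule).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are the points a box plot draws individually.
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]
print(f"fences: [{lower_fence}, {upper_fence}]")
print(outliers)
```

Listing the outliers this way makes it easier to decide case by case whether a value is a data-entry error or a genuine extreme.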

5️⃣ Bivariate Analysis

Correlation analysis between numerical variables

Group-wise comparison using categorical variables
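The two bivariate checks above can be sketched as a correlation matrix over the numerical columns plus a grouped summary for the categorical one. The rows are made up for illustration, so the printed numbers say nothing about the real dataset.

```python
import pandas as pd

# Made-up sample (NOT the real dataset).
df = pd.DataFrame({
    "Previous Score": [40, 50, 60, 70, 80, 90],
    "Sleep Hours": [5, 6, 7, 6, 8, 7],
    "Extracurricular Activities": ["No", "Yes", "No", "Yes", "No", "Yes"],
    "Performance Index": [30.0, 42.0, 50.0, 61.0, 72.0, 80.0],
})

# Pearson correlation between all numerical columns;
# numeric_only=True skips the Yes/No column.
corr_matrix = df.corr(numeric_only=True)
print(corr_matrix.round(2))

# Correlation of each feature with the target, strongest first.
target_corr = (
    corr_matrix["Performance Index"]
    .drop("Performance Index")
    .sort_values(ascending=False)
)
print(target_corr.round(2))

# Group-wise comparison: size, mean and spread of the target per category.
group_stats = (
    df.groupby("Extracurricular Activities")["Performance Index"]
    .agg(["count", "mean", "std"])
)
print(group_stats.round(2))
```

Reporting the group size and standard deviation next to the mean helps judge whether a difference between groups is large relative to the spread within them.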

6️⃣ Hypothesis Testing & Assumption Validation

Mean comparison across categories

Correlation coefficient analysis

Normality and linearity checks using visualizations
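The README validates these assumptions visually. As a complement, and not necessarily what the project itself does, the same questions can be put to formal tests. The sketch below assumes SciPy is available (it is not in the tool list above) and uses Welch's t-test for the mean comparison, Pearson's r with its p-value for the correlation, and the Shapiro-Wilk test for normality, all on made-up data.

```python
import pandas as pd
from scipy import stats  # assumption: SciPy is not listed among the README's tools

# Made-up sample (NOT the real dataset).
df = pd.DataFrame({
    "Extracurricular Activities": ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
    "Sleep Hours": [7, 5, 8, 6, 9, 7, 6, 4],
    "Performance Index": [42.0, 30.0, 61.0, 50.0, 80.0, 72.0, 55.0, 48.0],
})

is_active = df["Extracurricular Activities"] == "Yes"
active = df.loc[is_active, "Performance Index"]
inactive = df.loc[~is_active, "Performance Index"]

# Mean comparison across categories: Welch's t-test (no equal-variance assumption).
t_stat, t_pvalue = stats.ttest_ind(active, inactive, equal_var=False)
print(f"Welch t = {t_stat:.3f}, p = {t_pvalue:.3f}")

# Correlation coefficient with a p-value for the null hypothesis of no correlation.
r, r_pvalue = stats.pearsonr(df["Sleep Hours"], df["Performance Index"])
print(f"Pearson r = {r:.3f}, p = {r_pvalue:.3f}")

# Normality check: Shapiro-Wilk (a small p-value suggests non-normal data).
w_stat, w_pvalue = stats.shapiro(df["Performance Index"])
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {w_pvalue:.3f}")
```

With only a handful of rows these tests have very little power, so the output here only illustrates the calls; on the full dataset the p-values should be read alongside the plots rather than instead of them.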

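📈 Sample Analysis Code

The snippets below assume the dataset has already been loaded into a Pandas DataFrame named df and that the following imports are in place:

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

Group-wise Mean Analysis

    # Mean Performance Index for each extracurricular group (Yes/No)
    df.groupby('Extracurricular Activities')['Performance Index'].mean()

Correlation Analysis

    # Pearson correlation between sleep hours and the target
    df['Sleep Hours'].corr(df['Performance Index'])

Distribution Check

    # Histogram of the target with a kernel density estimate overlaid
    sns.histplot(df['Performance Index'], kde=True)
    plt.show()

Linearity Check

    # Scatter plot with a fitted regression line
    sns.regplot(x='Sleep Hours', y='Performance Index', data=df)
    plt.show()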

📊 Key Insights

Performance Index differs between students who take part in extracurricular activities and those who do not

Sleep Hours and Previous Score are both visibly related to Performance Index in the correlation analysis

Box plots reveal outliers, which can distort model fits if left unhandled

Data is suitable for further statistical analysis and predictive modeling

🚀 Conclusion

This EDA provides a strong foundation for:

Feature selection

Predictive modeling

Academic performance analysis

The insights gained help understand how study habits and lifestyle factors influence student outcomes.

👤 Author

Yogesh S

Aspiring Data Analyst

📌 Skills: Python, Data Analysis, EDA, Visualization
