moashebl/Student_Performance_Prediction

🎓 Student Performance Prediction

A comprehensive machine learning project that predicts student exam scores based on various academic, social, and personal factors using Neural Networks and Linear Regression.

🎯 Overview

This project aims to help educators and administrators identify factors that most significantly impact student academic performance. By analyzing 19 different features ranging from study habits to socioeconomic factors, the model can:

  • Predict exam scores for individual students
  • Identify at-risk students who may need additional support
  • Reveal which factors have the strongest influence on academic success
  • Compare different modeling approaches (Linear Regression vs. Neural Network)

📊 Dataset

The project uses the Student Performance Factors dataset containing:

  • 6,608 student records
  • 19 input features (numerical and categorical)
  • 1 target variable (Exam_Score)

Key Features Include:

Academic Factors:

  • Hours Studied
  • Attendance
  • Previous Scores
  • Tutoring Sessions

Social Factors:

  • Parental Involvement
  • Peer Influence
  • Extracurricular Activities
  • Teacher Quality

Personal Factors:

  • Sleep Hours
  • Motivation Level
  • Learning Disabilities
  • Physical Activity

Socioeconomic Factors:

  • Family Income
  • Access to Resources
  • Internet Access
  • Parental Education Level

🚀 Features

1. Exploratory Data Analysis (EDA)

  • Distribution analysis of exam scores
  • Correlation matrix for numerical features
  • Impact analysis of categorical factors
  • Visual insights with matplotlib and seaborn

2. Data Preprocessing Pipeline

  • Automated handling of missing values
  • Feature scaling (StandardScaler)
  • One-hot encoding for categorical variables
  • Prevention of data leakage with proper train-test splitting
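
The leakage point above can be made concrete with a small library-free sketch. This is not the notebook's code (which presumably relies on scikit-learn's StandardScaler and a one-hot encoder); the helper names `fit_scaler`, `scale`, and `one_hot` and the toy values are assumptions for illustration. The key idea is that the mean and standard deviation are computed on the training rows only and then reused for the test rows.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn mean and population standard deviation from TRAINING values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against zero variance
    return mu, sigma


def scale(values, mu, sigma):
    """Apply previously learned statistics to any split."""
    return [(v - mu) / sigma for v in values]


def one_hot(value, categories):
    """Encode one categorical value as a 0/1 vector over known categories."""
    return [1 if value == c else 0 for c in categories]


# Toy Hours_Studied values, split BEFORE any statistics are computed.
train_hours = [10, 20, 30]
test_hours = [20, 40]

mu, sigma = fit_scaler(train_hours)          # fitted on the training split only
train_scaled = scale(train_hours, mu, sigma)
test_scaled = scale(test_hours, mu, sigma)   # reuses the training statistics

levels = ["Low", "Medium", "High"]           # categories seen during training
encoded = one_hot("High", levels)            # [0, 0, 1]
```

Fitting the statistics on the full dataset instead would let information from the test rows influence the training inputs, which is the leakage the pipeline is meant to avoid.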

3. Baseline Model (Linear Regression)

  • Establishes performance benchmark
  • Feature importance analysis
  • Top 15 most influential factors visualization
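
One common way to read feature importance from a linear model fitted on standardized inputs is to rank coefficients by absolute value. The sketch below shows only that ranking step; the coefficient values are made-up placeholders, not results from this project, and `top_k_features` is a hypothetical helper.

```python
def top_k_features(coefficients, k=15):
    """Return the k (name, coefficient) pairs with the largest |coefficient|."""
    ranked = sorted(
        coefficients.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return ranked[:k]


# Placeholder coefficients for illustration only (NOT the notebook's values).
example_coefficients = {
    "Hours_Studied": 2.3,
    "Attendance": 1.8,
    "Sleep_Hours": -0.1,
    "Learning_Disabilities_Yes": -0.4,
}

for name, coef in top_k_features(example_coefficients, k=3):
    print(f"{name:<28} {coef:+.2f}")
```

Ranking by absolute value keeps strongly negative factors visible alongside strongly positive ones.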

4. Neural Network Model

  • Multi-layer architecture with dropout regularization
  • Early stopping to prevent overfitting
  • Lower MAE and RMSE than the Linear Regression baseline (see Results)

5. Prediction System

  • Easy-to-use prediction function
  • Support for both models (Neural Network & Linear Regression)
  • Example student profiles (High Performer, Average, At-Risk)

💻 Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Setup

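  1. Clone or download this repository, then change into the project directory (the path below is from the author's machine; substitute your own):

    cd "c:\Users\Royal\Desktop\ML Project"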
  2. Install required packages:

    pip install -r requirements.txt
  3. Verify TensorFlow installation:

    python -c "import tensorflow as tf; print(tf.__version__)"

📖 Usage

Running the Notebook

  1. Launch Jupyter Notebook:

    jupyter notebook Student_Performance_Prediction.ipynb
  2. Run all cells sequentially to:

    • Load and explore the data
    • Preprocess features
    • Train both models
    • View performance comparisons
    • Generate predictions

Making Predictions

Use the predict_student_score() function:

# Example student profile
new_student = {
    'Hours_Studied': 20,
    'Attendance': 85,
    'Parental_Involvement': 'High',
    'Access_to_Resources': 'High',
    'Extracurricular_Activities': 'Yes',
    'Sleep_Hours': 7,
    'Previous_Scores': 78,
    'Motivation_Level': 'High',
    'Internet_Access': 'Yes',
    'Tutoring_Sessions': 2,
    'Family_Income': 'Medium',
    'Teacher_Quality': 'High',
    'School_Type': 'Public',
    'Peer_Influence': 'Positive',
    'Physical_Activity': 3,
    'Learning_Disabilities': 'No',
    'Parental_Education_Level': 'College',
    'Distance_from_Home': 'Near',
    'Gender': 'Female'
}

# Get prediction
predicted_score = predict_student_score(new_student, use_model='neural_network')
print(f"Predicted Exam Score: {predicted_score}")
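
Because the profile is expected to carry all 19 input features, it can help to validate it before calling the prediction function. The following is a hypothetical helper (`validate_profile` is not part of the notebook); the expected keys are taken from the example profile above.

```python
EXPECTED_FEATURES = {
    "Hours_Studied", "Attendance", "Parental_Involvement",
    "Access_to_Resources", "Extracurricular_Activities", "Sleep_Hours",
    "Previous_Scores", "Motivation_Level", "Internet_Access",
    "Tutoring_Sessions", "Family_Income", "Teacher_Quality", "School_Type",
    "Peer_Influence", "Physical_Activity", "Learning_Disabilities",
    "Parental_Education_Level", "Distance_from_Home", "Gender",
}


def validate_profile(profile):
    """Raise ValueError if the profile has missing or unknown feature names."""
    missing = EXPECTED_FEATURES - set(profile)
    unknown = set(profile) - EXPECTED_FEATURES
    if missing or unknown:
        raise ValueError(
            f"missing features: {sorted(missing)}; "
            f"unknown features: {sorted(unknown)}"
        )
    return profile
```

Calling `validate_profile(new_student)` before `predict_student_score(...)` surfaces a misspelled or forgotten key as a clear error rather than a failure deeper in preprocessing.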

🏗️ Model Architecture

Neural Network Specifications

Input Layer: 64 features (after preprocessing)
    ↓
Hidden Layer 1: 64 neurons (ReLU activation)
    ↓
Dropout Layer: 20% dropout rate
    ↓
Hidden Layer 2: 32 neurons (ReLU activation)
    ↓
Output Layer: 1 neuron (Linear activation for regression)
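
As a sanity check on the size of this network, the trainable parameter count can be worked out by hand, assuming each layer is fully connected with a bias term (dropout adds no parameters). The layer sizes come from the diagram above; `dense_params` is a hypothetical helper.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


layer_sizes = [64, 64, 32, 1]  # input, hidden 1, hidden 2, output

per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total = sum(per_layer)
print(per_layer, total)  # [4160, 2080, 33] 6273
```

Under these assumptions the model has roughly six thousand parameters, which is small relative to the size of the dataset.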

Training Configuration:

  • Optimizer: Adam
  • Loss Function: Mean Squared Error (MSE)
  • Metrics: Mean Absolute Error (MAE)
  • Batch Size: 32
  • Max Epochs: 100
  • Early Stopping: Patience of 10 epochs
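
The early stopping rule can be sketched without any framework: training stops once the validation loss has failed to improve for `patience` consecutive epochs. This is a simplified illustration of the idea, not the notebook's training loop (which likely uses a Keras callback); `stopping_epoch` and the loss values are made up for the example.

```python
def stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training would stop.

    Stops after `patience` consecutive epochs without a new best loss;
    returns len(val_losses) if that never happens.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Validation loss improves for 3 epochs, then plateaus.
losses = [5.0, 4.0, 3.5] + [3.6] * 20
print(stopping_epoch(losses, patience=10))  # 13
```

With a cap of 100 epochs and a patience of 10, a run like this one ends well before the maximum.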

📈 Results

Model Performance Comparison

Model               R² Score   MAE    RMSE
Linear Regression   ~0.98      ~2.0   ~2.5
Neural Network      ~0.99      ~1.5   ~2.0

Neural Network Improvement:

  • MAE improvement: ~25%
  • RMSE improvement: ~20%

Note: Exact metrics may vary slightly based on random seed and training run.
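
For reference, the three metrics in the table can be computed as follows. This is a plain-Python sketch with toy numbers, not the notebook's evaluation code; `regression_metrics` and `relative_improvement` are hypothetical helpers. The last lines reproduce the improvement percentages from the approximate table values.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return r2, mae, sqrt(mse)


def relative_improvement(baseline, improved):
    """Percentage reduction of an error metric relative to the baseline."""
    return 100 * (baseline - improved) / baseline


# Toy scores for illustration only.
r2, mae, rmse = regression_metrics([60, 70, 80], [62, 70, 78])

# Approximate values from the table above.
mae_gain = relative_improvement(2.0, 1.5)   # 25.0
rmse_gain = relative_improvement(2.5, 2.0)  # 20.0
```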

Visualization

The notebook includes several visualizations:

  • Distribution plots for exam scores
  • Correlation heatmaps showing feature relationships
  • Box plots for categorical feature impact
  • Training history showing loss and MAE over epochs
  • Actual vs. Predicted scatter plots for both models
  • Feature importance bar charts

💡 Key Insights

Top 5 Most Influential Factors:

  1. Hours Studied - Strongest positive predictor of exam success
  2. Attendance - High correlation with better performance
  3. Previous Scores - Strong indicator of future performance
  4. Motivation Level - Significant impact on outcomes
  5. Parental Involvement - Associated with more consistent scores

Educational Implications:

  • Actionable Interventions: Schools can focus on improving attendance and study habits
  • Early Warning System: Identify at-risk students based on multiple factors
  • Resource Allocation: Target support to students with low parental involvement or limited resources
  • Non-Linear Relationships: The neural network can model interactions between factors that a linear model cannot represent

📁 Project Structure

ML Project/
│
├── Student_Performance_Prediction.ipynb    # Main notebook with analysis
├── StudentPerformanceFactors.csv           # Dataset (6,608 records)
├── requirements.txt                        # Python dependencies
├── reconstruct_notebook.py                 # Utility script
├── Altair_AI_Studio_Guide.md               # Additional documentation
└── README.md                               # This file

📦 Dependencies

  • pandas - Data manipulation and analysis
  • numpy - Numerical computing
  • matplotlib - Data visualization
  • seaborn - Statistical data visualization
  • scikit-learn - Machine learning algorithms and preprocessing
  • tensorflow - Deep learning framework for neural networks

Install all dependencies with:

pip install -r requirements.txt

🤝 Contributing

Contributions are welcome! Here are some ways to improve this project:

  • Add more advanced models (XGBoost, Random Forest, etc.)
  • Implement cross-validation for more robust evaluation
  • Create a web interface for easy predictions
  • Add feature engineering techniques
  • Include hyperparameter tuning

📄 License

This project is available for educational and research purposes.

👤 Author

Created as a comprehensive machine learning project for student performance analysis.

🙏 Acknowledgments

  • Dataset: Student Performance Factors
  • Libraries: TensorFlow, scikit-learn, pandas, matplotlib, seaborn

Last Updated: January 10, 2026

For questions or suggestions, please open an issue or contact the project maintainer.
