A comprehensive machine learning project that predicts student exam scores based on various academic, social, and personal factors using Neural Networks and Linear Regression.
- Overview
- Dataset
- Features
- Installation
- Usage
- Model Architecture
- Results
- Key Insights
- Project Structure
- Dependencies
- Contributing
This project aims to help educators and administrators identify the factors that most strongly affect student academic performance. By analyzing 19 features, ranging from study habits to socioeconomic factors, the project can:
- Predict exam scores for individual students
- Identify at-risk students who may need additional support
- Reveal which factors have the strongest influence on academic success
- Compare different modeling approaches (Linear Regression vs. Neural Network)
The project uses the Student Performance Factors dataset containing:
- 6,608 student records
- 19 input features (numerical and categorical)
- 1 target variable (Exam_Score)
Academic Factors:
- Hours Studied
- Attendance
- Previous Scores
- Tutoring Sessions
Social Factors:
- Parental Involvement
- Peer Influence
- Extracurricular Activities
- Teacher Quality
Personal Factors:
- Sleep Hours
- Motivation Level
- Learning Disabilities
- Physical Activity
Socioeconomic Factors:
- Family Income
- Access to Resources
- Internet Access
- Parental Education Level
Other Factors:
- School Type
- Distance from Home
- Gender
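A minimal pandas sketch of how the loaded data might be separated into the input features and the Exam_Score target, with numerical and categorical columns told apart by dtype. The helper name `split_features` is hypothetical and is not taken from the notebook.

```python
import pandas as pd

TARGET = "Exam_Score"


def split_features(df: pd.DataFrame, target: str = TARGET):
    """Hypothetical helper: separate the target and group the remaining columns by dtype."""
    X = df.drop(columns=[target])
    y = df[target]
    # Numbers (e.g. Hours_Studied) vs. text labels (e.g. Parental_Involvement)
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()
    return X, y, numeric_cols, categorical_cols
```

In the notebook, this would be called as `split_features(pd.read_csv("StudentPerformanceFactors.csv"))`.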
Exploratory Data Analysis:
- Distribution analysis of exam scores
- Correlation matrix for numerical features
- Impact analysis of categorical factors
- Visualizations built with matplotlib and seaborn
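The correlation analysis can be sketched with pandas alone. The idea is to compute the Pearson correlation matrix of the numerical columns and then rank each feature by the strength of its correlation with Exam_Score. `correlation_with_target` is a hypothetical helper, not the notebook's exact code. The full matrix returned by `corr()` is what a seaborn heatmap such as `sns.heatmap(corr, annot=True)` would display.

```python
import pandas as pd


def correlation_with_target(df: pd.DataFrame, target: str = "Exam_Score") -> pd.Series:
    """Hypothetical helper: correlation of each numeric column with the target."""
    corr = df.select_dtypes(include="number").corr()  # Pearson by default
    # Drop the target's self-correlation (always 1.0), then sort by absolute strength
    return corr[target].drop(target).sort_values(key=abs, ascending=False)
```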
Data Preprocessing:
- Automated handling of missing values
- Feature scaling (StandardScaler)
- One-hot encoding for categorical variables
- Train-test splitting that prevents data leakage
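The following is one common way to implement these steps with scikit-learn. It is a sketch under several assumptions: median and most-frequent imputation (the notebook's actual missing-value strategy is not stated here), an 80/20 split with a fixed seed, and hypothetical helper names. The point that matters for leakage is the ordering: the data is split first, and the imputer, scaler, and encoder are fitted on the training rows only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numeric_cols, categorical_cols):
    """Hypothetical helper: impute + scale numbers, impute + one-hot encode categories."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # assumed strategy
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # assumed strategy
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    # sparse_threshold=0 forces a dense array, which Keras models expect
    return ColumnTransformer(
        [("num", numeric, numeric_cols), ("cat", categorical, categorical_cols)],
        sparse_threshold=0,
    )


def split_and_transform(X: pd.DataFrame, y: pd.Series, numeric_cols, categorical_cols, seed=42):
    """Hypothetical helper: split first, then fit preprocessing on the training rows only."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    pre = build_preprocessor(numeric_cols, categorical_cols)
    X_train_t = pre.fit_transform(X_train)  # statistics learned from training data
    X_test_t = pre.transform(X_test)        # test data reuses those statistics
    return pre, X_train_t, X_test_t, y_train, y_test
```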
Baseline Model (Linear Regression):
- Establishes a performance benchmark
- Feature importance analysis
- Visualization of the 15 most influential factors
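Below is a minimal sketch of coefficient-based feature importance for the linear baseline. It assumes the inputs are already standardized and one-hot encoded, because coefficient magnitudes are only comparable when the features share a scale. The `feature_names` argument would typically come from the fitted preprocessor, for example `ColumnTransformer.get_feature_names_out()` in recent scikit-learn versions. The helper name is hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def linear_baseline_importance(X_train, y_train, feature_names, k=15):
    """Hypothetical helper: fit the baseline and rank features by |coefficient|."""
    model = LinearRegression().fit(X_train, y_train)
    coefs = pd.Series(model.coef_, index=feature_names)
    # Largest absolute coefficient first; the sign gives the direction of the effect
    top = coefs.sort_values(key=abs, ascending=False).head(k)
    return model, top
```

Passing the returned Series to a bar chart (for example `top.plot.barh()`) gives a top-15 plot.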
Neural Network:
- Multi-layer architecture with dropout regularization
- Early stopping to prevent overfitting
- Lower MAE and RMSE than the linear baseline
Prediction System:
- Easy-to-use prediction function
- Support for both models (Neural Network and Linear Regression)
- Example student profiles (High Performer, Average, At-Risk)
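This README does not show the internals of the prediction function, so the following is only a hypothetical sketch of what such a function could do. It wraps the student dict in a one-row DataFrame, runs that row through the already-fitted preprocessor, and calls the selected model. It is named `predict_score_sketch` to avoid confusion with the project's `predict_student_score()`. It also takes the preprocessor and a dict of models as explicit arguments (an assumption) rather than relying on notebook globals.

```python
import numpy as np
import pandas as pd


def predict_score_sketch(student, preprocessor, models, use_model="neural_network"):
    """Hypothetical sketch: score one student dict with a fitted preprocessor and model."""
    row = pd.DataFrame([student])           # one row, columns named like the training data
    features = preprocessor.transform(row)  # same imputation, scaling and encoding as training
    model = models[use_model]               # e.g. {"neural_network": nn, "linear_regression": lr}
    # Keras returns shape (1, 1) and scikit-learn returns shape (1,); flatten both
    prediction = np.ravel(model.predict(features))[0]
    return round(float(prediction), 2)
```

Because a ColumnTransformer fitted on a DataFrame selects columns by name, the order of keys in the student dict does not matter.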
- Python 3.8 or higher
- pip package manager
1. Clone or download this repository, then change into the project folder (adjust the path to your own location):
   `cd "c:\Users\Royal\Desktop\ML Project"`
2. Install the required packages:
   `pip install -r requirements.txt`
3. Verify the TensorFlow installation:
   `python -c "import tensorflow as tf; print(tf.__version__)"`
1. Launch Jupyter Notebook:
   `jupyter notebook Student_Performance_Prediction.ipynb`
2. Run all cells sequentially to:
   - Load and explore the data
   - Preprocess features
   - Train both models
   - View performance comparisons
   - Generate predictions
To make predictions, use the predict_student_score() function:
# Example student profile
new_student = {
    'Hours_Studied': 20,
    'Attendance': 85,
    'Parental_Involvement': 'High',
    'Access_to_Resources': 'High',
    'Extracurricular_Activities': 'Yes',
    'Sleep_Hours': 7,
    'Previous_Scores': 78,
    'Motivation_Level': 'High',
    'Internet_Access': 'Yes',
    'Tutoring_Sessions': 2,
    'Family_Income': 'Medium',
    'Teacher_Quality': 'High',
    'School_Type': 'Public',
    'Peer_Influence': 'Positive',
    'Physical_Activity': 3,
    'Learning_Disabilities': 'No',
    'Parental_Education_Level': 'College',
    'Distance_from_Home': 'Near',
    'Gender': 'Female'
}
# Get prediction
predicted_score = predict_student_score(new_student, use_model='neural_network')
print(f"Predicted Exam Score: {predicted_score}")
Network Architecture:
Input Layer: 64 features (after preprocessing)
↓
Hidden Layer 1: 64 neurons (ReLU activation)
↓
Dropout Layer: 20% dropout rate
↓
Hidden Layer 2: 32 neurons (ReLU activation)
↓
Output Layer: 1 neuron (Linear activation for regression)
Training Configuration:
- Optimizer: Adam
- Loss Function: Mean Squared Error (MSE)
- Metrics: Mean Absolute Error (MAE)
- Batch Size: 32
- Max Epochs: 100
- Early Stopping: Patience of 10 epochs
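The architecture and training configuration above map onto Keras roughly as follows. This is a sketch rather than the notebook's code. The 20% validation split and `restore_best_weights=True` are assumptions, and `n_features` defaults to the 64 inputs stated above.

```python
import tensorflow as tf


def build_model(n_features=64):
    """Sketch of the described network: 64 -> Dropout(0.2) -> 32 -> 1."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="linear"),  # regression output
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model


def train_model(model, X_train, y_train, validation_split=0.2):
    """Train with batch size 32, up to 100 epochs, early stopping (patience 10)."""
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=10,
        restore_best_weights=True,  # assumption: keep the best epoch's weights
    )
    return model.fit(
        X_train, y_train,
        validation_split=validation_split,  # assumption: needed so val_loss exists
        epochs=100,
        batch_size=32,
        callbacks=[early_stop],
        verbose=0,
    )
```

With 64 inputs, the network has 6,273 trainable parameters: 64×64+64 = 4,160, 64×32+32 = 2,080, and 32+1 = 33. The dropout layer adds none.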
| Model | R² Score | MAE | RMSE |
|---|---|---|---|
| Linear Regression | ~0.98 | ~2.0 | ~2.5 |
| Neural Network | ~0.99 | ~1.5 | ~2.0 |
Neural Network Improvement over Linear Regression:
- MAE improvement: ~25%
- RMSE improvement: ~20%
Note: Exact metrics may vary slightly depending on the random seed and the training run.
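The three metrics in the table, and the relative improvements derived from them, can be computed with scikit-learn plus a one-line formula. Below is a minimal sketch with hypothetical helper names. RMSE is taken as the square root of MSE, which avoids version-specific keyword arguments.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """Hypothetical helper: R², MAE and RMSE for one model's predictions."""
    return {
        "r2": r2_score(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
    }


def relative_improvement(baseline, model):
    """Percent reduction of an error metric relative to the baseline."""
    return (baseline - model) * 100 / baseline
```

Plugging in the approximate values from the table, `relative_improvement(2.0, 1.5)` gives 25.0 and `relative_improvement(2.5, 2.0)` gives 20.0. These match the ~25% and ~20% figures.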
The notebook includes several visualizations:
- Distribution plots for exam scores
- Correlation heatmaps showing feature relationships
- Box plots for categorical feature impact
- Training history showing loss and MAE over epochs
- Actual vs. Predicted scatter plots for both models
- Feature importance bar charts
- Hours Studied - Strongest positive predictor of exam success
- Attendance - High correlation with better performance
- Previous Scores - Strong indicator of future performance
- Motivation Level - Significant impact on outcomes
- Parental Involvement - Associated with more consistent scores
- Actionable Interventions: Schools can focus on improving attendance and study habits
- Early Warning System: Identify at-risk students based on multiple factors
- Resource Allocation: Target support to students with low parental involvement or limited resources
- Non-Linear Relationships: The neural network can capture non-linear interactions between factors that a linear model cannot represent
ML Project/
│
├── Student_Performance_Prediction.ipynb # Main notebook with analysis
├── StudentPerformanceFactors.csv # Dataset (6,608 records)
├── requirements.txt # Python dependencies
├── reconstruct_notebook.py # Utility script
├── Altair_AI_Studio_Guide.md # Additional documentation
└── README.md # This file
- pandas - Data manipulation and analysis
- numpy - Numerical computing
- matplotlib - Data visualization
- seaborn - Statistical data visualization
- scikit-learn - Machine learning algorithms and preprocessing
- tensorflow - Deep learning framework for neural networks
Install all dependencies with `pip install -r requirements.txt`.
Contributions are welcome! Here are some ways to improve this project:
- Add more advanced models (XGBoost, Random Forest, etc.)
- Implement cross-validation for more robust evaluation
- Create a web interface for easy predictions
- Add feature engineering techniques
- Include hyperparameter tuning
This project is available for educational and research purposes.
Created as a comprehensive machine learning project for student performance analysis.
- Dataset: Student Performance Factors
- Libraries: TensorFlow, scikit-learn, pandas, matplotlib, seaborn
Last Updated: January 10, 2026
For questions or suggestions, please open an issue or contact the project maintainer.