This project predicts student performance from factors such as gender, ethnicity, parental level of education, lunch type, and test preparation course. The goal is a robust machine learning model, built in Python, that predicts a student's math score.
The Student Performance Predictor covers data ingestion, data transformation, model training, and prediction, with training and prediction each run through a dedicated pipeline.
- Source: Kaggle - Student Performance Dataset
- Size: 8 columns, 1000 rows
- Features: Gender, Ethnicity, Parental Level of Education, Lunch, Test Preparation Course, and Test Scores
To get started, follow these steps:
- Clone the repository:
git clone https://github.com/sujeetgund/mlproject-udemy.git
cd mlproject-udemy
- Create a virtual environment and activate it:
python -m venv env
source env/bin/activate # For Linux/macOS
env\Scripts\activate # For Windows
- Install the dependencies:
pip install -r requirements.txt
- Install the project using:
pip install -e .
After installation, a folder named ml_project_udemy.egg-info will be created.
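The editable install relies on setup.py, whose contents are not shown in this README. The following is a minimal sketch of one common arrangement, assuming setup.py reads its dependencies from requirements.txt and skips the `-e .` line if one is present. The helper name `get_requirements`, the version number, and the package name (inferred from the egg-info folder) are assumptions, not taken from the repository.

```python
from pathlib import Path
from typing import List

# Editable-install marker that is sometimes kept inside requirements.txt.
EDITABLE_MARKER = "-e ."


def get_requirements(file_path: str) -> List[str]:
    """Return requirement strings from a file, skipping blanks, comments and '-e .'."""
    lines = Path(file_path).read_text(encoding="utf-8").splitlines()
    stripped = [line.strip() for line in lines]
    return [
        req
        for req in stripped
        if req and not req.startswith("#") and req != EDITABLE_MARKER
    ]


# In setup.py the list would typically be handed to setuptools, for example:
#
# from setuptools import find_packages, setup
#
# setup(
#     name="ml_project_udemy",  # assumed from the egg-info folder name
#     version="0.0.1",          # placeholder value
#     packages=find_packages(),
#     install_requires=get_requirements("requirements.txt"),
# )
```

Filtering out `-e .` matters only when that line lives in requirements.txt; setuptools would otherwise reject it as an invalid requirement string.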
mlproject-udemy/
│
├── README.md
├── setup.py
├── requirements.txt
├── logs/
│   └── *.txt                     # Log files
├── artifacts/
│   ├── raw_data.csv
│   ├── train.csv
│   ├── test.csv
│   ├── preprocessor.pkl          # Saved preprocessor after transformation
│   └── model.pkl                 # Trained model file
├── notebooks/
│   ├── eda.ipynb
│   ├── model_training.ipynb
│   └── data/
│       └── stud.csv
├── ml_project_udemy.egg-info/
├── src/
│   ├── __init__.py
│   ├── logger.py
│   ├── exception.py
│   ├── utils.py
│   ├── components/
│   │   ├── __init__.py
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   └── model_trainer.py
│   └── pipelines/
│       ├── __init__.py
│       ├── train_pipeline.py
│       └── prediction_pipeline.py
└── streamlit_app.py
- logger.py: Handles logging for tracking events; log files are stored in the logs folder.
- exception.py: Custom exception handling.
- utils.py: Utility functions for data processing.
- data_ingestion.py: Handles data loading. After running, the artifacts folder will contain:
  - raw_data.csv: The original dataset.
  - train.csv: Training data split.
  - test.csv: Testing data split.
- data_transformation.py: Prepares and transforms data for modeling. After running, it generates:
  - preprocessor.pkl: The saved preprocessor object.
  - Transformed train and test data arrays.
- model_trainer.py: Trains multiple machine learning models, selects the best one based on R2 score, and saves it as model.pkl.
- train_pipeline.py: End-to-end pipeline for training.
- prediction_pipeline.py: Pipeline for making predictions. You can modify prediction_pipeline.py to use different student data for predictions.
- streamlit_app.py: Interactive web app using Streamlit to input custom data and get predictions.
- notebooks/eda.ipynb: Exploratory Data Analysis notebook.
- notebooks/model_training.ipynb: Model training and evaluation notebook.
- notebooks/data/stud.csv: Student performance dataset.
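Selecting the best model by R2 score, as described for model_trainer.py, can be illustrated with a small dependency-free sketch. This is not the project's actual trainer: the function names, the candidate model names, and the numbers are made up for illustration, and a real implementation would more likely call a library routine for the metric.

```python
from typing import Dict, List, Tuple


def r2_score(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


def select_best_model(
    y_true: List[float], predictions: Dict[str, List[float]]
) -> Tuple[str, float]:
    """Score every candidate's predictions and return the best (name, R2)."""
    scores = {name: r2_score(y_true, preds) for name, preds in predictions.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]


# Made-up held-out scores and predictions, for illustration only.
y_test = [60.0, 70.0, 80.0, 90.0]
candidate_predictions = {
    "model_a": [62.0, 69.0, 79.0, 91.0],
    "model_b": [75.0, 75.0, 75.0, 75.0],  # always predicts the mean, so R2 = 0
}

best_name, best_r2 = select_best_model(y_test, candidate_predictions)
print(best_name, round(best_r2, 3))  # prints: model_a 0.986
```

A model that always predicts the mean of the true values scores exactly 0, which makes R2 a convenient baseline-relative measure for comparing regressors.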
To run the full project, execute:
python src/pipelines/train_pipeline.py
This will handle data ingestion, transformation, and model training.
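The ingestion stage that opens this pipeline writes a training split and a testing split of the raw data. A minimal sketch of such a split is shown here, assuming an 80/20 ratio and a fixed random seed; neither value is stated in this README, and the function name `train_test_split_rows` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split_rows(
    rows: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle rows reproducibly and return (train_rows, test_rows)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for the 1000 dataset rows; real rows would be CSV records.
rows = list(range(1000))
train_rows, test_rows = train_test_split_rows(rows)
print(len(train_rows), len(test_rows))  # prints: 800 200
```

Fixing the seed means repeated runs produce the same train.csv and test.csv, which keeps model comparisons consistent between runs.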
To make predictions, execute:
python src/pipelines/prediction_pipeline.py
To predict using different student data, modify the following section inside prediction_pipeline.py:
    students_data = CustomData(
        records=[
            StudentExamRecord(
                gender="male",
                race_ethnicity="group B",
                parental_level_of_education="some college",
                lunch="standard",
                test_preparation_course="none",
                reading_score=72,
                writing_score=83,
            ),
            StudentExamRecord(
                gender="female",
                race_ethnicity="group C",
                parental_level_of_education="bachelor's degree",
                lunch="free/reduced",
                test_preparation_course="completed",
                reading_score=88,
                writing_score=92,
            ),
        ]
    )

You can launch a user-friendly interface using Streamlit:
streamlit run streamlit_app.py
This app allows you to:
- Input custom student data through a form.
- Get predicted math scores using the trained model.
- Trigger model training from the UI.
- Gender: male, female
- Race/Ethnicity: group A, group B, group C, group D, group E
- Parental Level of Education: some high school, high school, some college, associate's degree, bachelor's degree, master's degree
- Lunch: standard, free/reduced
- Test Preparation Course: none, completed
- Reading & Writing Scores: Integer values between 0 and 100
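The allowed values above can be enforced before a record reaches the model. The sketch below shows one way to do that with a dataclass. The class name `StudentInput` and the `ALLOWED_VALUES` mapping are hypothetical and are not part of the project's code; only the field names and the permitted values come from this README.

```python
from dataclasses import asdict, dataclass

# Permitted categorical values, copied from the list above.
ALLOWED_VALUES = {
    "gender": {"male", "female"},
    "race_ethnicity": {"group A", "group B", "group C", "group D", "group E"},
    "parental_level_of_education": {
        "some high school",
        "high school",
        "some college",
        "associate's degree",
        "bachelor's degree",
        "master's degree",
    },
    "lunch": {"standard", "free/reduced"},
    "test_preparation_course": {"none", "completed"},
}


@dataclass(frozen=True)
class StudentInput:  # hypothetical name, for illustration only
    gender: str
    race_ethnicity: str
    parental_level_of_education: str
    lunch: str
    test_preparation_course: str
    reading_score: int
    writing_score: int

    def __post_init__(self) -> None:
        # Reject any categorical value outside the documented set.
        for field_name, allowed in ALLOWED_VALUES.items():
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(f"{field_name}={value!r} is not one of {sorted(allowed)}")
        # Scores must be integers in the range 0..100 (bool is excluded).
        for field_name in ("reading_score", "writing_score"):
            score = getattr(self, field_name)
            if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 100:
                raise ValueError(f"{field_name}={score!r} must be an integer between 0 and 100")


record = StudentInput(
    gender="male",
    race_ethnicity="group B",
    parental_level_of_education="some college",
    lunch="standard",
    test_preparation_course="none",
    reading_score=72,
    writing_score=83,
)
print(asdict(record))
```

Validating at construction time means a misspelled category such as "Group B" fails immediately with a clear message instead of reaching the preprocessor as an unseen value.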
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature/YourFeature).
- Commit your changes (git commit -m 'Add some feature').
- Push to your branch (git push origin feature/YourFeature).
- Open a Pull Request.
This project is licensed under the MIT License. See the LICENSE file for details.