Class projects: "Statistical Learning for Engineers"
This is an individual, semester-long project for the IE 7300 Statistical Learning for Engineering class. The objective is to design, implement, evaluate, and validate regression-based machine learning models in Python. The problem to be addressed is training each model and predicting the values of the target variable. To that end, the project performs a statistical analysis of the dataset, fits regression equations, and concludes which regression equation is the best fit for each parameter.
The chosen dataset is “The Beijing Air Quality Data” from the UCI Machine Learning Repository. The dataset can be found here.
The code is sourced from multiple blog posts and YouTube videos, each referenced individually in the ipynb file. General references for the code are listed below:
- The coding results are compared with peer results on Kaggle (mainly produced with the sklearn library).
- For background of ML from scratch and basic understanding: Patrick Loeber files
- For alternative code of ML from scratch: Milan
- The statsmodels code is based on code posted for the class, with several modifications.
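
As a rough illustration of the "from scratch" regression workflow described above, the sketch below fits a one-feature linear regression in two ways (closed-form least squares and gradient descent) and checks that they agree. It is a minimal sketch, not the notebook's actual code: the data is synthetic rather than the Beijing air quality data, and the function names, learning rate, and epoch count are assumptions chosen for this example.

```python
import random


def fit_closed_form(xs, ys):
    """Ordinary least squares for one feature; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_gradient_descent(xs, ys, lr=0.1, epochs=3000):
    """Minimize mean squared error by batch gradient descent."""
    n = len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        residuals = [b0 + b1 * x - y for x, y in zip(xs, ys)]
        grad_b0 = 2.0 / n * sum(residuals)
        grad_b1 = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1


def r_squared(xs, ys, b0, b1):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic data (an assumption for this sketch): y = 2 + 3x + noise.
random.seed(0)
xs = [random.uniform(0.0, 1.0) for _ in range(100)]
ys = [2.0 + 3.0 * x + random.gauss(0.0, 0.1) for x in xs]

cf_b0, cf_b1 = fit_closed_form(xs, ys)
gd_b0, gd_b1 = fit_gradient_descent(xs, ys)
r2 = r_squared(xs, ys, cf_b0, cf_b1)

print("closed form      :", cf_b0, cf_b1)
print("gradient descent :", gd_b0, gd_b1)
print("R^2              :", r2)
```

The same cross-check applies when the reference is a library fit: the from-scratch coefficients should match the library's coefficients up to numerical tolerance.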