Machine Learning Study

Alekzander Green

Purpose

This project was a deep dive into techniques for data analysis and visualization. In it, I use and compare several techniques while analyzing a dataset of student traits and their respective grades at a Portuguese school. The primary goal was to analyze, compare, and study data analysis and ML techniques. The secondary goal was to see whether a correlation between grades and parental traits could be found and predicted by a machine learning model.

Techniques Studied

Literature Review

While often overlooked, literature review is one of the most important steps and should be the start to any good analysis.

This was of particular importance for this project because it takes place in another country, so some cultural differences needed to be accounted for.

Descriptive Storytelling

Another often overlooked technique is descriptive storytelling. This involves looking at data and descriptively talking about it in order to make sense of it more clearly.

I attempt to do this throughout the study to not only explain the trends but also explain the process and actions I took.

Exploratory Data Analysis (EDA)

EDA is the initial exploration of the data. This allows a researcher to more deeply understand the data they are working with and clean it up if necessary. It also provides initial statistics which can be useful for finding bad data or performing other analyses.

After reviewing the study, I began an in-depth EDA where I reviewed the data, analyzed basic statistical information, made initial observations, developed basic visualizations, and performed initial feature engineering to prepare for the next steps.
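As a rough illustration of the "basic statistical information" step, the sketch below summarizes one numeric column and counts missing values. The rows and column names (`absences`, `final_grade`) are hypothetical and are not taken from the study's dataset; it uses only the Python standard library.

```python
import statistics

# Hypothetical sample rows; column names are illustrative, not the study's schema.
students = [
    {"absences": 2, "final_grade": 12},
    {"absences": 0, "final_grade": 15},
    {"absences": 10, "final_grade": 8},
    {"absences": 4, "final_grade": None},  # a missing value EDA should surface
    {"absences": 6, "final_grade": 11},
]

def summarize(rows, column):
    """Return count, missing count, mean, median, min, and max for one numeric column."""
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

grade_summary = summarize(students, "final_grade")
print(grade_summary)
```

A summary like this makes it easy to spot missing entries or implausible minimum and maximum values before any modeling starts.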

Feature Engineering

Feature engineering is performed as part of EDA, but warrants its own explanation. Feature engineering includes feature selection, level reduction, encoding, and dimensionality reduction. These processes allow a researcher to ignore irrelevant features, simplify data, and prepare it for many different machine learning techniques.

As part of this project, I used techniques like removing irrelevant features, target level reduction, dimensionality reduction with PCA, label encoding, and one-hot encoding to make the dataset easier to work with.
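To make the difference between the two encodings concrete, here is a minimal pure-Python sketch. The `parent_job` values and the helper names `label_encode` and `one_hot_encode` are hypothetical; the study itself may well have used library encoders instead.

```python
def label_encode(values):
    """Map each distinct category to an integer, assigned in sorted order."""
    mapping = {category: index for index, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """Return one 0/1 indicator column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories

# Hypothetical categorical feature; the values are illustrative only.
parent_job = ["teacher", "health", "other", "teacher"]

codes, mapping = label_encode(parent_job)
rows, categories = one_hot_encode(parent_job)

print(codes)       # [2, 0, 1, 2]
print(categories)  # ['health', 'other', 'teacher']
print(rows)        # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Label encoding keeps a single column but implies an ordering between categories, which distance-based models such as KNN can misread. One-hot encoding avoids that implied order at the cost of one column per category.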

ML Model Selection

Selecting an appropriate ML model is crucial to good analysis. As such, part of this project included reviewing and studying multiple models to identify which ones would serve best.

I studied six ML classification models: Decision Trees, Random Forests, KNN, Naive Bayes, SVM, and Logistic Regression. Due to the nature of the project, I decided to move forward with the ones that acted least like black boxes.

Classification Models

This project's secondary goal was to determine if one could use ML to predict if a student might pass or fail a class based on certain features. Due to this, a classification model was needed. Alternatively, regression could have been done on the original continuous target values, but this granularity is unnecessary and changes which models can be studied.
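The reduction from a continuous grade to a pass/fail target can be sketched as below. The 0 to 20 scale and the passing mark of 10 are assumptions made for illustration; the README does not state the threshold the study used.

```python
# Assumption: grades are on a 0-20 scale and 10 is the passing mark.
PASS_THRESHOLD = 10

def to_pass_fail(grade, threshold=PASS_THRESHOLD):
    """Collapse a numeric grade into a binary class label."""
    return "pass" if grade >= threshold else "fail"

# Hypothetical grades chosen to exercise both sides of the threshold.
final_grades = [0, 9, 10, 14, 20]
labels = [to_pass_fail(g) for g in final_grades]

print(labels)  # ['fail', 'fail', 'pass', 'pass', 'pass']
```

Collapsing the target this way discards how far a student is from the threshold, which is the granularity trade-off described above.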

Unfortunately, the outputs suspiciously arrived at the same values after training. This is a large issue that was not addressed properly in the study.
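One check that could help investigate identical outputs is to compare each model against a majority-class baseline. The sketch below is only a possible diagnostic, not the study's finding: the labels and predictions are invented, and the helper names are hypothetical. If every model predicts a single class, they will all report the same accuracy, equal to that class's share of the data.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Return the most common label and the accuracy of always predicting it."""
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / len(labels)

def predicts_single_class(predictions):
    """True when a model outputs the same label for every sample."""
    return len(set(predictions)) == 1

# Invented data: 7 passing and 3 failing students, and a model that always says "pass".
y_true = ["pass"] * 7 + ["fail"] * 3
model_predictions = ["pass"] * 10

baseline_label, baseline_accuracy = majority_baseline_accuracy(y_true)
model_accuracy = sum(p == t for p, t in zip(model_predictions, y_true)) / len(y_true)

print(baseline_label, baseline_accuracy)
print(predicts_single_class(model_predictions), model_accuracy)
```

When a model's accuracy matches the baseline and its predictions contain only one class, the shared score reflects class imbalance and not anything the model learned.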

Model Tuning

After initial outputs, the model needed to be tuned to try to find a better outcome either by adjusting model parameters or adjusting input data.

While model tuning was performed using cross-validation, the models converged on the same value. Again, this is incredibly concerning and likely shows some fault in the process which was not addressed properly in the study.
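For readers unfamiliar with cross-validation, the following is a minimal sketch of k-fold splitting and score averaging in plain Python. The labels and the `majority_score` scorer are hypothetical stand-ins for a real model, and the folds are contiguous with no shuffling or stratification, so this should not be read as the procedure the study used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def cross_validate(score_fn, n_samples, k):
    """Average score_fn(train, test) over all k folds."""
    scores = [score_fn(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

# Invented labels; the scorer below stands in for training and evaluating a model.
cv_labels = ["pass", "pass", "fail", "pass", "fail",
             "pass", "pass", "fail", "pass", "pass"]

def majority_score(train, test):
    """Predict the most common training label and return accuracy on the test fold."""
    train_labels = [cv_labels[i] for i in train]
    guess = max(set(train_labels), key=train_labels.count)
    return sum(cv_labels[i] == guess for i in test) / len(test)

folds = list(k_fold_indices(len(cv_labels), 3))
mean_score = cross_validate(majority_score, len(cv_labels), 3)
print([len(test) for _, test in folds], mean_score)
```

With an imbalanced pass/fail target, stratified folds are usually preferable so that each fold keeps roughly the same class proportions.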
