greead/ML_study

Machine Learning Study

Alekzander Green

Purpose

This project was a deep dive into techniques for data analysis and visualization. In it, I use and compare many techniques while analyzing a dataset of student traits and the corresponding grades at a Portuguese school. The primary goal was to analyze, compare, and study data analysis and ML techniques; the secondary goal was to see whether a correlation between grades and parental traits could be found and predicted by a machine learning model.

Techniques Studied

Literature Review

While often overlooked, literature review is one of the most important steps and should be the start to any good analysis.

This was particularly important for this project because the data comes from a school in another country, so some cultural differences needed to be accounted for.

Descriptive Storytelling

Another often overlooked technique is descriptive storytelling. This involves looking at data and descriptively talking about it in order to make sense of it more clearly.

I attempt to do this throughout the study, explaining not only the trends but also the process I followed and the actions I took.

Exploratory Data Analysis (EDA)

EDA is the initial exploration of the data. This allows a researcher to more deeply understand the data they are working with and clean it up if necessary. It also provides initial statistics which can be useful for finding bad data or performing other analyses.

After reviewing the study, I began an in-depth EDA where I reviewed the data, analyzed basic statistical information, made initial observations, developed basic visualizations, and performed initial feature engineering to prepare for the next steps.
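The kind of basic statistics gathered during EDA can be sketched in plain Python. This is a minimal illustration only: the rows, the column names (`age`, `absences`, `final_grade`), and the `summarize` helper are hypothetical and are not taken from the study's data or code.

```python
import statistics

# Hypothetical rows; column names and values are illustrative assumptions.
rows = [
    {"age": 15, "absences": 2, "final_grade": 12},
    {"age": 16, "absences": 0, "final_grade": 9},
    {"age": 17, "absences": None, "final_grade": 15},
    {"age": 16, "absences": 6, "final_grade": 0},
]


def summarize(rows, column):
    """Return basic statistics for one numeric column, skipping missing values."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


# One summary per column, keyed by column name.
summary = {col: summarize(rows, col) for col in rows[0]}
for col, stats in summary.items():
    print(col, stats)
```

Counting missing values and checking the minimum and maximum per column is one simple way to spot bad data early, such as a final grade of 0 that may indicate a dropout rather than a real score.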

Feature Engineering

Feature engineering is performed as part of EDA, but warrants its own explanation. Feature engineering includes feature selection, level reduction, encoding, and dimensionality reduction. These processes allow a researcher to ignore irrelevant features, simplify data, and prepare it for many different machine learning techniques.

As part of this project, I removed irrelevant features, reduced the number of target levels, applied dimensionality reduction with PCA, and used label encoding and one-hot encoding to make the dataset easier to work with.
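Three of these steps (target level reduction, label encoding, and one-hot encoding) can be sketched in plain Python. This is an illustration under stated assumptions, not the study's code: the records, the column names, the helper functions, and the pass mark of 10 on a 0-20 scale are all hypothetical.

```python
# Hypothetical records; column names, categories, and values are assumptions.
records = [
    {"school_support": "yes", "mother_job": "teacher", "final_grade": 14},
    {"school_support": "no", "mother_job": "health", "final_grade": 8},
    {"school_support": "no", "mother_job": "other", "final_grade": 10},
]

PASS_MARK = 10  # assumed threshold on a 0-20 grading scale


def reduce_target(grade, pass_mark=PASS_MARK):
    """Target level reduction: collapse a numeric grade into pass (1) or fail (0)."""
    return 1 if grade >= pass_mark else 0


def label_encode(values):
    """Label encoding: map each category to an integer, in sorted category order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """One-hot encoding: one 0/1 column per category, in sorted category order."""
    categories = sorted(set(values))
    vectors = [[1 if v == cat else 0 for cat in categories] for v in values]
    return vectors, categories


targets = [reduce_target(r["final_grade"]) for r in records]
support_codes, support_map = label_encode([r["school_support"] for r in records])
job_vectors, job_columns = one_hot_encode([r["mother_job"] for r in records])

print(targets)
print(support_codes, support_map)
print(job_columns, job_vectors)
```

Label encoding assigns integers that imply an order, so it fits binary or ordinal features best. One-hot encoding avoids that implied order for nominal features at the cost of extra columns.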

ML Model Selection

Selecting an appropriate ML model is crucial to good analysis. As such, part of this project included reviewing and studying multiple models to identify which ones would serve best.

I studied six ML classification models: Decision Trees, Random Forests, KNN, Naive Bayes, SVM, and Logistic Regression. Due to the nature of the project, I decided to move forward with the ones that acted least like black boxes.

Classification Models

This project's secondary goal was to determine if one could use ML to predict if a student might pass or fail a class based on certain features. Due to this, a classification model was needed. Alternatively, regression could have been done on the original continuous target values, but this granularity is unnecessary and changes which models can be studied.

Unfortunately, the models' outputs suspiciously arrived at the same values after training. This is a significant issue that the study did not properly address.
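One possible check for this kind of symptom, which the study did not perform, is to compare each model against a majority-class baseline and to test whether a model predicts a single class for every sample. The sketch below uses made-up labels and predictions; the function names and the model names `model_a` and `model_b` are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most common class."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


def diagnose(y_true, predictions_by_model):
    """Flag models that emit one constant class or merely match the baseline."""
    baseline = majority_baseline_accuracy(y_true)
    report = {}
    for name, y_pred in predictions_by_model.items():
        acc = accuracy(y_true, y_pred)
        report[name] = {
            "accuracy": acc,
            "constant_output": len(set(y_pred)) == 1,
            "matches_baseline": acc == baseline,
        }
    return report


# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 1, 0, 1, 1]
predictions = {
    "model_a": [1, 1, 1, 1, 1, 1, 1, 1],  # always predicts "pass"
    "model_b": [1, 1, 0, 0, 1, 0, 1, 1],
}
report = diagnose(y_true, predictions)
for name, result in report.items():
    print(name, result)
```

If several different models all report a constant output and the baseline accuracy, identical scores would be expected, and the cause is more likely class imbalance or a data preparation problem than the models themselves.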

Model Tuning

After the initial outputs, the models needed to be tuned to try to find a better outcome, either by adjusting model parameters or by adjusting the input data.

Although model tuning was performed using cross-validation, the models still converged on the same value. Again, this is very concerning and likely points to a fault in the process that the study did not properly address.
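For reference, k-fold cross-validation can be sketched in plain Python as follows. This is not the routine used in the study, which may have relied on a library. The helper names, the toy data, and the majority-class stand-in model are assumptions made for illustration.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_val_scores(X, y, fit, predict, k=5, seed=0):
    """Train on k-1 folds, score accuracy on the held-out fold, repeat k times."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return scores


# Stand-in model: always predicts the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, X_test):
    return [model for _ in X_test]


# Toy data, for illustration only.
X = [[i] for i in range(10)]
y = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
scores = cross_val_scores(X, y, fit_majority, predict_majority, k=5)
print(scores, sum(scores) / len(scores))
```

Swapping a real model's training and prediction functions in for `fit_majority` and `predict_majority` gives a per-fold score list. If a tuned model's fold scores are indistinguishable from this stand-in's, the tuning has probably not changed what the model predicts.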

About

CS567 Term Project
