Supervised Learning Models

This is a course assignment on supervised machine learning models in R, from the Data Science and Advanced Analytics course of the Big Data & Analytics Masters @ EAE, class of 2021. The assignment has three sections:

Regression Analysis for Child Carseat Sales
Classification Analysis for Breast Cancer
Classification Analysis for Iris Species

Regression Analysis for Child Carseat Sales

Given a dataset of 400 observations (locations) with 11 variables, we need to predict the sales volume.

Dataset documentation

Answer

I used linear regression with 8 different variable combinations. Model performance was evaluated using mean squared error (MSE).
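As a rough illustration of comparing variable combinations by MSE, here is a minimal sketch in base R. It is not the code from regression_hands_on.R. The helper `compare_lm_mse`, the 70/30 split, and the three formulas are hypothetical. It also assumes the dataset is `Carseats` from the ISLR package, and falls back to a synthetic stand-in (not real data) when that package is not installed.

```r
# Fit one linear model per formula on a training split and
# return the test-set MSE of each, sorted from best to worst.
compare_lm_mse <- function(data, formulas, train_idx) {
  train <- data[train_idx, , drop = FALSE]
  test  <- data[-train_idx, , drop = FALSE]
  mse <- vapply(formulas, function(f) {
    fit <- lm(f, data = train)
    response <- all.vars(f)[1]  # left-hand side of the formula
    mean((test[[response]] - predict(fit, newdata = test))^2)
  }, numeric(1))
  labels <- vapply(formulas,
                   function(f) paste(deparse(f), collapse = " "),
                   character(1))
  out <- data.frame(model = labels, mse = mse, stringsAsFactors = FALSE)
  out[order(out$mse), ]
}

set.seed(42)
if (requireNamespace("ISLR", quietly = TRUE)) {
  carseats <- ISLR::Carseats  # assumption: this is the assignment's dataset
} else {
  # Synthetic stand-in so the sketch still runs; NOT the real data.
  n <- 400
  carseats <- data.frame(Price = rnorm(n, 115, 25),
                         Advertising = rpois(n, 6),
                         Age = runif(n, 25, 80))
  carseats$Sales <- 13 - 0.05 * carseats$Price +
    0.1 * carseats$Advertising - 0.04 * carseats$Age + rnorm(n)
}

train_idx <- sample(nrow(carseats), size = round(0.7 * nrow(carseats)))

# Hypothetical variable combinations; the assignment tried 8 of its own.
formulas <- list(
  Sales ~ Price,
  Sales ~ Price + Advertising,
  Sales ~ Price + Advertising + Age
)

results <- compare_lm_mse(carseats, formulas, train_idx)
print(results)
```

Evaluating on a held-out split rather than on the training rows keeps larger models from looking better simply because they have more terms.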

R script found here: regression_hands_on.R

Classification Analysis for Breast Cancer

Given a dataset of 699 observations with 11 variables, apparently derived from breast tissue imaging, we need to train a model that predicts whether an observation belongs to the benign or the malignant class.

Dataset documentation

Answer

I used Support Vector Machine (SVM) models with different kernel functions. For model evaluation, I added a cost matrix based on assumed misclassification costs.

Conclusion: use model #7, as it has the lowest prediction cost, even though its accuracy (~93%) is lower than that of other models (~95%).
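To show how a less accurate model can still win on cost, here is a small base R sketch. The cost values and both confusion matrices are hypothetical and chosen only for illustration. They are not the assignment's assumptions or results, and `prediction_cost` and `accuracy` are illustrative helper names.

```r
# Total cost: each confusion-matrix cell weighted by the cost of that outcome.
prediction_cost <- function(confusion, cost) {
  stopifnot(identical(dim(confusion), dim(cost)))
  sum(confusion * cost)
}

# Share of observations on the diagonal (correct predictions).
accuracy <- function(confusion) sum(diag(confusion)) / sum(confusion)

classes <- c("benign", "malignant")
dims <- list(actual = classes, predicted = classes)

# Hypothetical costs: a missed malignant case is 10x as costly as a false alarm.
cost <- matrix(c( 0, 1,
                 10, 0), nrow = 2, byrow = TRUE, dimnames = dims)

# Hypothetical confusion matrices for two models on 200 test observations.
model_a <- matrix(c(128,  2,
                      8, 62), nrow = 2, byrow = TRUE, dimnames = dims)
model_b <- matrix(c(117, 13,
                      1, 69), nrow = 2, byrow = TRUE, dimnames = dims)

summary_table <- data.frame(
  model    = c("A", "B"),
  accuracy = c(accuracy(model_a), accuracy(model_b)),
  cost     = c(prediction_cost(model_a, cost), prediction_cost(model_b, cost))
)
print(summary_table)
```

In this made-up case model B is less accurate (0.93 vs 0.95) but far cheaper (23 vs 82), because it rarely predicts benign for a malignant observation.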

R script found here: svm_hands_on_breast_cancer.R

Classification Analysis for Iris Species

A dataset of 150 observations with 4 variables and a class. The purpose is to predict the Iris species: Setosa, Versicolor, or Virginica.

Dataset documentation

Answer

I also used Support Vector Machine models. During the variable analysis, eyeballing the distribution of the species across variable pairs suggested that Sepal Width and Sepal Length are good input variables. Of the kernel functions tested, I went with a polynomial kernel of degree 3 and gamma 2.5. Another useful takeaway from this assignment was using the plot feature to visualize observations against predictions.
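A setup like the one described might look roughly like the sketch below. It assumes the e1071 package supplies the SVM implementation, which the text above does not state. It also trains and predicts on the full built-in `iris` data for brevity, which may differ from svm_hands_on_flowers.R.

```r
# Assumption: e1071 is the SVM package; the block is skipped if it is missing.
if (requireNamespace("e1071", quietly = TRUE)) {
  # Keep only the two sepal measurements and the class label.
  iris_2d <- iris[, c("Sepal.Length", "Sepal.Width", "Species")]

  # Polynomial kernel, degree 3, gamma 2.5, as chosen in the text above.
  fit <- e1071::svm(Species ~ Sepal.Length + Sepal.Width,
                    data = iris_2d,
                    kernel = "polynomial", degree = 3, gamma = 2.5)

  predicted <- predict(fit, newdata = iris_2d)

  # Observed vs predicted species as a confusion table.
  print(table(actual = iris_2d$Species, predicted = predicted))

  # plot() on an svm fit shades the predicted class regions
  # and overlays the observations.
  plot(fit, iris_2d, Sepal.Width ~ Sepal.Length)
}
```

With only two input variables the decision regions can be drawn directly on a single scatter plot, which makes it easy to see where predictions and observations disagree.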

R script found here: svm_hands_on_flowers.R

Professor

Marta Tolós

Professor Assistants
