Portfolio that consists of my personal data analysis / data science projects

jolie-meow/JoliePhamPortfolio

JoliePhamPortfolio

Project 1: City of Melbourne

  • The aim is to find out which suburb in the City of Melbourne is the most liveable, based on property prices, facilities and overall safety perceptions.
  • Located and merged data from 3 different sources and tidied the data using R and Tidyverse.
  • Scanned for and dealt with all missing values, inconsistencies and outliers.
  • Transformed the data to make it easier to interpret and to reduce skewness in the distributions.
  • Visualised and communicated the results with ggplot2.
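
The skewness-reducing transformation mentioned above can be illustrated with a small sketch. The project itself used R; this example is written in Python purely for illustration. It assumes a log transform, which the project does not name explicitly, and the prices are made-up values rather than data from the project.

```python
import math
import statistics

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical, made-up property prices (in thousands) with a long right tail.
prices = [100, 200, 400, 800, 1600, 3200, 6400]

# Assumption: a natural-log transform is used to pull in the right tail.
log_prices = [math.log(p) for p in prices]

raw_skew = skewness(prices)      # clearly positive (right-skewed)
log_skew = skewness(log_prices)  # close to zero for this made-up data

print(f"skewness before: {raw_skew:.2f}, after log transform: {log_skew:.2f}")
```

Real data will rarely become perfectly symmetric; the point is only that a log transform compresses large values more than small ones.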

Project 2: Spam Detection

  • The aim is to classify text messages (SMS) as either spam or ham, based on their content.
  • Prepared and cleaned the raw data, then conducted EDA in Python using Pandas and Matplotlib to build an understanding of the data.
  • Tokenised the spam/ham messages into words using CountVectorizer.
  • Trained and evaluated KNN and Decision Tree classifiers, using SMOTE to address class imbalance.
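
The word-level tokenisation step can be sketched without any third-party packages. The project used scikit-learn's CountVectorizer; the code below is a simplified stand-in that mimics its default behaviour (lowercasing, tokens of two or more word characters, alphabetically sorted vocabulary). The function names and example messages are hypothetical.

```python
import re
from collections import Counter

def tokenise(message):
    """Lowercase a message and split it into word tokens of 2+ characters."""
    return re.findall(r"\b\w\w+\b", message.lower())

def count_matrix(messages):
    """Return (sorted vocabulary, one row of word counts per message)."""
    counts = [Counter(tokenise(m)) for m in messages]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows

# Hypothetical example messages, not taken from the project's dataset.
messages = ["Win a free prize now", "Are we free for lunch now"]
vocabulary, rows = count_matrix(messages)

print(vocabulary)
for row in rows:
    print(row)
```

Each row is the bag-of-words vector for one message; a classifier such as KNN or a decision tree is then trained on these rows.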

Project 3: Employee Turnover Analysis

  • The aim is to create a model that can identify groups of employees with a high tendency to leave the company.
  • Explored the data and its relationships in Python using descriptive analytics and graphs.
  • Transformed and normalised the data, then trained K-Means and DBSCAN clustering models.
  • Tested models and evaluated clustering performance.
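
Normalisation matters here because K-Means and DBSCAN both rely on distances between points, so a feature with a large numeric range can dominate the result. The sketch below assumes min-max scaling, which is only one possible choice and is not confirmed by the project description. The column names and values are hypothetical.

```python
import math

def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical employee features, not taken from the project's dataset.
salary = [30000, 45000, 60000, 90000]
tenure_years = [1, 3, 5, 10]

scaled_salary = min_max_scale(salary)
scaled_tenure = min_max_scale(tenure_years)

# Distance between the first two employees, before and after scaling.
raw_distance = math.dist((salary[0], tenure_years[0]),
                         (salary[1], tenure_years[1]))
scaled_distance = math.dist((scaled_salary[0], scaled_tenure[0]),
                            (scaled_salary[1], scaled_tenure[1]))

print(f"raw distance: {raw_distance:.1f}, scaled distance: {scaled_distance:.3f}")
```

Before scaling, the distance is almost entirely determined by salary; after scaling, both features contribute on a comparable footing.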

Project 4: Recommendation Engine Algorithms Evaluation

  • The aim is to evaluate whether the company’s new recommendation engine algorithm is worth rolling out to all customers.
  • Performed hypothesis testing to assess the statistical significance of key results on samples.
  • Identified trends by performing linear and multiple regression analysis.
  • Created visualisations to support communication using ggplot2 and plotly packages in R.
  • Provided recommendations for future data collection and investigation to minimise bias and errors.
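
As an illustration of the hypothesis-testing step, the sketch below runs a two-sided two-proportion z-test comparing conversion rates between two groups of customers. The project used R and does not state which test was applied, so both the choice of test and the counts are assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: conversions under the old (A) and new (B) algorithm.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to sampling variation alone, though it says nothing about bias in how the samples were collected.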
