JoliePhamPortfolio

A portfolio of my personal data analysis and data science projects.

Project 1: City of Melbourne

The aim is to find out which suburb in the City of Melbourne is the most liveable, based on property prices, facilities and overall safety perceptions.
Located and merged data from three different sources and tidied the data using R and Tidyverse.
Scanned for and dealt with all missing values, inconsistencies and outliers.
Transformed the data to make it easier to interpret and to reduce skewness in the distributions.
Visualised and communicated the results with ggplot2.
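The skewness-reducing transformation mentioned above can be illustrated with a minimal sketch. The project itself used R; this version is in Python, assumes a log transform (the README does not say which transform was applied), and uses made-up price values purely for illustration.

```python
import math

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / std) ** 3 for v in values) / n

# Hypothetical median prices (in $1000s) with a long right tail.
prices = [300, 350, 400, 450, 500, 650, 900, 1400, 2500]

# Assumed transform: natural log, which compresses large values.
log_prices = [math.log(p) for p in prices]

raw_skew = skewness(prices)
log_skew = skewness(log_prices)
print(f"raw skewness: {raw_skew:.2f}, log skewness: {log_skew:.2f}")
```

On right-skewed data like this, the log-transformed values should show a noticeably smaller positive skewness than the raw values.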

The aim is to classify text messages (SMS) as either spam or ham, based on their content.
Prepared and cleaned raw data, then conducted EDA in Python using Pandas and Matplotlib to form an interpretation of the data.
Tokenised the spam/ham SMS messages into words using CountVectorizer.
Trained and evaluated KNN and Decision Tree classifiers, using SMOTE to address class imbalance.
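The word-count tokenisation step can be sketched as follows. This is a standard-library stand-in that mimics the idea behind CountVectorizer (lowercasing, word tokens of two or more characters, an alphabetically sorted vocabulary); it is not the project's code. The function name `count_vectorize` and the example messages are hypothetical.

```python
import re
from collections import Counter

def count_vectorize(messages):
    """Return (vocabulary, count matrix) for a list of messages."""
    # Lowercase, then keep word tokens that are at least two characters long.
    tokenised = [re.findall(r"\b\w\w+\b", m.lower()) for m in messages]
    vocabulary = sorted({tok for toks in tokenised for tok in toks})
    matrix = []
    for toks in tokenised:
        counts = Counter(toks)  # missing words count as 0
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix

# Hypothetical messages, one spam-like and one ham-like.
messages = ["Free prize! Claim your free prize now", "Are we still on for lunch"]
vocabulary, matrix = count_vectorize(messages)

for message, row in zip(messages, matrix):
    print(message, "->", row)
```

Each row of the matrix is one message and each column is one vocabulary word, which is the shape a classifier such as KNN or a decision tree expects as input.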

The aim is to create a model that can predict which groups of employees have a high tendency to leave the company.
Explored the data and its relationships with descriptive analytics and graphs in Python.
Transformed and normalised the data in preparation for clustering with K-Means and DBSCAN.
Tested models and evaluated clustering performance.
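The normalise-then-cluster workflow above can be sketched with the standard library alone. This is an illustrative simplification, not the project's implementation: the min-max scaling choice, the naive "first k points" initialisation, the function names and the employee data are all assumptions.

```python
import math

def min_max_normalise(rows):
    """Scale every column to [0, 1] so no feature dominates the distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def k_means(points, k, iterations=20):
    """Basic Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical employees: (satisfaction score, average monthly hours).
employees = [(0.90, 160), (0.20, 280), (0.85, 170),
             (0.15, 300), (0.80, 165), (0.25, 290)]

scaled = min_max_normalise(employees)
labels, centroids = k_means(scaled, k=2)
print(labels)
```

Without the normalisation step, the monthly-hours column (values in the hundreds) would swamp the satisfaction score (values below 1) in the distance calculation, which is why scaling precedes distance-based clustering.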

The aim is to evaluate whether the company’s new recommendation engine algorithm is worth rolling out to all customers.
Performed hypothesis testing to assess the statistical significance of key results on samples.
Identified trends by performing simple and multiple linear regression analysis.
Created visualisations to support communication using ggplot2 and plotly packages in R.
Provided recommendations for future data collection and investigation to minimise bias and errors.
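As an illustration of the hypothesis-testing step, here is a minimal sketch of a two-proportion z-test comparing an old and a new algorithm. The project itself used R and the README does not state which test was applied, so the choice of test, the function name and the conversion counts below are all assumptions.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: purchases out of 1000 customers per group.
z, p_value = two_proportion_z_test(120, 1000, 160, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A p-value below the chosen significance level (commonly 0.05) would suggest the difference between the two groups is unlikely to be due to sampling noise alone.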
