A portfolio of my personal data analysis and data science projects.
- The aim is to find out which suburb in the City of Melbourne is the most liveable, based on property prices, facilities and overall safety perceptions.
- Located and merged data from 3 different sources and tidied the data using R and Tidyverse.
- Scanned for and dealt with all missing values, inconsistencies and outliers.
- Transformed the data to make it easier to interpret and to reduce skewness in its distribution.
- Visualised and communicated the results with ggplot2.
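
The skewness-reducing transformation mentioned above can be illustrated with a small sketch. The project itself was done in R; this is a Python illustration only, the prices are made up, and the choice of a log transform is an assumption, since the list above does not say which transformation was used.

```python
import math
import statistics

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Made-up, right-skewed property prices (not the project's data).
prices = [420_000, 450_000, 480_000, 510_000, 530_000,
          600_000, 750_000, 1_200_000, 3_500_000]

# Assumed transformation: natural log, which compresses the long right tail.
log_prices = [math.log(p) for p in prices]

skew_raw = skewness(prices)
skew_log = skewness(log_prices)
print(f"skewness before: {skew_raw:.2f}, after log transform: {skew_log:.2f}")
```

A value closer to zero after the transform indicates a more symmetric distribution, which is usually easier to plot and compare across suburbs.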
- The aim is to classify text messages (SMS) as either spam or ham, based on their content.
- Prepared and cleaned the raw data and conducted EDA in Python, using Pandas and Matplotlib, to form an interpretation of the data.
- Tokenised the spam/ham SMS messages by word using CountVectorizer.
- Trained and evaluated KNN and Decision Tree classifiers, using SMOTE to oversample the minority class.
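
To show what word-level tokenisation produces, here is a dependency-free sketch of a bag-of-words count matrix. It imitates the default behaviour of scikit-learn's CountVectorizer (lowercasing, tokens of two or more word characters) but is not the library itself; the messages and the helper names `tokenise` and `to_counts` are hypothetical.

```python
import re
from collections import Counter

# Made-up messages, not the project's dataset.
messages = [
    "Free entry! Win a free prize now",
    "Are we still meeting for lunch today",
]

def tokenise(text):
    """Lowercase the text and keep tokens of two or more word characters."""
    return re.findall(r"\b\w\w+\b", text.lower())

# Vocabulary: every distinct token across all messages, sorted alphabetically.
vocabulary = sorted({tok for msg in messages for tok in tokenise(msg)})

def to_counts(text):
    """Return one row of the document-term matrix for `text`."""
    counts = Counter(tokenise(text))
    return [counts[term] for term in vocabulary]

# One row per message, one column per vocabulary term.
matrix = [to_counts(msg) for msg in messages]

for term, column in zip(vocabulary, zip(*matrix)):
    print(term, column)
```

Each row of `matrix` is the numeric representation a classifier such as KNN or a decision tree would be trained on.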
- The aim is to create a model that can identify groups of employees who are highly likely to leave the company.
- Explored the data and its relationships with descriptive analytics and graphs in Python.
- Transformed and normalised the data to prepare it for training K-Means and DBSCAN clustering models.
- Tested models and evaluated clustering performance.
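
The normalise-then-cluster step can be sketched as follows. This is a minimal, dependency-free illustration rather than the project's code: the employee rows are made up, min-max scaling is an assumed choice of normalisation, and only K-Means is shown (with hand-picked starting centroids so the result is deterministic). DBSCAN is not covered.

```python
import math

# Made-up rows: (satisfaction score 0-1, average monthly hours).
employees = [(0.90, 160), (0.85, 150), (0.80, 170),
             (0.20, 165), (0.15, 155), (0.10, 175)]

def min_max_scale(rows):
    """Rescale each column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
            for row in rows]

def k_means(points, centroids, iterations=10):
    """Plain Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels

scaled = min_max_scale(employees)
labels = k_means(scaled, centroids=[scaled[0], scaled[-1]])
print(labels)
```

Without scaling, the hours column (values in the hundreds) would dominate the Euclidean distance and the satisfaction score would barely influence the clusters, which is why normalisation comes first.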
- The aim is to evaluate whether the company’s new recommendation engine algorithm is worth rolling out to all customers.
- Performed hypothesis testing to assess the statistical significance of key results from the sample data.
- Identified trends by performing linear and multiple regression analysis.
- Created visualisations to support communication using the ggplot2 and plotly packages in R.
- Provided recommendations for future data collection and investigation to minimise bias and errors.
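
As an illustration of the hypothesis-testing step, the sketch below runs a two-sided two-proportion z-test. The project was done in R and does not state which test or metric it used, so the test choice, the conversion counts and the function name `two_proportion_z_test` are all assumptions made for this Python example.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up counts: purchases out of customers shown the old (A)
# and new (B) recommendation engine.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value would suggest the difference between the two groups is unlikely to be due to sampling noise alone, which is one input into a roll-out decision alongside effect size and data quality.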