I'm a data engineer and an aspiring data analyst. I mainly write code in Python and SQL, but have dabbled in other languages.
In college I interned as a data analyst at a social networking app, where I visualized user networks and was tasked with explaining what made a specific user popular. With a Bachelor's degree in Management Information Systems and a minor in Marketing, I bring a blend of technical and business acumen to data.
After working as a data engineer, I realized that while backend engineering is exciting, my true passion is leveraging data to drive business decisions, and I am working to transition into a data analyst role.
Code: Analyzing Turnover at Salifort Motors
Goal: To determine why there was such a high turnover rate at Salifort Motors.
Description: The project focused on analyzing a dataset of employees collected by the HR team. The dataset included satisfaction level, the employee's last performance review score, the number of projects an employee contributes to, average monthly hours worked, employee tenure, whether the employee was promoted in the last 5 years, and other relevant information. The project involved loading the data, cleaning and preprocessing it, performing exploratory data analysis (EDA), analyzing the correlation between whether an employee left and the other variables, and building logistic regression and tree-based models; a simplified code sketch follows this entry.
Skills: data cleaning, data analysis, data modeling, machine learning, data visualization
Technology: Python, pandas, NumPy, seaborn, Matplotlib, scikit-learn
Results: The analysis revealed cause for concern about data leakage. With the data provided, however, the two most important features the model used to predict whether an employee would leave were the last evaluation score and the number of hours worked per month. Tenure and whether the employee worked more than 166.67 hours a month (about 2,000 hours a year) also mattered.
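A minimal sketch of the modeling step described above, assuming hypothetical file and column names ("hr_data.csv", "left", "last_evaluation", and so on) rather than the project's actual code:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("hr_data.csv")                # hypothetical file name
df = df.drop_duplicates().dropna()             # basic cleaning

y = df["left"]                                 # 1 = the employee left
X = pd.get_dummies(df.drop(columns=["left"]))  # encode any categorical fields

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)

# Rank the features the model leans on most when predicting attrition
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

A logistic regression (sklearn.linear_model.LogisticRegression) could be swapped in the same way; the tree-based model is shown because feature importances fall out of it directly.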
Visualizations: 2018 Seoul Bike Rentals
Goal: To identify the slowest bike-rental times so workers can perform maintenance and repairs away from peak demand.
Description: The project focused on visualizing 2018 bike rental data for Seoul. The dataset included fields such as rented bike count, date, and weather data such as snowfall and humidity. For simplicity, we chose to focus on the season and the date/time each bike was rented; a pandas sketch of the same aggregation follows this entry.
Skills: data visualization
Technology: Tableau
Results: The visualization shows that the lowest bicycle traffic falls between 9am and 2pm on any given day, with little variation across weekdays. Rentals also drop significantly in the winter. Based on this, regular maintenance can be scheduled for 9am-2pm on weekdays, and major repairs can be saved for the winter months.
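The project itself was built in Tableau; the following is only a pandas sketch of the same aggregation, assuming hypothetical file and column names ("seoul_bike_rentals_2018.csv", "Hour", "Seasons", "Rented Bike Count"):

```python
import pandas as pd

df = pd.read_csv("seoul_bike_rentals_2018.csv")  # hypothetical file name

# Average rentals per hour of day: the low window marks the maintenance slot
hourly = df.groupby("Hour")["Rented Bike Count"].mean()
print(hourly.sort_values().head())

# Average rentals per season: winter should show the deepest drop
seasonal = df.groupby("Seasons")["Rented Bike Count"].mean()
print(seasonal.sort_values())
```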
Visualizations: Lightning Strike Visualizations
Goal: To understand trends in lightning strikes from 2009 to 2018.
Description: The project focused on visualizing lightning strike data to understand how strikes changed over time. The columns used from this dataset were latitude, longitude, and the date each strike occurred; a pandas sketch of the same aggregation follows this entry.
Skills: data visualization
Technology: Tableau
Results: The report showed that from 2009 to 2018 the number of recorded lightning strikes increased, with Q3 (the summer months) being the quarter in which strikes occurred most frequently. Over the years, lightning strikes shifted from the east coast toward the central mainland, with Louisiana, Arkansas, and Mississippi seeing the most strikes across the whole decade.
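As with the Seoul project, the visuals live in Tableau; below is only a rough pandas equivalent of the trend view, assuming a hypothetical file with "date", "latitude", and "longitude" columns (one row per recorded strike):

```python
import pandas as pd

df = pd.read_csv("lightning_strikes.csv")  # hypothetical file name
df["date"] = pd.to_datetime(df["date"])

# Strikes per year show the overall upward trend
per_year = df.groupby(df["date"].dt.year).size()
print(per_year)

# Strikes per calendar quarter show the Q3 (summer) peak
per_quarter = df.groupby(df["date"].dt.quarter).size()
print(per_quarter)
```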
Code: Predicting User Churn at Waze
Goal: To predict user churn at Waze and understand possible reasons why.
Description: The project focused on analyzing a dataset of Waze users. The dataset included whether each user was retained or churned, the number of times the user opened the app during the month, the number of drives over 1 km during the month, whether the user had an iPhone or Android device, and other variables. The project involved loading the data, cleaning and preprocessing it, performing exploratory data analysis (EDA), analyzing the correlation between whether a user churned and the other variables, and building machine learning models; a simplified code sketch follows this entry.
Skills: data cleaning, data analysis, data modeling, machine learning, data visualization
Technology: Python, pandas, NumPy, seaborn, Matplotlib, scikit-learn, XGBoost
Results: The resulting machine learning model was not a strong enough predictor to drive business decisions; however, it is a good starting point to guide further exploratory efforts. Additional information, such as geographic location, drive times, or whether a user ended a route before reaching the destination, would give us a better chance of improving the model.
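A minimal sketch of the churn model, assuming hypothetical file and column names ("waze_user_data.csv", a "label" column holding "retained"/"churned", a "device" column) rather than the project's actual code:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from xgboost import XGBClassifier

df = pd.read_csv("waze_user_data.csv")          # hypothetical file name
df = df.dropna()                                # basic cleaning

y = (df["label"] == "churned").astype(int)      # 1 = churned
X = pd.get_dummies(df.drop(columns=["label"]))  # encode device, etc.

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = XGBClassifier(random_state=42)
model.fit(X_train, y_train)

# Precision and recall on the held-out churners are the kind of check
# that shows whether the model is strong enough to act on
print(classification_report(y_test, model.predict(X_test)))
```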