I have worked in data analysis and project management for the last nine years. I am now taking the next step in my career by developing Data Science and Data Engineering projects.
A startup called Sparkify wants to analyze the data it has been collecting about songs and user activity on its new music streaming app. The company is interested in finding out what songs users are listening to. The analysts have no easy way to query the data, which resides in a directory of JSON records about user activity in the app and a directory of JSON metadata about the songs in the app.
I developed a solution using Python and SQL that collects and organizes client usage data and the application's song collection. I defined a star schema and the fact and dimension tables to receive the data. Finally, I wrote an ETL pipeline that transfers data from JSON files in two local directories to these tables in Postgres using Python and SQL.
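As a rough illustration of the load step described above, the sketch below reads song metadata from JSON files in a local directory and inserts it into a dimension table. It is not the project's actual code: the field names (`song_id`, `title`, `artist_id`, `year`, `duration`), the `songs` table layout, and the `load_songs` helper are assumptions, and it uses the standard-library `sqlite3` module as a stand-in for Postgres so that it runs without a database server.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical dimension table; the real project targets Postgres,
# where the column types and conflict handling would be written differently.
CREATE_SONGS = """
CREATE TABLE IF NOT EXISTS songs (
    song_id   TEXT PRIMARY KEY,
    title     TEXT NOT NULL,
    artist_id TEXT,
    year      INTEGER,
    duration  REAL
)
"""
INSERT_SONG = "INSERT OR IGNORE INTO songs VALUES (?, ?, ?, ?, ?)"


def load_songs(conn, data_dir):
    """Read every JSON file under data_dir and insert one row per file.

    Returns the number of files processed.
    """
    conn.execute(CREATE_SONGS)
    processed = 0
    for path in sorted(Path(data_dir).rglob("*.json")):
        record = json.loads(path.read_text())
        conn.execute(
            INSERT_SONG,
            (
                record["song_id"],
                record["title"],
                record["artist_id"],
                record["year"],
                record["duration"],
            ),
        )
        processed += 1
    conn.commit()
    return processed


def demo():
    """Run the loader against one made-up song file and return what was stored."""
    sample = {
        "song_id": "S001",
        "title": "Example Song",
        "artist_id": "A001",
        "year": 2020,
        "duration": 215.5,
    }
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "song1.json").write_text(json.dumps(sample))
        conn = sqlite3.connect(":memory:")
        processed = load_songs(conn, tmp)
        rows = conn.execute("SELECT title, year FROM songs").fetchall()
        conn.close()
    return processed, rows
```

With Postgres, the same structure would apply with a driver such as `psycopg2`, `%s` placeholders instead of `?`, and `ON CONFLICT DO NOTHING` in place of `INSERT OR IGNORE`.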
Project repo: https://github.com/egoliveira1/Sparkify-PostgreSQL-Data-Modeling
A Brazilian company is planning to enter the US fashion market with an e-commerce business called Star Jeans. The initial idea is to enter the market with a single product for a specific audience: jeans for men. However, even with the entry product and the audience defined, the company has no experience in this market and therefore does not know how to define the price, the model of the pants, or the material used to manufacture each piece.
The solution is to research the main competitors to find out which models exist, the composition of each piece, and the price of the products. To accomplish this, a web scraper was developed to extract, transform, and load (ETL) the information into a database. After the first load is done, the medians will be calculated and the results will be delivered through a Streamlit application.
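The median step mentioned above could look something like the following sketch, which groups scraped products by model and takes the median price of each group. The record layout (`model` and `price` keys), the `median_price_by_model` helper, and the sample values are made up for illustration and are not taken from the project.

```python
from collections import defaultdict
from statistics import median


def median_price_by_model(products):
    """Group product records by model and return the median price per model."""
    prices = defaultdict(list)
    for item in products:
        prices[item["model"]].append(item["price"])
    return {model: median(values) for model, values in prices.items()}


# Made-up sample records standing in for rows loaded by the scraper.
sample_products = [
    {"model": "slim", "price": 30},
    {"model": "slim", "price": 40},
    {"model": "slim", "price": 50},
    {"model": "skinny", "price": 25},
    {"model": "skinny", "price": 35},
]

medians = median_price_by_model(sample_products)
```

The median is less sensitive than the mean to a few very cheap or very expensive competitor products, which is one plausible reason to prefer it for a first price reference.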
Project repo: https://github.com/egoliveira1/SalesPricePredict
The company provides health insurance to its clients, and the product team is looking into the possibility of offering them a new product: car insurance. The insurer surveyed its current health insurance customers, and each respondent indicated whether or not they were interested in buying car insurance. The product team selected 127,000 new customers who did not respond to the survey to receive the car insurance offer. However, the team has a limit of 20,000 contacts.
In this context, it is necessary to build a solution that indicates whether the customer will be interested in car insurance or not. Based on the solution, the sales team wants to prioritize the people most interested in the new product and thus optimize the campaign by only making contacts with the customers most likely to make the purchase. Classification algorithms were used and a score was generated for each customer. The final product, besides being available on a remote server, was also applied in a spreadsheet, where the team can make classifications on a smaller scale or simulate customer profiles.
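The prioritization described above amounts to ranking customers by their score and keeping only as many as the contact limit allows. A minimal sketch, assuming the scores are already available as a mapping from customer ID to predicted interest (the `select_contacts` helper and the sample IDs are hypothetical):

```python
def select_contacts(scores, limit):
    """Return the IDs of the `limit` customers with the highest scores.

    scores: dict mapping customer ID to a predicted interest score.
    limit:  maximum number of contacts the sales team can make.
    """
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [customer_id for customer_id, _ in ranked[:limit]]


# Made-up scores for three customers and a contact limit of two.
example_scores = {"c1": 0.10, "c2": 0.90, "c3": 0.50}
top_two = select_contacts(example_scores, limit=2)
```

In the scenario above, `limit` would be 20,000 and `scores` would hold one entry for each of the 127,000 selected customers.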
Project repo: https://github.com/egoliveira1/CrossSellProject
Implementing improvements in the pharmacy group's stores requires well-structured financial planning with a small margin for error, so that the company does not waste money on the initiatives.
The solution is a model that uses sales data from all stores, together with each store's main characteristics, to forecast sales for a future period. A Telegram bot will be used to streamline and facilitate the delivery of information.
Project repo: https://github.com/egoliveira1/RossmannProject