egoliveira1/README.md

Hi there 👋

I'm Eron Oliveira

I have worked in data analysis and project management for the last nine years. I am now taking the next step in my career by developing Data Science and Data Engineering projects.

Tools


Talk to me...


Projects

Problem:

A startup called Sparkify wants to analyze the data it has been collecting about songs and user activity on its new music streaming app. The company is interested in finding out which songs users are listening to. The analysts don't have a way to query the data, which resides in a directory of JSON records about user activity in the app and a directory of JSON metadata about the songs in the app.

Solution:

I developed a solution using Python and SQL that collects and organizes client usage data and the application's song collection. I defined a star schema and the fact and dimension tables to receive the data. Finally, I wrote an ETL pipeline that transfers data from JSON files in two local directories to these tables in Postgres using Python and SQL.
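The flow described above can be sketched roughly as follows. This is only an illustration, not the code from the repository: it uses the standard-library `sqlite3` module as a stand-in for Postgres so that it runs without a database server, and the table names, column names, and sample records are all assumptions made for the example.

```python
import json
import sqlite3

# Hypothetical sample records; the real files and field names may differ.
SONG_RECORDS = [
    '{"song_id": "S1", "title": "Intro", "artist_id": "A1", '
    '"artist_name": "The Band", "year": 2001, "duration": 181.5}',
]
LOG_RECORDS = [
    '{"ts": 1541903636796, "user_id": 7, "level": "free", '
    '"song": "Intro", "artist": "The Band", "page": "NextSong"}',
    '{"ts": 1541903700000, "user_id": 7, "level": "free", '
    '"song": null, "artist": null, "page": "Home"}',
]

# Assumed star schema: one fact table (songplays) and three dimension tables.
SCHEMA = """
CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT, artist_id TEXT,
                    year INTEGER, duration REAL);
CREATE TABLE artists (artist_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE users (user_id INTEGER PRIMARY KEY, level TEXT);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
    start_time INTEGER, user_id INTEGER, song_id TEXT, artist_id TEXT
);
"""


def run_etl(conn, song_records, log_records):
    """Load song metadata into dimensions, then song-play events into the fact table."""
    cur = conn.cursor()
    for raw in song_records:
        s = json.loads(raw)
        cur.execute(
            "INSERT OR IGNORE INTO songs VALUES (?, ?, ?, ?, ?)",
            (s["song_id"], s["title"], s["artist_id"], s["year"], s["duration"]),
        )
        cur.execute(
            "INSERT OR IGNORE INTO artists VALUES (?, ?)",
            (s["artist_id"], s["artist_name"]),
        )
    for raw in log_records:
        e = json.loads(raw)
        if e["page"] != "NextSong":  # keep only events that represent a song play
            continue
        cur.execute(
            "INSERT OR REPLACE INTO users VALUES (?, ?)", (e["user_id"], e["level"])
        )
        # Resolve the song and artist keys by title and artist name.
        cur.execute(
            "SELECT s.song_id, s.artist_id FROM songs s "
            "JOIN artists a ON s.artist_id = a.artist_id "
            "WHERE s.title = ? AND a.name = ?",
            (e["song"], e["artist"]),
        )
        song_id, artist_id = cur.fetchone() or (None, None)
        cur.execute(
            "INSERT INTO songplays (start_time, user_id, song_id, artist_id) "
            "VALUES (?, ?, ?, ?)",
            (e["ts"], e["user_id"], song_id, artist_id),
        )
    conn.commit()


def build_demo_db():
    """Create an in-memory database, apply the schema, and run the ETL on the samples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    run_etl(conn, SONG_RECORDS, LOG_RECORDS)
    return conn
```

With the tables populated, a question such as "which songs are users listening to" becomes a join between `songplays` and `songs`. In a Postgres version the same structure would apply, with a Postgres driver and its placeholder syntax in place of `sqlite3`.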

Project repo: https://github.com/egoliveira1/Sparkify-PostgreSQL-Data-Modeling

Problem:

A Brazilian company is planning to enter the US fashion market with an e-commerce business called Star Jeans. The initial idea is to enter the market with a single product for a specific audience: jeans for men. However, even with the entry product and the audience defined, the company has no experience in this market and therefore does not know how to define the price, the model of the pants, or the material used to manufacture each piece.

Solution:

The solution is to research the main competitors to determine which models exist, the composition of each piece, and the price of the products. To accomplish the task, a web scraper was developed to extract, transform, and load (ETL) the information into a database. After the first load is done, the medians will be calculated and the results will be delivered through a Streamlit application.
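As a rough illustration of the median step, the sketch below groups scraped rows by one attribute and takes the median price of each group. The rows, field names, and the `median_price_by` helper are hypothetical and are not taken from the repository.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows as they might look after the scraping and cleaning steps.
SCRAPED = [
    {"model": "slim", "composition": "cotton", "price": 30.0},
    {"model": "slim", "composition": "cotton", "price": 40.0},
    {"model": "slim", "composition": "polyester", "price": 50.0},
    {"model": "skinny", "composition": "cotton", "price": 25.0},
    {"model": "skinny", "composition": "polyester", "price": 35.0},
]


def median_price_by(rows, key):
    """Group rows by the given attribute and return the median price per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["price"])
    return {name: median(prices) for name, prices in groups.items()}


by_model = median_price_by(SCRAPED, "model")
by_composition = median_price_by(SCRAPED, "composition")
```

The median is less sensitive than the mean to a few unusually cheap or expensive listings, which is one plausible reason to prefer it for competitor pricing.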

Project repo: https://github.com/egoliveira1/SalesPricePredict

Problem:

The company provides health insurance to its clients, and the product team is looking into the possibility of offering them a new product: car insurance. The insurer surveyed its current health insurance customers, and each respondent indicated whether or not they were interested in buying car insurance. The product team selected 127,000 new customers who did not respond to the survey to receive the car insurance offer. However, the team has a limit of 20,000 contacts.

Solution:

In this context, it is necessary to build a solution that indicates whether the customer will be interested in car insurance or not. Based on the solution, the sales team wants to prioritize the people most interested in the new product and thus optimize the campaign by only making contacts with the customers most likely to make the purchase. Classification algorithms were used and a score was generated for each customer. The final product, besides being available on a remote server, was also applied in a spreadsheet, where the team can make classifications on a smaller scale or simulate customer profiles.
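The prioritization step can be sketched as a simple ranking: sort customers by their predicted score and keep as many as the contact budget allows. The scores and the `select_contacts` helper below are made up for illustration; in the project the scores would come from the trained classifier.

```python
def select_contacts(scores, contact_limit):
    """Return customer ids ordered by descending score, truncated to the contact limit."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [customer_id for customer_id, _ in ranked[:contact_limit]]


# Hypothetical propensity scores keyed by customer id.
scores = {"c1": 0.12, "c2": 0.81, "c3": 0.47, "c4": 0.66}

# With a budget of two contacts, the two highest-scoring customers are chosen.
top_contacts = select_contacts(scores, 2)
```

In the scenario described above, `contact_limit` would be 20,000 and `scores` would hold one entry for each of the 127,000 customers.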

Project repo: https://github.com/egoliveira1/CrossSellProject

Problem:

Implementing improvements in the pharmacy group's stores requires well-structured financial planning with a small margin for error, so that the company does not waste money on the initiatives.

Solution:

The solution aims to create a model that uses sales data from all stores, together with their main characteristics, to forecast sales over a future period. A Telegram bot will be used to streamline the delivery of the forecasts.

Project repo: https://github.com/egoliveira1/RossmannProject

Pinned

  1. RossmannProject (Public)

    A time series problem, solved using an XGBoost machine learning model.

    Jupyter Notebook 2

  2. CrossSellProject (Public)

    A classification problem, solved by applying ML models to estimate the probability of purchase.

    Jupyter Notebook 1

  3. SalesPricePredict (Public)

    Predicting the selling price of pants for a new venture in the fashion business.

    Jupyter Notebook 1

  4. Sparkify-PostgreSQL-Data-Modeling (Public)

    Udacity Data Engineering Nanodegree project.

    Jupyter Notebook 2

  5. Databricks_Divvy_bikeshare (Public)

    ETL building and analysis.

    Python 1

  6. NYC-payroll-project (Public)

    Create high-quality data pipelines using Azure.

    1