Welcome to my repository. Here you will find a series of data science projects using Python, SQL, Power BI and Tableau. The projects come from competitions, university work, tech challenges and personal initiatives. The techniques covered include data extraction, cleaning, analysis and visualization; probability, statistics, machine learning and programming.
Projects that consist of more than one file are located in their own folder.
The "guided_projects" folder contains all of the notebooks for the guided projects.
All of the available datasets for each project can be found in the "datasets" folder.
Aug-2023 - Factored’s Datathon: Data Engineering, Machine Learning & Data Analytics - (Team Competition)
Data Science competition hosted by Factored. We finished in 5th place out of 111 teams.
Link to the repo: factored-datathon-2023-Big-Data-Joropo
This project consists of two sections. The first is a data processing and analysis of the top 5 soccer leagues in Europe using Python's pandas, NumPy, Matplotlib and seaborn. The second uses a wine dataset and consists of an EDA and an evaluation of different machine learning techniques.
Link to the project: https://github.com/mauriciocarrazza/Data-Projects/tree/main/soccer_and_ml
Topics:
- Data manipulation and analysis in Python
- Data viz in Python
- Machine Learning (classification model)
This project consists of two sections. The first is a data processing and analysis of tech profiles using Python and Tableau. The second is the development of a classification model to optimize the hiring process for tech talent. The project contains four files: the Jupyter Notebook with the code solution, a PDF with the challenge instructions, a Tableau workbook with visualizations of the answers, and a PowerPoint presentation for the stakeholders.
Link to the project: https://github.com/mauriciocarrazza/Python-Projects/tree/main/analysis_tech_profiles
Topics:
- Data manipulation and analysis in Python
- Data viz in Python and Tableau
- Machine Learning (classification model)
Segment customers into groups based on the date of their last purchase and their number of purchases. I'm not sharing the dataset for this project.
Link to the project: https://github.com/mauriciocarrazza/Data-Science-Projects/tree/main/furniture_store
Topics:
- Data manipulation and analysis in Python
- Data viz in Python
- Machine Learning (clustering)
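A minimal sketch of how the two segmentation features (days since the last purchase and number of purchases) could be derived with pandas before clustering. Because the project's dataset is not shared, the column names `customer_id` and `order_date`, the sample orders and the reference date are all hypothetical.

```python
import pandas as pd

# Hypothetical order log: one row per purchase (the real dataset is not shared).
orders = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C", "C"],
    "order_date": pd.to_datetime([
        "2023-01-05", "2023-03-01", "2022-11-20",
        "2023-02-10", "2023-02-25", "2023-03-10",
    ]),
})

# Assumed reference date used to measure how long ago each customer last bought.
reference_date = pd.Timestamp("2023-03-31")

# One row per customer: date of the most recent purchase and number of purchases.
features = orders.groupby("customer_id").agg(
    last_purchase=("order_date", "max"),
    frequency=("order_date", "count"),
)
# Recency expressed as whole days since the last purchase.
features["recency_days"] = (reference_date - features["last_purchase"]).dt.days
features = features[["recency_days", "frequency"]]
print(features)
```

The resulting `recency_days` and `frequency` columns would then typically be scaled and passed to a clustering algorithm such as k-means to form the customer groups.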
Developed my chemical engineering degree project, entitled "Effective use and control of water in a cosmetic manufacturer" (earned a final industry grade of 48/50). The folder contains the final document presented to the university; all of the information about the project's development is in Chapter 6.
- Designed a database (Microsoft Excel) for tracking the microbiological analyses and physicochemical properties of the water. This tool made decision making about water treatment more effective and reduced disinfection costs by 40%.
- Developed techniques to measure water consumption and created data visualizations (Python). Both helped improve the company's production planning process.
Link to the project: https://github.com/mauriciocarrazza/Data-Science-Projects/tree/main/Internship%20(water%20treatment)
Topics:
- Database creation (Excel)
- Data visualization (Excel & Python)
- Chemical Engineering
- Water treatment
Collected and assembled an extensive database of activated carbon properties and applied machine learning techniques such as clustering and PCA to identify the best ones.
Link to the project: https://github.com/mauriciocarrazza/Data-Science-Projects/tree/main/activated%20carbon%20-%20ML%20%26%20Data%20Analysis
Topics:
- Extensive Research and Database creation in Excel
- Statistical Analysis
- Machine Learning (PCA & Clustering) using Statgraphics
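The original analysis was done in Statgraphics; the following is only a rough Python equivalent of the PCA step, computed with NumPy's SVD on standardized data. The 20 x 4 property table is randomly generated and stands in for the real activated carbon database.

```python
import numpy as np

def pca(X, n_components=2):
    """Principal component analysis via SVD of the standardized data matrix."""
    X = np.asarray(X, dtype=float)
    # Standardize each property (column): carbon properties use different units.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes; squared singular values give the variance.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained_ratio = S**2 / np.sum(S**2)
    # Project the samples onto the first n_components axes.
    scores = Z @ Vt[:n_components].T
    return scores, explained_ratio

rng = np.random.default_rng(0)
# Hypothetical table: 20 carbons x 4 numeric properties.
X = rng.normal(size=(20, 4))
X[:, 1] = 2 * X[:, 0] + rng.normal(scale=0.1, size=20)  # a strongly correlated pair

scores, explained_ratio = pca(X)
print(scores.shape, explained_ratio.round(3))
```

The two-dimensional `scores` are what one would plot or cluster to compare carbons, while `explained_ratio` shows how much of the total variance each component retains.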
Using data provided by a company with almost 10,000 employees, identify the variables that are driving the high employee turnover and predict which employees are likely to leave.
Link to the project: https://github.com/mauriciocarrazza/Data-Science-Projects/blob/main/employee_turnover.ipynb
Link to the competition: https://app.datacamp.com/workspace/w/916f34ed-c1d0-49ec-aeaa-52ee01d6f297
Topics:
- Data Manipulation
- Data Visualization
- Statistics
- Machine Learning (Logistic Regression)
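As an illustration of the logistic regression technique listed above, here is a minimal NumPy sketch trained with batch gradient descent on the log-loss. It is not the notebook's code: the two features (a satisfaction score and average monthly hours) and the synthetic labels are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression weights by batch gradient descent on the mean log-loss."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
        w -= lr * grad
    return w

def predict_proba(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return sigmoid(Xb @ w)

rng = np.random.default_rng(42)
n = 500
# Hypothetical standardized features: satisfaction score, average monthly hours.
X = rng.normal(size=(n, 2))
# Synthetic ground truth: low satisfaction and long hours raise the chance of leaving.
true_logit = -1.0 - 2.0 * X[:, 0] + 1.0 * X[:, 1]
y = (rng.random(n) < sigmoid(true_logit)).astype(float)

w = fit_logistic(X, y)
accuracy = np.mean((predict_proba(w, X) >= 0.5) == y)
print("weights:", w.round(2), "accuracy:", round(accuracy, 3))
```

The sign and size of each fitted weight indicate how a variable pushes the probability of leaving up or down, which is what makes logistic regression useful for explaining turnover and not only predicting it.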
A comprehensive analysis of the Android app market, comparing over ten thousand Google Play apps across different categories to find insights that can help devise strategies to drive growth and retention.
Topics:
- Data Cleaning
- Data Manipulation
- Data Visualization
- Probability & Statistics
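A typical cleaning step for app-store data is converting text columns such as install counts and prices into numbers. The sketch below assumes hypothetical `Installs` and `Price` columns holding values like `"10,000+"` and `"$2.99"`; the real dataset's columns and formats may differ.

```python
import pandas as pd

# Hypothetical sample of raw app data: install counts and prices stored as text.
apps = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Installs": ["10,000+", "1,000,000+", "500+"],
    "Price": ["0", "$2.99", "$0.99"],
})

# Remove thousands separators, plus signs and currency symbols, then convert to numbers.
for col, chars in [("Installs", [",", "+"]), ("Price", ["$"])]:
    for ch in chars:
        apps[col] = apps[col].str.replace(ch, "", regex=False)
    apps[col] = pd.to_numeric(apps[col])

print(apps.dtypes)
```

Using `regex=False` treats `+` and `$` as literal characters; both would otherwise have special meaning in a regular expression.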
Find the true Scala experts by exploring the language's development history in Git and GitHub.
Topics:
- Importing & Cleaning Data
- Data Manipulation
- Data Visualization
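One way to start looking for the most active contributors is to count commits per author. This sketch assumes the Git log has already been loaded into a DataFrame with hypothetical `author` and `date` columns; the sample rows are made up.

```python
import pandas as pd

# Hypothetical extract of a Git log: one row per commit.
commits = pd.DataFrame({
    "author": ["alice", "bob", "alice", "carol", "alice", "bob"],
    "date": pd.to_datetime([
        "2018-01-02", "2018-01-03", "2018-02-10",
        "2018-02-11", "2018-03-01", "2018-03-05",
    ]),
})

# Rank contributors by their total number of commits.
top_authors = commits["author"].value_counts()
print(top_authors.head(3))

# Commits per author per month, to see who stays active over time.
monthly = commits.groupby([commits["date"].dt.to_period("M"), "author"]).size()
print(monthly)
```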
Build a machine learning model to predict whether a credit card application will be approved.
Topics:
- Data Manipulation
- Machine Learning (Logistic Regression)
- Importing & Cleaning Data
- Applied Finance
Reanalyse the data behind one of the most important discoveries of modern medicine: handwashing.
Topics:
- Data Manipulation
- Data Visualization
- Probability & Statistics
- Importing & Cleaning Data
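The core comparison in this kind of reanalysis is the monthly proportion of deaths before and after handwashing was introduced, plus a measure of uncertainty. The counts below are invented for illustration. The start date of 1 June 1847 is an assumption in this sketch, and the bootstrap interval is one possible way to quantify uncertainty.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly counts for one clinic (illustrative values only).
monthly = pd.DataFrame({
    "date": pd.to_datetime(["1847-03-01", "1847-04-01", "1847-05-01",
                            "1847-06-01", "1847-07-01", "1847-08-01"]),
    "births": [300, 300, 300, 300, 300, 300],
    "deaths": [30, 36, 33, 6, 3, 6],
})
monthly["proportion_deaths"] = monthly["deaths"] / monthly["births"]

handwashing_start = pd.Timestamp("1847-06-01")  # assumed start of mandatory handwashing
before = monthly.loc[monthly["date"] < handwashing_start, "proportion_deaths"].to_numpy()
after = monthly.loc[monthly["date"] >= handwashing_start, "proportion_deaths"].to_numpy()
mean_diff = after.mean() - before.mean()

# Bootstrap: resample each period with replacement and recompute the difference.
rng = np.random.default_rng(0)
boot_diffs = [
    rng.choice(after, size=len(after), replace=True).mean()
    - rng.choice(before, size=len(before), replace=True).mean()
    for _ in range(3000)
]
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])
print(f"Mean difference: {mean_diff:.3f}, 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

A confidence interval that lies entirely below zero supports the conclusion that the death rate fell after handwashing began.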
Explore a dataset containing a century's worth of Nobel Laureates.
Topics:
- Data Manipulation
- Data Visualization
- Importing & Cleaning Data
Using data collected from Sephora's global store, find the average prices of four competitor brands across three categories.
Topics:
- Data Manipulation
- Importing & Cleaning Data
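The brand-by-category averages described above map naturally onto a pandas pivot table. In this sketch the brand names, categories and prices are placeholders, and only two "competitor" brands are kept to keep the example short.

```python
import pandas as pd

# Hypothetical product listing; brand and category names are placeholders.
products = pd.DataFrame({
    "brand": ["Brand A", "Brand A", "Brand B", "Brand B", "Brand A", "Brand B", "Brand C"],
    "category": ["Skincare", "Skincare", "Skincare", "Makeup", "Makeup", "Makeup", "Makeup"],
    "price": [30.0, 50.0, 25.0, 20.0, 18.0, 22.0, 99.0],
})

# Keep only the competitor brands of interest.
competitors = ["Brand A", "Brand B"]
subset = products[products["brand"].isin(competitors)]

# Average price for each brand (rows) within each category (columns).
avg_prices = subset.pivot_table(index="brand", columns="category",
                                values="price", aggfunc="mean")
print(avg_prices.round(2))
```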
Apply Python skills by manipulating and visualizing movie and TV data.
Topics:
- Data Manipulation
- Data Visualization
- Programming







