
Welcome to my repository. Here you will find a series of projects about data science using Python, SQL and Tableau. Projects can be part of courses, competitions, university work or own initiatives. Among the techniques developed are: data importing, cleaning, analysis and visualization; probability, statistics, machine learning and programming.


mauriciocarrazza/Data-Science-Projects


Data-Science-Projects

Welcome to my repository. Here you will find a series of projects about data science using Python, SQL, Power BI and Tableau. Projects can be part of competitions, university work, tech challenges or own initiatives. Among the techniques developed are: data extraction, cleaning, analysis and visualization; probability, statistics, machine learning and programming.

Some projects consist of more than one file, so each of them is located in its own folder.

The "guided_projects" folder contains all of the notebooks for the guided projects.

All of the available datasets for each project are located in the "datasets" folder.

Aug-2023 - Factored’s Datathon: Data Engineering, Machine Learning & Data Analytics - (Team Competition)

Data Science competition hosted by Factored. We finished in 5th place out of 111 teams.

Link to the repo: factored-datathon-2023-Big-Data-Joropo


Mar-2023 - Soccer Leagues Analysis and Machine Learning Models Evaluation - (Tech Challenge)

This project consists of two sections. The first is a data processing and analysis exercise on the top five soccer leagues in Europe using Python's pandas, NumPy, Matplotlib and seaborn. The second uses a wine dataset and consists of an EDA and an evaluation of different machine learning techniques.

Link to the project: https://github.com/mauriciocarrazza/Data-Projects/tree/main/soccer_and_ml

Topics:

  • Data manipulation and analysis in Python
  • Data viz in Python
  • Machine Learning (classification model)


Jan-2023 - Analysis on Professional Profiles in Tech - (Tech Challenge)

This project consists of two sections. The first is a data processing and analysis exercise on tech profiles using Python and Tableau. The second is the development of a classification model to optimize the hiring process for tech talent. The project contains four files: the Jupyter Notebook with the code answer, a PDF with the challenge instructions, a Tableau workbook with visualizations of the answers, and a PowerPoint presentation for the stakeholders.

Link to the project: https://github.com/mauriciocarrazza/Python-Projects/tree/main/analysis_tech_profiles

Topics:

  • Data manipulation and analysis in Python
  • Data viz in Python and Tableau
  • Machine Learning (classification model)


Dec-2022 - Customer Segmentation Analysis with a Furniture Store's Data - (Own Initiative)

Segments customers into groups based on their last purchase and their number of purchases. I'm not sharing the dataset for this project.

Link to the project: https://github.com/mauriciocarrazza/Data-Science-Projects/tree/main/furniture_store

Topics:

  • Data manipulation and analysis in Python
  • Data viz in Python
  • Machine Learning (clustering)
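As a rough illustration of the two features named above (last purchase and number of purchases), the sketch below derives them per customer from a purchase log. It is not the project's code: the `purchases` records, the `snapshot` date and the `recency_frequency` helper are all hypothetical, and the clustering step itself is left out.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date).
# This is made-up data, not the furniture store's dataset.
purchases = [
    ("c1", date(2022, 11, 30)),
    ("c1", date(2022, 10, 2)),
    ("c2", date(2022, 3, 15)),
    ("c3", date(2022, 12, 1)),
    ("c3", date(2022, 11, 20)),
    ("c3", date(2022, 9, 5)),
]
snapshot = date(2022, 12, 5)  # assumed reference date for measuring recency


def recency_frequency(rows, as_of):
    """Return {customer_id: (days_since_last_purchase, purchase_count)}."""
    last_seen = {}
    counts = defaultdict(int)
    for customer, day in rows:
        counts[customer] += 1
        # Keep only the most recent purchase date per customer.
        if customer not in last_seen or day > last_seen[customer]:
            last_seen[customer] = day
    return {c: ((as_of - last_seen[c]).days, counts[c]) for c in counts}


features = recency_frequency(purchases, snapshot)
for customer, (recency, frequency) in sorted(features.items()):
    print(customer, recency, frequency)
```

A table of this shape (one row per customer, one column per feature) is the kind of input a clustering algorithm would typically take, usually after scaling the columns.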


Apr-2022 - Database creation for Water Treatment Processes - (Internship Project)

Developed my chemical engineering degree project, entitled "Effective use and control of water in a cosmetic manufacturer" (earned a final industry grade of 48/50). The folder contains the final document presented to the university. All of the information about the project's development is in Chapter 6.

  • Designed a database for the control of the water's microbiological analysis and physicochemical properties (Microsoft Excel). This tool made decisions about water treatment more effective and reduced disinfection costs by 40%.
  • Developed the techniques to measure water consumption and created data visualizations (Python). Both helped to improve the company's production planning process.

Link to the project: https://github.com/mauriciocarrazza/Data-Science-Projects/tree/main/Internship%20(water%20treatment)

Topics:

  • Database creation (Excel)
  • Data visualization (Excel & Python)
  • Chemical Engineering
  • Water treatment


Mar-2021 - Study of correlations between activated carbon properties - (University Project)

Collecting and assembling an extensive database of activated carbon properties and applying machine learning techniques such as clustering and PCA to find the best ones.

Link to the project: https://github.com/mauriciocarrazza/Data-Science-Projects/tree/main/activated%20carbon%20-%20ML%20%26%20Data%20Analysis

Topics:

  • Extensive Research and Database creation in Excel
  • Statistical Analysis
  • Machine Learning (PCA & Clustering) using Statgraphics


Oct-2022 - Employee turnover - (DataCamp Competition)

Based on data provided by a company with almost 10,000 employees, analyze and predict the variables that are causing a high percentage of employees to leave.

Link to the project: https://github.com/mauriciocarrazza/Data-Science-Projects/blob/main/employee_turnover.ipynb
Link to the competition: https://app.datacamp.com/workspace/w/916f34ed-c1d0-49ec-aeaa-52ee01d6f297

Topics:

  • Data Manipulation
  • Data Visualization
  • Statistics
  • Machine Learning (Logistic Regression)


May-2022 - The Android App Market on Google Play - (Guided Project)

A comprehensive analysis of the Android app market by comparing over ten thousand apps in Google Play across different categories, looking for insights in the data to devise strategies to drive growth and retention.

Topics:

  • Data Cleaning
  • Data Manipulation
  • Data Visualization
  • Probability & Statistics

Mar-2022 - The GitHub History of the Scala Language - (Guided Project)

Find the true Scala experts by exploring its development history in Git and GitHub.

Topics:

  • Importing & Cleaning Data
  • Data Manipulation
  • Data Visualization

Feb-2022 - Predicting Credit Card Approvals - (Guided Project)

Build a Machine Learning Model to predict if a credit card application will get approved.

Topics:

  • Data Manipulation
  • Machine Learning (Logistic Regression)
  • Importing & Cleaning Data
  • Applied Finance

Jan-2022 - The Discovery of Handwashing - (Guided Project)

Reanalyse the data behind one of the most important discoveries of modern medicine: handwashing.

Topics:

  • Data Manipulation
  • Data Visualization
  • Probability & Statistics
  • Importing & Cleaning Data

Jan-2022 - A Visual History of Nobel Prize Winners - (Guided Project)

Explore a dataset containing a century's worth of Nobel Laureates.

Topics:

  • Data Manipulation
  • Data Visualization
  • Importing & Cleaning Data

Dec-2021 - Coding Challenge: Cosmetic Brand Analysis - (Guided Project)

From data collected from Sephora's global store, find the average prices for four competitor brands across three categories.

Topics:

  • Data Manipulation
  • Importing & Cleaning Data
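The kind of aggregation described above (average price per brand and category) can be sketched as follows. This is only an illustration under stated assumptions: the `products` rows, the brand and category names, and the `average_prices` helper are hypothetical and do not come from the Sephora dataset or the guided project's notebook.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (brand, category, price). Made-up values for illustration.
products = [
    ("BrandA", "Skincare", 20.0),
    ("BrandA", "Skincare", 30.0),
    ("BrandA", "Makeup", 15.0),
    ("BrandB", "Skincare", 40.0),
    ("BrandB", "Makeup", 10.0),
    ("BrandB", "Makeup", 14.0),
    ("BrandC", "Fragrance", 80.0),
]


def average_prices(rows, brands, categories):
    """Return {(brand, category): mean price} for the selected brands and categories."""
    grouped = defaultdict(list)
    for brand, category, price in rows:
        # Ignore rows outside the brands and categories of interest.
        if brand in brands and category in categories:
            grouped[(brand, category)].append(price)
    return {key: mean(prices) for key, prices in grouped.items()}


averages = average_prices(products, {"BrandA", "BrandB"}, {"Skincare", "Makeup"})
for (brand, category), price in sorted(averages.items()):
    print(brand, category, price)
```

With pandas, the same result would typically come from filtering the frame and calling `groupby` on the brand and category columns; the plain-Python version is used here only to keep the sketch dependency-free.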

Dec-2021 - Investigating Netflix Movies and Guest Stars in The Office - (Guided Project)

Apply Python skills by manipulating and visualizing movie and TV data.

Topics:

  • Data Manipulation
  • Data Visualization
  • Programming
