
poojaisabelle/ETL-Project


ETL-Project - Monash Data Analytics Bootcamp


Background

We are interested in the specific locations of fast food chains in the US and in socio-economic measures, such as median income and unemployment rate, in those same locations. By transforming two datasets that share a common key, the Zip Code Tabulation Area (ZCTA), we hope our database will allow analysts to draw insights into the potential link between communities of low socio-economic status, the location of fast food chains, and obesity levels across the US.

Project Report

For further details on any of the following steps, and on the project's potential and limitations, please refer to our report and the notebook for each part.

Extract

1/ US Census Bureau Demographic Data

We used a Census API wrapper to retrieve data from the American Community Survey (ACS) 5-Year Data (2009-2018) by Zip Code Tabulation Area (ZCTA). Please refer to our notebook.
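The project used a Census API wrapper; as a hedged alternative, the sketch below queries the Census Data API directly with the standard library, so the request shape is visible. The variable codes (B19013_001E median household income, B23025_003E civilian labor force, B23025_005E unemployed) are an illustrative selection, not necessarily the ones the notebook pulled, and the `for=zip code tabulation area:*` geography clause is an assumption worth checking against the geography list for the chosen vintage. The API returns a JSON list of lists with a header row first; negative sentinel values are treated here as missing.

```python
from urllib.parse import quote

# Illustrative ACS 5-year variables (an assumption, not the notebook's list):
#   B19013_001E = median household income
#   B23025_003E = civilian labor force
#   B23025_005E = unemployed persons in the civilian labor force
FIELDS = ["B19013_001E", "B23025_003E", "B23025_005E"]
GEO = "zip code tabulation area"


def build_acs5_url(year, fields=FIELDS):
    """Build a Census Data API query for every ZCTA in one ACS 5-year vintage."""
    base = f"https://api.census.gov/data/{year}/acs/acs5"
    # Keep ':' and '*' literal; spaces become %20.
    return f"{base}?get={','.join(fields)}&for={quote(GEO + ':*', safe=':*')}"


def _to_number(value):
    """Convert an API string to float; blanks and negative sentinels become None."""
    if value is None or value == "":
        return None
    number = float(value)
    return None if number < 0 else number


def parse_acs5_rows(rows):
    """Turn the API's list-of-lists response (header row first) into records."""
    header, *data = rows
    index = {name: i for i, name in enumerate(header)}
    records = []
    for row in data:
        income = _to_number(row[index["B19013_001E"]])
        labor_force = _to_number(row[index["B23025_003E"]])
        unemployed = _to_number(row[index["B23025_005E"]])
        rate = None
        if labor_force and unemployed is not None:
            # Unemployment rate as a percentage of the civilian labor force.
            rate = round(100 * unemployed / labor_force, 2)
        records.append({
            "zcta": row[index[GEO]],
            "median_income": income,
            "unemployment_rate": rate,
        })
    return records
```

To fetch live data, the URL can be passed to `urllib.request.urlopen` and the body decoded with `json.load`; a Census API key, if you have one, is appended as `&key=...`.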

2/ Fast Food Restaurants Across America

This dataset was obtained from Kaggle as a downloadable CSV. Please refer to our notebook.
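Before the restaurant data can be linked to anything, each row needs a clean 5-digit ZIP. The sketch below reads the CSV with the standard library and normalizes the postal code; the column names `name` and `postalCode` are assumptions about the Kaggle file's layout, and the cleaning rules (drop ZIP+4 suffixes, restore leading zeros lost by spreadsheets, skip blanks) are illustrative rather than the notebook's exact logic.

```python
import csv


def normalize_zip(raw):
    """Reduce a raw postal code to a 5-digit ZIP string, or None if unusable.

    Handles ZIP+4 values ("02139-4307") and codes whose leading zeros were
    dropped by a spreadsheet ("2139").
    """
    if raw is None:
        return None
    head = raw.strip().split("-")[0]
    if not head.isdigit() or len(head) > 5:
        return None
    return head.zfill(5)


def load_restaurants(csv_lines):
    """Read restaurant rows, keeping the name and a normalized ZIP.

    `csv_lines` is any iterable of CSV text lines (e.g. an open file).
    Column names "name" and "postalCode" are assumed.
    """
    restaurants = []
    for row in csv.DictReader(csv_lines):
        zip_code = normalize_zip(row.get("postalCode"))
        if zip_code is None:
            continue  # rows without a usable ZIP cannot be matched to a ZCTA
        restaurants.append({"name": row["name"].strip(), "zip_code": zip_code})
    return restaurants
```

In practice the file would be opened with `open(path, newline="")` and passed straight to `load_restaurants`.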

3/ Zip Code to ZCTA Cross Walk

This dataset was obtained from UDS Mapper as a downloadable CSV. Please refer to our notebook.
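The crosswalk is needed because ZIP codes and ZCTAs are not the same thing: a ZIP code (for example a PO box ZIP) may be assigned to a different ZCTA. A minimal sketch of building a ZIP-to-ZCTA lookup and attaching a ZCTA to each restaurant follows; the column names `ZIP_CODE` and `ZCTA` are assumptions about the UDS Mapper file, and treating non-numeric ZCTA values as "no match" is an illustrative choice.

```python
import csv


def load_crosswalk(csv_lines):
    """Build a ZIP -> ZCTA lookup from the crosswalk CSV (assumed column names)."""
    lookup = {}
    for row in csv.DictReader(csv_lines):
        zip_code = row["ZIP_CODE"].strip().zfill(5)
        raw_zcta = row["ZCTA"].strip()
        # Only keep numeric ZCTAs; blanks or placeholder text mean "no ZCTA".
        if raw_zcta.isdigit():
            lookup[zip_code] = raw_zcta.zfill(5)
    return lookup


def assign_zcta(restaurants, lookup):
    """Attach a ZCTA to each restaurant; ZIPs missing from the crosswalk get None."""
    return [dict(r, zcta=lookup.get(r["zip_code"])) for r in restaurants]
```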

All of the input CSV files can be found here.

Transform

Please refer to the following notebooks:

After our data analysis and transformation, we came up with the following ERD and schema before loading the data into the PostgreSQL database.

ERD
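One transformation implied by the ERD is relating restaurant locations to census measures through the shared ZCTA key. The sketch below is a hypothetical version of that step, not the notebook's code: it counts restaurants per ZCTA and left-joins those counts onto the census records, so ZCTAs with no fast food outlets are kept with a count of 0 instead of silently disappearing.

```python
from collections import Counter


def restaurants_per_zcta(restaurants):
    """Count restaurants in each ZCTA, ignoring rows with no ZCTA."""
    return Counter(r["zcta"] for r in restaurants if r.get("zcta"))


def join_demographics(counts, demographics):
    """Left-join census records with restaurant counts on ZCTA.

    ZCTAs with census data but no restaurants get restaurant_count = 0.
    """
    return [
        dict(record, restaurant_count=counts.get(record["zcta"], 0))
        for record in demographics
    ]
```

A left join from the census side is chosen here because the research question compares areas with and without fast food outlets; an inner join would drop the "without" group.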

Load

Please refer to our notebook.
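As a hedged illustration of the load step, the sketch below defines a hypothetical two-table schema (one row per ZCTA, many restaurants per ZCTA) in plain SQL that should also be valid in PostgreSQL, and bulk-inserts records. To keep it runnable without a database server it uses the standard library's `sqlite3` as a stand-in; the table and column names are assumptions, not the project's actual schema. Restaurant ids are supplied explicitly so the same DDL works in both engines.

```python
import sqlite3

# Hypothetical schema; the types used are valid in both PostgreSQL and SQLite.
SCHEMA = """
CREATE TABLE zcta_demographics (
    zcta TEXT PRIMARY KEY,
    median_income REAL,
    unemployment_rate REAL
);
CREATE TABLE restaurants (
    restaurant_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    zip_code TEXT NOT NULL,
    zcta TEXT REFERENCES zcta_demographics (zcta)
);
"""


def load(conn, demographics, restaurants):
    """Create the tables and bulk-insert records using SQLite '?' placeholders."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO zcta_demographics VALUES (?, ?, ?)",
        [(d["zcta"], d["median_income"], d["unemployment_rate"]) for d in demographics],
    )
    conn.executemany(
        "INSERT INTO restaurants (restaurant_id, name, zip_code, zcta) VALUES (?, ?, ?, ?)",
        [(i, r["name"], r["zip_code"], r["zcta"]) for i, r in enumerate(restaurants, start=1)],
    )
    conn.commit()
```

Against PostgreSQL with a driver such as psycopg2, the placeholders would be `%s` instead of `?`, and the schema would typically be run through a cursor's `execute` rather than `executescript`, which is specific to `sqlite3`.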


We make no claims to ownership of the data. Please feel free to use it as you wish, but credit the appropriate people.

About

Repo for the ETL Project

Contributors 4