ETL Data Pipeline

Project Context

The main dataset contains academic scores for student athletes on NCAA Division I teams. The data is at the granularity of school, sport, and gender. The goal of this project is to determine whether a team's academic scores are correlated with the physicality of its sport. In other words, is playing a contact sport associated with poorer academic performance?
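As a rough illustration of the question above, the sketch below computes a Pearson correlation between a binary contact-sport flag and a score. The function name `pearson` and the input values are placeholders chosen for illustration; they are not taken from the dataset, and the project's actual analysis may differ.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder inputs only (NOT real NCAA data):
# 1 = contact sport, 0 = non-contact sport, paired with an abstract score.
is_contact = [1, 1, 0, 0]
scores = [1.0, 2.0, 3.0, 4.0]

r = pearson(is_contact, scores)
print(f"correlation between contact flag and score: {r:.3f}")
```

A negative value would mean that contact sports tend to go with lower scores in the sample. With a binary flag, this is the point-biserial form of the Pearson correlation.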

The data was extracted from several sources (CSV files and web scraping), cleaned and transformed into a uniform format, and then loaded into a PostgreSQL database that follows the star schema below.
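The extract, transform, and load steps can be sketched in a few lines. This is a minimal sketch and not the code in this repository: the column names, the `scores` table, and the sample rows are assumptions, and an in-memory SQLite database stands in for PostgreSQL so the example runs without a server.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for a CSV source file.
# Column names and values are assumptions made for illustration.
RAW_CSV = """school,sport,gender,score
 State U ,Football,M,940
State U,football,m,
Tech College,Tennis,F,985
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalize case, drop rows with no score."""
    cleaned = []
    for row in rows:
        if not row["score"].strip():
            continue  # skip rows with a missing score
        cleaned.append((
            row["school"].strip(),
            row["sport"].strip().title(),
            row["gender"].strip().upper(),
            int(row["score"]),
        ))
    return cleaned


def load(conn, rows):
    """Load: insert the cleaned rows into a (hypothetical) scores table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores "
        "(school TEXT, sport TEXT, gender TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT school, sport, gender, score FROM scores ORDER BY school"
).fetchall()
print(loaded)
```

Keeping each stage in its own function makes the transform step easy to test without touching a database.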

Dimensional Model

Dimension Tables: date_dim, location_dim, school_dim, sport_dim
Fact Table: academic_score_snapshot_fact

The dimensional model is implemented using a star schema.
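To show how the tables listed above relate to one another, here is a sketch of the schema. Only the table names come from this README. Every column name is an assumption, and SQLite is used in place of PostgreSQL so the example is self-contained; the real definitions live in create_star_schema.py.

```python
import sqlite3

# Table names are from the README; all columns are hypothetical.
DDL = """
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, academic_year INTEGER);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE school_dim (school_key INTEGER PRIMARY KEY, school_name TEXT);
CREATE TABLE sport_dim (
    sport_key INTEGER PRIMARY KEY,
    sport_name TEXT,
    gender TEXT,
    is_contact INTEGER
);
-- The fact table sits at the center and references each dimension.
CREATE TABLE academic_score_snapshot_fact (
    date_key INTEGER REFERENCES date_dim(date_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    school_key INTEGER REFERENCES school_dim(school_key),
    sport_key INTEGER REFERENCES sport_dim(sport_key),
    academic_score INTEGER
);
"""

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL
conn.executescript(DDL)

# List the tables that were created.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)

# A typical star-schema query joins the fact table to one dimension
# and aggregates a measure by one of that dimension's attributes.
QUERY = """
SELECT s.is_contact, AVG(f.academic_score)
FROM academic_score_snapshot_fact AS f
JOIN sport_dim AS s ON s.sport_key = f.sport_key
GROUP BY s.is_contact
"""
result = conn.execute(QUERY).fetchall()  # empty until the fact table is loaded
```

The fact table holds only keys and measures, so descriptive attributes such as the sport name are stored once in their dimension table.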

Using the code

Create and activate a virtual environment, then install the dependencies. All example commands below use PowerShell.
Note: venv_name is the name of your virtual environment.

PS C:\> python -m venv venv_name
PS C:\> venv_name\Scripts\Activate.ps1
PS C:\> pip install -r packages.txt

To create the PostgreSQL database and the dimension and fact tables according to the above star schema, run the create_star_schema.py file.

PS C:\> python create_star_schema.py

Finally, to execute the ETL (Extract, Transform, Load) pipeline and populate the data warehouse according to the above star schema, run the loader.py file.

PS C:\> python loader.py

Testing the code

To test the code using pytest, run the following command in PowerShell:

PS C:\> pytest -q tests.py

Note: "-q" is used to condense the output of the above command.
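For readers new to pytest, a test is a function whose name starts with test_ and that uses plain assert statements. The sketch below shows the shape of such a test. The helper `normalize_sport_name` is hypothetical and does not necessarily exist in this repository; the real tests are in tests.py.

```python
def normalize_sport_name(raw):
    """Hypothetical helper: collapse extra whitespace and title-case a sport name."""
    return " ".join(raw.split()).title()


def test_normalize_sport_name_strips_and_title_cases():
    # pytest collects functions named test_* and reports any failed assert.
    assert normalize_sport_name("  ice   hockey ") == "Ice Hockey"
```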
