harisyammnv/nyc-taxi-trip-dwh-pipeline

NYC Taxi Trip DWH Project

This repository extends the Data Engineering Zoomcamp project in a more generic way. It is intended to help in understanding how data engineering workflows operate in general.

Pipeline Steps

  • First, the URLs for the raw_data are scraped using BeautifulSoup (bs4) and stored in a JSON file.
  • Based on the TOML configuration, the data is downloaded into the nyc_raw_data directory.
  • Once the data is present, it is ingested into a local Postgres database using the LocalDaskExecutor from Prefect.
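
The first two steps can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code. To stay runnable without third-party installs it uses the standard library's `html.parser` in place of BeautifulSoup. The sample HTML, the `example.com` URLs, the helper names (`scrape_urls`, `select_urls`) and the config keys (`taxi_types`, `years`) are all assumptions made for the example.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (stand-in for a bs4 find_all('a'))."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href.strip())


def scrape_urls(html, suffixes=(".csv", ".parquet")):
    """Return links in the page that look like data files."""
    parser = LinkCollector()
    parser.feed(html)
    return [u for u in parser.links if u.lower().endswith(suffixes)]


def select_urls(urls, config):
    """Keep URLs whose file name matches a configured taxi type and year."""
    selected = []
    for url in urls:
        name = url.rsplit("/", 1)[-1]
        type_ok = any(name.startswith(t) for t in config["taxi_types"])
        year_ok = any(f"_{y}-" in name for y in config["years"])
        if type_ok and year_ok:
            selected.append(url)
    return selected


# Hypothetical page content; a real run would fetch the page over HTTP.
SAMPLE_HTML = """
<a href="https://example.com/trip-data/yellow_tripdata_2021-01.csv">Yellow</a>
<a href="https://example.com/trip-data/green_tripdata_2021-01.csv">Green</a>
<a href="https://example.com/trip-data/yellow_tripdata_2020-01.csv">Yellow</a>
<a href="https://example.com/about.html">About</a>
"""

# Hypothetical config, shaped like what a TOML file might be parsed into.
CONFIG = {"taxi_types": ["yellow"], "years": [2021]}

all_urls = scrape_urls(SAMPLE_HTML)
urls_json = json.dumps({"urls": all_urls}, indent=2)  # step 1: store as JSON
to_download = select_urls(json.loads(urls_json)["urls"], CONFIG)  # step 2
print(to_download)
```

Keeping the scraped URL list in JSON separates scraping from downloading, so the configuration can change without scraping the page again.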

Pipeline Visualization

The local Postgres pipeline is shown below:

[Image: Local Pipeline]

About

Repo containing DWH Pipeline
