In this project we build an ELT data pipeline. Before we start, let's define the abbreviation:
- E: Extract, refers to the process of getting data from the source.
- T: Transform, refers to the process of transforming the raw data from the source (e.g. joins with other tables, group by, column mapping, denormalizing, lookups on an external database, machine learning modeling, etc.).
- L: Load, refers to the process of loading the data into a table to be used.
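To make the load-before-transform order concrete, here is a minimal, self-contained sketch of the E→L→T flow in Python. It uses an in-memory SQLite database as a stand-in for the real warehouse, and the table and column names are illustrative, not taken from the project:

```python
import csv
import io
import sqlite3

# E - extract: read raw rows from the source (a CSV string stands in for a file).
raw_csv = "station,vehicles\nS1,10\nS1,5\nS2,7\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# L - load: write the rows into the warehouse untransformed.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_traffic (station TEXT, vehicles INTEGER)")
db.executemany("INSERT INTO raw_traffic VALUES (:station, :vehicles)", rows)

# T - transform: aggregate inside the warehouse with SQL, after loading.
totals = db.execute(
    "SELECT station, SUM(vehicles) FROM raw_traffic "
    "GROUP BY station ORDER BY station"
).fetchall()
print(totals)  # [('S1', 15), ('S2', 7)]
```

Note that the transformation happens inside the database after the raw data has landed, which is exactly what distinguishes ELT from ETL.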
ELT is mostly used when we don't know in advance which transformations the data will need, so the raw data is loaded first and transformed later. The main tools used in this project are:
- SQL
- dbt
- Redash
The data used is provided at https://anson.ucdavis.edu/~clarkf/; it contains stations and the traffic movement at those stations over time.
- We will load the data from the different CSVs into our MySQL database with the help of Airflow.
- With dbt we will build models that perform transformations on the data.
- We will display the queried columns on Redash.
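As an illustration of the kind of transformation a dbt model could express on this data, the sketch below joins a stations table to traffic counts and aggregates per station. It runs against SQLite purely so the example is self-contained (the real pipeline targets MySQL), and every table and column name here is made up for demonstration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical raw tables, loosely mirroring "stations" and "traffic movement".
db.executescript("""
CREATE TABLE stations (station_id TEXT, name TEXT);
CREATE TABLE traffic (station_id TEXT, observed_at TEXT, vehicles INTEGER);
INSERT INTO stations VALUES ('S1', 'North Bridge'), ('S2', 'East Gate');
INSERT INTO traffic VALUES
  ('S1', '2020-01-01', 12),
  ('S1', '2020-01-02', 8),
  ('S2', '2020-01-01', 5);
""")

# A dbt model is essentially a SELECT statement that dbt materializes
# as a table or view in the warehouse.
model_sql = """
SELECT s.name, SUM(t.vehicles) AS total_vehicles
FROM traffic t
JOIN stations s ON s.station_id = t.station_id
GROUP BY s.name
ORDER BY s.name
"""
db.execute("CREATE TABLE station_totals AS " + model_sql)
result = db.execute("SELECT * FROM station_totals").fetchall()
print(result)  # [('East Gate', 5), ('North Bridge', 20)]
```

The materialized `station_totals` table is the kind of output Redash would then query and chart.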
- MySQL Connector/Python DDL example: https://dev.mysql.com/doc/connector-python/en/connector-python-example-ddl.html
- Learn more about dbt in the docs
- Check out Discourse for commonly asked questions and answers
- Join the chat on Slack for live discussions and support
- Find dbt events near you
- Check out the blog for the latest news on dbt's development and best practices
- Building a dashboard with self-hosted Redash: https://medium.com/@ikishan/creating-a-new-age-dashboard-with-self-hosted-open-source-redash-41e91434390