Vancouver Datajam 2023 workshop

Automating ETL Processes with JupySQL and GitHub Actions

Introduction

Data analytics and business intelligence rely heavily on efficient data extraction, transformation, and loading (ETL) processes. This workshop gives participants a working understanding of ETL and of JupySQL, a Python library that enables SQL-based ETL from Jupyter notebooks. We will also introduce GitHub Actions for scheduling and automating ETL processes. By the end of the workshop, participants will have hands-on experience with these tools and will be able to schedule their own ETL jobs.

Objectives

By the end of this workshop, participants will know how to:

Extract, load, and transform data from Jupyter notebooks using JupySQL, DuckDB, and Amazon Redshift.
Automate and schedule ETL processes using GitHub Actions.
Use Python and SQL to clean, aggregate, and transform data.
Apply these skills to a real-world data management problem.
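
The extract, load, and transform steps named in the objectives can be sketched in a few lines of Python and SQL. This is only an illustrative sketch, not the workshop's actual pipeline: it uses Python's built-in sqlite3 module as a stand-in for DuckDB so that it runs without extra dependencies, and the sample data, the table names (raw_orders, sales_by_region), and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for a CSV file; note the missing amount.
RAW_CSV = """order_id,region,amount
1,west,10.50
2,east,
3,west,4.50
4,east,20.00
"""


def extract(csv_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def load_raw(conn, rows):
    """Load: copy the raw rows into a staging table unchanged."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:order_id, :region, :amount)", rows
    )


def transform(conn):
    """Transform: clean and aggregate with SQL inside the database."""
    conn.execute(
        """
        CREATE TABLE sales_by_region AS
        SELECT region, SUM(CAST(amount AS REAL)) AS total_amount
        FROM raw_orders
        WHERE amount <> ''  -- drop rows with a missing amount
        GROUP BY region
        """
    )
    query = "SELECT region, total_amount FROM sales_by_region ORDER BY region"
    return conn.execute(query).fetchall()


def run_pipeline(csv_text=RAW_CSV):
    """Run extract, load, and transform against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        load_raw(conn, extract(csv_text))
        return transform(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_pipeline())  # [('east', 20.0), ('west', 15.0)]
```

Loading the raw rows first and cleaning them with SQL afterwards keeps the transformation logic in the database, which is the pattern the workshop applies with DuckDB and Amazon Redshift.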

Workshop Schedule

Duration: 2.5 hours

Introductions
Section 1: Introduction to ELT (30 minutes)
Section 2: Data extraction, wrangling and loading with SQL and DuckDB (30 minutes)
Short Break (15 minutes)
Section 3: Introduction to GitHub Actions (30 minutes)
Section 4: CI/CD of ETL Processes with GitHub Actions (15 minutes)
Section 5: Deploying your ETL/ELT pipeline to Amazon Redshift (15 minutes)

Setup Instructions

Fork the repository.
Clone your fork of the repository:

git clone https://github.com/<your-github-id>/automate-etl-github-actions.git
cd automate-etl-github-actions

Create a virtual environment and install dependencies:

conda create -n automate-etl python=3.10
conda activate automate-etl
pip install poetry==1.5.1 redshift-connector "sqlalchemy<2"
poetry install
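
After installation, a short script can confirm that the pinned packages are present in the active environment. The following is a minimal sketch using only the standard library; the helper names installed_versions and major_version are hypothetical, and the list of distributions checked simply mirrors the pip command above.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def major_version(version):
    """Return the leading integer of a dotted version string such as '1.4.49'."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    found = installed_versions(["poetry", "redshift-connector", "sqlalchemy"])
    for name, version in found.items():
        print(f"{name}: {version or 'NOT INSTALLED'}")

    # The setup instructions pin sqlalchemy below version 2.
    sqlalchemy_version = found["sqlalchemy"]
    if sqlalchemy_version is not None and major_version(sqlalchemy_version) >= 2:
        print("Warning: sqlalchemy 2 or newer found; the setup pins sqlalchemy<2.")
```

Run it from inside the activated automate-etl environment so that it inspects the same interpreter that pip and Poetry installed into.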

Pre-requisites

Participants should have:

A basic to intermediate understanding of Python programming.
Familiarity with SQL and Jupyter notebooks.
Jupyter Notebook installed on their local systems.
A GitHub account.

Speaker bio

Laura Funderburk works as a developer advocate for Ploomber. She has over three years of professional experience in data science roles across a variety of settings, including the private and NGO sectors. Laura completed her B.Sc. in Mathematics at SFU. In recognition of her ability to face adversity and give back to her community, her alma mater awarded her a Terry Fox gold medal in 2019.

