Data ETL pipeline to clean, process, and aggregate data from Canadian housing starts.
Built with Apache Airflow, dbt, and Amazon Web Services EC2.
Learn more about the project by reading the design document.
Table of contents created with VS Code Extension: Markdown All in One.
- Navigate to Amazon Web Services and create an account.
- In the search bar at the top, search for and click EC2.
- Click the Launch Instance button and follow the instructions. For this example, we will be using the Amazon Linux AMI operating system.
- Download the `.pem` key file and keep it secure.
- Wait until the instance state is Running.
- In the AWS EC2 service, in the left sidebar, select Instances.
- Select the newly created instance.
- At the top of the screen, click Connect.
- Choose any of the provided options to connect to the instance.
As an alternative to CloudShell, you can also use SSH from your local computer.
Follow the instructions at:
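If you use SSH, the command generally takes the shape sketched below. The key file name and public DNS are hypothetical placeholders, and the final `ssh` line is commented out since the address must come from your own instance's details page:

```shell
# Hypothetical key file name -- substitute the .pem you downloaded earlier.
touch my-ec2-key.pem          # placeholder so this sketch is self-contained
chmod 400 my-ec2-key.pem      # ssh refuses keys with open permissions
# On Amazon Linux the default user is ec2-user:
# ssh -i my-ec2-key.pem ec2-user@ec2-3-98-0-1.ca-central-1.compute.amazonaws.com
```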
Ensure your user has the permissions to execute Docker:
```shell
sudo groupadd docker
sudo usermod -aG docker $USER
```
Log out and back in for the group change to take effect.
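To check that the change took effect (after logging back in, or after running `newgrp docker` in the current shell), list your groups and look for `docker`:

```shell
# Prints the names of all groups the current user belongs to.
id -nG
```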
This section is based on the guide for running Airflow using Docker Compose.
If you are on Linux, update the Airflow UID in the `.env` file with the host user ID:

```shell
echo -e "AIRFLOW_UID=$(id -u)" > .env
```
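Afterwards, `.env` contains a single line like the following; the value is whatever `id -u` printed on your host (1000 here is just an example):

```
AIRFLOW_UID=1000
```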
Run database migrations and initialize the first user account:
```shell
docker compose up airflow-init
```
Follow the commands under these instructions to add security group rules that permit HTTP access on port 80.
Once complete, the security rules should look like this:
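Concretely, the added inbound rule amounts to something like this (example values; `0.0.0.0/0` admits any IPv4 address, which you may want to narrow to your own IP):

```
Type: HTTP    Protocol: TCP    Port range: 80    Source: 0.0.0.0/0
```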
Start all services:

```shell
sudo docker compose up
```

`sudo` is required to run the Airflow console on port 80.
If you want to avoid `sudo` or prefer another port:

- Open `docker-compose.yaml`
- Find the configuration for `airflow-webserver`
- Change the port number in the `ports` variable
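For instance, to serve the console on port 8080 without `sudo`, the relevant excerpt of `docker-compose.yaml` would look roughly like this (service name as in the official Airflow compose file; the container-side port stays 8080):

```yaml
airflow-webserver:
  ports:
    - "8080:8080"   # host:container -- "80:8080" requires sudo, higher ports do not
```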
You can stop all services with:
```shell
docker compose down
```
Airflow is now running on your machine.
If you set up Airflow and Docker locally, you can log into Airflow at http://localhost:80; otherwise use the port you used to expose the Airflow console.
The default username and password are both `airflow`; reset the password immediately after logging in.
You can view information on the current environment:
```shell
docker compose run airflow-worker airflow info
```

OR

```shell
./bin/airflow info
```
Enter the running Docker container to execute commands:
```shell
./bin/airflow bash
```
Stop and delete all containers, and remove volumes and downloaded images:

```shell
docker compose down --volumes --rmi all
```