nenalukic/air-quality-project

Air Quality Project

Project Description

This project contains an end-to-end data pipeline written in Python.

This was my final project for Data Engineering Zoomcamp in the 2024 Cohort.

The application uses data from Open-Meteo, reading from two APIs:

  1. Air Quality API and
  2. Weather Forecast API

Pipeline description:

  • The pipeline fetches data from both APIs.
  • It then transforms both data sets and uploads them to Google Cloud Storage (GCS).
  • In the next step, the data is loaded from GCS into BigQuery.
  • In BigQuery, we create a couple of tables with aggregated data.

All steps are orchestrated in Mage.
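
The fetch and transform steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual Mage blocks: the function names, the example coordinates, and the chosen hourly variables (pm10, pm2_5) are assumptions. The two base URLs are the public Open-Meteo endpoints, whose responses carry a column-oriented "hourly" object.

```python
from urllib.parse import urlencode

# Public Open-Meteo endpoints for the two APIs named above.
AIR_QUALITY_URL = "https://air-quality-api.open-meteo.com/v1/air-quality"
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


def build_url(base_url, latitude, longitude, hourly):
    """Build a request URL asking for the given hourly variables."""
    query = urlencode(
        {
            "latitude": latitude,
            "longitude": longitude,
            "hourly": ",".join(hourly),
        },
        safe=",",  # keep the comma-separated variable list readable
    )
    return f"{base_url}?{query}"


def hourly_to_rows(payload):
    """Flatten the column-oriented "hourly" block into one dict per timestamp."""
    hourly = payload["hourly"]
    names = [name for name in hourly if name != "time"]
    return [
        {"time": ts, **{name: hourly[name][i] for name in names}}
        for i, ts in enumerate(hourly["time"])
    ]


if __name__ == "__main__":
    # Example coordinates and variables are arbitrary (assumed, not from the project).
    print(build_url(AIR_QUALITY_URL, 52.52, 13.41, ["pm10", "pm2_5"]))
```

Rows in this shape are easy to write out as CSV before uploading them to GCS.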

Problem:

Weather and Air Quality Aggregator: Collect historical and forecast data from various sources, aggregate the information to analyse trends over time, and generate comprehensive forecasts for different regions.
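
To illustrate the kind of aggregation meant here, the sketch below averages one hourly field per calendar day. The project itself builds its aggregated tables in BigQuery, so this is only the same idea in plain Python; the function name and the row layout (a dict with an ISO 8601 "time" string) are assumptions.

```python
from collections import defaultdict
from statistics import mean


def daily_mean(rows, field):
    """Average `field` per calendar day; each row needs an ISO 8601 "time" string."""
    by_day = defaultdict(list)
    for row in rows:
        value = row.get(field)
        if value is not None:  # skip missing readings
            # The first 10 characters of an ISO timestamp are the date (YYYY-MM-DD).
            by_day[row["time"][:10]].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}
```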


Prerequisites

  1. Docker
  2. Git
  3. Terraform
  4. A GCP account
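
A quick way to confirm the command-line prerequisites are installed is to look them up on your PATH. This is an optional helper sketch, not part of the repository; the function name is hypothetical.

```python
import shutil


def find_tools(names=("docker", "git", "terraform")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```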

Before running the code you need to follow the steps below.

Setting up GCP

Google Cloud is a suite of Cloud Computing services offered by Google that provides various services like compute, storage, networking, and many more. It is organised into Regions and Zones.

Setting up GCP requires a GCP account. You can create one on a free trial, but sign-up still requires a credit card.

  1. Start by creating a GCP account.

  2. Navigate to the GCP Console and create a new project. Give the project an appropriate name and take note of the project ID.

  3. Create a service account:

    • In the left sidebar, click on "IAM & Admin" and then click on "Service accounts."

    • Click the "Create service account" button at the top of the page.

    • Enter a name for your service account and a description (optional).

    • Select the roles you want to grant to the service account. For this project, select the BigQuery Admin, Storage Admin and Compute Admin Roles.

    • Click "Create" to create the service account.

    • After you've created the service account, you need to download its private key file. This key file will be used to authenticate requests to GCP services.

    • Click on the service account you just created to view its details.

    • Click on the "Keys" tab and then click the "Add Key" button.

    • Select the "JSON" key type and click "Create" to download the private key file. Mage uses this key to authenticate with the Google APIs.

    • Store the JSON key wherever you like, then copy it into the mage directory of this project and name it exactly my-airquality-credentials.json.

  4. This application communicates with several APIs. Make sure you have enabled the BigQuery API.
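
Before starting Mage, it can help to sanity-check the key file you copied in step 3. The sketch below only inspects the JSON locally and does not contact GCP. The function name is hypothetical, and the field list is a subset of what a downloaded service account key normally contains.

```python
import json
from pathlib import Path

# Fields a downloaded service account key is expected to contain (subset).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return a list of problems found in a key file (empty list if none)."""
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append('"type" should be "service_account"')
    return problems
```

For example, running `check_key_file("my-airquality-credentials.json")` from inside the mage directory should return an empty list.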


Running the Code

Note: these instructions are for macOS/Linux/WSL; the steps may differ on Windows.

  1. Clone this repository
  2. cd into the terraform directory. We use Terraform to create the Google Cloud resources. My resources are created in the EU region; if needed, you can change this in the variables.tf file. In the same file, change the project ID to the ID of the project you created in GCP.
  3. To prepare your working directory for other commands, run:
     terraform init
  4. To show the changes required by the current configuration, run:
     terraform plan
  5. To create or update the infrastructure, run:
     terraform apply
  6. To destroy the previously created infrastructure, run:
     terraform destroy

IMPORTANT: Run terraform destroy only when you are done with the whole project.

  7. cd into the mage directory

  8. Rename dev.env to simply .env.

  9. Now, let's build the container:
     docker compose build
  10. Finally, start the Docker container:
     docker compose up
  11. We just initialized a Mage repository. It is present in your project under the name air-quality. Now, navigate to http://localhost:6789 in your browser!

This repository should have the following structure:

.
├── mage_data
│   └── air-quality
├── air-quality
│   ├── __pycache__
│   ├── charts
│   ├── custom
│   ├── data_exporters
│   ├── data_loaders
│   ├── dbt
│   ├── extensions
│   ├── interactions
│   ├── pipelines
│   ├── scratchpads
│   ├── transformers
│   ├── utils
│   ├── __init__.py
│   ├── io_config.yaml
│   ├── metadata.yaml
│   └── requirements.txt
├── .gitignore
├── .env
├── docker-compose.yml
├── Dockerfile
└── requirements.txt
  12. Time to work with Mage. In the browser, open Pipelines, select the air_quality_api pipeline, and click Run@once.
[Screenshots: Find pipeline, Pipeline, Run pipeline]

IMPORTANT: For some reason, the step that creates the 'air_aggregated' table may fail with the error '404 Not Found: Table air-quality-project-417718:air_quality.air_aggregated_data was not found in location EU.' However, if you navigate to BigQuery and refresh the dataset, the table should appear.
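
The cause of this 404 is not known. If it is a short delay before the new table becomes visible (an assumption), one possible workaround is to poll for the table a few times before giving up. The helper below is a generic sketch with hypothetical names; `check` is any function you supply that returns True once the table can be found.

```python
import time


def wait_until(check, attempts=5, delay=2.0, sleep=time.sleep):
    """Call `check()` until it returns True, sleeping `delay` seconds between tries.

    Returns True as soon as a check succeeds, or False after `attempts` failures.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:  # no need to sleep after the final try
            sleep(delay)
    return False
```

With the google-cloud-bigquery client, `check` could wrap a call to `Client.get_table`, which raises `google.api_core.exceptions.NotFound` while the table is missing.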

When you are done, your Google Cloud Storage bucket should contain two CSV files and BigQuery should contain all the tables. Your pipeline should look like this: (pipeline screenshot)



Creating Visualisations

  • Log in to Google Looker Studio with your Google account.

  • Connect your dataset using the BigQuery connector.

  • Select your project name, then select the dataset. This takes you to the dashboard page.

  • Create your visualizations and share them.


Facts about Pollen

A pollen count is the measurement of the number of grains of pollen in a cubic meter of air. High pollen counts can sometimes lead to increased rates of allergic reactions for those with allergic disorders.

Pollen, a fine to coarse powdery substance, is created by certain plants as part of their reproduction process. It can appear from trees in the spring, grasses in the summer, and weeds in the fall. Interestingly, pollen from flowers doesn’t usually contribute to nasal allergy symptoms.


As a general observation, most aeropalynology studies indicate that temperature and wind have a positive correlation with airborne pollen concentrations, while rainfall and humidity are negatively correlated.
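
To check this kind of relationship on your own data, you can compute the Pearson correlation coefficient between two equally long series, for example hourly temperature and a pollen concentration from the pipeline's tables. The function below is a minimal standard-library sketch; its name and the choice of Pearson's coefficient are assumptions, not something the project prescribes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two root sums of squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)
```

A value near +1 means the two series rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear relationship.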


Air Quality and Pollen

Urban areas tend to have lower pollen counts than the countryside, but pollen can combine with air pollution in the city center and bring on hay fever symptoms. It’s not just in the summer months either; pollen can peak as early as April and May.


[Images: Air Quality Report 1, Air Quality Report 2]
