<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#IMPACT-OF-COVID-19-PANDEMIC-ON-AIR-QUALITY" data-toc-modified-id="IMPACT-OF-COVID-19-PANDEMIC-ON-AIR-QUALITY-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>IMPACT OF COVID-19 PANDEMIC ON AIR QUALITY</a></span><ul class="toc-item"><li><span><a href="#Air-quality-Data-Exploration-and-Cleanup" data-toc-modified-id="Air-quality-Data-Exploration-and-Cleanup-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Air quality Data Exploration and Cleanup</a></span><ul class="toc-item"><li><span><a href="#Import-datasets-and-overview-of-the-air-quality-data" data-toc-modified-id="Import-datasets-and-overview-of-the-air-quality-data-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Import datasets and overview of the air quality data</a></span></li><li><span><a href="#Slice-and-dice-the-data-to-clean-up" data-toc-modified-id="Slice-and-dice-the-data-to-clean-up-1.1.2"><span class="toc-item-num">1.1.2&nbsp;&nbsp;</span>Slice and dice the data to clean up</a></span></li><li><span><a href="#Explore-the-data-through-graphs" data-toc-modified-id="Explore-the-data-through-graphs-1.1.3"><span class="toc-item-num">1.1.3&nbsp;&nbsp;</span>Explore the data through graphs</a></span></li></ul></li><li><span><a href="#COVID-19-DATA-EXPLORATION-AND-CLEAN-UP" data-toc-modified-id="COVID-19-DATA-EXPLORATION-AND-CLEAN-UP-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>COVID-19 DATA EXPLORATION AND CLEAN UP</a></span></li><li><span><a href="#COVID-19-in-AUSTRALIA-DATA-EXPLORATION-AND-CLEAN-UP" data-toc-modified-id="COVID-19-in-AUSTRALIA-DATA-EXPLORATION-AND-CLEAN-UP-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>COVID-19 in AUSTRALIA DATA EXPLORATION AND CLEAN UP</a></span></li></ul></li></ul></div>

# IMPACT OF COVID-19 PANDEMIC ON AIR QUALITY

## Air quality Data Exploration and Cleanup

In [1]:
# Dependencies and Setup
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import requests
import json
from pprint import pprint

# Import API key (optional: none of the cells in this section use it)
try:
    from config import api_key
except ImportError:
    api_key = None

Data Source: https://aqicn.org/data-platform/covid19/
With COVID-19 spreading all over the world, the World Air Quality Index project team saw a surge in requests for global data. As a result, the WAQI project now provides a new dedicated data-set which is updated 3 times a day, and that covers approximately 380 major cities in the world - January 2020 until now. 

The data for each major city is based on the average (median) of several stations. The data set provides min, max, median and standard deviation for each of the common air pollutants (PM2.5, PM10, Ozone ...) as well as meteorological data (Wind, Temperature, ...). All air pollutant data points are converted to the US EPA standard (i.e. no raw concentrations). All dates are UTC based. The count column is the number of samples used for calculating the median and standard deviation.

### Import datasets and overview of the air quality data

In [2]:
# Identify periods of interest
periods = ["2020", "2019Q1", "2019Q2", "2019Q3", "2019Q4"]

df_list = list()

for period in periods:
    path = f"historical_data/waqi-covid19-airqualitydata-{period}.csv"
    df = pd.read_csv(path)
    df_list.append(df)

In [6]:
# Concatenate the per-period dataframes loaded above into a single dataframe
airdf_2019_2020 = pd.concat(df_list, ignore_index=True)
airdf_2019_2020.head()

Unnamed: 0,Date,Country,City,Specie,count,min,max,median,variance
0,31/05/2020,IR,Isfahan,temperature,120,17.5,35.0,27.5,331.51
1,13/06/2020,IR,Isfahan,temperature,144,16.0,36.5,27.5,488.74
2,3/07/2020,IR,Isfahan,temperature,67,19.0,33.0,24.0,128.08
3,28/03/2020,IR,Isfahan,temperature,240,3.0,14.0,9.5,136.68
4,23/04/2020,IR,Isfahan,temperature,168,6.0,25.5,16.0,400.79
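
The load-and-concatenate steps above can also be wrapped in one helper. This is a minimal sketch, assuming the same file-name pattern; `load_periods` is a hypothetical name, and the two tiny demo files (rows copied from tables in this notebook) are written to a temporary directory only so the example runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_periods(data_dir, periods):
    """Read one CSV per period and stack them into a single dataframe."""
    frames = []
    for period in periods:
        path = Path(data_dir) / f"waqi-covid19-airqualitydata-{period}.csv"
        frames.append(pd.read_csv(path))
    # ignore_index=True renumbers the rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)


# Demo files: hypothetical stand-ins for the real downloads
header = "Date,Country,City,Specie,count,min,max,median,variance\n"
demo_rows = {
    "2020": "31/05/2020,IR,Isfahan,temperature,120,17.5,35.0,27.5,331.51\n",
    "2019Q1": "1/01/2019,AU,Melbourne,co,18,1.2,2.3,1.2,3.2\n",
}

with tempfile.TemporaryDirectory() as tmp:
    for period, row in demo_rows.items():
        demo_path = Path(tmp) / f"waqi-covid19-airqualitydata-{period}.csv"
        demo_path.write_text(header + row)
    demo_df = load_periods(tmp, ["2020", "2019Q1"])

print(demo_df)
```

Keeping the path construction in one place makes it easier to add or drop a period later.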


### Slice and dice the data to clean up

In [8]:
# Display an overview of the Specie column
airdf_2019_2020["Specie"].unique()


array(['temperature', 'wind-speed', 'wind-gust', 'dew', 'pm25',
       'humidity', 'wind speed', 'pressure', 'wind gust', 'co', 'so2',
       'precipitation', 'no2', 'pm10', 'o3', 'aqi', 'pol', 'uvi', 'wd',
       'neph', 'mepaqi', 'pm1'], dtype=object)

In [6]:
# Identify the number of data points available per specie
airdf_2019_2020["Specie"].value_counts()

temperature      310236
humidity         310145
pressure         308517
pm25             270688
no2              266793
pm10             264859
wind-speed       263292
o3               250438
so2              226576
dew              226147
co               203776
wind-gust        172805
wind speed        47002
wind gust         29576
precipitation     26825
wd                25720
aqi                8267
uvi                5632
pol                3790
pm1                1380
mepaqi              564
neph                440
Name: Specie, dtype: int64

> We understand that "Air movements influence the fate of air pollutants. So any study of air pollution should include a study of the local weather patterns (meteorology). If the air is calm and pollutants cannot disperse, then the concentration of these pollutants will build up. On the other hand, when strong, turbulent winds blow, pollutants disperse quickly, resulting in lower pollutant concentrations." https://www.qld.gov.au/environment/pollution/monitoring/air/air-monitoring/meteorology-influence/meteorology-factors#:~:text=Meteorological%20factors-,Meteorological%20factors,these%20pollutants%20will%20build%20up.
Hence meteorological parameters such as temperature, humidity, pressure and wind speed are likely to correlate with air quality.
(http://www.bom.gov.au/vic/observations/melbourne.shtml)

> However, due to the scope of our project, we'll only focus on air pollutant parameters to assess their changes before COVID-19 and 6 months into the pandemic. We're not trying to explain the causes of air quality change. Hence, we'll remove data related to the following meteorology-related species: **temperature, humidity, pressure, wind-speed, dew, wind-gust, wind speed, wind gust, precipitation, wd (wind direction), uvi**.
https://aqicn.org/publishingdata/

> We'll also remove the species with the fewest available data points: **pol, pm1, mepaqi, neph**.

In [21]:
# Identify all removed elements and update the dataframe 
removed_elements = ["temperature", "humidity", "pressure", "wind-speed", "dew", "wind-gust",
                     "wind speed", "wind gust", "precipitation", "wd", "uvi", "pol", "pm1", "mepaqi", "neph"]

clean_airdf = airdf_2019_2020[~airdf_2019_2020["Specie"].isin(removed_elements)].reset_index(drop=True).copy()

In [22]:
clean_airdf.head()

Unnamed: 0,Date,Country,City,Specie,count,min,max,median,variance
0,24/02/2020,IR,Isfahan,pm25,129,54.0,194.0,126.0,10921.4
1,7/05/2020,IR,Isfahan,pm25,168,17.0,168.0,91.0,14014.0
2,28/05/2020,IR,Isfahan,pm25,127,17.0,115.0,72.0,3558.56
3,20/02/2020,IR,Isfahan,pm25,113,26.0,181.0,76.0,11209.8
4,23/02/2020,IR,Isfahan,pm25,132,22.0,132.0,76.0,3209.67


In [9]:
# Double check data points available for each air pollutant 
clean_airdf["Specie"].value_counts()

pm25    270688
no2     266793
pm10    264859
o3      250438
so2     226576
co      203776
aqi       8267
Name: Specie, dtype: int64

More about AQI:
https://www.airnow.gov/aqi/aqi-basics/
https://www.airnow.gov/sites/default/files/2020-05/aqi-technical-assistance-document-sept2018.pdf
Five major pollutants:
EPA establishes an AQI for five major air pollutants regulated by the Clean Air Act. Each of these pollutants has a national air quality standard set by EPA to protect public health:

* Ground-level ozone **o3** (ppm - parts per million)
* Particulate Matter - including PM2.5 **pm25** and PM10 **pm10** (μg/m3)
* Carbon Monoxide **co** (ppm)
* Sulfur Dioxide **so2** (ppb - parts per billion)
* Nitrogen Dioxide **no2** (ppb)

https://en.wikipedia.org/wiki/Air_pollution

https://waqi.info/
The Air Quality Index is based on measurement of particulate matter (PM2.5 and PM10), Ozone (O3), Nitrogen Dioxide (NO2), Sulfur Dioxide (SO2) and Carbon Monoxide (CO) emissions. Most of the stations on the map are monitoring both PM2.5 and PM10 data, but there are few exceptions where only PM10 is available.

All measurements are based on hourly readings: For instance, an AQI reported at 8AM means that the measurement was done from 7AM to 8AM.
More details https://aqicn.org/faq/


https://www.weatherbit.io/api/airquality-history#:~:text=Air%20Quality%20API%20(Historical),an%20air%20quality%20index%20score.

aqi: Air Quality Index [US - EPA standard 0 - +500]
o3: Concentration of surface O3 (µg/m³)
so2: Concentration of surface SO2 (µg/m³)
no2: Concentration of surface NO2 (µg/m³)
co: Concentration of carbon monoxide (µg/m³)
pm25: Concentration of particulate matter < 2.5 microns (µg/m³)
pm10: Concentration of particulate matter < 10 microns (µg/m³)
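
Since the dataset values are already on the US EPA AQI scale, a reading can be mapped to its EPA category. The sketch below uses the standard EPA band limits; `AQI_BANDS` and `aqi_category` are hypothetical names, not part of the notebook.

```python
# US EPA AQI bands as (inclusive upper bound, category)
AQI_BANDS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
]


def aqi_category(value):
    """Return the EPA category name for an AQI value."""
    if value < 0:
        raise ValueError("AQI values cannot be negative")
    for upper, label in AQI_BANDS:
        if value <= upper:
            return label
    # Anything above 300 falls in the top band
    return "Hazardous"


# Two pm25 medians for Isfahan taken from the clean_airdf preview above
print(aqi_category(126.0))
print(aqi_category(76.0))
```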

Some good info on air pollution impacts https://ourworldindata.org/air-pollution
https://www.who.int/health-topics/air-pollution#tab=tab_1
https://www.epa.vic.gov.au/for-community/airwatch
https://www.kaggle.com/frtgnn/clean-air-india-s-air-quality/data

In [25]:
# Check the Dtype of each column 
clean_airdf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1491397 entries, 0 to 1491396
Data columns (total 9 columns):
 #   Column    Non-Null Count    Dtype  
---  ------    --------------    -----  
 0   Date      1491397 non-null  object 
 1   Country   1491397 non-null  object 
 2   City      1491397 non-null  object 
 3   Specie    1491397 non-null  object 
 4   count     1491397 non-null  int64  
 5   min       1491397 non-null  float64
 6   max       1491397 non-null  float64
 7   median    1491397 non-null  float64
 8   variance  1491397 non-null  float64
dtypes: float64(4), int64(1), object(4)
memory usage: 102.4+ MB


The Date column has the generic object dtype. To perform time-related analysis on this data, we need to convert it to a datetime format, which we do with the "pd.to_datetime()" function.

In [26]:
# Convert the Date column into a datetime object 
clean_airdf["Date"] = pd.to_datetime(clean_airdf["Date"], format="%d/%m/%Y")
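
The explicit day-first format matters because a date such as 3/07/2020 is ambiguous. This quick check uses three date strings from the preview tables above; `sample` and `parsed` are illustrative names only.

```python
import pandas as pd

# Date strings as they appear in the raw CSV (day/month/year)
sample = pd.Series(["3/07/2020", "31/05/2020", "13/06/2020"])

# An explicit format guarantees that 3/07/2020 is read as 3 July, not 7 March
parsed = pd.to_datetime(sample, format="%d/%m/%Y")

print(parsed.dt.strftime("%Y-%m-%d").tolist())
```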

In [56]:
# Sort chronologically by Date, then alphabetically by Country within each date
clean_airdf = clean_airdf.sort_values(by = ['Date', 'Country'])
clean_airdf.head(200)

Unnamed: 0,Date,Country,City,Specie,count,min,max,median,variance
498981,2018-12-31,AE,Abu Dhabi,pm25,24,72.0,152.0,112.0,5643.19
498891,2018-12-31,AE,Abu Dhabi,pm10,109,30.0,62.0,56.0,871.33
498918,2018-12-31,AE,Abu Dhabi,so2,109,1.6,27.9,4.6,240.10
498916,2018-12-31,AE,Abu Dhabi,o3,81,0.5,56.5,27.3,2349.96
499035,2018-12-31,AE,Abu Dhabi,no2,109,2.8,49.4,27.9,1492.45
...,...,...,...,...,...,...,...,...,...
625636,2018-12-31,CA,Québec,no2,54,6.0,29.6,15.0,371.65
627663,2018-12-31,CA,Toronto,co,120,2.4,9.0,2.9,53.52
626468,2018-12-31,CA,Victoria,o3,44,0.8,13.4,3.4,203.73
627277,2018-12-31,CA,Montréal,pm25,421,52.0,92.0,67.0,638.68


In [57]:
clean_airdf.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1491397 entries, 498981 to 360596
Data columns (total 9 columns):
 #   Column    Non-Null Count    Dtype         
---  ------    --------------    -----         
 0   Date      1491397 non-null  datetime64[ns]
 1   Country   1491397 non-null  object        
 2   City      1491397 non-null  object        
 3   Specie    1491397 non-null  object        
 4   count     1491397 non-null  int64         
 5   min       1491397 non-null  float64       
 6   max       1491397 non-null  float64       
 7   median    1491397 non-null  float64       
 8   variance  1491397 non-null  float64       
dtypes: datetime64[ns](1), float64(4), int64(1), object(3)
memory usage: 113.8+ MB


In [39]:
# Find the earliest date the air quality dataset covers
clean_airdf["Date"].min()

Timestamp('2018-12-31 00:00:00')

In [40]:
# Find the latest date the air quality dataset covers:
clean_airdf["Date"].max()

Timestamp('2020-07-03 00:00:00')

In [41]:
clean_airdf["Country"].unique()

array(['AE', 'AF', 'AR', 'AT', 'AU', 'BA', 'BD', 'BE', 'BG', 'BH', 'BO',
       'BR', 'CA', 'CH', 'CI', 'CL', 'CN', 'CO', 'CR', 'CW', 'CY', 'CZ',
       'DE', 'DK', 'DZ', 'EC', 'EE', 'ES', 'ET', 'FI', 'FR', 'GB', 'GE',
       'GH', 'GN', 'GR', 'GT', 'HK', 'HR', 'HU', 'ID', 'IE', 'IL', 'IN',
       'IQ', 'IR', 'IS', 'IT', 'JO', 'JP', 'KG', 'KR', 'KW', 'KZ', 'LA',
       'LK', 'LT', 'MK', 'ML', 'MM', 'MN', 'MO', 'MX', 'MY', 'NL', 'NO',
       'NP', 'NZ', 'PE', 'PH', 'PK', 'PL', 'PR', 'PT', 'RE', 'RO', 'RS',
       'RU', 'SA', 'SE', 'SG', 'SK', 'SV', 'TH', 'TJ', 'TM', 'TR', 'TW',
       'UA', 'UG', 'US', 'UZ', 'VN', 'XK', 'ZA'], dtype=object)

There are 95 countries in the dataframe, including Australia (AU).

There are 615 cities in our dataframe. Let's see which Australian cities are covered in the dataset.

In [19]:
# Identify the data points available for Australian cities 
clean_airdf.loc[clean_airdf["Country"]=="AU", "City"].value_counts()

Sydney        3364
Brisbane      3348
Melbourne     3284
Wollongong    3253
Darwin        3208
Adelaide      3177
Perth         3082
Newcastle     2871
Launceston    1126
Hobart        1126
Canberra      1093
Name: City, dtype: int64
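
The country and city counts quoted earlier are not computed in a cell. One way to derive such counts is sketched below on a toy frame; `toy_df` and its rows are illustrative, and counting (Country, City) pairs rather than city names alone is a design assumption.

```python
import pandas as pd

# Toy stand-in for clean_airdf with only the two columns needed here
toy_df = pd.DataFrame({
    "Country": ["AU", "AU", "AU", "IN", "IN"],
    "City": ["Melbourne", "Sydney", "Melbourne", "Delhi", "Delhi"],
})

# Number of distinct country codes
n_countries = toy_df["Country"].nunique()

# Count (Country, City) pairs so same-named cities in different countries stay distinct
n_cities = toy_df[["Country", "City"]].drop_duplicates().shape[0]

# Distinct cities per country
cities_per_country = toy_df.groupby("Country")["City"].nunique()

print(n_countries, n_cities)
print(cities_per_country)
```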

### Explore the data through graphs

In [58]:
# Identifying the unique (common) air pollutants in the dataset 
air_pollutants = clean_airdf["Specie"].unique()
air_pollutants

array(['pm25', 'pm10', 'so2', 'o3', 'no2', 'co', 'aqi'], dtype=object)

We look at the median levels of carbon monoxide (CO), an air pollutant that, due to its nature, is expected to be the most affected by COVID-19 restrictions such as reduced travel. We compare a major Australian city (Melbourne) with a major Indian city (Delhi) to show that Australia's air quality, as it stands, is quite good.

In [59]:
clean_airdf

Unnamed: 0,Date,Country,City,Specie,count,min,max,median,variance
498981,2018-12-31,AE,Abu Dhabi,pm25,24,72.0,152.0,112.0,5643.19
498891,2018-12-31,AE,Abu Dhabi,pm10,109,30.0,62.0,56.0,871.33
498918,2018-12-31,AE,Abu Dhabi,so2,109,1.6,27.9,4.6,240.10
498916,2018-12-31,AE,Abu Dhabi,o3,81,0.5,56.5,27.3,2349.96
499035,2018-12-31,AE,Abu Dhabi,no2,109,2.8,49.4,27.9,1492.45
...,...,...,...,...,...,...,...,...,...
360998,2020-07-03,ZA,Bloemfontein,pm10,10,106.0,652.0,206.0,284189.00
357959,2020-07-03,ZA,Pretoria,pm25,20,5.0,433.0,151.0,118732.00
360644,2020-07-03,ZA,Bloemfontein,pm25,10,174.0,488.0,257.0,103944.00
363835,2020-07-03,ZA,Klerksdorp,so2,10,1.1,3.1,1.6,5.69


CO DATA ON DELHI, INDIA

In [85]:
# Filter dataframe to just India 
india_df = clean_airdf.loc[clean_airdf["Country"] == "IN"] 

# Filter dataframe to just Delhi  
delhi_df = india_df.loc[india_df["City"] == "Delhi"]

# Filter dataframe to just co air pollutant 
delhi_co_df = delhi_df.loc[delhi_df["Specie"] == "co"]
delhi_co_df

Unnamed: 0,Date,Country,City,Specie,count,min,max,median,variance
526318,2018-12-31,IN,Delhi,co,640,0.1,105.3,21.5,3773.06
526323,2019-01-01,IN,Delhi,co,668,0.1,116.8,17.6,3940.94
526324,2019-01-02,IN,Delhi,co,681,0.1,138.9,19.9,4782.22
526362,2019-01-03,IN,Delhi,co,663,0.1,133.8,18.5,2615.45
526386,2019-01-04,IN,Delhi,co,645,0.2,91.0,15.0,1728.86
...,...,...,...,...,...,...,...,...,...
428264,2020-06-29,IN,Delhi,co,804,0.1,84.5,6.2,415.50
428213,2020-06-30,IN,Delhi,co,732,0.1,84.2,6.9,486.74
428172,2020-07-01,IN,Delhi,co,892,0.1,41.6,7.4,207.65
428310,2020-07-02,IN,Delhi,co,908,0.1,38.1,7.6,196.80


In [92]:
# Keep Date as a regular column: the quarter filters below compare against delhi_co_df['Date']

# Filter dates for 2019 Q1
start_date = '2019-01-01'
end_date = '2019-03-31'
Q1_2019 = (delhi_co_df['Date'] >= start_date) & (delhi_co_df['Date'] <= end_date)
delhi_Q1_2019 = delhi_co_df.loc[Q1_2019]
delhi_Q1_2019

# Filter dates for 2019 Q2
start_date = '2019-04-01'
end_date = '2019-06-30'
Q2_2019 = (delhi_co_df['Date'] >= start_date) & (delhi_co_df['Date'] <= end_date)
delhi_Q2_2019 = delhi_co_df.loc[Q2_2019]
delhi_Q2_2019

# Filter dates for 2020 Q1
start_date = '2020-01-01'
end_date = '2020-03-31'
Q1_2020 = (delhi_co_df['Date'] >= start_date) & (delhi_co_df['Date'] <= end_date)
delhi_Q1_2020 = delhi_co_df.loc[Q1_2020]


# Filter dates for 2020 Q2
start_date = '2020-04-01'
end_date = '2020-06-30'
Q2_2020 = (delhi_co_df['Date'] >= start_date) & (delhi_co_df['Date'] <= end_date)
delhi_Q2_2020 = delhi_co_df.loc[Q2_2020]
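
The four quarter filters above repeat the same pattern. Here is a sketch of a reusable helper, assuming the Date column is already a datetime; `filter_quarter` is a hypothetical name, and the sample rows are a few Delhi CO medians from the table above.

```python
import pandas as pd


def filter_quarter(df, quarter, date_col="Date"):
    """Return the rows of df whose date falls in a quarter such as '2019Q1'."""
    in_quarter = df[date_col].dt.to_period("Q") == pd.Period(quarter, freq="Q")
    return df.loc[in_quarter]


# A few Delhi CO rows copied from the output above
delhi_sample = pd.DataFrame({
    "Date": pd.to_datetime(["2018-12-31", "2019-01-01", "2019-01-02",
                            "2020-06-30", "2020-07-01"]),
    "median": [21.5, 17.6, 19.9, 6.9, 7.4],
})

q1_2019 = filter_quarter(delhi_sample, "2019Q1")
q2_2020 = filter_quarter(delhi_sample, "2020Q2")
print(q1_2019)
print(q2_2020)
```

Working with quarter labels avoids typing start and end dates by hand, which is where off-by-one-day mistakes tend to creep in.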

CO DATA ON MELBOURNE 

In [62]:
# Filter dataframe to just Australia 
au_df = clean_airdf.loc[clean_airdf["Country"] == "AU"] 

# Filter dataframe to just Melbourne 
mel_df = au_df.loc[au_df["City"] == "Melbourne"]

# Filter dataframe to just co air pollutant 
mel_co_df = mel_df.loc[mel_df["Specie"] == "co"]
mel_co_df

Unnamed: 0,Date,Country,City,Specie,count,min,max,median,variance
655955,2018-12-31,AU,Melbourne,co,36,1.2,3.4,2.3,5.10
655956,2019-01-01,AU,Melbourne,co,18,1.2,2.3,1.2,3.20
655963,2019-01-02,AU,Melbourne,co,18,1.2,2.3,1.2,3.20
655975,2019-01-03,AU,Melbourne,co,48,1.2,4.5,2.3,13.47
655942,2019-01-04,AU,Melbourne,co,54,1.2,4.5,2.3,13.09
...,...,...,...,...,...,...,...,...,...
175518,2020-06-29,AU,Melbourne,co,115,1.8,15.6,6.2,111.74
175618,2020-06-30,AU,Melbourne,co,107,1.7,11.2,3.2,46.83
175566,2020-07-01,AU,Melbourne,co,115,0.5,9.8,8.5,99.23
175597,2020-07-02,AU,Melbourne,co,100,1.0,7.6,2.0,17.19


In [95]:
# Keep Date as a regular column: the quarter filters below compare against mel_co_df['Date']

# Filter dates for 2019 Q1
start_date = '2019-01-01'
end_date = '2019-03-31'
Q1_2019 = (mel_co_df['Date'] >= start_date) & (mel_co_df['Date'] <= end_date)
mel_Q1_2019 = mel_co_df.loc[Q1_2019]
mel_Q1_2019

# Filter dates for 2019 Q2
start_date = '2019-04-01'
end_date = '2019-06-30'
Q2_2019 = (mel_co_df['Date'] >= start_date) & (mel_co_df['Date'] <= end_date)
mel_Q2_2019 = mel_co_df.loc[Q2_2019]
mel_Q2_2019

# Filter dates for 2020 Q1
start_date = '2020-01-01'
end_date = '2020-03-31'
Q1_2020 = (mel_co_df['Date'] >= start_date) & (mel_co_df['Date'] <= end_date)
mel_Q1_2020 = mel_co_df.loc[Q1_2020]

# Filter dates for 2020 Q2
start_date = '2020-04-01'
end_date = '2020-06-30'
Q2_2020 = (mel_co_df['Date'] >= start_date) & (mel_co_df['Date'] <= end_date)
mel_Q2_2020 = mel_co_df.loc[Q2_2020]
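
With both cities filtered, a per-quarter summary puts them side by side. This sketch assumes the median of the daily medians is an acceptable summary; `quarterly_median` is a hypothetical helper. The samples hold only the few rows visible in the outputs above, so the numbers are illustrative, not findings.

```python
import pandas as pd


def quarterly_median(df, value_col="median", date_col="Date"):
    """Median of the daily median readings for each calendar quarter."""
    quarters = df[date_col].dt.to_period("Q").rename("Quarter")
    return df.groupby(quarters)[value_col].median()


sample_dates = pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03",
                               "2019-01-04", "2020-06-29", "2020-06-30"])

# Rows copied from the Delhi and Melbourne CO tables above
delhi_sample = pd.DataFrame({"Date": sample_dates,
                             "median": [17.6, 19.9, 18.5, 15.0, 6.2, 6.9]})
mel_sample = pd.DataFrame({"Date": sample_dates,
                           "median": [1.2, 1.2, 2.3, 2.3, 6.2, 3.2]})

# One column per city, one row per quarter
comparison = pd.DataFrame({
    "Delhi": quarterly_median(delhi_sample),
    "Melbourne": quarterly_median(mel_sample),
})
print(comparison)
```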

In [30]:
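def specie_median_distribution(df, country):
    """Draw one horizontal boxplot of the daily median values per specie for a country."""
    country_air_df = df[df["Country"] == country]
    # Derive the species from the filtered data rather than an undefined global
    unique_species = country_air_df["Specie"].unique()

    # squeeze=False keeps ax two-dimensional even when there is a single specie
    fig, ax = plt.subplots(
        figsize=(10, 2*len(unique_species)),
        ncols=1,
        nrows=len(unique_species),
        squeeze=False
        )

    red_square = dict(markerfacecolor='r', marker='s', alpha=0.4)

    for index, specie in enumerate(unique_species):
        country_air_df[country_air_df["Specie"] == specie].boxplot(
            column="median",
            flierprops=red_square,
            ax=ax[index, 0],
            vert=False)
        # set_title is a method, so it has to be called rather than assigned to
        ax[index, 0].set_title(f"Distribution of median {specie} values in {country} (2019-2020H1)")

    fig.tight_layout()

In [31]:
specie_median_distribution(clean_airdf, "AU")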

## COVID-19 DATA EXPLORATION AND CLEAN UP

COVID-19 data is sourced from https://covid19api.com/

In [23]:
country_url = "https://api.covid19api.com/countries"
country_covid_data = requests.get(country_url).json()
pprint(country_covid_data)

[{'Country': 'Congo (Kinshasa)', 'ISO2': 'CD', 'Slug': 'congo-kinshasa'},
 {'Country': 'Pakistan', 'ISO2': 'PK', 'Slug': 'pakistan'},
 {'Country': 'Botswana', 'ISO2': 'BW', 'Slug': 'botswana'},
 {'Country': 'Denmark', 'ISO2': 'DK', 'Slug': 'denmark'},
 {'Country': 'Djibouti', 'ISO2': 'DJ', 'Slug': 'djibouti'},
 {'Country': 'Japan', 'ISO2': 'JP', 'Slug': 'japan'},
 {'Country': 'Niue', 'ISO2': 'NU', 'Slug': 'niue'},
 {'Country': 'ALA Aland Islands', 'ISO2': 'AX', 'Slug': 'ala-aland-islands'},
 {'Country': 'Bosnia and Herzegovina',
  'ISO2': 'BA',
  'Slug': 'bosnia-and-herzegovina'},
 {'Country': 'Comoros', 'ISO2': 'KM', 'Slug': 'comoros'},
 {'Country': 'French Southern Territories',
  'ISO2': 'TF',
  'Slug': 'french-southern-territories'},
 {'Country': 'Serbia', 'ISO2': 'RS', 'Slug': 'serbia'},
 {'Country': 'Slovenia', 'ISO2': 'SI', 'Slug': 'slovenia'},
 {'Country': 'Estonia', 'ISO2': 'EE', 'Slug': 'estonia'},
 {'Country': 'Isle of Man', 'ISO2': 'IM', 'Slug': 'isle-of-man'},
 {'Country':

In [24]:
country_covid_df = pd.DataFrame(country_covid_data)
country_covid_df

Unnamed: 0,Country,Slug,ISO2
0,Congo (Kinshasa),congo-kinshasa,CD
1,Pakistan,pakistan,PK
2,Botswana,botswana,BW
3,Denmark,denmark,DK
4,Djibouti,djibouti,DJ
...,...,...,...
243,Antarctica,antarctica,AQ
244,Cook Islands,cook-islands,CK
245,Germany,germany,DE
246,Paraguay,paraguay,PY


In [25]:
# Merge countries available on the air quality data and the covid data
country_covid_air_df = pd.merge(
    country_airdata_df, country_covid_df, how="left", left_on="country_code", right_on="ISO2")
country_covid_air_df

Unnamed: 0,country_code,Country,Slug,ISO2
0,IR,"Iran, Islamic Republic of",iran,IR
1,TJ,Tajikistan,tajikistan,TJ
2,BR,Brazil,brazil,BR
3,CN,China,china,CN
4,DK,Denmark,denmark,DK
...,...,...,...,...
90,CW,,,
91,GB,United Kingdom,united-kingdom,GB
92,NP,Nepal,nepal,NP
93,PT,Portugal,portugal,PT
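
The merge above relies on `country_airdata_df`, which is not defined in this section. The sketch below shows one way such a frame could be built, assuming it holds only the unique country codes from the air quality data. The `country_code` column name comes from the output above; the construction itself is an assumption.

```python
import pandas as pd

# Stand-in for the "Country" column of the air quality dataframe
air_countries = pd.Series(["IR", "IR", "TJ", "BR", "TJ"], name="Country")

# Assumed construction: one row per unique ISO2 code, in order of first appearance
country_airdata_df = pd.DataFrame({"country_code": air_countries.unique()})

print(country_airdata_df)
```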


In [26]:
# Find the country in the country_airdata_df but not country_covid_df
country_to_remove = country_covid_air_df[country_covid_air_df["ISO2"].isna()]["country_code"].tolist()
country_to_remove

['CW']
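
The unmatched-country lookup above checks for missing ISO2 values. An alternative is the `indicator` option of `pd.merge`, which labels where each row came from. The frames below are small stand-ins, not the real data.

```python
import pandas as pd

# Small stand-ins for country_airdata_df and country_covid_df
air_codes = pd.DataFrame({"country_code": ["IR", "CW", "GB"]})
covid_countries = pd.DataFrame({
    "Country": ["Iran, Islamic Republic of", "United Kingdom"],
    "Slug": ["iran", "united-kingdom"],
    "ISO2": ["IR", "GB"],
})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only"
merged = pd.merge(air_codes, covid_countries, how="left",
                  left_on="country_code", right_on="ISO2", indicator=True)

# Codes present in the air quality data but absent from the COVID-19 country list
unmatched = merged.loc[merged["_merge"] == "left_only", "country_code"].tolist()
print(unmatched)
```

This states the intent directly and still works if a matched row happens to have a missing value in another column.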

In [27]:
final_airdf = clean_airdf[~clean_airdf["Country"].isin(country_to_remove)].copy()

In [28]:
final_airdf["Country"].nunique()

94

In [29]:
final_airdf.to_csv("air_data.csv", index=False)

In [30]:
country_covid_air_df

Unnamed: 0,country_code,Country,Slug,ISO2
0,IR,"Iran, Islamic Republic of",iran,IR
1,TJ,Tajikistan,tajikistan,TJ
2,BR,Brazil,brazil,BR
3,CN,China,china,CN
4,DK,Denmark,denmark,DK
...,...,...,...,...
90,CW,,,
91,GB,United Kingdom,united-kingdom,GB
92,NP,Nepal,nepal,NP
93,PT,Portugal,portugal,PT


In [31]:
del country_covid_air_df["ISO2"]

In [32]:
# Return a dataframe covering all countries in both the air quality data and covid-19 data.
country_covid_air_df = country_covid_air_df[~country_covid_air_df["country_code"].isin(
    country_to_remove)].reset_index(drop=True)
country_covid_air_df

Unnamed: 0,country_code,Country,Slug
0,IR,"Iran, Islamic Republic of",iran
1,TJ,Tajikistan,tajikistan
2,BR,Brazil,brazil
3,CN,China,china
4,DK,Denmark,denmark
...,...,...,...
89,KG,Kyrgyzstan,kyrgyzstan
90,GB,United Kingdom,united-kingdom
91,NP,Nepal,nepal
92,PT,Portugal,portugal


In [33]:
# Explore one covid API - By Country Total All Status
covid_url_example = "https://api.covid19api.com/total/country/australia"
covid_data_example = requests.get(covid_url_example).json()
pprint(covid_data_example)

[{'Active': 0,
  'City': '',
  'CityCode': '',
  'Confirmed': 0,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-01-22T00:00:00Z',
  'Deaths': 0,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 0},
 {'Active': 0,
  'City': '',
  'CityCode': '',
  'Confirmed': 0,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-01-23T00:00:00Z',
  'Deaths': 0,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 0},
 {'Active': 0,
  'City': '',
  'CityCode': '',
  'Confirmed': 0,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-01-24T00:00:00Z',
  'Deaths': 0,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 0},
 {'Active': 0,
  'City': '',
  'CityCode': '',
  'Confirmed': 0,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-01-25T00:00:00Z',
  'Deaths': 0,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 0},
 {'Active': 4,
  'City': '',
  'CityCode': '',
  'Confirmed': 4,
  'Country': 'Australia',
  'Co

  'Lon': '0',
  'Province': '',
  'Recovered': 1806},
 {'Active': 4449,
  'City': '',
  'CityCode': '',
  'Confirmed': 6315,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-04-12T00:00:00Z',
  'Deaths': 60,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 1806},
 {'Active': 4484,
  'City': '',
  'CityCode': '',
  'Confirmed': 6351,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-04-13T00:00:00Z',
  'Deaths': 61,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 1806},
 {'Active': 4167,
  'City': '',
  'CityCode': '',
  'Confirmed': 6415,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-04-14T00:00:00Z',
  'Deaths': 62,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 2186},
 {'Active': 4191,
  'City': '',
  'CityCode': '',
  'Confirmed': 6440,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-04-15T00:00:00Z',
  'Deaths': 63,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 2186},
 {

  'CountryCode': '',
  'Date': '2020-06-22T00:00:00Z',
  'Deaths': 102,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 6915},
 {'Active': 494,
  'City': '',
  'CityCode': '',
  'Confirmed': 7521,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-06-23T00:00:00Z',
  'Deaths': 103,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 6924},
 {'Active': 523,
  'City': '',
  'CityCode': '',
  'Confirmed': 7558,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-06-24T00:00:00Z',
  'Deaths': 104,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 6931},
 {'Active': 533,
  'City': '',
  'CityCode': '',
  'Confirmed': 7595,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-06-25T00:00:00Z',
  'Deaths': 104,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 6958},
 {'Active': 537,
  'City': '',
  'CityCode': '',
  'Confirmed': 7601,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-06-26T00:00:00Z',
 

The API example above covers COVID-19 data until 4 July 2020 and shows the number of confirmed, active, recovered and death cases for the chosen country over the course of the pandemic. Hence, we'll call this endpoint for each country slug in country_covid_air_df.

In [34]:
slug_list = country_covid_air_df["Slug"].tolist()
len(slug_list)

94

In [35]:
base_covid_url = "https://api.covid19api.com/total/country/"
    
country_list = list()
date_list = list()
active_list = list()
confirmed_list = list()
recovered_list = list()
deaths_list = list()

print("Beginning Data Retrieval")
print("-----------------------------------")

counter = 0
set_counter = 1

for slug in slug_list:
    
    try:
        response = requests.get(base_covid_url + slug).json()
    
        for element in response:
            country_list.append(element['Country'])
            date_list.append(element['Date'])
            active_list.append(element['Active'])
            confirmed_list.append(element['Confirmed'])
            recovered_list.append(element['Recovered'])
            deaths_list.append(element['Deaths'])

        counter += 1
        print(f"Processing Record {counter} of Set {set_counter} | {slug}")

        if counter == 50:
            set_counter += 1
            counter = 0

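    # An error payload is a dict (or not JSON at all), so iterating or indexing it fails
    except (KeyError, TypeError, ValueError):
        print(f"Country not found. Skipping {slug}...")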

print("-----------------------------------")
print("Data Retrieval Complete")
print("-----------------------------------")

Beginning Data Retrieval
-----------------------------------
Processing Record 1 of Set 1 | iran
Processing Record 2 of Set 1 | tajikistan
Processing Record 3 of Set 1 | brazil
Processing Record 4 of Set 1 | china
Processing Record 5 of Set 1 | denmark
Processing Record 6 of Set 1 | spain
Processing Record 7 of Set 1 | mali
Processing Record 8 of Set 1 | slovakia
Processing Record 9 of Set 1 | kosovo
Processing Record 10 of Set 1 | chile
Processing Record 11 of Set 1 | germany
Processing Record 12 of Set 1 | kuwait
Processing Record 13 of Set 1 | myanmar
Processing Record 14 of Set 1 | philippines
Processing Record 15 of Set 1 | pakistan
Processing Record 16 of Set 1 | poland
Processing Record 17 of Set 1 | russia
Processing Record 18 of Set 1 | sweden
Processing Record 19 of Set 1 | singapore
Processing Record 20 of Set 1 | united-arab-emirates
Processing Record 21 of Set 1 | bosnia-and-herzegovina
Processing Record 22 of Set 1 | czech-republic
Processing Record 23 of Set 1 | indonesi

In [36]:
covid_df = pd.DataFrame({
    "Country": country_list,
    "Date": date_list,
    "Active cases": active_list,
    "Confirmed cases": confirmed_list,
    "Recovered cases": recovered_list,
    "Deaths": deaths_list
})
covid_df.head()

Unnamed: 0,Country,Date,Active cases,Confirmed cases,Recovered cases,Deaths
0,"Iran, Islamic Republic of",2020-01-22T00:00:00Z,0,0,0,0
1,"Iran, Islamic Republic of",2020-01-23T00:00:00Z,0,0,0,0
2,"Iran, Islamic Republic of",2020-01-24T00:00:00Z,0,0,0,0
3,"Iran, Islamic Republic of",2020-01-25T00:00:00Z,0,0,0,0
4,"Iran, Islamic Republic of",2020-01-26T00:00:00Z,0,0,0,0
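
Because each API record is already a dictionary, the same table can also be built without the six parallel lists. This sketch uses two records copied from the Australian output shown earlier, abridged to the fields kept here; `sample_response` and `column_names` are illustrative names.

```python
import pandas as pd

# Two records shaped like the API response shown earlier (abridged to the fields used)
sample_response = [
    {"Country": "Australia", "Date": "2020-04-12T00:00:00Z",
     "Active": 4449, "Confirmed": 6315, "Recovered": 1806, "Deaths": 60},
    {"Country": "Australia", "Date": "2020-04-13T00:00:00Z",
     "Active": 4484, "Confirmed": 6351, "Recovered": 1806, "Deaths": 61},
]

# API field name -> column name used in covid_df
column_names = {
    "Country": "Country",
    "Date": "Date",
    "Active": "Active cases",
    "Confirmed": "Confirmed cases",
    "Recovered": "Recovered cases",
    "Deaths": "Deaths",
}

# Build the frame from the records, keep the wanted fields, then rename them
sample_covid_df = (
    pd.DataFrame.from_records(sample_response)[list(column_names)]
    .rename(columns=column_names)
)
print(sample_covid_df)
```

With several countries, one such frame per slug could be collected in a list and combined with `pd.concat`.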


In [37]:
covid_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14863 entries, 0 to 14862
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Country          14863 non-null  object
 1   Date             14863 non-null  object
 2   Active cases     14863 non-null  int64 
 3   Confirmed cases  14863 non-null  int64 
 4   Recovered cases  14863 non-null  int64 
 5   Deaths           14863 non-null  int64 
dtypes: int64(4), object(2)
memory usage: 696.8+ KB


In [38]:
# Convert the ISO 8601 strings to datetimes, then drop the UTC marker so the column is timezone-naive
covid_df['Date'] = pd.to_datetime(covid_df['Date']).dt.tz_localize(None)

In [39]:
# Find the earliest date the covid dataset covers:
covid_df["Date"].min()

Timestamp('2020-01-22 00:00:00')

In [40]:
# Find the latest date the covid dataset covers:
covid_df["Date"].max()

Timestamp('2020-07-06 00:00:00')

In [41]:
covid_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14863 entries, 0 to 14862
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Country          14863 non-null  object        
 1   Date             14863 non-null  datetime64[ns]
 2   Active cases     14863 non-null  int64         
 3   Confirmed cases  14863 non-null  int64         
 4   Recovered cases  14863 non-null  int64         
 5   Deaths           14863 non-null  int64         
dtypes: datetime64[ns](1), int64(4), object(1)
memory usage: 696.8+ KB


In [42]:
covid_df.to_csv("covid_data.csv", index=False)

## COVID-19 in AUSTRALIA DATA EXPLORATION AND CLEAN UP

In [37]:
au_covid_url = "https://interactive.guim.co.uk/docsdata/1q5gdePANXci8enuiS4oHUJxcxC13d6bjMRSicakychE.json"
au_covid_data = requests.get(au_covid_url).json()
pprint(au_covid_data)

{'sheets': {'about': [{'about': 'This data has been compiled by Guardian '
                                'Australia from official state and territory '
                                'media releases and websites. Some death dates '
                                'and figures are from media reports. We assign '
                                'cases to the date on which they were reported '
                                'by the health department, and deaths are '
                                'assigned to the date they occured. Extended '
                                'data on testing and demographics varies '
                                'between each state and territory so may not '
                                'always be available. Please contact '
                                'australia.coronatracking@theguardian.com if '
                                'you spot an error in the data or to make a '
                                'suggestion. This data is released
[output truncated]

In [38]:
au_covid_data.keys()

dict_keys(['sheets'])

In [39]:
au_covid_data['sheets'].keys()

dict_keys(['updates', 'deaths', 'latest totals', 'locations', 'sources', 'about', 'data validation'])
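
Printing the whole payload is hard to read, so a compact summary of each sheet can help when deciding which one to use. The following is a minimal sketch, assuming every sheet is a list of dictionaries (this is confirmed above only for some of the sheets). The helper name `summarize_sheets` and the sample payload are hypothetical and exist only for illustration.

```python
def summarize_sheets(payload):
    """Return {sheet name: (row count, sorted field names of the first row)}."""
    summary = {}
    for name, rows in payload["sheets"].items():
        # Assumption: each sheet is a list of dicts; an empty sheet has no fields
        first = rows[0] if rows else {}
        summary[name] = (len(rows), sorted(first))
    return summary


# Tiny hypothetical payload that mimics the nesting of the real feed
sample_payload = {
    "sheets": {
        "updates": [
            {"State": "VIC", "Date": "25/01/2020", "Cumulative case count": "1"},
            {"State": "NSW", "Date": "25/01/2020", "Cumulative case count": "3"},
        ],
        "about": [{"about": "placeholder text"}],
    }
}

sheet_summary = summarize_sheets(sample_payload)
for sheet_name, (n_rows, fields) in sheet_summary.items():
    print(f"{sheet_name}: {n_rows} rows, fields={fields}")
```

With the real data, the same call would be `summarize_sheets(au_covid_data)`.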

In [40]:
covid_by_state = au_covid_data['sheets']['updates']
covid_by_state[-1]

{'State': 'VIC',
 'Date': '07/07/2020',
 'Time': '13:00',
 'Cumulative case count': '2824',
 'Cumulative deaths': '',
 'Tests conducted (negative)': '',
 'Tests conducted (total)': '979000',
 'Hospitalisations (count)': '35',
 'Intensive care (count)': '9',
 'Ventilator usage (count)': '0',
 'Recovered (cumulative)': '2028',
 'Update Source': 'media release',
 'Notes': ''}

In [41]:
# Collect the fields of interest from every update record
state_list = []
date_list = []
cumulative_case_count = []
cumulative_recovered_count = []

for element in covid_by_state:
    state_list.append(element["State"])
    date_list.append(element["Date"])
    cumulative_case_count.append(element["Cumulative case count"])
    cumulative_recovered_count.append(element["Recovered (cumulative)"])

In [42]:
clean_au_covid = pd.DataFrame({
    "State": state_list,
    "Date": date_list,
    "Cumulative case count": cumulative_case_count,
    "Cumulative recovered count": cumulative_recovered_count
})
clean_au_covid.head()

Unnamed: 0,State,Date,Cumulative case count,Cumulative recovered count
0,SA,23/01/2020,0,
1,VIC,25/01/2020,1,
2,NSW,25/01/2020,3,
3,NSW,27/01/2020,4,
4,QLD,28/01/2020,0,
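
Because the `updates` sheet is a list of dictionaries, the same table could also be built without an explicit loop. This is a sketch of that alternative, not the approach used above; `sample_updates` is a hypothetical stand-in for `covid_by_state`, and `sketch_df` is an illustrative name.

```python
import pandas as pd

# Hypothetical records shaped like the entries of the 'updates' sheet
sample_updates = [
    {"State": "SA", "Date": "23/01/2020", "Cumulative case count": "0",
     "Recovered (cumulative)": "", "Notes": ""},
    {"State": "VIC", "Date": "25/01/2020", "Cumulative case count": "1",
     "Recovered (cumulative)": "", "Notes": ""},
]

wanted = ["State", "Date", "Cumulative case count", "Recovered (cumulative)"]

# Build the frame from the records, keep the wanted columns, then rename one
sketch_df = (
    pd.DataFrame(sample_updates)[wanted]
    .rename(columns={"Recovered (cumulative)": "Cumulative recovered count"})
)
print(sketch_df)
```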


In [43]:
clean_au_covid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 870 entries, 0 to 869
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   State                       870 non-null    object
 1   Date                        870 non-null    object
 2   Cumulative case count       870 non-null    object
 3   Cumulative recovered count  870 non-null    object
dtypes: object(4)
memory usage: 27.3+ KB


In [44]:
# Convert the case counts from strings to numbers (empty strings become NaN)
clean_au_covid["Cumulative case count"] = pd.to_numeric(
    clean_au_covid["Cumulative case count"])

In [45]:
# Remove thousands separators before converting the recovered counts to numbers
clean_au_covid["Cumulative recovered count"] = pd.to_numeric(
    clean_au_covid["Cumulative recovered count"].str.replace(",", ""))
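
The two conversions above rely on how `pd.to_numeric` treats its input: empty strings become `NaN`, while a value containing a comma cannot be parsed until the comma is removed. A small sketch with hypothetical values (not taken from the dataset) shows the combined effect:

```python
import pandas as pd

# Hypothetical raw values: one with a thousands separator, one empty string
raw_counts = pd.Series(["1,056", "", "27"])

# Strip the commas first, then convert; the empty string ends up as NaN
cleaned = pd.to_numeric(raw_counts.str.replace(",", "", regex=False))
print(cleaned)
```

The resulting column is `float64` because `NaN` cannot be stored in a plain integer column.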

In [46]:
# Convert the Date column to datetime format
clean_au_covid['Date'] = pd.to_datetime(
    clean_au_covid["Date"], format="%d/%m/%Y")
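
The explicit `format="%d/%m/%Y"` matters because the source dates are day-first, and a string such as `07/04/2020` is also a valid month-first date. The sketch below uses two date strings of the kind seen in the raw output to show how the two readings differ:

```python
import pandas as pd

raw_dates = pd.Series(["07/04/2020", "25/03/2020"])

# Day-first parsing, matching the format used above
parsed = pd.to_datetime(raw_dates, format="%d/%m/%Y")
print(parsed.dt.month.tolist())

# Reading the first value month-first gives a different calendar day
month_first = pd.to_datetime("07/04/2020", format="%m/%d/%Y")
print(parsed[0].date(), "vs", month_first.date())
```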

In [47]:
clean_au_covid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 870 entries, 0 to 869
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   State                       870 non-null    object        
 1   Date                        870 non-null    datetime64[ns]
 2   Cumulative case count       841 non-null    float64       
 3   Cumulative recovered count  434 non-null    float64       
dtypes: datetime64[ns](1), float64(2), object(1)
memory usage: 27.3+ KB


In [48]:
# Replace missing counts with 0 so the count columns can be cast to integers
clean_au_covid.fillna(0, inplace=True)
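
Filling every gap with 0 means a cumulative column can fall back to zero on a day when a state did not report that figure. One possible alternative, shown here only as a sketch and not as the method used in this notebook, is to carry the last reported value forward within each state and use 0 only where no earlier report exists. The sample rows and the names `sample`, `count_cols` and `filled` are hypothetical.

```python
import pandas as pd

# Hypothetical rows: QLD has no recovered figure on 2020-07-02
sample = pd.DataFrame({
    "State": ["QLD", "VIC", "QLD", "VIC", "QLD"],
    "Date": pd.to_datetime(
        ["2020-07-01", "2020-07-01", "2020-07-02", "2020-07-02", "2020-07-03"]),
    "Cumulative recovered count": [930.0, 1800.0, None, 1866.0, 935.0],
})

count_cols = ["Cumulative recovered count"]

filled = sample.sort_values(["State", "Date"]).copy()
# Carry the last reported value forward within each state
filled[count_cols] = filled.groupby("State")[count_cols].ffill()
# Use 0 only where a state has no earlier report at all
filled[count_cols] = filled[count_cols].fillna(0).astype(int)
filled = filled.sort_index()
print(filled)
```

Whether forward-filling is appropriate depends on how the counts are used later; it assumes a missing value means "not reported" rather than "zero".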

In [49]:
clean_au_covid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 870 entries, 0 to 869
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   State                       870 non-null    object        
 1   Date                        870 non-null    datetime64[ns]
 2   Cumulative case count       870 non-null    float64       
 3   Cumulative recovered count  870 non-null    float64       
dtypes: datetime64[ns](1), float64(2), object(1)
memory usage: 27.3+ KB


In [50]:
# Cast the count columns from float to integer now that no values are missing
count_columns = ["Cumulative case count", "Cumulative recovered count"]
clean_au_covid[count_columns] = clean_au_covid[count_columns].astype(int)

In [51]:
clean_au_covid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 870 entries, 0 to 869
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   State                       870 non-null    object        
 1   Date                        870 non-null    datetime64[ns]
 2   Cumulative case count       870 non-null    int32         
 3   Cumulative recovered count  870 non-null    int32         
dtypes: datetime64[ns](1), int32(2), object(1)
memory usage: 20.5+ KB


In [52]:
clean_au_covid.tail(20)

Unnamed: 0,State,Date,Cumulative case count,Cumulative recovered count
850,NT,2020-07-02,30,29
851,VIC,2020-07-02,2303,1866
852,NSW,2020-07-02,3211,2789
853,VIC,2020-07-03,2368,1904
854,WA,2020-07-03,611,598
855,NSW,2020-07-03,3216,2788
856,WA,2020-07-04,612,598
857,QLD,2020-07-04,1067,0
858,ACT,2020-07-04,108,105
859,NT,2020-07-04,31,30


In [53]:
# Find the earliest date the covid dataset covers:
clean_au_covid["Date"].min()

Timestamp('2020-01-23 00:00:00')

In [54]:
# Find the latest date the covid dataset covers:
clean_au_covid["Date"].max()

Timestamp('2020-07-07 00:00:00')

In [58]:
# Keep only records dated on or before 2020-07-05
final_au_covid = clean_au_covid[clean_au_covid["Date"] <= "2020-07-05"].copy()

In [59]:
final_au_covid.to_csv("au_covid_data.csv", index=False)

In [60]:
final_au_covid["Date"].max()

Timestamp('2020-07-05 00:00:00')

In [None]:
# import time
# from scipy.stats import linregress

# Incorporated citipy to determine city based on latitude and longitude
# from citipy import citipy

In [None]:
# url = "https://api.openaq.org/v1/measurements"

# data = requests.get(url).json()
# pprint(data)

In [None]:
# len(data["results"])

In [None]:
# data["results"][0]

In [None]:
# Data source: https://aqicn.org/api/
# base_url = "https://api.waqi.info/feed/"
# city = "melbourne"
# url = f"{base_url}{city}/?token={api_key}"

In [None]:
# url_2 = "https://api.covid19api.com/country/south-africa/status/confirmed/live"
# covid_data_2 = requests.get(url_2).json()
# pprint(covid_data_2[-1])