# IMPACT OF COVID-19 PANDEMIC ON AIR QUALITY

## Air Quality Data Exploration and Cleanup

In [1]:
# Dependencies and Setup
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import requests
import json
from pprint import pprint

# Import API key
from config import api_key

"https://aqicn.org/data-platform/covid19/
With the COVID-19 spreading out all over the world, the World Air Quality Index project team saw a surge in requests for global data covering the whole world map. As a result, the WAQI project is now providing a new dedicated data-set, updated 3 times a day, and covering about 380 major cities in the world, from January 2020 until now.

The data for each major cities is based on the average (median) of several stations. The data set provides min, max, median and standard deviation for each of the air pollutant species (PM2.5,PM10, Ozone ...) as well as meteorological data (Wind, Temperature, ...). All air pollutant species are converted to the US EPA standard (i.e. no raw concentrations). All dates are UTC based. The count column is the number of samples used for calculating the median and standard deviation."

In [2]:
periods = ["2020", "2019Q1", "2019Q2", "2019Q3", "2019Q4"]

df_list = list()

for period in periods:
    path = f"historical_data/waqi-covid19-airqualitydata-{period}.csv"
    df = pd.read_csv(path)
    df_list.append(df)

In [3]:
airdf_2019_2020 = pd.concat(df_list, ignore_index=True)

In [4]:
airdf_2019_2020.head()

Unnamed: 0,Date,Country,City,Specie,count,min,max,median,variance
0,31/05/2020,IR,Isfahan,temperature,120,17.5,35.0,27.5,331.51
1,13/06/2020,IR,Isfahan,temperature,144,16.0,36.5,27.5,488.74
2,3/07/2020,IR,Isfahan,temperature,67,19.0,33.0,24.0,128.08
3,28/03/2020,IR,Isfahan,temperature,240,3.0,14.0,9.5,136.68
4,23/04/2020,IR,Isfahan,temperature,168,6.0,25.5,16.0,400.79


In [5]:
# Display an overview of the Specie column
airdf_2019_2020["Specie"].unique()

array(['temperature', 'wind-speed', 'wind-gust', 'dew', 'pm25',
       'humidity', 'wind speed', 'pressure', 'wind gust', 'co', 'so2',
       'precipitation', 'no2', 'pm10', 'o3', 'aqi', 'pol', 'uvi', 'wd',
       'neph', 'mepaqi', 'pm1'], dtype=object)

In [6]:
airdf_2019_2020["Specie"].value_counts()

temperature      310236
humidity         310145
pressure         308517
pm25             270688
no2              266793
pm10             264859
wind-speed       263292
o3               250438
so2              226576
dew              226147
co               203776
wind-gust        172805
wind speed        47002
wind gust         29576
precipitation     26825
wd                25720
aqi                8267
uvi                5632
pol                3790
pm1                1380
mepaqi              564
neph                440
Name: Specie, dtype: int64

> Meteorology matters for any air pollution study: "Air movements influence the fate of air pollutants. So any study of air pollution should include a study of the local weather patterns (meteorology). If the air is calm and pollutants cannot disperse, then the concentration of these pollutants will build up. On the other hand, when strong, turbulent winds blow, pollutants disperse quickly, resulting in lower pollutant concentrations." (Source: https://www.qld.gov.au/environment/pollution/monitoring/air/air-monitoring/meteorology-influence/meteorology-factors)
Hence meteorological parameters such as temperature, humidity, pressure, and wind speed are likely to be correlated with air quality.
(http://www.bom.gov.au/vic/observations/melbourne.shtml)

> However, due to the scope of our project, we'll only focus on air pollutant parameters to assess their changes before COVID-19 and 6 months into the pandemic. We're not trying to explain the causes of air quality change. Hence, we'll remove data related to the following meteorology-related species: **temperature, humidity, pressure, wind-speed, dew, wind-gust, wind speed, wind gust, precipitation, wd (wind direction), uvi**.
https://aqicn.org/publishingdata/

> We'll also remove the species with the fewest available data points: **pol, pm1, mepaqi, neph**.

In [7]:
species_to_remove = ["temperature", "humidity", "pressure", "wind-speed", "dew", "wind-gust", "wind speed", "wind gust", "precipitation", "wd", "uvi", "pol", "pm1", "mepaqi", "neph"]
                     
final_airdf = airdf_2019_2020[~airdf_2019_2020["Specie"].isin(species_to_remove)].copy()
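The value counts above list the same wind species under two spellings ("wind-speed" and "wind speed", "wind-gust" and "wind gust"), which is why the removal list has to name both. A minimal sketch, on toy data with made-up values, of normalising the labels first so each species needs only one entry; the names `toy`, `weather_species`, and `pollutants` are illustrative and not part of the notebook:

```python
import pandas as pd

# Toy frame reusing Specie labels from the value counts above (values are made up)
toy = pd.DataFrame({
    "Specie": ["pm25", "wind-speed", "wind speed", "no2", "wind gust", "o3"],
    "median": [42.0, 3.1, 2.8, 11.0, 7.5, 20.0],
})

# Normalise spelling variants so "wind speed" and "wind-speed" share one label
toy["Specie"] = toy["Specie"].str.strip().str.replace(" ", "-", regex=False)

# With one spelling per species, the removal list needs one entry each
weather_species = ["wind-speed", "wind-gust"]
pollutants = toy[~toy["Specie"].isin(weather_species)].copy()

print(pollutants["Specie"].tolist())
```

Normalising before filtering also guards against a spelling variant slipping through if the data set later adds one.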

In [8]:
final_airdf["Specie"].value_counts()

pm25    270688
no2     266793
pm10    264859
o3      250438
so2     226576
co      203776
aqi       8267
Name: Specie, dtype: int64

More about AQI:
https://www.airnow.gov/aqi/aqi-basics/
https://www.airnow.gov/sites/default/files/2020-05/aqi-technical-assistance-document-sept2018.pdf

"Five major pollutants:
EPA establishes an AQI for five major air pollutants regulated by the Clean Air Act. Each of these pollutants has a national air quality standard set by EPA to protect public health:"

* Ground-level ozone **o3** (ppm - parts per million)
* Particulate Matter - including PM2.5 **pm25** and PM10 **pm10** (μg/m3)
* Carbon Monoxide **co** (ppm)
* Sulfur Dioxide **so2** (ppb - parts per billion)
* Nitrogen Dioxide **no2** (ppb)

https://en.wikipedia.org/wiki/Air_pollution

From https://waqi.info/:
"The Air Quality Index is based on measurement of particulate matter (PM2.5 and PM10), Ozone (O3), Nitrogen Dioxide (NO2), Sulfur Dioxide (SO2) and Carbon Monoxide (CO) emissions. Most of the stations on the map are monitoring both PM2.5 and PM10 data, but there are few exceptions where only PM10 is available.

All measurements are based on hourly readings: For instance, an AQI reported at 8AM means that the measurement was done from 7AM to 8AM."
More details: https://aqicn.org/faq/
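Since the data set reports species on the US EPA scale rather than as raw concentrations, it may help to see how a concentration maps to an index value. The sketch below applies the piecewise-linear formula from the AQI technical assistance document linked above to PM2.5 only. The breakpoints are those of the 2018 document (later EPA revisions may differ), the function name `pm25_to_aqi` is illustrative, and the input is assumed to be a 24-hour average already truncated to one decimal place:

```python
# PM2.5 breakpoints (24-hour average, µg/m³) from the 2018 EPA technical
# assistance document: (conc_low, conc_high, index_low, index_high)
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def pm25_to_aqi(concentration):
    """Linearly interpolate a PM2.5 concentration onto the AQI scale.

    Assumes the concentration is already truncated to one decimal place.
    """
    for c_low, c_high, i_low, i_high in PM25_BREAKPOINTS:
        if c_low <= concentration <= c_high:
            slope = (i_high - i_low) / (c_high - c_low)
            return round(slope * (concentration - c_low) + i_low)
    raise ValueError(f"Concentration {concentration} is outside the breakpoint table")


print(pm25_to_aqi(35.4))
```

Each pollutant has its own breakpoint table, so this function should not be reused for other species.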


For comparison, the Weatherbit historical air quality API (https://www.weatherbit.io/api/airquality-history) reports raw concentrations, whereas the WAQI data set used here converts every pollutant species to the US EPA scale:

* aqi: Air Quality Index [US - EPA standard 0 - +500]
* o3: Concentration of surface O3 (µg/m³)
* so2: Concentration of surface SO2 (µg/m³)
* no2: Concentration of surface NO2 (µg/m³)
* co: Concentration of carbon monoxide (µg/m³)
* pm25: Concentration of particulate matter < 2.5 microns (µg/m³)
* pm10: Concentration of particulate matter < 10 microns (µg/m³)

Further reading on air pollution impacts:

* https://ourworldindata.org/air-pollution
* https://www.who.int/health-topics/air-pollution#tab=tab_1
* https://www.epa.vic.gov.au/for-community/airwatch
* https://www.kaggle.com/frtgnn/clean-air-india-s-air-quality/data

In [9]:
final_airdf.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1491397 entries, 439 to 3223283
Data columns (total 9 columns):
 #   Column    Non-Null Count    Dtype  
---  ------    --------------    -----  
 0   Date      1491397 non-null  object 
 1   Country   1491397 non-null  object 
 2   City      1491397 non-null  object 
 3   Specie    1491397 non-null  object 
 4   count     1491397 non-null  int64  
 5   min       1491397 non-null  float64
 6   max       1491397 non-null  float64
 7   median    1491397 non-null  float64
 8   variance  1491397 non-null  float64
dtypes: float64(4), int64(1), object(4)
memory usage: 113.8+ MB


We can see that the Date column has the generic object dtype. Since we want to perform time-based analysis on this data, we need to convert it to a datetime format. Let's use the pd.to_datetime() function with an explicit day-first format, because the dates are stored as day/month/year.

In [10]:
final_airdf["Date"] = pd.to_datetime(final_airdf["Date"], format="%d/%m/%Y")
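The explicit `format` matters because a date such as 3/07/2020 is ambiguous: read month-first it would be 7 March, read day-first it is 3 July. A small sketch using three date strings taken from the `head()` output above; the variable names are illustrative:

```python
import pandas as pd

# Date strings as they appear in the CSV (day/month/year)
raw_dates = pd.Series(["3/07/2020", "31/05/2020", "13/06/2020"])

# An explicit format removes any day-first vs month-first guessing
parsed = pd.to_datetime(raw_dates, format="%d/%m/%Y")

print(parsed.dt.strftime("%Y-%m-%d").tolist())
```

Passing the format also makes pandas raise an error on a string that does not match, rather than guessing.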

In [11]:
final_airdf.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1491397 entries, 439 to 3223283
Data columns (total 9 columns):
 #   Column    Non-Null Count    Dtype         
---  ------    --------------    -----         
 0   Date      1491397 non-null  datetime64[ns]
 1   Country   1491397 non-null  object        
 2   City      1491397 non-null  object        
 3   Specie    1491397 non-null  object        
 4   count     1491397 non-null  int64         
 5   min       1491397 non-null  float64       
 6   max       1491397 non-null  float64       
 7   median    1491397 non-null  float64       
 8   variance  1491397 non-null  float64       
dtypes: datetime64[ns](1), float64(4), int64(1), object(3)
memory usage: 113.8+ MB


In [12]:
# Find the earliest date the dataset covers:
final_airdf["Date"].min()

Timestamp('2018-12-31 00:00:00')

In [13]:
# Find the latest date the dataset covers:
final_airdf["Date"].max()

Timestamp('2020-07-03 00:00:00')

In [14]:
final_airdf["Country"].unique()

array(['IR', 'TJ', 'BR', 'CN', 'DK', 'ES', 'ML', 'SK', 'XK', 'CL', 'DE',
       'KW', 'MM', 'PH', 'PK', 'PL', 'RU', 'SE', 'SG', 'AE', 'BA', 'CZ',
       'ID', 'IS', 'MO', 'RO', 'AR', 'AU', 'EC', 'GH', 'HK', 'PE', 'UA',
       'EE', 'FR', 'JP', 'MN', 'FI', 'IE', 'IL', 'KZ', 'LA', 'UZ', 'BD',
       'BE', 'GR', 'KR', 'LK', 'MK', 'MX', 'TR', 'AF', 'AT', 'GT', 'BO',
       'CR', 'JO', 'PR', 'SA', 'SV', 'CA', 'IT', 'NO', 'RE', 'TM', 'ZA',
       'BH', 'LT', 'TH', 'BG', 'CH', 'HU', 'MY', 'NL', 'NZ', 'UG', 'VN',
       'ET', 'GE', 'GN', 'IQ', 'RS', 'TW', 'CI', 'CO', 'CY', 'DZ', 'HR',
       'IN', 'KG', 'CW', 'GB', 'NP', 'PT', 'US'], dtype=object)

There are 95 countries in the dataframe, including Australia (AU).

In [15]:
# Display an overview of the Country column
country_airdata_df = pd.DataFrame(final_airdf["Country"].unique(), columns=["country_code"])
country_airdata_df

Unnamed: 0,country_code
0,IR
1,TJ
2,BR
3,CN
4,DK
...,...
90,CW
91,GB
92,NP
93,PT


In [16]:
# Display an overview of the City column
final_airdf["City"].unique()

array(['Isfahan', 'Arāk', 'Karaj', 'Qom', 'Orūmīyeh', 'Yazd', 'Īlām',
       'Kerman', 'Khorramshahr', 'Tabriz', 'Bandar Abbas', 'Sanandaj',
       'Kermanshah', 'Khorramabad', 'Shiraz', 'Zanjān', 'Mashhad',
       'Tehran', 'Dushanbe', 'São José dos Campos', 'Vitória',
       'São Paulo', 'Beijing', 'Jieyang', 'Kunming', 'Hangzhou',
       'Chongqing', 'Qingdao', 'Haikou', 'Ürümqi', 'Qiqihar', 'Guiyang',
       'Shenzhen', 'Yunfu', 'Xuchang', 'Yinchuan', 'Shenyang', 'Lhasa',
       'Shanghai', 'Changchun', 'Foshan', 'Nanning', 'Fushun', 'Hefei',
       'Chengdu', 'Hohhot', 'Qinhuangdao', 'Shijiazhuang', 'Shantou',
       'Zhengzhou', 'Nanjing', 'Xining', 'Xi’an', 'Zhuzhou', 'Wuhan',
       'Tianjin', 'Changzhou', 'Nanchang', 'Shiyan', 'Xinxiang', 'Suzhou',
       'Harbin', 'Lanzhou', 'Jinan', 'Changsha', 'Hegang', 'Anyang',
       'Wuxi', 'Taiyuan', 'Guangzhou', 'Fuzhou', 'Ningbo', 'Xiamen',
       'Dongguan', 'Copenhagen', 'Las Palmas de Gran Canaria',
       'Salamanca', 'Barcelona'

In [17]:
final_airdf["City"].value_counts()

London          5075
Shijiazhuang    3612
Qinhuangdao     3502
Anyang          3482
Beijing         3443
                ... 
Almaty           132
Abidjan          115
Conakry          115
Accra             97
Zamboanga         60
Name: City, Length: 615, dtype: int64

There are 615 cities in our dataframe. Let's see which Australian cities are covered in the dataset.

In [18]:
# Build the mask from final_airdf itself so its index matches the frame being filtered
final_airdf.loc[final_airdf["Country"]=="AU", "City"].value_counts()

Sydney        3364
Brisbane      3348
Melbourne     3284
Wollongong    3253
Darwin        3208
Adelaide      3177
Perth         3082
Newcastle     2871
Hobart        1126
Launceston    1126
Canberra      1093
Name: City, dtype: int64

In [19]:
final_airdf.to_csv("air_data.csv")
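One thing to keep in mind when this file is read back: `to_csv` writes the row index as an unnamed first column by default, and `read_csv` then loads it as `Unnamed: 0`. A minimal sketch of the difference, using an in-memory buffer and made-up rows instead of the real file:

```python
import io

import pandas as pd

# Toy frame with a non-default index, similar to the filtered air data (values are made up)
frame = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-05-31", "2020-06-13"]),
        "City": ["Isfahan", "Isfahan"],
        "median": [27.5, 27.5],
    },
    index=[439, 440],
)

# Default behaviour: the index is written and comes back as "Unnamed: 0"
with_index = pd.read_csv(io.StringIO(frame.to_csv()))

# index=False drops it; parse_dates restores the datetime dtype on load
without_index = pd.read_csv(io.StringIO(frame.to_csv(index=False)), parse_dates=["Date"])

print(list(with_index.columns))
print(list(without_index.columns))
```

Whether to keep the index depends on whether later steps need the original row labels.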

## COVID-19 Data Exploration and Cleanup

The COVID-19 data is sourced from https://covid19api.com/

In [20]:
country_url = "https://api.covid19api.com/countries"
country_covid_data = requests.get(country_url).json()
pprint(country_covid_data)

[{'Country': 'French Polynesia', 'ISO2': 'PF', 'Slug': 'french-polynesia'},
 {'Country': 'Isle of Man', 'ISO2': 'IM', 'Slug': 'isle-of-man'},
 {'Country': 'Korea (South)', 'ISO2': 'KR', 'Slug': 'korea-south'},
 {'Country': 'Mongolia', 'ISO2': 'MN', 'Slug': 'mongolia'},
 {'Country': 'Niger', 'ISO2': 'NE', 'Slug': 'niger'},
 {'Country': 'Guernsey', 'ISO2': 'GG', 'Slug': 'guernsey'},
 {'Country': 'Pakistan', 'ISO2': 'PK', 'Slug': 'pakistan'},
 {'Country': 'Tunisia', 'ISO2': 'TN', 'Slug': 'tunisia'},
 {'Country': 'United States of America', 'ISO2': 'US', 'Slug': 'united-states'},
 {'Country': 'US Minor Outlying Islands',
  'ISO2': 'UM',
  'Slug': 'us-minor-outlying-islands'},
 {'Country': 'Benin', 'ISO2': 'BJ', 'Slug': 'benin'},
 {'Country': 'Kuwait', 'ISO2': 'KW', 'Slug': 'kuwait'},
 {'Country': 'Switzerland', 'ISO2': 'CH', 'Slug': 'switzerland'},
 {'Country': 'Iceland', 'ISO2': 'IS', 'Slug': 'iceland'},
 {'Country': 'Saudi Arabia', 'ISO2': 'SA', 'Slug': 'saudi-arabia'},
 {'Country': 'Som

In [21]:
country_covid_df = pd.DataFrame(country_covid_data)
country_covid_df

Unnamed: 0,Country,Slug,ISO2
0,French Polynesia,french-polynesia,PF
1,Isle of Man,isle-of-man,IM
2,Korea (South),korea-south,KR
3,Mongolia,mongolia,MN
4,Niger,niger,NE
...,...,...,...
243,Cambodia,cambodia,KH
244,Hungary,hungary,HU
245,United Arab Emirates,united-arab-emirates,AE
246,Central African Republic,central-african-republic,CF


In [22]:
# Merge countries available on the air quality data and the covid data
country_covid_air_df = pd.merge(country_airdata_df, country_covid_df, how="inner", left_on="country_code", right_on="ISO2")
country_covid_air_df

Unnamed: 0,country_code,Country,Slug,ISO2
0,IR,"Iran, Islamic Republic of",iran,IR
1,TJ,Tajikistan,tajikistan,TJ
2,BR,Brazil,brazil,BR
3,CN,China,china,CN
4,DK,Denmark,denmark,DK
...,...,...,...,...
89,KG,Kyrgyzstan,kyrgyzstan,KG
90,GB,United Kingdom,united-kingdom,GB
91,NP,Nepal,nepal,NP
92,PT,Portugal,portugal,PT
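An inner merge silently drops any air-quality country code that has no match in the COVID-19 country list. One possible way to see which codes would be dropped is a left merge with `indicator=True`. The sketch below uses toy frames; `ZZ` is a made-up code chosen only to force a mismatch:

```python
import pandas as pd

# Toy stand-ins for country_airdata_df and country_covid_df
air_codes = pd.DataFrame({"country_code": ["IR", "ZZ", "AU"]})
covid_countries = pd.DataFrame({
    "Country": ["Iran, Islamic Republic of", "Australia"],
    "Slug": ["iran", "australia"],
    "ISO2": ["IR", "AU"],
})

# A left merge keeps every air-quality code; _merge records where each row came from
merged = pd.merge(
    air_codes,
    covid_countries,
    how="left",
    left_on="country_code",
    right_on="ISO2",
    indicator=True,
)

# Codes present in the air data but missing from the COVID-19 list
unmatched = merged.loc[merged["_merge"] == "left_only", "country_code"].tolist()
print(unmatched)
```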


In [23]:
# Return a dataframe covering all countries in both the air quality data and covid-19 data.
del country_covid_air_df["ISO2"]
country_covid_air_df

Unnamed: 0,country_code,Country,Slug
0,IR,"Iran, Islamic Republic of",iran
1,TJ,Tajikistan,tajikistan
2,BR,Brazil,brazil
3,CN,China,china
4,DK,Denmark,denmark
...,...,...,...
89,KG,Kyrgyzstan,kyrgyzstan
90,GB,United Kingdom,united-kingdom
91,NP,Nepal,nepal
92,PT,Portugal,portugal


In [24]:
# Explore one covid API - By Country Total All Status
covid_url_example = "https://api.covid19api.com/total/country/south-africa"
covid_data_example = requests.get(covid_url_example).json()
pprint(covid_data_example)

[{'Active': 0,
  'City': '',
  'CityCode': '',
  'Confirmed': 0,
  'Country': 'South Africa',
  'CountryCode': '',
  'Date': '2020-01-22T00:00:00Z',
  'Deaths': 0,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 0},
 {'Active': 0,
  'City': '',
  'CityCode': '',
  'Confirmed': 0,
  'Country': 'South Africa',
  'CountryCode': '',
  'Date': '2020-01-23T00:00:00Z',
  'Deaths': 0,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 0},
 {'Active': 0,
  'City': '',
  'CityCode': '',
  'Confirmed': 0,
  'Country': 'South Africa',
  'CountryCode': '',
  'Date': '2020-01-24T00:00:00Z',
  'Deaths': 0,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 0},
 {'Active': 0,
  'City': '',
  'CityCode': '',
  'Confirmed': 0,
  'Country': 'South Africa',
  'CountryCode': '',
  'Date': '2020-01-25T00:00:00Z',
  'Deaths': 0,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 0},
 {'Active': 0,
  'City': '',
  'CityCode': '',
  'Confirmed': 0,
  'Country': 'South

  'Country': 'South Africa',
  'CountryCode': '',
  'Date': '2020-04-07T00:00:00Z',
  'Deaths': 13,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 95},
 {'Active': 1732,
  'City': '',
  'CityCode': '',
  'Confirmed': 1845,
  'Country': 'South Africa',
  'CountryCode': '',
  'Date': '2020-04-08T00:00:00Z',
  'Deaths': 18,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 95},
 {'Active': 1821,
  'City': '',
  'CityCode': '',
  'Confirmed': 1934,
  'Country': 'South Africa',
  'CountryCode': '',
  'Date': '2020-04-09T00:00:00Z',
  'Deaths': 18,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 95},
 {'Active': 1569,
  'City': '',
  'CityCode': '',
  'Confirmed': 2003,
  'Country': 'South Africa',
  'CountryCode': '',
  'Date': '2020-04-10T00:00:00Z',
  'Deaths': 24,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 410},
 {'Active': 1593,
  'City': '',
  'CityCode': '',
  'Confirmed': 2028,
  'Country': 'South Africa',
  'CountryCode': '',
 

  'Date': '2020-06-10T00:00:00Z',
  'Deaths': 1210,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 31505},
 {'Active': 24032,
  'City': '',
  'CityCode': '',
  'Confirmed': 58568,
  'Country': 'South Africa',
  'CountryCode': '',
  'Date': '2020-06-11T00:00:00Z',
  'Deaths': 1284,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 33252},
 {'Active': 25567,
  'City': '',
  'CityCode': '',
  'Confirmed': 61927,
  'Country': 'South Africa',
  'CountryCode': '',
  'Date': '2020-06-12T00:00:00Z',
  'Deaths': 1354,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 35006},
 {'Active': 27463,
  'City': '',
  'CityCode': '',
  'Confirmed': 65736,
  'Country': 'South Africa',
  'CountryCode': '',
  'Date': '2020-06-13T00:00:00Z',
  'Deaths': 1423,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 36850},
 {'Active': 30027,
  'City': '',
  'CityCode': '',
  'Confirmed': 70038,
  'Country': 'South Africa',
  'CountryCode': '',
  'Date': '2020-06-14T00

The API example above covers COVID-19 data until 3 July 2020. It returns the number of confirmed, active, recovered, and death cases per day for the chosen country over the course of the pandemic. Hence, we'll call this API for every country slug in country_covid_air_df.
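Because the response is a list of flat dictionaries, it can also be turned into a dataframe directly. The sketch below uses two records copied from the South Africa output above (no network call) and checks that, for these records, Active equals Confirmed minus Recovered minus Deaths. Whether that identity holds for every country and date is an assumption, not something the API output shown here confirms:

```python
import pandas as pd

# Two records copied from the South Africa response printed above
records = [
    {"Country": "South Africa", "Date": "2020-04-08T00:00:00Z",
     "Active": 1732, "Confirmed": 1845, "Recovered": 95, "Deaths": 18},
    {"Country": "South Africa", "Date": "2020-06-11T00:00:00Z",
     "Active": 24032, "Confirmed": 58568, "Recovered": 33252, "Deaths": 1284},
]

# Keep only the fields of interest, in a fixed order
columns = ["Country", "Date", "Active", "Confirmed", "Recovered", "Deaths"]
sample_df = pd.DataFrame(records, columns=columns)

# Sanity check on the sample: active cases implied by the other three counts
implied_active = sample_df["Confirmed"] - sample_df["Recovered"] - sample_df["Deaths"]
print((implied_active == sample_df["Active"]).all())
```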

In [25]:
slug_list = country_covid_air_df["Slug"].tolist()

In [26]:
base_covid_url = "https://api.covid19api.com/total/country/"
    
country_list = list()
date_list = list()
active_list = list()
confirmed_list = list()
recovered_list = list()
deaths_list = list()

print("Beginning Data Retrieval")
print("-----------------------------------")

counter = 0
set_counter = 1

for slug in slug_list:
    
    try:
        response = requests.get(base_covid_url + slug).json()
    
        # Each element is one daily record for the country
        for element in response:
            country_list.append(element['Country'])
            date_list.append(element['Date'])
            active_list.append(element['Active'])
            confirmed_list.append(element['Confirmed'])
            recovered_list.append(element['Recovered'])
            deaths_list.append(element['Deaths'])

        counter += 1
        print(f"Processing Record {counter} of Set {set_counter} | {slug}")

        # Start a new set after every 50 countries
        if counter == 50:
            set_counter += 1
            counter = 0

    # A non-list payload (e.g. an error message) raises KeyError or TypeError;
    # a body that is not valid JSON raises ValueError
    except (KeyError, TypeError, ValueError):
        print("Country not found. Skipping...")

print("-----------------------------------")
print("Data Retrieval Complete")
print("-----------------------------------")

Beginning Data Retrieval
-----------------------------------
Processing Record 1 of Set 1 | iran
Processing Record 2 of Set 1 | tajikistan
Processing Record 3 of Set 1 | brazil
Processing Record 4 of Set 1 | china
Processing Record 5 of Set 1 | denmark
Processing Record 6 of Set 1 | spain
Processing Record 7 of Set 1 | mali
Processing Record 8 of Set 1 | slovakia
Processing Record 9 of Set 1 | kosovo
Processing Record 10 of Set 1 | chile
Processing Record 11 of Set 1 | germany
Processing Record 12 of Set 1 | kuwait
Processing Record 13 of Set 1 | myanmar
Processing Record 14 of Set 1 | philippines
Processing Record 15 of Set 1 | pakistan
Processing Record 16 of Set 1 | poland
Processing Record 17 of Set 1 | russia
Processing Record 18 of Set 1 | sweden
Processing Record 19 of Set 1 | singapore
Processing Record 20 of Set 1 | united-arab-emirates
Processing Record 21 of Set 1 | bosnia-and-herzegovina
Processing Record 22 of Set 1 | czech-republic
Processing Record 23 of Set 1 | indonesi

In [27]:
covid_df = pd.DataFrame({
    "Country": country_list,
    "Date": date_list,
    "Active cases": active_list,
    "Confirmed cases": confirmed_list,
    "Recovered cases": recovered_list,
    "Deaths": deaths_list
})
covid_df.head()

Unnamed: 0,Country,Date,Active cases,Confirmed cases,Recovered cases,Deaths
0,"Iran, Islamic Republic of",2020-01-22T00:00:00Z,0,0,0,0
1,"Iran, Islamic Republic of",2020-01-23T00:00:00Z,0,0,0,0
2,"Iran, Islamic Republic of",2020-01-24T00:00:00Z,0,0,0,0
3,"Iran, Islamic Republic of",2020-01-25T00:00:00Z,0,0,0,0
4,"Iran, Islamic Republic of",2020-01-26T00:00:00Z,0,0,0,0


In [28]:
covid_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14685 entries, 0 to 14684
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Country          14685 non-null  object
 1   Date             14685 non-null  object
 2   Active cases     14685 non-null  int64 
 3   Confirmed cases  14685 non-null  int64 
 4   Recovered cases  14685 non-null  int64 
 5   Deaths           14685 non-null  int64 
dtypes: int64(4), object(2)
memory usage: 688.5+ KB


In [29]:
# Convert the Date column to datetime format
covid_df['Date'] = covid_df['Date'].astype('datetime64[ns]')
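The API dates end in `Z`, meaning UTC. As an alternative to `astype`, the conversion can state the time zone handling explicitly, which may behave more consistently across pandas versions. A minimal sketch on two date strings from the output above; the variable names are illustrative:

```python
import pandas as pd

# ISO 8601 timestamps with a trailing "Z" (UTC), as returned by the API
raw = pd.Series(["2020-01-22T00:00:00Z", "2020-01-23T00:00:00Z"])

# Parse as timezone-aware UTC timestamps
aware = pd.to_datetime(raw, utc=True)

# Drop the timezone so the column compares cleanly with the naive air-quality dates
naive_utc = aware.dt.tz_convert(None)

print(naive_utc.tolist())
```

Both data sets are UTC-based according to their sources, so dropping the time zone after parsing should not shift any dates.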

In [30]:
covid_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14685 entries, 0 to 14684
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Country          14685 non-null  object        
 1   Date             14685 non-null  datetime64[ns]
 2   Active cases     14685 non-null  int64         
 3   Confirmed cases  14685 non-null  int64         
 4   Recovered cases  14685 non-null  int64         
 5   Deaths           14685 non-null  int64         
dtypes: datetime64[ns](1), int64(4), object(1)
memory usage: 688.5+ KB


In [31]:
covid_df.to_csv("covid_data.csv")

In [32]:
# import time
# from scipy.stats import linregress

# Incorporated citipy to determine city based on latitude and longitude
# from citipy import citipy

In [33]:
# url = "https://api.openaq.org/v1/measurements"

# data = requests.get(url).json()
# pprint(data)

In [34]:
# len(data["results"])

In [35]:
# data["results"][0]

In [36]:
# Data source: https://aqicn.org/api/
# base_url = "https://api.waqi.info/feed/"
# city = "melbourne"
# url = f"{base_url}{city}/?token={api_key}"

In [37]:
# url_2 = "https://api.covid19api.com/country/south-africa/status/confirmed/live"
# covid_data_2 = requests.get(url_2).json()
# pprint(covid_data_2[-1])