# IMPACT OF COVID-19 PANDEMIC ON AIR QUALITY

## Air Quality Data Exploration and Cleanup

In [1]:
# Dependencies and Setup
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import requests
import json
from pprint import pprint

# Import API key
from config import api_key

"https://aqicn.org/data-platform/covid19/
With the COVID-19 spreading out all over the world, the World Air Quality Index project team saw a surge in requests for global data covering the whole world map. As a result, the WAQI project is now providing a new dedicated data-set, updated 3 times a day, and covering about 380 major cities in the world, from January 2020 until now.

The data for each major cities is based on the average (median) of several stations. The data set provides min, max, median and standard deviation for each of the air pollutant species (PM2.5,PM10, Ozone ...) as well as meteorological data (Wind, Temperature, ...). All air pollutant species are converted to the US EPA standard (i.e. no raw concentrations). All dates are UTC based. The count column is the number of samples used for calculating the median and standard deviation."

In [2]:
periods = ["2020", "2019Q1", "2019Q2", "2019Q3", "2019Q4"]

df_list = list()

for period in periods:
    path = f"historical_data/waqi-covid19-airqualitydata-{period}.csv"
    df = pd.read_csv(path)
    df_list.append(df)

In [3]:
airdf_2019_2020 = pd.concat(df_list, ignore_index=True)

In [4]:
airdf_2019_2020.head()

Unnamed: 0,Date,Country,City,Specie,count,min,max,median,variance
0,31/05/2020,IR,Isfahan,temperature,120,17.5,35.0,27.5,331.51
1,13/06/2020,IR,Isfahan,temperature,144,16.0,36.5,27.5,488.74
2,3/07/2020,IR,Isfahan,temperature,67,19.0,33.0,24.0,128.08
3,28/03/2020,IR,Isfahan,temperature,240,3.0,14.0,9.5,136.68
4,23/04/2020,IR,Isfahan,temperature,168,6.0,25.5,16.0,400.79


In [5]:
# Display an overview of the Specie column
airdf_2019_2020["Specie"].unique()

array(['temperature', 'wind-speed', 'wind-gust', 'dew', 'pm25',
       'humidity', 'wind speed', 'pressure', 'wind gust', 'co', 'so2',
       'precipitation', 'no2', 'pm10', 'o3', 'aqi', 'pol', 'uvi', 'wd',
       'neph', 'mepaqi', 'pm1'], dtype=object)

In [6]:
airdf_2019_2020["Specie"].value_counts()

temperature      310236
humidity         310145
pressure         308517
pm25             270688
no2              266793
pm10             264859
wind-speed       263292
o3               250438
so2              226576
dew              226147
co               203776
wind-gust        172805
wind speed        47002
wind gust         29576
precipitation     26825
wd                25720
aqi                8267
uvi                5632
pol                3790
pm1                1380
mepaqi              564
neph                440
Name: Specie, dtype: int64
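
The counts above list the same parameter under two spellings (`wind-speed` / `wind speed` and `wind-gust` / `wind gust`). These species are dropped later in this notebook, so the duplication does no harm here, but if the meteorological rows were kept, the labels could be unified first. A minimal sketch on toy rows (the `sample` frame and its values are made up for illustration):

```python
import pandas as pd

# Toy rows standing in for airdf_2019_2020; only the Specie labels matter here.
sample = pd.DataFrame({
    "Specie": ["wind-speed", "wind speed", "wind-gust", "wind gust", "pm25"],
    "median": [3.1, 2.8, 7.0, 6.5, 72.0],
})

# Replace spaces with hyphens so both spellings collapse into one label.
sample["Specie"] = sample["Specie"].str.strip().str.replace(" ", "-", regex=False)

species_counts = sample["Specie"].value_counts().to_dict()
print(species_counts)
```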

> We understand that "Air movements influence the fate of air pollutants. So any study of air pollution should include a study of the local weather patterns (meteorology). If the air is calm and pollutants cannot disperse, then the concentration of these pollutants will build up. On the other hand, when strong, turbulent winds blow, pollutants disperse quickly, resulting in lower pollutant concentrations." https://www.qld.gov.au/environment/pollution/monitoring/air/air-monitoring/meteorology-influence/meteorology-factors#:~:text=Meteorological%20factors-,Meteorological%20factors,these%20pollutants%20will%20build%20up.
Hence, meteorological parameters such as temperature, humidity, pressure, and wind speed are likely to be correlated with air quality.
(http://www.bom.gov.au/vic/observations/melbourne.shtml)

> However, due to the scope of our project, we'll only focus on air pollutant parameters to assess their changes before COVID-19 and 6 months into the pandemic. We're not trying to explain the causes of air quality change. Hence, we'll remove data related to the following meteorology-related species: **temperature, humidity, pressure, wind-speed, dew, wind-gust, wind speed, wind gust, precipitation, wd (wind direction), uvi**.
https://aqicn.org/publishingdata/

> We'll also remove the species with the fewest available data points: **pol, pm1, mepaqi, neph**.

In [7]:
species_to_remove = ["temperature", "humidity", "pressure", "wind-speed", "dew", "wind-gust",
                     "wind speed", "wind gust", "precipitation", "wd", "uvi", "pol", "pm1", "mepaqi", "neph"]

clean_airdf = airdf_2019_2020[~airdf_2019_2020["Specie"].isin(
    species_to_remove)].reset_index(drop=True).copy()

In [8]:
clean_airdf.head()

Unnamed: 0,Date,Country,City,Specie,count,min,max,median,variance
0,24/02/2020,IR,Isfahan,pm25,129,54.0,194.0,126.0,10921.4
1,7/05/2020,IR,Isfahan,pm25,168,17.0,168.0,91.0,14014.0
2,28/05/2020,IR,Isfahan,pm25,127,17.0,115.0,72.0,3558.56
3,20/02/2020,IR,Isfahan,pm25,113,26.0,181.0,76.0,11209.8
4,23/02/2020,IR,Isfahan,pm25,132,22.0,132.0,76.0,3209.67


In [9]:
clean_airdf["Specie"].value_counts()

pm25    270688
no2     266793
pm10    264859
o3      250438
so2     226576
co      203776
aqi       8267
Name: Specie, dtype: int64

More about AQI:
https://www.airnow.gov/aqi/aqi-basics/
https://www.airnow.gov/sites/default/files/2020-05/aqi-technical-assistance-document-sept2018.pdf
"Five major pollutants:
EPA establishes an AQI for five major air pollutants regulated by the Clean Air Act. Each of these pollutants has a national air quality standard set by EPA to protect public health:"

* Ground-level ozone **o3** (ppm - parts per million)
* Particulate Matter - including PM2.5 **pm25** and PM10 **pm10** (μg/m3)
* Carbon Monoxide **co** (ppm)
* Sulfur Dioxide **so2** (ppb - parts per billion)
* Nitrogen Dioxide **no2** (ppb)

https://en.wikipedia.org/wiki/Air_pollution

From https://waqi.info/:
"The Air Quality Index is based on measurement of particulate matter (PM2.5 and PM10), Ozone (O3), Nitrogen Dioxide (NO2), Sulfur Dioxide (SO2) and Carbon Monoxide (CO) emissions. Most of the stations on the map are monitoring both PM2.5 and PM10 data, but there are few exceptions where only PM10 is available.

All measurements are based on hourly readings: For instance, an AQI reported at 8AM means that the measurement was done from 7AM to 8AM."
More details: https://aqicn.org/faq/


https://www.weatherbit.io/api/airquality-history#:~:text=Air%20Quality%20API%20(Historical),an%20air%20quality%20index%20score.

aqi: Air Quality Index [US - EPA standard 0 - +500]
o3: Concentration of surface O3 (µg/m³)
so2: Concentration of surface SO2 (µg/m³)
no2: Concentration of surface NO2 (µg/m³)
co: Concentration of carbon monoxide (µg/m³)
pm25: Concentration of particulate matter < 2.5 microns (µg/m³)
pm10: Concentration of particulate matter < 10 microns (µg/m³)
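
To make the 0 to 500 scale concrete: the EPA index is a piecewise-linear interpolation between concentration breakpoints. The sketch below does this for PM2.5 only. The breakpoint table is what I recall from the 2018 technical assistance document linked above and should be checked against that source (EPA has revised it since); `pm25_to_aqi` is a hypothetical helper, and the input is assumed to be already truncated to one decimal place. The WAQI values in this notebook are already converted, so this is for orientation only.

```python
# PM2.5 (24-hour, µg/m³) breakpoints, assumed from the 2018 EPA document:
# (conc_low, conc_high, aqi_low, aqi_high)
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def pm25_to_aqi(concentration):
    """Interpolate linearly inside the band that contains the concentration."""
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= concentration <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (concentration - c_lo) + i_lo)
    return None  # value falls outside the table (or between two bands)


print(pm25_to_aqi(12.0), pm25_to_aqi(35.5))
```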

Further reading on air pollution impacts:
https://ourworldindata.org/air-pollution
https://www.who.int/health-topics/air-pollution#tab=tab_1
https://www.epa.vic.gov.au/for-community/airwatch
https://www.kaggle.com/frtgnn/clean-air-india-s-air-quality/data

In [10]:
clean_airdf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1491397 entries, 0 to 1491396
Data columns (total 9 columns):
 #   Column    Non-Null Count    Dtype  
---  ------    --------------    -----  
 0   Date      1491397 non-null  object 
 1   Country   1491397 non-null  object 
 2   City      1491397 non-null  object 
 3   Specie    1491397 non-null  object 
 4   count     1491397 non-null  int64  
 5   min       1491397 non-null  float64
 6   max       1491397 non-null  float64
 7   median    1491397 non-null  float64
 8   variance  1491397 non-null  float64
dtypes: float64(4), int64(1), object(4)
memory usage: 102.4+ MB


We can see that the Date column has the generic object dtype. Since we want to perform time-based analysis on this data, we need to convert it to a datetime type. Let's use the pandas to_datetime() function, with an explicit day-first format, to convert the Date column.

In [11]:
clean_airdf["Date"] = pd.to_datetime(clean_airdf["Date"], format="%d/%m/%Y")
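
The explicit `format` matters because strings such as `3/07/2020` are ambiguous between 3 July and 7 March. A small check on two date strings copied from the `head()` output above (the `raw` and `parsed` names are only for this illustration):

```python
import pandas as pd

# Two date strings copied from the head() output earlier in the notebook.
raw = pd.Series(["3/07/2020", "13/06/2020"])

# Day comes first, then month, then a four-digit year.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
print(parsed.dt.strftime("%Y-%m-%d").tolist())
```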

In [12]:
clean_airdf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1491397 entries, 0 to 1491396
Data columns (total 9 columns):
 #   Column    Non-Null Count    Dtype         
---  ------    --------------    -----         
 0   Date      1491397 non-null  datetime64[ns]
 1   Country   1491397 non-null  object        
 2   City      1491397 non-null  object        
 3   Specie    1491397 non-null  object        
 4   count     1491397 non-null  int64         
 5   min       1491397 non-null  float64       
 6   max       1491397 non-null  float64       
 7   median    1491397 non-null  float64       
 8   variance  1491397 non-null  float64       
dtypes: datetime64[ns](1), float64(4), int64(1), object(3)
memory usage: 102.4+ MB


In [13]:
# Find the earliest date the air quality dataset covers:
clean_airdf["Date"].min()

Timestamp('2018-12-31 00:00:00')

In [14]:
# Find the latest date the air quality dataset covers:
clean_airdf["Date"].max()

Timestamp('2020-07-03 00:00:00')
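
With data from the end of 2018 to July 2020, the same calendar months are available for 2019 and 2020. The notebook does not say at this point how the before/after comparison will be made, so the following is only one possible approach: compare the median of each species per city and month across the two years. All values in `toy` are made up; only the column names match `clean_airdf`.

```python
import pandas as pd

# Toy rows with the same column names as clean_airdf (values are invented).
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2019-03-01", "2019-03-02", "2020-03-01", "2020-03-02"]),
    "City": ["Sydney"] * 4,
    "Specie": ["no2"] * 4,
    "median": [10.0, 12.0, 6.0, 8.0],
})
toy["year"] = toy["Date"].dt.year
toy["month"] = toy["Date"].dt.month

# One row per city/species/month, one column per year.
by_year = (
    toy.groupby(["City", "Specie", "month", "year"])["median"]
    .median()
    .unstack("year")
)
by_year["change"] = by_year[2020] - by_year[2019]
print(by_year)
```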

In [15]:
clean_airdf["Country"].unique()

array(['IR', 'TJ', 'BR', 'CN', 'DK', 'ES', 'ML', 'SK', 'XK', 'CL', 'DE',
       'KW', 'MM', 'PH', 'PK', 'PL', 'RU', 'SE', 'SG', 'AE', 'BA', 'CZ',
       'ID', 'IS', 'MO', 'RO', 'AR', 'AU', 'EC', 'GH', 'HK', 'PE', 'UA',
       'EE', 'FR', 'JP', 'MN', 'FI', 'IE', 'IL', 'KZ', 'LA', 'UZ', 'BD',
       'BE', 'GR', 'KR', 'LK', 'MK', 'MX', 'TR', 'AF', 'AT', 'GT', 'BO',
       'CR', 'JO', 'PR', 'SA', 'SV', 'CA', 'IT', 'NO', 'RE', 'TM', 'ZA',
       'BH', 'LT', 'TH', 'BG', 'CH', 'HU', 'MY', 'NL', 'NZ', 'UG', 'VN',
       'ET', 'GE', 'GN', 'IQ', 'RS', 'TW', 'CI', 'CO', 'CY', 'DZ', 'HR',
       'IN', 'KG', 'CW', 'GB', 'NP', 'PT', 'US'], dtype=object)

In [16]:
# Display an overview of the Country column
country_airdata_df = pd.DataFrame(clean_airdf["Country"].unique(), columns=["country_code"])
country_airdata_df

Unnamed: 0,country_code
0,IR
1,TJ
2,BR
3,CN
4,DK
...,...
90,CW
91,GB
92,NP
93,PT


There are 95 countries in the dataframe, including Australia (AU).

In [17]:
# Display an overview of the City column
clean_airdf["City"].unique()

array(['Isfahan', 'Arāk', 'Karaj', 'Qom', 'Orūmīyeh', 'Yazd', 'Īlām',
       'Kerman', 'Khorramshahr', 'Tabriz', 'Bandar Abbas', 'Sanandaj',
       'Kermanshah', 'Khorramabad', 'Shiraz', 'Zanjān', 'Mashhad',
       'Tehran', 'Dushanbe', 'São José dos Campos', 'Vitória',
       'São Paulo', 'Beijing', 'Jieyang', 'Kunming', 'Hangzhou',
       'Chongqing', 'Qingdao', 'Haikou', 'Ürümqi', 'Qiqihar', 'Guiyang',
       'Shenzhen', 'Yunfu', 'Xuchang', 'Yinchuan', 'Shenyang', 'Lhasa',
       'Shanghai', 'Changchun', 'Foshan', 'Nanning', 'Fushun', 'Hefei',
       'Chengdu', 'Hohhot', 'Qinhuangdao', 'Shijiazhuang', 'Shantou',
       'Zhengzhou', 'Nanjing', 'Xining', 'Xi’an', 'Zhuzhou', 'Wuhan',
       'Tianjin', 'Changzhou', 'Nanchang', 'Shiyan', 'Xinxiang', 'Suzhou',
       'Harbin', 'Lanzhou', 'Jinan', 'Changsha', 'Hegang', 'Anyang',
       'Wuxi', 'Taiyuan', 'Guangzhou', 'Fuzhou', 'Ningbo', 'Xiamen',
       'Dongguan', 'Copenhagen', 'Las Palmas de Gran Canaria',
       'Salamanca', 'Barcelona'

In [18]:
clean_airdf["City"].value_counts()

London          5075
Shijiazhuang    3612
Qinhuangdao     3502
Anyang          3482
Beijing         3443
                ... 
Almaty           132
Abidjan          115
Conakry          115
Accra             97
Zamboanga         60
Name: City, Length: 615, dtype: int64

There are 615 cities in our dataframe. Let's see which Australian cities are covered in the dataset.

In [19]:
clean_airdf.loc[clean_airdf["Country"]=="AU", "City"].value_counts()

Sydney        3364
Brisbane      3348
Melbourne     3284
Wollongong    3253
Darwin        3208
Adelaide      3177
Perth         3082
Newcastle     2871
Launceston    1126
Hobart        1126
Canberra      1093
Name: City, dtype: int64

## COVID-19 DATA EXPLORATION AND CLEAN UP

The COVID-19 data is sourced from https://covid19api.com/

In [20]:
country_url = "https://api.covid19api.com/countries"
country_covid_data = requests.get(country_url).json()
pprint(country_covid_data)

[{'Country': 'Nigeria', 'ISO2': 'NG', 'Slug': 'nigeria'},
 {'Country': 'Samoa', 'ISO2': 'WS', 'Slug': 'samoa'},
 {'Country': 'South Georgia and the South Sandwich Islands',
  'ISO2': 'GS',
  'Slug': 'south-georgia-and-the-south-sandwich-islands'},
 {'Country': 'British Indian Ocean Territory',
  'ISO2': 'IO',
  'Slug': 'british-indian-ocean-territory'},
 {'Country': 'Sao Tome and Principe',
  'ISO2': 'ST',
  'Slug': 'sao-tome-and-principe'},
 {'Country': 'Indonesia', 'ISO2': 'ID', 'Slug': 'indonesia'},
 {'Country': 'Benin', 'ISO2': 'BJ', 'Slug': 'benin'},
 {'Country': 'ALA Aland Islands', 'ISO2': 'AX', 'Slug': 'ala-aland-islands'},
 {'Country': 'Armenia', 'ISO2': 'AM', 'Slug': 'armenia'},
 {'Country': 'Nicaragua', 'ISO2': 'NI', 'Slug': 'nicaragua'},
 {'Country': 'Northern Mariana Islands',
  'ISO2': 'MP',
  'Slug': 'northern-mariana-islands'},
 {'Country': 'Andorra', 'ISO2': 'AD', 'Slug': 'andorra'},
 {'Country': 'Macedonia, Republic of', 'ISO2': 'MK', 'Slug': 'macedonia'},
 {'Country'

In [21]:
country_covid_df = pd.DataFrame(country_covid_data)
country_covid_df

Unnamed: 0,Country,Slug,ISO2
0,Nigeria,nigeria,NG
1,Samoa,samoa,WS
2,South Georgia and the South Sandwich Islands,south-georgia-and-the-south-sandwich-islands,GS
3,British Indian Ocean Territory,british-indian-ocean-territory,IO
4,Sao Tome and Principe,sao-tome-and-principe,ST
...,...,...,...
243,"Micronesia, Federated States of",micronesia,FM
244,Malawi,malawi,MW
245,Marshall Islands,marshall-islands,MH
246,"Tanzania, United Republic of",tanzania,TZ


In [22]:
# Merge countries available on the air quality data and the covid data
country_covid_air_df = pd.merge(
    country_airdata_df, country_covid_df, how="left", left_on="country_code", right_on="ISO2")
country_covid_air_df

Unnamed: 0,country_code,Country,Slug,ISO2
0,IR,"Iran, Islamic Republic of",iran,IR
1,TJ,Tajikistan,tajikistan,TJ
2,BR,Brazil,brazil,BR
3,CN,China,china,CN
4,DK,Denmark,denmark,DK
...,...,...,...,...
90,CW,,,
91,GB,United Kingdom,united-kingdom,GB
92,NP,Nepal,nepal,NP
93,PT,Portugal,portugal,PT


In [23]:
# Find country codes present in country_airdata_df but missing from country_covid_df
country_to_remove = country_covid_air_df[country_covid_air_df["ISO2"].isna()]["country_code"].tolist()
country_to_remove

['CW']
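
An alternative to checking for missing `ISO2` values is the `indicator` option of `pd.merge`, which adds a `_merge` column recording where each row came from. A minimal sketch on a three-row stand-in for the two country tables (the small frames below are illustrative, not the full data):

```python
import pandas as pd

# Small stand-ins for country_airdata_df and country_covid_df.
air_codes = pd.DataFrame({"country_code": ["IR", "CW", "GB"]})
covid_countries = pd.DataFrame({
    "Country": ["Iran, Islamic Republic of", "United Kingdom"],
    "Slug": ["iran", "united-kingdom"],
    "ISO2": ["IR", "GB"],
})

merged = pd.merge(air_codes, covid_countries, how="left",
                  left_on="country_code", right_on="ISO2", indicator=True)

# "left_only" marks codes that exist in the air data but not in the COVID list.
unmatched = merged.loc[merged["_merge"] == "left_only", "country_code"].tolist()
print(unmatched)
```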

In [24]:
final_airdf = clean_airdf[~clean_airdf["Country"].isin(country_to_remove)].copy()

In [25]:
final_airdf["Country"].nunique()

94

In [26]:
final_airdf.to_csv("air_data.csv", index=False)

In [27]:
country_covid_air_df

Unnamed: 0,country_code,Country,Slug,ISO2
0,IR,"Iran, Islamic Republic of",iran,IR
1,TJ,Tajikistan,tajikistan,TJ
2,BR,Brazil,brazil,BR
3,CN,China,china,CN
4,DK,Denmark,denmark,DK
...,...,...,...,...
90,CW,,,
91,GB,United Kingdom,united-kingdom,GB
92,NP,Nepal,nepal,NP
93,PT,Portugal,portugal,PT


In [28]:
del country_covid_air_df["ISO2"]

In [29]:
# Return a dataframe covering all countries in both the air quality data and covid-19 data.
# drop=True discards the old index instead of keeping it as an extra "index" column.
country_covid_air_df = country_covid_air_df[~country_covid_air_df["country_code"].isin(
    country_to_remove)].reset_index(drop=True)
country_covid_air_df

Unnamed: 0,country_code,Country,Slug
0,IR,"Iran, Islamic Republic of",iran
1,TJ,Tajikistan,tajikistan
2,BR,Brazil,brazil
3,CN,China,china
4,DK,Denmark,denmark
...,...,...,...
89,KG,Kyrgyzstan,kyrgyzstan
90,GB,United Kingdom,united-kingdom
91,NP,Nepal,nepal
92,PT,Portugal,portugal


In [30]:
# Explore one covid API - By Country Total All Status
covid_url_example = "https://api.covid19api.com/total/country/australia"
covid_data_example = requests.get(covid_url_example).json()
pprint(covid_data_example)

[{'Active': 0,
  'City': '',
  'CityCode': '',
  'Confirmed': 0,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-01-22T00:00:00Z',
  'Deaths': 0,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 0},
 {'Active': 0,
  'City': '',
  'CityCode': '',
  'Confirmed': 0,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-01-23T00:00:00Z',
  'Deaths': 0,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 0},
 {'Active': 0,
  'City': '',
  'CityCode': '',
  'Confirmed': 0,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-01-24T00:00:00Z',
  'Deaths': 0,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 0},
 {'Active': 0,
  'City': '',
  'CityCode': '',
  'Confirmed': 0,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-01-25T00:00:00Z',
  'Deaths': 0,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 0},
 {'Active': 4,
  'City': '',
  'CityCode': '',
  'Confirmed': 4,
  'Country': 'Australia',
  'Co

  'Date': '2020-04-01T00:00:00Z',
  'Deaths': 20,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 422},
 {'Active': 4572,
  'City': '',
  'CityCode': '',
  'Confirmed': 5116,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-04-02T00:00:00Z',
  'Deaths': 24,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 520},
 {'Active': 4653,
  'City': '',
  'CityCode': '',
  'Confirmed': 5330,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-04-03T00:00:00Z',
  'Deaths': 28,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 649},
 {'Active': 4819,
  'City': '',
  'CityCode': '',
  'Confirmed': 5550,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-04-04T00:00:00Z',
  'Deaths': 30,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 701},
 {'Active': 4895,
  'City': '',
  'CityCode': '',
  'Confirmed': 5687,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-04-05T00:00:00Z',
  'Deaths': 35,
  'Lat': '

  'Deaths': 102,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 6851},
 {'Active': 389,
  'City': '',
  'CityCode': '',
  'Confirmed': 7347,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-06-15T00:00:00Z',
  'Deaths': 102,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 6856},
 {'Active': 407,
  'City': '',
  'CityCode': '',
  'Confirmed': 7370,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-06-16T00:00:00Z',
  'Deaths': 102,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 6861},
 {'Active': 412,
  'City': '',
  'CityCode': '',
  'Confirmed': 7391,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-06-17T00:00:00Z',
  'Deaths': 102,
  'Lat': '0',
  'Lon': '0',
  'Province': '',
  'Recovered': 6877},
 {'Active': 429,
  'City': '',
  'CityCode': '',
  'Confirmed': 7409,
  'Country': 'Australia',
  'CountryCode': '',
  'Date': '2020-06-18T00:00:00Z',
  'Deaths': 102,
  'Lat': '0',
  'Lon': '0',
  'Province

The API example above covers COVID-19 data up to 4 July 2020. For the chosen country, it returns the daily number of confirmed, active, recovered, and death cases over the course of the pandemic. Hence, we'll call this API for every country in country_covid_air_df.

In [31]:
slug_list = country_covid_air_df["Slug"].tolist()
len(slug_list)

94

In [32]:
base_covid_url = "https://api.covid19api.com/total/country/"
    
country_list = list()
date_list = list()
active_list = list()
confirmed_list = list()
recovered_list = list()
deaths_list = list()

print("Beginning Data Retrieval")
print("-----------------------------------")

counter = 0
set_counter = 1

for slug in slug_list:
    
    try:
        response = requests.get(base_covid_url + slug).json()
    
        for element in response:
            country_list.append(element['Country'])
            date_list.append(element['Date'])
            active_list.append(element['Active'])
            confirmed_list.append(element['Confirmed'])
            recovered_list.append(element['Recovered'])
            deaths_list.append(element['Deaths'])

        counter += 1
        print(f"Processing Record {counter} of Set {set_counter} | {slug}")

        if counter == 50:
            set_counter += 1
            counter = 0

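    except (KeyError, TypeError):
        # TypeError covers payloads that are not a list of records (e.g. an error object)
        print(f"Country not found. Skipping {slug}...")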

print("-----------------------------------")
print("Data Retrieval Complete")
print("-----------------------------------")

Beginning Data Retrieval
-----------------------------------
Processing Record 1 of Set 1 | iran
Processing Record 2 of Set 1 | tajikistan
Processing Record 3 of Set 1 | brazil
Processing Record 4 of Set 1 | china
Processing Record 5 of Set 1 | denmark
Processing Record 6 of Set 1 | spain
Processing Record 7 of Set 1 | mali
Processing Record 8 of Set 1 | slovakia
Processing Record 9 of Set 1 | kosovo
Processing Record 10 of Set 1 | chile
Processing Record 11 of Set 1 | germany
Processing Record 12 of Set 1 | kuwait
Processing Record 13 of Set 1 | myanmar
Processing Record 14 of Set 1 | philippines
Processing Record 15 of Set 1 | pakistan
Processing Record 16 of Set 1 | poland
Processing Record 17 of Set 1 | russia
Processing Record 18 of Set 1 | sweden
Processing Record 19 of Set 1 | singapore
Processing Record 20 of Set 1 | united-arab-emirates
Processing Record 21 of Set 1 | bosnia-and-herzegovina
Processing Record 22 of Set 1 | czech-republic
Processing Record 23 of Set 1 | indonesi
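
The retrieval loop indexes each element of the response by key, which assumes the API returned a list of daily records. A minimal sketch of checking that shape first, so that an unexpected payload yields no rows instead of partially filled lists. `extract_rows` and the error payload shape are hypothetical; the sample record is copied from the Australia output printed earlier.

```python
FIELDS = ("Country", "Date", "Active", "Confirmed", "Recovered", "Deaths")


def extract_rows(payload):
    """Return one tuple per complete daily record, or [] for any other payload."""
    if not isinstance(payload, list):
        return []  # e.g. an error object rather than a list of records
    rows = []
    for element in payload:
        # Keep only dict records that carry every field we need.
        if isinstance(element, dict) and all(f in element for f in FIELDS):
            rows.append(tuple(element[f] for f in FIELDS))
    return rows


ok_payload = [{"Active": 389, "Confirmed": 7347, "Country": "Australia",
               "Date": "2020-06-15T00:00:00Z", "Deaths": 102, "Recovered": 6856}]
error_payload = {"message": "Not Found"}  # hypothetical error shape

print(extract_rows(ok_payload))
print(extract_rows(error_payload))
```

Collecting whole tuples per record, rather than appending to six separate lists, also keeps the columns aligned if one record turns out to be incomplete.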

In [33]:
covid_df = pd.DataFrame({
    "Country": country_list,
    "Date": date_list,
    "Active cases": active_list,
    "Confirmed cases": confirmed_list,
    "Recovered cases": recovered_list,
    "Deaths": deaths_list
})
covid_df.head()

Unnamed: 0,Country,Date,Active cases,Confirmed cases,Recovered cases,Deaths
0,"Iran, Islamic Republic of",2020-01-22T00:00:00Z,0,0,0,0
1,"Iran, Islamic Republic of",2020-01-23T00:00:00Z,0,0,0,0
2,"Iran, Islamic Republic of",2020-01-24T00:00:00Z,0,0,0,0
3,"Iran, Islamic Republic of",2020-01-25T00:00:00Z,0,0,0,0
4,"Iran, Islamic Republic of",2020-01-26T00:00:00Z,0,0,0,0


In [34]:
covid_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14685 entries, 0 to 14684
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Country          14685 non-null  object
 1   Date             14685 non-null  object
 2   Active cases     14685 non-null  int64 
 3   Confirmed cases  14685 non-null  int64 
 4   Recovered cases  14685 non-null  int64 
 5   Deaths           14685 non-null  int64 
dtypes: int64(4), object(2)
memory usage: 688.5+ KB


In [35]:
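# Convert the Date column to datetime format.
# The API returns UTC timestamps ending in "Z"; parse them as UTC, then drop the
# timezone so the column is plain datetime64[ns], like the air quality dates.
covid_df["Date"] = pd.to_datetime(covid_df["Date"], utc=True).dt.tz_localize(None)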

In [36]:
# Find the earliest date the covid dataset covers:
covid_df["Date"].min()

Timestamp('2020-01-22 00:00:00')

In [37]:
# Find the latest date the covid dataset covers:
covid_df["Date"].max()

Timestamp('2020-07-04 00:00:00')
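
The two datasets cover different date ranges, so any joint analysis is limited to the period where they overlap. A quick calculation using the minimum and maximum dates reported above for each dataset:

```python
from datetime import date

# Date ranges reported by the min()/max() cells in this notebook.
air_start, air_end = date(2018, 12, 31), date(2020, 7, 3)
covid_start, covid_end = date(2020, 1, 22), date(2020, 7, 4)

# The overlap starts at the later start and ends at the earlier end.
overlap_start = max(air_start, covid_start)
overlap_end = min(air_end, covid_end)
overlap_days = (overlap_end - overlap_start).days + 1  # inclusive of both ends

print(overlap_start, overlap_end, overlap_days)
```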

In [38]:
covid_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14685 entries, 0 to 14684
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Country          14685 non-null  object        
 1   Date             14685 non-null  datetime64[ns]
 2   Active cases     14685 non-null  int64         
 3   Confirmed cases  14685 non-null  int64         
 4   Recovered cases  14685 non-null  int64         
 5   Deaths           14685 non-null  int64         
dtypes: datetime64[ns](1), int64(4), object(1)
memory usage: 688.5+ KB
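
A quick sanity check on the row count: if every country contributes exactly one row per day (an assumption, not something the output confirms), the number of rows divided by the number of days gives the number of countries actually retrieved. The figures below are taken from the outputs above.

```python
from datetime import date

n_rows = 14685  # from covid_df.info()
first_day, last_day = date(2020, 1, 22), date(2020, 7, 4)
days_covered = (last_day - first_day).days + 1  # inclusive of both ends

countries_implied, remainder = divmod(n_rows, days_covered)
print(days_covered, countries_implied, remainder)
```

Under that assumption the result is lower than the 94 slugs requested, so it may be worth comparing `covid_df["Country"].nunique()` against `len(slug_list)` to see whether some slugs returned no rows.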


In [39]:
covid_df.to_csv("covid_data.csv", index=False)

## COVID-19 in AUSTRALIA DATA EXPLORATION AND CLEAN UP

In [40]:
au_covid_url = "https://interactive.guim.co.uk/docsdata/1q5gdePANXci8enuiS4oHUJxcxC13d6bjMRSicakychE.json"
au_covid_data = requests.get(au_covid_url).json()
pprint(au_covid_data)

{'sheets': {'about': [{'about': 'This data has been compiled by Guardian '
                                'Australia from official state and territory '
                                'media releases and websites. Some death dates '
                                'and figures are from media reports. We assign '
                                'cases to the date on which they were reported '
                                'by the health department, and deaths are '
                                'assigned to the date they occured. Extended '
                                'data on testing and demographics varies '
                                'between each state and territory so may not '
                                'always be available. Please contact '
                                'australia.coronatracking@theguardian.com if '
                                'you spot an error in the data or to make a '
                                'suggestion. This data is released

                       {'Date of death': '28/04/2020',
                        'Date reported': '28/04/2020',
                        'Death No (in state)': '40',
                        'Details': '90 year old female Anglicare Sydney '
                                   'resident, Newmarch house (10th death)',
                        'Name (if known)': '',
                        'Source': 'media release (Anglicare)',
                        'State': 'NSW'},
                       {'Date of death': '28/04/2020',
                        'Date reported': '28/04/2020',
                        'Death No (in state)': '41',
                        'Details': '89 year old female Anglicare Sydney '
                                   'resident, Newmarch House (11th death)',
                        'Name (if known)': '',
                        'Source': 'media release (Anglicare)',
                        'State': 'NSW'},
                       {'Date of death': '28/04/2020',
                 

                        {'Cumulative case count': '3',
                         'Cumulative deaths': '',
                         'Date': '04/02/2020',
                         'Hospitalisations (count)': '',
                         'Intensive care (count)': '',
                         'Notes': '',
                         'Recovered (cumulative)': '',
                         'State': 'QLD',
                         'Tests conducted (negative)': '',
                         'Tests conducted (total)': '',
                         'Time': '',
                         'Update Source': 'Queensland Health',
                         'Ventilator usage (count)': ''},
                        {'Cumulative case count': '4',
                         'Cumulative deaths': '',
                         'Date': '04/02/2020',
                         'Hospitalisations (count)': '',
                         'Intensive care (count)': '',
                         'Notes': '',
                         'R

                        {'Cumulative case count': '4',
                         'Cumulative deaths': '',
                         'Date': '08/03/2020',
                         'Hospitalisations (count)': '',
                         'Intensive care (count)': '',
                         'Notes': '',
                         'Recovered (cumulative)': '',
                         'State': 'WA',
                         'Tests conducted (negative)': '1665',
                         'Tests conducted (total)': '1669',
                         'Time': '',
                         'Update Source': '',
                         'Ventilator usage (count)': ''},
                        {'Cumulative case count': '12',
                         'Cumulative deaths': '',
                         'Date': '08/03/2020',
                         'Hospitalisations (count)': '',
                         'Intensive care (count)': '',
                         'Notes': '',
                         'Recovered 

                         'Ventilator usage (count)': ''},
                        {'Cumulative case count': '3',
                         'Cumulative deaths': '',
                         'Date': '18/03/2020',
                         'Hospitalisations (count)': '',
                         'Intensive care (count)': '',
                         'Notes': '',
                         'Recovered (cumulative)': '',
                         'State': 'ACT',
                         'Tests conducted (negative)': '',
                         'Tests conducted (total)': '',
                         'Time': '08:03',
                         'Update Source': 'ACT health email',
                         'Ventilator usage (count)': ''},
                        {'Cumulative case count': '3',
                         'Cumulative deaths': '',
                         'Date': '18/03/2020',
                         'Hospitalisations (count)': '',
                         'Intensive care (count)': '',
   

                         'Ventilator usage (count)': ''},
                        {'Cumulative case count': '493',
                         'Cumulative deaths': '',
                         'Date': '26/03/2020',
                         'Hospitalisations (count)': '',
                         'Intensive care (count)': '',
                         'Notes': '',
                         'Recovered (cumulative)': '',
                         'State': 'QLD',
                         'Tests conducted (negative)': '',
                         'Tests conducted (total)': '38860',
                         'Time': '',
                         'Update Source': 'media release',
                         'Ventilator usage (count)': ''},
                        {'Cumulative case count': '520',
                         'Cumulative deaths': '2',
                         'Date': '26/03/2020',
                         'Hospitalisations (count)': '',
                         'Intensive care (count)': '',
 

                         'Notes': '',
                         'Recovered (cumulative)': '',
                         'State': 'SA',
                         'Tests conducted (negative)': '',
                         'Tests conducted (total)': '',
                         'Time': '18:45',
                         'Update Source': 'national summary',
                         'Ventilator usage (count)': ''},
                        {'Cumulative case count': '400',
                         'Cumulative deaths': '',
                         'Date': '02/04/2020',
                         'Hospitalisations (count)': '2',
                         'Intensive care (count)': '',
                         'Notes': '',
                         'Recovered (cumulative)': '92',
                         'State': 'WA',
                         'Tests conducted (negative)': '15790',
                         'Tests conducted (total)': '16190',
                         'Time': '18:45',
                     

                         'Recovered (cumulative)': '368',
                         'State': 'QLD',
                         'Tests conducted (negative)': '',
                         'Tests conducted (total)': '',
                         'Time': '15:30',
                         'Update Source': 'press release/PM&C',
                         'Ventilator usage (count)': '12'},
                        {'Cumulative case count': '1265',
                         'Cumulative deaths': '14',
                         'Date': '11/04/2020',
                         'Hospitalisations (count)': '44',
                         'Intensive care (count)': '15',
                         'Notes': '',
                         'Recovered (cumulative)': '986',
                         'State': 'VIC',
                         'Tests conducted (negative)': '',
                         'Tests conducted (total)': '',
                         'Time': '10:00',
                         'Update Source': 'press rele

                         'Recovered (cumulative)': '426',
                         'State': 'WA',
                         'Tests conducted (negative)': '28343',
                         'Tests conducted (total)': '',
                         'Time': '',
                         'Update Source': 'daily snapshot/media release',
                         'Ventilator usage (count)': ''},
                        {'Cumulative case count': '2963',
                         'Cumulative deaths': '30',
                         'Date': '19/04/2020',
                         'Hospitalisations (count)': '249',
                         'Intensive care (count)': '22',
                         'Notes': '',
                         'Recovered (cumulative)': '',
                         'State': 'NSW',
                         'Tests conducted (negative)': '165663',
                         'Tests conducted (total)': '168626',
                         'Time': '20:00',
                         'Update Sou

                        {'Cumulative case count': '214',
                         'Cumulative deaths': '',
                         'Date': '28/04/2020',
                         'Hospitalisations (count)': '',
                         'Intensive care (count)': '',
                         'Notes': '',
                         'Recovered (cumulative)': '',
                         'State': 'TAS',
                         'Tests conducted (negative)': '',
                         'Tests conducted (total)': '',
                         'Time': '08:30',
                         'Update Source': 'national summary/press release '
                                          '(Anglicare)',
                         'Ventilator usage (count)': ''},
                        {'Cumulative case count': '1351',
                         'Cumulative deaths': '17',
                         'Date': '28/04/2020',
                         'Hospitalisations (count)': '23',
                         'Intensive 

                         'State': 'NT',
                         'Tests conducted (negative)': '',
                         'Tests conducted (total)': '5003',
                         'Time': '09:30',
                         'Update Source': 'https://coronavirus.nt.gov.au/current-status',
                         'Ventilator usage (count)': ''},
                        {'Cumulative case count': '1454',
                         'Cumulative deaths': '18',
                         'Date': '07/05/2020',
                         'Hospitalisations (count)': '8',
                         'Intensive care (count)': '6',
                         'Notes': '',
                         'Recovered (cumulative)': '',
                         'State': 'VIC',
                         'Tests conducted (negative)': '',
                         'Tests conducted (total)': '176500',
                         'Time': '10:00',
                         'Update Source': 'press conference',
                     

                        {'Cumulative case count': '1567',
                         'Cumulative deaths': '18',
                         'Date': '18/05/2020',
                         'Hospitalisations (count)': '9',
                         'Intensive care (count)': '5',
                         'Notes': '',
                         'Recovered (cumulative)': '1439',
                         'State': 'VIC',
                         'Tests conducted (negative)': '',
                         'Tests conducted (total)': '345000',
                         'Time': '15:37',
                         'Update Source': 'media release',
                         'Ventilator usage (count)': ''},
                        {'Cumulative case count': '30',
                         'Cumulative deaths': '',
                         'Date': '18/05/2020',
                         'Hospitalisations (count)': '',
                         'Intensive care (count)': '',
                         'Notes': '',
        

                         'Notes': '',
                         'Recovered (cumulative)': '1044',
                         'State': 'QLD',
                         'Tests conducted (negative)': '',
                         'Tests conducted (total)': '180371',
                         'Time': '16:20',
                         'Update Source': 'media release',
                         'Ventilator usage (count)': ''},
                        {'Cumulative case count': '228',
                         'Cumulative deaths': '',
                         'Date': '27/05/2020',
                         'Hospitalisations (count)': '2',
                         'Intensive care (count)': '0',
                         'Notes': '',
                         'Recovered (cumulative)': '205',
                         'State': 'TAS',
                         'Tests conducted (negative)': '',
                         'Tests conducted (total)': '26447',
                         'Time': '18:00',
               

                         'Ventilator usage (count)': ''},
                        {'Cumulative case count': '1061',
                         'Cumulative deaths': '',
                         'Date': '07/06/2020',
                         'Hospitalisations (count)': '2',
                         'Intensive care (count)': '1',
                         'Notes': '',
                         'Recovered (cumulative)': '1050',
                         'State': 'QLD',
                         'Tests conducted (negative)': '',
                         'Tests conducted (total)': '219422',
                         'Time': '12:00',
                         'Update Source': 'media release',
                         'Ventilator usage (count)': ''},
                        {'Cumulative case count': '108',
                         'Cumulative deaths': '',
                         'Date': '07/06/2020',
                         'Hospitalisations (count)': '0',
                         'Intensive care (c

                         'Cumulative deaths': '',
                         'Date': '21/06/2020',
                         'Hospitalisations (count)': '0',
                         'Intensive care (count)': '0',
                         'Notes': '',
                         'Recovered (cumulative)': '213',
                         'State': 'TAS',
                         'Tests conducted (negative)': '',
                         'Tests conducted (total)': '44217',
                         'Time': '18:00',
                         'Update Source': 'National/ '
                                          'https://coronavirus.tas.gov.au/facts/cases-and-testing-updates',
                         'Ventilator usage (count)': '0'},
                        {'Cumulative case count': '3151',
                         'Cumulative deaths': '',
                         'Date': '21/06/2020',
                         'Hospitalisations (count)': '46',
                         'Intensive care (count)': '0'

In [41]:
au_covid_data.keys()

dict_keys(['sheets'])

In [42]:
au_covid_data['sheets'].keys()

dict_keys(['updates', 'deaths', 'latest totals', 'locations', 'sources', 'about', 'data validation'])

In [43]:
covid_by_state = au_covid_data['sheets']['updates']
covid_by_state[-1]

{'State': 'VIC',
 'Date': '06/07/2020',
 'Time': '11:00',
 'Cumulative case count': '',
 'Cumulative deaths': '21',
 'Tests conducted (negative)': '',
 'Tests conducted (total)': '',
 'Hospitalisations (count)': '',
 'Intensive care (count)': '',
 'Ventilator usage (count)': '',
 'Recovered (cumulative)': '',
 'Update Source': '',
 'Notes': ''}

In [44]:
# Collect the fields of interest from each update record into parallel lists
state_list = list()
date_list = list()
cumulative_case_count = list()
cumulative_recovered_count = list()

for element in covid_by_state:
    state_list.append(element["State"])
    date_list.append(element["Date"])
    cumulative_case_count.append(element["Cumulative case count"])
    cumulative_recovered_count.append(element["Recovered (cumulative)"])

In [45]:
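# Build a DataFrame from the lists.
# Note: this rebinds au_covid_data, which previously held the raw JSON dict.
au_covid_data = pd.DataFrame({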
    "State": state_list,
    "Date": date_list,
    "Cumulative case count": cumulative_case_count,
    "Cumulative recovered count": cumulative_recovered_count
})
au_covid_data.head()

Unnamed: 0,State,Date,Cumulative case count,Cumulative recovered count
0,SA,23/01/2020,0,
1,VIC,25/01/2020,1,
2,NSW,25/01/2020,3,
3,NSW,27/01/2020,4,
4,QLD,28/01/2020,0,
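
The loop in cell 44 and the constructor in cell 45 could also be collapsed into one step, since `pd.DataFrame` accepts a list of dicts directly. A minimal sketch, assuming every record carries the keys shown in the output of cell 43. The `sample_updates` list and the name `updates_df` are hypothetical, and the values are made up for illustration, not taken from the feed:

```python
import pandas as pd

# Hypothetical stand-in for au_covid_data['sheets']['updates'] (illustrative values only)
sample_updates = [
    {"State": "AAA", "Date": "01/02/2020", "Cumulative case count": "5",
     "Recovered (cumulative)": "", "Notes": ""},
    {"State": "BBB", "Date": "02/02/2020", "Cumulative case count": "7",
     "Recovered (cumulative)": "1,234", "Notes": ""},
]

# Columns to keep, in the order they should appear
wanted = ["State", "Date", "Cumulative case count", "Recovered (cumulative)"]

# Build the frame from the records, keep the wanted columns, and rename one of them
updates_df = (
    pd.DataFrame(sample_updates)[wanted]
    .rename(columns={"Recovered (cumulative)": "Cumulative recovered count"})
)
print(updates_df)
```

This avoids keeping four parallel lists in sync, at the cost of briefly holding all thirteen fields of each record in memory.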


In [46]:
au_covid_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 865 entries, 0 to 864
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   State                       865 non-null    object
 1   Date                        865 non-null    object
 2   Cumulative case count       865 non-null    object
 3   Cumulative recovered count  865 non-null    object
dtypes: object(4)
memory usage: 27.2+ KB


In [47]:
# Convert the case counts from strings to numbers; empty strings become NaN
au_covid_data["Cumulative case count"] = pd.to_numeric(
    au_covid_data["Cumulative case count"])

In [48]:
# Remove thousands separators from the recovered counts, then convert to numbers
au_covid_data["Cumulative recovered count"] = pd.to_numeric(
    au_covid_data["Cumulative recovered count"].str.replace(",", ""))

In [49]:
# Convert the Date column to datetime format
au_covid_data['Date'] = pd.to_datetime(
    au_covid_data["Date"], format="%d/%m/%Y")

In [50]:
au_covid_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 865 entries, 0 to 864
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   State                       865 non-null    object        
 1   Date                        865 non-null    datetime64[ns]
 2   Cumulative case count       835 non-null    float64       
 3   Cumulative recovered count  427 non-null    float64       
dtypes: datetime64[ns](1), float64(2), object(1)
memory usage: 27.2+ KB
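
Cells 47 and 48 clean the two count columns slightly differently: only the recovered counts have commas stripped. One way to treat both columns the same is a small helper, sketched below. The function name `to_count` and the sample values are hypothetical, and `errors="coerce"` is an added choice that turns any unparseable value into NaN instead of raising:

```python
import pandas as pd


def to_count(series: pd.Series) -> pd.Series:
    """Strip thousands separators and convert to numbers; unparseable values become NaN."""
    cleaned = series.astype(str).str.replace(",", "", regex=False).str.strip()
    return pd.to_numeric(cleaned, errors="coerce")


# Illustrative values only
raw = pd.Series(["1,234", "56", "", "n/a"])
counts = to_count(raw)
print(counts)
```

Applied to the notebook, the same helper could be mapped over both columns, for example with `au_covid_data[cols].apply(to_count)`, where `cols` lists the two count column names.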


In [51]:
# Replace missing counts with 0 (fillna already returns a new DataFrame, so .copy() is not needed)
clean_au_covid = au_covid_data.fillna(0)

In [52]:
clean_au_covid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 865 entries, 0 to 864
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   State                       865 non-null    object        
 1   Date                        865 non-null    datetime64[ns]
 2   Cumulative case count       865 non-null    float64       
 3   Cumulative recovered count  865 non-null    float64       
dtypes: datetime64[ns](1), float64(2), object(1)
memory usage: 27.2+ KB


In [53]:
# Cast the count columns from float to int now that no NaN values remain
count_columns = ["Cumulative case count", "Cumulative recovered count"]
clean_au_covid[count_columns] = clean_au_covid[count_columns].astype(int)

In [54]:
clean_au_covid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 865 entries, 0 to 864
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   State                       865 non-null    object        
 1   Date                        865 non-null    datetime64[ns]
 2   Cumulative case count       865 non-null    int32         
 3   Cumulative recovered count  865 non-null    int32         
dtypes: datetime64[ns](1), int32(2), object(1)
memory usage: 20.4+ KB
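
Filling gaps with 0 means a cumulative series can appear to fall back to zero on any date where a state did not report a figure. An alternative, sketched here under the assumption that a blank means "not reported" rather than "zero", carries the last reported value forward within each state. The `demo` frame is made-up illustrative data:

```python
import pandas as pd

# Illustrative data only, not rows from the dataset
demo = pd.DataFrame({
    "State": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03",
                            "2020-03-01", "2020-03-02"]),
    "Cumulative case count": [10.0, None, 15.0, None, 4.0],
})

# Order rows so "forward" means forward in time within each state
demo = demo.sort_values(["State", "Date"])

# Carry the last reported value forward, separately for each state
demo["Cumulative case count"] = (
    demo.groupby("State")["Cumulative case count"].ffill()
)

# Before a state's first report there is nothing to carry forward; treat that as 0,
# then store the result as pandas' nullable integer type
demo["Cumulative case count"] = (
    demo["Cumulative case count"].fillna(0).astype("Int64")
)
print(demo)
```

Whether forward-filling or zero-filling is appropriate depends on how the counts are used later, for instance whether they are plotted as a running total or differenced into daily figures.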


In [55]:
# Find the earliest date the covid dataset covers:
clean_au_covid["Date"].min()

Timestamp('2020-01-23 00:00:00')

In [56]:
# Find the latest date the covid dataset covers:
clean_au_covid["Date"].max()

Timestamp('2020-07-06 00:00:00')
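
The overall minimum and maximum do not show whether every state covers the same period. A minimal sketch of a per-state coverage check follows. The `demo` frame and the name `coverage` are hypothetical, and the dates are illustrative only:

```python
import pandas as pd

# Illustrative data only, not rows from the dataset
demo = pd.DataFrame({
    "State": ["AAA", "AAA", "BBB", "BBB"],
    "Date": pd.to_datetime(["2020-01-23", "2020-07-06",
                            "2020-02-01", "2020-06-30"]),
})

# Earliest and latest reported date for each state
coverage = demo.groupby("State")["Date"].agg(["min", "max"])
print(coverage)
```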

In [57]:
clean_au_covid.to_csv("au_covid_data.csv", index=False)
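
A CSV file does not record column dtypes, so the `Date` column comes back as plain strings unless it is parsed on load. The sketch below shows the round trip using an in-memory buffer in place of the real file. The `demo` frame is illustrative data and `reloaded` is a hypothetical name:

```python
import io

import pandas as pd

# Illustrative data only, not rows from the dataset
demo = pd.DataFrame({
    "State": ["AAA"],
    "Date": pd.to_datetime(["2020-03-01"]),
    "Cumulative case count": [10],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype that the CSV format does not store
reloaded = pd.read_csv(buffer, parse_dates=["Date"])
print(reloaded.dtypes)
```

When reading the saved file itself, the same `parse_dates=["Date"]` argument would be passed to `pd.read_csv("au_covid_data.csv", ...)`.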

In [None]:
# import time
# from scipy.stats import linregress

# Incorporated citipy to determine city based on latitude and longitude
# from citipy import citipy

In [None]:
# url = "https://api.openaq.org/v1/measurements"

# data = requests.get(url).json()
# pprint(data)

In [None]:
# len(data["results"])

In [None]:
# data["results"][0]

In [None]:
# Data source: https://aqicn.org/api/
# base_url = "https://api.waqi.info/feed/"
# city = "melbourne"
# url = f"{base_url}{city}/?token={api_key}"

In [None]:
# url_2 = "https://api.covid19api.com/country/south-africa/status/confirmed/live"
# covid_data_2 = requests.get(url_2).json()
# pprint(covid_data_2[-1])