<div style="background: #000;
            color: #FFF;
            margin: 0px;
                padding: 10px 0px 20px 0px;
            text-align: center; 
                ">
    <h1>Week 8 - Project</h1>
</div>

#### Write a Python script that:
- scrapes the following webpage: https://en.wikipedia.org/wiki/Template:COVID-19_pandemic_data
- parses the HTML for the table of COVID-19 data
- fills missing values with `None`
- calculates the death and recovery rate for each country
- writes the data in CSV format to a file called `{your-name}-covid-report.csv`
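The task does not define the two rates. The notebook below computes each one as a fraction of confirmed cases, rounded to three decimal places. Worked through with the United States row from the notebook's printed output:

$$
\text{death\_rate} = \frac{\text{deaths}}{\text{cases}} = \frac{249730}{11116640} \approx 0.022,
\qquad
\text{recovery\_rate} = \frac{\text{recoveries}}{\text{cases}} = \frac{5946496}{11116640} \approx 0.535
$$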

Your CSV file should contain the following header row (and the corresponding data):

```csv
country,cases,deaths,recoveries,death_rate,recovery_rate
```
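One way to guarantee that the header and the column order match this spec is a `csv.DictWriter` keyed on the required names. The sketch below writes to an in-memory buffer instead of the real file. The record's values are copied from the United States row of the printed output further down. `FIELDNAMES` and `sample_buffer` are illustrative names, not part of the original notebook.

```python
import csv
import io

# Column order required by the task
FIELDNAMES = ['country', 'cases', 'deaths', 'recoveries', 'death_rate', 'recovery_rate']

# One record, taken from the notebook's printed output (United States)
sample_record = {
    'country': 'United States',
    'cases': 11116640,
    'deaths': 249730,
    'recoveries': 5946496,
    'death_rate': 0.022,
    'recovery_rate': 0.535,
}

sample_buffer = io.StringIO()  # stands in for the real output file
writer = csv.DictWriter(sample_buffer, fieldnames=FIELDNAMES)
writer.writeheader()           # writes exactly the header row above
writer.writerow(sample_record)

print(sample_buffer.getvalue())
```

With the default `extrasaction='raise'`, `DictWriter` raises `ValueError` if a record contains a key that is not in `fieldnames`. This catches typos in column names early.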

Please ask for clarification if needed.


```python
import csv

import requests
from bs4 import BeautifulSoup as bs

url = 'https://en.wikipedia.org/wiki/Template:COVID-19_pandemic_data'
```

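```python
# fetch the page and stop on an HTTP error
r = requests.get(url, timeout=30)
r.raise_for_status()

# name the parser explicitly (bs4 warns when none is given)
soup = bs(r.text, 'html.parser')

# get the COVID-19 data table by its id
table = soup.find('table', id='thetable')
if table is None:
    raise ValueError("no table with id='thetable'; the page layout may have changed")
```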
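`find()` returns `None` rather than raising when nothing matches. Without a check, a renamed table id would only surface later as an `AttributeError` on `table.find('tbody')`. The offline sketch below uses hand-written, heavily simplified markup (an assumption, not the real page) to show both outcomes. The `sample_` names are chosen so that running the sketch does not overwrite the notebook's `soup` or `table`.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup; the real page is much larger
sample_html = """
<table id="other"><tbody><tr><td>ignore me</td></tr></tbody></table>
<table id="thetable"><tbody><tr><td>COVID-19 data</td></tr></tbody></table>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')

# find() returns the first matching Tag, or None when nothing matches
found = sample_soup.find('table', id='thetable')
missing = sample_soup.find('table', id='no-such-table')

print(found.get_text(strip=True))  # COVID-19 data
print(missing)                     # None
```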

```python
# parse table for body and row elements
tbody = table.find('tbody')
rows = tbody.find_all('tr')  # list of all table rows
sub_rows = rows[2:-2]  # drop the header rows at the top and the non-data rows at the bottom
```
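Calling a BeautifulSoup `Tag` directly, as in `tbody('tr')`, is shorthand for `tbody.find_all('tr')`. The `[2:-2]` slice hard-codes how many header and footer rows the live table has, so it breaks silently if Wikipedia changes the layout. The sketch below uses a made-up table with two leading and two trailing non-data rows (an assumption chosen to match the slice) to show what survives.

```python
from bs4 import BeautifulSoup

# Hypothetical table: 2 header rows, 3 data rows, 2 trailing non-data rows
sample_html = """
<table><tbody>
<tr><th>Location</th></tr>
<tr><th>Cases</th></tr>
<tr><td>A</td></tr>
<tr><td>B</td></tr>
<tr><td>C</td></tr>
<tr><td>Notes</td></tr>
<tr><td>Sources</td></tr>
</tbody></table>
"""

sample_tbody = BeautifulSoup(sample_html, 'html.parser').find('tbody')

# Calling a Tag is shorthand for find_all()
sample_rows = sample_tbody('tr')
assert sample_rows == sample_tbody.find_all('tr')

# Same slice as the notebook: drop the first two and the last two rows
data_rows = sample_rows[2:-2]
print([row.get_text(strip=True) for row in data_rows])  # ['A', 'B', 'C']
```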

```python
# helper: parse a cell such as "1,234" into an int, or None if it holds no number
def convert_cell(cell):
    try:
        return int(cell.text.strip().replace(',', ''))
    except ValueError:
        # e.g. "No data" or an empty cell
        return None

# returns part/whole rounded to 3 decimal places,
# or None if either input is missing or the denominator is zero
def calculate_rate(part, whole):
    try:
        return round(part / whole, 3)
    except (TypeError, ZeroDivisionError):
        return None
```
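The two helpers can be checked without fetching the page. The sketch below uses hypothetical string-only stand-ins, `parse_count` (the same logic as `convert_cell`, but taking text instead of a Tag) and `safe_rate` (the same logic as `calculate_rate`). They are named differently so that running them does not shadow the notebook's helpers. The sample numbers are Spain's from the printed output.

```python
def parse_count(text):
    """Hypothetical string-only twin of convert_cell: ' 1,234 ' -> 1234, else None."""
    try:
        return int(text.strip().replace(',', ''))
    except ValueError:
        return None


def safe_rate(part, whole):
    """Hypothetical twin of calculate_rate: part / whole to 3 decimals, or None."""
    try:
        return round(part / whole, 3)
    except (TypeError, ZeroDivisionError):
        return None


print(parse_count(' 1,458,591 '))     # 1458591
print(parse_count('No data'))          # None
print(safe_rate(40769, 1458591))       # 0.028
print(safe_rate(None, 1458591))        # None
print(safe_rate(5, 0))                 # None
```

One consequence of this approach: a cell carrying a footnote marker, such as `1,234[a]`, also fails `int()` and is therefore recorded as missing.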

```python
all_data = []

for row in sub_rows:
    # get country name
    country = row('th', scope='row')[1].find('a').text

    # first three data cells: cases, deaths, recoveries
    cells = row('td')[:3]
    cases = convert_cell(cells[0])
    deaths = convert_cell(cells[1])
    recoveries = convert_cell(cells[2])

    # calculate death and recovery rate
    death_rate = calculate_rate(deaths, cases)
    recovery_rate = calculate_rate(recoveries, cases)

    # store one record per country in the global list
    record = [country, cases, deaths, recoveries, death_rate, recovery_rate]
    all_data.append(record)
```
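The loop relies on a specific row layout: at least two `<th scope="row">` cells, with the country link in the second, followed by the counts in the first three `<td>` cells. The sketch below runs the same selectors on one hand-written row. The markup and the `No data` placeholder are assumptions modeled on what the selectors expect, not copied from Wikipedia. The numbers are Spain's from the printed output below. If a real row lacks the second header cell, `[1]` raises `IndexError`; if it lacks the link, `.find('a')` returns `None` and `.text` raises `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical row shaped the way the loop's selectors expect:
# two <th scope="row"> cells (country link in the second), then three <td> counts
sample_html = """
<table><tr>
  <th scope="row"><img alt="flag" src="flag.png"/></th>
  <th scope="row"><a href="/wiki/Spain">Spain</a></th>
  <td>1,458,591</td><td>40,769</td><td>No data</td>
</tr></table>
"""

sample_row = BeautifulSoup(sample_html, 'html.parser').find('tr')

# Same selectors as the loop
sample_country = sample_row('th', scope='row')[1].find('a').text

sample_counts = []
for cell in sample_row('td')[:3]:
    try:
        sample_counts.append(int(cell.text.strip().replace(',', '')))
    except ValueError:
        sample_counts.append(None)  # missing value, as the task requires

print(sample_country, sample_counts)  # Spain [1458591, 40769, None]
```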

```python
for data in all_data:
    print(data)
```

```
['United States', 11116640, 249730, 5946496, 0.022, 0.535]
['India', 8814579, 129635, 8205728, 0.015, 0.931]
['Brazil', 5863093, 165811, 5291511, 0.028, 0.903]
['France', 1981827, 44548, 139810, 0.022, 0.071]
['Russia', 1925825, 33186, 1439985, 0.017, 0.748]
['Spain', 1458591, 40769, None, 0.028, None]
['United Kingdom', 1369318, 51934, None, 0.038, None]
['Argentina', 1310478, 35436, 1129088, 0.027, 0.862]
['Colombia', 1198746, 34031, 1104956, 0.028, 0.922]
['Italy', 1178529, 45229, 420810, 0.038, 0.357]
['Mexico', 1006522, 98542, 750190, 0.098, 0.745]
['Peru', 937011, 35231, 863120, 0.038, 0.921]
['Germany', 802944, 12523, 502278, 0.016, 0.626]
['Iran', 762068, 41493, 558818, 0.054, 0.733]
['South Africa', 751024, 20241, 693467, 0.027, 0.923]
['Poland', 712972, 10348, 294783, 0.015, 0.413]
['Ukraine', 535857, 9603, 241444, 0.018, 0.451]
['Belgium', 535939, 14421, None, 0.027, None]
['Chile', 531273, 14819, 506700, 0.028, 0.954]
['Iraq', 519152, 11670, 447039, 0.022, 0.861]
['Indonesi
```

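```python
# the task asks for {your-name}-covid-report.csv; replace the placeholder with your name
your_name = 'your-name'
filename = f'{your_name}-covid-report.csv'

# create output csv (utf-8 so non-ASCII country names are written safely)
with open(filename, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)

    # add header row
    header = ['country', 'cases', 'deaths', 'recoveries', 'death_rate', 'recovery_rate']
    writer.writerow(header)

    # add table data
    writer.writerows(all_data)
```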
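Note that `csv.writer` writes `None` as an empty field. The missing values filled with `None` in Python therefore appear as blanks in the file, not as the text `None`. The round-trip check below writes to a temporary directory so it cannot overwrite the real report. The two records are copied from the printed output. The filename inside the temporary directory is a placeholder.

```python
import csv
import os
import tempfile

# Two records copied from the printed output; Spain has missing values
sample_records = [
    ['United States', 11116640, 249730, 5946496, 0.022, 0.535],
    ['Spain', 1458591, 40769, None, 0.028, None],
]
sample_header = ['country', 'cases', 'deaths', 'recoveries', 'death_rate', 'recovery_rate']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'your-name-covid-report.csv')  # placeholder name

    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(sample_header)
        writer.writerows(sample_records)

    # Read it back: every value returns as a string, and None became ''
    with open(path, newline='', encoding='utf-8') as f:
        read_back = list(csv.reader(f))

print(read_back[0])
print(read_back[2])  # ['Spain', '1458591', '40769', '', '0.028', '']
```

If the literal text `None` is required in the file, one option is to convert each missing value to the string `'None'` before calling `writerows`. Which form the assignment expects is an assumption worth confirming.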