# Fetching WHO's situation reports on COVID-19 as DataFrames

## Get the data

In [1]:
from who_covid_scraper import WHOCovidScraper
scraper = WHOCovidScraper('https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports')
file_save_location = './data'

## Inspect the scraped report index

In [2]:
scraper.df

| | Report_ID | Date | Link |
|---|---|---|---|
| 0 | 71 | 2020-03-31 | https://www.who.int/docs/default-source/corona... |
| 1 | 70 | 2020-03-30 | https://www.who.int/docs/default-source/corona... |
| 2 | 69 | 2020-03-29 | https://www.who.int/docs/default-source/corona... |
| 3 | 69 | 2020-03-29 | https://www.who.int/docs/default-source/corona... |
| 4 | 68 | 2020-03-28 | https://www.who.int/docs/default-source/corona... |
| ... | ... | ... | ... |
| 70 | 5 | 2020-01-25 | https://www.who.int/docs/default-source/corona... |
| 71 | 4 | 2020-01-24 | https://www.who.int/docs/default-source/corona... |
| 72 | 3 | 2020-01-23 | https://www.who.int/docs/default-source/corona... |
| 73 | 2 | 2020-01-22 | https://www.who.int/docs/default-source/corona... |
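The index above lists report 69 twice (rows 2 and 3). If one row per report is wanted, something like the following sketch may help. It runs on a small hand-built stand-in for `scraper.df`; the link values are placeholders, not real WHO URLs, and which of the two rows to keep is a choice you would need to check against the actual links.

```python
import pandas as pd

# Stand-in for the first rows of scraper.df; links are placeholders only.
index_df = pd.DataFrame(
    {
        "Report_ID": [71, 70, 69, 69, 68],
        "Date": ["2020-03-31", "2020-03-30", "2020-03-29", "2020-03-29", "2020-03-28"],
        "Link": ["link-71", "link-70", "link-69-a", "link-69-b", "link-68"],
    }
)

# Keep the first row seen for each report number and renumber the index.
unique_reports = (
    index_df.drop_duplicates(subset="Report_ID", keep="first")
    .reset_index(drop=True)
)
print(unique_reports)
```

On the real data the same call would be applied to `scraper.df` directly.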


## Download for a given date

In [3]:
download = scraper.download_for_date(datearg='21st of March', folder=file_save_location)

report for date 2020/03/21 downloaded at ./data/20200321-sitrep-61-covid-19.pdf
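The saved filename carries both the report date and the report number. A minimal sketch for recovering them is shown below; it assumes every file follows the `YYYYMMDD-sitrep-N-covid-19.pdf` pattern seen in the outputs of this notebook, and `parse_sitrep_name` is a hypothetical helper, not part of `who_covid_scraper`.

```python
import re
from datetime import date, datetime
from pathlib import Path

# Pattern inferred from the filenames printed in this notebook (an assumption).
SITREP_NAME = re.compile(r"^(?P<stamp>\d{8})-sitrep-(?P<number>\d+)-covid-19$")


def parse_sitrep_name(path: str) -> tuple[date, int]:
    """Return (report date, report number) parsed from a sitrep filename."""
    match = SITREP_NAME.match(Path(path).stem)
    if match is None:
        raise ValueError(f"unexpected sitrep filename: {path!r}")
    report_date = datetime.strptime(match["stamp"], "%Y%m%d").date()
    return report_date, int(match["number"])


report_date, report_number = parse_sitrep_name("./data/20200321-sitrep-61-covid-19.pdf")
print(report_date.isoformat(), report_number)
```

This can be handy later for tagging each extracted table with the date of the report it came from.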


## Send to Parsr for extraction

In [4]:
job = scraper.send_document_to_parsr(download['file'])

> Polling server for the job 1875d3da7ea94671791b97d975dc1f...
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Job done!


In [5]:
job

{'file': './data/20200321-sitrep-61-covid-19.pdf',
 'config': 'defaultConfig.json',
 'status_code': 202,
 'server_response': '1875d3da7ea94671791b97d975dc1f'}
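
The dictionary above holds the HTTP status of the upload and, in `server_response`, the job identifier that the next cell passes to `assemble_data`. A small guard before moving on might look like the sketch below. It assumes that status 202 (HTTP "Accepted") means the document was queued and that any other status means `server_response` is not a usable job ID; `job_id_from` is a hypothetical helper.

```python
def job_id_from(job: dict) -> str:
    """Return the Parsr job ID from a job dict shaped like the output above.

    Assumption: a 202 status means the upload was accepted and
    'server_response' then contains the job ID.
    """
    if job.get("status_code") != 202:
        raise RuntimeError(
            f"Parsr did not accept {job.get('file')!r}: {job.get('server_response')!r}"
        )
    return job["server_response"]


# Values copied from the job output shown above.
example_job = {
    "file": "./data/20200321-sitrep-61-covid-19.pdf",
    "config": "defaultConfig.json",
    "status_code": 202,
    "server_response": "1875d3da7ea94671791b97d975dc1f",
}
print(job_id_from(example_job))
```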

## Fetch all tabular data and assemble it into one DataFrame

In [6]:
data = scraper.assemble_data(job['server_response'])
data

| | Reporting Country/ Territory/Area† | Total confirmed ‡ cases | Total confirmed new cases | Total deaths | Total new deaths | Transmission classification§ | Days since last reported case |
|---|---|---|---|---|---|---|---|
| 0 | Western Pacific Region | | | | | | |
| 1 | China | 81416 | 116 | 3261 | 8 | Local transmission | 0 |
| 2 | Republic of Korea | 8799 | 147 | 102 | 8 | Local transmission | 0 |
| 3 | Malaysia | 1030 | 130 | 3 | 1 | Local transmission | 0 |
| 4 | Japan | 996 | 46 | 35 | 2 | Local transmission | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 192 | Mayotte | 4 | 0 | 0 | 0 | Imported cases only | 1 |
| 193 | Subtotal for all regions | 265361 | 32000 | 11176 | 1343 | | |
| 194 | International conveyance | 712 | 0 | 7 | 0 | Local transmission | 5 |
| 195 | (Diamond Princess) | | | | | | |
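
The assembled table mixes country rows with region headings (row 0), a subtotal (row 193) and a label continuation (row 195). One possible clean-up is sketched below on a few rows copied from the output above. It assumes that rows without a case count are headings or continuations and that subtotal labels start with "Subtotal"; whether `data` stores the blanks as `NaN` or as empty strings is not shown here, and `pd.to_numeric(..., errors="coerce")` treats both as missing.

```python
import pandas as pd

name_col = "Reporting Country/ Territory/Area†"
cases_col = "Total confirmed ‡ cases"

# Stand-in for a few rows of `data`, copied from the output above.
sample = pd.DataFrame(
    {
        name_col: [
            "Western Pacific Region",
            "China",
            "Republic of Korea",
            "Subtotal for all regions",
            "International conveyance",
            "(Diamond Princess)",
        ],
        cases_col: [None, 81416, 8799, 265361, 712, None],
    }
)

# Rows without a case count are region headings or label continuations.
has_counts = pd.to_numeric(sample[cases_col], errors="coerce").notna()
# Subtotal rows aggregate other rows, so leave them out as well.
is_subtotal = sample[name_col].str.startswith("Subtotal")

countries = sample[has_counts & ~is_subtotal].reset_index(drop=True)
print(countries)
```

Note that this drops the "(Diamond Princess)" fragment rather than merging it back into the "International conveyance" label.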


## Download all the reports

In [3]:
downloaded_files = scraper.download_everything(folder=file_save_location)

report ./data/20200331-sitrep-71-covid-19.pdf already exists. didn't re-download
report ./data/20200330-sitrep-70-covid-19.pdf already exists. didn't re-download
report ./data/20200329-sitrep-69-covid-19.pdf already exists. didn't re-download
report ./data/20200328-sitrep-68-covid-19.pdf already exists. didn't re-download
report ./data/20200327-sitrep-67-covid-19.pdf already exists. didn't re-download
report ./data/20200326-sitrep-66-covid-19.pdf already exists. didn't re-download
report ./data/20200325-sitrep-65-covid-19.pdf already exists. didn't re-download
report ./data/20200324-sitrep-64-covid-19.pdf already exists. didn't re-download
report ./data/20200323-sitrep-63-covid-19.pdf already exists. didn't re-download
report ./data/20200322-sitrep-62-covid-19.pdf already exists. didn't re-download
report ./data/20200321-sitrep-61-covid-19.pdf already exists. didn't re-download
report ./data/20200320-sitrep-60-covid-19.pdf already exists. didn't re-download
report ./data/20200319-sitre

## Extract and save all DataFrames as CSVs

In [None]:
from pathlib import Path

for file in downloaded_files:
    # Send each PDF to Parsr, then assemble its tables into one DataFrame.
    job = scraper.send_document_to_parsr(file)
    data = scraper.assemble_data(job['server_response'])
    # Save next to the PDF with a .csv extension; the content is tab-separated.
    file_csv = Path(file).with_suffix('.csv')
    data.to_csv(file_csv, sep='\t', encoding='utf-8', index=False)

> Polling server for the job 70d4120f6400138bd70bc955cab831...
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
>> Progress percentage: 0
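
Because the loop above writes tab-separated content into files with a `.csv` extension, reading them back needs the same separator. The sketch below round-trips a tiny made-up frame through a temporary directory to show the matching `read_csv` call; the column names and the file name are illustrative only.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Tiny illustrative frame; the real frames come from scraper.assemble_data.
frame = pd.DataFrame({"Reporting Country": ["China", "Japan"], "Total deaths": [3261, 35]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "20200321-sitrep-61-covid-19.csv"
    # Same options as the loop above: tab separator, UTF-8, no index column.
    frame.to_csv(path, sep="\t", encoding="utf-8", index=False)
    # Reading back must use the tab separator, despite the .csv extension.
    restored = pd.read_csv(path, sep="\t", encoding="utf-8")

print(restored)
```

Leaving out `sep="\t"` on the read side would load each line as a single column.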
