### Exploring data from datahub.io

This notebook pulls data from the [datasets/covid-19 repository](https://github.com/datasets/covid-19), with the aim of eventually automating the integration of that data into Wikidata.

Some open questions and notes (jvfe):
- How should the data be referenced? Should [datahub.io](https://datahub.io/core/covid-19) be cited as the source?
    - datahub.io aggregates the data from various sources.

- I retrieved the country outbreak items with the following query and adjusted them slightly so they merge more cleanly.
```
SELECT ?item ?itemLabel ?countryid ?countryidLabel
WHERE 
{
  ?item p:P31 ?statement. 
      ?statement ps:P31 wd:Q3241045. 
      ?statement pq:P642 wd:Q84263196.
      ?statement pq:P3005 ?countryid.
      ?countryid wdt:P31 wd:Q6256.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```
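
A minimal sketch of how the query above could be run from Python instead of by hand, assuming the public Wikidata Query Service endpoint at `https://query.wikidata.org/sparql`. The helper names `flatten_bindings` and `fetch_outbreak_items` and the User-Agent string are hypothetical and not part of the original notebook. The sample response is hand-written in the SPARQL JSON results format, using the Afghanistan item that appears in the tables in this notebook, so the flattening step can be checked without network access.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?item ?itemLabel ?countryid ?countryidLabel
WHERE
{
  ?item p:P31 ?statement.
  ?statement ps:P31 wd:Q3241045.
  ?statement pq:P642 wd:Q84263196.
  ?statement pq:P3005 ?countryid.
  ?countryid wdt:P31 wd:Q6256.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def flatten_bindings(results):
    """Turn SPARQL JSON results into a list of flat dicts, one per row."""
    rows = []
    for binding in results["results"]["bindings"]:
        row = {}
        for name, cell in binding.items():
            value = cell["value"]
            # Entity URIs look like http://www.wikidata.org/entity/Q889; keep only the QID.
            if cell.get("type") == "uri":
                value = value.rsplit("/", 1)[-1]
            row[name] = value
        rows.append(row)
    return rows


def fetch_outbreak_items(query=QUERY):
    """Run the query against the public endpoint (needs network access)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Hypothetical User-Agent; replace it with one that identifies your own tool.
    request = urllib.request.Request(url, headers={"User-Agent": "covid-notebook-sketch/0.1 (example)"})
    with urllib.request.urlopen(request) as response:
        return flatten_bindings(json.load(response))


# Offline demonstration with a hand-written response.
sample = {
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q87768605"},
                "itemLabel": {"type": "literal", "value": "2020 coronavirus pandemic in Afghanistan"},
                "countryid": {"type": "uri", "value": "http://www.wikidata.org/entity/Q889"},
                "countryidLabel": {"type": "literal", "value": "Afghanistan"},
            }
        ]
    }
}
rows = flatten_bindings(sample)
print(rows)
```

Calling `fetch_outbreak_items()` would return the same row structure for the live data; renaming `countryidLabel` to `Country` before saving would then give a column that matches the dataset's country names.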

In [1]:
import pandas as pd

In [8]:
from datetime import date, timedelta
yesterday = date.today() - timedelta(days=1)
today = date.today()

In [2]:
countries = pd.read_csv("https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv")
wdt_items = pd.read_csv("../data/country_outbreaks.csv")

In [3]:
full = pd.merge(countries, wdt_items, on="Country")
full

Unnamed: 0,Date,Country,Confirmed,Recovered,Deaths,item,itemLabel,countryid
0,2020-01-22,Afghanistan,0,0,0,Q87768605,2020 coronavirus pandemic in Afghanistan,Q889
1,2020-01-23,Afghanistan,0,0,0,Q87768605,2020 coronavirus pandemic in Afghanistan,Q889
2,2020-01-24,Afghanistan,0,0,0,Q87768605,2020 coronavirus pandemic in Afghanistan,Q889
3,2020-01-25,Afghanistan,0,0,0,Q87768605,2020 coronavirus pandemic in Afghanistan,Q889
4,2020-01-26,Afghanistan,0,0,0,Q87768605,2020 coronavirus pandemic in Afghanistan,Q889
...,...,...,...,...,...,...,...,...
11695,2020-04-16,Zimbabwe,23,1,3,Q88164033,2020 coronavirus pandemic in Zimbabwe,Q954
11696,2020-04-17,Zimbabwe,24,2,3,Q88164033,2020 coronavirus pandemic in Zimbabwe,Q954
11697,2020-04-18,Zimbabwe,25,2,3,Q88164033,2020 coronavirus pandemic in Zimbabwe,Q954
11698,2020-04-19,Zimbabwe,25,2,3,Q88164033,2020 coronavirus pandemic in Zimbabwe,Q954
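
`pd.merge` defaults to an inner join, so any country whose name is spelled differently in the two files drops out of `full` without warning. A small sketch of one way to check for that, using toy frames in place of `countries` and `wdt_items`; the `Exampleland` row is invented purely to show a mismatch.

```python
import pandas as pd

# Toy stand-ins for `countries` and `wdt_items`; "Exampleland" is a made-up name.
countries = pd.DataFrame(
    {"Country": ["Afghanistan", "Zimbabwe", "Exampleland"], "Confirmed": [0, 0, 0]}
)
wdt_items = pd.DataFrame(
    {"Country": ["Afghanistan", "Zimbabwe"], "item": ["Q87768605", "Q88164033"]}
)

# A left merge keeps every dataset row; indicator=True adds a `_merge` column
# that records whether a matching Wikidata row was found.
checked = pd.merge(countries, wdt_items, on="Country", how="left", indicator=True)
unmatched = sorted(checked.loc[checked["_merge"] == "left_only", "Country"].unique())
print(unmatched)
```

Any names listed in `unmatched` would need to be reconciled by hand, or through a mapping table, before their counts can reach Wikidata.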


In [6]:
# The most recent data seems to be from the previous day, at least when I checked (8 p.m. Brazil time).
yesterday_table = yesterday.strftime("%Y-%m-%d")
# Compare against the formatted string: the Date column holds strings, not date objects.
recent = full.query("Date == @yesterday_table")
recent

Unnamed: 0,Date,Country,Confirmed,Recovered,Deaths,item,itemLabel,countryid
89,2020-04-20,Afghanistan,1026,135,36,Q87768605,2020 coronavirus pandemic in Afghanistan,Q889
179,2020-04-20,Algeria,2718,1099,384,Q87202921,2020 coronavirus pandemic in Algeria,Q262
269,2020-04-20,Angola,24,6,2,Q88082534,2020 coronavirus pandemic in Angola,Q916
359,2020-04-20,Antigua and Barbuda,23,3,3,Q87708331,2020 coronavirus pandemic in Antigua and Barbuda,Q781
449,2020-04-20,Argentina,2941,737,136,Q87235137,2020 coronavirus pandemic in Argentina,Q414
...,...,...,...,...,...,...,...,...
11339,2020-04-20,Venezuela,256,117,9,Q87652010,2020 coronavirus pandemic in Venezuela,Q717
11429,2020-04-20,Vietnam,268,214,0,Q83873057,2020 coronavirus pandemic in Vietnam,Q881
11519,2020-04-20,Yemen,1,0,0,Q89695985,2020 coronavirus pandemic in Yemen,Q805
11609,2020-04-20,Zambia,65,35,3,Q87976629,2020 coronavirus pandemic in Zambia,Q953
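
Because the upstream file can lag behind the calendar, an alternative to computing yesterday's date is to take the latest date actually present in the data. A minimal sketch, assuming the `Date` column holds ISO-formatted strings (as in the tables in this notebook) so that string comparison orders them chronologically. The toy frame reuses three Zimbabwe rows from the merged table.

```python
import pandas as pd

# Toy stand-in for `full`, built from rows shown in the merged table.
full = pd.DataFrame(
    {
        "Date": ["2020-04-17", "2020-04-18", "2020-04-19"],
        "Country": ["Zimbabwe", "Zimbabwe", "Zimbabwe"],
        "Confirmed": [24, 25, 25],
    }
)

# ISO dates (YYYY-MM-DD) sort correctly as plain strings.
latest = full["Date"].max()
recent = full[full["Date"] == latest]
print(latest)
print(recent)
```

With this approach the point-in-time qualifier should be derived from `latest` rather than from the system clock.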


In [None]:
yesterday_wdt = yesterday.strftime("+%Y-%m-%dT00:00:00Z/11")
today_wdt = today.strftime("+%Y-%m-%dT00:00:00Z/11")

for index, row in recent.iterrows():
    print(
          row['item'] + "|P1603|" + str(int(row['Confirmed'])) + "|P585|" + yesterday_wdt + "|S854|" + '"' + 
                "https://github.com/datasets/covid-19" + '"' +
                "|S813|" + today_wdt + "\n" +
          row['item'] + "|P1120|" + str(int(row['Deaths'])) + "|P585|" + yesterday_wdt + "|S854|" + '"' +
                "https://github.com/datasets/covid-19" + '"' +
                "|S813|" + today_wdt + "\n" +
          row['item'] + "|P8010|" + str(int(row['Recovered'])) + "|P585|" + yesterday_wdt + "|S854|" + '"' + 
                "https://github.com/datasets/covid-19" + '"' +
                "|S813|" + today_wdt
            )
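
The three statements per country share one pattern, so the string building could be factored into a helper. The sketch below is one possible refactoring, not the notebook's original code. `wikidata_day`, `statements_for`, and `PROPERTIES` are hypothetical names, and the retrieval date in the example is chosen only for illustration. It emits the same pipe-separated QuickStatements format, with P585 (point in time) as the qualifier and S854/S813 (reference URL, retrieved) as the reference.

```python
from datetime import date

SOURCE_URL = "https://github.com/datasets/covid-19"

# Dataset column -> Wikidata property, as used in the loop above.
PROPERTIES = {"Confirmed": "P1603", "Deaths": "P1120", "Recovered": "P8010"}


def wikidata_day(day):
    """Format a date as a QuickStatements time value with day precision (/11)."""
    return day.strftime("+%Y-%m-%dT00:00:00Z/11")


def statements_for(row, observed, retrieved):
    """Return one QuickStatements line per count for a single country row."""
    lines = []
    for column, prop in PROPERTIES.items():
        lines.append("|".join([
            row["item"], prop, str(int(row[column])),
            "P585", wikidata_day(observed),
            "S854", f'"{SOURCE_URL}"',
            "S813", wikidata_day(retrieved),
        ]))
    return lines


# Example using the Afghanistan counts shown for 2020-04-20.
row = {"item": "Q87768605", "Confirmed": 1026, "Recovered": 135, "Deaths": 36}
lines = statements_for(row, date(2020, 4, 20), date(2020, 4, 21))
print("\n".join(lines))
```

Joining the lines with `"\n"` in one place keeps each statement on its own line, which QuickStatements requires.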