# Intro

This notebook fetches the raw data from two sources and logs it as a Weights & Biases artifact for later use. The two sources are:
1. [Our World in Data COVID-19 dataset](https://ourworldindata.org/coronavirus-source-data) ([direct link](https://github.com/owid/covid-19-data/raw/master/public/data/owid-covid-data.csv) to the CSV). This dataset provides much of the data we need, including cases, deaths, ICU beds, and vaccinations. The codebook explaining the variables is available as a CSV [here](https://github.com/owid/covid-19-data/raw/master/public/data/owid-covid-codebook.csv).

2. [Oxford COVID-19 Government Response Tracker (OxCGRT)](https://www.bsg.ox.ac.uk/research/research-projects/covid-19-government-response-tracker) ([direct link](https://github.com/OxCGRT/covid-policy-tracker/raw/master/data/OxCGRT_latest.csv) to the CSV). The codebook explaining the variables is [here](https://github.com/OxCGRT/covid-policy-tracker/raw/master/documentation/codebook.md); it is a Markdown file containing a table that describes each variable.

**Both datasets and their codebooks were fetched on 16 April 2021 at 11:52 PM.**
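
Since the OWID codebook is a plain CSV, a variable's meaning can be looked up programmatically. The following is a minimal sketch, not part of the original notebook. It assumes the codebook has `column` and `description` headers (an assumption about the file layout that should be checked against the downloaded file), and the sample rows are placeholders rather than real codebook entries.

```python
import csv
import io


def load_codebook(csv_text, name_col="column", desc_col="description"):
    """Map each variable name to its description from codebook CSV text.

    name_col and desc_col are assumed header names; adjust them if the
    downloaded codebook uses different ones.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[name_col]: row[desc_col] for row in reader}


# Placeholder rows standing in for owid-covid-codebook.csv (not real entries).
sample = (
    "column,description\n"
    "new_cases,placeholder description for new_cases\n"
    "icu_patients,placeholder description for icu_patients\n"
)
codebook = load_codebook(sample)
print(codebook["new_cases"])
```

With the real file, the text could be read via `open("owid-covid-codebook.csv", encoding="utf-8").read()` and passed to `load_codebook`.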

In [1]:
import wandb
import os

In [2]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("WANDB_API_KEY")

In [3]:
# set the wandb API key from Kaggle secrets to avoid manual login
os.environ["WANDB_API_KEY"] = secret_value_0

# uncomment below line for dry run of wandb
#os.environ["WANDB_MODE"] = 'dryrun'

In [4]:
%%capture
!wget -q https://github.com/owid/covid-19-data/raw/master/public/data/owid-covid-data.csv
!wget -q https://github.com/owid/covid-19-data/raw/master/public/data/owid-covid-codebook.csv
!wget -q https://github.com/OxCGRT/covid-policy-tracker/raw/master/data/OxCGRT_latest.csv
!wget -q https://github.com/OxCGRT/covid-policy-tracker/raw/master/documentation/codebook.md
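
The `wget` cell above depends on the shell being available in the notebook environment. As a rough alternative, the same four downloads could be done with the Python standard library. This is a sketch under assumptions: the helper `fetch_files` and the constant `RAW_DATA_URLS` are hypothetical names introduced here, and each file is saved under the last segment of its URL path, which mirrors what `wget` does for these links.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# The four URLs used by the wget cell.
RAW_DATA_URLS = [
    "https://github.com/owid/covid-19-data/raw/master/public/data/owid-covid-data.csv",
    "https://github.com/owid/covid-19-data/raw/master/public/data/owid-covid-codebook.csv",
    "https://github.com/OxCGRT/covid-policy-tracker/raw/master/data/OxCGRT_latest.csv",
    "https://github.com/OxCGRT/covid-policy-tracker/raw/master/documentation/codebook.md",
]


def fetch_files(urls, dest_dir="."):
    """Download each URL into dest_dir, named after the last URL path segment."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = dest / Path(urlparse(url).path).name
        urlretrieve(url, str(target))  # overwrites an existing file of the same name
        saved.append(target)
    return saved
```

Calling `fetch_files(RAW_DATA_URLS)` would place the files in the working directory, where the later `add_file` calls expect them.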

In [5]:
# starting a run on project, entity(team)
run = wandb.init(entity='ml-major-project-g3', project='major-project',
                 job_type="upload-data", save_code=True)
raw_data = wandb.Artifact(
    "covid19-raw-dataset", type="dataset",
    description="Raw COVID-19 data from OWID and Oxford University.")

# adding all files to the artifact folder
raw_data.add_file('owid-covid-data.csv')
raw_data.add_file('owid-covid-codebook.csv')
raw_data.add_file('OxCGRT_latest.csv')
raw_data.add_file('codebook.md')
run.log_artifact(raw_data)
# finish all uploads
wandb.finish()

[34m[1mwandb[0m: Currently logged in as: [33menigma0160[0m (use `wandb login --relogin` to force relogin)
[34m[1mwandb[0m: wandb version 0.10.26 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


48.12MB of 48.12MB uploaded (0.00MB deduped)
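
Because the files are logged "for later use", a downstream notebook would typically pull the artifact back down rather than re-fetch the CSVs. The following is a hedged sketch of that step, not code from this project. The helper name `download_raw_dataset` and the `job_type` value are hypothetical, and it assumes the artifact is addressed through the default `latest` alias.

```python
def download_raw_dataset(entity="ml-major-project-g3", project="major-project",
                         artifact_name="covid19-raw-dataset:latest"):
    """Fetch the logged raw-data artifact in a new run and return its local directory."""
    # Imported lazily so the helper can be defined even where wandb is not installed.
    import wandb

    # job_type is a hypothetical label chosen for this sketch.
    run = wandb.init(entity=entity, project=project, job_type="download-data")
    artifact = run.use_artifact(artifact_name, type="dataset")
    data_dir = artifact.download()  # local folder holding the four logged files
    run.finish()
    return data_dir
```

Declaring the artifact through `use_artifact` also records the dependency in the run's lineage, so the W&B project shows which runs consumed this version of the raw data.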