# World Wide Vaccination Data from Our World in Data

The following script takes data from the `Data on COVID-19 (coronavirus) by Our World in Data` repository, operated by Our World in Data (OWID).


> Hasell, J., Mathieu, E., Beltekian, D. _et al_. A cross-country database of COVID-19 testing. _Sci Data 7_, 345 (2020). https://doi.org/10.1038/s41597-020-00688-8

The data produced by third parties and made available by Our World in Data is subject to the license terms of the original third-party authors. OWID and Starschema will always indicate the original source of the data in our database, and you should always check the license of any such third-party data before use.



In [None]:
import pandas as pd
import csv
import datetime
import pycountry

In [None]:
# papermill parameters
output_folder = "../output/"

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv")

In [None]:
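# drop rows without an ISO code (States/Provinces)
df.dropna(subset=['iso_code'], inplace=True)
# drop OWID pseudo-codes with no ISO 3166-1 equivalent:
# the UK constituent countries, the world-wide aggregate and Kosovo
df = df[~df['iso_code'].isin(['OWID_NIR', 'OWID_ENG', 'OWID_WLS', 'OWID_SCT', 'OWID_WRL', 'OWID_KOS'])].copy()
# report Northern Cyprus (OWID_CYN) under Cyprus
df.loc[df.iso_code == 'OWID_CYN', 'iso_code'] = 'CYP'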
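The hard-coded list above only covers the OWID pseudo-codes present when the notebook was written; any new one (for example a continent or income-group aggregate) would later make the ISO code conversion fail. A minimal sketch of a prefix-based filter, assuming that every non-ISO entity in the OWID file uses an `OWID_` prefix. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the real OWID file has many more columns.
df = pd.DataFrame({
    "location": ["Hungary", "England", "World", "Kosovo", "Northern Cyprus", "Africa"],
    "iso_code": ["HUN", "OWID_ENG", "OWID_WRL", "OWID_KOS", "OWID_CYN", "OWID_AFR"],
})

# Remap Northern Cyprus to Cyprus first, as the notebook does.
df.loc[df["iso_code"] == "OWID_CYN", "iso_code"] = "CYP"

# Then drop every remaining OWID pseudo-code by prefix (assumption:
# all non-ISO entities start with "OWID_").
df = df[~df["iso_code"].str.startswith("OWID_")].copy()

print(df["iso_code"].tolist())  # ['HUN', 'CYP']
```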

Download and join location data with the main dataset

In [None]:
location = pd.read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/locations.csv")
location.dropna(subset=['iso_code'], inplace=True)


In [None]:
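# inner join: rows whose iso_code is missing from locations.csv are dropped;
# validate='many_to_one' fails if locations.csv repeats an iso_code
df = df.merge(location, on='iso_code', copy=False, validate='many_to_one', suffixes=('', '_l'))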
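Because `merge` defaults to an inner join, countries missing from `locations.csv` vanish without warning, while `validate='many_to_one'` guards against duplicate rows in the location table. The sketch below, built on hypothetical frames (`ZZZ` is a made-up code), shows both effects and uses a left join with `indicator=True` to list unmatched codes.

```python
import pandas as pd

# Hypothetical inputs; "ZZZ" stands in for a code missing from locations.csv.
vacc = pd.DataFrame({"iso_code": ["HUN", "HUN", "ZZZ"], "total_vaccinations": [10, 20, 5]})
locations = pd.DataFrame({"iso_code": ["HUN"], "vaccines": ["Pfizer/BioNTech"]})

# Default how="inner": the ZZZ row silently disappears.
merged = vacc.merge(locations, on="iso_code", validate="many_to_one")

# A left join with indicator=True reveals the codes that found no match.
check = vacc.merge(locations, on="iso_code", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "iso_code"].unique().tolist()

# validate="many_to_one" raises MergeError when right-hand keys repeat.
dup_locations = pd.concat([locations, locations], ignore_index=True)
raised = False
try:
    vacc.merge(dup_locations, on="iso_code", validate="many_to_one")
except pd.errors.MergeError:
    raised = True

print(len(merged), unmatched, raised)  # 2 ['ZZZ'] True
```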


Convert the three-letter ISO 3166-1 alpha-3 country codes to two-letter alpha-2 codes

In [None]:
df["iso_code"].unique()

In [None]:
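country_3c_codes = df['iso_code'].unique()

# alpha-3 -> alpha-2 lookup; pycountry.countries.get() finds no match for
# non-ISO codes, so any leftover OWID_* code makes this line fail
replace_dict = {code: pycountry.countries.get(alpha_3=code).alpha_2 for code in country_3c_codes}

# assign the result back instead of calling replace(..., inplace=True) on a column selection
df["iso_code"] = df["iso_code"].replace(replace_dict)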
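`Series.replace` leaves values without a mapping unchanged, whereas `Series.map` turns them into `NaN`, which makes unconverted codes easy to spot before export. A minimal sketch, assuming a small hand-written lookup (hypothetical) in place of `pycountry`:

```python
import pandas as pd

# Hypothetical alpha-3 -> alpha-2 lookup standing in for pycountry.
alpha3_to_alpha2 = {"HUN": "HU", "DEU": "DE", "FRA": "FR"}

codes = pd.Series(["HUN", "DEU", "OWID_KOS"])

# replace() keeps values that have no mapping ...
replaced = codes.replace(alpha3_to_alpha2)
# ... while map() yields NaN for them, so the gaps can be listed.
mapped = codes.map(alpha3_to_alpha2)
unmapped = codes[mapped.isna()].tolist()

print(replaced.tolist())  # ['HU', 'DE', 'OWID_KOS']
print(unmapped)           # ['OWID_KOS']
```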

In [None]:
country_3c_codes

In [None]:
df["Last_Update_Date"] = datetime.datetime.utcnow()
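`datetime.datetime.utcnow()` returns a naive timestamp and is deprecated since Python 3.12. A timezone-aware alternative is sketched below; note that writing an aware timestamp may add a `+00:00` offset to the CSV value, so dropping `tzinfo` keeps the old, offset-free format if downstream consumers expect it.

```python
import datetime

# Timezone-aware "now" in UTC.
last_update = datetime.datetime.now(datetime.timezone.utc)

# Naive UTC value, equivalent in meaning to utcnow(), if the old format is needed.
last_update_naive = last_update.replace(tzinfo=None)

print(last_update.tzinfo)  # UTC
```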

## Set Last_Reported_Flag

In [None]:
df['Last_Reported_Flag'] = df["date"].max() == df["date"]
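The flag compares each row with the latest date in the *whole* dataset, so a country whose most recent report is older than that date has no row flagged. The sketch below uses hypothetical rows to contrast this with a per-location variant built with `groupby(...).transform("max")`; the variant is an alternative, not what the notebook computes. Plain string comparison works here because the dates are ISO formatted (`YYYY-MM-DD`).

```python
import pandas as pd

# Hypothetical rows: location B stopped reporting a day earlier than A.
df = pd.DataFrame({
    "location": ["A", "A", "B"],
    "date": ["2021-03-01", "2021-03-02", "2021-03-01"],
})

# As in the notebook: True only on the latest date across all locations.
df["Last_Reported_Flag"] = df["date"].max() == df["date"]

# Alternative: True on each location's own latest date.
df["Last_Reported_Per_Location"] = (
    df.groupby("location")["date"].transform("max") == df["date"]
)

print(df)
```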

## Output

Finally, we rename the location and ISO code columns and store the output as `OWID_VACCINATIONS.csv`, a CSV file without the index, in the folder given by `output_folder` (`../output/` by default).
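The output path is built by string concatenation, which only works while `output_folder` ends with a slash. A small sketch of the difference, assuming a caller (for example a papermill run) passes the folder without the trailing slash; `pathlib.Path` objects are accepted by `DataFrame.to_csv` as well.

```python
from pathlib import Path

file_name = "OWID_VACCINATIONS.csv"

# Plain concatenation silently depends on the trailing slash.
naive = "../output" + file_name

# pathlib inserts the separator itself, so both spellings give the same path.
with_slash = Path("../output/") / file_name
without_slash = Path("../output") / file_name

print(naive)                        # ../outputOWID_VACCINATIONS.csv
print(with_slash == without_slash)  # True
```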

In [None]:
df.rename(columns={'location':'Country_Region','iso_code': 'ISO3166_1' }, inplace=True)

In [None]:
df.to_csv(
    output_folder + "OWID_VACCINATIONS.csv",
    index=False, quoting=csv.QUOTE_NONNUMERIC,
    quotechar='"', escapechar='\\', doublequote=False,
    columns=[
        'date', 'Country_Region', 'ISO3166_1',
        'total_vaccinations', 'people_vaccinated', 'people_fully_vaccinated',
        'daily_vaccinations_raw', 'daily_vaccinations',
        'total_vaccinations_per_hundred', 'people_vaccinated_per_hundred',
        'people_fully_vaccinated_per_hundred', 'daily_vaccinations_per_million',
        'vaccines', 'last_observation_date', 'source_name', 'source_website',
        'Last_Update_Date', 'Last_Reported_Flag',
    ])
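The quoting options passed to `to_csv` correspond to the parameters of Python's `csv.writer`. The stdlib sketch below, on a hypothetical row, shows their combined effect as we understand it: non-numeric fields are quoted, numbers are not, and embedded quote characters are escaped with a backslash instead of being doubled. pandas may differ in details such as how missing values are written.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(
    buffer,
    quoting=csv.QUOTE_NONNUMERIC,  # quote every non-numeric field
    quotechar='"',
    escapechar="\\",               # escape embedded quotes with a backslash ...
    doublequote=False,             # ... instead of doubling them
)

# Hypothetical row: plain string, string with quotes, float, int.
writer.writerow(["Hungary", 'says "hi"', 1.5, 3])

print(buffer.getvalue())  # "Hungary","says \"hi\"",1.5,3
```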