# Collect data

## Data sources

- [Johns Hopkins University (JHU) - Time Series](https://github.com/CSSEGISandData/COVID-19)
- [Johns Hopkins University (JHU) - Vaccination](https://github.com/govex/COVID-19/)
- [Our World in Data (OWiD)](https://ourworldindata.org/covid-vaccinations)
- [World Health Organization (WHO)](https://covid19.who.int/who-data/vaccination-data.csv)
- [Government of Mexico - COVID-19](https://datos.gob.mx/busca/dataset/informacion-referente-a-casos-covid-19-en-mexico)

## Load libraries

In [None]:
import requests
import covid_analysis.utils.paths as path

## Utility functions

In [None]:
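def download_csv(url: str, out_file: path.Path) -> None:
    """Download `url` and write the response body to `out_file`."""
    response = requests.get(url, timeout=60)
    # Fail loudly instead of saving an HTTP error page as if it were data.
    response.raise_for_status()

    with open(out_file, "wb") as file_content:
        file_content.write(response.content)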
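
For larger files, holding the whole response in memory may not be desirable. The following is a minimal sketch of a chunked download, not part of the original notebook. It swaps `requests` for the standard library's `urllib.request` so that it has no third-party dependency; the function name `download_file_streaming`, the `.part` suffix, and the default chunk size are assumptions made for illustration.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file_streaming(url: str, out_file: Path, chunk_size: int = 1024 * 1024) -> int:
    """Copy the resource at `url` to `out_file` in chunks; return the bytes written."""
    out_file = Path(out_file)
    out_file.parent.mkdir(parents=True, exist_ok=True)

    # Write under a temporary name first (hypothetical ".part" suffix) so that
    # a failed download never leaves a truncated file under the final name.
    tmp_file = out_file.with_name(out_file.name + ".part")

    # urlopen raises an exception on HTTP error statuses, so no explicit
    # status check is needed here.
    with urllib.request.urlopen(url, timeout=60) as response, open(tmp_file, "wb") as target:
        shutil.copyfileobj(response, target, length=chunk_size)

    tmp_file.replace(out_file)
    return out_file.stat().st_size
```

Because `urllib.request` also accepts `file://` URLs, the function can be tried against a local file before pointing it at a remote source.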


## Define default output directory

In [None]:
output_dir = path.data_raw_dir()
output_dir.mkdir(parents=True, exist_ok=True)

## Download Johns Hopkins University time series

The time series provided by Johns Hopkins University contains the cumulative confirmed cases and deaths since January 22, 2020, by country and, for some countries, by province. The recovered table was deprecated due to [Issue #3464](https://github.com/CSSEGISandData/COVID-19/issues/3464) and [Issue #4465](https://github.com/CSSEGISandData/COVID-19/issues/4465), and it is uncertain whether it will return.
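
Because the counts are cumulative, daily new cases have to be recovered by differencing consecutive dates. The sketch below illustrates this on made-up rows. It assumes the layout of four metadata columns (`Province/State`, `Country/Region`, `Lat`, `Long`) followed by one column per date; the function name `daily_new_by_country` and the sample countries are hypothetical.

```python
import csv
import io

# Made-up rows in the assumed layout: four metadata columns, then one
# cumulative count per date column.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Exampleland,0.0,0.0,0,2,5
North,Samplestan,1.0,1.0,1,1,4
South,Samplestan,2.0,2.0,0,3,3
"""

METADATA_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def daily_new_by_country(csv_text: str) -> dict:
    """Sum provinces per country, then difference the cumulative totals."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_columns = [c for c in reader.fieldnames if c not in METADATA_COLUMNS]

    cumulative = {}
    for row in reader:
        totals = cumulative.setdefault(row["Country/Region"], [0] * len(date_columns))
        for i, date in enumerate(date_columns):
            totals[i] += int(row[date])

    daily = {}
    for country, totals in cumulative.items():
        # The first date has no previous value, so its count is kept as-is.
        increments = [totals[0]] + [b - a for a, b in zip(totals, totals[1:])]
        daily[country] = dict(zip(date_columns, increments))
    return daily


print(daily_new_by_country(SAMPLE)["Samplestan"])
# {'1/22/20': 1, '1/23/20': 3, '1/24/20': 3}
```

Real data can contain downward corrections, in which case a difference may be negative; this sketch does not attempt to handle that.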

In [None]:
hopkins_base_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/"

hopkins_filenames = (
    "time_series_covid19_confirmed_global.csv",
    "time_series_covid19_deaths_global.csv"
)

hopkins_time_series_urls = {
    path.data_raw_dir(file_name): f"{hopkins_base_url}{file_name}"
    for file_name in hopkins_filenames
}

In [None]:
for out_path, url in hopkins_time_series_urls.items():
    download_csv(url, out_path)

## Download Johns Hopkins University countries metadata

This table contains the identifiers of each country and province together with an estimate of its population.

Note that its description includes the following disclaimer:

:::{warning}
The names of locations included on the Website correspond with the official designations used by the U.S. Department of State. The presentation of material therein does not imply the expression of any opinion whatsoever on the part of JHU concerning the legal status of any country, area or territory or of its authorities. The depiction and use of boundaries, geographic names and related data shown on maps and included in lists, tables, documents, and databases on this website are not warranted to be error free nor do they necessarily imply official endorsement or acceptance by JHU.
:::

As a result, some observations have a description or country name such as `Taiwan*` instead of just `Taiwan`. For quantitative purposes, however, you can ignore this and aggregate the results by identifier.
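
As a minimal sketch of working by identifier, the example below indexes country-level rows by their `iso3` code, so the annotated name never has to be matched as text. The column subset (`UID`, `iso3`, `Province_State`, `Country_Region`, `Population`) is assumed from the lookup table's header, the rows and population figures are placeholders, and `country_index` is a hypothetical helper.

```python
import csv
import io

# Made-up excerpt: the real table has more columns, and the population
# figures here are placeholders, not actual estimates.
LOOKUP_SAMPLE = """UID,iso3,Province_State,Country_Region,Population
158,TWN,,Taiwan*,100
36,AUS,,Australia,300
3601,AUS,Some Province,Australia,50
"""


def country_index(csv_text: str) -> dict:
    """Map each `iso3` code to its country-level row (empty `Province_State`)."""
    index = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["Province_State"]:
            index[row["iso3"]] = row
    return index


countries = country_index(LOOKUP_SAMPLE)
print(countries["TWN"]["Country_Region"])  # Taiwan*
```

The name stays available for display, while joins and aggregations rely only on the code.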

In [None]:
countries_meta_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/UID_ISO_FIPS_LookUp_Table.csv"
countries_meta_filename = output_dir.joinpath("UID_ISO_FIPS_LookUp_Table.csv")

In [None]:
download_csv(countries_meta_url, countries_meta_filename);

## Download Johns Hopkins University vaccination time series

The global time series provided by Johns Hopkins University contains the number of vaccine doses administered, along with the number of people who have received a first dose and the number who are fully vaccinated.

In [None]:
vaccination_url = "https://raw.githubusercontent.com/govex/COVID-19/master/data_tables/vaccine_data/global_data/time_series_covid19_vaccine_global.csv"
vaccination_filename = output_dir.joinpath("time_series_covid19_vaccine_global.csv")

In [None]:
download_csv(vaccination_url, vaccination_filename);

## Download Government of Mexico data

The Government of Mexico provides information on the Epidemiological Surveillance System for Viral Respiratory Diseases.

:::{note}
Preliminary data subject to validation by the Ministry of Health through the General Directorate of Epidemiology. The information contained corresponds only to the data obtained from the epidemiological study of a suspected case of viral respiratory disease when it is identified in the medical units of the Health Sector.
:::

### Data dictionaries

The data dictionaries contain an Excel (`.xlsx`) file in which each sheet annotates one table, listing its keys and their descriptions.

In [None]:
data_dict_mex_url = "http://datosabiertos.salud.gob.mx/gobmx/salud/datos_abiertos/diccionario_datos_covid19.zip"
data_dict_mex_filename = str(output_dir.joinpath("diccionario_datos_covid19.zip"))

In [None]:
!wget -q {data_dict_mex_url} -O {data_dict_mex_filename}

### Open COVID-19 data

The Government of Mexico publishes its open COVID-19 data as a single wide `.csv` file that is not normalized at the database level, because it works as a look-up table. As a result, some variables can be derived from others in the same data set.
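
To illustrate deriving one variable from another, the sketch below adds a boolean field from a date column. It assumes a `FECHA_DEF` (date of death) column that holds the placeholder `9999-99-99` when no death was recorded; both the column name and the placeholder should be checked against the data dictionary. The records are made up, and `add_death_flag` is a hypothetical helper.

```python
import csv
import io

# Made-up records; `FECHA_DEF` and the "9999-99-99" placeholder are
# assumptions to verify against the data dictionary.
RECORDS_SAMPLE = """ID_REGISTRO,FECHA_INGRESO,FECHA_DEF
a1,2020-04-01,9999-99-99
b2,2020-04-03,2020-04-10
"""

NO_DEATH_DATE = "9999-99-99"


def add_death_flag(rows):
    """Yield each record with a derived boolean `died` field."""
    for row in rows:
        yield {**row, "died": row["FECHA_DEF"] != NO_DEATH_DATE}


records = list(add_death_flag(csv.DictReader(io.StringIO(RECORDS_SAMPLE))))
print([record["died"] for record in records])  # [False, True]
```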

In [None]:
data_mex_url = "http://datosabiertos.salud.gob.mx/gobmx/salud/datos_abiertos/datos_abiertos_covid19.zip"
data_mex_filename = str(output_dir.joinpath("datos_abiertos_covid19.zip"))

You could download the database (approximately 2 GB) with the following command:

```bash
!wget -q {data_mex_url} -O {data_mex_filename}
```

Here we opted to use `axel` instead, which downloads over several connections at once (`-n 8`), to speed up the process a bit.

In [None]:
!axel -q -n 8 {data_mex_url} -o {output_dir}
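
Given the size of the archive, it can be convenient to read the `.csv` directly from the zip instead of unpacking it first. The following is a minimal sketch using only the standard library. The helper `count_values_in_zipped_csv`, the in-memory sample archive, and the column names `SEXO` and `EDAD` are assumptions for illustration, and the real file may need a different `encoding`.

```python
import csv
import io
import zipfile
from collections import Counter


def count_values_in_zipped_csv(zip_source, column: str, encoding: str = "utf-8") -> Counter:
    """Count the values of `column` in the first `.csv` member of a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        csv_name = next(n for n in archive.namelist() if n.lower().endswith(".csv"))
        with archive.open(csv_name) as raw:
            # Decode lazily so rows are read one at a time from the archive.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return Counter(row[column] for row in csv.DictReader(text))


# Build a tiny in-memory archive with made-up rows to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.csv", "SEXO,EDAD\n1,30\n2,41\n1,25\n")

print(count_values_in_zipped_csv(buffer, "SEXO"))
# Counter({'1': 2, '2': 1})
```

Passing a path such as the downloaded archive in place of `buffer` works the same way, since `zipfile.ZipFile` accepts either a path or a file-like object.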