# The Data Collector

This notebook collects the data used by the rest of the dashboard. Each cell is a
self-contained block of code that collects a single data point and corresponds to a
matching cell in The Dashboard UI notebook. Data for these notebooks is persisted in
locally stored CSV files (the storage backend can be changed by updating a shared data
persistence function). The general execution flow is as follows:

## Check last import date
Each data type and source has a hard-coded Data Import Frequency variable, set from how
often that source has historically been updated. Checking this against the last import
date of the data in storage protects against unnecessary data pulls, network IO, and IP
blocking from data sources.
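
A minimal sketch of this gate, assuming the frequency is expressed in days. The names `DATA_IMPORT_FREQUENCY_DAYS` and `import_is_due` are hypothetical and do not appear elsewhere in this notebook.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical per-source setting: expected number of days between upstream updates
DATA_IMPORT_FREQUENCY_DAYS = 30


def import_is_due(last_import: Optional[date], frequency_days: int, today: date) -> bool:
    """Return True when enough time has passed to justify a new data pull."""
    if last_import is None:
        # Nothing in storage yet, so the first pull is always allowed
        return True
    return today - last_import >= timedelta(days=frequency_days)
```

A collector cell could call `import_is_due(last_import, DATA_IMPORT_FREQUENCY_DAYS, date.today())` and skip the download when it returns `False`. Passing `today` explicitly keeps the check easy to test.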

## Import data
If our cell passes the last import date gate, then we append new data to our existing source
in persistent storage.
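
One way the append step might look with the standard `csv` module. The helper name `append_rows` and its parameters are assumptions for illustration, not part of the shared functions defined in this notebook.

```python
import csv
import os
from typing import Dict, List


def append_rows(csv_path: str, rows: List[Dict[str, str]], fieldnames: List[str]) -> int:
    """Append rows to csv_path, writing the header only when the file is new or empty."""
    needs_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    # Append mode keeps previously imported rows intact
    with open(csv_path, 'a', newline='') as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Returning the number of appended rows gives the later consistency check a simple figure to log or compare.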

## Check for consistency
Once data is imported, the new data is compared against existing data to check for
consistency, outliers, and missing values. If inconsistencies or missing values are found,
the import is flagged for human review.
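
A rough sketch of such a check for a single numeric column. It assumes missing values arrive as `None` and uses a z-score threshold for outliers. Both the threshold and the function name `review_flags` are illustrative choices, not something this notebook prescribes.

```python
from statistics import mean, stdev
from typing import List, Optional


def review_flags(existing: List[float], new: List[Optional[float]],
                 z_limit: float = 3.0) -> List[str]:
    """Return the reasons an import should be reviewed (an empty list means no flags)."""
    flags = []
    missing = sum(1 for value in new if value is None)
    if missing:
        flags.append(f'{missing} missing value(s)')
    # A spread estimate needs at least two existing observations
    if len(existing) >= 2:
        center, spread = mean(existing), stdev(existing)
        for value in new:
            if value is not None and spread > 0 and abs(value - center) / spread > z_limit:
                flags.append(f'outlier: {value}')
    return flags
```

An import would be flagged for human review whenever the returned list is non-empty.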

# Shared functions
This cell contains functions shared by all collectors, to minimize duplicated code.
*Note*: this cell _must_ be run before running any collector cell.

In [None]:
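# function that opens or creates CSV in persistent storage
def file_opener(collector_type: str, collector_subtype: str, file_name: str) -> None:
    import os

    target_directory = os.path.join('data', 'output', collector_type, collector_subtype)
    target_file = os.path.join(target_directory, file_name)

    # assumed completion of the truncated `os.` call: create the directory tree
    # if it is missing (no-op when it already exists)
    os.makedirs(target_directory, exist_ok=True)

    # create an empty CSV if none exists yet; existing files are left untouched
    if not os.path.exists(target_file):
        open(target_file, 'a').close()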

# function that checks CSV date last updated
def file_last_updated(file_name: str) -> str:
    pass

# stub function that connects to an API to collect data based on CSV last updated date
def API_downloader(url: str, params: dict) -> None:
    pass

# stub function that scrapes website (if no API available) based on CSV last updated date
def scraper_downloader(url: str, params: dict) -> None:
    pass
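
The `file_last_updated` stub above could be filled in along these lines. This sketch assumes "last updated" means the file's modification time on disk rather than the latest date inside the CSV, and it uses a separate, hypothetical name so it does not shadow the stub.

```python
import os
from datetime import datetime, timezone


def file_last_updated_sketch(file_path: str) -> str:
    """Return the file's modification time as an ISO 8601 UTC string, or '' if missing."""
    if not os.path.exists(file_path):
        return ''
    modified = datetime.fromtimestamp(os.path.getmtime(file_path), tz=timezone.utc)
    return modified.isoformat()
```

If the import date should instead come from the data itself, the function would need to read the last row of the CSV, which depends on each source's column layout.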

# 1.1 Geographic Data: Basemaps
This consists of international, supranational, national, and province-level boundaries, as
well as major cities. Data is meant to be used as base layers for other geospatial products.

Data Import Frequency: n/a