# Retrieval

This notebook serves a single purpose: retrieving the data. We will use the following modules:

In [None]:
import os
import subprocess
import shutil
import requests
import tempfile

from local_nbutils import CFG

Here `local_nbutils` is a custom module. It provides `CFG`, a dictionary holding the configuration variables which we will use throughout:

In [None]:
# Width of the longest key, used to align the printed values.
padding_length = max(len(k) for k in CFG)

print("CFG Dictionary:")
for k, v in CFG.items():
    print(f"  {k:{padding_length}} : {v}")
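
Since `local_nbutils` is not shown here, the following is only a sketch of what such a configuration dictionary could look like. The name `CFG_EXAMPLE`, the helper `format_cfg`, and both paths are hypothetical; only the two keys are taken from this notebook, and the real `CFG` may well contain more entries.

```python
from pathlib import Path

# Hypothetical stand-in for `local_nbutils.CFG`; the real module may differ.
# Only the two keys used in this notebook are shown, and both paths are made up.
DATA_DIR = Path("data")

CFG_EXAMPLE = {
    "TRAIN_DATA_PATH": str(DATA_DIR / "train.csv"),
    "AIRPORTS_DATA_PATH": str(DATA_DIR / "airports.csv"),
}


def format_cfg(cfg: dict) -> list[str]:
    """Return one aligned 'key : value' line per configuration entry."""
    width = max(len(key) for key in cfg)
    return [f"  {key:{width}} : {value}" for key, value in cfg.items()]


for line in format_cfg(CFG_EXAMPLE):
    print(line)
```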

## Flight Delay Data

Unlike, for example, [Kaggle](https://www.kaggle.com), Zindi provides no option to download the data programmatically. Therefore, the easiest way to retrieve the data is to download it manually from [here](https://zindi.africa/competitions/flight-delay-prediction-challenge/data) (an account is necessary). Of the three files offered, only `train.csv` is relevant to us. Please store this file at the location `CFG["TRAIN_DATA_PATH"]`.
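
Because this step is manual, it is easy to forget it or to save the file in the wrong place. A small guard like the one below could catch that early. It is a minimal sketch: the helper name `require_file` is an assumption and not part of `local_nbutils`.

```python
from pathlib import Path


def require_file(path: str) -> Path:
    """Return `path` as a Path, or raise if the file is missing or empty."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(
            f"Expected a file at {file_path}; please download it manually first."
        )
    if file_path.stat().st_size == 0:
        raise ValueError(f"{file_path} exists but is empty.")
    return file_path
```

In this notebook it would be called as `require_file(CFG["TRAIN_DATA_PATH"])` before any later step reads the training data.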

## Airport Data

Next, let us specify all required configurations to have them in one place.

### HTTP Method

In [None]:
HTTP_REPO_URL = "https://github.com/davidmegginson/ourairports-data"
CSV_PATH_REL = "airports.csv"
AIRPORTS_DATA_PATH = CFG["AIRPORTS_DATA_PATH"]

# Use the raw endpoint: the /blob/ URL serves GitHub's HTML page, not the CSV file.
http_url = f"{HTTP_REPO_URL}/raw/main/{CSV_PATH_REL}"

In [None]:
# A timeout keeps the cell from hanging indefinitely on network problems.
response = requests.get(http_url, timeout=30)
response.raise_for_status()

with open(AIRPORTS_DATA_PATH, "wb") as f:
    f.write(response.content)
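
A download can succeed with status 200 and still deliver the wrong payload, for instance an HTML page instead of the CSV file. The following heuristic is one way to sanity-check the bytes before relying on them. It is a sketch under assumptions: the function name `looks_like_csv` and the thresholds are illustrative, and it only inspects the first line.

```python
import csv
import io


def looks_like_csv(content: bytes, min_columns: int = 2) -> bool:
    """Heuristically check that `content` starts with a CSV header, not HTML."""
    # Only the beginning of the payload is needed to inspect the first row.
    head = content[:2048].decode("utf-8", errors="replace").lstrip()
    if head.lower().startswith(("<!doctype", "<html")):
        return False
    # Parse the first row; an empty payload yields no row at all.
    first_row = next(csv.reader(io.StringIO(head)), [])
    return len(first_row) >= min_columns
```

It could be applied as `looks_like_csv(response.content)` right after the request, before the file is written.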

### SSH Method

For completeness, let us also show how to download a single file from GitHub when you have to authenticate via an SSH key, for example because the repository in question is not public.

Note that this requires an SSH key registered with your GitHub account. The cell below reuses `AIRPORTS_DATA_PATH` from the HTTP method.

In [None]:
SSH_REPO_URL = "git@github.com:davidmegginson/ourairports-data.git"
CSV_PATH_REL = "airports.csv"

In [None]:
with tempfile.TemporaryDirectory() as tmpdir: 
    subprocess.run([
        "git", "clone",
        # Clones shallowly.
        "--depth", "1",
        # Skips blobs initially.
        "--filter=blob:none",
        # Enables sparse checkout mode.
        "--sparse",
        SSH_REPO_URL, tmpdir
    ], check=True)
    
    subprocess.run([
        # Runs git commands in the cloned repo directory.
        "git", "-C", tmpdir, 
        # Initialises sparse checkout.
        "sparse-checkout", "init",
    ], check=True)
    
    subprocess.run([
        # Runs git commands in the cloned repo directory.
        "git", "-C", tmpdir,
        # Specifies which files to include in the sparse checkout.
        "sparse-checkout", "set", CSV_PATH_REL,
        # Relaxes the checks as sparse-checkout expects directories.
        "--skip-checks", 
    ], check=True)
    
    src_file = os.path.join(tmpdir, CSV_PATH_REL)
    shutil.copy2(src_file, AIRPORTS_DATA_PATH)