# Download Docs
This notebook downloads the following documentation files from the IGRA2 (Integrated Global Radiosonde Archive) website:

- igra2-station-list.txt - The station list, used to identify the station IDs we are interested in
- igra2-list-format.txt - Documents the data structure of igra2-station-list.txt
- igra2-data-format.txt - Documents the data structure of the data files

Update the following parameter in the first cell to accommodate your installation:

- DST_PATH - The location to download the files into
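
If editing the first cell for each installation is inconvenient, the destination could instead be read from an environment variable, falling back to the notebook's default. This is only a sketch: the variable name `IGRA2_DOC_PATH` and the helper `resolve_dst_path` are hypothetical and not part of the original notebook.

```python
import os

# Default taken from the notebook's first cell
DEFAULT_DST_PATH = '/lakehouse/default/Files/bronze/igra2/doc'


def resolve_dst_path(env=os.environ):
    """Return the download directory, preferring an environment override.

    IGRA2_DOC_PATH is a hypothetical variable name chosen for illustration.
    """
    return env.get('IGRA2_DOC_PATH', DEFAULT_DST_PATH)


DST_PATH = resolve_dst_path()
```

With this in place, setting `IGRA2_DOC_PATH` before starting the notebook changes the destination without touching the code.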

In [1]:
import os
import requests

SRC_PATH = 'https://www.ncei.noaa.gov/data/integrated-global-radiosonde-archive/doc'
DST_PATH = '/lakehouse/default/Files/bronze/igra2/doc'
FILES = ['igra2-station-list.txt', 'igra2-data-format.txt', 'igra2-list-format.txt']

In [2]:
# Make sure the destination path exists
os.makedirs(DST_PATH, exist_ok=True)

In [3]:
# Loop through the list of files
for filename in FILES:
    src = f'{SRC_PATH}/{filename}'
    dst = f'{DST_PATH}/{filename}'

    # If the file already exists, don't download it again
    if os.path.exists(dst):
        print(f"File exists: {filename}")
        continue

    # Download the file, raising an error on a failed HTTP response
    with requests.get(src, timeout=60) as r:
        r.raise_for_status()
        with open(dst, 'wb') as f:
            f.write(r.content)

    print(f"Downloaded: {filename}")

Downloaded: igra2-station-list.txt
Downloaded: igra2-data-format.txt
Downloaded: igra2-list-format.txt
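
Because the download loop above skips any file that already exists, a file left half-written by an interrupted run would never be fetched again. One possible guard, sketched here under the assumption of an ordinary local filesystem, is to write to a temporary file in the same directory and rename it into place only once the write has finished. The helper name `save_atomically` is hypothetical and not part of the original notebook.

```python
import os
import tempfile


def save_atomically(dst: str, content: bytes) -> None:
    """Write content to dst via a temporary file in the same directory."""
    directory = os.path.dirname(dst) or '.'
    # Create the temporary file next to dst so the rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.part')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(content)
        # Move the finished file into place, replacing any existing dst
        os.replace(tmp_path, dst)
    except BaseException:
        # Clean up the temporary file if the write or rename failed
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Inside the loop, the write step would then become `save_atomically(dst, r.content)`, so `dst` only appears on disk after its full content has been written.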
