# Download Data
This notebook downloads the IGRA2 data files. By default, it downloads data for the stations in Tornado Alley that are still reporting observations (those whose last year of data is 2024).

## Data-Por
We download the data-por (period-of-record) version of the IGRA2 data because it contains each station's full history of observations. The more data the better. By default, we download 23 data files totaling about 1.6 GB. You do not need to unzip these files, as the olieigra reader can process them in zipped form.
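
As a rough illustration of what reading a zipped file in place can look like, the sketch below streams the text lines of an archive using only Python's standard `zipfile` module. It is not the olieigra reader's implementation; the helper name `iter_zipped_lines` and the assumption that each archive holds a single UTF-8 text file are illustrative only.

```python
import io
import zipfile


def iter_zipped_lines(zip_path):
    """Yield decoded text lines from the first member of a zip archive.

    Hypothetical helper for illustration; not part of olieigra.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive contains a single text file
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            # Wrap the binary stream so lines are decoded lazily
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                yield line.rstrip("\n")
```

Because the lines are produced lazily, a loop such as `for line in iter_zipped_lines(path): ...` never needs the whole uncompressed file in memory or on disk.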

## Parameterization
Update these parameters to suit your needs:

- SILVER_STATION_LIST_PATH - Full path to the igra2-station-list.csv file (see the README in this folder if you don't have this file)
- DST_PATH - Folder to download the data files into
- QUERY - SQL Statement to select the stations for which to download data
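
For readers who prefer not to write SQL, the default station selection used in this notebook can be approximated with plain pandas. This is a sketch, not the notebook's method: `select_stations` and `TORNADO_ALLEY_STATES` are hypothetical names, and the code assumes the station list was loaded with `id` as the DataFrame index.

```python
import pandas as pd

# Same state list as the notebook's default QUERY
TORNADO_ALLEY_STATES = ['CO', 'IA', 'IL', 'KS', 'MN', 'MO',
                        'ND', 'NE', 'OK', 'SD', 'TX', 'WI']


def select_stations(df, states=TORNADO_ALLEY_STATES, lst_year=2024):
    """Return stations in `states` whose last year of data is `lst_year`.

    Hypothetical helper; assumes `id` is the index of `df`.
    """
    # Rows with a missing state evaluate to False in isin()
    mask = df['state'].isin(states) & (df['lst_year'] == lst_year)
    cols = ['id', 'state', 'name', 'fst_year', 'nobs']
    return (df[mask]
            .reset_index()[cols]             # turn the id index into a column
            .sort_values(['state', 'name'])  # same ordering as ORDER BY
            .reset_index(drop=True))
```

Changing `states` or `lst_year` plays the same role as editing the `WHERE` clause of `QUERY`.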

In [1]:
import os
import pandas as pd
import requests
from pandasql import sqldf

SILVER_STATION_LIST_PATH = '/Users/olievortex/lakehouse/default/Files/silver/igra2/doc/igra2-station-list.csv'
SRC_PATH = 'https://www.ncei.noaa.gov/data/integrated-global-radiosonde-archive/access/data-por'
DST_PATH = '/Users/olievortex/lakehouse/default/Files/bronze/igra2/data-por'

# Select stations in Tornado Alley that are currently collecting data
QUERY = '''SELECT id, state, name, fst_year, nobs
FROM df 
WHERE STATE IN ('CO', 'IA', 'IL', 'KS', 'MN', 'MO', 'ND', 'NE', 'OK', 'SD', 'TX', 'WI')
    AND lst_year = 2024
ORDER BY state, name'''

In [2]:
# Make sure the destination path exists
os.makedirs(DST_PATH, exist_ok=True)

In [3]:
# Load the station list CSV and preview the first rows as a sanity check
df = pd.read_csv(SILVER_STATION_LIST_PATH, index_col=0)
df.head()

id,latitude,longitude,elevation,state,name,fst_year,lst_year,nobs
ACM00078861,17.117,-61.783,10.0,,COOLIDGE FIELD (UA),1947,1993,13896
AEM00041217,24.4333,54.65,16.0,,ABU DHABI INTERNATIONAL AIRPOR,1983,2024,39914
AEXUAE05467,25.25,55.37,4.0,,SHARJAH,1935,1942,2477
AFM00040911,36.7,67.2,378.0,,MAZAR-I-SHARIF,2010,2014,2179
AFM00040913,36.6667,68.9167,433.0,,KUNDUZ,2010,2013,4540


In [4]:
# Perform the query on the data frame
results = sqldf(QUERY)

# Output which stations we matched
results

,id,state,name,fst_year,nobs
0,USM00072476,CO,GRAND JUNCTION/WALKER FIELD,1938,68763
1,USM00074455,IA,QUAD CITY,1935,23193
2,USM00074560,IL,LINCOLN,1995,21214
3,USM00072451,KS,DODGE CITY/MUN.,1940,65711
4,USM00072456,KS,TOPEKA/MUN.,1953,54326
5,USM00072649,MN,CHANHASSEN,1937,26612
6,USM00072747,MN,INT.FALLS/FALLS INT. MN.,1942,61276
7,USM00072440,MO,SPRINGFIELD/MUN.,1939,32343
8,USM00072764,ND,BISMARCK/MUN.,1932,78952
9,USM00072562,NE,NORTH PLATTE/LEE BIRD,1930,76207


In [5]:
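# Download each matched station's data-por archive, skipping files already on disk
for _, station in results.iterrows():
    filename = f"{station.id}-data.txt.zip"
    url = f"{SRC_PATH}/{filename}"
    local = f"{DST_PATH}/{filename}"

    if os.path.exists(local):
        print(f"File already exists: {local}")
        continue

    # Raise on HTTP errors so an error page is never saved as a data file
    r = requests.get(url, timeout=60)
    r.raise_for_status()
    with open(local, 'wb') as f:
        f.write(r.content)
    print(f"Downloaded {local}")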

File already exists: /Users/olievortex/lakehouse/default/Files/bronze/igra2/data-por/USM00072476-data.txt.zip
File already exists: /Users/olievortex/lakehouse/default/Files/bronze/igra2/data-por/USM00074455-data.txt.zip
File already exists: /Users/olievortex/lakehouse/default/Files/bronze/igra2/data-por/USM00074560-data.txt.zip
File already exists: /Users/olievortex/lakehouse/default/Files/bronze/igra2/data-por/USM00072451-data.txt.zip
File already exists: /Users/olievortex/lakehouse/default/Files/bronze/igra2/data-por/USM00072456-data.txt.zip
File already exists: /Users/olievortex/lakehouse/default/Files/bronze/igra2/data-por/USM00072649-data.txt.zip
File already exists: /Users/olievortex/lakehouse/default/Files/bronze/igra2/data-por/USM00072747-data.txt.zip
File already exists: /Users/olievortex/lakehouse/default/Files/bronze/igra2/data-por/USM00072440-data.txt.zip
File already exists: /Users/olievortex/lakehouse/default/Files/bronze/igra2/data-por/USM00072764-data.txt.zip
File alrea