# ISD-Lite

Download near-surface (10 m) wind data from the Integrated Surface Dataset Lite (ISD-Lite) using the [PyISD](https://github.com/CyrilJl/isd-fetch) package. We retrieve `winddirection` [degrees] and `windspeed` [m/s] from 1979-01-01 through 2024-12-31.

In [7]:
import sys
import os
sys.path.append('../../')  # so that `setup` can be imported
from setup import CWD, SCRATCH_DIR  # DATA_DIR is derived from CWD below
print(CWD)
DATA_DIR = os.path.join(CWD, 'data/')
print(DATA_DIR)
print(SCRATCH_DIR)

/home/valencig/GPLLJ/
/home/valencig/GPLLJ/data/
/scratch/valencig/GPLLJ-Scratch/


In [2]:
from pyisd import IsdLite
import polars as pl

# Initialize the client
isd = IsdLite(crs=4326, verbose=True)

# View available stations
isd.raw_metadata.sample(5)

Unnamed: 0,USAF,WBAN,STATION NAME,CTRY,ST,CALL,ELEV(M),BEGIN,END,x,y,geometry
7606,325400,99999,YELIZOVO,RS,,UHPP,39.9,1932-01-01,2025-05-25,158.454,53.168,POINT (158.454 53.168)
24430,868730,99999,PASSA QUATRO,BR,,,1041.5,2016-07-04,2025-01-03,-44.967,-22.4,POINT (-44.967 -22.4)
22557,767230,99999,ISLA SOCORRO COL.,MX,,,35.0,1973-01-01,1997-12-30,-110.95,18.717,POINT (-110.95 18.717)
2462,67600,99999,LOCARNO-MONTI,CH,,,380.0,1973-01-01,2025-05-25,8.783,46.167,POINT (8.783 46.167)
20312,725690,24089,NATRONA COUNTY INTERNAT,US,WY,KCPR,1621.1,1940-01-01,2025-05-28,-106.474,42.898,POINT (-106.474 42.898)


In [3]:
# Bounding box as (xmin, ymin, xmax, ymax) in degrees longitude/latitude (EPSG:4326)
geometry = (-105, 25, -90, 50)  # CNA (Central North America) region

In [4]:
# Query the satellite era (1979 onward), using complete calendar years only
cna_data = isd.get_data(
    start='1979-01-01',
    end='2024-12-31',
    geometry=geometry,
    organize_by='field'
)

  0%|          | 0/1155 [00:00<?, ?it/s]

## Create dataframes

In [8]:
# Each field in cna_data is a wide pandas frame: time index, one column per station.
# Reshape each field to long format, one row per (time, station).
windspeed = (
    pl.from_pandas(cna_data['windspeed'].reset_index(drop=False))
    .unpivot(index='index', variable_name='station', value_name='windspeed')
    .rename({'index': 'time', 'station': 'USAF'})
)
winddirection = (
    pl.from_pandas(cna_data['winddirection'].reset_index(drop=False))
    .unpivot(index='index', variable_name='station', value_name='winddirection')
    .rename({'index': 'time', 'station': 'USAF'})
)
# Combine speed and direction on station and timestamp
wind_df = windspeed.join(winddirection, how='left', on=['USAF', 'time'])
wind_df

time,USAF,windspeed,winddirection
datetime[ns],str,f64,f64
1979-01-01 00:00:00,"""990179""",,
1979-01-01 01:00:00,"""990179""",,
1979-01-01 02:00:00,"""990179""",,
1979-01-01 03:00:00,"""990179""",,
1979-01-01 04:00:00,"""990179""",,
…,…,…,…
2024-12-31 19:00:00,"""718425""",,
2024-12-31 20:00:00,"""718425""",,
2024-12-31 21:00:00,"""718425""",,
2024-12-31 22:00:00,"""718425""",,


## Filter metadata to CNA stations

A station can close and later reopen, so the same USAF number can appear more than once in the metadata (for example, `690190` below). This is fine, but keep it in mind for later analysis.

In [14]:
raw_metadata = pl.from_pandas(isd.raw_metadata[['USAF', 'WBAN', 'STATION NAME', 'ST', 'CALL', 'ELEV(M)', 'BEGIN', 'END', 'x', 'y']])\
    .rename({
        'x': 'lon',
        'y': 'lat',
    })
# Pass a plain list to is_in to avoid the polars deprecation warning for Series arguments
metadata = raw_metadata.filter(pl.col('USAF').is_in(wind_df['USAF'].unique().to_list()))
metadata


USAF,WBAN,STATION NAME,ST,CALL,ELEV(M),BEGIN,END,lon,lat
str,i64,str,str,str,f64,datetime[ns],datetime[ns],f64,f64
"""690190""",13910,"""ABILENE DYESS AFB""","""TX""","""KDYS""",545.3,1943-12-01 00:00:00,1997-12-31 00:00:00,-99.85,32.433
"""690190""",99999,"""DYESS AFB/ABILENE""","""TX""",,545.0,2000-01-03 00:00:00,2004-12-30 00:00:00,-99.85,32.417
"""690280""",99999,"""PORT ISABEL (CGS)""","""TX""",,5.0,1982-06-27 00:00:00,1983-11-16 00:00:00,-97.167,26.067
"""690290""",99999,"""PORT ARANSAS""","""TX""",,2.0,1982-06-25 00:00:00,2004-11-29 00:00:00,-97.067,27.833
"""690500""",99999,"""FORT CHAFFEE""","""AR""","""KQCU""",143.0,1983-01-19 00:00:00,1983-01-19 00:00:00,-94.2,35.167
…,…,…,…,…,…,…,…,…,…
"""998482""",99999,"""TEXAS STATE AQUARIUM""","""TX""",,8.0,2011-10-06 00:00:00,2025-05-25 00:00:00,-97.39,27.812
"""998483""",99999,"""S PADRE ISLAND CGS""","""TX""",,10.0,2011-10-06 00:00:00,2025-05-25 00:00:00,-97.177,26.077
"""998484""",99999,"""COPANO BAY""","""TX""",,10.0,2011-10-06 00:00:00,2017-10-06 00:00:00,-97.022,28.118
"""998487""",99999,"""PORT LAVACA""","""TX""",,10.0,2011-10-06 00:00:00,2025-05-25 00:00:00,-96.595,28.64


## Save dataframes

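The whole ISD-Lite dataset is a little over 13 GB. The wind table is written to `SCRATCH_DIR` and the much smaller metadata table to `DATA_DIR`.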

In [15]:
isd_lite_metadata = os.path.join(DATA_DIR, 'isd_lite_cna_metadata.csv')
isd_lite_file = os.path.join(SCRATCH_DIR, 'isd_lite_cna.csv')

metadata.write_csv(isd_lite_metadata)
wind_df.write_csv(isd_lite_file)