The goal is to find ICESat-2 ATL03 photon data that match the times and lat/lon positions of the published Kd values from BGC-Argo floats (DOI: 10.5281/zenodo.8228242).

The Kd CSV file was edited after download from Zenodo: the Kd calculations were sorted by date. This pre-processing could just as easily be done in Python, but I already had the CSV open to look at the contents. The edited version is called Dataset_Kd_Paper_2018.csv.

Actions:
- Load lat/lon and time for each Kd calculation.
- Check whether there is a matching ICESat-2 pass within ± x hours and x km (set by search_hrs and search_km; 3 hours and 3 km in this run).
- For each match, the notebook saves a pickle file of the GeoDataFrame (geopandas), with the index of the matching spreadsheet row appended to the filename.
- These GDFs are the output of the icesat2.atl03sp search function. The contents of each GDF are described here: https://slideruleearth.io/web/rtd/user_guide/ICESat-2.html#photon-segments
- The notebook also saves a new copy of the spreadsheet, Dataset_Kd_Paper_2018_dep_3km3h.csv, containing only the matching Kd rows.
 


In [1]:
import pandas as pd
import numpy as np
from sliderule import sliderule, icesat2, earthdata
from datetime import datetime, timedelta
import time
pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', None)

sliderule.init(verbose=False)


def buoy_bound_box(lat, lon, buffer_km):
    """Return a closed polygon (list of lon/lat dicts) around the buoy position."""
    # convert the buffer distance to degrees of latitude (1 deg is roughly 111 km)
    lat_buff = buffer_km/111
    # a degree of longitude shrinks by cos(lat), so the same distance spans more degrees
    lon_buff = buffer_km/(111*np.cos(lat*np.pi/180))
    # corners of the bounding box around the buoy
    minx = lon - lon_buff
    miny = lat - lat_buff
    maxx = lon + lon_buff
    maxy = lat + lat_buff

    poly = [{'lon': minx, 'lat': miny},
            {'lon': maxx, 'lat': miny},
            {'lon': maxx, 'lat': maxy},
            {'lon': minx, 'lat': maxy},
            {'lon': minx, 'lat': miny}] # Closing the loop by repeating the first point
    return poly
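
As a quick check on the degree conversion in buoy_bound_box, the sketch below prints the latitude and longitude buffers for a 3 km search distance at a few latitudes. It uses only the standard library; `buffer_degrees` and `KM_PER_DEG_LAT` are names made up for this illustration, and the 111 km per degree figure is the same approximation used in the function.

```python
import math

KM_PER_DEG_LAT = 111.0  # hypothetical constant name; same approximation as buoy_bound_box


def buffer_degrees(lat, buffer_km):
    """Return (lat_buff, lon_buff) in degrees for a buffer distance in km."""
    lat_buff = buffer_km / KM_PER_DEG_LAT
    # longitude degrees get shorter towards the poles by a factor cos(lat)
    lon_buff = buffer_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat_buff, lon_buff


for lat in (0.0, 45.0, 60.0, 80.0):
    lat_buff, lon_buff = buffer_degrees(lat, 3)
    print(f"lat={lat:5.1f}  lat_buff={lat_buff:.4f} deg  lon_buff={lon_buff:.4f} deg")
```

The latitude buffer is constant, while the longitude buffer grows with latitude (it doubles at 60 degrees), so the box stays roughly square in km. The approximation breaks down very close to the poles, where cos(lat) approaches zero.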

In [2]:
# load time, lat and lon
df = pd.read_csv("Dataset_Kd_Paper_2018.csv")
# convert matlab time to datetime objects
df["dt_float"] = pd.to_datetime(df["dt_float"]-719529,unit='d',utc=True).round('s')
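# remove all rows from before ICESat-2 launched (cutoff: 2018-10-01 UTC)
# pd.Timestamp is used because datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11
df = df[df["dt_float"] > pd.Timestamp('2018-10-01T00:00:00Z')]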
df.reset_index(drop=True, inplace=True)
# df
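
The offset 719529 used in the conversion is the MATLAB datenum of 1970-01-01, so subtracting it leaves days since the Unix epoch. A minimal standard-library sketch of the same conversion is shown here as a sanity check; `matlab_datenum_to_datetime` and `MATLAB_UNIX_EPOCH` are illustrative names, not part of the notebook.

```python
from datetime import datetime, timedelta, timezone

MATLAB_UNIX_EPOCH = 719529  # MATLAB datenum of 1970-01-01 00:00


def matlab_datenum_to_datetime(datenum):
    """Convert a MATLAB datenum (days, fractional) to a UTC datetime."""
    days_since_epoch = datenum - MATLAB_UNIX_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(days=days_since_epoch)


# datenum for noon on the cutoff date used in this notebook
print(matlab_datenum_to_datetime(737334.5))
```

Fractional days carry floating-point noise at the sub-second level, which is why the notebook rounds the converted times to the nearest second.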

In [3]:
df["check_sum"] = False
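# these values can be adjusted to broaden/narrow the match between ICESat-2 and the ground truth
search_hrs = 3  # time window: +/- hours around each Kd measurement
search_km = 3   # spatial buffer: +/- km around each float position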
for jj in range(len(df)):
    if jj % 100 == 0:
        print(f'processing {jj}/{len(df)}')  # progress printout every 100 rows
    # define a search region around the buoy
    lat = df['lat_float'][jj]
    lon = df['lon_float'][jj]
    
    poly = buoy_bound_box(lat,lon,search_km)

    t_start = (df['dt_float'][jj]-timedelta(hours=search_hrs)).strftime("%Y-%m-%dT%H:%M:%SZ")
    t_end = (df['dt_float'][jj]+timedelta(hours=search_hrs)).strftime("%Y-%m-%dT%H:%M:%SZ")

    parms = {"poly": poly,
             "t0": t_start,
             "t1": t_end,
             "track": 0,
             "pass_invalid": True,
             "cnf": -2, # returns all photons
             "srt": icesat2.SRT_OCEAN
            }

    atl_gdb = icesat2.atl03sp(parms)
    if len(atl_gdb)>0:
        df.loc[jj,"check_sum"] = True
        print('no. of photons: '+str(len(atl_gdb)))
        atl_gdb.to_pickle('icesat2_'+str(jj)+'.pkl')
    del atl_gdb
    time.sleep(1)  # pause between requests to avoid overloading the CMR server



processing 1200/5129
processing 1300/5129
processing 1400/5129
processing 1500/5129
processing 1600/5129
processing 1700/5129


Exception <-1>: Failure on resource ATL03_20191107102440_06340507_006_01.h5 track 1.0: H5Coro::Future read failure on /gt1l/geolocation/reference_photon_lat
Exception <-1>: Failure on resource ATL03_20191107102440_06340507_006_01.h5 track 1.0: H5Coro::Future read failure on /gt1l/geolocation/reference_photon_lat


processing 1800/5129
processing 1900/5129
processing 2000/5129
processing 2100/5129
processing 2200/5129
processing 2300/5129
processing 2400/5129
processing 2500/5129
processing 2600/5129
processing 2700/5129
processing 2800/5129
processing 2900/5129
processing 3000/5129
processing 3100/5129
processing 3200/5129
no. of photons: 32074
no. of photons: 7391
processing 3300/5129
processing 3400/5129
processing 3500/5129
processing 3600/5129
processing 3700/5129
processing 3800/5129
no. of photons: 29344
no. of photons: 29344
processing 3900/5129


Connection error to endpoint https://sliderule.slideruleearth.io/source/atl03sp ...retrying request
Connection error to endpoint https://sliderule.slideruleearth.io/source/atl03sp ...retrying request
Connection error to endpoint https://sliderule.slideruleearth.io/source/atl03sp ...retrying request
Unable to complete request due to errors
Connection error to endpoint https://sliderule.slideruleearth.io/source/atl03sp ...retrying request


no. of photons: 303380
processing 4000/5129
processing 4100/5129
no. of photons: 54464
no. of photons: 54464
no. of photons: 54464
no. of photons: 54464
processing 4200/5129
processing 4300/5129
processing 4400/5129
processing 4500/5129
processing 4600/5129
processing 4700/5129
processing 4800/5129
processing 4900/5129
processing 5000/5129
no. of photons: 46967
no. of photons: 46967
no. of photons: 71026
no. of photons: 71026
processing 5100/5129


In [4]:
# df = pd.read_pickle('glider_matches.pkl')
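# keep only the Kd rows with a matching ICESat-2 pass
df = df[df["check_sum"]]
print(len(df))
# df.reset_index(drop=True, inplace=True)
# the index is written to the CSV, so each row keeps the number used in its icesat2_<index>.pkl filename
df.to_csv('Dataset_Kd_Paper_2018_dep_3km3h.csv')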


13
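
To relate the saved photon files back to the Kd rows later, the row index can be recovered from each filename. The sketch below assumes the pickles sit in the working directory and follow the `icesat2_<index>.pkl` naming from cell 3; `matched_row_indices` and `PICKLE_PATTERN` are hypothetical names for illustration.

```python
import re
from pathlib import Path

# filenames written in cell 3 look like icesat2_3212.pkl
PICKLE_PATTERN = re.compile(r"^icesat2_(\d+)\.pkl$")


def matched_row_indices(folder="."):
    """Return the sorted row indices encoded in icesat2_<index>.pkl filenames."""
    indices = []
    for path in Path(folder).iterdir():
        match = PICKLE_PATTERN.match(path.name)
        if match:
            indices.append(int(match.group(1)))
    return sorted(indices)


print(matched_row_indices("."))
```

Because `to_csv` writes the DataFrame index by default, the first (unnamed) column of Dataset_Kd_Paper_2018_dep_3km3h.csv holds the same indices, so each pickle can be paired with its Kd row. The pickles themselves can then be opened with `pd.read_pickle`.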
