# ATL08 Rebinning via Sliderule 
### Adapting tools from ATL08
Early versions of the biomass process included:
1. A step to convert ATL08 100 m granules into ATL08 30 m granules
2. A step to convert the ATL08 30 m granules from .h5 granules to .csvs
3. A step to ID which granules are over a biomass tile
4. A step to filter possible bad ICESat-2 measurements

This version aims to perform all of those steps in one place to simplify the process. This notebook shows how to:
1. Select a particular tile
2. Use Sliderule to pull out all of the measurements over a specific tile
3. Filter the resulting GeoDataFrame

## Import libraries
Import the required libraries, in particular `sliderule`, which is installed into the workspace with a conda call such as:

`conda install -c conda-forge sliderule`

After the package is installed in the workspace, it can be imported as normal.

In [107]:
import matplotlib.pyplot as plt
import matplotlib
import geopandas as gpd
import logging 
import sliderule
from sliderule import icesat2

In [121]:
ICESAT2_BOREAL_REPO_PATH = '/projects/code/icesat2_boreal'               #'/projects/icesat2_boreal' # /projects/Developer/icesat2_boreal/lib
ICESAT2_BOREAL_LIB_PATH = ICESAT2_BOREAL_REPO_PATH + '/lib'
import sys, os
sys.path.append(ICESAT2_BOREAL_LIB_PATH)
import CovariateUtils
import maplib_folium
import mosaiclib
from mosaiclib import *
from pathlib import Path

## Part 1: Read Boreal Tile Geopackage w/ locations of tiles

Tiles define the polygon extents used in the spatial selection of ATL03/08.

In [99]:
# Boreal Tiles 
boreal_tiles_model_ready_fn = '/projects/shared-buckets/montesano/databank/boreal_tiles_v004_model_ready.gpkg'
boreal_tiles = gpd.read_file(boreal_tiles_model_ready_fn)

In [100]:
len(boreal_tiles)

4956

### Map results with 'explore'

### Select a specific tile

For this example, it doesn't really matter which one.

In [174]:
# First 50 tiles; overridden by the hand-picked list on the next line
FOCAL_TILE_LIST = boreal_tiles.tile_num.to_list()[0:50]
FOCAL_TILE_LIST = [3504, 34210, 3770, 3639, 3752, 1916, 36024, 35355, 2815, 72, 1775, 2443, 2996, 109, 2143, 38433, 4032, 3665, 2749, 3458, 2807, 1834, 1471, 773, 25459, 874, 41995, 2542, 354000, 3836, 20]
FOCAL_TILE = FOCAL_TILE_LIST[0]
print(len(FOCAL_TILE_LIST))

31


In [175]:
boreal_select = boreal_tiles[boreal_tiles.tile_num.isin(FOCAL_TILE_LIST)]

m = boreal_tiles.explore(color='black')
boreal_select.explore(m=m, color='red')

## Part 2: Collect ATL08 using tiles via Sliderule

This part uses Sliderule to find the ATL08 measurements within a tile. Sliderule also re-aggregates ATL08 data from 100 m along-track segments to 30 m along-track segments using `PhoReal`. The results are returned in a geopandas dataframe.


### Initialize Sliderule Client
- For public use, the default organization's node is `sliderule`;  
- For non-public use and testing, the `utexas` node can be used.  
     + a .netrc file will need to be configured.
- Requesting more nodes will increase speed; the cell below requests 5 (`desired_nodes=5`)
- More info on initializing is given here: https://slideruleearth.io/web/rtd/api_reference/icesat2.html#init

`NOTE`:  
This notebook uses the public branch of SlideRule.  
    + the latest bug fix isn't on this public branch  
    + it would be better to run on our utexas node, but a permissions issue when running from ADE still needs to be resolved  
    
The flag bug is still present (as of 2/22/2024):
+ it affects any flag that uses integers
+ it is an indexing bug, so flags get incorrectly associated with the observations

In [179]:
logging.basicConfig(level=logging.INFO)
icesat2.init("slideruleearth.io", verbose=True, loglevel=logging.INFO, 
             organization='sliderule', #'utexas'
             desired_nodes=5)
#icesat2.init("slideruleearth.io", verbose=True, organization="utexas", desired_nodes=1)

INFO:sliderule.sliderule:400 Client Error: Bad Request for url: https://ps.slideruleearth.io/api/org_num_nodes/sliderule/
INFO:sliderule.sliderule:Provisioning system status request returned error => No token is provided in the header or the header is missing
INFO:sliderule.sliderule:400 Client Error: Bad Request for url: https://ps.slideruleearth.io/api/desired_org_num_nodes_ttl/sliderule/5/60/
ERROR:sliderule.sliderule:Provisioning system update request error => No token is provided in the header or the header is missing


### Processing parameters

For the sliderule query, we need to establish search parameters. These include:
 - Our geographic AOI (the tile extent)
 - `t0` and `t1`: the start and stop times of the query. For testing we query within a single year (by default, June through September 2020)
 - `srt` (surface type) defaults to land; this is a required parameter but is not important for ATL08 sliderule
 - `len` is the length of each extent in meters; this should be our bin size
 - `res` is the step distance between successive extents in meters; this needs to be the same as `len` for our purposes
 - `pass_invalid` is a flag indicating whether extents that fail validation checks are still used and returned in the results; set to `True`
 - `atl08_class` is set to ground, canopy, and top of canopy
 - `atl08_fields` should be set to all fields of interest, in our case the fields to which we later apply filtering
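
The `t0` and `t1` strings can be assembled from year and month values. The following is a minimal sketch, not the notebook's own code: the helper name `build_time_window` is hypothetical, and it assumes the standard library's `calendar.monthrange` is an acceptable way to look up the last day of the closing month, so that February is handled in both leap and non-leap years.

```python
import calendar

def build_time_window(t0_year, t1_year, minmonth, maxmonth):
    """Return (t0, t1) ISO-8601 strings from minmonth of t0_year to maxmonth of t1_year.

    Hypothetical helper for illustration; the name and signature are assumptions.
    """
    # monthrange returns (weekday of the first day, number of days in the month)
    last_day = calendar.monthrange(t1_year, maxmonth)[1]
    t0 = f"{t0_year}-{minmonth:02}-01T00:00:00Z"
    t1 = f"{t1_year}-{maxmonth:02}-{last_day:02}T00:00:00Z"
    return t0, t1

t0, t1 = build_time_window(2020, 2020, 6, 9)
print(t0, t1)
```

Note that `t1` falls at 00:00 UTC on the last day of the closing month, so measurements taken later that day fall outside the window.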
 
The current iteration of filtering requires the fields `['h_can', 'h_can_unc','h_dif_ref','m','msw_flg','beam_type','seg_snow', 'sig_topo','seg_cover','sol_el','seg_landcov']`. PhoREAL and Sliderule naming conventions differ in a few places: PhoREAL names were created to fit better with other PhoREAL functionality, while SlideRule fields are pulled directly from ATL08 (for the most part). The names map as follows:
 
```
PhoREAL     : Sliderule
h_can       : h_canopy #(default field, doesn't need to be listed)
h_can_unc   : canopy/h_canopy_uncertainty #(field is in the 'canopy' subfield)
h_dif_ref   : h_dif_ref
m           : time #(month needs to be unpacked)
msw_flg     : msw_flag
beam_type   : spot #(will need to be converted to 'weak' and 'strong')
seg_snow    : snowcover #(default field)
sig_topo    : sigma_topo
seg_cover   : canopy/segment_cover #(field is in the 'canopy' subfield)
sol_el      : solar_elevation
seg_landcov : segment_landcover
rh25        : canopy_h_metrics[3]
rh50        : canopy_h_metrics[8]
rh60        : canopy_h_metrics[10]
rh70        : canopy_h_metrics[12]
rh75        : canopy_h_metrics[13]
rh80        : canopy_h_metrics[14]
rh85        : canopy_h_metrics[15]
rh90        : canopy_h_metrics[16]
rh95        : canopy_h_metrics[17]
```
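
Two of the mappings above need a conversion rather than a rename: `spot` to `beam_type`, and `time` to `m`. The following is a minimal sketch under a stated assumption: odd-numbered spots (1, 3, 5) are treated as strong beams and even-numbered spots (2, 4, 6) as weak beams, which should be checked against the SlideRule and ATL08 documentation. The function names are hypothetical.

```python
from datetime import datetime, timezone

def spot_to_beam_type(spot):
    """Map a spot number (1-6) to 'strong' or 'weak'.

    Assumption: odd spots are strong beams and even spots are weak beams.
    """
    if spot not in range(1, 7):
        raise ValueError(f"spot must be 1-6, got {spot}")
    return "strong" if spot % 2 == 1 else "weak"

def month_from_time(timestamp):
    """Unpack the month (the PhoREAL 'm' field) from a datetime."""
    return timestamp.month

print([spot_to_beam_type(s) for s in range(1, 7)])
print(month_from_time(datetime(2020, 7, 15, tzinfo=timezone.utc)))
```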

The relative height metrics stored in PhoREAL as `rh25, rh50, rh60`, etc. are stored in `canopy_h_metrics` in SlideRule. The array positions (top row) correspond to the following percentiles (bottom row):
```
0 ,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17
10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95
```
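
The lookup from percentile to array position can be sketched as follows. This assumes the layout shown above (18 values, from the 10th to the 95th percentile in steps of 5); the helper names `rh_index` and `unpack_rh` are hypothetical.

```python
PERCENTILES = list(range(10, 100, 5))  # 10, 15, ..., 95 (18 values)

def rh_index(percentile):
    """Return the position of a percentile within canopy_h_metrics."""
    return PERCENTILES.index(percentile)

def unpack_rh(canopy_h_metrics, percentiles=(25, 50, 60, 70, 75, 80, 85, 90, 95)):
    """Build {'rh25': value, ...} from a single canopy_h_metrics array."""
    return {f"rh{p}": canopy_h_metrics[rh_index(p)] for p in percentiles}

# Stand-in array whose values equal their percentile, which makes the lookup easy to check
example = [float(p) for p in PERCENTILES]
print(rh_index(25), rh_index(95))
print(unpack_rh(example)["rh50"])
```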

First we will convert the tile extent to a polygon object.

### Process ATL08
 - wrap SlideRule and downstream custom filtering into a single function that writes a geoparquet  
 - multiprocess locally using Python - to send `n_cpu` simultaneous SlideRule requests
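
The fan-out over tiles can be sketched as follows. This is a sketch under assumptions, not the notebook's implementation: `fake_process_tile` is a stand-in for the real per-tile function, the output path is a placeholder, and a thread pool from `concurrent.futures` is swapped in for a process pool on the assumption that the work is dominated by waiting on remote requests. `functools.partial` fixes the keyword arguments shared by all tiles.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def fake_process_tile(tile_num, seg_length=30, outdir="/tmp/atl08"):
    """Stand-in for the real per-tile function; returns the path it would write."""
    return f"{outdir}/{seg_length:03}m/tile_{tile_num:05}.parquet"

tile_list = [3504, 34210, 3770]  # example tile numbers
worker = partial(fake_process_tile, seg_length=30)
n_workers = 3  # number of simultaneous requests

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    # map returns results in the same order as tile_list
    results = list(pool.map(worker, tile_list))

for path in results:
    print(path)
```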

In [110]:
import FilterUtils

In [111]:
def gdf_to_sliderulepoly(gdf):
    '''
    Return a list of {'lat', 'lon'} dictionaries built from the polygon coordinates
    of a geodataframe, as needed for sliderule. Works for any polygon.
    '''
    # Compute the coordinate table once; 'x' holds longitude and 'y' holds latitude
    coords = gdf.get_coordinates()
    dict_list = []
    for lat, lon in zip(coords['y'], coords['x']):
        dict_list.append({'lat': lat, 'lon': lon})
        # Stop before the final vertex, which closes the ring for a single polygon
        if len(dict_list) == len(coords) - 1: break

    return dict_list

In [167]:
def process_atl08_boreal(polygon_id, polygon_gdf_fn, id_col_num = 'tile_num', t0_year=2020, t1_year=2020, minmonth=6, maxmonth=9, seg_length = 30, 
                         outdir='/projects/my-private-bucket/data/atl08.v006',
                         atl08_cols_list=['rh25','rh50','rh60','rh70','rh75','rh80','rh90','h_can','h_max_can', 'ter_slp','h_te_best', 'seg_landcov','sol_el','y','m','doy'],
                         RETURN_DF=False):
    
    '''Runs sliderule's implementation of PhoReal to process a clipped and filtered geodataframe of custom ATL08 along-track segments designed for boreal forests
        + finds ATL03 for a polygon_id from the geodataframe at path polygon_gdf_fn
        + filters by year and month
        + calculates custom ATL08 observations representing along-track segments of length 'seg_length' using sliderule
        + applies quality filtering customized for boreal forests
    '''
    
    # Subset polygon_gdf by tile and reproject to WGS84
    polygon_gdf = gpd.read_file(polygon_gdf_fn)
    polygon_gdf = polygon_gdf[polygon_gdf[id_col_num] == polygon_id].to_crs(4326)
    
    # Set SlideRule parameters
    # Last day of the closing month (handles February in leap and non-leap years)
    import calendar
    maxday = calendar.monthrange(t1_year, maxmonth)[1]
    
    outdir = os.path.join(outdir,f'{seg_length:03}m')
    out_name = os.path.join(outdir, f'atl08_006_{seg_length:03}m_{t0_year}_{t1_year}_{minmonth:02}_{maxmonth:02}')
    
    params_atl08 = {
            #"output": { "path": f"{out_name}_{polygon_id:04}.parquet", "format": "parquet", "open_on_complete": True }, # open on complete not working as expected?
            "poly": gdf_to_sliderulepoly(polygon_gdf),
            "t0": f'{t0_year}-{minmonth:02}-01T00:00:00Z',
            "t1": f'{t1_year}-{maxmonth:02}-{maxday}T00:00:00Z',
            "srt": icesat2.SRT_LAND,
            "len": seg_length,
            "res": seg_length,
            "pass_invalid": True, 
            "atl08_class": ["atl08_ground", "atl08_canopy", "atl08_top_of_canopy"],
            "atl08_fields": ["canopy/h_canopy_uncertainty","h_dif_ref","msw_flag","sigma_topo","segment_landcover","canopy/segment_cover","segment_snowcover","terrain/h_te_uncertainty"], #'segment_cover' and 'h_canopy_uncertainty' need to be added
            #"atl08_fields": ["h_dif_ref","msw_flag","sigma_topo","segment_landcover"],
            "phoreal": {"binsize": 1.0, "geoloc": "center", "above_classifier": True, "use_abs_h": False, "send_waveform": True}
        }
    
    # Run SlideRule's implementation of PhoReal processing of ATL03 into custom ATL08
    # https://slideruleearth.io/web/rtd/api_reference/icesat2.html#atl08p
    atl08 = icesat2.atl08p(params_atl08, keep_id=True)
    
    ###############################
    #### Project-specific formatting
    # Format for filtering
    # Unpack the canopy_h_metrics into discrete fields
    atl08['rh25'] = [l[3] for l in atl08['canopy_h_metrics']]
    atl08['rh50'] = [l[8] for l in atl08['canopy_h_metrics']]
    atl08['rh60'] = [l[10] for l in atl08['canopy_h_metrics']]
    atl08['rh70'] = [l[12] for l in atl08['canopy_h_metrics']]
    atl08['rh75'] = [l[13] for l in atl08['canopy_h_metrics']]
    atl08['rh80'] = [l[14] for l in atl08['canopy_h_metrics']]
    atl08['rh85'] = [l[15] for l in atl08['canopy_h_metrics']]
    atl08['rh90'] = [l[16] for l in atl08['canopy_h_metrics']]
    atl08['rh95'] = [l[17] for l in atl08['canopy_h_metrics']]

    # Rename fields with 'canopy'
    atl08['h_canopy_uncertainty'] = atl08['canopy/h_canopy_uncertainty']
    atl08['segment_cover'] = atl08['canopy/segment_cover']
    atl08['h_te_uncertainty'] = atl08['terrain/h_te_uncertainty']

    # Drop 'waveform' and 'canopy_h_metrics' fields 
    atl08 = atl08.drop(columns=['waveform', 'canopy_h_metrics','canopy/h_canopy_uncertainty','canopy/segment_cover','terrain/h_te_uncertainty'])
    
    # Time is the index field
    atl08.reset_index(inplace=True)
    atl08['y'] = atl08.time.dt.year
    atl08['m'] = atl08.time.dt.month
    atl08['d'] = atl08.time.dt.day
    atl08['doy'] = atl08.time.dt.dayofyear
    atl08.drop('time', axis=1, inplace=True) # folium mapping a gdf with a datetime field doesnt work
    
    # Quality filtering designed for boreal forest
    atl08 = FilterUtils.filter_atl08_qual_v5(atl08, atl08_cols_list = atl08_cols_list)
    
    Path(outdir).mkdir(parents=True, exist_ok=True)
    out_atl08_filt_fn = f'{out_name}_filt_{polygon_id:05}.parquet'
    atl08.to_parquet(out_atl08_filt_fn)
    print(f'File written:\t{out_atl08_filt_fn}')
    
    if RETURN_DF:
        return atl08
    else:
        atl08 = None

In [204]:
def extract_atl08_covars(polygon_id, indir='/projects/my-private-bucket/data/atl08.v006/030m',
                         tindex_fn_list=[ TOPO_TINDEX_FN_DICT['c2020updated'], HLS_TINDEX_FN_DICT['c2020v2022updated'], SAR_TINDEX_FN_DICT['2020']] , 
                         RETURN_DF=False):
    
    '''Extract raster values from tiled (polygon extent) gridded stacks to input atl08 observations using a polygon id
     + designed to be multiprocessed with a list of polygon_ids (tile nums)
    '''
    import glob
    
    # Search for the *filtered ATL08 v6 30m* file for this tile
    # This expects a fairly specific file naming convention ...
    SEARCH_STR = os.path.join(indir, f'atl08_006_030m_*_filt_{polygon_id:05}*')
    print(f'Tile: {polygon_id:05}')
    atl08_gdf_fn = glob.glob(SEARCH_STR)[0]
    print(atl08_gdf_fn)
    
    if atl08_gdf_fn.endswith('parquet'):
        atl08 = gpd.read_parquet(atl08_gdf_fn)
    else:
        atl08 = gpd.read_file(atl08_gdf_fn)
    
    print(atl08.shape)
    
    if atl08.shape[0] == 0: sys.exit(f'Tile {polygon_id:05} has 0 observations. Exiting.')
        
    # Get file name
    out_atl08_covars_fn = atl08_gdf_fn.split(f'_{polygon_id:05}.')[0] + f'_covars_{polygon_id:05}.parquet'
        
    # Extract covariates
    print(f'\nExtracting values from {len(tindex_fn_list)} sets of raster stacks and appending as columns to atl08 geodataframe...')
    for tindex_fn in tindex_fn_list: 
        print(tindex_fn)
        covar_fn = CovariateUtils.get_stack_fn(tindex_fn, polygon_id, user=None, col_name='local_path')
        atl08 = ExtractUtils.extract_value_gdf_s3(covar_fn, atl08, None, reproject=True)
        
    atl08.to_parquet(out_atl08_covars_fn)
    print(f'File written:\t{out_atl08_covars_fn}')
    
    if RETURN_DF:
        return atl08

In [192]:
#z = process_atl08_boreal(10, polygon_gdf_fn=boreal_tiles_model_ready_fn, RETURN_DF=True)

## Extract Covariate Pixel Values to ATL08 observations 
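
The extraction step locates its input by file name, so the naming convention matters. The sketch below reproduces how the filtered file name is assembled and how the covariate file name is derived from it. The helper names `filtered_fn` and `covars_fn` are hypothetical and the directory is a placeholder.

```python
import os

def filtered_fn(outdir, polygon_id, seg_length=30, t0_year=2020, t1_year=2020,
                minmonth=6, maxmonth=9):
    """Assemble the filtered-ATL08 file name for one tile (hypothetical helper)."""
    seg_dir = os.path.join(outdir, f"{seg_length:03}m")
    stem = f"atl08_006_{seg_length:03}m_{t0_year}_{t1_year}_{minmonth:02}_{maxmonth:02}"
    return os.path.join(seg_dir, f"{stem}_filt_{polygon_id:05}.parquet")

def covars_fn(atl08_fn, polygon_id):
    """Derive the covariate file name by replacing the tile suffix (hypothetical helper)."""
    return atl08_fn.split(f"_{polygon_id:05}.")[0] + f"_covars_{polygon_id:05}.parquet"

fn = filtered_fn("/tmp/atl08.v006", 3504)
print(os.path.basename(fn))
print(os.path.basename(covars_fn(fn, 3504)))
```

Because the tile number is zero-padded to five digits, a search pattern such as `atl08_006_030m_*_filt_03504*` matches the filtered file regardless of the years and months in its name.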

In [27]:
import importlib
import ExtractUtils
importlib.reload(ExtractUtils)
importlib.reload(CovariateUtils)

NASA MAAP


<module 'CovariateUtils' from '/projects/code/icesat2_boreal/lib/CovariateUtils.py'>

In [45]:
HLS_TINDEX_FN_DICT

{'c2020oldv1': 's3://maap-ops-workspace/shared/nathanmthomas/DPS_tile_lists/HLS/spring2022/HLS_tindex_master.csv',
 'c2020oldv2': 's3://maap-ops-workspace/shared/nathanmthomas/DPS_tile_lists/HLS/fall2022/HLS_stack_2022_v2/HLS_tindex_master.csv',
 'c2020v2022nmt': 's3://maap-ops-workspace/shared/nathanmthomas/DPS_tile_lists/HLS/c2020/HLS_stack_2022_v2/HLS_tindex_master.csv',
 'c2020v2022pmm': 's3://maap-ops-workspace/shared/montesano/DPS_tile_lists/HLS/c2020/HLS_stack_2022_v2/HLS_tindex_master.csv',
 'c2020v2022datelines': 's3://maap-ops-workspace/shared/montesano/DPS_tile_lists/HLS/HLS_stack_2023_v1/HLS_L30_c2020/HLS_tindex_master.csv',
 'c2020v2022updated': 's3://maap-ops-workspace/shared/montesano/DPS_tile_lists/HLS/c2020/HLS_stack_2022_v2/HLS_tindex_master_updated.csv',
 'c2020v2022updated_bad': 's3://maap-ops-workspace/shared/montesano/DPS_tile_lists/HLS/HLS_stack_2022_v2/HLS_tindex_master.csv',
 '2015': 's3://maap-ops-workspace/shared/nathanmthomas/DPS_tile_lists/HLS/c2015/HLS_sta

In [193]:
from multiprocessing import Pool
from functools import partial

In [194]:
import multiprocessing as mp
n_cpu = mp.cpu_count() - 1
n_cpu

31

In [195]:
FOCAL_TILE_LIST[0:1]

[3504]

In [None]:
%%time

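######
# Multiprocessing hangs during extract_atl08_covars(), so the Pool branch below is
# disabled and the tiles are processed in a serial loop instead.
# Possible solution: this step will benefit from DPS
#####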

if False:
    with Pool(processes=1) as pool:
        atl08_gdf_list = pool.map(partial(extract_atl08_covars, RETURN_DF=True), 
                                  FOCAL_TILE_LIST # Test selection of 31 tiles
                                  #boreal_tiles.tile_num.to_list() # All 4956 tiles
                                 )
    
else:
    atl08_gdf_list = []
    
    for TILE_NUM in FOCAL_TILE_LIST:
        atl08_gdf_list.append(extract_atl08_covars(TILE_NUM, RETURN_DF=True))
    