## Access and Visualize ICESat-2 ATL08 Data
### Data Query, Download, and Visualization Example Notebook
This notebook illustrates the use of `icepyx` for ICESat-2 data access and download from the NASA NSIDC DAAC (NASA National Snow and Ice Data Center Distributed Active Archive Center) as part of a basic data analysis workflow.
Although it uses ATL08 as the dataset of interest, `icepyx` provides access to all ICESat-2 datasets.
For more detail on the data access, ordering, and subsetting options available when obtaining data with `icepyx`, see the dedicated example notebooks ([DataAccess1](https://github.com/icesat2py/icepyx/blob/development/examples/ICESat-2_DAAC_DataAccess_Example.ipynb); [DataAccess2_Subsetting](https://github.com/icesat2py/icepyx/blob/development/examples/ICESat-2_DAAC_DataAccess2_Subsetting.ipynb)).

#### Objectives
1. Declare search parameters
2. Create an icepyx query object
3. Visualize the area of interest and data prior to ordering
4. Order and download ICESat-2 data
5. Quickly open and visualize the downloaded data

#### Credits
* tutorial by: Jessica Scheick, for the May 2021 ICESat-2 Data Users Quarterly Call
* source material and coauthors: [icepyx examples](https://icepyx.readthedocs.io/en/latest/getting_started/example_link.html) and references therein.


### 0. Set up your computing environment

If you have not already installed `icepyx` and the libraries it depends on, please see our [installation instructions](https://icepyx.readthedocs.io/en/latest/getting_started/install.html) to set up your compute environment.
Note that after installing you will need to restart your notebook kernel for the changes to take effect before running this tutorial.

#### Import packages, including icepyx

In [None]:
import icepyx as ipx
import geopandas as gpd
import geoviews as gv
import h5py
import hvplot
import hvplot.pandas
import numpy as np
import os
import pandas as pd
from shapely.geometry import Point
import shutil
%matplotlib inline

### 1. Set search parameters
ICESat-2 data can be searched in several different ways.

Here we will search for ATL08 (Land and Vegetation Height) data using a bounding box.

In [None]:
short_name = 'ATL08' # See https://nsidc.org/data/icesat-2/data-sets for a list of the available datasets.
spatial_extent = [-69.7, 44.8, -69.5, 45]
date_range = ['2018-01-01','2021-04-30']
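
As a quick sanity check before building a query, the sketch below validates the two inputs above. It assumes the bounding box is ordered `[lower-left longitude, lower-left latitude, upper-right longitude, upper-right latitude]` in decimal degrees and that dates are `YYYY-MM-DD` strings. `check_search_params` is a hypothetical helper written for this illustration; it is not part of `icepyx`.

```python
from datetime import date


def check_search_params(spatial_extent, date_range):
    """Validate a bounding box and date range (hypothetical helper, not icepyx API).

    Assumes spatial_extent = [min_lon, min_lat, max_lon, max_lat] in decimal degrees
    and date_range = [start, end] as 'YYYY-MM-DD' strings.
    """
    if len(spatial_extent) != 4:
        raise ValueError("bounding box needs exactly four values")
    min_lon, min_lat, max_lon, max_lat = spatial_extent
    if not (-180 <= min_lon <= 180 and -180 <= max_lon <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min_lat < max_lat <= 90")
    start, end = (date.fromisoformat(d) for d in date_range)
    if start > end:
        raise ValueError("start date must not be after end date")
    return start, end


start, end = check_search_params([-69.7, 44.8, -69.5, 45], ['2018-01-01', '2021-04-30'])
print(f'Search window spans {(end - start).days} days')
```

Catching a swapped latitude pair or a reversed date range locally is cheaper than discovering it after a query returns no granules.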

### 2. Create a Query object with the desired search parameters

We will use these three inputs, `short_name`, `spatial_extent`, and `date_range`, to create an `icepyx.Query` object.
`short_name` and `spatial_extent` are always required; `spatial_extent` can be entered as a bounding box, a list of coordinates, or a polygon file.
A third input is also required: either `date_range`, as used here, or the orbital parameters `cycles` and `tracks`.
Additional, optional inputs, such as start and end times and data set version, can also be provided.

For more details on the required and optional search criteria and acceptable input formats, please see the [DataAccess example tutorial](https://github.com/icesat2py/icepyx/blob/development/examples/ICESat-2_DAAC_DataAccess_Example.ipynb).

**Create the data object using our inputs**

In [None]:
region_a = ipx.Query(short_name, spatial_extent, date_range)

**Formatted parameters and function calls allow us to explore the data object we have created.**

In [None]:
print(region_a.dataset)
print(region_a.dates)
print(region_a.start_time)
print(region_a.end_time)
print(region_a.cycles)
print(region_a.tracks)
print(region_a.dataset_version)

**Built-in methods allow us to get more information about our dataset**

In addition to viewing the stored object information shown above, we can also request summary information about the dataset itself or confirm that we have manually specified the latest version.

In [None]:
region_a.dataset_summary_info()
print(region_a.latest_version())

### 3. Pre-order Visualizations

There are two visualization methods you can call from your query object.

The first is a visualization of your spatial extent on a map.
The style of map you will see depends on whether the `geoviews` library is installed.
It is treated as optional because some `geoviews` dependencies need the `proj` library, which must be installed with conda (it is not available from PyPI).
With `geoviews`, this plotting function returns a zoomed-in, interactive map.
Otherwise, your spatial extent will plot on a static world map using `matplotlib`.

In [None]:
region_a.visualize_spatial_extent()

The second visualization provides a quick look at the actual ICESat-2 elevations available.
The [OpenAltimetry (OA) API](https://openaltimetry.org/data/swagger-ui/#/) provides a nice way to achieve this directly within our workflow.
When we send metadata from our `Query` object to the OpenAltimetry API, it returns the most recent (relative to your specified end date) set of elevation data almost instantaneously.

**Note: the OA API, and thus this function, currently only supports products `ATL06, ATL07, ATL08, ATL10, ATL12, ATL13`**

For additional information on using icepyx's visualization module, including additional access options and limitations for large spatial extents, please see the [Data Visualization Example tutorial](https://github.com/icesat2py/icepyx/blob/development/examples/ICESat-2_Data_Visualization_Example.ipynb).

In [None]:
cyclemap, rgtmap = region_a.visualize_elevation()
cyclemap

**Plot elevation for individual RGT**

The visualization tool also provides the option to view elevation data by latitude for each ground track.

In [None]:
rgtmap

### 4. Order and download the ICESat-2 data

For more details on the data ordering, subsetting, and downloading processes, see [ICESat-2_DAAC_DataAccess_Example](https://github.com/icesat2py/icepyx/blob/main/examples/ICESat-2_DAAC_DataAccess_Example.ipynb) and [ICESat-2_DAAC_DataAccess2_Example_Subsetting](https://github.com/icesat2py/icepyx/blob/development/examples/ICESat-2_DAAC_DataAccess2_Subsetting.ipynb).

Obtaining data from NSIDC actually occurs in multiple steps, which you are likely familiar with if you've ever used Earthdata Search.
The first step, which the above visualization used behind the scenes, is to query the data and figure out which granules are a potential match.
`icepyx` builds a query for you and submits it to the NASA Common Metadata Repository (CMR) to figure out what granules might have data of interest to you.
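
To give a rough idea of what such a metadata query looks like, the sketch below assembles a granule search URL for the public CMR search endpoint from the same three inputs. It is only an illustration under stated assumptions: `build_cmr_query` is a hypothetical helper, the request is built but never sent, and this is not the code `icepyx` runs internally.

```python
from urllib.parse import urlencode

# Public CMR granule search endpoint
CMR_GRANULE_URL = 'https://cmr.earthdata.nasa.gov/search/granules.json'


def build_cmr_query(short_name, bbox, date_range, page_size=10):
    """Assemble a CMR granule search URL (illustrative; not icepyx's internal code)."""
    params = {
        'short_name': short_name,
        # CMR expects the bounding box as west,south,east,north
        'bounding_box': ','.join(str(v) for v in bbox),
        # Start and end of the temporal window, comma separated, in ISO 8601
        'temporal': f'{date_range[0]}T00:00:00Z,{date_range[1]}T23:59:59Z',
        'page_size': page_size,
    }
    return f'{CMR_GRANULE_URL}?{urlencode(params)}'


url = build_cmr_query('ATL08', [-69.7, 44.8, -69.5, 45], ['2018-01-01', '2021-04-30'])
print(url)
```

Because this step only touches metadata, it is fast and does not require an Earthdata login; credentials only become necessary when ordering data.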

**You can view a list of available granules that match your search criteria**

In [None]:
region_a.avail_granules()

The next step is to place an order for your data, so that NSIDC will prepare it for download.
To order data, you must first **log in to Earthdata**.
If you do not have an [Earthdata login](https://urs.earthdata.nasa.gov/), you can easily create one for free.
You will be prompted to enter your password, or you can [create a `.netrc` file to store your credentials](https://discourse.pangeo.io/t/earthdata-password-pop-up-box/1358).
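
For reference, a `.netrc` entry is a single line naming the login host, user, and password. The sketch below writes one with placeholder credentials and parses it back with Python's standard `netrc` module. It assumes the Earthdata login host is `urs.earthdata.nasa.gov` (the host in the link above) and deliberately writes to a temporary directory rather than your home directory; `my_username` and `my_password` are placeholders.

```python
import netrc
import os
import tempfile

# Placeholder credentials; replace with your own Earthdata username and password.
entry = 'machine urs.earthdata.nasa.gov login my_username password my_password\n'

# Written to a temporary directory here so the sketch does not touch a real ~/.netrc.
netrc_path = os.path.join(tempfile.mkdtemp(), '.netrc')
with open(netrc_path, 'w') as f:
    f.write(entry)
os.chmod(netrc_path, 0o600)  # restrict the file to its owner, as you would for ~/.netrc

# Parse the file back to confirm the entry is well formed.
login, _, password = netrc.netrc(netrc_path).authenticators('urs.earthdata.nasa.gov')
print(f'Stored credentials for user: {login}')
```

For real use, the same line would go in `~/.netrc` with owner-only permissions, since the password is stored in plain text.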

In [None]:
earthdata_uid = 'icepyx_devteam'
email = 'icepyx.dev@gmail.com'
region_a.earthdata_login(earthdata_uid, email)

Once you're logged in, you can place your data order.
See the [Subsetting Example](https://github.com/icesat2py/icepyx/blob/development/examples/ICESat-2_DAAC_DataAccess2_Subsetting.ipynb) for more information on taking advantage of the NSIDC subsetting tools to obtain more manageable files and decrease preprocessing.
At this stage, the NSIDC subsetter will actually retrieve your data and crop it to your area of interest.
Thus, your order may contain fewer granules (or even no granules!) depending on how your spatial criteria overlap the granule geospatial metadata (see Figure 1 in the [NSIDC Programmatic Access Guide](https://nsidc.org/support/how/how-do-i-programmatically-request-data-services) for some more information on this).

Once you **place your order**, `icepyx` will provide updates to let you know how it's progressing.

In [None]:
region_a.order_granules()

Once your order is completed, the files are staged by NSIDC and ready for you to download.

**Specify a local data directory and download your granules.**

In [None]:
path = './download'
region_a.download_granules(path)
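
The downloaded file names themselves carry useful metadata. The sketch below parses them under the assumption that they follow the standard ICESat-2 convention `ATLxx_[yyyymmdd][hhmmss]_[tttt][cc][ss]_[vvv]_[rr].h5` (reference ground track `tttt`, cycle `cc`, granule region `ss`, version `vvv`, revision `rr`), optionally prefixed with `processed_` for subsetted orders. `parse_granule_name` is a hypothetical helper, and the file name in the example is made up.

```python
import re
from datetime import datetime

# ATLxx_[yyyymmdd][hhmmss]_[tttt][cc][ss]_[vvv]_[rr].h5, optionally prefixed with "processed_"
GRANULE_PATTERN = re.compile(
    r'(?:processed_)?(?P<product>ATL\d{2})_(?P<start>\d{14})_'
    r'(?P<rgt>\d{4})(?P<cycle>\d{2})(?P<region>\d{2})_'
    r'(?P<version>\d{3})_(?P<revision>\d{2})\.h5$'
)


def parse_granule_name(filename):
    """Split an ICESat-2 granule file name into its components, or return None."""
    match = GRANULE_PATTERN.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    info['start'] = datetime.strptime(info['start'], '%Y%m%d%H%M%S')
    return info


# Made-up file name that follows the convention (not a real granule)
info = parse_granule_name('processed_ATL08_20190105123456_01230206_004_01.h5')
print(info['rgt'], info['cycle'], info['start'].date())
```

The track and cycle values are kept as zero-padded strings, which makes them easy to compare against the `tracks` and `cycles` stored on the query object.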

### 5. Open and visualize the downloaded data

Now that we have downloaded our data, we can move on to our analysis workflow.

A few important notes:
1. Ultimately, we plan to generalize these types of reader functions and wrap them into a new class object within `icepyx`. That development is in the planning stages (this would be a great way to get involved, whether or not you've already written/used your own readers). In the meantime, they're included as hard-coded functions in this section of the notebook.
2. These functions were modified from those originally written by Bidya Yadav.

In [None]:
def read_atl08(icesat2_path):
    """ Read ATL08 HDF file and return a geodataframe.    
        Extract variables of interest and separate the ATL08 file 
        into each beam (ground track) and ascending/descending orbits.
        TODO: Append ascending/descending nodes to output name
        
        All the hdf files inside the specified folder will be parsed
    """
    files = os.listdir(icesat2_path)
    hdf_files = [f for f in files if f.endswith('.h5') and 'ATL08' in f]
    print(f'Number of HDF files: {len(hdf_files)}')
    atl08_gdf = gpd.GeoDataFrame()
    for f in hdf_files:
        print(f'Processing HDF file: {f}')
        hdf_path = f'{icesat2_path}/{f}'
        res_dict = {}
        meta_dict = {} #These will hold metadata required for scalars per ground-track
        group = ['gt1l', 'gt1r', 'gt2l', 'gt2r', 'gt3l', 'gt3r']
        qual_str_count = ''
        with h5py.File(hdf_path, 'r') as fi:
            # subset group based on data
            group = [g for g in list(fi.keys()) if g in group]
            group1 = len(group)
            group = [g for g in group if 'land_segments' in fi[f'/{g}']]
            group2 = len(group)
            if group2<group1:
                print(f'Non-empty groups: {group2}/{group1}')
            # NB: Assert if at least one group present else may be error due to enumeration
            if len(group) == 0:
                print(f'No ground track data in this file: {f}')
                continue
            t_ref = fi['/ancillary_data/atlas_sdp_gps_epoch'][:] #scalar 1 value
            for k,g in enumerate(group):
                # 1) Read in data for a single beam #
                lat = fi[f'/{g}/land_segments/latitude']
                lon = fi[f'/{g}/land_segments/longitude']
                t_dt = fi[f'/{g}/land_segments/delta_time']
                layer_flag = fi[f'/{g}/land_segments/layer_flag']
                # Extract everything for terrain and canopy variables
                terrain_keys = fi[f'{g}/land_segments/terrain'].keys()
                terrain_keys = list(terrain_keys)
                terrain_dict = {}
                for tk in terrain_keys:
                    terrain_dict[tk] = fi[f'{g}/land_segments/terrain/{tk}']
                #terrain = pd.DataFrame.from_dict(terrain_dict)
                # Do the same for Canopy
                canopy_keys = fi[f'{g}/land_segments/canopy'].keys()
                canopy_keys = list(canopy_keys)
                # Remove these two keys as they contain some multivalue tuples which are just no-data in version 002
                canopy_keys.remove('canopy_h_metrics')
                canopy_keys.remove('canopy_h_metrics_abs')
                canopy_dict = {}
                for ck in canopy_keys:
                    canopy_dict[ck] = fi[f'{g}/land_segments/canopy/{ck}']                    
                # Remove the canopy subset flag that seems added in version 3 because this is an array but we can't save array to shapefile
                if 'subset_can_flag' in canopy_dict:
                    del canopy_dict['subset_can_flag']
                if 'subset_te_flag' in terrain_dict:
                    del terrain_dict['subset_te_flag']
                # Merge the two dictionaries (insertion order is preserved in Python 3.7+)
                terrain_dict.update(canopy_dict)
                
                # 2) To Make Pandas dataframe
                # Collect everything into one dictionary
                gt_dict = {'lon':lon, 'lat':lat, 't_dt':t_dt, 'layer_flag':layer_flag}
                gt_dict.update(terrain_dict)
                #df = pd.DataFrame({'lon':lon, 'lat':lat, 'h_li': h_li, 'q_flag':q_flag, 't_dt':t_dt})
                df = pd.DataFrame.from_dict(gt_dict)
                nan_value =  fi[f'/{g}/land_segments/terrain/h_te_mean'].attrs['_FillValue']
                df= df.replace(nan_value, np.nan)

                all_points = len(df)
                if len(df)>0:
                    # Assemble ground track into a dictionary
                    res_dict[g] = df
            #----------------------------------------------------------------------------------------------
            # Now that ATL08 data from separate ground tracks are in one dict, merge it to df
            if len(res_dict)>0:
                # Guard against an empty result dictionary (no ICESat-2 data passing quality control)
                # Combine Dataframes for each of 6 ground-tracks into single Dataframe
                count = 0
                for k in res_dict.keys():
                    if count == 0:
                        df = res_dict[k]
                        df['strip'] = k
                        count += 1
                    else:
                        df1 = res_dict[k]
                        df1['strip'] = k
                        df = pd.concat([df, df1], axis=0)
                                
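                df['geometry'] = df[['lon', 'lat']].apply(lambda x: Point(x), axis=1)
                gdf = gpd.GeoDataFrame(df, geometry='geometry', crs='epsg:4326')
                # Add this file's points to the combined GeoDataFrame
                # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
                atl08_gdf = gdf if atl08_gdf.empty else pd.concat([atl08_gdf, gdf])
            else:
                print("\tNo Ground Track in this HDF file; csv or shp not created")

    # Return only after every file has been processed
    return atl08_gdf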
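
The reader above loads `atlas_sdp_gps_epoch` into `t_ref` and keeps `delta_time` as `t_dt`, but it does not convert them to calendar times. A minimal sketch of that conversion follows. It assumes `delta_time` is in seconds since the ATLAS SDP epoch, that `atlas_sdp_gps_epoch` is that epoch expressed in seconds since the GPS epoch (1980-01-06), and that GPS time is a constant 18 leap seconds ahead of UTC, which has held since 2017 at the time of writing. `delta_time_to_utc` is a hypothetical helper and is not part of the function above.

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)
GPS_UTC_LEAP_SECONDS = 18  # assumed constant; GPS has been 18 s ahead of UTC since 2017


def delta_time_to_utc(delta_time, atlas_sdp_gps_epoch):
    """Convert an ATL08 delta_time (s) to a naive UTC datetime (hypothetical helper)."""
    gps_seconds = atlas_sdp_gps_epoch + delta_time
    return GPS_EPOCH + timedelta(seconds=gps_seconds - GPS_UTC_LEAP_SECONDS)


# 1198800018 is the GPS-second value of 2018-01-01T00:00:00 UTC, used here as an example epoch;
# in practice, pass the value read from /ancillary_data/atlas_sdp_gps_epoch.
print(delta_time_to_utc(0.0, 1198800018.0))
```

Applied to the `t_dt` column, this would give each land segment a timestamp that can be used to separate repeat passes over the same ground track.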

**Generate a geodataframe from the downloaded ATL08 data files**

In [None]:
gdf = read_atl08(path)

In [None]:
gdf.head()

**Use holoviews to quickly visualize the downloaded data**

In [None]:
base = gv.tile_sources.ESRI # Add basemap to get sense of where the data is
gtracks = gdf.hvplot.points(geo=True, color='strip', s=15, width=500, height=700) # Plot ground tracks
# base * gtracks

## Plot terrain and canopy data [# Pick one Ground Track ]
terrain = gdf[gdf.strip == 'gt2l'].hvplot(y='lat', x='h_te_min', kind='scatter', width=400, height=650, color='brown', s=10, alpha=.7).relabel('Terrain')
canopy =  gdf[gdf.strip == 'gt2l'].hvplot(y='lat', x='h_max_canopy_abs', kind='scatter', width=400, height=650, color='green', s=10, alpha=.5, title=f'Elevation', xlabel='meters').relabel('Canopy')

# Compose a plot with the basemap under the ground tracks and a second panel with terrain and canopy
base * gtracks + terrain * canopy