# GrIMP Subsetter Notebook

## Purpose

This notebook allows users to download subsets of [GrIMP](https://nsidc.org/data/measures/r) image ([NSIDC-0723](https://nsidc.org/data/nsidc-0723)) and velocity ([NSIDC-0481](https://nsidc.org/data/nsidc-0481), [0725](https://nsidc.org/data/nsidc-0725), [0727](https://nsidc.org/data/nsidc-0727), [0731](https://nsidc.org/data/nsidc-0731), [0766](https://nsidc.org/data/nsidc-0766)) data. For the Sentinel-based velocity mosaics (0725, 0727, 0731), a user can select a box on a map and choose which components are downloaded (vv, vx, vy, ex, ey, dT) and saved to a netCDF file. Once the download is complete, users can [explore](#Visualizing-the-Data) the data by interactively selecting points that are plotted as time series. In the case of the TSX/OPT products (NSIDC-0481 & NSIDC-0646), given the relatively small size of each box (~50-km-by-50-km), the full product "box" is downloaded. Because of the sparse nature of these boxes, only the products associated with a single box can be downloaded at a time.

While this notebook relies on Python, it is designed to be usable with no knowledge of Python. In most cases, the default behavior will accomplish what most users want, and customization can be carried out via minor tweaks of the parameters in the examples. To accomplish this goal, much of the actual Python code is in the [grimpfunc](https://github.com/fastice/grimpfunc) library rather than the notebook itself.

## Environment Setup

_**There are several Python packages that need to be installed to execute this notebook and potentially some Jupyter lab/notebook extensions. Please follow the procedures outlined in the [**NSIDCLoginNotebook**](https://github.com/fastice/GrIMPNotebooks/blob/master/NSIDCLoginNotebook.ipynb)**_. These instructions assume a [*conda*](https://www.anaconda.com/products/individual) (Anaconda) install but should translate well to *pip* or other package managers. The following cell will load all of the necessary packages once they are installed. If errors occur, make sure all of the packages are installed as described in the [**NSIDCLoginNotebook**](https://github.com/fastice/GrIMPNotebooks/blob/master/NSIDCLoginNotebook.ipynb).

The notebook has been tested with the `environment.yml` in the *binder* folder of this repository, which lists the packages needed to execute it. Thus, for best results, create a new conda environment to run this and other GrIMP notebooks from this repository.

`conda env create -f binder/environment.yml`

`conda activate greenlandMapping`

`python -m ipykernel install --user --name=greenlandMapping`

`jupyter lab`

See [NSIDCLoginNotebook](https://github.com/fastice/GrIMPNotebooks/blob/master/NSIDCLoginNotebook.ipynb) for additional information.

The notebooks can be run on a temporary virtual instance (to start click [**binder**](https://mybinder.org/v2/gh/fastice/GrIMPNotebooks/HEAD?urlpath=lab)). See the github [README](https://github.com/fastice/GrIMPNotebooks#readme) for further details.

## Python Setup

Execute the rest of the cells in this section to load the packages needed to run this notebook.

In [1]:
%load_ext autoreload
%autoreload 2
import grimpfunc as grimp
import panel as pn
pn.extension()
import nisardev as nisar
import rioxarray
import xarray as xr
import holoviews as hv
import os
import dask
import pandas as pd
from dask.diagnostics import ProgressBar
dask.config.set(num_workers=2)  # Avoid problems with too many open connections at NSIDC
import param
import grimpfunc.NASALogin as NASALogin
ProgressBar().register()

## Help

**Note: to get help and see options for any of the GrIMP or other functions, position the cursor inside the method's parentheses and press Shift+Tab.**

## Troubleshooting

NSIDC limits the number of simultaneous connections. As a result, a download can sometimes fail, especially if multiple notebooks are downloading or ```num_workers``` is set too large. In these cases, try rerunning with only a single notebook downloading or with ```num_workers=2``` (the current default).
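
Because such failures are often transient, rerunning can also be automated. The following is only a minimal sketch, not part of grimpfunc: `run_with_retries` is a hypothetical helper that calls any zero-argument function again after a pause when it raises an exception.

```python
import time


def run_with_retries(task, attempts=3, wait_seconds=5.0):
    """Call task() until it succeeds or the attempts are used up.

    task is any zero-argument callable (hypothetical helper, not part of grimpfunc).
    """
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as error:  # Broad on purpose: failure types vary
            if attempt == attempts:
                raise  # Out of attempts, so surface the last error
            print(f'Attempt {attempt} failed ({error}); retrying in {wait_seconds} s')
            time.sleep(wait_seconds)
```

One possible use would be to wrap a download step in a small function and pass that function as `task`. Whether a retry helps depends on the cause of the failure, so reducing the number of workers remains the first thing to try.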

## Preliminary Setup

First, choose the directory where the subsetted results will be saved. If the directory is not present, it will be created (note that its parent directory must already exist). The final product filename can be customized [below](#set_filename).

In [2]:
subsetPath = 'Subsets'  # Modify as needed
if not os.path.exists(subsetPath):
    os.mkdir(subsetPath)  # Will fail if the directories above don't exist
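
If a nested output path is preferred, a possible alternative to `os.mkdir` is `os.makedirs`, which also creates any missing parent directories. This is a small sketch only; `nestedPath` and the directory names are hypothetical and chosen for illustration.

```python
import os

nestedPath = os.path.join('Subsets', 'Jakobshavn')  # Hypothetical nested path
# Creates missing parent directories; no error if the path already exists
os.makedirs(nestedPath, exist_ok=True)
print(os.path.isdir(nestedPath))
```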

## Login if Needed

This step logs the user in to NSIDC using their NASA [Earthdata Login](https://urs.earthdata.nasa.gov/) credentials and saves the password in the user's *.netrc* file (see [**NSIDCLoginNotebook**](https://github.com/fastice/GrIMPNotebooks/blob/master/NSIDCLoginNotebook.ipynb) for details on potential security risks).

The credentials do not need to be re-entered if they are still present from a prior login.

In [3]:
myLogin = NASALogin()
myLogin.view()

Getting login from ~/.netrc
Already logged in. Proceed.


Update environment to find cookie files.

In [4]:
env = dict(GDAL_HTTP_COOKIEFILE=os.path.expanduser('~/.grimp_download_cookiejar.txt'),
            GDAL_HTTP_COOKIEJAR=os.path.expanduser('~/.grimp_download_cookiejar.txt'))
os.environ.update(env)

## Find Data

The first step is to locate the products of interest, which is done using the same search tool that is used for [**qgisRemoteNotebook**](https://github.com/fastice/GrIMPNotebooks/blob/master/qgisRemoteNotebook.ipynb), but with some key differences. Specifically, to avoid mixing data products of different sizes and resolutions, only a single product type (e.g., NSIDC-0723 *image* mosaics) can be searched for and retrieved at a time. For velocity products, the desired bands (*vv, vy, vx, ex, ey*, and *dT*) can be selected at a later [stage](#Preload-Data-and-Select-Bands). Unlike the [**qgisRemoteNotebook**](https://github.com/fastice/GrIMPNotebooks/blob/master/qgisRemoteNotebook.ipynb) search, TSX (NSIDC-0481) boxes are specified by name (e.g., *W69.10N*) rather than by a bounding box. A map of the box locations is included in the NSIDC-0481 [User Guide](https://nsidc.org/data/nsidc-0481/versions/3).

To carry out a search, run the next cell to bring up the search panel and perform a search. Once a search has completed and the desired products are listed, proceed to the [next steps](#Spatial-Subsetting). Once the results have been processed and downloaded, a new search can be performed to find and download additional data.

In [5]:
# For some environments, the tool is unresponsive (i.e., search button doesn't work) - this can often be fixed by re-running this cell - seems to be fixed 08/19/2025
myUrls = grimp.cmrUrls(mode='subsetter')  # Subsetter mode is required for subsetting.
myUrls.initialSearch(product='NSIDC-0725', firstDate='2014-01-01', lastDate='2026-01-01')  # Will do an initial search using these keywords
#myUrls.view()  # uncomment and comment line above to start with no initial search

## Subsetting Basics

Examples are included below for [Velocity Mosaics](#Velocity-Mosaics-(NSIDC-0725,-0727,-0731,-0766)), [Individual Glaciers](#Individual-Glaciers-(NSIDC-0481-and-NSIDC-0646)), and [Image Mosaics](#Image-Mosaics-(NSIDC-0723)). Before jumping to one of these steps, it's important to describe the data formats and to provide some background on the tools used.

### NetCDF and Xarray

The subsetted data are downloaded to NetCDF ([Network Common Data Format](https://www.unidata.ucar.edu/software/netcdf/)) files, which allow multiple data layers to be stored together with the metadata describing them. For GrIMP data sets this is particularly useful because multiple maps (hundreds or more) in a time series can be stored in a single file and easily and rapidly accessed. By contrast, if the data are stored in individual files, potentially hundreds of files need to be opened to build a time series at a single point in space.

This and subsequent notebooks make extensive use of [xarray](http://xarray.pydata.org/en/stable/why-xarray.html), which is a Python library that is especially well suited to working with NetCDF files. In particular, it can read NetCDF files into well-labeled arrays along with metadata structures. Here we also take advantage of [rioxarray](https://corteva.github.io/rioxarray/stable/), which builds on xarray to add the ability to append coordinate reference system ([CRS](https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems)) information. It also adds capability from the [rasterio](https://rasterio.readthedocs.io/en/latest/) library for working with geospatial imagery. These programs also utilize [dask](https://dask.org/) to perform parallel operations, which can greatly speed data access. The main focus of this notebook is on subsetting the data rather than working with the downloaded result. A subsequent notebook will provide examples of working with xarray data in greater detail. That said, a tool for visualizing the data for basic inspection is included [below](#Visualizing-the-Data).

The data sets stored as xarrays are organized as 4-D arrays indexed by **time** ([np.datetime64](https://numpy.org/doc/stable/reference/arrays.datetime.html)), **component** (e.g., 'vx', 'vy', 'image'), and **x** and **y**, which are the [EPSG 3413](https://epsg.io/3413)-projection coordinates in meters. The time range is selected as part of the search described above. The appropriate components can be selected [below](#Visualizing-the-Data). For all but NSIDC-0481, the spatial subsetting is performed as the next step.
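
To make this layout concrete, here is a rough sketch that uses plain NumPy rather than xarray, with synthetic values. The axis order (time, component, y, x) is an assumption based on the array shapes displayed later in this notebook, and all names and coordinates are illustrative only.

```python
import numpy as np

# Synthetic stand-in for a subset: 3 dates, 2 components, 4 rows (y), 5 columns (x)
dates = np.array(['2020-01-01', '2020-01-13', '2020-01-25'], dtype='datetime64[D]')
components = ['vx', 'vy']
y = np.array([-2255000., -2255200., -2255400., -2255600.])  # EPSG:3413 meters
x = np.array([-244000., -243800., -243600., -243400., -243200.])
data = np.arange(dates.size * len(components) * y.size * x.size, dtype='float32')
data = data.reshape(dates.size, len(components), y.size, x.size)

# Time series of 'vx' at the grid point nearest to a requested coordinate
iComponent = components.index('vx')
iy = np.abs(y - (-2255250.)).argmin()
ix = np.abs(x - (-243590.)).argmin()
series = data[:, iComponent, iy, ix]
print(dates, series)
```

With xarray, the same selection is done by coordinate label rather than integer index, which is one reason the library is used here.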

## Spatial Subsetting

The area to be subsetted is specified as a Python dictionary with minimum and maximum *x* and *y* values. Before selecting a box, consider the size. For example, with no compression once the data are read, the full NSIDC-0723 image data set is >1.700 TB (just for one band). The main benefit of this notebook is that a particular region of interest can be extracted without requiring that the whole data set be downloaded, greatly reducing the volume of data to be downloaded and stored. When not loading a previously created search region, there are two methods for selecting a new region of interest. Before the final subsetting, the user can review the size of the product and iterate as needed to produce a reasonably sized data set. While the region of interest can be selected, all products are downloaded at their original grid spacing.

_**Note: due to the small size of the NSIDC-0481 (aka TSX) products, the entire products are downloaded with no subsetting. Thus, if NSIDC-0481 products were selected, the subsetting steps will be bypassed.**_
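
As a rough guide when choosing a box, the uncompressed size can be estimated from the box dimensions before anything is downloaded. The sketch below is not part of grimpfunc: `estimate_subset_bytes` is a hypothetical helper, and the 200 m grid spacing, the 9 dates, and the 6 bands are assumptions chosen to match the velocity example used in this notebook.

```python
def estimate_subset_bytes(bbox, n_times, n_bands, spacing=200., bytes_per_value=4):
    """Estimate the uncompressed size in bytes of a (time, band, y, x) subset.

    spacing is the assumed grid spacing in meters; both box edges are counted.
    bytes_per_value=4 corresponds to float32 data.
    """
    nx = int(round((bbox['maxx'] - bbox['minx']) / spacing)) + 1
    ny = int(round((bbox['maxy'] - bbox['miny']) / spacing)) + 1
    return n_times * n_bands * ny * nx * bytes_per_value


bbox = {'minx': -244000, 'miny': -2295000, 'maxx': -149000, 'maxy': -2255000}
nBytes = estimate_subset_bytes(bbox, n_times=9, n_bands=6)
print(f'{nBytes / 2**20:.2f} MiB')
```

The estimate scales linearly with the number of dates and bands and with the box area, so halving each side of the box reduces the volume by roughly a factor of four.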

### Method 1: Manual Selection

The coordinates for the bounding box, `bbox`, can be manually entered by modifying the cell below with the desired values. Even if not using interactive [selection](#Method-2:-Interactive-Selection), running that step displays the manually selected box coordinates on a radar map of Greenland. Note that by default, coordinates are rounded to the nearest kilometer. This behavior can be modified [below](#rounding_info).

In [6]:
bbox = {'minx': -243500, 'miny': -2295000, 'maxx': -149000, 'maxy': -2255000}  # Modify values as needed
boxPicker = grimp.boxPicker(bbox=bbox)  # Create a map for possible viewing

If a box was saved as part of a prior [download](#Subset-and-Download-Data), it can be reloaded here by uncommenting and modifying the cell below with the name of the file.

In [7]:
# boxPicker = grimp.boxPicker(boxFile='changetoboxname.yaml')

### Method 2: Interactive Selection

Run the tool below to select the bounding box (or modify a manually selected box), which will display a SAR image map. Depending on network speed, it could take a few seconds to a minute to load the basemap. Use the box tool in the plot menu to select a region of interest. Then proceed to the next [step](#Preload-Data-and-Select-Bands).

In [8]:
boxPicker.boxBounds()

{'minx': -244000.0, 'miny': -2295000.0, 'maxx': -149000.0, 'maxy': -2255000.0}

In [9]:
if 'boxPicker' not in locals(): # Only create if not defined above
    boxPicker = grimp.boxPicker()
boxPicker.plotMap(show=(not myUrls.checkIDs(['NSIDC-0481']) and not myUrls.checkIDs(['NSIDC-0646'])))  # Skips map if a 481 product



Once the desired box has been selected, proceed to the appropriate section for the selected product type: [Velocity Mosaics](#Velocity-Mosaics-(NSIDC-0725,-0727,-0731,-0766)), [Individual Glaciers](#Individual-Glaciers-(NSIDC-0481-and-NSIDC-0646)), and [Image Mosaics](#Image-Mosaics-(NSIDC-0723)).

## Preload Data and Select Bands

The cells in this section read the cloud-optimized GeoTIFF ([COG](https://www.cogeo.org/)) headers and create ```nisarVelSeries``` or ```nisarImageSeries``` objects for velocity or image data, respectively. The actual data are not downloaded at this stage, but the ```xarray``` internal to each object will read the header data of each product so it can efficiently access the data during later downloads. The bands (e.g., vx, vy) can be selected at this stage.

More detail on working with these tools can be found in the [workingWithGrIMPVelocityData](https://github.com/fastice/GrIMPNotebooks/blob/master/workingWithGrIMPVelocity.ipynb) and [workingWithGrIMPImageData](https://github.com/fastice/GrIMPNotebooks/blob/master/workingWithGrIMPImageData.ipynb) notebooks.

Note that the product-specific cells will only run if the appropriate data type was selected. As a result, if everything else has been configured correctly, the rest of the notebook can be executed and it will skip unnecessary cells.

### Velocity Mosaics (NSIDC 0725, 0727, 0731, 0766)

If velocity mosaic products were selected, this cell will read the COG headers and create an xarray inside a `nisarVelSeries` object. The keyword arguments in the call below (e.g., `useErrors` and `useDT`) can be edited to select which bands will be downloaded (see [user guide](https://nsidc.org/data/nsidc-0725) for band info).

In [10]:
if myUrls.checkIDs(['NSIDC-0725', 'NSIDC-0727', 'NSIDC-0731', 'NSIDC-0766']):
    products = nisar.nisarVelSeries()
    products.readSeriesFromTiff(myUrls.getCogs(replace='vv', removeTiff=True), url=True, readSpeed=False, useErrors=True, useDT=True)
    print('Velocity Mosaic Data Selected')

Velocity Mosaic Data Selected


In [11]:
myUrls.getIDs()

array(['NSIDC-0725'], dtype='<U10')

If this block ran successfully, the metadata for all relevant images were read to create the xarray, but the actual data themselves were not downloaded. This result is now ready to be subsetted and [downloaded](#Subset-and-Download-Data).

### Individual Glaciers (NSIDC-0481 and NSIDC-0646)

If individual glacier products were selected, this cell will read the COG headers and create an xarray inside a ```nisarVelSeries``` object. The keyword arguments in the call below (e.g., `useErrors`) can be edited to select which bands will be downloaded (see [user guide](https://nsidc.org/data/nsidc-0481/versions/3) for band info).

In [12]:
if myUrls.checkIDs(['NSIDC-0481']) or myUrls.checkIDs(['NSIDC-0646']):
    # Edit to add or remove bands
    products = nisar.nisarVelSeries()
    products.readSeriesFromTiff(myUrls.getCogs(replace='vx', removeTiff=True), url=True, readSpeed=False, useErrors=True)
    print('Individual Glacier Data Selected')

If this block ran successfully, the metadata for all relevant images were read to create the xarray, but the actual data themselves were not downloaded. This result is now ready to be [downloaded](#Subset-and-Download-Data).

### Image Mosaics (NSIDC-0723)

The procedure for image mosaics is similar to that for the velocity data. The major exceptions are that only a single band can be downloaded at a time, to avoid downloading bands with different sizes, and that a ```nisarImageSeries``` object is used. The single band is defined through the product filter in the [search panel](#Find-Data). Run the notebook again to download other bands.

In [13]:
def productType(cog):
    ''' Extract product type from file name '''
    return cog.split('/')[-1].split('_')[-3]

if myUrls.checkIDs(['NSIDC-0723']):
    products = nisar.nisarImageSeries()
    products.readSeriesFromTiff(myUrls.getCogs(removeTiff=True), url=True, chunkSize=2048)
else:
    print('No Image Mosaic data selected')

No Image Mosaic data selected


If this block ran successfully, the metadata for all relevant images were read to create the xarray, but the actual data themselves were not downloaded. This result is now ready to be subsetted and [downloaded](#Subset-and-Download-Data).

## Subset and Download Data

Before applying the final subset, it's useful to examine the size of the full (virtual) data array. If the preload step above was successful, this next cell will provide details on the size and organization of the full xarray (prior to any download).

In [14]:
products.subset # Add ; at the end to suppress output

| | Array | Chunk |
|---|---|---|
| Bytes | 20.90 GiB | 4.00 MiB |
| Shape | (9, 6, 13700, 7585) | (1, 1, 1024, 1024) |
| Dask graph | 6048 chunks in 425 graph layers | |
| Data type | float32 numpy.ndarray | |


### Subset

For products other than those in NSIDC-0481, this next step will clip the data set to the bounding box created [above](#Spatial-Subsetting) and display the organization of the resulting subset.

The box coordinates are rounded to the nearest km, which can be altered by changing the value of `decimals` below ([see numpy.around](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.around.html#numpy.around)). <a id='rounding_info'></a>

In [15]:
if not myUrls.checkIDs(['NSIDC-0481']) and not myUrls.checkIDs(['NSIDC-0646']):  # Anything but a 481
    products.subSetData(boxPicker.boxBounds(decimals=-3))  # -3 rounds to 1km, -2 to 100m...
else:
    print('NSIDC-0481 - so entire data set will be saved')
products.subset # Add ; at the end to suppress output

| | Array | Chunk |
|---|---|---|
| Bytes | 19.71 MiB | 210.11 kiB |
| Shape | (9, 6, 201, 476) | (1, 1, 113, 476) |
| Dask graph | 108 chunks in 426 graph layers | |
| Data type | float32 numpy.ndarray | |
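
The effect of `decimals` on the box coordinates can be previewed with `numpy.around` directly. This is only an illustration that assumes `boxBounds` rounds in the way the linked `numpy.around` documentation describes; the values are those of the manually entered box above.

```python
import numpy as np

bbox = {'minx': -243500, 'miny': -2295000, 'maxx': -149000, 'maxy': -2255000}
results = {}
for decimals in (-3, -2):  # -3 rounds to 1 km, -2 to 100 m
    results[decimals] = {key: float(np.around(float(value), decimals))
                         for key, value in bbox.items()}
    print(decimals, results[decimals])
```

Note that `numpy.around` rounds values that lie exactly halfway to the nearest even multiple, so a `minx` of -243500 becomes -244000 at 1 km rounding, consistent with the box bounds displayed earlier.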


If the subset product seems a reasonable size, execute the next statement to save the data. If not, repeat the [search](#Find-Data) to select a different set of products or date range, repeat the box selection to choose a different sized [bounding box](#Spatial-Subsetting), or change the [bands](#Preload-Data-and-Select-Bands) that were selected. The next cell is used to specify the file name, which will default to *subsetPath/prefix.NSIDC-0XXX.nc*. For NSIDC-0481 products, the box name (e.g., W69.1N) will be appended. This cell can be modified to override the default name as needed. In particular, the prefix can be updated to create names unique to each search. <a id='set_filename'></a>

In [16]:
prefix = 'GrIMPSubset'  # Rename as needed
# Add path defined above and append with productID (e.g., NSIDC-0723)
subsetFile = f'{subsetPath}/{prefix}.{myUrls.getIDs()[0]}{myUrls.findTSXBoxes()[0]}.nc'
if not myUrls.checkIDs(['NSIDC-0481']):  # Anything but a 481
    boxPicker.saveBox(os.path.splitext(subsetFile)[0] + '.yaml')   # Save box for non 481 products (swaps only the final extension)
if 'NSIDC-0723' in subsetFile:
    subsetFile = subsetFile.replace('NSIDC-0723', f'NSIDC-0723.{str(products.subset.band.data[0])}')
print(subsetFile)

Subsets/GrIMPSubset.NSIDC-0725.nc
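
When deriving a companion file name, such as the box `.yaml` file, from the NetCDF name, swapping only the final extension is more robust than a plain string replacement, which also alters any 'nc' that happens to occur in the prefix. The following sketch illustrates the difference; the file name is hypothetical and chosen to contain 'nc' in the prefix.

```python
import os

subsetFile = os.path.join('Subsets', 'francGlacier.NSIDC-0725.nc')  # Hypothetical name
# str.replace changes every 'nc', including the one inside the prefix
replaced = subsetFile.replace('nc', 'yaml')
print(replaced)
# splitext separates only the final extension, so the prefix is untouched
boxFile = os.path.splitext(subsetFile)[0] + '.yaml'
print(boxFile)
```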


### Download and Save to NetCDF

The data are now ready for download, which will be accomplished by running the next cell. Note, this step uses *dask* to improve download speed with multiple parallel streams ('workers'). While the NSIDC limit on simultaneous connections is 15, errors can cause timeouts so that the job may crash with far fewer workers, particularly with some of the older COG formats (all products should be updated by late Spring 2021). Too many workers may also saturate the available network bandwidth. With 2 workers, 609 W69.1N (aka Jakobshavn) products with all components (7.46 GB) downloaded successfully in 3.3 minutes through the University of Washington network.

In [17]:
products.loadRemote()
products.toNetCDF(subsetFile)  # If this crashes, lower num_workers in the dask.config.set call above; raise it (staying below 15) to increase download speed.

[########################################] | 100% Completed | 20.67 s


'Subsets/GrIMPSubset.NSIDC-0725.nc'

## Visualizing the Data

The next cell demonstrates how to read the data back into a new series object and inspect time series at points selected with the cursor.

In [18]:
if myUrls.checkIDs(['NSIDC-0725', 'NSIDC-0727', 'NSIDC-0731', 'NSIDC-0766', 'NSIDC-0481', 'NSIDC-0646']):
    myProduct = nisar.nisarVelSeries()
else:
    myProduct = nisar.nisarImageSeries()
#
myProduct.readFromNetCDF(subsetFile)
myProduct.subset
myProduct.inspect(imgOpts={'title': 'X'})


The tool displayed by the cell above allows the user to select points on the map and click to produce a time series plot of one of the components in the data set. Change the `component` and rerun to visualize other components.

If running on a remote server such as binder, uncomment, alter name if needed, and run this line to save the results in a single _zip_ file for download.

In [19]:
#!zip forDownload.zip {subsetPath}/*
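
If the `zip` command is not available on the server, a similar archive can be produced from Python with the standard library. This is a sketch under the assumption that the subsets were saved in a directory named `Subsets`; unlike the shell command above, the paths stored inside the archive are relative to that directory.

```python
import os
import shutil

subsetPath = 'Subsets'  # Same directory name as used earlier in this notebook
os.makedirs(subsetPath, exist_ok=True)  # No-op when the directory already exists
# Writes forDownload.zip in the current directory with the contents of subsetPath
zipFile = shutil.make_archive('forDownload', 'zip', root_dir=subsetPath)
print(zipFile)
```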

This notebook is now complete. If everything ran successfully, the data are ready for analysis in other notebooks or applications.

See the other notebooks in this repository for more examples of working with the data to produce publication-ready figures.