# Create a QGIS Project that Remotely Accesses GrIMP Products at NSIDC

## Notebook Purpose

This notebook allows users to search for and select [GrIMP](https://nsidc.org/data/measures/grimp) products in the [NSIDC](https://nsidc.org/) archive and to build a [*QGIS*](https://qgis.org/en/site/) project file for browsing the remote data at NSIDC. Once the appropriate authentication steps have been completed (see [**NSIDCLoginNotebook**](https://github.com/fastice/GRiMPNotebooks/blob/master/NSIDCLoginNotebook.ipynb)), the resulting [*QGIS*](https://qgis.org/en/site/) project can be opened to display products located remotely at NSIDC. The GrIMP products are stored as Cloud Optimized GeoTIFFs ([COGs](https://www.cogeo.org/)), which allows relatively fast display even over slow internet connections. Remote access through QGIS is often more convenient than downloading the full data sets, some of which exceed 1 TB. (See also [**GrIMPSubsetterNotebook**](https://github.com/fastice/GrIMPNotebooks/blob/master/GrIMPSubsetterNotebook.ipynb) for working with subsets of the data.)

Running this notebook allows a user to:
* Search the NSIDC archive for GrIMP products,
* Create a QGIS project from the search results that allows viewing the data remotely in QGIS,
* Have some control over how the data are organized in the project (e.g., by year), and
* Save the results as layer definition files so they can be imported into pre-existing QGIS projects.

Note: it is best to restart the kernel each time this notebook is run; otherwise QGIS can cause the kernel to crash.

## Setup

The following packages are needed to execute this notebook. The notebook has been tested with the `environment.yml` in the *binder* folder of this repository. Thus, for best results, create a new conda environment to run this and the other GrIMP notebooks from this repository.

`conda env create -f binder/environment.yml`

`conda activate greenlandMapping`

`python -m ipykernel install --user --name=greenlandMapping`

`jupyter lab`

See [NSIDCLoginNotebook](https://github.com/fastice/GrIMPNotebooks/blob/master/NSIDCLoginNotebook.ipynb) for additional information.

The notebooks can be run on a temporary virtual instance (to start, click [**binder**](https://mybinder.org/v2/gh/fastice/GrIMPNotebooks/HEAD?urlpath=lab)). See the GitHub [README](https://github.com/fastice/GrIMPNotebooks#readme) for further details.

**Note: be sure to install the conda version of QGIS even when running _QGIS_ from a standalone installation**. In principle, the _QGIS_ API should be accessible from the installed application, but it is often hard to find, so it is often easier to install it separately with conda.

In [None]:
import os
import sys
# This path may need altering, or the entire command may be unnecessary.
print(f'{os.environ["CONDA_PREFIX"]}/share/qgis/python')
sys.path.append(f'{os.environ["CONDA_PREFIX"]}/share/qgis/python')
import qgis.core as qc
import qgis.gui as qg
import dask
from dask.diagnostics import ProgressBar
ProgressBar().register()
dask.config.set(num_workers=2)  # Avoid problems with too many open connections at NSIDC
import panel as pn
pn.extension()
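
As the comment in the cell above notes, the QGIS path may need altering. The following is a minimal sketch of a more defensive path check, assuming a conda layout in which the QGIS bindings live under `share/qgis/python`. The helper `qgis_python_dir` is hypothetical and is not part of QGIS or the GrIMP packages.

```python
import os
import sys


def qgis_python_dir(environ=os.environ):
    """Return the conda QGIS python directory if it exists, else None."""
    prefix = environ.get('CONDA_PREFIX')
    if prefix is None:
        return None  # Not running inside an activated conda environment
    candidate = os.path.join(prefix, 'share', 'qgis', 'python')
    return candidate if os.path.isdir(candidate) else None


path = qgis_python_dir()
if path is not None and path not in sys.path:
    sys.path.append(path)  # Only extend sys.path when the directory is really there
print(f'QGIS python directory: {path}')
```

If the function returns `None`, the QGIS bindings are either installed elsewhere or already importable without any path change.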

The [grimpfunc](https://github.com/fastice/grimpfunc) and [grimpqgis](https://github.com/fastice/grimpqgis) packages will have been installed automatically if the conda installation instructions were followed. They can also be installed directly with pip (if the pip install fails, upgrading pip should fix the problem).

In [None]:
import grimpfunc as grimp
import grimpqgis as grimpq

## Help

**Note: to get help and see the options for any of the GrIMP or other functions, position the cursor inside a method's parentheses and press Shift+Tab.**

## Login

This procedure will authenticate to NSIDC (see [**NSIDCLoginNotebook**](https://github.com/fastice/GrIMPNotebooks/blob/master/NSIDCLoginNotebook.ipynb)).

In [None]:
env = dict(GDAL_HTTP_COOKIEFILE=os.path.expanduser('~/.grimp_download_cookiejar.txt'),
            GDAL_HTTP_COOKIEJAR=os.path.expanduser('~/.grimp_download_cookiejar.txt'))
os.environ.update(env)
myLogin = grimp.NASALogin()
myLogin.view()

## Search for Data

Running the cell below will pull up a panel that allows users to search for products at NSIDC.

* Products can be filtered by date range and band groups (e.g., velocity or velocity with errors). 
* Selecting a new product will append the result to the prior search. 
* Select a product family by pressing the appropriate button (e.g., *NSIDC-0723*), then set the filter parameters and date range (default is all dates). 
* Then press **Search** to execute the search. The search results will be pre-pended to the result from any prior search. 
* To start over, press **Clear** to remove the entire search result. The search results are saved in `myUrls` for later use. 

In [None]:
# For some environments the tool is unresponsive (i.e., search button doesn't work) - this can often be fixed by re-running this cell
myUrls = grimp.cmrUrls()
myUrls.view()

Once the search is complete, advance to the next [step](#Setup-and-Start-a-Project). If everything has been set up correctly, the user can run the rest of the notebook (**Run->Run Selected Cell and All Below**).

## Product Hierarchy for QGIS

Products are saved in a QGIS project as groups and layers with the following hierarchy.
* **Category1** (optional, e.g., Velocity or Images) 
    * **Product Family1** (e.g., annualVelocity, NSIDC0725)
        * **Year** (optional)
            * **ProductPrefix-YYYY-mm-dd.YYYY-mm-dd (e.g., Vel-2020-01-01.2020-01-06)**
                * **band1** (e.g., browse, vx, vy)
                * **band2**
                * **...**
    * **Product Family 2**
        * **ProductPrefix-band-YYYY-mm-dd.YYYY-mm-dd** (example with only one band, e.g., sigma0)
        * **ProductPrefix-band-YYYY-mm-dd.YYYY-mm-dd**
        * **...**
* **Category2**
    * **...**
        
A `productFamily` represents a collection of products of the same type, such as annual velocity mosaics (NSIDC-0725) or SAR image mosaics (NSIDC-0723), that the user can group together in a QGIS project. Different product families can be established for the same product line. For example, the SAR image mosaics could be organized as two product families, with one called **images** for the image data and another called **gamma0** for the $\gamma_o$ products. Several product families may be grouped under a `Category` such as **Velocity** or **Images**.

Each product can have a single band (e.g., *vv*) or multiple bands (*vv*, *vx*, and *vy*). For a single-band product family, the results are saved as `ProductPrefix`-`band`**-YYYY-mm-dd.YYYY-mm-dd**. If the product has only a single date, the second date string is omitted. Multi-band products are grouped together with the name `ProductPrefix`**-YYYY-mm-dd.YYYY-mm-dd**. In this case, each **band** is saved under its band name (e.g., *vx* and *vy*).
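
The naming convention just described can be sketched in a few lines. This is only an illustration of the pattern stated in the text; `layer_name` is a hypothetical helper and not the function that grimpqgis uses internally.

```python
from datetime import date


def layer_name(product_prefix, first_date, last_date=None, band=None):
    """Build a layer or group name following the convention described above."""
    parts = [product_prefix]
    if band is not None:
        parts.append(band)  # Single-band families carry the band in the name
    name = '-'.join(parts) + f'-{first_date:%Y-%m-%d}'
    if last_date is not None:
        name += f'.{last_date:%Y-%m-%d}'  # Omitted for single-date products
    return name


# Multi-band product: bands are stored as layers under this group name
print(layer_name('Vel', date(2020, 1, 1), date(2020, 1, 6)))
# Single-band product: the band is part of the layer name
print(layer_name('SAR', date(2020, 1, 1), date(2020, 1, 6), band='sigma0'))
# Single-date product: no second date string
print(layer_name('Greenland', date(2020, 1, 1), band='termini'))
```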

The `Category`, `productFamily`, and `productPrefix` are completely user selectable. The `band` names are those used in the product names, but the user can choose which bands are included.

By default the products under a product family are grouped by year. This feature can be turned off with `byYear=False`. 

The products for each family should all be of the same file *type*. Currently supported file types are *tif* and *shp*.

An archive search as carried out above returns a list of urls stored in `myUrls`, which may contain data for multiple product families. These products can be separated appropriately using the parameters `productFilePrefix` (e.g., *GL_vel_mosaic_Annual* and *GL_S1bks*) and `bands` (e.g., *vx*, *vy*, and *image*) as demonstrated below.
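
To make the role of `productFilePrefix` and `bands` concrete, here is a minimal sketch of how a mixed list of urls could be separated. The urls, the file-name layout, and the `select_urls` helper are all assumptions made for illustration; the real filtering is done inside grimpqgis and may differ.

```python
import posixpath
from urllib.parse import urlparse


def select_urls(urls, product_file_prefix, bands, file_type='tif'):
    """Return {band: [urls]} for files matching the prefix, band, and type."""
    selected = {band: [] for band in bands}
    for url in urls:
        file_name = posixpath.basename(urlparse(url).path)
        stem, _, extension = file_name.rpartition('.')
        if extension != file_type or not stem.startswith(product_file_prefix):
            continue  # Wrong file type or a different product family
        for band in bands:
            # Assumption: the band is an underscore-delimited token in the name
            if f'_{band}_' in f'{stem}_':
                selected[band].append(url)
    return selected


# Hypothetical urls, not real NSIDC locations
urls = ['https://example.com/data/GL_vel_mosaic_Annual_2019_vx_v1.tif',
        'https://example.com/data/GL_vel_mosaic_Annual_2019_vy_v1.tif',
        'https://example.com/data/GL_vel_mosaic_Annual_2019_ex_v1.tif',
        'https://example.com/data/GL_S1bks_mosaic_2019_image_v1.tif']
annual = select_urls(urls, 'GL_vel_mosaic_Annual', ['vx', 'vy'])
for band, band_urls in annual.items():
    print(band, band_urls)
```

Here the *ex* file is dropped because its band was not requested, and the *GL_S1bks* file is dropped because its prefix belongs to a different product family.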

## Setup and Start a Project

The first step in setting up a QGIS project is to create an instance of the `QgisGrimpProjectSetup` class as follows, which is used to define how products are organized within the QGIS project.

In [None]:
myProjectSetup = grimpq.QgisGrimpProjectSetup()

Examples for [terminus positions](#Terminus-Positions) (NSIDC-0642), individual glacier [TerraSAR-X](#Individual-Glacier-Velocities-(TSX)) (NSIDC-0481), [velocity mosaics](#Velocity-Mosaics) (NSIDC-0725, NSIDC-0727, NSIDC-0731), and [SAR mosaics](#SAR-Image-Mosaics) (NSIDC-0723), are included below. If a search yields results for a particular product family, the results will be included. If not, the QGIS project may contain the Category/productFamily with no data under it. Such 'blanks' can be avoided by tweaking the examples below to ensure only the desired product types are included.

In the next step, the path where the QGIS project will be saved is specified. If not modified, this example will save projects in a *qgisProjects* directory under the one where this notebook is being run. If *qgisProjects* does not exist, it will be created. The full project path should be specified with no extension (e.g., *qgisProject* as in this example).

In [None]:
qgisPath = 'qgisProjects'  # Modify as needed
QgisProjectFileName = f'{qgisPath}/qgisProject'
if not os.path.exists(qgisPath):
    os.mkdir(qgisPath)  # Will fail if the directories above don't exist
print(f'Project will be saved as: {QgisProjectFileName}')
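
If the project directory is nested inside folders that may not exist yet, `os.mkdir` will fail, as the comment above warns. A minimal alternative sketch using the standard library's `pathlib` is shown here; `project_base` is a hypothetical helper, and the temporary directory only stands in for a real location.

```python
import tempfile
from pathlib import Path


def project_base(qgis_path, project_name='qgisProject'):
    """Create qgis_path (and any missing parents) and return the project base name."""
    directory = Path(qgis_path)
    directory.mkdir(parents=True, exist_ok=True)  # No error if it already exists
    return str(directory / project_name)  # No extension, as required above


with tempfile.TemporaryDirectory() as tmp:
    base = project_base(Path(tmp) / 'work' / 'greenland' / 'qgisProjects')
    print(f'Project will be saved as: {base}')
```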

Further customization can be performed by modifying the examples below. Note the order of the products in the QGIS legend is determined by the order they are added below (for python $\geq$ 3.7). So in the examples below, the terminus positions are included first so they plot on top of the velocity and image data. The smaller TSX scenes are included next so they are not obscured by the larger full-Greenland mosaics.
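
The ordering rule relies on the fact that Python dictionaries preserve insertion order (guaranteed by the language from version 3.7). A minimal sketch, assuming the product families are kept in a plain `dict` keyed by name; the `families` dictionary here is a stand-in and not the grimpqgis data structure itself.

```python
families = {}
for name in ['Termini', 'TSX', 'Annual', 'Image Mosaics']:
    families[name] = {}  # Added first means listed first

print(list(families))

# Removing and re-adding a key moves it to the end of the ordering
families['Termini'] = families.pop('Termini')
print(list(families))
```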

### Terminus Positions

After creating `myProjectSetup`, product families need to be defined. The GrIMP [glacier termini products](https://nsidc.org/data/nsidc-0642/versions/1) are delivered as shape files (*shp*), and they will only be included if explicitly searched for [above](#Search-for-Data). They are added as follows:

In [None]:
if myUrls.checkIDs(['NSIDC-0642']):
    myProjectSetup.addProductFamilies('Termini',
                                      productFilePrefix='termini',
                                      bands=['termini'],
                                      fileType='shp',
                                      byYear=False,
                                      productPrefix='Greenland')
    print('Including Terminus Products')

There is only one terminus product for each year, so it doesn't really help to organize them by year (`byYear=False`). They will be organized as a product family `'Termini'`. The program will extract them from the list of returned urls using `productFilePrefix='termini'`. For this product, there is only one band (`'termini'`) with `fileType='shp'`. No category is defined for this product since there is only one productFamily that would be filed under it.

### Individual Glacier Velocities (TSX)

The following cells will set up a product family for the [Selected Glacier Site Velocity Maps from InSAR](https://nsidc.org/data/nsidc-0481/) (NSIDC-0481). The default display options are:

In [None]:
displayOptions = myProjectSetup.defaultDisplayOptions()  # Create a copy of the default options
for key, value in displayOptions.items():
    print(f'{key}: {value}')

As an example, the options in this dictionary can then be modified as follows:

In [None]:
displayOptions['vv']['colorTable'] = 'Inferno'  # Change color for speed
displayOptions['vv']['maxV'] = 4000  # Change max value at which speed is clipped
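
Because the display options are a dictionary of dictionaries, it can be useful to customize a copy so that one set of options is not changed by accident when another is edited. The following is a minimal sketch of that idea. The default values shown are invented for illustration (the real ones come from `myProjectSetup.defaultDisplayOptions()`), and `customized` is a hypothetical helper.

```python
import copy

# Hypothetical defaults; only the key names follow the cells above
defaults = {'vv': {'colorTable': 'Reds', 'maxV': 3000},
            'vx': {'colorTable': 'RdBu', 'maxV': 1500}}


def customized(options, band, **changes):
    """Return a deep copy of options with changes applied to one band."""
    result = copy.deepcopy(options)
    result[band].update(changes)
    return result


tsx_options = customized(defaults, 'vv', colorTable='Inferno', maxV=4000)
print(tsx_options['vv'])
print(defaults['vv'])  # Unchanged, because the copy was deep
```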

The customized options are then passed back along with the other options.

In [None]:
tsxProperties = {'category': 'TSX', 'productPrefix': 'TSX',
                 'bands': ['vv'], 'byYear': True, 'fileType': 'tif',
                 'displayOptions': displayOptions}  # Add the display options

The TSX products consist of many regional boxes as opposed to full-Greenland mosaics like the other raster products in this notebook. Thus, it is convenient to organize each under its respective 'box' (e.g., E61.10N; see the figure on page 4 of the [User Guide](https://nsidc.org/data/nsidc-0481)). The names of the boxes found by the search can be recovered using `myUrls.findTSXBoxes()`.

In [None]:
if myUrls.checkIDs(['NSIDC-0481']):
    for productFamilyName in myUrls.findTSXBoxes():  # Note findTSXBoxes() can be replaced with a list of boxes (e.g., ['W72.90N',...])
        myProjectSetup.addProductFamilies(productFamilyName,
                                          productFilePrefix=f'TSX_{productFamilyName}',
                                          category='TSX',
                                          productPrefix='TSX',
                                          bands=['vv'],
                                          byYear=True,
                                          fileType='tif',
                                          displayOptions=displayOptions)
    print('Added TSX products')

In this example, the code cycles through all of the TSX box names returned by the search (`myUrls.findTSXBoxes()`) to create a separate `productFamily` for each box, which are all then organized under the category `'TSX'`. The command `myUrls.findTSXBoxes()` could be replaced with an explicit list of box names if desired. For this example, only one `band` (`'vv'` aka speed) was included, but the code can easily be modified to include others (e.g., `'vx'`, `'vy'`, `'ex'`, and `'ey'`). Because there could be several hundred products for a given box, this example opts to organize the results by year (`byYear=True`).
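
For readers who want to build the explicit list of box names themselves, the following sketch shows one way unique box names might be pulled from file names. It assumes names that start with `TSX_` followed by a box name such as *W72.90N*, which matches the `productFilePrefix` used above; the file names and the `find_boxes` helper are hypothetical, and this is not the implementation of `myUrls.findTSXBoxes()`.

```python
import re

# Assumed pattern: 'TSX_' + box name such as W72.90N or E61.10N + '_'
BOX_PATTERN = re.compile(r'TSX_([EW]\d+\.\d+N)_')


def find_boxes(file_names):
    """Return the sorted, unique box names found in a list of file names."""
    boxes = set()
    for file_name in file_names:
        match = BOX_PATTERN.search(file_name)
        if match:
            boxes.add(match.group(1))
    return sorted(boxes)


# Hypothetical file names, for illustration only
names = ['TSX_W72.90N_a_vv.tif', 'TSX_W72.90N_b_vv.tif',
         'TSX_E61.10N_a_vv.tif', 'GL_vel_mosaic_Annual_vv.tif']
print(find_boxes(names))
```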

### Velocity Mosaics

The following cell will set up product families for the [annual](https://nsidc.org/data/nsidc-0725), [quarterly](https://nsidc.org/data/nsidc-0727), and [monthly](https://nsidc.org/data/nsidc-0731) velocity products. Note this step adds an additional level of filtering through `bands`. Even if a band was selected in the search above (e.g., `vx`), it will not be included unless it is explicitly listed with the `bands` keyword (e.g., `bands=['vx']`). In this example, the *browse*, *vv*, *vx*, and *vy* products are all included by default. The default for the *Monthly* product is overridden to drop the *vx* and *vy* products. Note this example takes advantage of Python's ability to specify keywords via a dictionary.

In [None]:
# Parameters that apply to all velocity products initially (can be overridden later as is done for the Monthly product)
velProperties = {'category': 'Velocity', 'productPrefix': 'Vel',
                 'bands': ['browse', 'vv', 'vx', 'vy'], 'byYear': True, 'fileType': 'tif'}
if myUrls.checkIDs(['NSIDC-0725', 'NSIDC-0727', 'NSIDC-0731', 'NSIDC-0766']): # Modify above to include or exclude this step
    # Create a product family for each mosaic type
    for productFamilyName in ['Annual', 'Quarterly', 'Monthly', 's1cycle']:
        myProjectSetup.addProductFamilies(productFamilyName,
                                          productFilePrefix=f'GL_vel_mosaic_{productFamilyName}',
                                          **velProperties)
    # Modify the bands included for the monthly products
    myProjectSetup.productFamilies['Monthly']['bands'] = ['browse', 'vv']
    print('Including Velocity Products')
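
The keyword-dictionary idiom used in the cell above can be seen in isolation in the sketch below. `add_product_family` is a stand-in that only records its keywords; it is not the grimpqgis method.

```python
def add_product_family(name, **properties):
    """Stand-in that simply records the keywords it receives."""
    return {'name': name, **properties}


vel_properties = {'category': 'Velocity', 'productPrefix': 'Vel',
                  'bands': ['browse', 'vv', 'vx', 'vy'],
                  'byYear': True, 'fileType': 'tif'}

families = {}
for family in ['Annual', 'Quarterly', 'Monthly']:
    # ** expands the dictionary into keyword arguments
    families[family] = add_product_family(
        family, productFilePrefix=f'GL_vel_mosaic_{family}', **vel_properties)

# Assign a new list rather than editing in place: the original list object
# is shared by every family that was built from vel_properties
families['Monthly']['bands'] = ['browse', 'vv']
print(families['Annual']['bands'])
print(families['Monthly']['bands'])
```

In this sketch, replacing the list affects only *Monthly*, whereas calling `.remove('vx')` on the shared list would have changed every family.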

### SAR Image Mosaics

This example sets up a product family for [image mosaic](https://nsidc.org/data/nsidc-0723) products (NSIDC-0723). The code can be edited to change which bands are selected (*image*, *sigma0*, and *gamma0*). As originally written, all bands (*image*, *sigma0*, and *gamma0*) are included. If only a subset of these bands was included in the search, it is best to edit out the ones not being used to avoid blank headings.

In [None]:
productFamilyName = 'Image Mosaics'
if myUrls.checkIDs(['NSIDC-0723']):
    myProjectSetup.addProductFamilies(productFamilyName, category='Imagery',
                                      productPrefix='SAR',
                                      productFilePrefix='GL_S1bks',
                                      bands=['image', 'gamma0', 'sigma0'],
                                      byYear=True,
                                      fileType='tif')
    print('Including image products')

In the QGIS project, the image products will be organized under the Category **Imagery**, the Product Family **Image Mosaics**, and the year of acquisition. Each product will be named **SAR-band-YYYY-mm-dd.YYYY-mm-dd**.

### Adding the Data

In the search process above, a set of urls (links) to the data were collected and stored in `myUrls`. The steps immediately above defined how these products should be organized in the *QGIS* project. The next step is to link the products pointed to by the *urls* to the configuration defined by `myProjectSetup`. This step is performed separately for *tif* (aka cog) and *shp* products as follows:

In [None]:
if myUrls.checkIDs(['NSIDC-0481', 'NSIDC-0723', 'NSIDC-0725', 'NSIDC-0727', 'NSIDC-0731', 'NSIDC-0766']): # Only do if relevant products are included
    myProjectSetup.getProductFamilies(urls=myUrls.getCogs())  # Tif products requested so add cogs
if myUrls.checkIDs(['NSIDC-0642']):
    myProjectSetup.getProductFamilies(urls=myUrls.getShapes())  # Add the shps if requested

Based on the product family definitions above, the step above will organize each url-linked product under the appropriate heading in the project. Once this step has run, the QGIS API can be used to build the project.

## Build the QGIS Project

**Note at least on some systems, the QGIS API does not exit gracefully. As a result, once the steps below are run, it is best to restart the kernel before running the notebook again.**

Note some conda QGIS installations do not set up the paths for the style directories properly. Lacking good knowledge of their location, we have included the appropriate files in the `share` directory of this notebook's repository. The line `qc.QgsApplication.setPrefixPath('.', True)` in the next cell will cause *QGIS* to use `./share/.../`. This directory should either accompany the notebook or the path should be updated appropriately. The line can be commented out if these paths are already set up correctly.

The first step is to start up the stand-alone version of the QGIS core routines as follows:

In [None]:
qc.QgsApplication.setPrefixPath('.', True)
qgs = qc.QgsApplication([], False)
qgs.initQgis()

Then create a `grimpq.QgisGrimpProject` object, `myProject`, which uses the setup information from `myProjectSetup` to automatically generate a QGIS project. **This step could take half an hour or more if there are several hundred products.**

In [None]:
# This may produce read-only mode errors but it will still work
myProject = grimpq.QgisGrimpProject(myProjectSetup)

Now that the project exists, save the individual [layer definition](#Layer-Definition-Files) files (comment out if not needed).

In [None]:
myProject.saveLayerDefinitions(QgisProjectFileName, saveCategories=False)  # Save by product (e.g, annual, quarterly ...)
myProject.saveLayerDefinitions(QgisProjectFileName, saveCategories=True)  # Save by product type (e.g., velocity, image...)

In [None]:
myProject.saveProject(QgisProjectFileName)
qgs.exitQgis()
qgs.exit()

This completes the generation of a project file, which can be opened with QGIS (see the following sections for help).

## Performance Notes For Viewing Remote Data with QGIS

Browsing a single remote product at a time can be remarkably fast even with a home connection due to the efficient access possible with the [COG](https://www.cogeo.org/) format. But remote access does have its drawbacks. In particular, the time it takes QGIS to load when it starts up scales with the number of products, since the program must verify each product by reading its header information. For 300 products this can take anywhere from 45 seconds to more than 5 minutes, depending on network speed. As a result, it is generally not a good idea to include too many products in a project. For a partial mitigation of this problem, see [Layer Definition Files](#Layer-Definition-Files) below. To debug a problem opening QGIS files, create a project with only a few files so that it will time out more quickly if there is an authentication problem (see [**NSIDCLoginNotebook**](https://github.com/fastice/GrIMPNotebooks/blob/master/NSIDCLoginNotebook.ipynb)). See also these pre-built [projects](https://github.com/fastice/GrIMPQGISProjects).
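
As a rough planning aid, the 300-product timings quoted above can be scaled to other project sizes. This sketch assumes the load time grows strictly linearly with the number of products and treats 5 minutes as the slow-network figure, even though the text says it can be longer; `load_time_range` is a hypothetical helper.

```python
def load_time_range(n_products, ref_products=300, ref_seconds=(45, 300)):
    """Scale the 300-product timings linearly to n_products (in seconds)."""
    low, high = ref_seconds
    return (n_products * low / ref_products,
            n_products * high / ref_products)


for n in (25, 100, 300):
    low, high = load_time_range(n)
    print(f'{n:4d} products: roughly {low:.0f} to {high:.0f} s')
```

The per-product cost implied by these figures is about 0.15 s on a fast connection and about 1 s on a slow one.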

Due to network limitations, problems can occur when too many images are selected as visible (checked) at once so that QGIS tries to render them all simultaneously. This situation can cause network errors, especially if the number of NSIDC connections (15) is exceeded. To avoid such problems, the automatically generated *QGIS* projects have most of the layers unchecked at first. As a best practice, avoid **Check All and Its Children** in *QGIS* unless there are only a handful of products. Since generally only one or a few products can be viewed at once, navigate to the product of interest, check it to view it, then uncheck it when moving to a different product (a few images can be viewed simultaneously, for example to flicker pairs of images).

**When network errors occur, an image may not render properly. If this occurs, uncheck excess images, then zoom in and out, which will trigger a reload that usually will re-render the image correctly (the error flag next to the product may persist even after the image loads correctly).**

Assuming there are no authentication errors, then network errors usually are caused by the number of network accesses exceeding the number that NSIDC allows.  
**In QGIS this problem can largely be eliminated by setting** `Preferences->Rendering->Max Cores to Use` **to <=2.**
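
The same principle of capping the number of simultaneous accesses applies to scripted access outside QGIS. The sketch below illustrates it with the standard library only: a thread pool limited to two workers never runs more than two reads at once. `fake_read` is a stand-in that sleeps instead of touching the network, and the limit of 2 simply mirrors the settings suggested in this notebook.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 2
state = {'active': 0, 'peak': 0}
lock = threading.Lock()


def fake_read(product):
    """Stand-in for a remote read; records how many reads overlap."""
    with lock:
        state['active'] += 1
        state['peak'] = max(state['peak'], state['active'])
    time.sleep(0.01)  # Pretend to wait on the network
    with lock:
        state['active'] -= 1
    return product


# The pool size is what limits the number of simultaneous reads
with ThreadPoolExecutor(max_workers=MAX_CONNECTIONS) as pool:
    results = list(pool.map(fake_read, range(10)))

print(f"peak simultaneous reads: {state['peak']}")
```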

### QGIS Troubleshooting

The setup procedure has been verified to work with *QGIS* V3.16 and above. It did not work with at least one instance of V3.10. 

## Layer Definition Files

[Layer definition](https://getspatial.com/gisblog/tip-of-the-day-create-layer-definition-files-for-reuse-and-consistency/) files allow the individual Categories or ProductFamilies included in the automatically generated QGIS project to be imported into pre-existing projects. Alternatively, if a project takes too long to load because it has too many layers, layer definition files allow collections of layers to be saved (in *QGIS*, right click on the group and select **Export**) and then removed (right click on the group and select **Remove Group**) to speed things up. The layer definition file can then be used to re-import the results later as needed (in QGIS, **Layers->Add From Layer Definition File**). In the above example, all of the **Categories** are saved as *.qlr* files.

## Final Notes

If all went well, running this notebook produced a working QGIS project. For the project to work, authentication via a valid [*Earth Data Login*](https://urs.earthdata.nasa.gov/) is needed to access the data at *NSIDC*. The necessary authentication files can be set up by following the procedures in the [**NSIDCLoginNotebook**](https://github.com/fastice/GrIMPNotebooks).