# CMR Python Library
The CMR Python library is a set of tools designed to make using CMR from Python easier. Using the
library relieves the calling code of the need to manage HTTP headers, make HTTP calls, or deal with
paged results. With the library, a calling function simply passes in a dictionary of parameters and
gets back a list of records.
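
To give a sense of what the library hides, here is a minimal sketch of the paging loop a caller would otherwise write by hand. It is not the library's implementation: `fetch_page` is a hypothetical stand-in that serves canned records instead of making an HTTP request, and `search_all` is an illustrative name.

```python
# Illustrative only: fetch_page() is a hypothetical stand-in for one HTTP
# request to CMR; it serves canned records so the sketch runs offline.
FAKE_RECORDS = [{"id": n} for n in range(5)]

def fetch_page(params, page_num, page_size):
    """Return one page of records (pages are numbered from 1)."""
    start = (page_num - 1) * page_size
    return FAKE_RECORDS[start:start + page_size]

def search_all(params, limit=20, page_size=2):
    """Collect records page by page until `limit` is reached or data runs out."""
    records = []
    page_num = 1
    while len(records) < limit:
        page = fetch_page(params, page_num, page_size)
        if not page:
            break  # no more results
        records.extend(page)
        page_num += 1
    return records[:limit]

all_records = search_all({"keyword": "fish"})
print(len(all_records))
```

The caller sees one flat list, regardless of how many pages were needed to build it.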

# Using Jupyter Notebooks with Visual Studio Code

Visual Studio Code has built-in support for Jupyter Notebooks. This demonstration shows how to get
started with the CMR Python library and make queries against CMR, all within Visual Studio Code.

1. Download Visual Studio Code from https://visualstudio.microsoft.com
1. Launch Visual Studio Code and open a new window.
1. Either pick "Get Started with Python development" from the "Get Started" screen, or select
`Select Interpreter to Start Jupyter Server`. This action will start the process of loading the
software needed to host a notebook.
    * If you have multiple versions of Python, you will be asked to pick a version. Pick one you
      intend to use and install libraries into. The next step will require you to use the same
      version picked on this step.
    * Run the following in the bottom Terminal pane, if you have one, or in a shell:

`pip3 install https://github.com/nasa/eo-metadata-tools/releases/download/latest-master/eo_metadata_tools_cmr-0.0.1-py3-none-any.whl`

This is all that is needed to start using the CMR Python library.

When you return to Visual Studio Code, or want to create a new notebook, you do not need to go through
all these steps. Just create a new file as normal and make sure you save it with the `.ipynb` extension.
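
If the notebook cannot find the library, the usual cause is that `pip3` installed it into a different Python than the one the notebook is running. The following sketch checks this from inside a notebook cell. It assumes the top-level package is named `cmr`, as implied by the imports used in the rest of this document; `is_installed` is an illustrative helper, not part of the library.

```python
import importlib.util
import sys

def is_installed(module_name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None

# The interpreter running the notebook; pip3 should have installed into this one.
print(f"Interpreter: {sys.executable}")
# Assumption: the package installs under the top-level name 'cmr'.
print(f"cmr available: {is_installed('cmr')}")
```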

# Collection Search
The collection and granule APIs are divided into modules. To get started with collections, import the
collection module:

In [None]:
from cmr.search import collection


## Basic Search
Here is a basic example of a CMR query using the library.
1. Import the collection search API, then pass it a dictionary of CMR parameters and values.
    * Anything in the [Collection Search by Parameter](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#collection-search-by-parameters) section of the CMR API can be used here.
1. A list of records is returned.

In [None]:

# get records that mention fish
result = collection.search({'keyword':'fish'})

print(f"Record Count: {len(result)}")

## Search Options
Search API calls take parameters that filter down the results or choose which environment is used.
* `limit` sets the maximum number of records returned (from 1 to 2000)
* `config` is used for internal settings of the API. The most common configuration is to set the CMR environment
    * the default is `production`
    * `uat`, or "User Acceptance Testing", can also be selected by those who participate in CMR tests.

In [None]:
# search for a few fishy UAT records
result = collection.search({'keyword':'fish'}, limit=2, config={'env':'uat'})
print(f"Record Count: {len(result)}")
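
For orientation, the sketch below shows the kind of request that `limit` and `config` could translate into. It only builds a URL and sends nothing. The host names are the public CMR production and UAT hosts, and `page_size` is the CMR search parameter that caps results per request at 2000. The helper name `build_search_url`, its default `limit`, and the mapping of `limit` to `page_size` are assumptions made for illustration, not the library's actual code.

```python
from urllib.parse import urlencode

# Public CMR hosts; the dictionary and helper below are illustrative,
# not part of the CMR Python library.
CMR_HOSTS = {
    "production": "https://cmr.earthdata.nasa.gov",
    "uat": "https://cmr.uat.earthdata.nasa.gov",
}

def build_search_url(params, limit=10, config=None):
    """Build (but do not send) a collection search URL."""
    if not 1 <= limit <= 2000:
        raise ValueError("limit must be between 1 and 2000")
    # Fall back to production when no environment is configured.
    env = (config or {}).get("env", "production")
    query = dict(params, page_size=limit)
    return f"{CMR_HOSTS[env]}/search/collections.umm_json?{urlencode(query)}"

print(build_search_url({"keyword": "fish"}, limit=2, config={"env": "uat"}))
```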

## Results
Results are returned as a list of maps. To illustrate the structure, the results from above are
dumped out below, showing the names of the keys in each map.

In [None]:
def key_tree(node, depth=0):
    # recursively walk a tree and print the name of each key, indented by depth
    if isinstance(node, list):
        # use the first item as a representative of the list (skip empty lists)
        node = node[0] if node else None
    if isinstance(node, dict):
        for key, item in node.items():
            print("\t"*depth, key)
            key_tree(item, depth+1)
key_tree(result)

## Filtering Results

Now the same results will be filtered down by the search function using some of the many built-in filters. In this case, `umm_fields` keeps only the UMM portion of each record, and a series of `drop_fields` filters then removes individual fields from it:

In [None]:
result = collection.search({'keyword':'fish'}, filters=[collection.umm_fields,
    collection.drop_fields('CollectionCitations'),
    collection.drop_fields('SpatialExtent'),
    collection.drop_fields('ScienceKeywords'),
    collection.drop_fields('RelatedUrls'),
    collection.drop_fields('DataCenters'),
    collection.drop_fields('ContactGroups'),
    collection.drop_fields('ContactPersons'),
    collection.drop_fields('MetadataSpecification'),
    collection.drop_fields('TemporalExtents'),
    collection.drop_fields('Abstract'),
    collection.drop_fields('Purpose'),
    collection.drop_fields('UseConstraints')],
    limit=1, config={'env':'uat'})
print(result[0].keys())
print(f"Record Count: {len(result)}")
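
The `filters` argument is a list of functions that are applied to each record in order. The sketch below mimics that idea with local stand-ins so it can run without the library or a network connection. The functions here are simplified assumptions about how such filters could behave, not copies of the library's code, and the sample record is hypothetical.

```python
# Stand-ins for illustration; the real filters live in cmr.search.collection.
def umm_fields(record):
    """Keep only the 'umm' portion of a record."""
    return record.get("umm", {})

def drop_fields(key):
    """Return a filter that removes `key` from a record."""
    def dropper(record):
        return {k: v for k, v in record.items() if k != key}
    return dropper

def apply_filters(filters, records):
    """Run every record through each filter, in order."""
    filtered = []
    for record in records:
        for func in filters:
            record = func(record)
        filtered.append(record)
    return filtered

# Hypothetical, heavily trimmed record
sample = [{"meta": {"concept-id": "C0000-EXAMPLE"},
           "umm": {"ShortName": "example", "Abstract": "long text", "Purpose": "demo"}}]

# Build the filter list from a list of field names instead of repeating calls.
unwanted = ["Abstract", "Purpose"]
filters = [umm_fields] + [drop_fields(name) for name in unwanted]
print(apply_filters(filters, sample))
```

The same list comprehension pattern could shorten the long `filters` list in the cell above, assuming `collection.drop_fields` is called once per field name as shown there.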

# Granule Searches
Granule searches are much like collection searches. Just import the granule module:

In [None]:
from cmr.search import granule

### Basic Granule search
Granule searches require at least one search criterion, such as a collection concept id or a provider
name:

In [None]:

# basic granule search bound by a provider
result = granule.search({'provider':'ORNL_DAAC'})

print(result[0])

### Find Granules with Collections
There is also a method that allows you to use collection searches to find granules. For each
collection found, a sample (the first few, in an undefined order) of granules is selected and
returned.

In [None]:
# find collections about water and grab a sample of granules from those collections
result = granule.sample_by_collections({'keyword':'water'})
print(len(result))

### Filtering granule results
As with collections, granule result columns can be filtered down:
* use the `filters` parameter to drop columns
* use the `limits` parameter to set how many granules are sampled per collection and how many collections are used for sampling

In [None]:
# limit to 2 granules per collection and 5 collections in all
result3 = granule.sample_by_collections({'keyword':'water'}, filters=[granule.meta_fields], limits=[2,5])

print(len(result3))
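
To make the two values in `limits` concrete, here is a small offline sketch of the sampling idea: take the first few granules from each of the first few collections. The order of the two values (granules first, then collections) follows the comment in the cell above. The data and the `sample_granules` helper are hypothetical and only illustrate the concept.

```python
# Hypothetical data: collection id -> granule ids, in whatever order they arrive.
GRANULES = {
    "C1": ["G1", "G2", "G3"],
    "C2": ["G4"],
    "C3": ["G5", "G6", "G7", "G8"],
}

def sample_granules(collections, limits):
    """Take up to limits[0] granules from each of the first limits[1] collections."""
    granule_limit, collection_limit = limits
    sample = []
    for collection_id in collections[:collection_limit]:
        sample.extend(GRANULES.get(collection_id, [])[:granule_limit])
    return sample

print(sample_granules(["C1", "C2", "C3"], limits=[2, 2]))
```

Because a collection may hold fewer granules than the per-collection limit, the total returned can be smaller than the product of the two limits.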

# Finding providers
For people doing curation on metadata, or who want to look at just a single provider, an API exists
to pull the provider list out of CMR. This CMR call is not widely documented, and making calls like
it easier to use is one of the reasons the CMR Python library exists:

In [None]:
from cmr.search import providers

# list all providers and pull out their ids into one list
result = providers.search()
# avoid naming the variable `list`, which would shadow the built-in type
provider_ids = [prov.get('provider-id') for prov in result]
print(provider_ids)

print("The last provider is printed out here as an example of what is returned.")
print(result[-1])

## Query for providers
There is also a `search_by_id()` function, which lets you filter providers by regular expression:

In [None]:
providers.search_by_id('.*GHRC.*')
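
For readers less familiar with regular expressions, the sketch below shows the same kind of filtering done locally with Python's `re` module. The provider records are hypothetical and trimmed to the one key used earlier, and `filter_by_id` is an illustrative helper. Whether the library matches the whole id or any part of it is not stated here, so the use of `fullmatch` is an assumption.

```python
import re

# Hypothetical provider records, trimmed to the one key used here.
PROVIDERS = [
    {"provider-id": "GHRC_DAAC"},
    {"provider-id": "ORNL_DAAC"},
    {"provider-id": "GHRC"},
]

def filter_by_id(providers, pattern):
    """Keep providers whose id matches the regular expression in full."""
    regex = re.compile(pattern)
    return [p for p in providers if regex.fullmatch(p.get("provider-id", ""))]

print([p["provider-id"] for p in filter_by_id(PROVIDERS, ".*GHRC.*")])
```

With full matching, the leading and trailing `.*` are what allow ids that merely contain `GHRC` to pass; the bare pattern `GHRC` would keep only the exact id.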

# Feedback

* Code - https://github.com/nasa/eo-metadata-tools
* Slack - #umm-powerhouse
* Technical Questions - thomas.a.cherry@nasa.gov
* Suggestions - erich.e.reiter@nasa.gov
* NASA rep - valerie.dixon@nasa.gov