# The basics of querying CMR
[Reference](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)

The simplest query to CMR is a collection-level search with no filter constraints:

    GET https://cmr.earthdata.nasa.gov/search/collections

We will use this query to demonstrate the basic functionality of CMR search.
What can we do with this query?

In [None]:
import requests
import xml.dom.minidom

# The simplest search request possible
response = requests.get("https://cmr.earthdata.nasa.gov/search/collections")

The default response is an XML document of collection result references. Let's pretty-print the output.

In [None]:
# Parse the XML response text and pretty-print it
response_as_dom = xml.dom.minidom.parseString(response.text)
xml_reference_response = response_as_dom.toprettyxml()
print(xml_reference_response)

## Parsing the XML results
We can extract information from the results programmatically using XPath-style paths.

How many collection results are there?

In [None]:
import xml.etree.ElementTree as et
doc = et.fromstring(response.text)

print("Total number of collection results: " + doc.findtext('hits'))   


How many collection results were returned?

In [None]:
references = doc.findall('references/reference')
print("No. of results returned: " + str(len(references)))  


Where is the first collection result?

In [None]:
print("First result reference: " + str(references[0].find('location').text))

What is the human-readable name of the first collection result?

In [None]:
print("First result name: " + str(references[0].find('name').text))

What is the unique ID (concept-id) of the first collection result?

In [None]:
print("First result concept id: " + str(references[0].find('id').text))
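The individual lookups above can be gathered into one small helper. The following is a minimal sketch, not part of the CMR client tooling: the function name `summarize_references` is hypothetical, and the sample XML is made up to mirror the shape of the reference response (a `hits` element plus `references/reference` entries with `name`, `id`, and `location`), so it runs without a network call.

```python
import xml.etree.ElementTree as et

# Hypothetical sample shaped like a CMR reference response; all values are made up.
SAMPLE_XML = """<results>
  <hits>2</hits>
  <took>5</took>
  <references>
    <reference>
      <name>Example Collection A</name>
      <id>C0000000001-EXAMPLE</id>
      <location>https://cmr.earthdata.nasa.gov/search/concepts/C0000000001-EXAMPLE/1</location>
      <revision-id>1</revision-id>
    </reference>
    <reference>
      <name>Example Collection B</name>
      <id>C0000000002-EXAMPLE</id>
      <location>https://cmr.earthdata.nasa.gov/search/concepts/C0000000002-EXAMPLE/3</location>
      <revision-id>3</revision-id>
    </reference>
  </references>
</results>"""


def summarize_references(xml_text):
    """Return (total_hits, rows), where rows holds one dict per <reference>."""
    doc = et.fromstring(xml_text)
    hits = int(doc.findtext("hits"))
    rows = [
        {
            "name": ref.findtext("name"),
            "id": ref.findtext("id"),
            "location": ref.findtext("location"),
        }
        for ref in doc.findall("references/reference")
    ]
    return hits, rows


hits, rows = summarize_references(SAMPLE_XML)
print("Total hits: " + str(hits))
for row in rows:
    print(row["id"] + " -> " + row["name"])
```

With a live response, `summarize_references(response.text)` should work the same way, assuming the response is in the default reference format.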

## Going further than the first 10 results
CMR search supports paging parameters to iterate through 'pages' of results.
You can select your page size and your page number. Page size can be a positive integer between 1 and 2000. Page number can be a positive integer.
Note: for harvesting use cases we do not recommend paging through results. See: [scrolling](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#scrolling-details)

Let's try getting 20 results instead of the default value of 10.

    GET https://cmr.earthdata.nasa.gov/search/collections?page_size=20

In [None]:
response = requests.get("https://cmr.earthdata.nasa.gov/search/collections?page_size=20")
doc = et.fromstring(response.text)
# Re-extract the references from the new response before counting them
references = doc.findall('references/reference')
print("No. of results returned: " + str(len(references)))
print("11th result concept id: " + str(references[10].find('id').text))

We have retrieved the first 20 results instead of the default 10.
Now let's get the next 20 results by requesting the second page.

    GET https://cmr.earthdata.nasa.gov/search/collections?page_size=20&page_num=2

In [None]:
response = requests.get("https://cmr.earthdata.nasa.gov/search/collections?page_size=20&page_num=2")
doc = et.fromstring(response.text)
references = doc.findall('references/reference')
print("11th result concept id: " + str(references[10].find('id').text)) 

Notice that the 11th result is different because we are now looking at the second page of results.
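The paging URLs follow a regular pattern, so they can be built rather than typed by hand. This is a minimal sketch using only the standard library; the helper name `page_url` is hypothetical, and the range check simply encodes the limits stated above (page size between 1 and 2000, page number a positive integer). It only builds URLs and makes no request.

```python
from urllib.parse import urlencode

COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections"


def page_url(page_size=10, page_num=1):
    """Build a collection search URL for one page of results."""
    if not 1 <= page_size <= 2000:
        raise ValueError("page_size must be between 1 and 2000")
    if page_num < 1:
        raise ValueError("page_num must be a positive integer")
    query = urlencode({"page_size": page_size, "page_num": page_num})
    return COLLECTIONS_URL + "?" + query


# Print the URLs for the first three pages of 20 results each
for num in (1, 2, 3):
    print(page_url(page_size=20, page_num=num))
```

Each URL can then be passed to `requests.get` as in the cells above. For bulk harvesting, the note about scrolling still applies.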

## Getting metadata by reference
Let's get the metadata for the 11th result using the reference

In [None]:
print("11th result reference: " + str(references[10].find('location').text)) 

response = requests.get(str(references[10].find('location').text))
response_as_dom = xml.dom.minidom.parseString(response.text)
xml_reference_response = response_as_dom.toprettyxml()
print(xml_reference_response)

## Getting metadata in the format you want
The above is the metadata in its native format, that is, the format in which it was ingested. CMR can also return this metadata in other formats.

We can specify our desired format by file extension or an 'Accept' header in our request.
Here is a list of popular formats. For an exhaustive list, see [extensions](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#extensions).

| Format  | Extension | Accept Header                               |
|---------|-----------|---------------------------------------------|
| native  | N/A       | "application/metadata+xml"                  |
| html    | .html     | "text/html"                                 |
| json    | .umm_json | "application/vnd.nasa.cmr.umm_results+json" |
| echo10  | .echo10   | "application/echo10+xml"                    | 

Let's try using a file extension to get the collection metadata in ECHO10 format

In [None]:
response = requests.get(str(references[10].find('location').text) + ".echo10")
response_as_dom = xml.dom.minidom.parseString(response.text)
xml_reference_response = response_as_dom.toprettyxml()
print(xml_reference_response)

Let's do the same thing but using a header to specify the desired format

In [None]:
headers = {'Accept': 'application/echo10+xml'}
response = requests.get(str(references[10].find('location').text), headers=headers)
response_as_dom = xml.dom.minidom.parseString(response.text)
xml_reference_response = response_as_dom.toprettyxml()
print(xml_reference_response)
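The two approaches (file extension versus `Accept` header) can be captured in a small lookup built from the table above. This is a sketch under stated assumptions: the helper names `by_extension` and `by_header` and the `location` value are hypothetical, and the code only prepares the URL and headers without sending a request.

```python
# Extension and Accept header per format, copied from the table above
FORMATS = {
    "html": (".html", "text/html"),
    "umm_json": (".umm_json", "application/vnd.nasa.cmr.umm_results+json"),
    "echo10": (".echo10", "application/echo10+xml"),
}


def by_extension(location, fmt):
    """Select the format by appending its file extension to the URL."""
    extension, _ = FORMATS[fmt]
    return location + extension, {}


def by_header(location, fmt):
    """Select the format by sending an Accept header with the plain URL."""
    _, accept = FORMATS[fmt]
    return location, {"Accept": accept}


# Hypothetical concept location; a real one comes from a <location> element
location = "https://cmr.earthdata.nasa.gov/search/concepts/C0000000001-EXAMPLE/1"

url, headers = by_extension(location, "echo10")
print(url, headers)

url, headers = by_header(location, "echo10")
print(url, headers)
```

Either pair can be passed on as `requests.get(url, headers=headers)`.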

Let's try it in json format

In [None]:
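import json
import jq  # third-party package; install it separately if it is not available

# Append the .umm_json extension to the location URL to request UMM JSON
response = requests.get(str(references[10].find('location').text) + ".umm_json")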
doc = json.loads(response.text)
print(json.dumps(doc, indent=2))

We can parse the JSON response programmatically. What is the short name of this collection?

In [None]:
print("Short name: " + jq.compile(".ShortName").input(doc).first())
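For a single top-level field such as `ShortName`, plain dictionary access on the parsed JSON is an alternative that avoids the third-party `jq` package. The record below is a hypothetical, heavily trimmed stand-in for a UMM JSON collection document; only the field names `ShortName`, `Version`, and `EntryTitle` are assumed to match the real schema.

```python
import json

# Hypothetical, heavily trimmed stand-in for a UMM JSON collection record
sample_text = (
    '{"ShortName": "EXAMPLE_SHORT_NAME", '
    '"Version": "1", '
    '"EntryTitle": "Example Collection A"}'
)
doc = json.loads(sample_text)

# Direct key access raises KeyError if the field is missing
short_name = doc["ShortName"]
# .get() returns a fallback instead of raising
version = doc.get("Version", "unknown")

print("Short name: " + short_name)
print("Version: " + version)
```

`jq` remains the more convenient choice once the path reaches into nested lists or needs filtering.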

## Filtering metadata by search constraints

You can filter your results using HTTP query parameters.