# The basics of querying CMR
[Reference](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)

The simplest query to CMR is a collection-level search with no filter constraints:

    GET https://cmr.earthdata.nasa.gov/search/collections

We will use this query to demonstrate the basic functionality of CMR search.
What can we do with this query?

In [11]:
import requests
import xml.dom.minidom

response = requests.get("https://cmr.earthdata.nasa.gov/search/collections")

The default response is an xml document of collection result references. Let's pretty print the output.

In [12]:
# Parse the default XML reference response and pretty-print it
response_as_dom = xml.dom.minidom.parseString(response.text)
xml_reference_response = response_as_dom.toprettyxml()
print(xml_reference_response)

<?xml version="1.0" ?>
<results>
	<hits>28349</hits>
	<took>8</took>
	<references>
		<reference>
			<name>&quot;The Omnivores Dilemma&quot;: The Effect of Autumn Diet on Winter Physiology and Condition of Juvenile Antarctic Krill</name>
			<id>C1934541400-SCIOPS</id>
			<location>https://cmr.earthdata.nasa.gov:443/search/concepts/C1934541400-SCIOPS/1</location>
			<revision-id>1</revision-id>
		</reference>
		<reference>
			<name>'Latent reserves' within the Swiss NFI</name>
			<id>C1931110427-SCIOPS</id>
			<location>https://cmr.earthdata.nasa.gov:443/search/concepts/C1931110427-SCIOPS/5</location>
			<revision-id>5</revision-id>
		</reference>
		<reference>
			<name>(U-Th)/He ages from the Kukri Hills of southern Victoria Land</name>
			<id>C1214587974-SCIOPS</id>
			<location>https://cmr.earthdata.nasa.gov:443/search/concepts/C1214587974-SCIOPS/4</location>
			<revision-id>4</revision-id>
		</reference>
		<reference>
			<name>0.5 hour 1 M HCl extraction data for the Windmill Islands

## Parsing the xml results
We can extract information from the results programmatically using XPath-style queries.

How many collection results are there?

In [13]:
import xml.etree.ElementTree as et
doc = et.fromstring(response.text)

print("Total number of collection results: " + doc.findtext('hits'))   

Total number of collection results: 28349


How many collection results were returned?

In [14]:
references = doc.findall('references/reference')
print("No. of results returned: " + str(len(references)))  

No. of results returned: 10


Where is the first collection result?

In [15]:
print("First result reference: " + str(references[0].find('location').text))

First result reference: https://cmr.earthdata.nasa.gov:443/search/concepts/C1934541400-SCIOPS/1


What is the human-readable name of the first collection result?

In [16]:
print("First result name: " + str(references[0].find('name').text))

First result name: "The Omnivores Dilemma": The Effect of Autumn Diet on Winter Physiology and Condition of Juvenile Antarctic Krill


What is the unique ID (concept-id) of the first collection result?

In [17]:
print("First result concept id: " + str(references[0].find('id').text))

First result concept id: C1934541400-SCIOPS
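
The individual lookups above can be gathered into one pass over the response. The following is a minimal sketch, not part of the original notebook: `references_to_dicts` is a hypothetical helper name, and the sample XML is a hand-trimmed copy of the response shown earlier so that the block runs without a network connection.

```python
import xml.etree.ElementTree as et


def references_to_dicts(xml_text):
    """Turn a CMR XML reference response into a list of plain dicts."""
    root = et.fromstring(xml_text)
    rows = []
    for ref in root.findall("references/reference"):
        rows.append({
            "name": ref.findtext("name"),
            "concept_id": ref.findtext("id"),
            "location": ref.findtext("location"),
            "revision_id": int(ref.findtext("revision-id")),
        })
    return rows


# Trimmed copy of the response shown earlier (one reference only).
sample = """<results>
  <hits>28349</hits>
  <took>8</took>
  <references>
    <reference>
      <name>'Latent reserves' within the Swiss NFI</name>
      <id>C1931110427-SCIOPS</id>
      <location>https://cmr.earthdata.nasa.gov:443/search/concepts/C1931110427-SCIOPS/5</location>
      <revision-id>5</revision-id>
    </reference>
  </references>
</results>"""

rows = references_to_dicts(sample)
print(rows[0]["concept_id"], rows[0]["revision_id"])
```

With a live response, `references_to_dicts(response.text)` should give one dict per returned reference.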


## Going further than the first 10 results
CMR search supports paging parameters to iterate through 'pages' of results.
You can select your page size and your page number. Page size can be a positive integer between 1 and 2000. Page number can be a positive integer.
Note: for harvesting use cases we do not recommend paging through results. See: [scrolling](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#scrolling-details)

Let's try getting 20 results instead of the default value of 10.

    GET https://cmr.earthdata.nasa.gov/search/collections?page_size=20

In [18]:
response = requests.get("https://cmr.earthdata.nasa.gov/search/collections?page_size=20")
doc = et.fromstring(response.text)
references = doc.findall('references/reference')
print("No. of results returned: " + str(len(references)))
print("11th result concept id: " + str(references[10].find('id').text))

No. of results returned: 20
11th result concept id: C1214422215-SCIOPS


We have retrieved the first 20 results instead of the default 10.
Now let's try getting the next 20 results:

    GET https://cmr.earthdata.nasa.gov/search/collections?page_size=20&page_num=2

In [19]:
response = requests.get("https://cmr.earthdata.nasa.gov/search/collections?page_size=20&page_num=2")
doc = et.fromstring(response.text)
references = doc.findall('references/reference')
print("11th result concept id: " + str(references[10].find('id').text)) 

11th result concept id: C1214610584-SCIOPS


Notice that the 11th result is different because we are now looking at the second page of results.
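
The page-by-page pattern can be sketched as a small generator. This is an illustration under stated assumptions rather than a recommended harvesting method (see the note on scrolling above): `iter_references` and `fake_fetch` are hypothetical names, the concept ids are made up, and the fetch function is injected so the sketch runs offline.

```python
import xml.etree.ElementTree as et


def iter_references(fetch_page, page_size=20, max_pages=5):
    """Yield <reference> elements one page at a time.

    fetch_page(page_num, page_size) is a hypothetical callable that
    returns the XML text of a single page of results.
    """
    seen = 0
    for page_num in range(1, max_pages + 1):
        root = et.fromstring(fetch_page(page_num, page_size))
        hits = int(root.findtext("hits"))
        refs = root.findall("references/reference")
        if not refs:
            break  # ran past the last page
        for ref in refs:
            yield ref
        seen += len(refs)
        if seen >= hits:
            break  # every hit has been yielded


def fake_fetch(page_num, page_size):
    """Stand-in for a real request: serves five made-up concept ids."""
    all_ids = ["C%d-DEMO" % i for i in range(1, 6)]
    start = (page_num - 1) * page_size
    chunk = all_ids[start:start + page_size]
    refs = "".join("<reference><id>%s</id></reference>" % i for i in chunk)
    return ("<results><hits>%d</hits><references>%s</references></results>"
            % (len(all_ids), refs))


ids = [ref.findtext("id") for ref in iter_references(fake_fetch, page_size=2)]
print(ids)
```

Against the real service, `fetch_page` could wrap `requests.get(...)` with `page_size` and `page_num` as query parameters and return `response.text`. The `max_pages` cap is a deliberate guard so that a sketch like this never walks the whole catalogue by accident.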

## Getting metadata by reference
Let's get the metadata for the 11th result by requesting its reference location.

In [20]:
print("11th result reference: " + str(references[10].find('location').text)) 

response = requests.get(str(references[10].find('location').text))
response_as_dom = xml.dom.minidom.parseString(response.text)
xml_reference_response = response_as_dom.toprettyxml()
print(xml_reference_response)

11th result reference: https://cmr.earthdata.nasa.gov:443/search/concepts/C1214610584-SCIOPS/3
<?xml version="1.0" ?>
<DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/" xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/ http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/dif_v10.2.xsd">
	
            
	<Entry_ID>
		
                
		<Short_Name>PASSCAL_BROOKS2</Short_Name>
		
                
		<Version>Not provided</Version>
		
            
	</Entry_ID>
	
            
	<Entry_Title>1990 PASSCAL/USGS/GSC Brooks Range Seismic Survey</Entry_Title>
	
            
	<Dataset_Citation>
		
                
		<Dataset_Creator>J. Murphy, G.S. Fuis, W. J. Lutter, E. E. Criley - USGS; A. R. Levander &amp; S. A. Henrys - Rice University; I. Asudeh - GSC; J. C. Fowler - IRIS/PASSCAL</Dataset_Creator>
		
                
		<Dataset_Title>Data Report for the 1990 Seismic Reflection/Re

## Getting metadata in the format you want
The above is the native format of the metadata, that is, the format in which the record was ingested. CMR can also return this metadata in other formats.

We can specify our desired format by file extension or an 'Accept' header in our request.
Here is a list of popular formats. For an exhaustive list see [extensions](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#extensions).

| Format  | Extension | Accept Header                               |
|---------|-----------|---------------------------------------------|
| native  | N/A       | "application/metadata+xml"                  |
| html    | .html     | "text/html"                                 |
| json    | .umm_json | "application/vnd.nasa.cmr.umm_results+json" |
| echo10  | .echo10   | "application/echo10+xml"                    |
| iso     | .iso      | "application/iso19115+xml"                  |

Let's try using a file extension to get the collection metadata in ECHO10 format

In [21]:
response = requests.get(str(references[10].find('location').text) + ".echo10")
response_as_dom = xml.dom.minidom.parseString(response.text)
xml_reference_response = response_as_dom.toprettyxml()
print(xml_reference_response)

<?xml version="1.0" ?>
<Collection>
	<ShortName>PASSCAL_BROOKS2</ShortName>
	<VersionId>Not provided</VersionId>
	<InsertTime>2021-06-30T16:40:30.124Z</InsertTime>
	<LastUpdate>2021-06-30T16:40:30.124Z</LastUpdate>
	<LongName>Not provided</LongName>
	<DataSetId>1990 PASSCAL/USGS/GSC Brooks Range Seismic Survey</DataSetId>
	<Description>In the summer of 1990, the U.S. Geological Survey, Rice University and
      the Geological Survey of Canada conducted a seismic experiment across
      the Brooks Range, Alaska. The goal of the experiment was to produce a
      high-resolution image of the crust and upper mantle of the Brooks
      Range and flanking geologic provinces by combining reflection and
      refraction techniques. Five deployments of vertical sensors recorded
      63 shots at 44 different locations along a 315 km profile. The nominal
      station spacing is 100 meters. Shot sizes varied from 100 lbs to 4000
      lbs and the offsets varied from 0 to 200 km.</Description>
	<

Let's do the same thing but using a header to specify the desired format

In [22]:
headers = {'Accept': 'application/echo10+xml'}
response = requests.get(str(references[10].find('location').text), headers=headers)
response_as_dom = xml.dom.minidom.parseString(response.text)
xml_reference_response = response_as_dom.toprettyxml()
print(xml_reference_response)

<?xml version="1.0" ?>
<Collection>
	<ShortName>PASSCAL_BROOKS2</ShortName>
	<VersionId>Not provided</VersionId>
	<InsertTime>2021-06-30T16:40:34.032Z</InsertTime>
	<LastUpdate>2021-06-30T16:40:34.032Z</LastUpdate>
	<LongName>Not provided</LongName>
	<DataSetId>1990 PASSCAL/USGS/GSC Brooks Range Seismic Survey</DataSetId>
	<Description>In the summer of 1990, the U.S. Geological Survey, Rice University and
      the Geological Survey of Canada conducted a seismic experiment across
      the Brooks Range, Alaska. The goal of the experiment was to produce a
      high-resolution image of the crust and upper mantle of the Brooks
      Range and flanking geologic provinces by combining reflection and
      refraction techniques. Five deployments of vertical sensors recorded
      63 shots at 44 different locations along a 315 km profile. The nominal
      station spacing is 100 meters. Shot sizes varied from 100 lbs to 4000
      lbs and the offsets varied from 0 to 200 km.</Description>
	<
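
The two requests above differ only in how the format is named: once in the URL and once in a header. A minimal sketch of that choice follows. `FORMATS`, `url_with_extension` and `accept_header` are hypothetical names introduced for illustration, and the mapping simply copies the rows of the table above that have a file extension.

```python
# Extension and Accept header pairs copied from the format table.
FORMATS = {
    "html": (".html", "text/html"),
    "umm_json": (".umm_json", "application/vnd.nasa.cmr.umm_results+json"),
    "echo10": (".echo10", "application/echo10+xml"),
    "iso": (".iso", "application/iso19115+xml"),
}


def url_with_extension(location, fmt):
    """Append the file extension for fmt to a concept location URL."""
    extension, _ = FORMATS[fmt]
    return location + extension


def accept_header(fmt):
    """Build a headers dict that requests fmt via the Accept header."""
    _, mime_type = FORMATS[fmt]
    return {"Accept": mime_type}


location = "https://cmr.earthdata.nasa.gov:443/search/concepts/C1214610584-SCIOPS/3"
print(url_with_extension(location, "echo10"))
print(accept_header("echo10"))
```

Either result can then be passed to `requests.get`: the first as the URL, the second as the `headers` argument alongside the unmodified location.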

Let's try it in json format

In [23]:
import json
import jq

response = requests.get(str(references[10].find('location').text) + ".umm_json")
doc = json.loads(response.text)
print(json.dumps(doc, indent=2))

{
  "CollectionCitations": [
    {
      "Creator": "J. Murphy, G.S. Fuis, W. J. Lutter, E. E. Criley - USGS; A. R. Levander & S. A. Henrys - Rice University; I. Asudeh - GSC; J. C. Fowler - IRIS/PASSCAL",
      "Title": "Data Report for the 1990 Seismic Reflection/Refraction Experiment in the Brooks Range, Arctic Alaska",
      "Publisher": "IRIS",
      "ReleaseDate": "1993-01-01T00:00:00.000Z"
    }
  ],
  "LocationKeywords": [
    {
      "Category": "GEOGRAPHIC REGION",
      "Type": "ARCTIC"
    },
    {
      "Category": "GEOGRAPHIC REGION",
      "Type": "POLAR"
    },
    {
      "Category": "SOLID EARTH",
      "Type": "CRUST"
    },
    {
      "Category": "SOLID EARTH",
      "Type": "MANTLE",
      "DetailedLocation": "UPPER MANTLE"
    },
    {
      "Category": "CONTINENT",
      "Type": "NORTH AMERICA",
      "Subregion1": "UNITED STATES OF AMERICA",
      "Subregion2": "ALASKA",
      "DetailedLocation": "BROOKS RANGE"
    }
  ],
  "MetadataDates": [
    {
      "Date"

We can parse the JSON response programmatically. What is the short name of this collection?

In [24]:
print("Short name: " + jq.compile(".ShortName").input(doc).first())

Short name: PASSCAL_BROOKS2
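
If `jq` is not installed, the same lookup can be done with ordinary dictionary access, because `json.loads` returns plain Python dicts and lists. The sketch below assumes a hand-trimmed fragment of the record shown above; `record` is a name chosen here to avoid clashing with the notebook's `doc`.

```python
import json

# Hand-trimmed fragment of the UMM-JSON record shown above.
record = json.loads("""
{
  "ShortName": "PASSCAL_BROOKS2",
  "LocationKeywords": [
    {"Category": "GEOGRAPHIC REGION", "Type": "ARCTIC"},
    {"Category": "SOLID EARTH", "Type": "MANTLE", "DetailedLocation": "UPPER MANTLE"}
  ]
}
""")

short_name = record["ShortName"]
keyword_types = [kw["Type"] for kw in record["LocationKeywords"]]
# .get() returns None for optional fields that a keyword does not carry.
detailed = [kw.get("DetailedLocation") for kw in record["LocationKeywords"]]

print(short_name)
print(keyword_types)
print(detailed)
```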


## Getting metadata directly
You can also get your metadata in a single request by specifying a format in the initial search request. Rather than getting a list of metadata locations, you will get a list of metadata records. Let's try this using UMM-JSON:

    GET https://cmr.earthdata.nasa.gov/search/collections.umm_json

In [25]:
response = requests.get("https://cmr.earthdata.nasa.gov/search/collections.umm_json")
doc = json.loads(response.text)
print(json.dumps(doc, indent=2))

{
  "hits": 28349,
  "took": 10,
  "items": [
    {
      "meta": {
        "revision-id": 1,
        "deleted": false,
        "format": "application/vnd.nasa.cmr.umm+json",
        "provider-id": "SCIOPS",
        "user-id": "sritz",
        "has-formats": false,
        "has-spatial-subsetting": false,
        "native-id": "dmmt_collection_8463",
        "has-transforms": false,
        "has-variables": false,
        "concept-id": "C1934541400-SCIOPS",
        "revision-date": "2020-09-02T13:46:02.515Z",
        "granule-count": 0,
        "has-temporal-subsetting": false,
        "concept-type": "collection"
      },
      "umm": {
        "SpatialExtent": {
          "SpatialCoverageType": "HORIZONTAL",
          "HorizontalSpatialDomain": {
            "Geometry": {
              "CoordinateSystem": "GEODETIC",
              "BoundingRectangles": [
                {
                  "NorthBoundingCoordinate": -64,
                  "WestBoundingCoordinate": -65,
               

You can see that we now have the actual metadata for each collection, rather than a reference to it.
Let's look at the first record:

In [26]:
print(json.dumps(jq.compile(".items[0]").input(doc).first(), indent=2))

{
  "meta": {
    "revision-id": 1,
    "deleted": false,
    "format": "application/vnd.nasa.cmr.umm+json",
    "provider-id": "SCIOPS",
    "user-id": "sritz",
    "has-formats": false,
    "has-spatial-subsetting": false,
    "native-id": "dmmt_collection_8463",
    "has-transforms": false,
    "has-variables": false,
    "concept-id": "C1934541400-SCIOPS",
    "revision-date": "2020-09-02T13:46:02.515Z",
    "granule-count": 0,
    "has-temporal-subsetting": false,
    "concept-type": "collection"
  },
  "umm": {
    "SpatialExtent": {
      "SpatialCoverageType": "HORIZONTAL",
      "HorizontalSpatialDomain": {
        "Geometry": {
          "CoordinateSystem": "GEODETIC",
          "BoundingRectangles": [
            {
              "NorthBoundingCoordinate": -64,
              "WestBoundingCoordinate": -65,
              "EastBoundingCoordinate": -62,
              "SouthBoundingCoordinate": -65
            }
          ]
        }
      },
      "GranuleSpatialRepresentation": 

## Filtering metadata by search constraints

You can filter your results using HTTP query parameters. Please see **this** notebook for filtering your collection search.
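
As a rough illustration of what a filtered request looks like, the sketch below builds a search URL from keyword arguments. `build_search_url` is a hypothetical helper. `page_size` comes from the paging section above, while the use of `provider` as a filter parameter is an assumption here and should be checked against the API reference. `SCIOPS` is the provider id that appears in the results shown earlier.

```python
from urllib.parse import urlencode

BASE_URL = "https://cmr.earthdata.nasa.gov/search/collections"


def build_search_url(**constraints):
    """Return the collection search URL with constraints as query parameters."""
    if not constraints:
        return BASE_URL
    return BASE_URL + "?" + urlencode(constraints)


# 'provider' is assumed to be a supported search parameter.
url = build_search_url(provider="SCIOPS", page_size=5)
print(url)
```

The resulting URL can be passed to `requests.get` exactly like the unfiltered queries above. Passing the same dict through the `params` argument of `requests.get` is an equivalent route.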