# The basics of querying CMR
The simplest query to CMR is a collection-level search with no filter constraints:

    GET https://cmr.earthdata.nasa.gov/search/collections

We will use this query to demonstrate the basic functionality of CMR search.
What can we do with this query?

In [9]:
import requests
import xml.dom.minidom

# The simplest search request possible
response = requests.get("https://cmr.earthdata.nasa.gov/search/collections")

The default response is an XML document of collection result references. Let's pretty-print the output.

In [10]:
# Parse the XML reference response and pretty-print it
response_as_dom = xml.dom.minidom.parseString(response.text)
xml_reference_response = response_as_dom.toprettyxml()
print(xml_reference_response)

<?xml version="1.0" ?>
<results>
	<hits>28348</hits>
	<took>9</took>
	<references>
		<reference>
			<name>&quot;The Omnivores Dilemma&quot;: The Effect of Autumn Diet on Winter Physiology and Condition of Juvenile Antarctic Krill</name>
			<id>C1934541400-SCIOPS</id>
			<location>https://cmr.earthdata.nasa.gov:443/search/concepts/C1934541400-SCIOPS/1</location>
			<revision-id>1</revision-id>
		</reference>
		<reference>
			<name>'Latent reserves' within the Swiss NFI</name>
			<id>C1931110427-SCIOPS</id>
			<location>https://cmr.earthdata.nasa.gov:443/search/concepts/C1931110427-SCIOPS/5</location>
			<revision-id>5</revision-id>
		</reference>
		<reference>
			<name>(U-Th)/He ages from the Kukri Hills of southern Victoria Land</name>
			<id>C1214587974-SCIOPS</id>
			<location>https://cmr.earthdata.nasa.gov:443/search/concepts/C1214587974-SCIOPS/4</location>
			<revision-id>4</revision-id>
		</reference>
		<reference>
			<name>0.5 hour 1 M HCl extraction data for the Windmill Islands

## Parsing the XML results
We can extract information from the results programmatically using XPath expressions.

How many collection results are there?

In [22]:
import xml.etree.ElementTree as et
doc = et.fromstring(response.text)
# Hits
print("Total number of collection results: " + doc.findtext('hits'))   


Total number of collection results: 28348


How many collection results were returned?

In [23]:
# No. of results returned
references = doc.findall('references/reference')
print("No. of results returned: " + str(len(references)))  


No. of results returned: 20


Where is the first collection result?

In [24]:
# First reference
print("First result reference: " + str(references[0].find('location').text))

First result reference: https://cmr.earthdata.nasa.gov:443/search/concepts/C1214610401-SCIOPS/3


What is the human-readable name of the first collection result?

In [25]:
# First name
print("First result name: " + str(references[0].find('name').text))

First result name: 1982 Commodity Output by State and Input-Output Sector


What is the unique ID (concept-id) of the first collection result?

In [26]:
print("First result concept id: " + str(references[0].find('id').text))

First result concept id: C1214610401-SCIOPS
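The individual lookups above can be gathered into one helper. The following is a minimal sketch, not part of the CMR API: `parse_references` and the dictionary keys are hypothetical names, and `SAMPLE_XML` is a trimmed copy of two entries from the response printed earlier, so the example runs without a network call.

```python
import xml.etree.ElementTree as et

# A trimmed sample of the reference response shown earlier (two entries only).
SAMPLE_XML = """<?xml version="1.0" ?>
<results>
  <hits>28348</hits>
  <took>9</took>
  <references>
    <reference>
      <name>'Latent reserves' within the Swiss NFI</name>
      <id>C1931110427-SCIOPS</id>
      <location>https://cmr.earthdata.nasa.gov:443/search/concepts/C1931110427-SCIOPS/5</location>
      <revision-id>5</revision-id>
    </reference>
    <reference>
      <name>(U-Th)/He ages from the Kukri Hills of southern Victoria Land</name>
      <id>C1214587974-SCIOPS</id>
      <location>https://cmr.earthdata.nasa.gov:443/search/concepts/C1214587974-SCIOPS/4</location>
      <revision-id>4</revision-id>
    </reference>
  </references>
</results>"""


def parse_references(xml_text):
    """Return (hits, list of reference dicts) from a reference response.

    Hypothetical helper for illustration; the keys are our own naming.
    """
    doc = et.fromstring(xml_text)
    hits = int(doc.findtext("hits"))
    references = [
        {
            "name": ref.findtext("name"),
            "concept_id": ref.findtext("id"),
            "location": ref.findtext("location"),
            "revision_id": int(ref.findtext("revision-id")),
        }
        for ref in doc.findall("references/reference")
    ]
    return hits, references


hits, refs = parse_references(SAMPLE_XML)
print(hits, len(refs), refs[0]["concept_id"])
```

With a live response, `parse_references(response.text)` should give the same structure, assuming the response has the shape shown earlier.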


## Going further than the first 10 results
CMR search supports paging parameters to iterate through 'pages' of results.
You can choose the page size (`page_size`) and the page number (`page_num`). Page size must be an integer between 1 and 2000. Page number must be a positive integer.
Note: for harvesting use cases we do not recommend paging through results. See: foo
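As a rough illustration of these limits, the sketch below builds a paged search URL and rejects values outside the stated ranges. `build_page_url` is a hypothetical helper, not a CMR client function, and the parameter names `page_size` and `page_num` are assumed to be the query parameters CMR expects.

```python
from urllib.parse import urlencode

CMR_COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections"
MAX_PAGE_SIZE = 2000  # upper bound stated in the text above


def build_page_url(page_size=10, page_num=1):
    """Build a collection search URL for one page of results.

    Hypothetical helper; the default page_size of 10 mirrors CMR's default.
    """
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError("page_size must be between 1 and 2000")
    if page_num < 1:
        raise ValueError("page_num must be a positive integer")
    query = urlencode({"page_size": page_size, "page_num": page_num})
    return f"{CMR_COLLECTIONS_URL}?{query}"


print(build_page_url(page_size=20, page_num=2))
```

Validating on the client side is a design choice of this sketch; it simply avoids sending a request that is already known to be out of range.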

Let's try getting 20 results instead of the default value of 10.

    GET https://cmr.earthdata.nasa.gov/search/collections?page_size=20

In [27]:
response = requests.get("https://cmr.earthdata.nasa.gov/search/collections?page_size=20")
doc = et.fromstring(response.text)
# Re-read the references from the new response before counting them
references = doc.findall('references/reference')
print("No. of results returned: " + str(len(references)))
print("11th result concept id: " + str(references[10].find('id').text))

No. of results returned: 20
11th result concept id: C1214422215-SCIOPS


We have retrieved the first 20 results instead of the default 10.
Now let's try getting the next 20 results.

    GET https://cmr.earthdata.nasa.gov/search/collections?page_size=20&page_num=2

In [28]:
response = requests.get("https://cmr.earthdata.nasa.gov/search/collections?page_size=20&page_num=2")
doc = et.fromstring(response.text)
references = doc.findall('references/reference')
print("11th result concept id: " + str(references[10].find('id').text)) 

11th result concept id: C1214610584-SCIOPS


Notice that the 11th result is different because we are looking at a different page.
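The arithmetic behind paging can be sketched as follows. The helper names `page_count` and `page_bounds` are hypothetical, and the hit count is the one reported in the earlier output. The sketch assumes pages are numbered from 1 and filled in order.

```python
import math


def page_count(hits, page_size):
    """Number of pages needed to cover `hits` results."""
    return math.ceil(hits / page_size)


def page_bounds(page_num, page_size):
    """1-based positions of the first and last result on a page."""
    first = (page_num - 1) * page_size + 1
    last = page_num * page_size
    return first, last


print(page_count(28348, 20))  # 1418
print(page_bounds(2, 20))     # (21, 40)
```

Under these assumptions, `references[10]` on page 2 with a page size of 20 is the 31st result overall.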

## Getting metadata by reference
Let's get the metadata for the 11th result using its reference location.

In [35]:
print("11th result reference: " + str(references[10].find('location').text)) 

response = requests.get(str(references[10].find('location').text))
response_as_dom = xml.dom.minidom.parseString(response.text)
xml_reference_response = response_as_dom.toprettyxml()
print(xml_reference_response)

11th result reference: https://cmr.earthdata.nasa.gov:443/search/concepts/C1214610584-SCIOPS/3
<?xml version="1.0" ?>
<DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/" xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/ http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/dif_v10.2.xsd">
	
            
	<Entry_ID>
		
                
		<Short_Name>PASSCAL_BROOKS2</Short_Name>
		
                
		<Version>Not provided</Version>
		
            
	</Entry_ID>
	
            
	<Entry_Title>1990 PASSCAL/USGS/GSC Brooks Range Seismic Survey</Entry_Title>
	
            
	<Dataset_Citation>
		
                
		<Dataset_Creator>J. Murphy, G.S. Fuis, W. J. Lutter, E. E. Criley - USGS; A. R. Levander &amp; S. A. Henrys - Rice University; I. Asudeh - GSC; J. C. Fowler - IRIS/PASSCAL</Dataset_Creator>
		
                
		<Dataset_Title>Data Report for the 1990 Seismic Reflection/Re

The above is the metadata in its native format, that is, the format in which it was ingested. CMR can also return this metadata in other formats.
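One practical detail of the native DIF record: its root element declares a default namespace, so ElementTree lookups by bare tag name find nothing. The sketch below reads two fields using a namespace map. It assumes a trimmed, hand-closed sample of the record (the real document is much longer), and the `dif` prefix and variable names are illustrative choices.

```python
import xml.etree.ElementTree as et

# Namespace URI copied from the xmlns attribute of the DIF document above.
DIF_NS = {"dif": "http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"}

# A trimmed, hand-closed sample of the native record (illustration only).
SAMPLE_DIF = """<DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/">
  <Entry_ID>
    <Short_Name>PASSCAL_BROOKS2</Short_Name>
    <Version>Not provided</Version>
  </Entry_ID>
  <Entry_Title>1990 PASSCAL/USGS/GSC Brooks Range Seismic Survey</Entry_Title>
</DIF>"""

doc = et.fromstring(SAMPLE_DIF)

# Every element lives in the default namespace, so each path step needs the prefix.
short_name = doc.findtext("dif:Entry_ID/dif:Short_Name", namespaces=DIF_NS)
entry_title = doc.findtext("dif:Entry_Title", namespaces=DIF_NS)

print(short_name)
print(entry_title)
print(doc.findtext("Entry_Title"))  # None: the bare tag name does not match
```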


We can specify our desired format by file extension or an 'Accept' header in our request.

| Format  | Extension | Accept Header              |
|---------|-----------|----------------------------|
| native  | N/A       | "application/metadata+xml" |
| html    | .html     | "text/html"                |
| json    | .json     | "application/json"         |
| echo10  | .echo10   | "application/echo10+xml"   |
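The table can be expressed as a small lookup that yields either a URL with an extension or an `Accept` header. This is only a sketch: `FORMATS` and `request_args` are hypothetical names, and the values are copied from the table rather than from a complete list of formats CMR supports.

```python
# Extension and Accept header for each format, copied from the table above.
FORMATS = {
    "native": (None, "application/metadata+xml"),
    "html": (".html", "text/html"),
    "json": (".json", "application/json"),
    "echo10": (".echo10", "application/echo10+xml"),
}


def request_args(location, fmt, use_extension=True):
    """Return (url, headers) for requesting `location` in format `fmt`.

    Falls back to the Accept header when the format has no extension.
    """
    extension, accept = FORMATS[fmt]
    if use_extension and extension is not None:
        return location + extension, {}
    return location, {"Accept": accept}


location = "https://cmr.earthdata.nasa.gov:443/search/concepts/C1214610584-SCIOPS/3"
print(request_args(location, "echo10"))
print(request_args(location, "echo10", use_extension=False))
```

The returned pair can be passed straight to `requests.get(url, headers=headers)`.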

Let's try using a file extension to get the collection metadata in ECHO10 format.

In [36]:
response = requests.get(str(references[10].find('location').text) + ".echo10")
response_as_dom = xml.dom.minidom.parseString(response.text)
xml_reference_response = response_as_dom.toprettyxml()
print(xml_reference_response)

<?xml version="1.0" ?>
<Collection>
	<ShortName>PASSCAL_BROOKS2</ShortName>
	<VersionId>Not provided</VersionId>
	<InsertTime>2021-06-29T21:31:05.054Z</InsertTime>
	<LastUpdate>2021-06-29T21:31:05.054Z</LastUpdate>
	<LongName>Not provided</LongName>
	<DataSetId>1990 PASSCAL/USGS/GSC Brooks Range Seismic Survey</DataSetId>
	<Description>In the summer of 1990, the U.S. Geological Survey, Rice University and
      the Geological Survey of Canada conducted a seismic experiment across
      the Brooks Range, Alaska. The goal of the experiment was to produce a
      high-resolution image of the crust and upper mantle of the Brooks
      Range and flanking geologic provinces by combining reflection and
      refraction techniques. Five deployments of vertical sensors recorded
      63 shots at 44 different locations along a 315 km profile. The nominal
      station spacing is 100 meters. Shot sizes varied from 100 lbs to 4000
      lbs and the offsets varied from 0 to 200 km.</Description>
	<

Let's do the same thing, but using a header to specify the desired format.

In [39]:
headers = {'Accept': 'application/echo10+xml'}
response = requests.get(str(references[10].find('location').text), headers=headers)
response_as_dom = xml.dom.minidom.parseString(response.text)
xml_reference_response = response_as_dom.toprettyxml()
print(xml_reference_response)

<?xml version="1.0" ?>
<Collection>
	<ShortName>PASSCAL_BROOKS2</ShortName>
	<VersionId>Not provided</VersionId>
	<InsertTime>2021-06-29T21:37:35.951Z</InsertTime>
	<LastUpdate>2021-06-29T21:37:35.951Z</LastUpdate>
	<LongName>Not provided</LongName>
	<DataSetId>1990 PASSCAL/USGS/GSC Brooks Range Seismic Survey</DataSetId>
	<Description>In the summer of 1990, the U.S. Geological Survey, Rice University and
      the Geological Survey of Canada conducted a seismic experiment across
      the Brooks Range, Alaska. The goal of the experiment was to produce a
      high-resolution image of the crust and upper mantle of the Brooks
      Range and flanking geologic provinces by combining reflection and
      refraction techniques. Five deployments of vertical sensors recorded
      63 shots at 44 different locations along a 315 km profile. The nominal
      station spacing is 100 meters. Shot sizes varied from 100 lbs to 4000
      lbs and the offsets varied from 0 to 200 km.</Description>
	<

## Getting metadata by search constraints

## Getting metadata in the format you want