# Sample queries for datasets

The `DataCollections` class queries for datasets (collections, in NASA terminology) using a variety of criteria.
The basic ones are the spatiotemporal parameters, but we can also search by data center (DAAC), by dataset version, or for cloud-hosted data only.

This notebook provides some examples of how to search for datasets using different parameters.

Collection search parameters

**dataset origin and location**
* archive_center
* data_center
* daac
* provider
* cloud_hosted

**spatiotemporal parameters**
* bounding_box
* temporal
* point
* polygon
* line

**dataset metadata parameters**
* concept_id 
* entry_title
* keyword
* version
* short_name

Once the query has been formed with one or more search parameters, we can get the results with either `hits()` or `get()`.

* **hits()**: returns the number of collections that match the query; it is 0 if nothing matched.
* **get()**: returns the metadata records for the collections that matched. We can pass a maximum number of records, e.g. `get(10)`; if we do not, the default is 2000.
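
A common pattern is to check `hits()` first and call `get()` only when something matched, so that no metadata is requested for an empty result. The following is a minimal sketch of that pattern, not part of the library: `fetch_if_any` and the stand-in `FakeQuery` class are hypothetical names used only for illustration, and the helper assumes nothing about the query object beyond the `hits()` and `get()` methods described above.

```python
def fetch_if_any(query, limit=10):
    """Return up to `limit` records, or an empty list when nothing matches."""
    count = query.hits()
    if count == 0:
        return []
    return query.get(min(limit, count))


class FakeQuery:
    """Hypothetical stand-in for a query object, so the sketch runs offline."""

    def __init__(self, records):
        self._records = list(records)

    def hits(self):
        return len(self._records)

    def get(self, limit=2000):
        return self._records[:limit]


print(fetch_if_any(FakeQuery(["a", "b", "c"]), limit=2))  # ['a', 'b']
print(fetch_if_any(FakeQuery([])))  # []
```

With a real query, the same helper would presumably be called as `fetch_if_any(query, limit=10)`.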



## Example #1, querying for cloud enabled data from a given data center (DAAC)

In [1]:
from earthdata import DataCollections

# We only need to specify the DAAC and whether we want cloud-hosted data
query = DataCollections().daac("PODAAC").cloud_hosted(True)
# we use hits to get a count for the collections that match our query
query.hits()

346

In [2]:
# We can print a small summary of the dataset, here for the first 10 collections
from pprint import pprint

collections = query.get(10)
summaries = [collection.summary() for collection in collections]

for summary in summaries:
    pprint(summary)
    print("\n")

{'cloud-info': {'Region': 'us-west-2',
                'S3BucketAndObjectPrefixNames': ['podaac-ops-cumulus-public/MODIS_A-JPL-L2P-v2019.0/',
                                                 'podaac-ops-cumulus-protected/MODIS_A-JPL-L2P-v2019.0/'],
                'S3CredentialsAPIDocumentationURL': 'https://archive.podaac.earthdata.nasa.gov/s3credentialsREADME',
                'S3CredentialsAPIEndpoint': 'https://archive.podaac.earthdata.nasa.gov/s3credentials'},
 'concept-id': 'C1940473819-POCLOUD',
 'file-type': "[{'Format': 'netCDF-4', 'FormatType': 'Native', "
              "'AverageFileSize': 25.0, 'AverageFileSizeUnit': 'MB', "
              "'TotalCollectionFileSizeBeginDate': "
              "'2002-06-01T00:00:00.000Z'}]",
 'get-data': ['https://cmr.earthdata.nasa.gov/virtual-directory/collections/C1940473819-POCLOUD',
              'https://search.earthdata.nasa.gov/search/granules?p=C1940473819-POCLOUD'],
 'short-name': 'MODIS_A-JPL-L2P-v2019.0',
 'version': '2019.0'}


{'c
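
In the summary above, the `'file-type'` value is a string that holds the printed form of a list of dictionaries rather than the list itself. A minimal sketch of turning it back into Python objects with the standard library's `ast.literal_eval` follows. It assumes the string is always a valid Python literal, which is inferred from the output shown here and is not a documented guarantee.

```python
import ast

# 'file-type' value copied from the first summary printed above
file_type = (
    "[{'Format': 'netCDF-4', 'FormatType': 'Native', "
    "'AverageFileSize': 25.0, 'AverageFileSizeUnit': 'MB', "
    "'TotalCollectionFileSizeBeginDate': "
    "'2002-06-01T00:00:00.000Z'}]"
)

# literal_eval only evaluates literals (lists, dicts, strings, numbers),
# so it is safer than eval() for text that came from a remote service
formats = ast.literal_eval(file_type)

for entry in formats:
    print(entry["Format"], entry["AverageFileSize"], entry["AverageFileSizeUnit"])
```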

In [3]:
collections[0]

{
  "meta": {
    "revision-id": 41,
    "deleted": false,
    "format": "application/vnd.nasa.cmr.umm+json",
    "provider-id": "POCLOUD",
    "user-id": "mgangl",
    "has-formats": true,
    "associations": {
      "variables": [
        "V1997812737-POCLOUD",
        "V1997812697-POCLOUD",
        "V2112014688-POCLOUD",
        "V1997812756-POCLOUD",
        "V1997812688-POCLOUD",
        "V1997812670-POCLOUD",
        "V1997812724-POCLOUD",
        "V2112014684-POCLOUD",
        "V1997812701-POCLOUD",
        "V1997812681-POCLOUD",
        "V2112014686-POCLOUD",
        "V1997812663-POCLOUD",
        "V1997812676-POCLOUD",
        "V1997812744-POCLOUD",
        "V1997812714-POCLOUD"
      ],
      "services": [
        "S1962070864-POCLOUD",
        "S2004184019-POCLOUD",
        "S2153799015-POCLOUD",
        "S2227193226-POCLOUD"
      ],
      "tools": [
        "TL2108419875-POCLOUD",
        "TL2092786348-POCLOUD"
      ]
    },
    "s3-links": [
      "podaac-ops-cumulus-pub


### Searching using keywords

> **Note**: Some DAACs don't have cloud-hosted collections yet; others have cloud collections but do not allow direct access.

In [4]:
# Now let's search using keyword and daac
query = DataCollections().keyword("fi*e").daac("LPDAAC")
# we use hits to get a count for the collections that match our query
query.hits()

498

In [5]:
# Same search without the wildcard: an exact keyword and the same DAAC
query = DataCollections().keyword("fire").daac("LPDAAC")
# we use hits to get a count for the collections that match our query
query.hits()

17
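
The wildcard query matches far more collections than the exact keyword (498 versus 17), since `*` can stand for any run of characters. The sketch below only illustrates that glob-style behaviour locally with Python's `fnmatch` module. The word list is made up, and it is an assumption that the server-side keyword matching treats `*` in a comparable way.

```python
from fnmatch import fnmatchcase

# Made-up sample words, for illustration only
words = ["fire", "file", "fine", "fixture", "fir", "wildfire"]
pattern = "fi*e"

# fnmatchcase compares case-sensitively on every platform
matches = [word for word in words if fnmatchcase(word, pattern)]
print(matches)  # ['fire', 'file', 'fine', 'fixture']
```

Here "fir" is rejected because the pattern must end in "e", and "wildfire" is rejected because the pattern must start with "fi".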

In [6]:
# Let's get the metadata for the first 10 collections only
collections = query.get(10)
# Print just the first collection; the full metadata record is long
collections[0]

{
  "meta": {
    "revision-id": 18,
    "deleted": false,
    "format": "application/vnd.nasa.cmr.umm+json",
    "provider-id": "LPDAAC_ECS",
    "user-id": "keinerjones",
    "has-formats": false,
    "has-spatial-subsetting": false,
    "native-id": "mmt_collection_8495",
    "has-transforms": false,
    "has-variables": false,
    "concept-id": "C1621383535-LPDAAC_ECS",
    "revision-date": "2022-02-07T16:13:58.268Z",
    "granule-count": 2286960,
    "has-temporal-subsetting": false,
    "concept-type": "collection"
  },
  "umm": {
    "CollectionCitations": [
      {
        "Creator": "Louis Giglio, Christopher Justice",
        "OnlineResource": {
          "Linkage": "https://doi.org/10.5067/MODIS/MOD14.061",
          "Name": "DOI Landing Page"
        },
        "OtherCitationDetails": "The DOI landing page provides citations in APA and Chicago styles.",
        "Publisher": "NASA EOSDIS Land Processes DAAC",
        "ReleaseDate": "2021-02-11T00:00:00.000Z",
        "Series

In [7]:
# We can print a small summary of the dataset, here for the first 10 collections again
summaries = [collection.summary() for collection in collections]
summaries

[{'short-name': 'MOD14',
  'concept-id': 'C1621383535-LPDAAC_ECS',
  'version': '061',
  'file-type': "[{'Format': 'HDF4', 'FormatType': 'Native', 'FormatDescription': 'Hierarchical Data Format Version 4', 'Media': ['HTTP'], 'AverageFileSize': 0.25, 'AverageFileSizeUnit': 'MB', 'TotalCollectionFileSizeBeginDate': '2000-02-24T00:00:00.000Z'}]",
  'get-data': ['https://e4ftl01.cr.usgs.gov/MOLT/MOD14.061/',
   'https://search.earthdata.nasa.gov/search?q=C1621383535-LPDAAC_ECS',
   'https://earthexplorer.usgs.gov/']},
 {'short-name': 'MYD14',
  'concept-id': 'C1621434243-LPDAAC_ECS',
  'version': '061',
  'file-type': "[{'Format': 'HDF4', 'FormatType': 'Native', 'FormatDescription': 'Hierarchical Data Format Version 4', 'Media': ['HTTP'], 'AverageFileSize': 0.25, 'AverageFileSizeUnit': 'MB', 'TotalCollectionFileSizeBeginDate': '2002-07-04T00:00:00.000Z'}]",
  'get-data': ['https://e4ftl01.cr.usgs.gov/MOLA/MYD14.061/',
   'https://search.earthdata.nasa.gov/search?q=C1621434243-LPDAAC_ECS',
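
Since each summary is a plain dictionary, it can be reduced to just the fields of interest. The following is a minimal sketch; `pick_fields` is a hypothetical helper, and the two sample dictionaries are abbreviated copies of the summaries printed above.

```python
# Two summaries abbreviated from the output above
summaries = [
    {"short-name": "MOD14", "concept-id": "C1621383535-LPDAAC_ECS", "version": "061"},
    {"short-name": "MYD14", "concept-id": "C1621434243-LPDAAC_ECS", "version": "061"},
]


def pick_fields(summary, fields):
    """Keep only the requested keys, skipping any that are absent."""
    return {key: summary[key] for key in fields if key in summary}


compact = [pick_fields(summary, ["short-name", "version"]) for summary in summaries]
print(compact)
# [{'short-name': 'MOD14', 'version': '061'}, {'short-name': 'MYD14', 'version': '061'}]
```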


In [8]:
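# Cloud-hosted collections that intersect a bounding box, given as
# lower-left longitude, lower-left latitude, upper-right longitude, upper-right latitude
query = DataCollections().cloud_hosted(True).bounding_box(-25.31,63.23,-11.95,66.65)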

query.hits()

422
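
Swapping longitude and latitude is an easy mistake to make with bounding boxes, so it can help to validate the values before building a query. The sketch below assumes the (west, south, east, north) order in decimal degrees; `check_bbox` is a hypothetical helper and not part of the library.

```python
def check_bbox(west, south, east, north):
    """Validate a (west, south, east, north) box in decimal degrees."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError("latitudes must be within [-90, 90]")
    if south > north:
        raise ValueError("south must not exceed north")
    # west > east is allowed here, since a box may cross the antimeridian
    return (west, south, east, north)


bbox = check_bbox(-25.31, 63.23, -11.95, 66.65)
print(bbox)  # (-25.31, 63.23, -11.95, 66.65)
```

The validated tuple could then be unpacked into the query, e.g. `bounding_box(*bbox)`.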