In [2]:
%pip install earthaccess

Collecting earthaccess
  Downloading earthaccess-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Collecting importlib-resources>=6.3.2 (from earthaccess)
  Using cached importlib_resources-6.5.2-py3-none-any.whl.metadata (3.9 kB)
Collecting multimethod>=1.8 (from earthaccess)
  Using cached multimethod-2.0-py3-none-any.whl.metadata (9.2 kB)
Collecting pqdm>=0.1 (from earthaccess)
  Using cached pqdm-0.2.0-py2.py3-none-any.whl.metadata (3.2 kB)
Collecting python-cmr>=0.10.0 (from earthaccess)
  Using cached python_cmr-0.13.0-py3-none-any.whl.metadata (10 kB)
Collecting s3fs>=2022.11 (from earthaccess)
  Using cached s3fs-2025.3.2-py3-none-any.whl.metadata (1.9 kB)
Collecting tinynetrc>=1.3.1 (from earthaccess)
  Using cached tinynetrc-1.3.1-py2.py3-none-any.whl.metadata (2.9 kB)
Collecting bounded-pool-executor (from pqdm>=0.1->earthaccess)
  Using cached bounded_pool_executor-0.0.3-py3-none-any.whl.metadata (2.7 kB)
Collecting tqdm (from pqdm>=0.1->earthaccess)
  Using cached tqdm-4.67.1-py3

In [4]:
%pip install IProgress 

Collecting IProgress
  Downloading IProgress-0.4-py3-none-any.whl.metadata (2.1 kB)
Downloading IProgress-0.4-py3-none-any.whl (11 kB)
Installing collected packages: IProgress
Successfully installed IProgress-0.4
Note: you may need to restart the kernel to use updated packages.


# Sample queries for datasets

The `DataCollections` class queries for datasets (collections, in NASA terminology) using a variety of criteria.
The basic criteria are spatiotemporal, but we can also search by data center (DAAC), by dataset version, or for cloud-hosted data.

This notebook provides some examples of how to search for datasets using different parameters.

## Collection search parameters

**dataset origin and location**
* archive_center
* data_center
* daac
* provider
* cloud_hosted

**spatiotemporal parameters**
* bounding_box
* temporal
* point
* polygon
* line

**dataset metadata parameters**
* concept_id 
* entry_title
* keyword
* version
* short_name
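
The parameters used in this notebook are methods that return the query, so they can be chained in any combination. As a rough sketch, a query could also be driven from a plain dictionary. The helper `apply_params` below is hypothetical and not part of earthaccess; it only assumes that each parameter name is a chainable method on the query object.

```python
def apply_params(query, params):
    """Call query.<name>(*args) for every entry in params and follow the chain.

    `params` maps a parameter name (e.g. "daac") to a single value or to a
    tuple of positional arguments (e.g. a bounding box).
    """
    for name, args in params.items():
        method = getattr(query, name)      # raises AttributeError on unknown names
        if not isinstance(args, tuple):
            args = (args,)                 # wrap single values
        query = method(*args)              # each call is assumed to return the query
    return query
```

With earthaccess installed, this would be used as `apply_params(DataCollections(), {"daac": "NSIDC", "bounding_box": (-180, 70, 180, 85)})`, which is meant to be equivalent to writing the chained calls by hand.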

Once the query has been formed with one or more search parameters, we can get the results with either `hits()` or `get()`.

* **hits()**: returns the number of collections that match the query; it is 0 if nothing matched.
* **get()**: returns the metadata records for the matching collections. We can cap the number of records, e.g. `get(10)`; if we do not, the default limit is 2000.
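
A common pattern is to check the count first and only then ask for metadata. The sketch below shows one way to do that; `preview` is a hypothetical helper, not an earthaccess function, and it assumes only that the query object exposes `hits()` and `get(n)` as described above.

```python
def preview(query, limit=10):
    """Return (total_matches, first_records) for a query object.

    The query is assumed to expose hits() and get(n).
    """
    total = query.hits()                     # ask only for the count
    if total == 0:
        return 0, []                         # nothing matched, skip the metadata request
    records = query.get(min(limit, total))   # never ask for more than exist
    return total, records
```

With earthaccess installed, `preview(DataCollections().daac("NSIDC"), limit=5)` would return the hit count together with at most five collection records.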



## Example #1, querying for cloud enabled data from a given data center (DAAC)

In [5]:
from earthaccess import DataCollections

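# We only need to specify the DAAC and whether we want cloud-hosted data.
# Note: the flag is False here, so this query is not restricted to
# cloud-hosted collections; pass True to search for those.
query = DataCollections().daac("LPDAAC").cloud_hosted(False)
# hits() returns the number of collections that match our query
query.hits()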

153

In [6]:
# Now we get the collections' metadata
collections = query.get(10)
# To print only the first collection, uncomment the next line
# collections[0]

In [7]:
# Print a short summary of each of the first 10 collections
summaries = [collection.summary() for collection in collections]
summaries

[{'short-name': 'AST_L1T',
  'concept-id': 'C1000000320-LPDAAC_ECS',
  'version': '003',
  'file-type': "[{'Format': 'HDF-EOS2', 'FormatType': 'Native', 'FormatDescription': 'Hierarchical Data Format - Earth Observing System Version 2', 'Media': ['HTTPS'], 'AverageFileSize': 209.4, 'AverageFileSizeUnit': 'MB', 'TotalCollectionFileSizeBeginDate': '2000-03-04T00:00:00.000Z'}, {'Format': 'GeoTIFF', 'FormatType': 'Supported', 'FormatDescription': 'Georeferenced Tagged Image File Format', 'Media': ['HTTPS']}]",
  'get-data': ['https://e4ftl01.cr.usgs.gov/ASTT/AST_L1T.003/',
   'https://search.earthdata.nasa.gov/search?q=C1000000320-LPDAAC_ECS']},
 {'short-name': 'VNP43IA4',
  'concept-id': 'C1407099497-LPDAAC_ECS',
  'version': '001',
  'file-type': "[{'Format': 'HDF-EOS5', 'FormatType': 'Native', 'FormatDescription': 'Hierarchical Data Format - Earth Observing System Version 5', 'Media': ['https'], 'AverageFileSize': 16.7, 'AverageFileSizeUnit': 'MB', 'TotalCollectionFileSizeBeginDate': '2
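
Note that `file-type` in each summary is a string holding the text form of a list of dictionaries, not a list itself. If we want the format names, one option is to parse that string with `ast.literal_eval`. This is a minimal sketch: `file_formats` is a hypothetical helper, and the example record is an abbreviated copy of the `AST_L1T` summary shown above.

```python
import ast

def file_formats(summary):
    """Return the format names found in a summary's 'file-type' string."""
    raw = summary.get("file-type", "")
    if not raw:
        return []
    try:
        entries = ast.literal_eval(raw)    # safely evaluates Python literals only
    except (ValueError, SyntaxError):
        return []                          # the string was not a valid literal
    if not isinstance(entries, list):
        return []
    return [e["Format"] for e in entries if isinstance(e, dict) and "Format" in e]

# Abbreviated from the AST_L1T summary above (most keys omitted)
example = {
    "short-name": "AST_L1T",
    "file-type": "[{'Format': 'HDF-EOS2', 'FormatType': 'Native'}, "
                 "{'Format': 'GeoTIFF', 'FormatType': 'Supported'}]",
}
print(file_formats(example))  # ['HDF-EOS2', 'GeoTIFF']
```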


### Searching using keywords

> **Note**: Some DAACs don't have cloud-hosted collections yet; others have cloud collections but do not allow direct access.

In [10]:
# Now let's search using keyword and daac
# from earthaccess import DataCollections

query = DataCollections().keyword("*ice*").daac("NSIDC")
# we use hits to get a count for the collections that match our query
query.hits()

564

In [13]:
# Narrow the search with a more specific keyword phrase, still within the same DAAC
query = DataCollections().keyword("sea ice concentration").daac("NSIDC")
# we use hits to get a count for the collections that match our query
query.hits()

16

In [14]:
# Get the metadata for the first 10 collections only
collections = query.get(10)
# A full metadata record is long, so we print only the summaries in the next cell

In [15]:
# We can print a small summary of the dataset, here for the first 10 collections again
summaries = [collection.summary() for collection in collections]
summaries

[{'short-name': 'AU_SI12',
  'concept-id': 'C1542606326-NSIDC_ECS',
  'version': '1',
  'file-type': "[{'FormatType': 'Native', 'Format': 'HDF-EOS5', 'FormatDescription': 'HTTPS'}, {'FormatType': 'Native', 'Format': 'ASCII', 'FormatDescription': 'HTTPS'}]",
  'get-data': ['https://n5eil01u.ecs.nsidc.org/AMSA/AU_SI12.001',
   'https://search.earthdata.nasa.gov/search?q=AU_SI12+V001',
   'https://nsidc.org/data/data-access-tool/AU_SI12/versions/1/',
   'https://n5eil01u.ecs.nsidc.org/AMSA/AU_SI12.001',
   'https://search.earthdata.nasa.gov/search?q=AU_SI12+V001',
   'https://nsidc.org/data/data-access-tool/AU_SI12/versions/1/']},
 {'short-name': 'NSIDC-0051',
  'concept-id': 'C2399557265-NSIDC_ECS',
  'version': '2',
  'file-type': "[{'FormatType': 'Native', 'Format': 'PNG', 'FormatDescription': 'HTTPS'}, {'FormatType': 'Native', 'Format': 'netCDF-4', 'FormatDescription': 'HTTPS'}]",
  'get-data': ['https://n5eil01u.ecs.nsidc.org/PM/NSIDC-0051.002/',
   'https://search.earthdata.nasa.gov

In [16]:
query = DataCollections().cloud_hosted(True).bounding_box(-180, 70, 180, 85)

query.hits()

3725
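
The four numbers passed to `bounding_box` are the lower-left longitude, lower-left latitude, upper-right longitude and upper-right latitude, in decimal degrees. Mixing up that order is an easy mistake, so a small sanity check can help. `check_bbox` below is a hypothetical helper, not part of earthaccess, and a sketch rather than a complete validator.

```python
def check_bbox(west, south, east, north):
    """Validate a (west, south, east, north) box given in decimal degrees."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError("latitudes must be within [-90, 90]")
    if south > north:
        raise ValueError("south latitude must not exceed north latitude")
    # west > east is not rejected here, since such a box may be intended
    # to cross the antimeridian
    return (west, south, east, north)

# The Arctic box used in the query above
arctic = check_bbox(-180, 70, 180, 85)
```

The validated tuple can then be unpacked into the query, e.g. `DataCollections().bounding_box(*arctic)`.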

In [19]:
query = (
    DataCollections()
    # .cloud_hosted(True)  # uncomment to restrict the search to cloud-hosted data
    .short_name("RDEFT4")
)
for c in query.get(40):
    print(c.summary(), "\n")

{'short-name': 'RDEFT4', 'concept-id': 'C1431413941-NSIDC_ECS', 'version': '1', 'file-type': "[{'FormatType': 'Native', 'Format': 'PNG', 'FormatDescription': 'HTTPS'}, {'FormatType': 'Native', 'Format': 'NetCDF', 'FormatDescription': 'HTTPS'}]", 'get-data': ['https://nsidc.org/icebridge/portal/', 'https://n5eil01u.ecs.nsidc.org/ICEBRIDGE/RDEFT4.001/', 'https://search.earthdata.nasa.gov/search?q=RDEFT4+V001', 'https://nsidc.org/data/data-access-tool/RDEFT4/versions/1/', 'https://nsidc.org/icebridge/portal/', 'https://n5eil01u.ecs.nsidc.org/ICEBRIDGE/RDEFT4.001/', 'https://search.earthdata.nasa.gov/search?q=RDEFT4+V001', 'https://nsidc.org/data/data-access-tool/RDEFT4/versions/1/']} 
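
The `get-data` list in this summary repeats every link twice. If that gets in the way, we can drop the duplicates while keeping the original order. `unique_links` is a hypothetical helper for illustration; the URLs are copied from the output above.

```python
def unique_links(summary):
    """Return the 'get-data' URLs of a summary without duplicates, order kept."""
    # dict.fromkeys keeps the first occurrence of each key in insertion order
    return list(dict.fromkeys(summary.get("get-data", [])))

# Links copied from the RDEFT4 summary printed above
rdeft4 = {
    "short-name": "RDEFT4",
    "get-data": [
        "https://nsidc.org/icebridge/portal/",
        "https://n5eil01u.ecs.nsidc.org/ICEBRIDGE/RDEFT4.001/",
        "https://search.earthdata.nasa.gov/search?q=RDEFT4+V001",
        "https://nsidc.org/data/data-access-tool/RDEFT4/versions/1/",
        "https://nsidc.org/icebridge/portal/",
        "https://n5eil01u.ecs.nsidc.org/ICEBRIDGE/RDEFT4.001/",
        "https://search.earthdata.nasa.gov/search?q=RDEFT4+V001",
        "https://nsidc.org/data/data-access-tool/RDEFT4/versions/1/",
    ],
}
for url in unique_links(rdeft4):
    print(url)
```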



In [20]:
query.hits()

1