## <b>How do I programmatically search for new ICESat-2 granules?</b>

The National Snow and Ice Data Center (NSIDC) cannot regularly notify users of additions to ICESat-2 collections outside of major version releases. To help users keep track of new ICESat-2 granules, this notebook shows how to query for the most recent ICESat-2 data programmatically using the Common Metadata Repository (CMR) Search API ([Learn more](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)).

The examples are organized into three sections of query parameters: *1. Collection*, *2. Granule*, and *3. Reference Ground Track (RGT)*.

In [None]:
import requests
from statistics import mean  # used below to summarize granule sizes

In [None]:
# URLs
granule_search_url = 'https://cmr.earthdata.nasa.gov/search/granules'
collection_search_url = 'https://cmr.earthdata.nasa.gov/search/collections'

# When scrolling through results, clear the scroll session afterwards via this URL to free server resources
clear_scroll_url = 'https://cmr.earthdata.nasa.gov/search/clear-scroll'

### <b>1. Query Collections</b>

#### *What is the latest <b>version</b> for a collection?*

In [None]:
params = {
    'short_name': 'ATL06',
}
headers={'Accept': 'application/json'}

response = requests.get(collection_search_url, params=params, headers=headers)
results = response.json()

versions = [el['version_id'] for el in results['feed']['entry']]
latest_version = max(versions)  # string comparison; only reliable while version ids are zero-padded to the same width (e.g. '003', '004')
print(f'Latest version {latest_version}')
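The string comparison above only works while version ids stay zero-padded to the same width. A slightly more defensive sketch, assuming every 'version_id' is an integer-like string such as '003' (the helper name is hypothetical), compares them numerically while returning the original string:

```python
def latest_version_id(version_ids):
    """Return the version string with the largest numeric value.

    Assumes each id is an integer-like string (e.g. '003'); a non-numeric id
    raises ValueError instead of being silently compared as text.
    """
    if not version_ids:
        raise ValueError('no versions found')
    return max(version_ids, key=int)


# Numeric comparison picks '10'; plain max() on strings would pick '9'
print(latest_version_id(['004', '9', '10']))
```

In the cell above this would be used as `latest_version = latest_version_id(versions)`.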

#### *'has_granules_created_at'*

Check which ICESat-2 collections have new granules since a given time.
The 'has_granules_created_at' parameter limits a collection search to collections that have granules created within a given time range.

In [None]:
last_query_datetime = '2020-12-01T00:00:00Z'
params = {
    'short_name': 'ATL??',
    'version': '003',
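    'options[short_name][pattern]': 'true',  # treat 'ATL??' as a wildcard pattern
    'has_granules_created_at': f'{last_query_datetime},',  # empty upper bound: no end date (up to the present)
    'page_size': 100,  # CMR returns only 10 results per page by default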
}
headers={'Accept': 'application/json'}

response = requests.get(collection_search_url, params=params, headers=headers)
results = response.json()['feed']['entry']

collections = [el['short_name'] for el in results]
print(f'Have granules created since {last_query_datetime}: {collections}')
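To turn this into a recurring check, the time of each run can be saved and reused as `last_query_datetime` the next time. A minimal sketch, assuming a local JSON file (the file name and both helper names are hypothetical) and UTC timestamps in the same format used in the query above:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

STATE_FILE = Path('last_icesat2_query.json')  # hypothetical location for the saved timestamp


def load_last_query(path=STATE_FILE, default='2020-12-01T00:00:00Z'):
    """Return the timestamp saved by the previous run, or `default` on the first run."""
    try:
        return json.loads(Path(path).read_text())['last_query_datetime']
    except (FileNotFoundError, KeyError, json.JSONDecodeError):
        return default


def save_last_query(path=STATE_FILE, when=None):
    """Save `when` (default: now) as a UTC 'YYYY-MM-DDTHH:MM:SSZ' string and return it."""
    when = when or datetime.now(timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
    Path(path).write_text(json.dumps({'last_query_datetime': stamp}))
    return stamp
```

One would call `last_query_datetime = load_last_query()` before building `params` and `save_last_query(when=start_time)` once the query succeeds. Saving the time the query *started*, rather than when it finished, avoids missing granules created while it ran.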

#### *'has_granules_revised_at'*

Check which ICESat-2 collections have new or updated granules since a given time. The 'has_granules_revised_at' parameter works the same way as 'has_granules_created_at', but matches collections with granules that were either added or revised within the time range.

In [None]:
last_query_datetime = '2020-12-01T00:00:00Z'
params = {
    'short_name': 'ATL??',
    'version': '003',
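    'options[short_name][pattern]': 'true',  # treat 'ATL??' as a wildcard pattern
    'has_granules_revised_at': f'{last_query_datetime},',  # empty upper bound: no end date (up to the present)
    'page_size': 100,  # CMR returns only 10 results per page by default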
}
headers={'Accept': 'application/json'}

response = requests.get(collection_search_url, params=params, headers=headers)
results = response.json()['feed']['entry']

collections = [el['short_name'] for el in results]
print(f'Have granules created or revised since {last_query_datetime}: {collections}')
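Because 'has_granules_revised_at' matches both new and updated granules, comparing its results with those of 'has_granules_created_at' for the same start time shows which collections had only revisions. A small sketch, assuming both lists of short names are already in hand (the function name is hypothetical):

```python
def revised_only(created, revised):
    """Short names in the revised-at results but not in the created-at results.

    These collections had existing granules updated since the start time,
    but no new granules added.
    """
    return sorted(set(revised) - set(created))
```

Note that both cells above store their output in `collections`, so the first result would need to be kept under another name (e.g. `created_collections`) before calling `revised_only(created_collections, collections)`.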

### <b>2. Query Granules</b>

#### *'updated_since'*

Check for granules that were recently added or updated within a given bounding box. The 'updated_since' parameter returns granules added or updated since the given date.
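CMR expects 'bounding_box' as lower-left longitude, lower-left latitude, upper-right longitude, upper-right latitude (west, south, east, north) in decimal degrees. A small helper such as the hypothetical `make_bounding_box` below can build that string and catch swapped coordinates before a request is sent. It deliberately does not reject west > east, since that ordering is how a box crossing the antimeridian is usually written:

```python
def make_bounding_box(west, south, east, north):
    """Format a CMR 'bounding_box' value as 'W,S,E,N' after basic range checks."""
    for lon in (west, east):
        if not -180 <= lon <= 180:
            raise ValueError(f'longitude out of range: {lon}')
    for lat in (south, north):
        if not -90 <= lat <= 90:
            raise ValueError(f'latitude out of range: {lat}')
    if south > north:
        raise ValueError('south must not exceed north (are the coordinates swapped?)')
    return f'{west},{south},{east},{north}'


# Same Colorado box as in the query below
colorado = make_bounding_box(-109.060253, 36.992426, -102.041524, 41.003444)
```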

In [None]:
short_name = 'ATL03'
last_query_datetime = '2020-11-01T00:00:00Z'
page_size = 100
params = {
    'short_name': short_name,
    'version': '003',
    'bounding_box': '-109.060253,36.992426,-102.041524,41.003444',  # Colorado
    'updated_since': last_query_datetime,
    'scroll': 'true',  # starts a scroll session so every page of results can be retrieved
    'page_size': page_size
}
headers = {'Accept': 'application/json'}

granules = []
scroll_id = None
count = 0
while True:
    response = requests.get(granule_search_url, params=params, headers=headers)
    response.raise_for_status()
    if scroll_id is None:
        # The first page carries the scroll id and total hit count; CMR expects
        # the id back in the 'CMR-Scroll-Id' request header on later pages
        scroll_id = response.headers['CMR-Scroll-Id']
        headers['CMR-Scroll-Id'] = scroll_id
        hits = int(response.headers['CMR-Hits'])

    results = response.json()['feed']['entry']
    count += len(results)
    granules.extend(results)

    # Stop once all hits are collected (or on an empty page, to avoid looping forever)
    if not results or count >= hits:
        break
        break

# Clear scroll id to release resources on servers
requests.post(clear_scroll_url, json={'scroll_id': scroll_id}, headers={'Content-Type': 'application/json'})

print(f'Granules updated for {short_name} since {last_query_datetime}: {len(granules)}')
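The same scroll loop is repeated for every granule query in this notebook. One way to factor it out is a function like the hypothetical `search_all_granules` below. It is a sketch that assumes the scroll behaviour used here (a 'CMR-Scroll-Id' and 'CMR-Hits' header on the first response, with the id sent back as a request header afterwards). The HTTP functions are passed in so the paging logic can be tested without network access:

```python
GRANULE_SEARCH_URL = 'https://cmr.earthdata.nasa.gov/search/granules'
CLEAR_SCROLL_URL = 'https://cmr.earthdata.nasa.gov/search/clear-scroll'


def search_all_granules(params, get, post):
    """Return every granule entry matching `params`, paging with a CMR scroll session.

    `get` and `post` are expected to behave like requests.get and requests.post.
    """
    params = {**params, 'scroll': 'true'}
    headers = {'Accept': 'application/json'}
    granules = []
    scroll_id = None
    try:
        while True:
            response = get(GRANULE_SEARCH_URL, params=params, headers=headers)
            response.raise_for_status()
            if scroll_id is None:
                # First page: remember the session id and the total number of hits
                scroll_id = response.headers['CMR-Scroll-Id']
                headers['CMR-Scroll-Id'] = scroll_id
                hits = int(response.headers['CMR-Hits'])
            entries = response.json()['feed']['entry']
            granules.extend(entries)
            # Stop when everything has been collected or CMR returns an empty page
            if not entries or len(granules) >= hits:
                break
    finally:
        # Release the scroll session even if a request failed part-way through
        if scroll_id is not None:
            post(CLEAR_SCROLL_URL, json={'scroll_id': scroll_id},
                 headers={'Content-Type': 'application/json'})
    return granules
```

With `requests` imported as above, `granules = search_all_granules(params, requests.get, requests.post)` would replace both the loop and the clear-scroll call. The `try`/`finally` is the main design choice: the scroll session is cleared even if a request raises.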

#### Get information on size and number of updated granules

In [None]:
granule_sizes = [float(granule['granule_size']) for granule in granules]
print(f'The average size of each granule is {mean(granule_sizes):.1f} MB and the total size of all {len(granules)} granules is {sum(granule_sizes):.1f} MB')

In [None]:
granule_list = [granule['producer_granule_id'] for granule in granules]
if granule_list:
    print(f'The first and last granules returned are: {granule_list[0]} and {granule_list[-1]}')

#### *'revision_date'*

Check for granules added or updated within a given bounding box and time range. The 'revision_date' parameter accepts a 'start,end' time range, so it can be used to find granules added or revised within a specific window.
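The 'revision_date' value is a 'start,end' pair of UTC timestamps. A short sketch (the helper names are hypothetical) that builds it from two `datetime` objects, treating naive datetimes as UTC and rejecting reversed ranges:

```python
from datetime import datetime, timezone


def _to_utc(dt):
    """Treat naive datetimes as UTC; convert timezone-aware ones to UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def cmr_time_range(start, end):
    """Return 'start,end' as UTC timestamps in the YYYY-MM-DDTHH:MM:SSZ form used in this notebook."""
    start_utc, end_utc = _to_utc(start), _to_utc(end)
    if start_utc >= end_utc:
        raise ValueError('start must be earlier than end')
    fmt = '%Y-%m-%dT%H:%M:%SZ'
    return f'{start_utc.strftime(fmt)},{end_utc.strftime(fmt)}'


# Equivalent to the hard-coded time_range in the next cell
time_range = cmr_time_range(datetime(2020, 11, 1), datetime(2021, 1, 1))
```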

In [None]:
short_name = 'ATL03'
time_range = '2020-11-01T00:00:00Z,2021-01-01T00:00:00Z'
page_size = 100
params = {
    'short_name': short_name,
    'version': '003',
    'bounding_box': '-109.060253,36.992426,-102.041524,41.003444',  # Colorado
    'revision_date': time_range,
    'scroll': 'true',  # starts a scroll session so every page of results can be retrieved
    'page_size': page_size
}
headers = {'Accept': 'application/json'}

granules = []
scroll_id = None
count = 0
while True:
    response = requests.get(granule_search_url, params=params, headers=headers)
    response.raise_for_status()
    if scroll_id is None:
        # The first page carries the scroll id and total hit count; CMR expects
        # the id back in the 'CMR-Scroll-Id' request header on later pages
        scroll_id = response.headers['CMR-Scroll-Id']
        headers['CMR-Scroll-Id'] = scroll_id
        hits = int(response.headers['CMR-Hits'])

    results = response.json()['feed']['entry']
    count += len(results)
    granules.extend(results)

    # Stop once all hits are collected (or on an empty page, to avoid looping forever)
    if not results or count >= hits:
        break
        break

# Clear scroll id to release resources on servers
requests.post(clear_scroll_url, json={'scroll_id': scroll_id}, headers={'Content-Type': 'application/json'})

print(f'Updated granules for {short_name} during the time-range {time_range}: {len(granules)}')
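More recent CMR documentation recommends "search-after" paging in place of scrolling. As we understand it, each response carries a 'CMR-Search-After' header whose value is sent back unchanged on the next request, and there is no session to clear. The sketch below assumes that behaviour (check the CMR API documentation linked at the top before relying on it); the function name is hypothetical, and `get` is passed in so the logic can be tested offline:

```python
GRANULE_SEARCH_URL = 'https://cmr.earthdata.nasa.gov/search/granules'


def search_all_granules_after(params, get):
    """Collect all granule entries using CMR search-after paging (assumed behaviour).

    `get` is expected to behave like requests.get; `params` should not include 'scroll'.
    """
    headers = {'Accept': 'application/json'}
    granules = []
    while True:
        response = get(GRANULE_SEARCH_URL, params=params, headers=headers)
        response.raise_for_status()
        entries = response.json()['feed']['entry']
        granules.extend(entries)
        search_after = response.headers.get('CMR-Search-After')
        hits = int(response.headers.get('CMR-Hits', len(granules)))
        # Stop on an empty page, a missing continuation token, or once all hits are collected
        if not entries or search_after is None or len(granules) >= hits:
            break
        headers['CMR-Search-After'] = search_after
    return granules
```

Usage would look like `search_all_granules_after({k: v for k, v in params.items() if k != 'scroll'}, requests.get)`.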

### <b>3. Query Reference Ground Track (RGT)</b>

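*Note*: CMR has no query parameter for ground tracks, but ICESat-2 granules can still be selected by RGT using 'readable_granule_name' wildcard patterns. ICESat-2 file names follow the form 'ATL03_yyyymmddhhmmss_ttttccss_vvv_rr.h5', where 'tttt' is the RGT, 'cc' the cycle number and 'ss' the granule region (segment) number, so a pattern can match the RGT and, optionally, the cycle directly.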
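Because the RGT, cycle and granule region sit at fixed positions in the file name, they can also be read back out of each returned 'producer_granule_id'. A minimal sketch, assuming names follow the ICESat-2 convention 'ATLxx_yyyymmddhhmmss_ttttccss_vvv_rr.h5' (acquisition date and time, RGT, cycle, region, version, revision); the regular expression and function name are illustrative, not part of any NSIDC library:

```python
import re
from datetime import datetime

# Assumed ICESat-2 granule name layout: ATLxx_yyyymmddhhmmss_ttttccss_vvv_rr.h5
GRANULE_NAME = re.compile(
    r'(?P<product>ATL\d{2})_(?P<timestamp>\d{14})_'
    r'(?P<rgt>\d{4})(?P<cycle>\d{2})(?P<region>\d{2})_'
    r'(?P<version>\d{3})_(?P<revision>\d{2})'
)


def parse_granule_name(name):
    """Split an ICESat-2 granule name into its parts; raise ValueError if it doesn't match."""
    match = GRANULE_NAME.match(name)
    if match is None:
        raise ValueError(f'unrecognised granule name: {name}')
    parts = match.groupdict()
    return {
        'product': parts['product'],
        'acquired': datetime.strptime(parts['timestamp'], '%Y%m%d%H%M%S'),
        'rgt': int(parts['rgt']),
        'cycle': int(parts['cycle']),
        'region': int(parts['region']),
        'version': parts['version'],
        'revision': parts['revision'],
    }


# Made-up name for illustration only, not a real granule
info = parse_granule_name('ATL03_20201105123456_04720906_003_01.h5')
```

Some products, such as the sea-ice product ATL07, add a suffix after the product name (e.g. 'ATL07-01'), so this pattern would need adjusting for them.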

#### Query granules by RGT

In [None]:
RGT = 472
short_name = 'ATL03'
page_size = 100
params = {
    'short_name': short_name,
    'version': '003',
    'options[readable_granule_name][pattern]': 'true',
    'readable_granule_name': f'{short_name}_??????????????_{str(RGT).zfill(4)}????_*',
    'scroll': 'true',  # starts a scroll session so every page of results can be retrieved
    'page_size': page_size
}
headers={'Accept': 'application/json'}
granules = []
count = 0
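while True:
    response = requests.get(granule_search_url, params=params, headers=headers)
    response.raise_for_status()
    if 'CMR-Scroll-Id' not in headers:
        # First page: send the scroll id back in the request headers on later pages
        headers['CMR-Scroll-Id'] = response.headers['CMR-Scroll-Id']
        hits = int(response.headers['CMR-Hits'])
    results = response.json()['feed']['entry']
    count += len(results)
    granules.extend(results)
    # Stop once all hits are collected (or on an empty page, to avoid looping forever)
    if not results or count >= hits:
        break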
# Clear scroll id to release resources on servers
requests.post(clear_scroll_url, json={'scroll_id': headers['CMR-Scroll-Id']}, headers={'Content-Type': 'application/json'})
print(f'Granules for RGT {RGT}: {len(granules)}')

#### Query granules by RGT and Cycle

In [None]:
RGT = 472
short_name = 'ATL03'
cycle = 9
page_size = 100
params = {
    'short_name': short_name,
    'version': '003',
    'options[readable_granule_name][pattern]': 'true',
    'readable_granule_name': f'{short_name}_??????????????_{str(RGT).zfill(4)}{str(cycle).zfill(2)}??_*',
    'scroll': 'true',  # starts a scroll session so every page of results can be retrieved
    'page_size': page_size
}
headers={'Accept': 'application/json'}
granules = []
count = 0
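while True:
    response = requests.get(granule_search_url, params=params, headers=headers)
    response.raise_for_status()
    if 'CMR-Scroll-Id' not in headers:
        # First page: send the scroll id back in the request headers on later pages
        headers['CMR-Scroll-Id'] = response.headers['CMR-Scroll-Id']
        hits = int(response.headers['CMR-Hits'])
    results = response.json()['feed']['entry']
    count += len(results)
    granules.extend(results)
    # Stop once all hits are collected (or on an empty page, to avoid looping forever)
    if not results or count >= hits:
        break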
# Clear scroll id to release resources on servers
requests.post(clear_scroll_url, json={'scroll_id': headers['CMR-Scroll-Id']}, headers={'Content-Type': 'application/json'})
print(f'Granules for RGT {RGT} in Cycle {cycle}: {len(granules)}')
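Both RGT queries build the 'readable_granule_name' pattern inline. A hypothetical helper covering both cases (RGT only, or RGT plus cycle) could look like this; in CMR wildcard patterns '?' matches exactly one character and '*' matches any run of characters:

```python
def rgt_name_pattern(short_name, rgt, cycle=None):
    """Build a 'readable_granule_name' wildcard pattern for an RGT and optional cycle.

    Fourteen '?' cover the acquisition date/time; the cycle digits are
    wildcarded when no cycle is given, and the region digits always are.
    """
    if not 1 <= rgt <= 9999:
        raise ValueError('rgt must fit in 4 digits')
    cycle_part = '??' if cycle is None else f'{cycle:02d}'
    return f'{short_name}_{"?" * 14}_{rgt:04d}{cycle_part}??_*'


# Same patterns as the two cells above
pattern_rgt = rgt_name_pattern('ATL03', 472)
pattern_rgt_cycle = rgt_name_pattern('ATL03', 472, cycle=9)
```

In either cell, `'readable_granule_name': rgt_name_pattern(short_name, RGT, cycle)` would then replace the inline f-string.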

In [None]:
granule_list = [granule['producer_granule_id'] for granule in granules]
if granule_list:
    print(f'The first and last granules returned are: {granule_list[0]} and {granule_list[-1]}')

In [None]:
print(*granule_list, sep="\n")