# NASA Common Metadata Repository

[NASA's Common Metadata Repository (CMR)](https://www.earthdata.nasa.gov/eosdis/science-system-description/eosdis-components/cmr) is a high-performance, high-quality, continuously evolving metadata system that catalogs all data and service metadata records for NASA's Earth Observing System Data and Information System (EOSDIS) and will be the authoritative management system for all EOSDIS metadata. These metadata records are registered, modified, discovered, and accessed through programmatic interfaces that use standard protocols and APIs.

You can search the CMR system online here: https://cmr.earthdata.nasa.gov/
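
As a rough illustration of the programmatic interface mentioned above, the sketch below builds a collection-search URL for the public CMR search API. The helper name `build_collection_query_url` and the choice of query parameters are assumptions made for this example. It is not how `rsgislib` queries the CMR internally, and it only constructs the URL rather than sending a request.

```python
from urllib.parse import urlencode

# Base URL of the public CMR search API.
CMR_SEARCH_URL = "https://cmr.earthdata.nasa.gov/search"


def build_collection_query_url(short_name, page_size=10):
    """Build a CMR collection-search URL for a product short name.

    Hypothetical helper for illustration only; it does not send a request.
    """
    params = {"short_name": short_name, "page_size": page_size}
    return f"{CMR_SEARCH_URL}/collections.json?{urlencode(params)}"


# The URL can be pasted into a browser or passed to an HTTP client.
print(build_collection_query_url("MOD13Q1"))
```

The functions used in the rest of this notebook wrap this kind of query, so you would not normally need to build the URLs yourself.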

In [1]:
import rsgislib.dataaccess.nasa_cmr
import os
import datetime
import pprint

## Get information on a Product

You can query the CMR for metadata on a product using the function below. You can look up the available products and their names here:
https://earthdata.nasa.gov/eosdis/science-system-description/eosdis-standard-products


In [2]:
# MODIS/Terra 16-day 250 m vegetation indices (NDVI) product
prod_name = "MOD13Q1"

In [3]:
info_dict = rsgislib.dataaccess.nasa_cmr.get_prods_info(prod_short_name=prod_name)


In [4]:
pprint.pprint(info_dict)

[{'archive_center': 'LP DAAC',
  'boxes': ['-90 -180 90 180'],
  'browse_flag': True,
  'cloud_hosted': True,
  'collection_data_type': 'SCIENCE_QUALITY',
  'consortiums': ['GEOSS', 'EOSDIS'],
  'coordinate_system': 'CARTESIAN',
  'data_center': 'LPCLOUD',
  'dataset_id': 'MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid '
                'V061',
  'has_combine': False,
  'has_formats': False,
  'has_spatial_subsetting': False,
  'has_temporal_subsetting': False,
  'has_transforms': False,
  'has_variables': False,
  'id': 'C1748066515-LPCLOUD',
  'links': [{'href': 'https://doi.org/10.5067/MODIS/MOD13Q1.061',
             'hreflang': 'en-US',
             'rel': 'http://esipfed.org/ns/fedsearch/1.1/metadata#'},
            {'href': 'https://lpdaac.usgs.gov/',
             'hreflang': 'en-US',
             'rel': 'http://esipfed.org/ns/fedsearch/1.1/metadata#'},
            {'href': 'https://e4ftl01.cr.usgs.gov/MOLT/MOD13Q1.061/',
             'hreflang': 'en-US',
        

## Find Most Recent Version

You can find the most recent version of the product you are interested in using the function below:

In [5]:
max_version = rsgislib.dataaccess.nasa_cmr.get_max_prod_version(prod_short_name=prod_name)
max_version

'061'
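
To make the idea of "most recent version" concrete, here is a minimal sketch that picks the largest version from a list of version strings. It assumes the versions are integer-like strings such as `'006'` and `'061'`. The helper `pick_max_version` and the example list are hypothetical, and this is not necessarily how `get_max_prod_version` is implemented.

```python
def pick_max_version(versions):
    """Return the version string with the largest numeric value.

    Illustrative helper: assumes every version is an integer-like string,
    which may not hold for every product in the CMR.
    """
    return max(versions, key=int)


# Example version strings chosen for illustration.
print(pick_max_version(["006", "061"]))
```

Comparing the values as integers rather than as strings avoids surprises when versions have different numbers of digits.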

## Search for Data

Use the following function to search for the data you are interested in. You can restrict the search to a bounding box and also limit it to a date range.

In [6]:
# Bounding box in RSGISLib order: [min X, max X, min Y, max Y] (WGS84 degrees)
bbox = [39.1, 39.5, -8, -7.6]
prod_data_granules = rsgislib.dataaccess.nasa_cmr.find_granules(
    prod_short_name=prod_name,
    version=max_version,
    only_dnwld=True,
    bbox=bbox,
    pt=None,
    start_date=datetime.datetime(year=2020, month=4, day=1),
    end_date=datetime.datetime(year=2020, month=4, day=30),
    cloud_min=0,
    cloud_max=None,
    sort_date=True,
    sort_desc=True,
    page_size=100,
    page_num=1,
    other_params=None,
)


In [7]:
# Print out the number of data granules found
len(prod_data_granules)

3
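
The `bbox` used in the search above appears to follow the order min X, max X, min Y, max Y, whereas the CMR search API expects its `bounding_box` parameter as west, south, east, north. The sketch below shows one way to convert between the two and catch a box given in the wrong order. The helper `rsgis_bbox_to_cmr` is hypothetical, and whether `find_granules` performs this conversion internally is an assumption.

```python
def rsgis_bbox_to_cmr(bbox):
    """Convert [min_x, max_x, min_y, max_y] to a 'W,S,E,N' string.

    Illustrative helper; raises ValueError if the minimum values are not
    smaller than the maximum values.
    """
    min_x, max_x, min_y, max_y = bbox
    if min_x >= max_x or min_y >= max_y:
        raise ValueError("bbox must be [min_x, max_x, min_y, max_y]")
    return f"{min_x},{min_y},{max_x},{max_y}"


print(rsgis_bbox_to_cmr([39.1, 39.5, -8, -7.6]))
```

A check like this is useful because a box with latitude and longitude swapped can still return results, just for the wrong place.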

### Print a single granule to check the data type to download

Look up the file that you will want to download and note its data type, as you'll need it later. For example, for an ICESat-2 granule such as ATL08_20200517210904_07990708_006_01.h5 the data type would be application/x-hdf5.

In [8]:
pprint.pprint(prod_data_granules[0])

{'browse_flag': True,
 'cloud_cover': '28.0',
 'collection_concept_id': 'C1748066515-LPCLOUD',
 'coordinate_system': 'GEODETIC',
 'data_center': 'LPCLOUD',
 'dataset_id': 'MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid '
               'V061',
 'day_night_flag': 'DAY',
 'granule_size': '222.992',
 'id': 'G2192936668-LPCLOUD',
 'links': [{'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/MOD13Q1.061/MOD13Q1.A2020113.h21v09.061.2020333054308/MOD13Q1.A2020113.h21v09.061.2020333054308.hdf',
            'hreflang': 'en-US',
            'rel': 'http://esipfed.org/ns/fedsearch/1.1/data#',
            'title': 'Download MOD13Q1.A2020113.h21v09.061.2020333054308.hdf'},
           {'href': 's3://lp-prod-protected/MOD13Q1.061/MOD13Q1.A2020113.h21v09.061.2020333054308/MOD13Q1.A2020113.h21v09.061.2020333054308.hdf',
            'hreflang': 'en-US',
            'rel': 'http://esipfed.org/ns/fedsearch/1.1/s3#',
            'title': 'This link provides direct downlo
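
Reading the full dictionary can be tedious, so here is a small sketch that lists the file name, link kind and data type of each link in a granule. The helper `summarise_links` is hypothetical. It assumes the data type is stored under a `type` key, which is not visible in the truncated output above, so a missing key is reported as `None`.

```python
def summarise_links(granule):
    """Return (file name, link kind, data type) for each link of a granule."""
    summary = []
    for link in granule.get("links", []):
        # File name is the text after the last '/' in the URL.
        file_name = link["href"].rsplit("/", 1)[-1]
        # Link kind is the last part of 'rel', e.g. 'data#' or 's3#'.
        kind = link.get("rel", "").rsplit("/", 1)[-1]
        # 'type' is assumed to hold the MIME type; None if it is absent.
        summary.append((file_name, kind, link.get("type")))
    return summary


# Two links copied from the output above (titles omitted).
example_granule = {
    "links": [
        {
            "href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/MOD13Q1.061/MOD13Q1.A2020113.h21v09.061.2020333054308/MOD13Q1.A2020113.h21v09.061.2020333054308.hdf",
            "rel": "http://esipfed.org/ns/fedsearch/1.1/data#",
        },
        {
            "href": "s3://lp-prod-protected/MOD13Q1.061/MOD13Q1.A2020113.h21v09.061.2020333054308/MOD13Q1.A2020113.h21v09.061.2020333054308.hdf",
            "rel": "http://esipfed.org/ns/fedsearch/1.1/s3#",
        },
    ]
}

for row in summarise_links(example_granule):
    print(row)
```

In the notebook you could call `summarise_links(prod_data_granules[0])` to see which data types are actually present before choosing one.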

## How much data is to be downloaded?

The following function sums the file sizes of the identified data to download:

In [9]:
total_file_size = rsgislib.dataaccess.nasa_cmr.get_total_file_size(granule_lst=prod_data_granules)
total_file_size

671.273
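
For intuition, the total can be approximated by adding up the `granule_size` field of each granule dictionary, which the CMR stores as a string. This is a minimal sketch under that assumption, not a copy of `get_total_file_size`. The helper `sum_granule_sizes` is hypothetical, the units are assumed to be megabytes, and the second size in the example is made up for illustration.

```python
def sum_granule_sizes(granules):
    """Sum the 'granule_size' field of each granule, skipping missing values."""
    return sum(
        float(granule["granule_size"])
        for granule in granules
        if "granule_size" in granule
    )


# First size is taken from the granule printed earlier; the second is invented.
example_granules = [
    {"granule_size": "222.992"},
    {"granule_size": "100.5"},
    {},  # a granule without a reported size is ignored
]
print(round(sum_granule_sizes(example_granules), 3))
```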

## Get the Download URLs

The next step is to build a [pysondb](https://github.com/pysonDB/pysonDB) database of the data files to be downloaded.

In [10]:
# The file name of the pysondb JSON database
db_file_name = f"{prod_name}_scns_db.json"

In [11]:
urls_not_found = rsgislib.dataaccess.nasa_cmr.create_cmr_dwnld_db(
    db_json=db_file_name,
    granule_lst=prod_data_granules,
    dwnld_file_mime_type="application/x-hdfeos",
)

100%|██████████| 3/3 [00:00<00:00, 16215.09it/s]


In [12]:
# Check whether there were data for which the URL could not be found.
len(urls_not_found)

3

In [13]:
urls_not_found

['MOD13Q1.A2020113.h21v09.061.2020333054308',
 'MOD13Q1.A2020097.h21v09.061.2020332103549',
 'MOD13Q1.A2020081.h21v09.061.2020335013855']
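
The granule names listed above follow the MODIS naming convention, in which the second field (`A` followed by year and day of year) gives the acquisition date. The sketch below parses that field so you can check which dates the search returned. The helper `modis_granule_date` is hypothetical and assumes the name is dot-separated as shown.

```python
import datetime


def modis_granule_date(granule_name):
    """Parse the 'AYYYYDDD' field of a MODIS granule name into a date."""
    date_field = granule_name.split(".")[1]  # e.g. 'A2020113'
    if not date_field.startswith("A"):
        raise ValueError(f"Unexpected date field: {date_field}")
    # %j is the day of the year (001-366).
    return datetime.datetime.strptime(date_field[1:], "%Y%j").date()


for name in [
    "MOD13Q1.A2020113.h21v09.061.2020333054308",
    "MOD13Q1.A2020097.h21v09.061.2020332103549",
    "MOD13Q1.A2020081.h21v09.061.2020335013855",
]:
    print(name, modis_granule_date(name))
```

Note that the third granule starts in March 2020. MOD13Q1 is a 16-day composite, so a composite that begins before the search start date can still overlap the requested period.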

## Download the Data

### Create Username and Password File

You need to be careful with your username and password, so you should not write them into the notebook. RSGISLib provides a tool and functions that apply a basic encoding to the username and password so that they are not stored as plain text. Note that this simple encoding is not secure, so you should still treat the file with care.

To create the encoded file, you can use the command line tool `rsgisuserpassfile.py` as shown below:

`rsgisuserpassfile.py userinfo.txt`

Once you have created the `userinfo.txt` file, you can read it into your notebook/script using the `get_username_password` function, or pass its path to the download function through the `user_pass_file` argument, as in the Run Download step.
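
To illustrate why a simple encoding is not the same as encryption, the sketch below uses base64 as a stand-in. Base64 is chosen only as an example of a reversible encoding; the scheme actually used by `rsgisuserpassfile.py` is not described here, and the helper names are hypothetical.

```python
import base64


def encode_text(text):
    """Encode text with base64 (obscures the text, but is trivially reversible)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_text(encoded):
    """Reverse encode_text; no key or secret is needed."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_text("my_password")
print(encoded != "my_password")  # the stored text no longer reads as the password
print(decode_text(encoded) == "my_password")  # but anyone can decode it
```

Anyone who can read such a file can recover its contents, which is why the file should not be shared or committed to version control.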

### Create Output Directory

In [14]:
out_dwnld_dir = "MOD13Q1_data"
# Create the output directory if it does not already exist
os.makedirs(out_dwnld_dir, exist_ok=True)

### Run Download

In [15]:
rsgislib.dataaccess.nasa_cmr.download_granules_use_dwnld_db(
    db_json=db_file_name,
    out_path=out_dwnld_dir,
    user_pass_file="userinfo.txt",
    use_wget=False,
)