# Extract catalogs using the API

The following code demonstrates how to query JSON catalogs using the API. Feel free to use it as a skeleton for your own code.

We begin by defining some constants and a helper function. USERNAME is your web portal username. API_KEY is a REST API key, which you can generate on the website under Tools -> REST API Key Manager. HOST is the server that API requests are sent to.

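The run_query_progress_checks function is used later to poll a query's progress every 3 seconds until it finishes. It relies on the api object created in the next cell, so it must only be called after that cell has run.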


In [1]:
from bdc_api import BdcApi
import time

# TODO: Replace this with your username.
USERNAME = 'example_username'
# TODO: Replace this with your API key.
API_KEY = '83212a59-291c-4fad-a6ea-590edbea9fdd'
HOST = 'https://minos.lbl.gov'

def run_query_progress_checks(query_id):
    """Poll a query every 3 seconds until it completes.

    Relies on the module-level `api` object, which must be created
    before this function is called.

    Parameters:

        :query_id: Valid query ID.

    Returns:

        - Named tuple object QueryInfo where the `progress` field
          contains the percentage progress of the query and the `status`
          field contains the job status.

    Raises:

        - None.
    """
    print('Checking query progress every 3 seconds for query {0}'.format(query_id))
    # Placeholder state so that the loop performs at least one check.
    query_info = BdcApi.QueryInfo(progress='0%', status='in progress')
    # Stop once progress is complete or the job reaches a terminal status.
    while (query_info.progress != BdcApi.COMPLETE_QUERY and
            query_info.status not in BdcApi.COMPLETE_QUERY_STATUS):
        time.sleep(3)
        query_info = api.check_query_progress(query_id)
        print('Current query progress: {0}'.format(query_info.progress))
        print('Current job status: {0}'.format(query_info.status))
    return query_info
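
The helper above keeps polling until the query reports completion, so a job that never finishes would block the notebook indefinitely. One possible variant adds a timeout. This is only a sketch: poll_with_timeout, its parameters, and the local QueryInfo tuple are hypothetical names, and the default values '100%' and ('success', 'failed') are assumptions standing in for BdcApi.COMPLETE_QUERY and BdcApi.COMPLETE_QUERY_STATUS.

```python
import time
from collections import namedtuple

# Hypothetical stand-in for BdcApi.QueryInfo with the two fields used above.
QueryInfo = namedtuple('QueryInfo', ['progress', 'status'])


def poll_with_timeout(check, query_id, interval=3.0, timeout=600.0,
                      done_progress='100%',
                      done_statuses=('success', 'failed')):
    """Call check(query_id) until it reports completion or the timeout elapses.

    The done_progress and done_statuses defaults are assumed values; with
    the real client, pass the BdcApi constants instead.
    Raises TimeoutError if the query is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        info = check(query_id)
        if info.progress == done_progress or info.status in done_statuses:
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError(
                'Query {0} did not finish within {1} seconds'.format(
                    query_id, timeout))
        time.sleep(interval)


# Demo with a fake checker that reports completion on the third call.
_replies = iter([
    QueryInfo('0%', 'processing'),
    QueryInfo('3%', 'processing'),
    QueryInfo('100%', 'success'),
])
final = poll_with_timeout(lambda query_id: next(_replies), 'demo-id',
                          interval=0.0, timeout=5.0)
print(final.status)
```

With the real client, check would presumably be api.check_query_progress, and the two completion arguments would be taken from the BdcApi constants rather than the guessed defaults.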

We now create a BdcApi instance, which is used for all subsequent API calls. It needs only a username, a REST API key, and a hostname.

In [2]:
# 0. Set up the BdcApi object:
api = BdcApi(username=USERNAME, api_key=API_KEY, hostname=HOST)
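
Hard-coding an API key in a notebook makes it easy to share the key by accident. A minimal sketch of one alternative is to read the credentials from environment variables. The function load_credentials and the variable names BDC_USERNAME and BDC_API_KEY are hypothetical; nothing in the API requires them.

```python
import os


def load_credentials(env=None):
    """Return (username, api_key) read from environment variables.

    BDC_USERNAME and BDC_API_KEY are hypothetical variable names chosen
    for this sketch.
    """
    env = os.environ if env is None else env
    required = ('BDC_USERNAME', 'BDC_API_KEY')
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return env['BDC_USERNAME'], env['BDC_API_KEY']


# Demo with an explicit mapping instead of the real environment.
username, api_key = load_credentials(
    {'BDC_USERNAME': 'example_username', 'BDC_API_KEY': 'example-key'})
print(username)
```

The returned values could then be passed to BdcApi in place of the USERNAME and API_KEY constants.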

The following call returns the catalogs (JSON files) for all accessible data collections. The response groups the catalogs by data collection, so we flatten it into catalog_list, a single list of JSON files that we can then request for download:

In [3]:
# 1. Get list of accessible catalogs:
print('Requesting all accessible catalogs')
response = api.get_files(extensions='json')
catalog_list = []
for coll in response:
    catalog_list += response[coll]
print('Found {0} accessible catalogs'.format(len(catalog_list)))

Requesting all accessible catalogs
Found 577 accessible catalogs
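
To make the flattening step concrete, here is a small sketch on made-up data. It assumes the response is a mapping from collection name to a list of catalog paths, which is what the loop above implies; the collection names and paths are invented for illustration.

```python
from itertools import chain

# Hypothetical response shape: collection name -> list of catalog paths.
sample_response = {
    'collection_a': ['a/catalog_1.json', 'a/catalog_2.json'],
    'collection_b': ['b/catalog_1.json'],
    'collection_c': [],
}

# Same result as the += loop, written as a single expression.
catalog_list = list(chain.from_iterable(sample_response.values()))

# Per-collection counts, handy for spotting empty collections.
counts = {coll: len(files) for coll, files in sample_response.items()}

print('Found {0} accessible catalogs'.format(len(catalog_list)))
print(counts)
```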


We now submit a query for those files with api.start_files_query, wait for it to finish, and save the results with api.save_file. The generated zip file is then available in your JupyterHub home directory.

In [4]:
# 2. Download the returned list:
file_query_id = api.start_files_query(files=catalog_list)
print(
    """Submitted files query request with query ID {0}
        to download {1} catalogs""".format(file_query_id, len(catalog_list)))
query_info = run_query_progress_checks(file_query_id)
if query_info.progress == BdcApi.COMPLETE_QUERY:
    print('Query completed, moving results to user JupyterHub home')
    result = api.save_file(file_query_id, jupyterhub=True)
else:
    # The query stopped before reaching full progress; report why.
    print('Query did not complete; final status: {0}'.format(query_info.status))

Submitted files query request with query ID 5f722e743d28ed7a0e5fbb75
        to download 577 catalogs
Checking query progress every 3 seconds for query 5f722e743d28ed7a0e5fbb75
Current query progress: 0%
Current job status: processing
Current query progress: 0%
Current job status: processing
Current query progress: 3%
Current job status: processing
Current query progress: 100%
Current job status: success
Query completed, moving results to user JupyterHub home
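
Once the zip file is in your home directory, the catalogs can be read without unpacking it to disk. The following is a sketch only: load_catalogs is a hypothetical helper, and the name and internal layout of the real archive are not specified here, so the demo builds a small in-memory archive instead.

```python
import io
import json
import zipfile


def load_catalogs(zip_source):
    """Parse every .json member of a zip archive, keyed by member name.

    zip_source may be a file path or a binary file-like object.
    """
    catalogs = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith('.json'):
                with archive.open(name) as member:
                    catalogs[name] = json.load(member)
    return catalogs


# Demo: build a tiny in-memory archive rather than using a real download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('a/catalog_1.json', json.dumps({'id': 1}))
    archive.writestr('readme.txt', 'not a catalog')
buffer.seek(0)

catalogs = load_catalogs(buffer)
print(sorted(catalogs))
```

For a real download, pass the path of the saved zip file to load_catalogs instead of the in-memory buffer.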
