# Cloud Catalog Demo

The cloudcatalog standard enables any Python script with access to AWS to reach the petabytes of data stored in the GSFC, APL, and other HelioCloud data repositories. Data from the NASA public and TOPS (Transform to Open Science) AWS data stores and data that other HelioClouds have made public are equally accessible through this interface. Currently, cloudcatalog lets you access datasets you already know about or walk through the public listing of available datasets. Search and improved findability are expected future features.

## Setup

If you are not in HelioCloud and wish to run this demo, you may need to run `%pip install cloudcatalog` in a code cell to install the catalog-access package that finds HelioCloud analysis cache datasets.
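Before installing, it can help to check whether the package is already importable in the current kernel. The sketch below uses only the Python standard library; the helper name `is_installed` is hypothetical and is not part of cloudcatalog.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be imported in this environment."""
    # find_spec returns None when no top-level module of that name is found
    return importlib.util.find_spec(package_name) is not None


if not is_installed("cloudcatalog"):
    print("cloudcatalog not found; run `%pip install cloudcatalog` in a code cell")
```

The check only looks the package up and does not import it, so it is cheap to run at the top of a notebook.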

In [1]:
import cloudcatalog
import cdflib
import matplotlib.pyplot as plt
from pprint import pprint

## Params

## Searching for data with EntireCatalogSearch

In [None]:
search = cloudcatalog.EntireCatalogSearch()

In [None]:
search.search_by_id('mms1_feeps')

In [None]:
search.search_by_id('srvy_ion')

In [None]:
search.search_by_title('mms1/fpi/b')

In [None]:
search.search_by_title('des-dist')[:2]

In [None]:
search.search_by_keywords(['mms2', 'brst', 'apples'])[:3]

## Working with the global catalog

In [None]:
cr = cloudcatalog.CatalogRegistry()

In [None]:
cr.get_catalog()

In [None]:
cr.get_registry()

In [None]:
cr.get_entries()

In [None]:
endpoint = cr.get_endpoint('GSFC HelioCloud Public Temp')
endpoint

In [None]:
cr.catalog

## Working with a local catalog

In [None]:
fr = cloudcatalog.CloudCatalog(endpoint, cache=True)

In [None]:
fr.get_catalog()

In [None]:
fr_id1 = 'mms1_feeps_brst_electron'
fr_id2 = 'mms2_feeps_brst_electron'
start_date = '2020-02-01T00:00:00Z'
stop_date = '2020-02-02T00:00:00Z'
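Since the start and stop times are passed as plain strings, a quick sanity check before requesting files can catch a typo or a reversed range. This is a minimal sketch using only the standard library; the format string is an assumption inferred from the values above, and `parse_utc` is a hypothetical helper, not a cloudcatalog function.

```python
from datetime import datetime, timezone

# Assumed format, matching strings such as '2020-02-01T00:00:00Z'
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_utc(timestamp):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string into a timezone-aware UTC datetime."""
    return datetime.strptime(timestamp, ISO_FORMAT).replace(tzinfo=timezone.utc)


start = parse_utc("2020-02-01T00:00:00Z")
stop = parse_utc("2020-02-02T00:00:00Z")

# Reject an empty or reversed time range before making any request
if start >= stop:
    raise ValueError("start_date must be earlier than stop_date")

print("Requested span:", stop - start)
```

A malformed string makes `strptime` raise `ValueError`, so both kinds of mistake surface before any data request is made.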

In [None]:
pprint(fr.get_entry(fr_id1))

In [None]:
cloud_catalog1 = fr.request_cloud_catalog(fr_id1, start_date=start_date, stop_date=stop_date, overwrite=False)

In [None]:
cloud_catalog1

In [None]:
print('Python Hash of File | Start Date | File Size')
fr.stream(cloud_catalog1, lambda bo, d, f: print(hash(bo.read()), d.replace(' ', 'T')+'Z', f))
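Python's built-in `hash()` is salted per interpreter process for `bytes` and `str`, so the values printed above will generally differ between runs. If a reproducible checksum is wanted, one option is `hashlib`. The sketch below is not part of cloudcatalog; it assumes only that the object handed to the callback has a `.read()` method returning bytes (as `bo.read()` above suggests), and it uses an in-memory `io.BytesIO` as a stand-in. The name `sha256_of` is hypothetical.

```python
import hashlib
import io


def sha256_of(file_obj, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary file-like object, read in chunks."""
    digest = hashlib.sha256()
    # Read until read() returns an empty bytes object (end of stream)
    for chunk in iter(lambda: file_obj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for a streamed file object; real data would come from the catalog
sample = io.BytesIO(b"example bytes standing in for a data file")
print(sha256_of(sample))
```

Under that assumption, the callback passed to `fr.stream` could call `sha256_of(bo)` in place of `hash(bo.read())`; reading in chunks also avoids holding a whole file in memory.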

In [None]:
def plot_cdf(s3_uri, d, f):
    # Open the CDF file once and reuse the handle
    cdf = cdflib.CDF(s3_uri)

    # List the zVariables (newer cdflib releases may expose this as
    # cdf.cdf_info().zVariables instead of a dict key)
    z_vars = cdf.cdf_info()["zVariables"]
    print(len(z_vars), d.replace(' ', 'T')+'Z', f)

    # Get the name of the third zVariable and its data
    var_name = z_vars[2]
    var_data = cdf.varget(var_name)

    # Plot the variable against its sample index
    plt.figure()
    plt.plot(var_data)
    plt.xlabel("Index")
    plt.ylabel(var_name)
    plt.title(f"Plot of {var_name}")
    plt.show()

print('# of zVariables | Start Date | File Size')
# Plot only the first two files in the catalog
fr.stream_uri(cloud_catalog1[:2], plot_cdf)
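The plotting function above always takes the zVariable at index 2, which raises `IndexError` for a file with fewer than three zVariables. A minimal sketch of a guard follows; `choose_variable` and the example variable names are hypothetical and are not taken from cdflib or from any particular dataset.

```python
def choose_variable(names, preferred_index=2):
    """Pick names[preferred_index] if it exists, otherwise fall back to the last name."""
    if not names:
        raise ValueError("file has no zVariables to plot")
    if preferred_index < len(names):
        return names[preferred_index]
    # Fewer variables than expected: use the last one available
    return names[-1]


# Hypothetical variable names, for illustration only
example_names = ["epoch", "energy"]
print(choose_variable(example_names))
```

Falling back to the last variable is an arbitrary choice; raising an error or skipping the file may suit some workflows better.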