# EUVML dataset is STEREO EUVI + SOHO EIT
Alex Antunes, June 2024

This notebook is an example of accessing a user-contributed S3 dataset: just over 2 million 512x512 FITS files of about 1 MB each, with STEREO header definitions used for both the STEREO EUVI and SOHO EIT images.

Coverage is 1996-2024 for SOHO, 2007-2024 for STEREO-A, and 2007-2014 for STEREO-B.  Wavelengths are 171 Å, 195 Å, 284 Å, and 304 Å.

The data is accessible via AWS S3 storage, which allows free access to HDRL data holdings.  The recommended primary interface is Python, using the CloudCatalog protocol to query file listings.  VSO also intends to mirror the data through its traditional access methods.

Datasets are: euvml, euvml_soho, euvml_stereoa, euvml_stereob, euvml_171, euvml_195, euvml_284, euvml_304, and euvml_\<spacecraft>_\<wavelength>
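The naming pattern above can be expanded into a list of candidate dataset IDs. This is a minimal sketch that only builds strings from the spacecraft and wavelength names listed here; whether every combination is actually registered in the catalog is an assumption worth checking with a search.

```python
from itertools import product

# Names taken from the dataset list above; which combinations actually
# exist in the catalog is an assumption to verify with a search.
SPACECRAFT = ["soho", "stereoa", "stereob"]
WAVELENGTHS = ["171", "195", "284", "304"]

def euvml_ids():
    """Return candidate dataset IDs following the euvml naming pattern."""
    ids = ["euvml"]
    ids += [f"euvml_{sc}" for sc in SPACECRAFT]
    ids += [f"euvml_{wl}" for wl in WAVELENGTHS]
    ids += [f"euvml_{sc}_{wl}" for sc, wl in product(SPACECRAFT, WAVELENGTHS)]
    return ids

candidate_ids = euvml_ids()
print(len(candidate_ids), "candidate IDs, for example", candidate_ids[-1])
```

Any of these strings could be used in place of the generic "euvml" ID when requesting a file registry.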

In [None]:
try:
    import cloudcatalog
except ImportError:
    # Install on first use, then import the freshly installed package
    %pip install cloudcatalog --upgrade
    import cloudcatalog
    
import astropy.io.fits
import sunpy.map

cloud_endpoint = cloudcatalog.CloudCatalog("s3://gov-nasa-hdrl-data1/")
frID = "euvml"

# Get metadata
myjson = cloud_endpoint.get_entry(frID)
print(f"Metadata for {frID} is {myjson}")
start, stop = myjson['start'], myjson['stop']

# Or hard-code times (this overrides the full catalog range read above)
start, stop = '2011-01-01T00:00:00Z', '2011-01-02T23:59:59Z'

# Get full file registry including metadata, also convert to just the file list.
file_registry1 = cloud_endpoint.request_cloud_catalog(frID, start_date=start, stop_date=stop)
filelist = file_registry1['datakey'].to_list()
print(f"\nThere are {len(filelist)} files for ID {frID} in time range {start} to {stop}")
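
The hard-coded start and stop values in the cell above are UTC strings of the form YYYY-MM-DDTHH:MM:SSZ. As a minimal sketch, a window of whole days can be built in that same layout using only the standard library. The helper name `time_window` is hypothetical, and it assumes the catalog accepts exactly the layout used in the hard-coded example.

```python
from datetime import datetime, timedelta, timezone

ISO_FMT = "%Y-%m-%dT%H:%M:%SZ"  # same layout as the hard-coded strings

def time_window(first_day, ndays=1):
    """Return (start, stop) strings covering ndays whole UTC days."""
    begin = datetime.strptime(first_day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # Stop one second before the next day begins
    end = begin + timedelta(days=ndays) - timedelta(seconds=1)
    return begin.strftime(ISO_FMT), end.strftime(ISO_FMT)

start, stop = time_window("2011-01-01", ndays=2)
print(start, stop)
```

The resulting pair can be passed as `start_date` and `stop_date` in the registry request.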

# Sample AstroPy reader and SunPy Map reader

AstroPy can read from cloud storage the same way it reads local filenames, provided the optional fsspec and s3fs packages are installed.  SunPy is still developing this capability; at the time of writing, I find it easier to read the file with AstroPy and then convert it to a SunPy Map object.
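If the default open call needs tuning, `astropy.io.fits.open` accepts `use_fsspec` and `fsspec_kwargs` arguments (AstroPy 5.2 or later). The sketch below only builds those arguments; the helper name `fits_open_kwargs` and the example URI are hypothetical, and whether a given bucket permits anonymous (unsigned) requests is an assumption to verify.

```python
def fits_open_kwargs(uri, anonymous=True):
    """Build keyword arguments for astropy.io.fits.open from the URI scheme.

    'anonymous' maps to the s3fs 'anon' option; whether a given bucket
    permits unsigned requests is an assumption to check.
    """
    if uri.startswith("s3://"):
        return {"use_fsspec": True, "fsspec_kwargs": {"anon": anonymous}}
    return {}  # local paths need no extra arguments

# Hypothetical key, for illustration only; real keys come from filelist.
example_uri = "s3://example-bucket/path/to/file.fits"
kwargs = fits_open_kwargs(example_uri)
print(kwargs)

# Intended use in this notebook (not executed here):
# hdul = astropy.io.fits.open(filelist[0], **fits_open_kwargs(filelist[0]))
```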

In [None]:
hdul = astropy.io.fits.open(filelist[0])

euvml_map = sunpy.map.Map(hdul[0].data, hdul[0].header)
euvml_map.peek()

print(f"Spacecraft is {euvml_map.meta['TELESCOP']} for wavelength {euvml_map.meta['WAVELNTH']}")

# Display the full registry, including its metadata columns
file_registry1

# Streaming through files

Here are two streaming examples. The first runs a quick inline command, relying on the API spec's guarantee that the first four fields are the S3 key, the start and stop times, and the file size.  The second is a similar stream that plots each file.  Note when constructing the lambda that the 'stream' and 'stream_uri' commands always send 4 variables, but the receiving function can use or ignore any of them.  For this demo we send only the first ten files to the first example and the first three to the second, but in a production setup you can send the entire registry if desired.
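The four-argument calling convention can be illustrated without any cloud access. In this minimal sketch, `fake_stream` is a hypothetical stand-in and not the cloudcatalog implementation, and the rows are made-up values; it only shows that each callback accepts all four arguments while using just the ones it needs.

```python
def fake_stream(rows, func):
    """Hypothetical stand-in: call func with four positional values per row."""
    return [func(key, start, stop, size) for key, start, stop, size in rows]

# Made-up registry rows: (key, start, stop, file size in bytes).
rows = [
    ("s3://example-bucket/a.fits", "2011-01-01T00:00:00Z", "2011-01-01T00:00:10Z", 1_000_000),
    ("s3://example-bucket/b.fits", "2011-01-01T00:10:00Z", "2011-01-01T00:10:10Z", 1_050_000),
]

# Each callback must accept all four arguments but may ignore any of them.
names = fake_stream(rows, lambda key, start, stop, size: key.rsplit("/", 1)[-1])
total_bytes = sum(fake_stream(rows, lambda key, start, stop, size: size))
print(names, total_bytes)
```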

In [None]:
cloud_endpoint.stream(file_registry1[:10], lambda s3key, start, stop, fsize: print(f"{hash(s3key.read())}\t{start}\t{stop}\t{fsize}"))

In [None]:
def plot_euvml(fname):
    # Open the file passed in by the stream, not a fixed entry of filelist
    hdul = astropy.io.fits.open(fname)
    euvml_map = sunpy.map.Map(hdul[0].data, hdul[0].header)
    euvml_map.peek()

cloud_endpoint.stream_uri(file_registry1[:3], lambda fname, start, stop, fsize: plot_euvml(fname))

# Finding catalogs

You can search the entire HelioCloud network to find which datasets exist.  Here we search on the ID, title, and keyword metadata within HelioCloud.  The keyword search, for example, will find all datasets tagged with '193' or '194', and finds AIA as well as our SOHO and STEREO datasets.

In [None]:
mysearch = cloudcatalog.EntireCatalogSearch()
print(f"Number of SOHO datasets, search by id: {len(mysearch.search_by_id('soho'))}")
print(f"\nIDs for all AIA datasets, search by title: {[s['id'] for s in mysearch.search_by_title('aia')]}")
print(f"\nIDs of all datasets matching keywords '193' or '194': {[s['id'] for s in mysearch.search_by_keywords(['193','194'])]}")
print(f"\nIDs of all 'euvml' datasets: {[s['id'] for s in mysearch.search_by_id('euvml')]}")