# sxs_metadata_example.ipynb

This notebook demonstrates how to use the `sxs` Python library to interact with the SXS Catalog and its metadata on Zenodo. The catalog is available at https://black-holes.org/waveforms and is described in https://arxiv.org/abs/1904.04831.

This notebook produces the same output as `GetURLsForCatalogJSON.ipynb` and `ParseCatalogJson.ipynb`, but much faster and more easily.

Specifically, it creates a dictionary containing the Zenodo metadata for every simulation in the public SXS Catalog. The metadata is placed in a dictionary called `catalog_json`, whose keys are SXS ID numbers (e.g. `SXS:BBH:0444`).

In [1]:
import sxs
from sxs import zenodo as zen
import json
import datetime
import numpy as np

This cell sends a single Zenodo search query to find all public SXS simulations. It takes minutes, whereas sending a separate query for each piece of metadata, as in `GetURLsForCatalogJSON.ipynb`, takes closer to an hour.

In [None]:
%%time
md = zen.api.Records.search(q='communities:sxs AND access_right:open')

The search returns a list of JSON records. Here we loop over each returned result. To make sure that the title of the record ends in an SXS ID, we check that splitting the last word of the title on `:` gives 3 elements. Then, if the type (the middle element) is `BBH`, we add the ID and its metadata to a dictionary.

In [None]:
catalog_json = {}
for simulation in md:
    sxs_id = simulation['title'].split(' ')[-1]
    if (len(sxs_id.split(':')) == 3):
        if (sxs_id.split(':')[-2] == 'BBH'):
            catalog_json[sxs_id] = simulation
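
To illustrate the title check in isolation, here is a minimal sketch that applies the same rule to a few made-up titles. The helper name `bbh_id_from_title` and the sample titles are hypothetical and are not part of the `sxs` library or the real catalog.

```python
def bbh_id_from_title(title):
    """Return the last word of `title` if it looks like a BBH SXS ID, else None."""
    candidate = title.split(' ')[-1]
    parts = candidate.split(':')
    # A valid ID has three colon-separated parts, e.g. SXS:BBH:0444,
    # and the middle part gives the simulation type.
    if len(parts) == 3 and parts[1] == 'BBH':
        return candidate
    return None

# Hypothetical titles, for illustration only.
sample_titles = [
    "Binary black-hole simulation SXS:BBH:0444",
    "Black-hole neutron-star simulation SXS:BHNS:0001",
    "The SXS catalog",
]
sample_ids = [bbh_id_from_title(t) for t in sample_titles]
print(sample_ids)
```

Only the first title yields an ID; the other two return `None` because the type is not `BBH` or the last word is not an SXS ID at all.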

In [None]:
with open("sxs_catalog_zen.json", 'w') as file:
    file.write(json.dumps(catalog_json))
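
Once the metadata is on disk, a later session could read it back instead of repeating the Zenodo query. The following is a minimal sketch of that round trip, using a tiny stand-in dictionary and a temporary directory; the helper names `save_catalog` and `load_catalog` and the example record are hypothetical.

```python
import json
import os
import tempfile

def save_catalog(catalog, path):
    """Write the catalog dictionary to `path` as JSON."""
    with open(path, 'w') as f:
        json.dump(catalog, f)

def load_catalog(path):
    """Read a catalog dictionary back from a JSON file."""
    with open(path) as f:
        return json.load(f)

# Stand-in for catalog_json; the real records hold many more fields.
example_catalog = {"SXS:BBH:0444": {"title": "Example record SXS:BBH:0444"}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sxs_catalog_zen.json")
    save_catalog(example_catalog, path)
    reloaded = load_catalog(path)

print(reloaded == example_catalog)
```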

In [None]:
print(f"There are {len(catalog_json)} BBH simulations in the catalog.")

In [None]:
def resolutions_for_simulation(sxs_id):
    resolutions = []
    files = catalog_json[sxs_id]['files']
    for file in files:
        split_filename = file['filename'].split('/')
        if (str(split_filename[-1]) == "Horizons.h5"):
            resolutions.append(int(split_filename[-2].split('Lev')[-1]))
    return sorted(resolutions)
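
To show what this function extracts, here is a self-contained sketch that runs the same logic on a hand-written list of file entries. The function name `resolutions_from_files` and the file paths are hypothetical; they only assume that each resolution lives in a directory named `Lev` followed by a number, as the parsing above implies.

```python
def resolutions_from_files(files):
    """Return the sorted Lev numbers of all directories containing Horizons.h5."""
    resolutions = []
    for entry in files:
        parts = entry['filename'].split('/')
        # Count each resolution once by looking only at its Horizons.h5 file.
        if parts[-1] == "Horizons.h5":
            # The parent directory is named like "Lev3"; keep the number.
            resolutions.append(int(parts[-2].split('Lev')[-1]))
    return sorted(resolutions)

# Hypothetical file entries, for illustration only.
example_files = [
    {'filename': 'SXS:BBH:0444/Lev3/Horizons.h5'},
    {'filename': 'SXS:BBH:0444/Lev3/metadata.json'},
    {'filename': 'SXS:BBH:0444/Lev1/Horizons.h5'},
    {'filename': 'SXS:BBH:0444/Lev2/Horizons.h5'},
]
example_resolutions = resolutions_from_files(example_files)
print(example_resolutions)  # [1, 2, 3]
```

Note that a `Horizons.h5` entry with no parent directory in its filename would raise an `IndexError` in this sketch.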

In [None]:
resolutions_available = {}
for simulation in catalog_json:
    resolutions_available[simulation] = resolutions_for_simulation(simulation)
    
with open("sxs_catalog_zen_resolutions_available.json", 'w') as file:
    file.write(json.dumps(resolutions_available))

## Available metadata

Here are the available metadata keys for a given simulation.

In [None]:
catalog_json['SXS:BBH:0444'].keys()

Here is an example metadata record. The `files` key holds one entry per file in the simulation, each containing the filename, a checksum, and download links. You can retrieve a file individually, e.g., using the `requests` library in Python, although you can more easily download data using the `sxs` library (see the `sxs_catalog_download_example` notebook).

In [None]:
catalog_json['SXS:BBH:0444']
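
If you do download a file by hand, the checksum in its `files` entry can be used to confirm that the download is intact. The sketch below assumes the checksum is a string of the form `algorithm:hexdigest` (for example `md5:` followed by hex digits); that format is an assumption here, so check it against an actual record before relying on it. The helper name `checksum_matches` is hypothetical, and the example uses a small temporary file rather than real simulation data.

```python
import hashlib
import os
import tempfile

def checksum_matches(path, checksum, chunk_size=1 << 20):
    """Compare a local file to a checksum of the assumed form 'algorithm:hexdigest'."""
    algorithm, _, expected = checksum.partition(':')
    digest = hashlib.new(algorithm)
    # Read in chunks so that large files do not have to fit in memory.
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()

payload = b"not real simulation data"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.h5")
    with open(path, 'wb') as f:
        f.write(payload)
    good = "md5:" + hashlib.md5(payload).hexdigest()
    bad = "md5:" + "0" * 32
    ok_good = checksum_matches(path, good)
    ok_bad = checksum_matches(path, bad)

print(ok_good, ok_bad)
```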

Here are some functions that translate Zenodo dates into Python `datetime` objects. In this example, we print out a table of simulations, sorted by modification date.

In [None]:
def datetime_from_zenodo_datetime(zenodo_datetime):
    return datetime.datetime.strptime(zenodo_datetime.split('+')[0], "%Y-%m-%dT%H:%M:%S.%f")

def date_from_zenodo_date(zenodo_date):
    return datetime.datetime.strptime(zenodo_date, "%Y-%m-%d")
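
As an alternative sketch, `datetime.datetime.fromisoformat` (Python 3.7 and later) can parse the same kind of timestamp while keeping the UTC offset, and it also accepts timestamps without fractional seconds. The function name `parse_zenodo_datetime` and the sample timestamps are hypothetical; this is not what the cell above does, which discards the offset.

```python
import datetime

def parse_zenodo_datetime(text):
    """Parse an ISO 8601 timestamp, keeping any UTC offset it carries."""
    return datetime.datetime.fromisoformat(text)

# Hypothetical timestamps in the format assumed by the functions above.
with_fraction = parse_zenodo_datetime("2019-04-09T21:06:27.712456+00:00")
without_fraction = parse_zenodo_datetime("2019-04-09T21:06:27+00:00")

print(with_fraction.isoformat())
print(without_fraction.isoformat())
```

The results are timezone-aware, so they cannot be compared or sorted together with the naive `datetime` objects returned by the functions above; use one kind consistently.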

In [None]:
# Read basic info from the metadata.
simulations_list = []
modified_list = []
published_list = []
urls_list = []
names_list = []
for simulation in sorted(catalog_json.keys()):
    simulations_list.append(simulation)
    modified_list.append(datetime_from_zenodo_datetime(catalog_json[simulation]['modified']))
    published_list.append(date_from_zenodo_date(catalog_json[simulation]['metadata']['publication_date']))
    urls_list.append(catalog_json[simulation]['links']['latest_html'])
    names_list.append(catalog_json[simulation]['metadata']['title'].split(' ')[-1])
simulations = np.array(simulations_list)
modified = np.array(modified_list)
published = np.array(published_list)
urls = np.array(urls_list)
names = np.array(names_list)
  

In [None]:
sort_by = modified
order = np.argsort(sort_by)  # compute the sorted order once and reuse it
for i in order:
    print(simulations[i] + "    " + str(modified[i]) + "    "
          + str(published[i]) + "    " + str(urls[i]))