# PyONCat (ONCat API from Python)

## Introduction

### About

ONCat is a metadata catalog built to store information about neutron experiment data at HFIR / SNS.  The contents of the catalog can be viewed at https://oncat.ornl.gov.

An API is available to allow programmatic access to the metadata stored in the catalog.  Documentation for the API is at https://oncat.ornl.gov/#/build.

This notebook outlines the usage of "PyONCat", a Python module built to make communicating with the API a little easier.

<p><font color='green'>**(Questions / requests / feedback?  Please contact ONCat Support: oncat-support@ornl.gov.)**</font></p>

### Installation

The latest version of PyONCat should already be installed on https://jupyter.sns.gov as well as instrument / analysis machines, but if you are using a machine without it then it can be installed using `pip` as follows:

```
pip install https://oncat.ornl.gov/packages/pyoncat-1.1-py3-none-any.whl
```

## Usage

### 1 - Initial Setup

#### Main `ONCat` Object Creation

In [None]:
import pyoncat

# This is a temporary "client ID" intended for use in this tutorial **only**.
# For your own work, please contact ONCat Support to be issued your own credentials.
CLIENT_ID = "c0686270-e983-4c71-bd0e-bfa47243a47f"

# Let's use the testing version of ONCat for this tutorial.
ONCAT_URL = "https://oncat-testing.ornl.gov"

oncat = pyoncat.ONCat(
    ONCAT_URL,
    client_id=CLIENT_ID,
    flow=pyoncat.RESOURCE_OWNER_CREDENTIALS_FLOW,
)

#### Logging in With the User's Credentials

In [None]:
import getpass

username = getpass.getuser()

oncat.login(username, getpass.getpass(f'Enter password for "{username}":'))

### 2 - Basic Facility / Instrument Information

#### Printing the Names of the Facilities Supported by ONCat

In [None]:
facilities = oncat.Facility.list()

for facility in facilities:
    print(facility.get("name"))

#### Printing the Names of the Instruments Supported by ONCat for a Single Facility

In [None]:
instruments = oncat.Instrument.list(facility="SNS")

for instrument in instruments:
    print(instrument.get("name"))

#### EXERCISE A

Print out the names of the **HFIR** instruments known to ONCat.

### 3 - Experiment Information

#### Retrieving All Experiments for an Instrument

In [None]:
experiments = oncat.Experiment.list(facility="SNS", instrument="NOM")

for experiment in experiments:
    print(experiment.get("name"))

In general, only calibration experiments and the experiments you have worked on should be visible in the list above.

Certain staff members (e.g. NOMAD instrument scientists) will be able to see all the experiments in the list.

#### EXERCISE B

Print out the names of all the experiments for an instrument you have worked on, either at SNS or HFIR.

#### Getting All the Information We Have for a Given Experiment

In [None]:
# Let's use a calibration experiment that everyone has access to.
nom_cal_exp = oncat.Experiment.retrieve(
    "IPTS-19564",
    facility="SNS",
    instrument="NOM"
)

nom_cal_exp

Note that the object we got back was an `ONCatRepresentation`.  This is just a slightly more convenient wrapper around the information we got back from the API, which has a nested, "dictionary of dictionaries" structure.

#### Accessing Fields Using `.get()`

Similarly to when we printed out the names of facilities and instruments, we can directly access the various bits of information above using `.get()`, for example when printing the title:

In [None]:
nom_cal_exp.get("title")

This syntax allows you to "drill down" into the deeply-nested fields within the structure using dot-delimited paths.  For example, there is a string containing a comma-separated list of all the runs that were taken during the experiment.  This is at `indexed` -> `run_number` -> `ranges` in the structure above, and we can directly access that information as follows:

In [None]:
nom_cal_exp.get("indexed.run_number.ranges")
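To make the dot-delimited lookup concrete, here is a minimal sketch of the idea applied to a plain nested dictionary.  The `get_path` helper and the sample values are hypothetical stand-ins for illustration only; this is not PyONCat's actual implementation, and a real `ONCatRepresentation` may handle edge cases (such as missing fields) differently.

```python
def get_path(record, path, default=None):
    """Follow a dot-delimited path through nested dictionaries.

    Hypothetical helper for illustration; not part of PyONCat.
    """
    current = record
    for key in path.split("."):
        # Stop early if the path leaves the nested-dictionary structure.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Made-up values that mimic the "dictionary of dictionaries" shape.
experiment = {
    "title": "Example calibration experiment",
    "indexed": {"run_number": {"ranges": "100-105, 110"}},
}

print(get_path(experiment, "title"))
print(get_path(experiment, "indexed.run_number.ranges"))
print(get_path(experiment, "indexed.sample.name"))  # path not present
```

Each segment of the path selects one level of nesting, which is why `indexed` -> `run_number` -> `ranges` becomes `"indexed.run_number.ranges"`.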

#### EXERCISE C

As a quick way to get an idea of when an experiment took place, we store the `earliest` and `latest` time at which a datafile has been `created`.

Look again at all the experiment information, and use `.get()` to print out both of those times on the `nom_cal_exp` object.

### 4 - Datafile Information

#### Retrieving All Datafiles for an Experiment

Let's get all the datafiles for the same calibration experiment we looked at previously:

In [None]:
datafiles = oncat.Datafile.list(
    facility="SNS",
    instrument="NOM",
    experiment="IPTS-19564",
)

len(datafiles)

Let's take a look at what a single entry for a datafile contains:

In [None]:
datafiles[0]

So, quite a lot of stuff...  This is because we ingest as much of the metadata as we can from the datafiles as they come off the beamlines.

#### Accessing Information on Datafiles 

The datafile objects we get from the API can be accessed in much the same way as the experiment objects we looked at before, except different information is stored.

Every datafile has a location:

In [None]:
datafiles[0].get("location")

SNS instruments (and WAND<sup>2</sup>) work in terms of "runs", so datafiles will have a corresponding run number:

In [None]:
datafiles[0].get("indexed.run_number")

We store when datafiles are created:

In [None]:
datafiles[0].get("created")

But the vast majority of the remaining info is nested inside the metadata field:

In [None]:
datafiles[0].get("metadata")

#### Easily Seeing All Fields at a Glance

With all that metadata it can be hard to find what you're looking for.

Luckily, there is an easier way to see all the dot-delimited paths in a given datafile:

In [None]:
datafiles[0].nodes()
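For intuition about where those paths come from, the sketch below walks a plain nested dictionary and collects the path of every leaf value.  The `iter_paths` helper, the file location and the other values are all made up for illustration; the exact format and ordering returned by `.nodes()` may well differ.

```python
def iter_paths(record, prefix=""):
    """Yield the dot-delimited path of every leaf value in a nested dict.

    Hypothetical helper for illustration; not part of PyONCat.
    """
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Descend into nested dictionaries, extending the path.
            yield from iter_paths(value, path)
        else:
            yield path


# Made-up stand-in for a datafile entry.
datafile = {
    "location": "/path/to/example_file.h5",
    "indexed": {"run_number": 100},
    "metadata": {"entry": {"sample": {"name": "Example sample"}}},
}

paths = sorted(iter_paths(datafile))
for path in paths:
    print(path)
```

Any path printed this way can be passed straight to a dot-delimited lookup such as `.get()`.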

#### EXERCISE D

Find which dot-delimited path corresponds to the average rotation speed of chopper 1, and then use that path to print out the speed for all the runs in the experiment.

#### Filtering by Fields Using "Projections"

It is also possible to ask for a much smaller sub-set of information for each datafile, using something called a projection.  This means that we can ask for exactly what we need for a very large number of datafiles and not have to wait too long for a result.

A projection is just a list of strings, where each item is the same kind of dot-delimited path we were working with previously.  For example, a projection that retrieves the run number and sample name for each datafile might look like this:

In [None]:
projection = [
    "indexed.run_number",
    "metadata.entry.sample.name",
]

We could then use that projection to get run number and sample name information back for the datafiles in *all* the NOMAD experiments we have access to:

In [None]:
print("This may take a little while...  Please wait.")

datafiles = oncat.Datafile.list(
    facility="SNS",
    instrument="NOM",
    projection=projection,
)

print("Done!")

Each datafile will now contain far less information:

In [None]:
datafiles[0]
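As a rough mental model, a projection keeps only the requested paths and discards everything else.  The sketch below applies that idea locally to a plain dictionary; `apply_projection` and all the values are hypothetical, and the real filtering happens on the ONCat server, which may also return extra identifying fields that were not explicitly requested.

```python
def apply_projection(record, projection):
    """Return a new dict containing only the projected paths of record.

    Hypothetical helper for illustration; not part of PyONCat.
    """
    result = {}
    for path in projection:
        keys = path.split(".")
        value = record
        for key in keys:
            if not isinstance(value, dict) or key not in value:
                break  # Path not present in this record; skip it.
            value = value[key]
        else:
            # Rebuild the nesting for this path inside the result.
            target = result
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = value
    return result


# Made-up stand-in for a full datafile entry.
full = {
    "location": "/path/to/example_file.h5",
    "indexed": {"run_number": 100},
    "metadata": {"entry": {"sample": {"name": "Example sample", "mass": 1.0}}},
}

projection = [
    "indexed.run_number",
    "metadata.entry.sample.name",
]

slim = apply_projection(full, projection)
print(slim)
```

Because the nesting is preserved, the same dot-delimited paths used in the projection can still be used to read the values back out.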

### FINAL EXERCISE - Putting it All Together

Now is the time to take everything you have learned and to put it together to carry out a search for a particular set of runs.

Searching across **all** runs available to you on NOMAD, print the **proton charge**, **total counts** and **location** of the runs that meet **all three** of the following criteria:

1. a sample name equal to `"V rod"`,

2. a proton charge of between `1.9e12` and `2.1e12`; and

3. a total count greater than `1.0e9`.

#### Step 1

Get a list of all datafiles which includes all the information we'd like to search through:

#### Step 2

Loop over each datafile in the list and only print the required information for the runs we need: