# SciCat workshop exercise

This exercise walks you through downloading a dataset and data files from SciCat and uploading processed data to SciCat.
It processes the data with a basic, contrived workflow built on [Scipp](https://scipp.github.io/).

In [None]:
import scipp as sc
from scitacean import Client, Dataset
from scitacean.transfer.ssh import SSHFileTransfer

%matplotlib widget

## Setup

The first cell contains some workshop-specific configuration.

The production instance of SciCat is currently located at `"https://scicat.ess.eu/api/v3"`.
Here, we use the staging instance instead, so we can experiment without worrying about breaking anything important.

The source folder is where the data files for a SciCat dataset are stored.
In production, it is typically under `/ess/data`, with a path that encodes instrument, date, and proposal.
That location is meant for permanent storage, though.
Here, we use a path that we have full control over and can play around with.
(The `pid.pid` placeholder ensures that every dataset gets its own folder.)
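
To illustrate how such a placeholder yields one folder per dataset, here is a minimal sketch that expands the template with Python's built-in `str.format`. It assumes the template is filled in this style; the `StandInPID` class and the two identifiers are hypothetical and are not part of Scitacean.

```python
from dataclasses import dataclass


@dataclass
class StandInPID:
    """Hypothetical stand-in for a dataset's persistent identifier."""

    pid: str


template = "/mnt/groupdata/scicat/upload/workshop/20230322/{pid.pid}"

# Two made-up identifiers, purely for illustration.
first = template.format(pid=StandInPID(pid="aaaa-1111"))
second = template.format(pid=StandInPID(pid="bbbb-2222"))

# Each identifier produces a different folder under the same parent.
print(first)
print(second)
```

In the workshop itself, the identifier is assigned when the dataset is uploaded, so you do not fill in the placeholder by hand.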

In [None]:
scicat_url = "https://staging.scicat.ess.eu/api/v3"
source_folder = "/mnt/groupdata/scicat/upload/workshop/20230322/{pid.pid}"

Get your access token from SciCat:

1. Log in at `https://staging.scicat.ess.eu`.
2. Click on your user icon in the top-right corner and go to 'Settings'.
3. Copy 'Catamel Token' as a string to the `token` variable below.

In [None]:
token = "<YOUR TOKEN>"

Set the host name that you use to connect to 'login' with SSH.
Your `ssh-agent` must be set up to connect to this host without asking for a password / passphrase on the terminal.
See `setup.md` for details.

In [None]:
ssh_host = "login.esss.dk"

## Fetch the input data

Create a client to talk to the SciCat server and file server:

In [None]:
client = Client.from_token(
    url=scicat_url,
    token=token,
    file_transfer=SSHFileTransfer(
        host=ssh_host,
        source_folder=source_folder,
    ),
)

Find the ID of the raw dataset in the web interface of SciCat:

In [None]:
input_pid = "<TODO>"

1. Download the dataset with the given PID.
2. Inspect the dataset to make sure it is the correct one.
3. Download its files to a local folder of your choice.

Check out https://scicatproject.github.io/scitacean/ to find out how to do this.

In [None]:
# TODO

## Process the data

The data is a crude mock-up of a wavelength spectrum.
Your task is to:

1. Load the data (using `scipp.io.load_hdf5(filename)`).
2. Inspect the data, e.g. by plotting it.
3. Determine the background and subtract it from the raw data.
   Don't go too crazy, just find a decent estimate.
4. Find the proton charge and normalise the data by it (that is, divide by it).
5. Inspect the corrected data.
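
The arithmetic behind steps 3 and 4 can be sketched with plain Python lists standing in for the loaded data. This is only one possible approach under stated assumptions: the counts and the proton charge below are made up, and using the median as the background level is a choice made for this illustration, not something prescribed by the exercise.

```python
from statistics import median

# Made-up counts per wavelength bin: a flat background with a single peak.
counts = [12.0, 11.0, 13.0, 12.0, 52.0, 95.0, 50.0, 12.0, 11.0, 13.0, 12.0]
# Made-up proton charge in arbitrary units.
proton_charge = 2.0

# The median is barely affected by the few bins under the peak,
# so it serves as a rough estimate of the flat background level.
background = median(counts)

# Subtract the background, then normalise by the proton charge.
corrected = [(c - background) / proton_charge for c in counts]

print(background)
print(max(corrected))
```

The same subtraction and division can be applied directly to a Scipp data array, which additionally keeps track of units and coordinates.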

In [None]:
# TODO

## Save the derived data

1. Use [DataArray.save_hdf5](https://scipp.github.io/generated/classes/scipp.DataArray.html#scipp.DataArray.save_hdf5) to save the corrected data to file.
2. Make a derived dataset from the input dataset and the file you just wrote.
   (Tip: Use [Dataset.derive](https://scicatproject.github.io/scitacean/generated/classes/scitacean.Dataset.html#scitacean.Dataset.derive).)
3. Inspect the derived dataset in Jupyter.
    - Do all fields make sense?
    - Is the file path correct?
    - Is the scientific metadata meaningful and did you include everything that you might want to access in the future?

In [None]:
# TODO

## Upload to SciCat

1. Upload the derived dataset and data file to SciCat (using the client from before).
   - Use [client.upload_new_dataset_now](https://scicatproject.github.io/scitacean/generated/classes/scitacean.Client.html#scitacean.Client.upload_new_dataset_now).
   - Capture the returned dataset and inspect it in Jupyter.
2. Inspect the dataset in the web interface and the file with SSH.

<div class="alert alert-warning">

**Warning**

Every time you call `client.upload_new_dataset_now`, it will create a new dataset in SciCat and upload a copy of the file.
Ideally, do not keep a call to this function around in the notebook so you don't accidentally end up uploading lots of duplicate data.
</div>

In [None]:
# TODO