# Access Individual Dataset
## DMSC Summer School
  
This notebook contains an example of how to load a single dataset from SciCat, view it, and retrieve its data files.

URL of the SciCat instance containing the data

In [None]:
scicat_instance = "https://staging.scicat.ess.eu/api/v3"

Valid authentication token  
(also called access token or SciCat token)  
_Follow the steps below to obtain the token:_
- visit the [ESS SciCat staging environment](https://staging.scicat.ess.eu)
- log in using the credentials provided
- go to the User -> Settings page
- click on the __copy to clipboard__ icon at the end of the __SciCat Token__ field

![SciCat User Settings](./scicat_user_settings.png)

Access token example:  
`eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJfaWQiOiI2MzliMmE1MWI0MTU0OWY1M2RmOWVjMzYiLCJyZWFsbSI6ImxvY2FsaG9zdCIsInVzZXJuYW1lIjoiaW5nZXN0b3IiLCJlbWFpbCI6InNjaWNhdGluZ2VzdG9yQHlvdXIuc2l0ZSIsImVtYWlsVmVyaWZpZWQiOnRydWUsImF1dGhTdHJhdGVneSI6ImxvY2FsIiwiaWQiOiI2MzliMmE1MWI0MTU0OWY1M2RmOWVjMzYiLCJpYXQiOjE2OTIwODc0ODUsImV4cCI6MTY5MjA5MTA4NX0.Phca4UF7WKY367-10Whgwd5jaFjiPku6WsgiPeDh_-o`
(Use your own token; this one will not work for you.)
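
The example token above has the three dot-separated segments of a JSON Web Token (JWT). Assuming your own token has the same shape, the following minimal sketch reads the expiry time from its payload, which can help explain an authentication error caused by an expired token. The helper names `jwt_payload`, `token_expiry` and `token_is_expired` are hypothetical and not part of SciCat or Scitacean, and the signature is not verified.

```python
import base64
import json
from datetime import datetime, timezone


def jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT-shaped token (signature NOT verified)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    segment = parts[1]
    # base64url segments omit the trailing padding; restore it before decoding.
    segment += "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(segment))


def token_expiry(token: str) -> datetime:
    """Return the time stored in the `exp` claim as a UTC datetime."""
    return datetime.fromtimestamp(jwt_payload(token)["exp"], tz=timezone.utc)


def token_is_expired(token: str) -> bool:
    """Return True if the `exp` claim lies in the past."""
    return token_expiry(token) <= datetime.now(timezone.utc)
```

Once `token` is set in the cell below, `token_expiry(token)` shows when a new token has to be copied from the settings page.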

In [None]:
token = "<YOUR_SCICAT_TOKEN>"

Dataset pid

In [None]:
dataset_pid = "20.500.12269/0445cf2d-53a3-4f3a-8714-be6ea2aeccf2"

User name and SSH key used to access the files.
The SSH key file is provided at the beginning of the session.
Note that this key path is only valid on the School's JupyterHub.

In [None]:
sftp_username = "dss2024"
sftp_key_filename = "/home/jovyan/.ssh/id_summerschool2024"

Local folder where the downloaded data should be saved

In [None]:
local_data_folder = "./data"

Import Scitacean.
For more information, see the official [repository](https://github.com/SciCatProject/scitacean) and [documentation](https://scicatproject.github.io/scitacean/).

In [None]:
from scitacean import Client
from scitacean.transfer.sftp import SFTPFileTransfer

Function that opens the SFTP connection to the data repository, authenticating with the user name and SSH key defined above

In [None]:
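def connect(host, port):
    from paramiko import SSHClient, AutoAddPolicy

    client = SSHClient()
    client.load_system_host_keys()
    # Accept unknown host keys automatically (convenient here, not safe in general).
    client.set_missing_host_key_policy(AutoAddPolicy())
    client.connect(
        hostname=host,
        # Use the port passed in by the caller; fall back to the SSH default.
        port=port if port is not None else 22,
        username=sftp_username,
        key_filename=sftp_key_filename,
        timeout=1,  # seconds to wait for the TCP connection
    )
    return client.open_sftp()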

Instantiate the Scitacean client

In [None]:
client = Client.from_token(
    url=scicat_instance,
    token=token,
    file_transfer=SFTPFileTransfer(
        host="sftpserver2.esss.dk",
        connect=connect
    ))

Load the SciCat dataset.  
_Important_: you need to know the pid of the dataset.  
  
In this notebook, we are going to use the pid of the dataset containing the SANS notebook and libraries prepared for this course:  
[DMSC Summer School SANS Code](https://staging.scicat.ess.eu/datasets/20.500.12269%2F0445cf2d-53a3-4f3a-8714-be6ea2aeccf2)
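
The link above contains the pid with its `/` percent-encoded as `%2F`, so that the pid stays a single path segment. As a minimal sketch, the same URL can be built from the pid with the standard library; the variable name `frontend_url` is introduced here for illustration only.

```python
from urllib.parse import quote

# Values copied from this notebook.
dataset_pid = "20.500.12269/0445cf2d-53a3-4f3a-8714-be6ea2aeccf2"
frontend_url = "https://staging.scicat.ess.eu"  # hypothetical name for the web frontend

# safe="" makes quote() encode "/" as well, so the pid remains one path segment.
dataset_url = f"{frontend_url}/datasets/{quote(dataset_pid, safe='')}"
print(dataset_url)
```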

In [None]:
dataset = client.get_dataset(dataset_pid)

Explore all the _metadata_ loaded from SciCat.  
  
Click on the Files section to view the list of files associated with this dataset.

In [None]:
dataset

Expand the Scientific Metadata section.  
As you can see, no metadata has been associated with this dataset.  
This should raise a flag, because missing metadata makes the files less FAIR.  
If you find a dataset like this, contact the data curator or the owner and let them know that the Scientific Metadata is missing.  

Let's focus on the files.  
Expand the __Files__ section and review how many files are associated with this dataset.
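
The file list can also be inspected programmatically. The sketch below assumes that each entry of `dataset.files` exposes a `remote_path` and a `size` attribute (size in bytes, possibly `None`); the helper names `summarize_files` and `print_file_summary` are hypothetical and not part of Scitacean.

```python
def summarize_files(files):
    """Return a list of (remote path, size in bytes) pairs and the total size.

    Assumes each entry has `remote_path` and `size` attributes.
    """
    rows = [(str(f.remote_path), f.size) for f in files]
    # Entries without a known size are left out of the total.
    total = sum(size for _, size in rows if size is not None)
    return rows, total


def print_file_summary(files):
    """Print one line per file, followed by the file count and total size."""
    rows, total = summarize_files(files)
    for path, size in rows:
        print(f"{'?' if size is None else size:>12} B  {path}")
    print(f"{len(rows)} file(s), {total} B in total")
```

Calling `print_file_summary(dataset.files)` after the dataset has been loaded should list the same files as the __Files__ section.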

Let's download the main Jupyter notebook.

In [None]:
dataset = client.download_files(
    dataset, 
    target=local_data_folder, 
    select="SANS_from_function.ipynb"
)

Check that the file has been downloaded and note its local path

In [None]:
dataset

The file is now on your local storage and ready to be used to generate new simulated data

In [None]:
dataset.files[0].local_path
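
To confirm that the path shown above really points to a file on disk, a small check such as the following can be used. It is a minimal sketch: `describe_local_file` is a hypothetical helper, and it assumes that the local path is `None` for a file that has not been downloaded.

```python
from pathlib import Path


def describe_local_file(local_path):
    """Return the resolved path and size in bytes; raise if the file is missing."""
    if local_path is None:
        raise FileNotFoundError("the file has not been downloaded yet")
    path = Path(local_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} does not exist")
    return path.resolve(), path.stat().st_size
```

For example, `describe_local_file(dataset.files[0].local_path)` should return the location of the downloaded file inside `local_data_folder` together with its size.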