# Access Individual Dataset
## DMSC Summer School
  
This notebook shows how to load a single dataset from SciCat, inspect its metadata, and retrieve its data files.

Load standard libraries

In [1]:
import sys
import os

URL of the SciCat instance containing the data

In [2]:
scicat_instance = "https://staging.scicat.ess.eu/api/v3"

Valid Authentication token  
(Also called access token or SciCat token)  
_To obtain the token, log in to your SciCat instance, open the User -> Settings page, and click the __copy to clipboard__ icon at the end of the __SciCat Token__ entry._

![SciCat User Settings](scicat_user_settings.png)

Access token example:  
`eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJfaWQiOiI2MzliMmE1MWI0MTU0OWY1M2RmOWVjMzYiLCJyZWFsbSI6ImxvY2FsaG9zdCIsInVzZXJuYW1lIjoiaW5nZXN0b3IiLCJlbWFpbCI6InNjaWNhdGluZ2VzdG9yQHlvdXIuc2l0ZSIsImVtYWlsVmVyaWZpZWQiOnRydWUsImF1dGhTdHJhdGVneSI6ImxvY2FsIiwiaWQiOiI2MzliMmE1MWI0MTU0OWY1M2RmOWVjMzYiLCJpYXQiOjE2OTIwODc0ODUsImV4cCI6MTY5MjA5MTA4NX0.Phca4UF7WKY367-10Whgwd5jaFjiPku6WsgiPeDh_-o`
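
The example token above has three dot-separated parts, which is the layout of a JSON Web Token (JWT); its first part decodes to `{"alg":"HS256","typ":"JWT"}`. The sketch below uses only the standard library to read the middle part (the payload) without verifying the signature, so the expiry time can be checked before the token is used. The helper names `decode_jwt_payload`, `token_is_expired` and `_b64url` are made up for this example and are not part of SciCat or Scitacean, and the token built at the end is a dummy with an invalid signature.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_payload(token: str) -> dict:
    """Return the payload of a JWT as a dict, WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a token of the form header.payload.signature")
    payload_b64 = parts[1]
    # JWTs use base64url with the trailing '=' padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def token_is_expired(token: str, now: Optional[float] = None) -> bool:
    """Compare the 'exp' claim (seconds since the Unix epoch) with the given time."""
    now = time.time() if now is None else now
    return decode_jwt_payload(token)["exp"] <= now


def _b64url(data: dict) -> str:
    """Encode a dict as unpadded base64url JSON, as done for JWT segments."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Dummy token for demonstration only (the signature part is not valid).
demo_token = ".".join([
    _b64url({"alg": "HS256", "typ": "JWT"}),
    _b64url({"username": "ingestor", "iat": 1693208996, "exp": 1693212596}),
    "not-a-real-signature",
])

print(decode_jwt_payload(demo_token)["username"])    # ingestor
print(token_is_expired(demo_token, now=1693210000))  # False
```

Calling `token_is_expired(token)` with no `now` argument compares against the current time, which is a quick way to tell whether a fresh token has to be copied from the settings page.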

In [3]:
token="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJfaWQiOiI2MzliMmE1MWI0MTU0OWY1M2RmOWVjMzYiLCJyZWFsbSI6ImxvY2FsaG9zdCIsInVzZXJuYW1lIjoiaW5nZXN0b3IiLCJlbWFpbCI6InNjaWNhdGluZ2VzdG9yQHlvdXIuc2l0ZSIsImVtYWlsVmVyaWZpZWQiOnRydWUsImF1dGhTdHJhdGVneSI6ImxvY2FsIiwiaWQiOiI2MzliMmE1MWI0MTU0OWY1M2RmOWVjMzYiLCJpYXQiOjE2OTMyMDg5OTYsImV4cCI6MTY5MzIxMjU5Nn0.OVP5fBuJehVUBKUGLIzpqPwD7NNOkpSgyVwm71cOAvM"
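
Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched here, is to read it from an environment variable and fall back to a prompt that does not echo the input. The variable name `SCICAT_TOKEN` and the helper `load_token` are assumptions made for this example; neither SciCat nor Scitacean defines them.

```python
import os
from getpass import getpass


def load_token(env_var: str = "SCICAT_TOKEN") -> str:
    """Return the token from the environment, or prompt for it without echo."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Only reached when the variable is unset or empty.
        token = getpass(f"{env_var} is not set, paste your SciCat token: ").strip()
    if not token:
        raise ValueError("no SciCat token provided")
    return token
```

With this in place, the assignment in the cell above could become `token = load_token()`.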

Dataset pid

In [4]:
dataset_pid = "20.500.12269/761fd17f-e0a8-4bd4-9e70-67ff8647b3f4"
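
The pid above has the form `prefix/identifier`, where the identifier is a UUID. A typo in a pasted pid is easy to miss, so a quick format check can save a failed request. The following is a plain-Python sketch that does not use Scitacean; the helper name `split_pid` is hypothetical, and it assumes that every pid of interest has a prefix and a UUID identifier, which may not hold for every SciCat instance.

```python
import uuid
from typing import Tuple


def split_pid(pid: str) -> Tuple[str, str]:
    """Split 'prefix/identifier' at the first slash and check the identifier is a UUID."""
    prefix, sep, local_id = pid.partition("/")
    if not sep or not prefix or not local_id:
        raise ValueError(f"expected '<prefix>/<uuid>', got {pid!r}")
    uuid.UUID(local_id)  # raises ValueError if this is not a well-formed UUID
    return prefix, local_id


prefix, local_id = split_pid("20.500.12269/761fd17f-e0a8-4bd4-9e70-67ff8647b3f4")
print(prefix)    # 20.500.12269
print(local_id)  # 761fd17f-e0a8-4bd4-9e70-67ff8647b3f4
```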

Import Scitacean  
For more information, see the official [repository](https://github.com/SciCatProject/scitacean) and [documentation](https://scicatproject.github.io/scitacean/).

In [5]:
from scitacean import Client
from scitacean.transfer.ssh import SSHFileTransfer

Instantiate the Scitacean client

In [6]:
sct_client = Client.from_token(
    url=scicat_instance,
    token=token,
    file_transfer=SSHFileTransfer(
        host="login.esss.dk"
    ))

Load the SciCat dataset.  
_Important_: you need to know the dataset pid.

In [7]:
dataset = sct_client.get_dataset(dataset_pid)

Explore all the _metadata_ loaded from SciCat.  
  
Click on the file section to view the list of files associated with this dataset.

In [8]:
dataset

,Name,Type,Value,Description
*,creation_time,datetime,2023-07-12 12:21:28+0000,"Time when dataset became fully available on disk, i.e. all containing files have been written. Format according to chapter 5.6 internet date/time format in RFC 3339. Local times without timezone/offset info are automatically transformed to UTC using the timezone of the API server."
*,input_datasets,list[PID],"[PID(prefix='20.500.12269', pid='2f9d92c5-b1b5-491f-901d-91ff2ea30d64'), PID(prefix='20.500.12269', pid='86413203-cd36-4559-92c3-c07b9ec4fede')]",Array of input dataset identifiers used in producing the derived dataset. Ideally these are the global identifier to existing datasets inside this or federated data catalogs. This field is required if the dataset is a Derived dataset.
*,source_folder,RemotePath,RemotePath('/ess/data/loki/2022/legacy/6cff1b19-ba80-43bd-ba3c-e96dacbce772'),"Absolute file path on file server containing the files of this dataset, e.g. /some/path/to/sourcefolder. In case of a single file dataset, e.g. HDF5 data, it contains the path up to, but excluding the filename. Trailing slashes are removed."
,description,str,Calibrated nexus from EFU processing with initial calibration. Trans 60394. Sans 60395,Free text explanation of contents of dataset.
,name,str,Calibrated nexus 60394 60395,"A name for the dataset, given by the creator to carry some semantic meaning. Useful for display purposes e.g. instead of displaying the pid. Will be autofilled if missing using info from sourceFolder."
,pid,PID,20.500.12269/761fd17f-e0a8-4bd4-9e70-67ff8647b3f4,Persistent Identifier for datasets derived from UUIDv4 and prepended automatically by site specific PID prefix like 20.500.12345/

,Name,Type,Value,Description
*,contact_email,str,max.novelli@ess.eu,"Email of the contact person for this dataset. The string may contain a list of emails, which should then be separated by semicolons."
*,investigator,str,Judith Houston,"First name and last name of the person or people pursuing the data analysis. The string may contain a list of names, which should then be separated by semicolons."
*,owner,str,Judith Houston,"Owner or custodian of the dataset, usually first name + last name. The string may contain a list of persons, which should then be separated by semicolons."
*,owner_group,str,loki,"Defines the group which owns the data, and therefore has unrestricted access to this data. Usually a pgroup like p12151"
*,used_software,list[str],['EFU processing with initial calibration'],"A list of links to software repositories which uniquely identifies the pieces of software, including versions, used for yielding the derived data. This field is required if the dataset is a Derived dataset."
,access_groups,list[str],"['ecdc', 'swap', 'dram', 'ess']",Optional additional groups which have read access to the data. Users which are members in one of the groups listed here are allowed to access this data. The special group 'public' makes data available to all users.
,api_version,str,1.0,Version of the API used in creation of the dataset.
,classification,str,"IN=medium,AV=low,CO=low","ACIA information about AUthenticity,COnfidentiality,INtegrity and AVailability requirements of dataset. E.g. AV(ailabilty)=medium could trigger the creation of a two tape copies. Format 'AV=medium,CO=low'"
,comment,str,,Comment the user has about a given dataset.
,created_at,datetime,2023-07-27 14:38:58+0000,Date and time when this record was created. This property is added and maintained by mongoose.

Local,Remote,Size
,RemotePath('60395-2022-02-28_2215.nxs'),51.03 MiB
,RemotePath('60394-2022-02-28_2215.nxs'),35.03 MiB

Name,Value
sample,ISIS polymer []
trans id,60394 []
sans id,60395 []
original comment,60395 = negative tof []
trans intensity,None []
sans intensity,None []
run,THIRD []
trans peak intensity,None []
sans peak intensity,None []
trans negative tof,False []


Download one of the files 

In [9]:
dataset = sct_client.download_files(dataset, target="../data", select="60395-2022-02-28_2215.nxs")



You need to authenticate to access login.esss.dk


Username:  max.novelli
Password:  ········


Check that the file has been downloaded and note its local path

In [10]:
dataset

,Name,Type,Value,Description
*,creation_time,datetime,2023-07-12 12:21:28+0000,"Time when dataset became fully available on disk, i.e. all containing files have been written. Format according to chapter 5.6 internet date/time format in RFC 3339. Local times without timezone/offset info are automatically transformed to UTC using the timezone of the API server."
*,input_datasets,list[PID],"[PID(prefix='20.500.12269', pid='2f9d92c5-b1b5-491f-901d-91ff2ea30d64'), PID(prefix='20.500.12269', pid='86413203-cd36-4559-92c3-c07b9ec4fede')]",Array of input dataset identifiers used in producing the derived dataset. Ideally these are the global identifier to existing datasets inside this or federated data catalogs. This field is required if the dataset is a Derived dataset.
*,source_folder,RemotePath,RemotePath('/ess/data/loki/2022/legacy/6cff1b19-ba80-43bd-ba3c-e96dacbce772'),"Absolute file path on file server containing the files of this dataset, e.g. /some/path/to/sourcefolder. In case of a single file dataset, e.g. HDF5 data, it contains the path up to, but excluding the filename. Trailing slashes are removed."
,description,str,Calibrated nexus from EFU processing with initial calibration. Trans 60394. Sans 60395,Free text explanation of contents of dataset.
,name,str,Calibrated nexus 60394 60395,"A name for the dataset, given by the creator to carry some semantic meaning. Useful for display purposes e.g. instead of displaying the pid. Will be autofilled if missing using info from sourceFolder."
,pid,PID,20.500.12269/761fd17f-e0a8-4bd4-9e70-67ff8647b3f4,Persistent Identifier for datasets derived from UUIDv4 and prepended automatically by site specific PID prefix like 20.500.12345/

,Name,Type,Value,Description
*,contact_email,str,max.novelli@ess.eu,"Email of the contact person for this dataset. The string may contain a list of emails, which should then be separated by semicolons."
*,investigator,str,Judith Houston,"First name and last name of the person or people pursuing the data analysis. The string may contain a list of names, which should then be separated by semicolons."
*,owner,str,Judith Houston,"Owner or custodian of the dataset, usually first name + last name. The string may contain a list of persons, which should then be separated by semicolons."
*,owner_group,str,loki,"Defines the group which owns the data, and therefore has unrestricted access to this data. Usually a pgroup like p12151"
*,used_software,list[str],['EFU processing with initial calibration'],"A list of links to software repositories which uniquely identifies the pieces of software, including versions, used for yielding the derived data. This field is required if the dataset is a Derived dataset."
,access_groups,list[str],"['ecdc', 'swap', 'dram', 'ess']",Optional additional groups which have read access to the data. Users which are members in one of the groups listed here are allowed to access this data. The special group 'public' makes data available to all users.
,api_version,str,1.0,Version of the API used in creation of the dataset.
,classification,str,"IN=medium,AV=low,CO=low","ACIA information about AUthenticity,COnfidentiality,INtegrity and AVailability requirements of dataset. E.g. AV(ailabilty)=medium could trigger the creation of a two tape copies. Format 'AV=medium,CO=low'"
,comment,str,,Comment the user has about a given dataset.
,created_at,datetime,2023-07-27 14:38:58+0000,Date and time when this record was created. This property is added and maintained by mongoose.

Local,Remote,Size
../data/60395-2022-02-28_2215.nxs,RemotePath('60395-2022-02-28_2215.nxs'),51.03 MiB
,RemotePath('60394-2022-02-28_2215.nxs'),35.03 MiB
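
In the file table above, only the file that was downloaded has an entry in the Local column. A small helper like the one below could collect the local paths of the downloaded files rather than relying on their position in `dataset.files`. It assumes only that each file object exposes a `local_path` attribute that is `None` until the file has been downloaded; the helper name `downloaded_paths` and the stand-in objects are hypothetical, used so the sketch runs without a SciCat connection.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import Iterable, List


def downloaded_paths(files: Iterable) -> List[Path]:
    """Return the local paths of all files that have one, in their original order."""
    return [Path(f.local_path) for f in files if f.local_path is not None]


# Stand-ins mimicking the two entries of the file table above.
files = [
    SimpleNamespace(local_path=Path("../data/60395-2022-02-28_2215.nxs")),
    SimpleNamespace(local_path=None),
]
print(downloaded_paths(files))
```

With a real dataset, the call would be `downloaded_paths(dataset.files)`.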


Now the file is ready to be used in your analysis.

In [None]:
dataset.files[0].local_path
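
Before handing the file to an analysis, it can be worth confirming that it exists locally and looks like HDF5, the container format commonly used for NeXus (`.nxs`) files. The sketch below checks for the 8-byte HDF5 signature at the start of the file using only the standard library. It assumes the file is HDF5-based with the signature at offset 0 (no user block in front of it); the helper name `looks_like_hdf5` is hypothetical, and the demo writes a dummy file rather than touching real data.

```python
import tempfile
from pathlib import Path

# The first eight bytes of an HDF5 file that has no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path) -> bool:
    """Return True if the file exists and starts with the HDF5 signature."""
    if path is None:  # e.g. a file that has not been downloaded yet
        return False
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Demo on a dummy file containing only the signature and some filler bytes.
with tempfile.TemporaryDirectory() as tmp:
    dummy = Path(tmp) / "dummy.nxs"
    dummy.write_bytes(HDF5_SIGNATURE + b"filler")
    demo_result = looks_like_hdf5(dummy)
print(demo_result)  # True
```

For the dataset loaded in this notebook, the check would be `looks_like_hdf5(dataset.files[0].local_path)`. A positive result only says the header is plausible; opening the file with an HDF5 or NeXus reader remains the real test.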