# SciCat Exercise

## Upload - Scitacean

### Authentication

Go to the [SciCat user page](https://staging.scicat.ess.eu/user) and copy your token.

In [1]:
import getpass

TOKEN = getpass.getpass("Enter your token: ")
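
If you rerun the notebook often, pasting the token every time gets tedious. The helper below is a minimal sketch of one alternative: it looks for the token in an environment variable first and only prompts when that variable is empty. The variable name `SCICAT_TOKEN` and the function `read_token` are assumptions made for this example; neither is defined by SciCat or Scitacean.

```python
import getpass
import os


def read_token(env_var: str = "SCICAT_TOKEN") -> str:
    """Return the SciCat token from the environment, or prompt for it.

    `SCICAT_TOKEN` is a hypothetical variable name chosen for this sketch.
    """
    # Prefer the environment variable so the token never appears in the notebook.
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fall back to the interactive prompt used in the cell above.
        token = getpass.getpass("Enter your token: ").strip()
    if not token:
        # Fail early instead of sending an empty token to the server.
        raise ValueError("No SciCat token was provided.")
    return token
```

With this in place, `TOKEN = read_token()` could replace the plain `getpass` call. The token is stripped of surrounding whitespace because copying from a web page often picks up a stray space or newline.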

### Create Dataset Object

In [2]:
from scitacean import Client, Dataset, RemotePath
from scitacean import DatasetType
from scitacean.transfer.copy import CopyFileTransfer

MY_NAME = "YOUR_NAME"
MY_EMAIL = "YOUR_EMAIL"
PROPOSAL_ID = "YOUR_PROPOSAL_ID"

source_folder = RemotePath("/PATH/TO/THE/SOURCE/FOLDER/TO/DATASET")
client = Client.from_token(
    url="https://staging.scicat.ess.eu/api/v3",
    token=TOKEN,
    file_transfer=CopyFileTransfer(source_folder=source_folder),
)

raw_dataset = Dataset(
    type=DatasetType.DERIVED,
    contact_email="sunyoung.yoo@ess.eu",
    investigator=MY_NAME,  # use the variable, not the literal string "MY_NAME"
    owner=MY_NAME,
    owner_email=MY_EMAIL,
    used_software=["scipp", "pymuhrec"],
    data_format="tiff",
    is_published=False,
    owner_group=PROPOSAL_ID,
    access_groups=[PROPOSAL_ID],
    instrument_id=None,
    techniques=[],
    keywords=["DMSC Summer School 2025", "Powder"],
    license="unknown",
    proposal_id=PROPOSAL_ID,
    source_folder=source_folder.posix,
    name="Summer School Reduced Dataset",
    description="Awesome reduced dataset from the DMSC Summer School 2025",
)
raw_dataset

| Required | Name | Type | Value | Description |
|---|---|---|---|---|
| * | creation_time | datetime | 2025-08-21 09:51:51+0000 | Time when dataset became fully available on disk, i.e. all containing files have been written, or the dataset was created in SciCat. It is expected to be in ISO8601 format according to specifications for internet date/time format in RFC 3339, chapter 5.6 (https://www.rfc-editor.org/rfc/rfc3339#section-5). Local times without timezone/offset info are automatically transformed to UTC using the timezone of the API server. |
| * | input_datasets | list[PID] | | Array of input dataset identifiers used in producing the derived dataset. Ideally these are the global identifier to existing datasets inside this or federated data catalogs. |
| * | source_folder | RemotePath | `RemotePath('/PATH/TO/THE/SOURCE/FOLDER/TO/DATASET')` | Absolute file path on file server containing the files of this dataset, e.g. /some/path/to/sourcefolder. In case of a single file dataset, e.g. HDF5 data, it contains the path up to, but excluding the filename. Trailing slashes are removed. |
| | description | str | Awesome reduced dataset from the DMSC Summer School 2025 | Free text explanation of contents of dataset. |
| | name | str | Summer School Reduced Dataset | A name for the dataset, given by the creator to carry some semantic meaning. Useful for display purposes e.g. instead of displaying the pid. Will be autofilled if missing using info from sourceFolder. |
| | pid | PID | | Persistent identifier of the dataset. |
| | proposal_id | str | YOUR_PROPOSAL_ID | The ID of the proposal to which the dataset belongs. |

| Required | Name | Type | Value | Description |
|---|---|---|---|---|
| * | contact_email | str | sunyoung.yoo@ess.eu | Email of the contact person for this dataset. The string may contain a list of emails, which should then be separated by semicolons. |
| * | investigator | str | YOUR_NAME | First name and last name of the person or people pursuing the data analysis. The string may contain a list of names, which should then be separated by semicolons. |
| * | owner | str | YOUR_NAME | Owner or custodian of the dataset, usually first name + last name. The string may contain a list of persons, which should then be separated by semicolons. |
| * | owner_group | str | YOUR_PROPOSAL_ID | Name of the group owning this item. |
| * | used_software | list[str] | `['scipp', 'pymuhrec']` | A list of links to software repositories which uniquely identifies the pieces of software, including versions, used for yielding the derived data. |
| | access_groups | list[str] | `['YOUR_PROPOSAL_ID']` | List of groups which have access to this item. |
| | api_version | str | | Version of the API used in creation of the dataset. |
| | classification | str | | ACIA information about AUthenticity,COnfidentiality,INtegrity and AVailability requirements of dataset. E.g. AV(ailabilty)=medium could trigger the creation of a two tape copies. Format 'AV=medium,CO=low' |
| | comment | str | | Comment the user has about a given dataset. |
| | created_at | datetime | | Date and time when this record was created. This field is managed by mongoose with through the timestamp settings. The field should be a string containing a date in ISO 8601 format (2024-02-27T12:26:57.313Z) |

| Local | Remote | Size |
|---|---|---|

| Name | Value |
|---|---|

### Add File Path to the Dataset Object

In [3]:
raw_dataset.add_local_files("/PATH/TO/THE/FILE")
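
A mistyped path is easier to fix before the upload starts than after it fails. As a minimal sketch, the hypothetical helper `existing_files` below (not part of Scitacean) checks that every path points to an existing file and returns `pathlib.Path` objects.

```python
from pathlib import Path


def existing_files(*paths: str) -> list[Path]:
    """Return the given paths as Path objects, failing early if any is not a file.

    This is a hypothetical helper for illustration, not a Scitacean function.
    """
    # Expand "~" so paths in the home directory can be written in short form.
    resolved = [Path(p).expanduser() for p in paths]
    # Collect every path that does not point to a regular file.
    missing = [str(p) for p in resolved if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Not a file: {', '.join(missing)}")
    return resolved
```

Assuming `add_local_files` accepts `Path` objects as well as strings, the result could be passed on as `raw_dataset.add_local_files(*existing_files("/PATH/TO/THE/FILE"))`.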

### Upload the Dataset

In [None]:
client.upload_new_dataset_now(dataset=raw_dataset)
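
The upload call returns a finalized copy of the dataset, and that copy carries the persistent identifier (`pid`) assigned by SciCat. The sketch below wraps the call in a small hypothetical helper, `upload_and_report`, which keeps the returned object and prints its `pid`. It only assumes that the client has an `upload_new_dataset_now` method and that the result has a `pid` attribute, as in the cell above.

```python
def upload_and_report(client, dataset):
    """Upload `dataset` and return the finalized copy sent back by the client.

    Hypothetical convenience wrapper; `client` is expected to behave like the
    Scitacean client created earlier in this notebook.
    """
    # Keep the return value: the input dataset itself is not given a pid.
    finalized = client.upload_new_dataset_now(dataset=dataset)
    print(f"Uploaded dataset with pid: {finalized.pid}")
    return finalized
```

Calling `uploaded = upload_and_report(client, raw_dataset)` then leaves the identifier available as `uploaded.pid`, which is handy for looking the dataset up in the SciCat web interface.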

## Upload Dataset Using SFTP Connection

### Authenticate to SFTP Server

In [4]:
SFTP_USER_NAME = input("Enter the SFTP username: ")
SFTP_SERVER_PASSWORD = getpass.getpass("Enter the SFTP server password: ")

### Create file transfer, scicat client and dataset object

In [5]:
from scitacean import Client, Dataset, RemotePath
from scitacean import DatasetType
from scitacean.transfer.sftp import SFTPFileTransfer

MY_NAME = "YOUR_NAME"
MY_EMAIL = "YOUR_EMAIL"
PROPOSAL_ID = "YOUR_PROPOSAL_ID"

source_folder = RemotePath("/PATH/TO/THE/SOURCE/FOLDER/TO/DATASET")
client = Client.from_token(
    url="https://staging.scicat.ess.eu/api/v3",
    token=TOKEN,
    file_transfer=SFTPFileTransfer(
        host="SFTP_HOST",
        username=SFTP_USER_NAME,
        password=SFTP_SERVER_PASSWORD,
        source_folder=source_folder,
    ),
)

raw_dataset = Dataset(
    type=DatasetType.DERIVED,
    contact_email="sunyoung.yoo@ess.eu",
    investigator=MY_NAME,  # use the variable, not the literal string "MY_NAME"
    owner=MY_NAME,
    owner_email=MY_EMAIL,
    used_software=["scipp", "pymuhrec"],
    data_format="tiff",
    is_published=False,
    owner_group=PROPOSAL_ID,
    access_groups=[PROPOSAL_ID],
    instrument_id=None,
    techniques=[],
    keywords=["DMSC Summer School 2025", "Powder"],
    license="unknown",
    proposal_id=PROPOSAL_ID,
    source_folder=source_folder.posix,
    name="Summer School Reduced Dataset",
    description="Awesome reduced dataset from the DMSC Summer School 2025",
)
raw_dataset

### Add Files and Upload

In [None]:
raw_dataset.add_local_files("/PATH/TO/THE/FILE")
client.upload_new_dataset_now(dataset=raw_dataset)