# Access Multiple Datasets
## DMSC Summer School
  
This notebook shows how to load an arbitrary number of datasets from SciCat, access their information, and programmatically download the first file of each dataset.

Load standard libraries

In [None]:
import sys
import os

URL of the SciCat instance containing the data

In [None]:
scicat_instance = "https://staging.scicat.ess.eu/api/v3"

Valid authentication token  
(also called access token or SciCat token)  
_To obtain the token, log in to your SciCat instance, go to the User->Settings page, and click the __copy to clipboard__ icon at the end of the __SciCat Token__ field._

![SciCat User Settings](scicat_user_settings.png)

Access token example:  
`eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJfaWQiOiI2MzliMmE1MWI0MTU0OWY1M2RmOWVjMzYiLCJyZWFsbSI6ImxvY2FsaG9zdCIsInVzZXJuYW1lIjoiaW5nZXN0b3IiLCJlbWFpbCI6InNjaWNhdGluZ2VzdG9yQHlvdXIuc2l0ZSIsImVtYWlsVmVyaWZpZWQiOnRydWUsImF1dGhTdHJhdGVneSI6ImxvY2FsIiwiaWQiOiI2MzliMmE1MWI0MTU0OWY1M2RmOWVjMzYiLCJpYXQiOjE2OTIwODc0ODUsImV4cCI6MTY5MjA5MTA4NX0.Phca4UF7WKY367-10Whgwd5jaFjiPku6WsgiPeDh_-o`

In [None]:
token = "<YOUR_SCICAT_TOKEN>"
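An expired token is a common reason for failed requests. The following is a minimal sketch for checking the expiry time of a token, assuming the SciCat token is a JWT like the example above and carries an `exp` claim. The helper names `token_expiry` and `token_is_expired` are hypothetical and not part of SciCat or Scitacean.

```python
import base64
import json
from datetime import datetime, timezone


def token_expiry(jwt_token: str) -> datetime:
    """Return the expiry time stored in the `exp` claim of a JWT.

    Assumption: the token has the usual header.payload.signature layout.
    The signature is NOT verified here; this only reads the payload.
    """
    payload_b64 = jwt_token.split(".")[1]
    # base64url data may come without padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return datetime.fromtimestamp(payload["exp"], tz=timezone.utc)


def token_is_expired(jwt_token: str) -> bool:
    """Return True if the `exp` claim lies in the past."""
    return token_expiry(jwt_token) <= datetime.now(timezone.utc)
```

Calling `token_is_expired(token)` before creating the client can save a confusing authentication error later on. If it returns `True`, copy a fresh token from the settings page.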

We want to work with all the notebooks that have been prepared for this course and are available in SciCat.  
The list of the datasets' PIDs is in the following cell.  
  
If you are curious how this list was obtained, here is the Linux command line:
```bash
curl \
  -X 'GET' \
  'https://staging.scicat.ess.eu/api/v3/datasets/fullquery?limits=%7B%20%22skip%22%3A%200%2C%20%22limit%22%3A%2025%2C%20%22order%22%3A%20%22creationTime%3Adesc%22%20%7D&fields=%7B%22mode%22%3A%7B%7D%2C%22keywords%22%3A%5B%22DMSC%20Summer%20School%202023%22%5D%7D' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <YOUR_SCICAT_TOKEN>' | \
  jq . | \
  grep pid | \
  cut -d\" -f4
```
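For readers who prefer to stay in Python, here is a rough sketch of how the same query URL could be assembled with the standard library. The function names `build_fullquery_url` and `extract_pids` are hypothetical, and `extract_pids` assumes the endpoint returns a JSON list of dataset objects that each have a `pid` key, which is what the `grep pid` step above suggests.

```python
import json
from urllib.parse import quote, urlencode


def build_fullquery_url(base_url: str, keyword: str, limit: int = 25) -> str:
    """Build the datasets/fullquery URL used in the curl command."""
    # Same JSON documents that appear URL-encoded in the curl command
    limits = {"skip": 0, "limit": limit, "order": "creationTime:desc"}
    fields = {"mode": {}, "keywords": [keyword]}
    query = urlencode(
        {"limits": json.dumps(limits), "fields": json.dumps(fields)},
        quote_via=quote,  # encode spaces as %20, as in the curl command
    )
    return f"{base_url}/datasets/fullquery?{query}"


def extract_pids(datasets: list[dict]) -> list[str]:
    """Collect the pid of every dataset in a decoded JSON response.

    Assumption: the response is a list of objects with a "pid" key.
    """
    return [dataset["pid"] for dataset in datasets]
```

The resulting URL can then be fetched with any HTTP client, passing the headers `accept: application/json` and `Authorization: Bearer <YOUR_SCICAT_TOKEN>` as in the curl command.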

In [None]:
dataset_pids = [
    "20.500.12269/53ec1287-b0fe-4171-bf71-80673a54262e",
    "20.500.12269/488681d6-73cf-477e-8a30-1d625354cc85",
    "20.500.12269/f2947f0e-97e6-470b-a914-9dc8ac03c893",
    "20.500.12269/c566043f-f37c-417f-8dc7-d9d17b25c8ef",
    "20.500.12269/087a0844-d0d8-4f3d-88ba-e6505eea8c7a",
    "20.500.12269/d84012fe-679d-4608-82a8-8e39ad092f40",
    "20.500.12269/249a7405-8ab9-4859-8ea5-e691b80e4007",
    "20.500.12269/17dbda39-0ce7-493c-82fc-24c09b35e0c9",
    "20.500.12269/bdfa6765-1479-4b59-a095-86b75f3ae295",
    "20.500.12269/035d4cbd-e2a2-45a4-a919-d66216ccb29a",
    "20.500.12269/7a3cb15d-992d-4409-b62e-024b509d570c",
    "20.500.12269/25f58b6c-8f45-454f-bd22-ca9a398ab24b",
    "20.500.12269/0445cf2d-53a3-4f3a-8714-be6ea2aeccf2",
    "20.500.12269/f5c6fb62-4dfd-469c-b733-c2f2ca499eb4",
    "20.500.12269/93054ac6-86b5-435f-b294-d9195481b3ad",
    "20.500.12269/d744a02f-548d-4ee8-9b3a-51549fe591f7",
]

Import Scitacean  
For more information, please check the official [repository](https://github.com/SciCatProject/scitacean) and [documentation](https://scicatproject.github.io/scitacean/).

In [None]:
from scitacean import Client
from scitacean.transfer.sftp import SFTPFileTransfer

Instantiate the Scitacean client

In [None]:
client = Client.from_token(
    url=scicat_instance,
    token=token,
    file_transfer=SFTPFileTransfer(
        host="sftpserver2.esss.dk"
    ))

Load all the datasets whose pids are listed above.  
  
We are using a list comprehension to loop over all the PIDs and load each dataset through the Scitacean client.

In [None]:
datasets = [client.get_dataset(pid) for pid in dataset_pids]
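With a plain list comprehension, a single PID that cannot be loaded aborts the whole cell. As an optional alternative, here is a small sketch that keeps going and records the failures instead. The helper `load_datasets` is hypothetical and not part of Scitacean; it only assumes that it is given a callable that takes a PID and returns a dataset.

```python
from typing import Callable, Iterable


def load_datasets(get_dataset: Callable[[str], object], pids: Iterable[str]):
    """Load every pid, collecting errors instead of stopping at the first one.

    Returns a tuple (loaded, failed): a list of datasets in input order
    and a dict mapping each failing pid to the exception it raised.
    """
    loaded = []
    failed = {}
    for pid in pids:
        try:
            loaded.append(get_dataset(pid))
        except Exception as err:  # deliberately broad: we only report it
            failed[pid] = err
    return loaded, failed
```

In this notebook it could be used as `datasets, failed = load_datasets(client.get_dataset, dataset_pids)`, followed by a look at `failed` to see which PIDs need attention.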

Let's explore all the metadata of the first dataset

In [None]:
datasets[0]

As we already saw in the [single dataset notebook](./access_individual_dataset.ipynb),  
we can expand __Files__ and __Scientific Metadata__ to explore the dataset information further.

Let's download the first file of each dataset using a list comprehension.

In [None]:
datasets = [
    # Download only the first file of each dataset into ../data
    client.download_files(dataset, target="../data", select=dataset.files[0].remote_path.name)
    for dataset in datasets
]

Now we can check whether the files have been downloaded.  
Let's look at the first dataset.

In [None]:
datasets[0]
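Inspecting one dataset by eye works, but with many datasets a quick check on disk can be handy. The sketch below reports which expected files are missing from a folder. The helper `missing_downloads` is hypothetical, and it assumes that each downloaded file ends up directly in the target folder under its own file name, which may not hold if the remote paths contain subdirectories.

```python
from pathlib import Path


def missing_downloads(target_dir, expected_names):
    """Return the names from expected_names that are not files in target_dir."""
    target = Path(target_dir)
    return [name for name in expected_names if not (target / name).is_file()]
```

For this notebook, the expected names could be collected with `[dataset.files[0].remote_path.name for dataset in datasets]` and passed together with `"../data"`. An empty result means every first file is present.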