# Table of Contents
1. [How to upload our pyiron jobs to coscine?](#intro)
    1. [Create and attach Coscine Interface](#create)
    2. [Make a new 'resource'](#newres)
    3. [Access storage](#access)
    4. [Handle Metadata and upload a file](#fileup)
    5. [Upload a job](#jobup)
2. [coscine package](#coscine)

# How to upload our pyiron jobs to coscine? <a name="intro"></a>

To upload pyiron jobs to coscine, you need to have 
- the most recent version of pyiron_contrib
- the coscine package installed
- a coscine account to upload to
- a coscine token!

The first two are normal python updates/installs (e.g. using `mamba install -c conda-forge coscine` or the like).
For the token, log in to coscine and open the [user profile](https://coscine.rwth-aachen.de/user/). Below 'Personal Information' you will find the 'Access Token' section, where you choose a name and an expiration date and create the token. Copy the token and store it in a safe place (e.g. a password manager)! It provides __full access__ to all data available to you on coscine!
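Because the token grants full access, it should never be written into a notebook cell. The following is a minimal sketch of one way to avoid that: it reads the token from an environment variable and falls back to an interactive prompt. The function name `load_coscine_token` and the variable name `COSCINE_TOKEN` are assumptions made for this illustration; they are not part of pyiron or coscine.

```python
import os
from getpass import getpass


def load_coscine_token(env_var="COSCINE_TOKEN"):
    """Return the coscine token without hard-coding it in the notebook.

    `env_var` is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var)
    if not token:
        # Fall back to a hidden interactive prompt if the variable is unset.
        token = getpass("Coscine token: ")
    return token
```

The returned string can then be handed to whatever asks for the token, for example `coscine.Client(token=load_coscine_token())`.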


In [1]:
from pyiron import Project



In [2]:
pr = Project('.')

The project has a new attribute `storage_interface`, which provides an interface to different storage backends (currently mainly coscine):

In [3]:
pr.storage_interface

## Create and attach Coscine Interface  <a name="create"></a>

We will now create a coscine interface, first just as an independent object to browse through coscine. Once we have decided on a folder (a 'resource' in coscine nomenclature) to upload our jobs to, we attach that folder to the project instance. This way, a reload of the project provides us with access to the same folders again. Resources on coscine are like folders with a single metadata scheme attached to them. Therefore, you might need multiple storages in your storage interface to work with different kinds of data. However, most of the time we will probably use the 'sfb1394/AtomisticSimulation' or the 'sfb1394/AtomisticSimulationMD' resource types.

Upon creation of a coscine interface, you have to specify the token or you will be asked to provide it.

In [4]:
co_pr = pr.storage_interface.create.coscine()

Coscine token:  ········


The `co_pr` object feels like a normal pyiron project (only a lot slower). However, it does not (yet?) support '/'-separated paths, so each level has to be accessed with its own `[...]`.

In [5]:
co_pr

{'groups': ['Coscine Demo Project 2022-06', 'SFB1394', 'TestProject'], 'nodes': []}

In [6]:
co_pr['TestProject']

{'groups': ['TestProject2', 'TestProject2'], 'nodes': ['ANewNode', 'aRes', 'AtomisticSimulation', 'CalphadDB', 'Document Library', 'linked dara', 'NanoIndentation', 'NewCalphadDB', 'SamplesTest', 'some', 't4', 't5', 'Test', 'Test2', 'TestCreateNewRes']}

In [7]:
a_res = co_pr['TestProject']['AtomisticSimulation']
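Since '/'-separated paths are not supported, a small helper can split such a path and apply the `[...]` access level by level. The sketch below is only an illustration: `browse` is a hypothetical helper, and a nested `dict` stands in for the coscine interface so that the snippet runs without a connection.

```python
from functools import reduce


def browse(root, path):
    """Follow a '/'-separated path by indexing one level at a time."""
    parts = [part for part in path.split('/') if part]
    return reduce(lambda node, key: node[key], parts, root)


# Stand-in for `co_pr`; the real object is the coscine interface.
demo = {'TestProject': {'AtomisticSimulation': 'the resource'}}
resource = browse(demo, 'TestProject/AtomisticSimulation')
```

With the real interface, `browse(co_pr, 'TestProject/AtomisticSimulation')` would be equivalent to the chained access in the cell above, assuming every level supports `[...]` as shown there.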

Once we have browsed to the right folder to store our files/calculations in, we attach it to the storage interface of the project:

In [9]:
pr.storage_interface.attach('my_name_for_the_storage', a_res)

Coscine token:  ········


This will again ask you for the credentials to log in, since it opens a new connection to the storage to ensure that reloading will work.

## Make a new 'resource' <a name="newres"></a>
If you do not have a resource to attach, you may make a new one in your sub-project using `create_node`:

In [10]:
test_co_pr = co_pr['TestProject']

In [11]:
test_co_pr.create_node('SomeNewNode')

{'groups': [], 'nodes': []}

In [12]:
test_co_pr

{'groups': ['TestProject2', 'TestProject2'], 'nodes': ['ANewNode', 'aRes', 'AtomisticSimulation', 'CalphadDB', 'Document Library', 'linked dara', 'NanoIndentation', 'NewCalphadDB', 'SamplesTest', 'some', 'SomeNewNode', 't4', 't5', 'Test', 'Test2', 'TestCreateNewRes']}

In [13]:
test_co_pr.create_group?

Signature:
test_co_pr.create_group(
    project_name,
    display_name=None,
    project_description=None,
    principal_investigators=None,
    project_start=None,
    project_end=None,
    discipline=None,
    participating_organizations=None,
    project_keywords=None,
    metadata_visibility=None,
    grant_id

## Access storage <a name="access"></a>

Now, or also after a reload of the project, you may access the storage of the storage interface. It lists all storage locations known to it, their type and if they are connected or not:

In [14]:
pr.storage_interface.storage

Storage Access for ['my_name_for_the_storage(coscine, connected)'].

If you access a storage which is not connected, the connection will be established and you may be asked for the credentials.

In [15]:
a_res = pr.storage_interface.storage['my_name_for_the_storage']

Through the storage, you can access all data inside this folder:

In [16]:
a_res

{'groups': [], 'nodes': ['MyFileName', 'changed_key.txt', 'empty2.txt', 'empty3.txt', 'some.txt', 'some2.txt', 'test.h5']}

In [17]:
file = a_res['empty2.txt']

In [18]:
file.metadata

Metadata,Unnamed: 1
ID,some_upload
User,Niklas
Date,20.10.2021


In [19]:
file

VBox(children=(Output(),))

##  Handle Metadata and upload a file <a name="fileup"></a>
This section provides information on how metadata is handled and regular files (like notebooks) are uploaded to coscine.

The storage also provides you with a `metadata_template` which can/must be filled with metadata for the current folder.

In [20]:
# receive the metadata template for this resource
mdf = a_res.metadata_template

If a metadata form is simply displayed, it does not show empty metadata fields. This is nice for viewing the metadata of files, but it does not tell you which fields need to be filled in.


In [21]:
mdf

Metadata,Unnamed: 1


Printing the form will provide you with a full view of the scheme. The first column ('C') tells you whether the specific property is controlled, e.g. by restricting the entries to a specific vocabulary ('V').

In [22]:
print(mdf)

+---+----------+-------------------------+-------+
| C | Type     | Property                | Value |
+---+----------+-------------------------+-------+
|   | str      | ID*                     |       |
|   | str      | External/alias ID       |       |
|   | str      | User*                   |       |
|   | datetime | Date*                   |       |
|   | str      | Affiliation             |       |
|   | str      | DOIs                    |       |
| V | str      | Status                  |       |
|   | datetime | Last status update      |       |
|   | str      | Software IDs            |       |
|   | str      | Software environment ID |       |
|   | str      | Sample ID               |       |
|   | str      | Simulation type         |       |
|   | str      | Job submission commands |       |
|   | str      | Computer name           |       |
|   | str      | Node                    |       |
|   | str      | CPU info                |       |
|   | str      | GPU info      

The possible values for a vocabulary-controlled field can be listed with:

In [23]:
mdf.vocabulary('Status').keys()

['created',
 'initialized',
 'submitted',
 'running',
 'collect',
 'finished',
 'refresh',
 'suspended']
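Because wrong metadata is only rejected when the file is uploaded, it can be convenient to check a value against the vocabulary before assigning it. The following is a minimal sketch; `checked` is a hypothetical helper, and the list is copied from the output above so that the snippet runs without a connection.

```python
# Copied from the output of mdf.vocabulary('Status').keys() above.
STATUS_VOCABULARY = ['created', 'initialized', 'submitted', 'running',
                     'collect', 'finished', 'refresh', 'suspended']


def checked(value, allowed):
    """Return `value` unchanged if it is an allowed vocabulary entry."""
    if value not in allowed:
        raise ValueError(f"{value!r} is not one of {sorted(allowed)}")
    return value


status = checked('finished', STATUS_VOCABULARY)
```

With a live form, the second argument could be `mdf.vocabulary('Status').keys()`. Whether the form expects these keys as the assigned value is an assumption here.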

The form can be filled like a `dict` or with the fill method, which takes a dictionary.

In [24]:
mdf['ID'] = 'test'
mdf['User'] = 'Niklas'
mdf['Date'] = '11.04.2023'

In [25]:
mdf

Metadata,Unnamed: 1
ID,test
User,Niklas
Date,2023-11-04
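Note that the string '11.04.2023' is displayed as 2023-11-04, which suggests it was read month-first. If 11 April was meant, parsing the string with an explicit format removes the ambiguity. This is a minimal sketch using only the standard library. That the form accepts a `datetime` object is an assumption based on the 'datetime' type shown in the printed scheme.

```python
from datetime import datetime

# Parse a day.month.year string explicitly instead of relying on guessing.
date = datetime.strptime('11.04.2023', '%d.%m.%Y')
iso_date = date.date().isoformat()
```

The resulting `date` could then be assigned with `mdf['Date'] = date`.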


In [26]:
# just a dummy file that can be uploaded
with open('some.txt', 'w') as f:
    f.write('stuff')

Finally, the data file can be uploaded (this will throw an exception if metadata fields are wrong or missing):

In [27]:
a_res.upload_file(file='some.txt', metadata=mdf)

some.txt:   0%|          | 0.00/191 [00:00<?, ?B/s]
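To catch missing fields before the upload raises, the printed scheme can be inspected for empty entries. The sketch below parses a table in the layout shown by `print(mdf)` above. It assumes that the asterisk after a property name marks a mandatory field, and `missing_required` is a hypothetical helper. The sample table is shortened for illustration.

```python
def missing_required(table_text):
    """List properties marked with '*' whose value column is empty."""
    missing = []
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip('|').split('|')]
        # Skip border lines and the header row.
        if len(cells) != 4 or cells[2] == 'Property':
            continue
        prop, value = cells[2], cells[3]
        if prop.endswith('*') and not value:
            missing.append(prop.rstrip('*'))
    return missing


sample = """\
+---+----------+----------+-------+
| C | Type     | Property | Value |
+---+----------+----------+-------+
|   | str      | ID*      | test  |
|   | str      | User*    |       |
|   | datetime | Date*    |       |
| V | str      | Status   |       |
+---+----------+----------+-------+"""

still_missing = missing_required(sample)
```

For a live form, `missing_required(str(mdf))` would use the same text that `print(mdf)` shows.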

In [28]:
a_res.upload_file?

Signature:
a_res.upload_file(
    file,
    metadata: coscine.object.MetadataForm = None,
    filename=None,
)
Docstring: Upload the provided files to the storage
File:      ~/pyiron/software/pyiron_contrib/pyiron_contrib/generic/coscineIo.py
Type:      method

## Upload a job <a name="jobup"></a>

A job that is uploaded to a 'sfb1394/AtomisticSimulation' folder should be parsed for metadata automatically. Thus, for a job it should be sufficient to run:

In [29]:
job = pr['test']

In [30]:
job

{'groups': ['executable', 'input', 'output'], 'nodes': ['HDF_VERSION', 'NAME', 'TYPE', 'VERSION', 'job_id', 'server', 'status']}

In [31]:
a_res.upload_job(job)

test.h5:   0%|          | 0.00/508k [00:00<?, ?B/s]

In [32]:
a_res

{'groups': [], 'nodes': ['MyFileName', 'changed_key.txt', 'empty2.txt', 'empty3.txt', 'some.txt', 'some2.txt', 'test.h5']}

`upload_job` also accepts a form, which is either used as is or updated, depending on `update_form`. In addition, `upload_job` accepts a `dois` keyword to specify a paper resulting from this data:

In [33]:
a_res.upload_job?

Signature:
a_res.upload_job(
    job: pyiron_base.jobs.job.generic.GenericJob,
    form=None,
    update_form=False,
    dois=None,
)
Docstring:
Upload a pyiron job to this CoScInE resource

Args:
    job: job object from pyiron
    form: optional metadata form, required if the metadata mapping between job and resource type is unknown.
    update_form(bool): If true update given form, else use as is.
    dois(str): Optional DOI of papers from this data, overwrites doi in form!
File:      ~/pyiron/software/pyiron_contrib/pyiron_contrib/gene

Therefore, I am not sure what kind of side effects could arise!

# coscine package <a name="coscine"></a>

To have the full flexibility of what can be done on coscine from a notebook, you might want to have a look at the [coscine](https://git.rwth-aachen.de/coscine/community-features/coscine-python-sdk) package itself. Here, I just show a few commands to get projects and resources and to upload a file. Consult the coscine documentation for more.

In [45]:
from getpass import getpass
import coscine
from datetime import datetime

In [49]:
client = coscine.Client(token=getpass())

 ········


In [50]:
client.get_maintenance()

{'displayName': None,
 'url': None,
 'type': None,
 'body': None,
 'startsDate': None,
 'endsDate': None}

In [51]:
client.version

'0.9.2'

In [52]:
# get list of projects
client.projects()

[<coscine.project.Project at 0x150aade45430>,
 <coscine.project.Project at 0x150aaddb94f0>,
 <coscine.project.Project at 0x150aaddb9730>]

In [53]:
# get specific project
test_pr = client.project('TestProject')

In [54]:
print(test_pr)

+-------------------------------------------------------------------------+
|                           Project TestProject                           |
+-------------------------+-----------------------------------------------+
|         Property        |                     Value                     |
+-------------------------+-----------------------------------------------+
|            ID           |      e1983a56-ee35-413e-9a66-771196c0a089     |
|           Name          |                  TestProject                  |
|       Display Name      |                  TestProject                  |
|       Description       |                  TestProject                  |
| Principle Investigators |                 Niklas Siemer                 |
|       Disciplines       |             Materials Science 406             |
|                         |     Physical and Theoretical Chemistry 303    |
|                         | Chemical Solid State and Surface Research 302 |
|           

In [55]:
res = test_pr.resource('aRes')
print(res)

+-----------------------------------------------------------------------+
|                             Resource aRes                             |
+---------------------+-------------------------------------------------+
|       Property      |                      Value                      |
+---------------------+-------------------------------------------------+
|          ID         |       44349a11-9351-4287-994b-1187150c4955      |
|    Resource Name    |                       aRes                      |
|     Display Name    |                       aRes                      |
|     Description     |                 additional stuff                |
|         PID         |  21.11102/44349a11-9351-4287-994b-1187150c4955  |
|         Type        |                     rdsrwth                     |
|     Disciplines     |              Materials Science 406              |
|                     |      Physical and Theoretical Chemistry 303     |
|                     |  Chemical Soli

In [56]:
form = res.metadata_form()

In [57]:
print(form)

+---+----------+---------------------------------------------------+-------+
| C | Type     | Property                                          | Value |
+---+----------+---------------------------------------------------+-------+
|   | str      | ID*                                               |       |
|   | str      | External/alias ID                                 |       |
|   | str      | User*                                             |       |
|   | datetime | Date*                                             |       |
|   | str      | Affiliation                                       |       |
|   | str      | DOIs                                              |       |
|   | str      | Temperature [°C]                                  |       |
|   | str      | Relative humidity [%]                             |       |
|   | str      | Environmental gas                                 |       |
|   | str      | Operator                                          |       |

In [58]:
form['ID'] = 'any name'
form['User'] = 'Niklas'
form['Date'] = datetime.now()

In [59]:
print(form)

+---+----------+---------------------------------------------------+------------+
| C | Type     | Property                                          | Value      |
+---+----------+---------------------------------------------------+------------+
|   | str      | ID*                                               | any name   |
|   | str      | External/alias ID                                 |            |
|   | str      | User*                                             | Niklas     |
|   | datetime | Date*                                             | 2023-04-14 |
|   | str      | Affiliation                                       |            |
|   | str      | DOIs                                              |            |
|   | str      | Temperature [°C]                                  |            |
|   | str      | Relative humidity [%]                             |            |
|   | str      | Environmental gas                                 |            |
|   | str      |

In [60]:
res.upload('MyFileName', 'some.txt', form)

MyFileName:   0%|          | 0.00/193 [00:00<?, ?B/s]