# Introduction to VSC-PRC

This notebook provides an introduction to VSC-PRC, with basic examples of common iRODS operations such as transferring data and modifying metadata.

Before launching this notebook, make sure that:
* you have set up (VSC-)PRC as described in `README.rst`
* you have valid iRODS access tokens

## The VSCiRODSSession class

With the Python modules in place, you should be able to import the `VSCiRODSSession` class as follows:

In [None]:
from vsc_irods.session import VSCiRODSSession

This class is derived from PRC's `irods.iRODSSession` class, so you can still use it for everything PRC is capable of (see https://github.com/irods/python-irodsclient). Here, we will focus on the functionality that VSC-PRC adds.

As with `irods.iRODSSession`, the `vsc_irods.VSCiRODSSession` class is best instantiated using the `with` construct, which ensures that the session is cleanly terminated even if an error occurs. In addition to the keyword arguments for `irods.iRODSSession`, it accepts a `txt` argument, which specifies where the session's print output is directed; the default `'-'` refers to stdout.
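
To see why the `with` construct matters, here is a minimal sketch of the context-manager protocol it relies on. The `DemoSession` class is a made-up stand-in, not part of PRC or VSC-PRC; it only mimics the "open on entry, clean up on exit" behaviour and does not talk to iRODS.

```python
class DemoSession:
    """Hypothetical stand-in for a session object (not the VSC-PRC class)."""

    def __init__(self):
        self.is_open = False

    def __enter__(self):
        # Called when the 'with' block is entered
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called when the 'with' block is left, also when it raised an error
        self.is_open = False
        return False  # do not swallow the exception


demo = DemoSession()
try:
    with demo as session:
        print('inside the block, open:', session.is_open)
        raise RuntimeError('simulated failure during a transfer')
except RuntimeError as error:
    print('caught:', error)

print('after the block, open:', demo.is_open)
```

Even though the block raised an error, the cleanup code in `__exit__` has run by the time the last line is reached.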

## Transferring data to iRODS

Let's create a new iRODS collection `vsc-prc-intro` inside your iRODS home, and copy the whole `data` folder under `vsc-python-irodsclient/test/` to that location. This `data` folder mainly consists of a set of molecular geometries in XYZ format, for testing purposes.  

In [None]:
import os

# Path on your local file system
local_path = os.path.join(os.path.expandvars('$VSC_PRC_ROOT'), 'test/data')

# Path on the iRODS file system
# Note the tilde (~), referring to your irods_home
irods_path = '~/vsc-prc-intro'

with VSCiRODSSession(txt='-') as session:
    session.path.imkdir(irods_path)
    session.bulk.put(local_path, irods_path=irods_path, recurse=True, verbose=True)
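
One thing to watch out for in the cell above: `os.path.expandvars` leaves a `$VARIABLE` untouched when that variable is not set, so a missing `VSC_PRC_ROOT` only shows up later as a confusing "path not found" error. The sketch below shows one possible way to check this up front. The helper `resolve_data_path` and the `DEMO_*` variable names are made up for this illustration and are not part of VSC-PRC.

```python
import os
import tempfile


def resolve_data_path(env_var='VSC_PRC_ROOT'):
    """Return <root>/test/data, with <root> taken from the given environment variable.

    Hypothetical helper: raises early if the variable is unset
    or if the folder does not exist.
    """
    root = os.environ.get(env_var)
    if not root:
        raise RuntimeError(env_var + ' is not set')
    path = os.path.join(root, 'test', 'data')
    if not os.path.isdir(path):
        raise FileNotFoundError(path)
    return path


# An unset variable is returned as-is by expandvars
os.environ.pop('DEMO_UNSET_ROOT', None)
print(os.path.expandvars('$DEMO_UNSET_ROOT'))

# Demonstration with a temporary directory tree instead of a real checkout
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'test', 'data'))
    os.environ['DEMO_PRC_ROOT'] = root
    demo_path = resolve_data_path('DEMO_PRC_ROOT')
    print(demo_path)
del os.environ['DEMO_PRC_ROOT']
```

In the tutorial itself, `local_path = resolve_data_path()` could take the place of the `expandvars` line.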

## Searching on iRODS

We can use the `search.find()` method to list the destination collection on iRODS and make sure everything is there:

In [None]:
print('This is what we got on our side:')
for directory, subdirectories, files in os.walk(local_path):
    print(directory)
    for f in files:
        print(os.path.join(directory, f))

print('\nThis is what we got on iRODS:')
with VSCiRODSSession(txt='-') as session:
    for item in session.search.find(irods_path, types='d,f'):
        print(item)
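
Comparing the two listings by eye gets tedious for larger folders. As a rough sketch, the comparison could be automated by reducing both sides to relative paths and taking a set difference. The helpers `relative_listing` and `strip_collection` are made up for this illustration, and so are the remote paths and the `example.xyz` file name. We assume here that the items yielded by `search.find` can be turned into absolute iRODS path strings, which we have not verified.

```python
import os
import tempfile


def relative_listing(root):
    """Return all folders and files under root, relative to root's parent."""
    parent = os.path.dirname(os.path.abspath(root))
    entries = set()
    for directory, _subdirectories, files in os.walk(root):
        rel_dir = os.path.relpath(directory, parent)
        entries.add(rel_dir.replace(os.sep, '/'))
        for f in files:
            entries.add(os.path.join(rel_dir, f).replace(os.sep, '/'))
    return entries


def strip_collection(paths, collection):
    """Remove the leading collection path from each absolute path."""
    prefix = collection.rstrip('/') + '/'
    return {p[len(prefix):] for p in paths if p.startswith(prefix)}


# Build a small local tree to stand in for the 'data' folder
with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, 'data')
    os.makedirs(os.path.join(data, 'molecules'))
    for name in ('molecule_names.txt', os.path.join('molecules', 'example.xyz')):
        open(os.path.join(data, name), 'w').close()
    local = relative_listing(data)

# Made-up remote listing in which one file is missing
collection = '/zone/home/user/vsc-prc-intro'
remote = strip_collection([
    collection + '/data',
    collection + '/data/molecule_names.txt',
    collection + '/data/molecules',
], collection)

missing = sorted(local - remote)
print('missing on iRODS:', missing)
```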

## Modifying metadata

Next, we will add some metadata to all XYZ files in `~/vsc-prc-intro/data/molecules`:

In [None]:
avu = ('Creator', 'YourName')  # Attribute-Value pair

with VSCiRODSSession(txt='-') as session:
    session.bulk.metadata(irods_path + '/data/molecules/*.xyz', object_avu=avu, action='add', verbose=True)

So far, we have been passing simple [glob](https://docs.python.org/3/library/glob.html) patterns to the various operators in `session.bulk` to select collections and data objects based on their paths.
For more advanced selections, e.g. based on metadata, we can supply an [Iterator](https://wiki.python.org/moin/Iterator) instead, for example the one provided by `session.search.find` itself.

As an illustration, let's create an iterator for matching files that have the metadata we just added, and use it to remove that metadata:

In [None]:
with VSCiRODSSession(txt='-') as session:
    iterator = session.search.find(irods_path, object_avu=avu)
    session.bulk.metadata(iterator, object_avu=avu, action='remove', verbose=True)
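
A general Python point worth keeping in mind when working with iterators: an iterator can only be consumed once. The sketch below illustrates this with a plain generator over a made-up list of paths. `find_by_name` is a hypothetical helper that only mimics the idea of a search and does not use VSC-PRC.

```python
import fnmatch
import posixpath


def find_by_name(paths, pattern):
    """Yield the paths whose last component matches a glob pattern."""
    for path in paths:
        if fnmatch.fnmatchcase(posixpath.basename(path), pattern):
            yield path


# Made-up paths, for illustration only
catalogue = [
    '/zone/home/user/vsc-prc-intro/data/molecule_names.txt',
    '/zone/home/user/vsc-prc-intro/data/molecules/a.xyz',
    '/zone/home/user/vsc-prc-intro/data/molecules/b.xyz',
]

iterator = find_by_name(catalogue, '*.xyz')
first_pass = list(iterator)   # consumes the iterator
second_pass = list(iterator)  # nothing left to yield

print(len(first_pass), 'items in the first pass')
print(len(second_pass), 'items in the second pass')
```

So if you first loop over a search result to print it and then pass the same iterator on to a bulk operation, that operation will have nothing left to process. Creating a fresh iterator with a new search call avoids this.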

## Getting data from iRODS

Copying data objects and collections on iRODS to the local file system happens in a similar way using `session.bulk.get()`:

In [None]:
with VSCiRODSSession(txt='-') as session:
    iterator = session.search.find(irods_path, pattern='*.txt')
    session.bulk.get(iterator, local_path='.', verbose=True)

This should have transferred the only matching file, called `molecule_names.txt`.
In this case, this is also equivalent to:

In [None]:
with VSCiRODSSession(txt='-') as session:
    # Adding the 'force=True' option to overwrite the local ./molecule_names.txt file
    session.bulk.get(irods_path + '/*/*.txt', local_path='.', force=True, verbose=True)
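
Why only "in this case"? The sketch below assumes that the glob patterns follow the usual convention that `*` does not cross a `/`. This is an assumption; we have not checked the matching rules of VSC-PRC itself. It uses `pathlib` from the standard library on made-up absolute paths, so no iRODS connection is involved.

```python
from pathlib import PurePosixPath

collection = '/zone/home/user/vsc-prc-intro'  # made-up absolute path
pattern = collection + '/*/*.txt'

# Made-up candidates at different depths below the collection
candidates = [
    collection + '/molecule_names.txt',
    collection + '/data/molecule_names.txt',
    collection + '/data/molecules/molecule_names.txt',
]

# Each '*' matches exactly one path component
matches = [p for p in candidates if PurePosixPath(p).match(pattern)]
for path in matches:
    print(path)
```

Under that assumption, the pattern only selects `.txt` files exactly one collection below `~/vsc-prc-intro`. The two cells above therefore give the same result here because the only `.txt` file sits directly inside `data`.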

## Removing data on iRODS

We'll now clean up by removing the root collection for this tutorial (`~/vsc-prc-intro/`). Here, the `force=True` option removes the collection permanently, i.e. without first moving it to the trash.

In [None]:
with VSCiRODSSession(txt='-') as session:
    session.bulk.remove(irods_path, recurse=True, force=True, verbose=True)