# Python iRODS Client User Training

We will use the iRODS environment file to obtain a secure, longer-lived session in the Python iRODS Client (PRC). To set it up, connect to the KU Leuven iRODS portal (https://{yourZone}.irods.icts.kuleuven.be) and follow the instructions given there.

## Getting iRODS session

The connection script you should use depends on your operating system, because each one uses a different authentication mechanism.

### On Linux OS

In [None]:
import os
import ssl
from irods.session import iRODSSession

try:
    env_file = os.environ['IRODS_ENVIRONMENT_FILE']
except KeyError:
    env_file = os.path.expanduser('~/.irods/irods_environment.json')

ssl_context = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH, cafile=None, capath=None, cadata=None)
ssl_settings = {'ssl_context': ssl_context}
with iRODSSession(irods_env_file=env_file, **ssl_settings) as session:
    pass

### On Windows OS

In [2]:
import os, os.path
from irods.session import iRODSSession
env_file = os.getenv('IRODS_ENVIRONMENT_FILE', os.path.expanduser('~/.irods/irods_environment.json'))
with iRODSSession(irods_env_file=env_file) as session:
    pass
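Both cells above resolve the environment file in the same way: the IRODS_ENVIRONMENT_FILE variable wins, otherwise the default path under the home directory is used. The following is a minimal sketch of that lookup plus a sanity check of the file content. The helper names and the list of expected keys are assumptions made for illustration, not part of the PRC.

```python
import json
import os


def resolve_env_file(environ=None):
    """Return the iRODS environment file path, mirroring the cells above."""
    environ = os.environ if environ is None else environ
    default = os.path.expanduser("~/.irods/irods_environment.json")
    return environ.get("IRODS_ENVIRONMENT_FILE", default)


# Settings this sketch assumes a usable environment file should contain.
EXPECTED_KEYS = ("irods_host", "irods_port", "irods_user_name", "irods_zone_name")


def missing_settings(env_file):
    """Return the expected keys that are absent from the JSON environment file."""
    with open(env_file, encoding="utf-8") as handle:
        settings = json.load(handle)
    return [key for key in EXPECTED_KEYS if key not in settings]
```

Calling missing_settings(resolve_env_file()) before opening a session gives an early hint when the file downloaded from the portal is incomplete.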

## Working with Collections

You can instantiate a collection by its path:

In [3]:
coll = session.collections.get("/icts_demo/home/u0137480")
coll.path

'/icts_demo/home/u0137480'

You can check the available attributes and methods of an object with the built-in dir() function:

In [4]:
[ x for x in dir(coll) if not x.startswith('__') ]

['_meta',
 'data_objects',
 'id',
 'manager',
 'metadata',
 'move',
 'name',
 'path',
 'remove',
 'subcollections',
 'unregister',
 'walk']

For example, you can list the data objects in a collection of interest:

In [6]:
for obj in coll.data_objects:
    print(obj)

<iRODSDataObject 20276 test.txt>


You can use the walk() method to traverse a collection tree. It yields the requested collection and every collection below it, each together with its subcollections and data objects:

In [7]:
for item in coll.walk():
    print(item)

(<iRODSCollection 10270 b'u0137480'>, [<iRODSCollection 10371 b'training'>], [<iRODSDataObject 20276 test.txt>])
(<iRODSCollection 10371 b'training'>, [], [<iRODSDataObject 10374 01_iRODS-User-Training_Intro.pdf>])
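As the output shows, each item produced by walk() is a tuple of (collection, subcollections, data objects). The sketch below turns that into a compact per-level summary. Because it has to run without a server, it uses a hypothetical stand-in object (make_stub) that only imitates the path attribute and the walk() method; with a real session you would pass the collection itself.

```python
from types import SimpleNamespace


def summarize_tree(collection):
    """Return (path, number of subcollections, number of data objects) per level."""
    summary = []
    for coll, subcollections, data_objects in collection.walk():
        summary.append((coll.path, len(subcollections), len(data_objects)))
    return summary


def make_stub(path, children=(), objects=()):
    """Hypothetical stand-in for a collection, for illustration only."""
    stub = SimpleNamespace(path=path,
                           subcollections=list(children),
                           data_objects=list(objects))

    def walk():
        # Yield this level first, then descend into each child.
        yield stub, stub.subcollections, stub.data_objects
        for child in stub.subcollections:
            yield from child.walk()

    stub.walk = walk
    return stub


training = make_stub("/icts_demo/home/u0137480/training",
                     objects=["01_iRODS-User-Training_Intro.pdf"])
home = make_stub("/icts_demo/home/u0137480",
                 children=[training], objects=["test.txt"])
tree = summarize_tree(home)
```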


You can create or delete a collection. Let's create a new collection:

In [9]:
coll = session.collections.create("/icts_demo/home/u0137480/newCollection")
coll

<iRODSCollection 20278 b'newCollection'>

## Working with Data Objects

You can create a new data object:

In [10]:
obj = session.data_objects.create("/icts_demo/home/u0137480/test_data")

You can upload a file from your local machine to iRODS as a data object:

In [11]:
session.data_objects.put("alice.txt","/icts_demo/home/u0137480/alice.txt")

You can download an existing data object from iRODS to your local file system:

In [None]:
session.data_objects.get("/icts_demo/home/u0137480/test.txt", "/yourLocalPath/test.txt")

To overwrite an existing local file, you need to pass the force flag. Its keyword name, forceFlag, is also available as the constant FORCE_FLAG_KW in irods.keywords:

In [20]:
from irods.keywords import FORCE_FLAG_KW

In [12]:
session.data_objects.get("/icts_demo/home/u0137480/test_data", "C:\\Users\\u0137480\\Desktop\\Python\\test_data", forceFlag="")

<iRODSDataObject 20279 test_data>

You can remove a data object from iRODS by unlinking it:

In [13]:
session.data_objects.unlink("/icts_demo/home/u0137480/test_data", force=True)

You can copy a data object from one collection (location) to another within iRODS:

**Note**: If your destination doesn't exist yet, it will be created automatically.

In [14]:
session.data_objects.copy("/icts_demo/home/u0137480/test.txt", "/icts_demo/home/u0137480/newCollection/test.txt")

For any Python object that has a `__dict__` attribute, you can use the built-in vars() function to see useful information about the object you instantiated.

In [18]:
download = session.data_objects.get("/icts_demo/home/u0137480/test.txt", "C:\\Users\\u0137480\\Desktop\\Python\\test1.txt", forceFlag="")
vars(download)

{'manager': <irods.manager.data_object_manager.DataObjectManager at 0x17ee19d9e88>,
 'collection': <iRODSCollection 10270 b'u0137480'>,
 'id': 20276,
 'collection_id': 10270,
 'name': 'test.txt',
 'replica_number': 0,
 'version': None,
 'type': 'generic',
 'size': 0,
 'resource_name': 'netapp',
 'path': '/icts_demo/home/u0137480/test.txt',
 'owner_name': 'u0137480',
 'owner_zone': 'icts_demo',
 'replica_status': '1',
 'status': None,
 'checksum': None,
 'expiry': '00000000000',
 'map_id': 0,
 'comments': None,
 'create_time': datetime.datetime(2021, 11, 23, 15, 35, 28),
 'modify_time': datetime.datetime(2021, 11, 23, 15, 35, 28),
 'resc_hier': 'default;netapp',
 'resc_id': '10044',
 'replicas': [<irods.data_object.iRODSReplica netapp>],
 '_meta': None}

## Getting and Setting Permissions

The PRC makes it possible to get and set permissions on a collection or on a data object.

Let's list the ACLs set on a collection:

In [35]:
coll = session.collections.get("/icts_demo/home/u0137480/newCollection")
acl_coll = session.permissions.get(coll)[0]
acl_coll

<iRODSAccess own /icts_demo/home/u0137480/newCollection u0137480 icts_demo>

You can add a new permission or change an existing one. To add or modify ACLs, first import the relevant class:

In [36]:
from irods.access import iRODSAccess

You can give someone else 'read' access to your data object:

In [55]:
acl_dataObj = iRODSAccess("read", "/icts_demo/home/u0137480/test.txt", "u0116999", "icts_demo")
session.permissions.set(acl_dataObj)

To check the permissions that have been set, first instantiate the data object of interest:

In [56]:
data_obj = session.data_objects.get("/icts_demo/home/u0137480/test.txt")

You can then list its permissions:

In [57]:
acl_dataObj = session.permissions.get(data_obj)
acl_dataObj

[<iRODSAccess own /icts_demo/home/u0137480/test.txt u0137480 icts_demo>,
 <iRODSAccess read object /icts_demo/home/u0137480/test.txt u0116999 icts_demo>]
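When an object has many ACL entries, a small lookup table can be easier to read than the raw list. The sketch below builds one from the access_name, user_name and user_zone attributes of each entry. The ACL named tuple is a hypothetical stand-in so that the example runs without a server; with a real session you would pass the list returned by session.permissions.get().

```python
from collections import namedtuple


def acl_table(acls):
    """Map 'user#zone' to the access level granted on the object."""
    return {"{}#{}".format(acl.user_name, acl.user_zone): acl.access_name
            for acl in acls}


# Hypothetical stand-in for ACL entries, filled with the values from the output above.
ACL = namedtuple("ACL", "access_name path user_name user_zone")
sample_acls = [
    ACL("own", "/icts_demo/home/u0137480/test.txt", "u0137480", "icts_demo"),
    ACL("read object", "/icts_demo/home/u0137480/test.txt", "u0116999", "icts_demo"),
]
table = acl_table(sample_acls)
```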

## Reading and Writing Files

The PRC lets you work with data objects as file-like objects.

You can read a data object:

In [58]:
obj = session.data_objects.get("/icts_demo/home/u0137480/test.txt")
with obj.open('r+') as f:
    print(f.read())

b'This is a test file\n'


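You can write to a data object. Writing starts at the beginning of the object and overwrites the existing bytes. The loop that follows continues reading from the position where the write ended, so only the rest of the original content is printed: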

In [59]:
obj = session.data_objects.get("/icts_demo/home/u0137480/test.txt")
with obj.open('r+') as f:
    f.write(b'Hello\nWorld\n')
    for line in f:
        print(line)

b'st file\n'
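The output above can look surprising at first. The sketch below reproduces the same behaviour with an in-memory buffer from the standard library, on the assumption that the handle returned by obj.open('r+') keeps a single read/write position like any binary stream. The buffer is only a stand-in for the data object.

```python
import io

# Stand-in for the data object content before the write.
buffer = io.BytesIO(b"This is a test file\n")

# Writing 12 bytes overwrites the first 12 bytes and moves the position to 12.
buffer.write(b"Hello\nWorld\n")

# Iterating continues from the current position, not from the start.
remaining = list(buffer)

# Rewind to read the complete, updated content.
buffer.seek(0)
full_content = buffer.read()
```

To print everything that was just written, call seek(0) on the handle before reading.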


## Computing and Retrieving Checksums

By calling chksum() on a data object, you can compute and register its checksum:

In [50]:
obj = session.data_objects.get("/icts_demo/home/u0137480/test.txt")
obj.chksum()

'sha2:zDeTfxNmkZ4wC+eEg40PZIaE4pNP3mbNl+MzrlEjl2E='

If a checksum is already associated with the data object, you can read it from the checksum attribute:

In [52]:
obj = session.data_objects.get("/icts_demo/home/u0137480/test.txt")
print(obj.checksum)

sha2:zDeTfxNmkZ4wC+eEg40PZIaE4pNP3mbNl+MzrlEjl2E=
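A checksum is most useful when you compare it with one computed on your local copy. The sketch below assumes that the sha2: prefix denotes a base64-encoded SHA-256 digest; the helper names are hypothetical and only the standard library is used.

```python
import base64
import hashlib


def sha2_checksum(data):
    """Return 'sha2:' followed by the base64-encoded SHA-256 digest of bytes."""
    digest = hashlib.sha256(data).digest()
    return "sha2:" + base64.b64encode(digest).decode("ascii")


def sha2_checksum_of_file(path, chunk_size=1024 * 1024):
    """Same as sha2_checksum, but reads a local file in chunks."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha256.update(chunk)
    return "sha2:" + base64.b64encode(sha256.digest()).decode("ascii")
```

After a download you could compare sha2_checksum_of_file("/yourLocalPath/test.txt") with obj.checksum to confirm that the transfer was complete.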


## Working with metadata

If you check a data object that has no metadata attached, you will see an empty list. You can list all associated metadata with the items() method:

In [61]:
obj = session.data_objects.get("/icts_demo/home/u0137480/test.txt")
print(obj.metadata.items())

[<iRODSMeta 20277 irods::access_time 1637682744 None>]


You can add as many metadata entries in AVU (attribute, value, unit) format as you want, and you can associate more than one value with an attribute. Let's add AVUs to the test.txt data object:

In [62]:
obj.metadata.add('key1', 'value1', 'unit1')
obj.metadata.add('key1', 'value2')
obj.metadata.add('key2', 'value3')
obj.metadata.add('key2', 'value3', 'unit3')
obj.metadata.add('key3', 'value4')
print(obj.metadata.items())

[<iRODSMeta 20277 irods::access_time 1637682744 None>, <iRODSMeta 20285 key1 value1 unit1>, <iRODSMeta 20286 key1 value2 None>, <iRODSMeta 20287 key2 value3 None>, <iRODSMeta 20288 key2 value3 unit3>, <iRODSMeta 20289 key3 value4 None>]


You can also use Python's item indexing syntax to perform the equivalent of an imeta set, e.g. overwriting all AVUs with a name field of "key1" in a single update.

However, we first have to import the relevant class:

In [63]:
from irods.meta import iRODSMeta

In [64]:
new_meta = iRODSMeta('key1','value5','units2')
obj.metadata[new_meta.name] = new_meta
print(obj.metadata.items())

[<iRODSMeta 20277 irods::access_time 1637682744 None>, <iRODSMeta 20287 key2 value3 None>, <iRODSMeta 20288 key2 value3 unit3>, <iRODSMeta 20289 key3 value4 None>, <iRODSMeta 20290 key1 value5 units2>]


You can get all metadata entries that share a given attribute name with the get_all() method:

In [65]:
obj.metadata.get_all('key2')

[<iRODSMeta 20287 key2 value3 None>, <iRODSMeta 20288 key2 value3 unit3>]
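Since one attribute can carry several values, it is often handy to group a metadata listing by attribute name. The sketch below does this for any sequence of entries that expose name, value and units attributes. The AVU named tuple is a hypothetical stand-in so that the example runs without a server; with a real session you would pass obj.metadata.items().

```python
from collections import defaultdict, namedtuple


def group_avus(avus):
    """Group (value, units) pairs by attribute name, keeping the input order."""
    grouped = defaultdict(list)
    for avu in avus:
        grouped[avu.name].append((avu.value, avu.units))
    return dict(grouped)


# Hypothetical stand-in for metadata entries, using values from this section.
AVU = namedtuple("AVU", "name value units")
sample = [
    AVU("key2", "value3", None),
    AVU("key2", "value3", "unit3"),
    AVU("key3", "value4", None),
]
grouped = group_avus(sample)
```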

You can delete an attached metadata entry with the remove() method. Here you should specify the AVU you want to remove:

In [66]:
obj.metadata.remove('key1', 'value5', 'units2')
obj.metadata.items()

[<iRODSMeta 20277 irods::access_time 1637682744 None>,
 <iRODSMeta 20287 key2 value3 None>,
 <iRODSMeta 20288 key2 value3 unit3>,
 <iRODSMeta 20289 key3 value4 None>]

However, if you want to remove all existing metadata on an object at once, you can call the remove_all() method without arguments:

In [67]:
obj.metadata.remove_all()
obj.metadata.items()

[]

## Atomic operations on metadata

The PRC allows a group of metadata add and remove operations to be performed transactionally, within a single call to the server. In other words, you can apply metadata operations atomically.

To work with atomic operations, first import the relevant classes:

In [68]:
from irods.meta import iRODSMeta, AVUOperation

Now you can add more than one AVU and also remove metadata in the same call.

In [76]:
obj.metadata.apply_atomic_operations( AVUOperation(operation='remove', avu=iRODSMeta('attr1','val1','unit1')),
                                       AVUOperation(operation='add', avu=iRODSMeta('attr3','val3')),
                                       AVUOperation(operation='add', avu=iRODSMeta('attr2','val2','unit2')),
                                       AVUOperation(operation='remove', avu=iRODSMeta('attr2','val2','unit2')) )
obj.metadata.items()

[<iRODSMeta 20291 attr3 val3 None>]

You can also pass a pre-built list of AVUOperations using Python's f(*args_list) syntax. For example, this makes it possible to use the atomic metadata API to very quickly remove all AVUs from an object.

Let's first add more AVUs:

In [70]:
obj.metadata.apply_atomic_operations(AVUOperation(operation='add', avu=iRODSMeta('attr1','val1')), AVUOperation(operation='add', avu=iRODSMeta('attr2','val2','unit2')), )
obj.metadata.items()

[<iRODSMeta 20291 attr3 val3 None>,
 <iRODSMeta 20292 attr2 val2 unit2>,
 <iRODSMeta 20293 attr1 val1 None>]

Now, let's remove all attached metadata:

In [71]:
avus_on_object = obj.metadata.items()
obj.metadata.apply_atomic_operations( *[AVUOperation(operation='remove', avu=i) for i in avus_on_object] )
obj.metadata.items()

[]
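The same pre-built list approach can be used to bring an object's metadata to a desired state in one atomic call. The sketch below only computes the plan, as plain (operation, AVU) pairs, so that it runs without a server. The function name and the tuple representation are assumptions made for illustration.

```python
def plan_avu_changes(current, desired):
    """Return ('remove'|'add', (name, value, units)) pairs that turn current into desired."""
    current_set, desired_set = set(current), set(desired)
    # Remove what is no longer wanted, then add what is missing.
    plan = [("remove", avu) for avu in current if avu not in desired_set]
    plan += [("add", avu) for avu in desired if avu not in current_set]
    return plan


current = [("attr3", "val3", None), ("attr2", "val2", "unit2")]
desired = [("attr2", "val2", "unit2"), ("attr1", "val1", None)]
plan = plan_avu_changes(current, desired)
```

With a real session, each pair could then be turned into AVUOperation(operation=op, avu=iRODSMeta(*avu)) and the resulting list passed to apply_atomic_operations() with the * syntax shown above.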

## How to make queries

The PRC offers different query options that you can use depending on your needs. You may use these queries in your scripts to manage your research data easily.

First we will make a general query based on the Collection and DataObject models, so we import those classes:

In [72]:
from irods.models import Collection, DataObject

In [73]:
query = session.query(Collection.name, DataObject.name, DataObject.size)
for result in query:
    print('{}/{} size={}'.format(result[Collection.name], result[DataObject.name], result[DataObject.size]))

/icts_demo/home/u0137480/alice.txt size=74703
/icts_demo/home/u0137480/test.txt size=20
/icts_demo/home/u0137480/newCollection/test.txt size=12
/icts_demo/home/u0137480/training/01_iRODS-User-Training_Intro.pdf size=2068313


Let's now make a query based on criteria that we already know from the metadata provided earlier. First we have to import the relevant classes. We will query data objects and their metadata, and filter according to the criteria that we specify:

In [74]:
from irods.column import Criterion
from irods.models import DataObject, DataObjectMeta, Collection, CollectionMeta

In [77]:
# Match the attribute name exactly and the value by prefix
results = session.query(DataObject, DataObjectMeta).filter(
    Criterion('=', DataObjectMeta.name, 'attr3')).filter(
    Criterion('like', DataObjectMeta.value, 'val%'))
for item in results:
    print(item[DataObject.name], item[DataObjectMeta.name], item[DataObjectMeta.value], item[DataObjectMeta.units])

test.txt attr3 val3 None
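The 'like' criterion matches values against a pattern with wildcards. The sketch below emulates this locally, assuming SQL-style semantics where % matches any sequence of characters and _ matches exactly one character. It is an illustration of the pattern syntax, not the matching code that the server runs, and the helper names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE-style pattern into a compiled regular expression."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")   # any sequence, including an empty one
        elif char == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the LIKE-style pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


matches = [v for v in ["val3", "value1", "eval", "va"] if like(v, "val%")]
```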


Let's now query the number and total size of the data objects that you own:

In [78]:
query = session.query(DataObject.owner_name).count(DataObject.id).sum(DataObject.size)
print(query.execute())

+--------------+-----------+-----------+
| D_OWNER_NAME | D_DATA_ID | DATA_SIZE |
+--------------+-----------+-----------+
| u0137480     | 4         | 2143048   |
+--------------+-----------+-----------+
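The aggregate result can be cross-checked against the per-object listing printed by the earlier general query. The sketch below copies the four rows from that output into plain tuples and recomputes the count and the total size locally, along with a per-collection breakdown. The variable names are chosen for illustration.

```python
from collections import defaultdict

# (collection, data object, size) rows copied from the general query output.
rows = [
    ("/icts_demo/home/u0137480", "alice.txt", 74703),
    ("/icts_demo/home/u0137480", "test.txt", 20),
    ("/icts_demo/home/u0137480/newCollection", "test.txt", 12),
    ("/icts_demo/home/u0137480/training", "01_iRODS-User-Training_Intro.pdf", 2068313),
]

object_count = len(rows)
total_size = sum(size for _, _, size in rows)

# Break the total down per collection.
size_per_collection = defaultdict(int)
for collection, _, size in rows:
    size_per_collection[collection] += size
```

Both numbers agree with the D_DATA_ID and DATA_SIZE columns of the table above.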
