# iBridges in compute tasks

In this notebook we show how to use iBridges to find data in iRODS by its metadata and use that data in a compute workflow.

In this example we do not store data on local disk. Instead, we stream the content of the data objects into memory, i.e. into a variable that the compute workflow uses. Similarly, we do not save the results to a file but stream them directly into a new data object on the iRODS server.
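The streaming idea can be tried without an iRODS connection. The following is only a minimal sketch: `io.BytesIO` stands in for an opened iRODS data object (an assumption made for illustration), and the helper names `read_text` and `write_text` are hypothetical, not part of iBridges.

```python
import io


def read_text(stream, encoding="utf-8"):
    """Read all bytes from a binary file-like object and decode them to str."""
    return stream.read().decode(encoding)


def write_text(stream, content, encoding="utf-8"):
    """Encode a str and write it to a binary file-like object; return bytes written."""
    return stream.write(content.encode(encoding))


# Stand-in for a data object opened for reading (assumption, no iRODS involved)
source = io.BytesIO(b"Alice was beginning to get very tired")
text = read_text(source)

# The "analysis" happens purely in memory
report = f"{len(text.split())} words"

# Stand-in for a data object opened for writing (assumption, no iRODS involved)
target = io.BytesIO()
written = write_text(target, report)
print(report, written)
```

Nothing touches the local file system here: the input lives in `text`, and the result goes straight from a variable into the target stream.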

## Prerequisites

- Access to an iRODS instance
- Some text files in iRODS labeled with the metadata key `author` and the metadata value `Lewis Carroll`

In [1]:
!ibridges list irods:my_books

/nluu12p/home/research-test-christine/my_books:
  /nluu12p/home/research-test-christine/my_books/AdventuresSherlockHolmes.txt
  /nluu12p/home/research-test-christine/my_books/AliceAdventuresInWonderLand.txt
  /nluu12p/home/research-test-christine/my_books/DonQuixote.txt
  /nluu12p/home/research-test-christine/my_books/Dracula.txt
  /nluu12p/home/research-test-christine/my_books/Frankenstein.txt
  /nluu12p/home/research-test-christine/my_books/Phantasmagoria.txt
  /nluu12p/home/research-test-christine/my_books/RobinsonCrusoe.txt
  /nluu12p/home/research-test-christine/my_books/TheHuntingOfTheSnark.txt
  /nluu12p/home/research-test-christine/my_books/ThroughTheLookingGlass.txt
  /nluu12p/home/research-test-christine/my_books/TravelsIntoSeveralRemoteNationsOfTheWorld.txt
  /nluu12p/home/research-test-christine/my_books/TwentyThousandLeaguesUnderTheSea.txt



## 1. Find the data in iRODS

In [None]:
from pprint import pprint
from ibridges.interactive import interactive_auth
from ibridges import search_data

In [None]:
session = interactive_auth()

In [None]:
KEY = 'author'
VALUE = 'Lewis Carroll'

In [None]:
data = search_data(session, key_vals={KEY: VALUE})
pprint(data)

## 2. Stream content into a variable

In [None]:
from ibridges import IrodsPath

In [None]:
text = ""

In [None]:
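for item in data:
    # Build the full iRODS path from the collection and data object names
    irods_path = IrodsPath(session, item['COLL_NAME'], item['DATA_NAME'])
    # Read the raw bytes from the server and append the decoded text
    with irods_path.dataobject.open('r') as obj_read:
        text += obj_read.read().decode()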

In [None]:
print(text[1700:1900])
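Reading a whole data object with a single `read()` holds all its bytes in memory at once. For larger objects, one option is to read in fixed-size chunks. The sketch below assumes the opened handle behaves like a standard binary file object that supports `read(size)`; `io.BytesIO` is used as a stand-in, and `iter_decoded_chunks` is a hypothetical helper. An incremental decoder is used so that a multi-byte UTF-8 character split across two chunks is still decoded correctly.

```python
import codecs
import io


def iter_decoded_chunks(stream, chunk_size=4096, encoding="utf-8"):
    """Yield decoded text pieces from a binary stream, reading chunk_size bytes at a time."""
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # The decoder buffers incomplete multi-byte sequences until the next chunk
        yield decoder.decode(chunk)
    # Flush whatever is still buffered; raises if the data ended mid-character
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Stand-in for an opened data object (assumption); the 'é' is split across chunks
sample = io.BytesIO("café au lait".encode("utf-8"))
decoded = "".join(iter_decoded_chunks(sample, chunk_size=4))
print(decoded)
```

Each yielded piece could also be passed to the analysis directly instead of being joined, which keeps memory use bounded by the chunk size.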

## 3. Do your analysis

In [None]:
from collections import Counter
import string

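def wordcount(text):
    """Print the total number of words in text and return a Counter of word frequencies."""
    # Strip punctuation from each whitespace-separated token
    words = [''.join(char for char in word
             if char not in string.punctuation) for word in text.split()]
    # Drop tokens that consisted only of punctuation and are now empty
    words = [word for word in words if word]
    print("Number of words:", len(words))
    unique_words_count = Counter(words)
    return unique_words_count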

In [None]:
result = wordcount(text)
print(f"Alice: {result['Alice']}")
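The returned `Counter` supports more than single lookups. As a small illustration on a made-up sentence (not the iRODS data), the sketch below lists the most frequent words. Lowercasing the tokens is an extra step introduced here, not something the notebook's `wordcount` does; with it, `The` and `the` would be counted together.

```python
from collections import Counter
import string

# Made-up sample text for illustration only
sample = "Alice! said Alice. The Queen said: off with her head"

# Translation table that deletes all ASCII punctuation characters
table = str.maketrans("", "", string.punctuation)

# Strip punctuation, lowercase, and drop tokens that end up empty
cleaned = (token.translate(table).lower() for token in sample.split())
words = [word for word in cleaned if word]

counts = Counter(words)
print("Number of words:", len(words))
print(counts.most_common(2))
```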

## 4. Write the results directly to iRODS

### Create a new empty data object

In [None]:
irods_path = IrodsPath(session, "wordcount_result.json")
obj = session.irods_session.data_objects.create(str(irods_path))
print(f"New object of size {irods_path.size}")

In [None]:
import json
with obj.open('w') as obj_write:
    obj_write.write(json.dumps(result).encode())

In [None]:
print(f"Object size after writing: {irods_path.size}")
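`json.dumps` accepts the `Counter` directly because `Counter` is a `dict` subclass. The sketch below shows the round trip in memory with made-up counts, so it is clear what a later reader of `wordcount_result.json` gets back: `json.loads` returns a plain `dict`, which can be wrapped in a `Counter` again.

```python
import json
from collections import Counter

# Made-up counts for illustration only
counts = Counter({"Alice": 3, "Queen": 2})

# Serialize to bytes, as done before writing to the data object
payload = json.dumps(counts).encode()

# Deserialize: json.loads gives a dict, Counter() restores the counting behavior
restored = Counter(json.loads(payload.decode()))
print(restored.most_common(1))
```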

### Add some descriptive metadata

In [None]:
from ibridges import MetaData
from datetime import datetime

In [None]:
meta = MetaData(irods_path.dataobject)
print(meta)

In [None]:
datetime.today()

In [None]:
meta.add('ISEARCH', KEY + '==' + VALUE)
meta.add('prov:SoftwareAgent', 'wordcount.py')
meta.add('prov:wasDerivedFrom', str(data))
meta.add('prov:actedOnBehalfOf', 'Christine')
meta.add('prov:generatedAtTime', datetime.now().strftime("%m/%d/%Y, %H:%M"))

In [None]:
print(meta)
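The provenance values above are free-form strings: `str(data)` stores the whole search result as one value, and the timestamp uses a month-first format. One possible alternative, sketched below under stated assumptions, is to derive one full path per source object and to use an ISO 8601 timestamp in UTC. The `hits` list is hypothetical sample data that mimics the `COLL_NAME` and `DATA_NAME` keys used earlier; no iBridges calls are made.

```python
from datetime import datetime, timezone

# Hypothetical search result with the same keys as used in the loop above
hits = [
    {"COLL_NAME": "/zone/home/user/my_books", "DATA_NAME": "Phantasmagoria.txt"},
    {"COLL_NAME": "/zone/home/user/my_books", "DATA_NAME": "TheHuntingOfTheSnark.txt"},
]

# One full path per source object instead of one long string for all of them
sources = [f"{hit['COLL_NAME']}/{hit['DATA_NAME']}" for hit in hits]

# Timezone-aware ISO 8601 timestamp, unambiguous and sortable as text
timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")

for source in sources:
    print("prov:wasDerivedFrom", source)
print("prov:generatedAtTime", timestamp)
```

Each printed pair could then be passed to `meta.add` as a separate entry, so that the derived object can be found by the path of any single source.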