# iBridges in compute tasks

In this notebook we show how to use iBridges to find data in iRODS by its metadata and how to use that data in a compute workflow.

In this example we do not store data on local disk. Instead, we stream the content of the data objects into memory, i.e. into a variable that the compute workflow uses. Similarly, we do not save the results to a file but stream them directly into a new data object on the iRODS server.

## Prerequisites

- Access to an iRODS instance
- Some textual data files labeled with the metadata key `author` and metadata value `Lewis Carroll`.

In [None]:
!ibridges list irods:my_books

## 1. Find the data in iRODS

In [None]:
from pprint import pprint
from ibridges.interactive import interactive_auth
from ibridges.search import search_data, MetaSearch

In [None]:
session = interactive_auth()

In [None]:
KEY = 'author'
VALUE = 'Lewis Carroll'

In [None]:
data = search_data(session, metadata=MetaSearch(key=KEY, value=VALUE))
pprint(data)

## 2. Stream content into a variable

In [None]:
from ibridges import IrodsPath

In [None]:
text = ""

In [None]:
for irods_path in data:
    # Open each data object for reading; read() returns bytes, so decode to text
    with irods_path.open('r') as handle:
        text += handle.read().decode()

In [None]:
print(text[1700:1900])
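
The loop above grows one string object by object. A minimal sketch of an alternative, assuming only that each handle is a binary file-like object with a `read()` method: collect the decoded pieces in a list and join them once. `read_all_text` is a hypothetical helper, and `io.BytesIO` stands in for the handles that `irods_path.open('r')` returns, so the sketch runs without an iRODS connection.

```python
import io


def read_all_text(handles, encoding="utf-8"):
    """Decode and concatenate the content of binary file-like objects."""
    parts = []
    for handle in handles:
        # read() returns bytes for a binary handle; decode them to str
        parts.append(handle.read().decode(encoding))
    return "".join(parts)


# In-memory stand-ins for opened data objects (illustrative content only)
fake_handles = [io.BytesIO(b"Alice was beginning "), io.BytesIO(b"to get very tired")]
combined = read_all_text(fake_handles)
print(combined)
```

Joining once avoids repeatedly copying a growing string, which can matter when many or large data objects are streamed.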

## 3. Do your analysis

In [None]:
from collections import Counter
import string

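def wordcount(text):
    """Return a Counter of the words in text, ignoring punctuation."""
    # Strip punctuation from every whitespace-separated token
    stripped = (''.join(char for char in word if char not in string.punctuation)
                for word in text.split())
    # Drop tokens that consisted only of punctuation, e.g. "--"
    words = [word for word in stripped if word]
    print("Number of words:", len(words))
    return Counter(words)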

In [None]:
result = wordcount(text)
print(f"Alice: {result['Alice']}")
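
The count above is case-sensitive, so `Alice` and `alice` are separate keys. As a hedged variation, not part of the original workflow, the sketch below lowercases every token before counting and lists the most frequent words. `wordcount_normalized` and the sample sentence are illustrative assumptions.

```python
from collections import Counter
import string


def wordcount_normalized(text):
    """Count words case-insensitively, ignoring punctuation and empty tokens."""
    # Translation table that deletes every punctuation character
    table = str.maketrans("", "", string.punctuation)
    words = [word.translate(table).lower() for word in text.split()]
    return Counter(word for word in words if word)


sample = "Alice! said Alice -- to the Cat. The cat grinned."
counts = wordcount_normalized(sample)
print(counts.most_common(3))
```

Whether to normalize case depends on the analysis: for counting a proper name such as `Alice`, the case-sensitive version may be the better fit.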

## 4. Write the results directly to iRODS

### Create a new data object and write the result

In [None]:
import json

# Serialize the Counter to JSON and write the encoded bytes straight to iRODS
irods_path = IrodsPath(session, "wordcount_result.json")
with irods_path.open('w') as obj_write:
    obj_write.write(json.dumps(result).encode())

In [None]:
print(f"New object of size {irods_path.size}")
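
To illustrate what is written and how it could be read back, here is a small sketch of the JSON round trip, run purely in memory. The counts are made up for illustration. With a real data object, the bytes would come from reading the object instead of from the `payload` variable.

```python
import json
from collections import Counter

counts = Counter({"Alice": 3, "Rabbit": 1})

# Serialize as in the cell above: JSON text first, then UTF-8 bytes
payload = json.dumps(counts).encode()

# Reading back: bytes -> str -> dict -> Counter
restored = Counter(json.loads(payload.decode()))
print(restored == counts)
```

A `Counter` is a `dict` subclass, which is why `json.dumps` accepts it directly. The JSON itself only stores a plain object, so the result has to be wrapped in `Counter` again after loading.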

### Add some descriptive metadata

In [None]:
from datetime import datetime

In [None]:
datetime.today()

In [None]:
irods_path.meta.add('ISEARCH', KEY + '==' + VALUE)
irods_path.meta.add('prov:SoftwareAgent', 'wordcount.py')
irods_path.meta.add('prov:wasDerivedFrom', str(data))
irods_path.meta.add('prov:actedOnBehalfOf', 'Christine')
irods_path.meta.add('prov:generatedAtTime', datetime.now().isoformat(timespec="minutes"))

In [None]:
print(irods_path.meta)
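
The provenance entries can also be assembled as plain key-value pairs before they are attached. The sketch below is one possible arrangement under stated assumptions: `build_provenance` is a hypothetical helper, the source path is invented, and the timestamp uses ISO 8601 in UTC. It records one `prov:wasDerivedFrom` entry per source object.

```python
from datetime import datetime, timezone


def build_provenance(key, value, source_paths, agent, user):
    """Return (key, value) metadata pairs describing how a result was derived."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    pairs = [
        ("ISEARCH", f"{key}=={value}"),
        ("prov:SoftwareAgent", agent),
        ("prov:actedOnBehalfOf", user),
        ("prov:generatedAtTime", timestamp),
    ]
    # One entry per source object keeps each metadata value short
    pairs.extend(("prov:wasDerivedFrom", str(path)) for path in source_paths)
    return pairs


pairs = build_provenance(
    "author", "Lewis Carroll",
    ["/zone/home/user/my_books/alice.txt"],  # hypothetical path
    "wordcount.py", "Christine",
)
for meta_key, meta_value in pairs:
    print(meta_key, meta_value)
```

Each pair could then be passed to `irods_path.meta.add(meta_key, meta_value)`, as in the cell above.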