### <center>A decoupled, modular and scriptable architecture for tools to curate data platforms<br>Supplementary Material IV</center>
# <center>Analysing the Reliability of Bioinformatics Resource Providers listed in identifiers.org using cmd-iaso</center>
### <center>Moritz Langenstein, Henning Hermjakob and Manuel Bernal Llinares<br>September 16, 2020</center>

You can launch this Jupyter Notebook online using

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/identifiers-org/cmd-iaso-analysis.git/main?filepath=Supplementary%20Material%20IV.ipynb)

We first download cmd-iaso and install it into a new virtual environment.

In [None]:
!git clone https://github.com/identifiers-org/cmd-iaso.git
!pip install virtualenv
!virtualenv venv
!venv/bin/pip install --upgrade pip
!venv/bin/pip install cmd-iaso/

Next, we import the standard-library modules used in this notebook.

In [None]:
import gzip
import json
import pickle
import shlex
import urllib.request

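Because cmd-iaso is installed in the virtual environment rather than in the notebook's kernel, we define a helper that runs cmd-iaso's JSON pretty-printer inside the virtual environment's interpreter.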

In [None]:
def print_json(obj):
    code = shlex.quote(f"from iaso.format_json import format_json; print(format_json({repr(obj)}, process_links=False))")
    
    !echo {code} | venv/bin/python3
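
The helper relies on two properties: the `repr` of a JSON-like Python object is itself a valid Python literal, and `shlex.quote` turns the generated source into a single shell word. The following is a minimal sketch of both properties, using a hypothetical `sample` object. It only holds for built-in literal types (dicts, lists, strings, numbers, booleans and `None`); objects of custom classes would not survive the round trip.

```python
import ast
import shlex

# Hypothetical JSON-like object, for illustration only.
sample = {"id": 416, "name": "it's a 'quoted' name", "flags": [True, None, 1.5]}

# repr() of built-in literal types is valid Python source ...
source = repr(sample)
assert ast.literal_eval(source) == sample

# ... and shlex.quote() makes the generated code a single shell word,
# even though it contains both single and double quotes.
code = f"print({source})"
quoted = shlex.quote(code)
assert shlex.split(quoted) == [code]
```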

We will demonstrate the analysis on the `JWS Online Model Repository at Amsterdam` resource provider, which has the resource ID 416 in the identifiers.org registry.

In [None]:
with urllib.request.urlopen('https://registry.api.identifiers.org/restApi/resources/416') as response:
    print_json(json.loads(response.read()))

We create two scraping jobs for the resource (this would normally be done using `> cmd-iaso jobs jobs.json`): the first uses a valid LUI, the second a randomly generated one.

In [None]:
with open('jobs.json', 'w') as file:
    json.dump([
        (416, 'curien', False, 'http://jjj.bio.vu.nl/models/curien'),
        (416, '7d_', True, 'http://jjj.bio.vu.nl/models/7d_')
    ], file)
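
Each job is a four-element record. Judging from the values, the entries appear to be the resource ID, the LUI, a flag marking randomly generated LUIs, and the URL to ping. The sketch below uses a hypothetical `Job` named tuple (the field names are assumptions, not cmd-iaso's own) to show that such tuples are serialised as JSON arrays and can be read back with named access.

```python
import json
from typing import NamedTuple


class Job(NamedTuple):
    # Field names are assumptions inferred from the job values.
    resource_id: int
    lui: str
    is_random: bool
    url: str


jobs = [
    Job(416, "curien", False, "http://jjj.bio.vu.nl/models/curien"),
    Job(416, "7d_", True, "http://jjj.bio.vu.nl/models/7d_"),
]

# json.dumps writes tuples (including named tuples) as JSON arrays.
serialised = json.dumps(jobs)

# Reading back yields plain lists, which we re-wrap for named access.
loaded = [Job(*entry) for entry in json.loads(serialised)]
random_luis = [job.lui for job in loaded if job.is_random]
print(random_luis)
```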

We create a dump folder and use cmd-iaso to perform the scraping.

In [None]:
!mkdir -p dump
!echo 'y' | venv/bin/cmd-iaso scrape jobs.json dump

We can now inspect the scraping dump, which contains the responses for both resource pings. Both requests were redirected and then failed with an SSL error.

In [None]:
with gzip.open('dump/pings_416.gz', 'rb') as file:
    print_json(pickle.load(file))
    print_json(pickle.load(file))
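
The dump file appears to store one pickled object per ping, written back to back in the same gzip stream, which is why `pickle.load` is called once per job. When the number of pings is not known in advance, one option is to read until the stream is exhausted. This is a minimal sketch with a hypothetical `iter_pickles` helper and stand-in data rather than a real dump:

```python
import gzip
import os
import pickle
import tempfile


def iter_pickles(path):
    """Yield every pickled object stored back to back in a gzip file."""
    with gzip.open(path, "rb") as file:
        while True:
            try:
                yield pickle.load(file)
            except EOFError:
                # pickle.load raises EOFError once no data is left.
                return


# Stand-in data: two concatenated pickles in one gzip file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pings_example.gz")
    with gzip.open(path, "wb") as file:
        pickle.dump({"lui": "curien"}, file)
        pickle.dump({"lui": "7d_"}, file)

    pings = list(iter_pickles(path))

print(len(pings))
```

As with any pickle data, this should only be used on files from a trusted source, such as a dump produced locally.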

We can now compress the dump into the structured findings file (the datamine), which cmd-iaso reads during curation.

In [None]:
!echo 'y' | venv/bin/cmd-iaso dump2datamine dump datamine.json

The datamine contains information about the environment in which the command was run, as well as information similar to the raw dump.

In [None]:
with open('datamine.json', 'r') as file:
    print_json(json.load(file))

We will now analyse the collected findings using the `scheme-only-redirect`, `ssl-error` and `dns-error` validators, and use cmd-iaso to print a summary of the errors found. TODO: Explain the validators and link to their GitHub source.

In [None]:
!echo 'y' | venv/bin/cmd-iaso curate --statistics start resources datamine.json --validate scheme-only-redirect --validate ssl-error --validate dns-error --discard-session
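
To illustrate what a scheme-only redirect could mean, the sketch below treats a redirect as scheme-only when the source and target URLs differ in nothing but their scheme (for example `http` to `https`). This reading is an assumption based on the validator's name, not a reproduction of cmd-iaso's implementation, and `is_scheme_only_redirect` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def is_scheme_only_redirect(source_url, target_url):
    """Return True if the two URLs differ only in their scheme."""
    source = urlsplit(source_url)
    target = urlsplit(target_url)
    # Index 0 of the split result is the scheme; the rest must be identical.
    return source.scheme != target.scheme and source[1:] == target[1:]


print(is_scheme_only_redirect(
    "http://jjj.bio.vu.nl/models/curien",
    "https://jjj.bio.vu.nl/models/curien",
))
```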

We can also enter the iterative curation mode to inspect each identified issue separately.

In [None]:
!echo 'end' | venv/bin/cmd-iaso curate --controller terminal --navigator terminal --informant terminal start resources datamine.json --validate scheme-only-redirect --validate ssl-error --validate dns-error --discard-session