# Indra experiments

The aim here is to evaluate Indra as a basis for identifying claims in medical literature outside molecular biology, especially literature on clinical trials.


## Document ETL

The starting point is open-access documents. The cells below retrieve one article and print its abstract, stripped of references:




In [1]:
from lxml import etree
import requests
import json
from indra.literature import get_full_text
import os
import csv

#print(os.path.abspath(os.path.curdir))
with open('./env.csv') as f:
    r = csv.reader(f, delimiter=' ')
    for k, v in r:
        os.environ[k] = v
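
The loop above raises an unpacking error on a blank line or on a row with more than two fields. A minimal sketch of a more tolerant parser for the same space-delimited layout follows; `parse_env` and the sample keys and values are hypothetical, chosen only for illustration.

```python
import csv
import io


def parse_env(lines):
    """Parse space-delimited 'KEY value' rows, skipping malformed rows."""
    env = {}
    for row in csv.reader(lines, delimiter=" "):
        if len(row) != 2:  # blank line or unexpected field count
            continue
        key, value = row
        env[key] = value
    return env


# Hypothetical content standing in for env.csv
sample = io.StringIO("API_KEY abc123\n\nEXAMPLE_URL http://localhost:9000\n")
parsed = parse_env(sample)
print(parsed)
```

The returned dict could then be passed to `os.environ.update(...)` in place of the per-row assignment.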


In [2]:
doi = "10.12688/f1000research.16369.1"
pmid = '30631430'
pmcid = 'PMC6281014'
pdflink = None #'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6281014/pdf/f1000research-7-17879.pdf'

In [3]:
(t, ttype) = get_full_text(str(pmid), 'pmid') # or (doi, 'doi')
nsmap={'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
 'ali': 'http://www.niso.org/schemas/ali/1.0/',
 'mml': 'http://www.w3.org/1998/Math/MathML',
 'xlink': 'http://www.w3.org/1999/xlink',
 'j': 'https://jats.nlm.nih.gov/ns/archiving/1.2/'}


if ttype == 'pmc_oa_xml':
    tx = etree.fromstring(t.encode('utf-8'))
    with open(f"{pmcid}.xml", 'w') as f:
        f.write(t)
    ax = tx.xpath('//j:abstract', namespaces=nsmap)[0]
    abstract = ''.join(ax.itertext()).strip()
    if pmcid:
        if not pdflink:
            pdflinks=tx.xpath("//j:self-uri[@content-type='pdf']/@xlink:href", namespaces=nsmap)
            if pdflinks:
                pdflink = pdflinks[0]
        if pdflink:
            source = f"https://www.ncbi.nlm.nih.gov/pmc/articles/{pmcid}/pdf/{pdflink}"
            r = requests.get(source)
            if r.ok:  # actually get a 403 here...
                with open(f"{pmcid}.pdf", 'wb') as f:
                    f.write(r.content)  # r.raw is a stream object, not bytes
            else:
                print(f"open {source} and save it as {pmcid}.pdf\n")
elif ttype == 'abstract':
    abstract = t
print(abstract)


open https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6281014/pdf/f1000research-7-17879.pdf and save it as PMC6281014.pdf

Atherosclerotic renovascular disease (ARVD) is an unresolved therapeutic dilemma despite extensive pre-clinical and clinical studies. The pathophysiology of the disease has been widely studied, and many factors that may be involved in progressive renal injury and cardiovascular risk associated with ARVD have been identified. However, therapies and clinical trials have focused largely on attempts to resolve renal artery stenosis without considering the potential need to treat the renal parenchyma beyond the obstruction. The results of these trials show a staggering consistence: although nearly 100% of the patients undergoing renal angioplasty show a resolution of the vascular obstruction, they do not achieve significant improvements in renal function or blood pressure control compared with those patients receiving medical treatment alone. It seems that we may need to ta
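
The abstract extraction step can be tried in isolation, without network access. The sketch below uses the standard-library `xml.etree.ElementTree` instead of `lxml`, and assumes the same JATS namespace URI as the `nsmap` above. `extract_abstract` and the sample XML are hypothetical and only mirror the shape of the step; unlike indexing with `[0]`, the helper returns `None` when no abstract element is found.

```python
import xml.etree.ElementTree as ET

# Same namespace URI as the 'j' entry of nsmap in the cell above
JATS_NS = {"j": "https://jats.nlm.nih.gov/ns/archiving/1.2/"}


def extract_abstract(xml_text):
    """Return the abstract as whitespace-normalized text, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find(".//j:abstract", JATS_NS)
    if node is None:
        return None
    # itertext() walks nested elements, so inline markup is flattened
    return " ".join("".join(node.itertext()).split())


# Hypothetical miniature document, not real PMC output
SAMPLE = """<article xmlns="https://jats.nlm.nih.gov/ns/archiving/1.2/">
  <front><article-meta>
    <abstract>
      <p>First sentence.</p>
      <p>Second <italic>sentence</italic>.</p>
    </abstract>
  </article-meta></front>
</article>"""

print(extract_abstract(SAMPLE))
```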

In [11]:
from indra.sources import reach
rp_reach=reach.process_text(abstract, offline=True)
with open(f"{pmcid}.reach.json", 'w') as f:
    json.dump(rp_reach.tree.data, f)
if rp_reach.statements:
    with open(f"{pmcid}.reach.indra.json", 'w') as f:
        json.dump([s.to_json() for s in rp_reach.statements], f)
print(rp_reach.statements)

[]


In [12]:
from indra.sources import eidos
rp_eidos=eidos.process_text(abstract, webservice="http://localhost:9000")
with open(f"{pmcid}.eidos.json", 'w') as f:
    json.dump(rp_eidos.doc.tree.data, f)
if rp_eidos.statements:
    with open(f"{pmcid}.eidos.indra.json", 'w') as f:
        json.dump([s.to_json() for s in rp_eidos.statements], f)
print(rp_eidos.statements)

[Influence(Event(trials), Event(show staggering consistence % patients undergoing renal angioplasty show resolution vascular obstruction do achieve renal function blood pressure control compared patients receiving medical treatment)), Event(pre-clinical clinical), Event(pathophysiology disease has been), Event(many factors), Event(tools), Event(treatment ARVD)]
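
The Eidos result above is one Influence plus five standalone Events. To compare readers across documents, a tally by statement type may be handy. This is a minimal sketch, assuming each serialized statement is a dict with a `type` key as written by `to_json()`. The sample records are hypothetical stand-ins that keep only that key, and `count_statement_types` is an illustrative helper, not part of Indra.

```python
from collections import Counter


def count_statement_types(stmts_json):
    """Tally serialized statements by their 'type' field."""
    return Counter(s.get("type", "unknown") for s in stmts_json)


# Hypothetical stand-ins matching the shape of the output above:
# one Influence and five Events, all other fields omitted
sample = [{"type": "Influence"}] + [{"type": "Event"}] * 5
counts = count_statement_types(sample)
print(counts.most_common())  # [('Event', 5), ('Influence', 1)]
```

For a saved file such as `PMC6281014.eidos.indra.json`, the input list would come from `json.load`.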


In [None]:
from indra.sources import sparser
# The local Sparser build is buggy, but the official build also returns empty results.
# Remove all hyphens from the abstract before processing.
ab = abstract.replace("-", "")
print(ab)
rp_sparser=sparser.process_text(ab)
if rp_sparser and rp_sparser.statements:
    with open(f"{pmcid}.sparser.json", 'w') as f:
        json.dump(rp_sparser.doc.tree.data, f)
    if rp_sparser.statements:  # was `rp`, an undefined name
        with open(f"{pmcid}.sparser.indra.json", 'w') as f:
            json.dump([s.to_json() for s in rp_sparser.statements], f)
print(rp_sparser)
