# Python TERMite toolkit - TExpress

We provide a Python library for making calls to our NER engine, TERMite, as well as to the TExpress module for defining more complex semantic patterns. The library also enables post-processing of the JSON returned from such requests. This notebook assumes that you've read the example TERMite notebook; it walks you through making a TExpress call and post-processing the JSON output.

## Example call to TExpress

The toolkit can also be used to make TExpress calls to identify patterns and extract biomedical relationships. Using TExpress with the toolkit is easy: simply ```import texpress``` from the ```termite_toolkit``` and make a call.

A simple TExpress call is made up of:
* the TERMite API endpoint
* the pattern you wish to search for - this can be created in the TERMite UI
* a TExpress request
* request execution

The example below makes a TExpress call and prints the result to the screen.

In [3]:
from pprint import pprint
from termite_toolkit import texpress

# specify termite API endpoint
termite_home = "http://localhost:9090/termite"

# specify the pattern you wish to search for - this can be created in the TERMite UI
pattern = ":(INDICATION):{0,5}:(GENE)"

t = texpress.TexpressRequestBuilder()

# add each setting to the TExpress request individually
t.set_url(termite_home)
t.set_text("sildenafil citrate macrophage colony stimulating factor influenza")
t.set_subsume(True)
t.set_input_format("txt")
t.set_output_format("json")
t.set_allow_ambiguous(False)
t.set_pattern(pattern)

# execute the request
texpress_response = t.execute(display_request=False)

pprint(texpress_response)

{'RESP_META': {'CONID': '127.0.0.1/68',
               'HTTP_CODE': '200',
               'INPUT_SIZE': 65,
               'JSON_PRODUCER': 'EFFICIENT',
               'REQID': 'd70829d1-bc8f-44ce-a2ee-39a19321a43a-12497',
               'RUNTIME_OPTIONS': {'_termitesys.exetermite': 'false',
                                   '_termitesys.exetexpress': 'true',
                                   'alwaysAdd': 'false',
                                   'reverse': 'true',
                                   'texpressAny': 'DBSNP,DRUG,INDICATION,PROTYP,BIOPROC,ANAT,CELLLINE,CELLTYP,COMPANY,BIOCHEM,LABPROC,GENE,GOONTOL,CLINPROC,HPO,ADVENT,DRUGTYP,TECH,PKPD,SBIO,CHEMREC,CHEMMETH,SPMF,DEVICE,MIRNA,MPATH,CLINMES,MDRACUTEAE,MDRACUTEAE,CLINMES,MDRAELLY',
                                   'tx.ambig': 'false'},
               'TERMITE_RUNTIME': 'default',
               'TERMITE_VERS': '6.3.17',
               'Timing_msec_TOTAL': '60'},
 'RESP_PAYLOAD': {},
 'RESP_TEXPRESS': {'Termite_Doc_d70829d

For more information on the TExpress JSON results [click here](https://help.scibite.com/a/solutions/articles/4000021813-anatomy-of-a-texpress-result-server-).
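
Before post-processing a response, it can help to check its metadata. The sketch below works on a trimmed copy of the `RESP_META` block printed above, so it runs without a TERMite server. The helper name `summarise_meta` is hypothetical and not part of the toolkit, and the code assumes the response is a plain dictionary with the keys shown in that output.

```python
# Trimmed copy of the response printed above (assumed to be a plain dict).
sample_response = {
    "RESP_META": {
        "HTTP_CODE": "200",
        "INPUT_SIZE": 65,
        "RUNTIME_OPTIONS": {"reverse": "true", "tx.ambig": "false"},
        "TERMITE_VERS": "6.3.17",
        "Timing_msec_TOTAL": "60",
    },
    "RESP_PAYLOAD": {},
}


def summarise_meta(response):
    """Hypothetical helper: pull a few fields out of RESP_META."""
    meta = response.get("RESP_META", {})
    return {
        # HTTP_CODE and Timing_msec_TOTAL are strings in the output above
        "ok": meta.get("HTTP_CODE") == "200",
        "version": meta.get("TERMITE_VERS"),
        "runtime_ms": int(meta.get("Timing_msec_TOTAL", 0)),
        "options": meta.get("RUNTIME_OPTIONS", {}),
    }


summary = summarise_meta(sample_response)
print(summary["ok"], summary["version"], summary["runtime_ms"])
```

With a live server, the same function could be applied to `texpress_response` from the cell above, provided the response has the same shape.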

As with TERMite, a TExpress call can be reduced to two steps: define the options, then annotate. The example below annotates a zipped MEDLINE sample file:


In [6]:
from termite_toolkit import texpress
import os

termite_home = "http://localhost:9090/termite"
# Locate the parent of the current working directory. The string "__file__" is deliberate:
# notebooks do not define __file__, so abspath() resolves the literal name against the working directory.
parentDir = os.path.dirname(os.path.dirname(os.path.abspath("__file__")))
input_file = os.path.join(parentDir, 'sample_scripts/medline_sample.zip')
# TExpress options: input format, output format, pattern to match, and extra runtime options
options = {"format": "medline.xml", "output": "json", "pattern": ":(INDICATION):{0,5}:(GENE)",
           "opts": "reverse=false"}

texpress_json_response = texpress.annotate_files(termite_home, input_file, options)
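
When the same call is repeated for several patterns, the options dictionary can be built by a small function. This is only a sketch: `build_texpress_options` is a hypothetical helper, not part of `termite_toolkit`, and it simply reproduces the keys used in the cell above (`format`, `output`, `pattern`, `opts`).

```python
def build_texpress_options(pattern, input_format="medline.xml", reverse=False):
    """Hypothetical helper: return an options dict shaped like the one above."""
    return {
        "format": input_format,
        "output": "json",
        "pattern": pattern,
        # mirrors the "reverse=false" string used in the cell above
        "opts": "reverse=" + str(reverse).lower(),
    }


options = build_texpress_options(":(INDICATION):{0,5}:(GENE)")
print(options)
```

The resulting dictionary could then be passed to `texpress.annotate_files(termite_home, input_file, options)` exactly as in the cell above.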

# TExpress toolkit library

The standard JSON output isn't very human-friendly, so we've added functions for parsing the JSON and doc.JSONx outputs. The parsed output can be returned either as a dictionary or as a pandas DataFrame.

In [12]:
texpress.get_entity_hits_from_json(texpress_response)

{'USR_3[R]': [{'doc_id': 'Termite_Doc_d70829d1-bc8f-44ce-a2ee-39a19321a43a-10967',
   'entities': ['GENE#CSF1#colony stimulating factor 1',
    'INDICATION#D007251#Influenza, Human'],
   'original_fragment': 'macrophage colony stimulating factor influenza',
   'conf': 3}]}
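
The entity strings in this dictionary appear to follow a `TYPE#ID#name` layout. As a minimal sketch, assuming that layout holds for every hit, the dictionary can be flattened into one row per entity. The variable `entity_hits` is a copy of the output above, and `split_entity` is a hypothetical helper rather than a toolkit function.

```python
# Copy of the dictionary printed above, so the sketch runs without a TERMite server.
entity_hits = {
    "USR_3[R]": [
        {
            "doc_id": "Termite_Doc_d70829d1-bc8f-44ce-a2ee-39a19321a43a-10967",
            "entities": [
                "GENE#CSF1#colony stimulating factor 1",
                "INDICATION#D007251#Influenza, Human",
            ],
            "original_fragment": "macrophage colony stimulating factor influenza",
            "conf": 3,
        }
    ]
}


def split_entity(entity):
    """Split 'TYPE#ID#name' into its parts (layout assumed from the output above)."""
    entity_type, entity_id, name = entity.split("#", 2)
    return {"type": entity_type, "id": entity_id, "name": name}


# One row per matched entity, keeping the pattern and document it came from.
rows = []
for pattern_id, hits in entity_hits.items():
    for hit in hits:
        for entity in hit["entities"]:
            row = {"pattern_id": pattern_id, "doc_id": hit["doc_id"]}
            row.update(split_entity(entity))
            rows.append(row)

for row in rows:
    print(row["pattern_id"], row["type"], row["id"], row["name"])
```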

In [8]:
texpress.get_texpress_dataframe(texpress_json_response).head()

| | docID | patternID | originalFragment | matchEntities | originalSentence | sentence | subsumed |
|---|---|---|---|---|---|---|---|
| 0 | 25805890 | USR_4 | NOD2 and ATG16L1 mutations, 5 ulcerative colit... | [GENE#NOD2, GENE#ATG16L1, INDICATION#D003093] | Peripheral blood MDM were obtained from 24 CD ... | 3 | False |
| 1 | 26520163 | USR_4 | anti-tumour necrosis factor alpha [TNFα | [INDICATION#D002277, GENE#TNF] | Most clinical trial data indicate that the ris... | 7 | False |
| 2 | 24793818 | USR_4 | KV1.3 potassium channel correlates with pro-in... | [GENE#KCNA3, INDICATION#D007249] | Expression of T-cell KV1.3 potassium channel c... | 0 | False |
| 3 | 24793818 | USR_4 | KV1.3 and KCa3.1 in the inflamed mucosa | [GENE#KCNA3, GENE#KCNN4, INDICATION#D052016] | It is unknown if KV1.3 and KCa3.1 in the infla... | 3 | False |
| 4 | 24793818 | USR_4 | KV1.3 and KCa3.1, immune cell markers, and pro... | [GENE#KCNA3, GENE#KCNN4, INDICATION#D007249] | Protein and mRNA expression of KV1.3 and KCa3.... | 6 | False |
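
A common next step is to count how often each gene and indication are matched together. The sketch below does this with the standard library on three rows copied from the dataframe above. It assumes that `matchEntities` holds a list of `TYPE#ID` strings; with a real dataframe, records of this shape could be obtained with pandas' `DataFrame.to_dict("records")`.

```python
from collections import Counter
from itertools import product

# Rows 0, 2 and 4 of the dataframe above, reduced to the two columns used here.
records = [
    {"docID": 25805890, "matchEntities": ["GENE#NOD2", "GENE#ATG16L1", "INDICATION#D003093"]},
    {"docID": 24793818, "matchEntities": ["GENE#KCNA3", "INDICATION#D007249"]},
    {"docID": 24793818, "matchEntities": ["GENE#KCNA3", "GENE#KCNN4", "INDICATION#D007249"]},
]

pair_counts = Counter()
for record in records:
    genes, indications = [], []
    for entity in record["matchEntities"]:
        entity_type, entity_id = entity.split("#", 1)
        if entity_type == "GENE":
            genes.append(entity_id)
        elif entity_type == "INDICATION":
            indications.append(entity_id)
    # every gene in a match is paired with every indication in the same match
    pair_counts.update(product(genes, indications))

for (gene, indication), count in pair_counts.most_common():
    print(gene, indication, count)
```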
