# eTransafe Concordance analysis

This notebook implements the use case described by Thomas Steger-Hartmann in a publication with Matthew Clark.
The idea is to compare animal (preclinical) observations with clinical observations per drug:
1. Determine the drugs that have been used in both the preclinical and the clinical domain
2. Compare the individual PTs (MedDRA preferred terms) for preclinical and clinical
3. Compute the concordance matrix
4. Visualize the matrix

(C) 2022 Erasmus University Medical Center, Rotterdam, The Netherlands
Author: Erik M. van Mulligen, e.vanmulligen@erasmusmc.nl

In [1]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
from knowledgehub.api import KnowledgeHubAPI
from Concordance.condordance_utils import getClinicalDatabases, getPreclinicalDatabases
from Concordance.mapper import Mapper

import ipywidgets as w
from IPython.display import display, Javascript
from ipypublish import nb_setup
import numpy as np
import json
import mysql.connector

import warnings
warnings.filterwarnings('ignore')

In [2]:
api = KnowledgeHubAPI(server='DEV', client_secret='3db5a6d7-4694-48a4-8a2e-e9c30d78f9ab')
mapper = Mapper(api)

## Authenticate for KnowledgeHub

In [3]:
username = w.Text(value='tester',placeholder='Knowledge Hub account', description='username:', disabled=False)
password = w.Password(value='', placeholder='Knowledge Hub password', description='password:', disabled=False)
loginBtn = w.Button(description='Login')
status = w.Output()

def on_button_clicked(_):
    if not api.login(username.value, password.value):
        print("Failed to login")
    else:
        print("successfully logged in")
        display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1, IPython.notebook.get_selected_index()+2)'))


loginBtn.on_click(on_button_clicked)
w.VBox([username, password, loginBtn])

VBox(children=(Text(value='tester', description='username:', placeholder='Knowledge Hub account'), Password(de…

successfully logged in



## Authenticate for the data stored in the database

In [4]:
db = None  # set by dbLoginBtn_click after a successful connection

dbhost = w.Text(value='localhost',placeholder='database host', description='host:', disabled=False)
dbdatabase = w.Text(value='concordance',placeholder='database name', description='database:', disabled=False)
dbusername = w.Text(value='root',placeholder='database username', description='username:', disabled=False)
dbpassword = w.Password(value='', placeholder='database password', description='password:', disabled=False)
dbLoginBtn = w.Button(description='Login')
status = w.Output()

def dbLoginBtn_click(_):
    global db
    try:
        db = mysql.connector.connect(host=dbhost.value, database=dbdatabase.value, user=dbusername.value, password=dbpassword.value)
        print("successfully logged in database")
        display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1, IPython.notebook.get_selected_index()+2)'))
    except Exception as e:
        # Report the reason and stay in the notebook so the user can retry
        print(f"failed to log in database: {e}")
    
dbLoginBtn.on_click(dbLoginBtn_click)
w.VBox([dbhost, dbdatabase, dbusername, dbpassword, dbLoginBtn])

VBox(children=(Text(value='localhost', description='host:', placeholder='database host'), Text(value='concorda…

successfully logged in database



## The database
To compute the concordance tables, we constructed a database with all preclinical and clinical findings for drugs (identified by InChIKey) that occur in both the preclinical and the clinical data. For the preclinical data we kept only findings that are treatment related and not observed in the control group.

For each preclinical finding we checked with the semantic service whether the equivalent clinical finding occurs among the clinical findings, and vice versa. The result is stored in the database as 'mapped' per finding.

For each preclinical and clinical finding we also derived the MedDRA PT it relates to. Preclinical terms are mapped through the semantic service; clinical findings are already expressed as MedDRA PTs. These mappings are stored in the database as 'PT' per finding.



## Drug mapping
We maintain a list of the drugs that occur in both the preclinical and the clinical data, together with their InChIKey. For each drug we store the finding ids associated with it per database.

In [5]:
with open('../data/drugs_mapping.20220305.json', 'r') as drug_file:
    drugs = json.load(drug_file)
    print(f'{len(drugs)} drugs found')

304 drugs found


## Overview of the drugs

In [6]:
pd = nb_setup.setup_pandas(escape_latex=False)
# Build the overview table directly from the drug mapping
df = pd.DataFrame({
    'inchiKey': [drugs[d]['inchiKey'] for d in drugs],
    'clinicalName': [drugs[d]['clinicalName'] for d in drugs],
    'preclinicalName': [drugs[d]['preclinicalName'] for d in drugs],
})
df

Unnamed: 0,inchiKey,clinicalName,preclinicalName
0,GDLIGKIOYRNHDA,clomipramine,Clomipramine HCl
1,OZVBMTJYIDMWIL,bromocriptine,Bromocriptine
2,KPYSYYIEGFHWSV,baclofen,Baclofen
3,QZUDBNBUXVUHMW,clozapine,Clozapin
4,ZNRGQMMCGHDTEI,tropisetron,Tropisetron
...,...,...,...
299,VMZMNAABQBOLAK,pasireotide,Pasireotide
300,HTIQEAQVCYTUBX,amlodipine,Amlodipine
301,FAKRSMQSSFJEIM,captopril,Captopril
302,YMTINGFKWWXKFG,fenofibrate,Fenofibrate


## Concordance table
For each drug we retrieve the preclinical and clinical PT terms and classify every PT:
- true positives are the PT terms found in both the preclinical and the clinical data for the drug
- false positives are the PT terms found in the preclinical data but not in the clinical data for the drug
- false negatives are the PT terms found in the clinical data but not in the preclinical data for the drug
- true negatives are the PT terms found in neither the preclinical nor the clinical data for the drug
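
The four categories above amount to set operations on the PT terms of a single drug. The following is a minimal sketch of that idea, not the notebook's own implementation: the helper `classify_pts` and the PT names are hypothetical and chosen for illustration only.

```python
def classify_pts(preclinical, clinical, all_pts):
    """Split the PT universe into the four concordance categories for one drug."""
    preclinical, clinical, all_pts = set(preclinical), set(clinical), set(all_pts)
    return {
        'tp': preclinical & clinical,              # seen in animals and in humans
        'fp': preclinical - clinical,              # seen in animals only
        'fn': clinical - preclinical,              # seen in humans only
        'tn': all_pts - (preclinical | clinical),  # seen in neither
    }

# Hypothetical example data, not taken from the eTransafe databases
all_pts = {'Nausea', 'Vomiting', 'Alopecia', 'Anaemia', 'Tremor'}
preclinical = {'Nausea', 'Vomiting', 'Tremor'}
clinical = {'Nausea', 'Anaemia'}

result = classify_pts(preclinical, clinical, all_pts)
counts = {key: len(value) for key, value in result.items()}
print(counts)  # {'tp': 1, 'fp': 2, 'fn': 1, 'tn': 1}
```

Note that the true negatives depend on the chosen universe of PT terms, which is why the full list of PTs has to be known before counting.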

In [10]:
from Concordance.condordance_utils import getAllPreClinicalClinicalPTs, getPTDrugFindings, getAllPreclinicalClinicalDistances
from Concordance.meddra import MedDRA

level = 'soc'
pt_to_group = {}

def getGroup(meddra, pt, level):
    # Cache the lookup per PT; the cache assumes that 'level' does not change
    if pt not in pt_to_group:
        if level == 'pt':
            group = meddra.getPt(pt)
        elif level == 'hlt':
            group = meddra.getHLT(pt)
        elif level == 'soc':
            group = meddra.getSoc(pt)
        else:
            raise ValueError(f"unsupported MedDRA level: {level}")
        pt_to_group[pt] = list(group.keys())[0] if len(group) > 0 else None
    return pt_to_group[pt]

meddra = MedDRA(username=dbusername.value, password=dbpassword.value)
ClinicalDatabases = getClinicalDatabases(api)
PreclinicalDatabases = getPreclinicalDatabases(api)

groups = {}
preclinical_pts = {}
clinical_pts = {}
for drug in drugs:
    preclinical_pts[drug] = getPTDrugFindings(db=db, drugInfo=drugs[drug], databases=PreclinicalDatabases.keys(), table='preclinical_meddra')
    clinical_pts[drug] = getPTDrugFindings(db=db, drugInfo=drugs[drug], databases=ClinicalDatabases.keys(), table='clinical_meddra')

all_preclinical_clinical_pts = getAllPreClinicalClinicalPTs(db=db, tables=['preclinical_meddra','clinical_meddra'])
all_preclinical_clinical_distances = getAllPreclinicalClinicalDistances(db=db, tables=['preclinical_meddra','clinical_meddra'])

for pt in all_preclinical_clinical_pts:
    group = getGroup(meddra, pt, level)
    
    if group is not None:
        if group not in groups:
            groups[group] = {'tp': 0, 'fp': 0, 'fn': 0, 'tn': 0, 'drugs': [], 'distance': all_preclinical_clinical_distances[pt]}
        elif abs(groups[group]['distance']) > abs(all_preclinical_clinical_distances[pt]):
            # Keep the distance with the smallest absolute value seen for this group
            groups[group]['distance'] = all_preclinical_clinical_distances[pt]

        # Each drug is counted once per group, using the first PT of that group that is processed
        for drug in drugs:
            if drug not in groups[group]['drugs']:
                groups[group]['drugs'].append(drug)
                if pt in preclinical_pts[drug]:
                    if pt in clinical_pts[drug]:
                        groups[group]['tp'] += 1
                    else:
                        groups[group]['fp'] += 1
                else:
                    if pt in clinical_pts[drug]:
                        groups[group]['fn'] += 1
                    else:
                        groups[group]['tn'] += 1
print(groups)

{}


## Concordance tables

In [9]:
from Concordance.condordance_utils import getName
from nbconvert import HTMLExporter
import codecs
import nbformat

def compute_lrp(group):
    sensitivity = compute_sensitivity(group)
    specificity = compute_specificity(group)
    if specificity is not None and sensitivity is not None:
        return sensitivity / (1 - specificity) if specificity != 1 else None
    else:
        return None

def compute_lrn(group):
    sensitivity = compute_sensitivity(group)
    specificity = compute_specificity(group)
    if specificity is not None and sensitivity is not None:
        return (1 - sensitivity) / specificity if specificity != 0 else None
    else:
        return None
    
def compute_chisquare(group):
    tp = group['tp']
    fp = group['fp']
    fn = group['fn']
    tn = group['tn']
    total = tp + fp + fn + tn
    try:
        # Expected cell counts if preclinical and clinical findings were independent
        e11 = ((tp + fp) * (tp + fn)) / total
        e12 = ((tp + fp) * (fp + tn)) / total
        e21 = ((fn + tn) * (tp + fn)) / total
        e22 = ((fn + tn) * (fp + tn)) / total
        return (((tp - e11)**2)/e11) + (((fp - e12)**2)/e12) + (((fn - e21)**2)/e21) + (((tn - e22)**2)/e22)
    except ZeroDivisionError:
        # Empty table or an empty row/column: the statistic is undefined
        return None

def compute_sensitivity(group):
    tp = group['tp']
    fn = group['fn']
    return tp / (tp + fn) if (tp + fn) > 0 else None

def compute_specificity(group):
    fp = group['fp']
    tn = group['tn']
    return tn / (fp + tn) if (fp + tn) > 0 else None

group_title = 'MedDRA ' + level.upper()
pd.set_option('display.max_rows', None)
pd.set_option('display.colheader_justify', 'left')
pd.options.display.float_format = '{:.2f}'.format
# Build the concordance table with one row per MedDRA group
codes = list(groups)
df = pd.DataFrame({
    group_title: [getName(meddra, code, level) for code in codes],
    'min.distance': [groups[code]['distance'] for code in codes],
    'TP': [groups[code]['tp'] for code in codes],
    'FP': [groups[code]['fp'] for code in codes],
    'FN': [groups[code]['fn'] for code in codes],
    'TN': [groups[code]['tn'] for code in codes],
    'Sensitivity': [compute_sensitivity(groups[code]) for code in codes],
    'Specificity': [compute_specificity(groups[code]) for code in codes],
    'LR+': [compute_lrp(groups[code]) for code in codes],
    'LR-': [compute_lrn(groups[code]) for code in codes],
    'chi-square': [compute_chisquare(groups[code]) for code in codes],
})
df = df.round(3)  # round() returns a new frame, so the result must be assigned
df = df.sort_values(by=['LR+'], ascending=False)
dfStyler = df.style.set_properties(**{'text-align': 'right'})
dfStyler.set_table_styles([dict(selector='th', props=[('text-align', 'left')])])
dfStyler.set_properties(subset=[group_title], **{'text-align': 'left'}).hide_index()

MedDRA SOC,min.distance,TP,FP,FN,TN,Sensitivity,Specificity,LR+,LR-,chi-square
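
The columns of this table follow from the four counts per group. As a worked example with hypothetical counts (not taken from the data), the sketch below recomputes each statistic with the same definitions as `compute_sensitivity`, `compute_specificity`, `compute_lrp`, `compute_lrn` and `compute_chisquare`. The helper name `concordance_stats` is introduced here for illustration only.

```python
def concordance_stats(tp, fp, fn, tn):
    """Return the 2x2 statistics; a value is None where its denominator is zero."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    sensitivity = ratio(tp, tp + fn)   # TP / (TP + FN)
    specificity = ratio(tn, fp + tn)   # TN / (FP + TN)
    lr_pos = lr_neg = None
    if sensitivity is not None and specificity is not None:
        lr_pos = ratio(sensitivity, 1 - specificity)
        lr_neg = ratio(1 - sensitivity, specificity)

    # Pearson chi-square: sum over the four cells of (observed - expected)^2 / expected
    total = tp + fp + fn + tn
    chi_square = None
    if total:
        rows = (tp + fp, fn + tn)   # preclinical positive / negative
        cols = (tp + fn, fp + tn)   # clinical positive / negative
        observed = ((tp, fp), (fn, tn))
        expected = [[r * c / total for c in cols] for r in rows]
        if all(e > 0 for row in expected for e in row):
            chi_square = sum(
                (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
                for i in range(2) for j in range(2)
            )
    return {
        'sensitivity': sensitivity,
        'specificity': specificity,
        'lr_pos': lr_pos,
        'lr_neg': lr_neg,
        'chi_square': chi_square,
    }

# Hypothetical counts for one group
stats = concordance_stats(tp=20, fp=10, fn=5, tn=65)
print({name: round(value, 3) for name, value in stats.items()})
# sensitivity 0.8, specificity 0.867, LR+ 6.0, LR- 0.231, chi-square 39.683
```

An LR+ well above 1 means that a preclinical finding in this group raises the odds of the corresponding clinical finding, while an LR- well below 1 means that the absence of a preclinical finding lowers them.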


In [51]:
html = df.to_html(index=False, justify='right', border=1)

# Write the HTML table to file
with open("../data/concordance_" + level.upper() + ".html", "w") as text_file:
    text_file.write(html)