# eTransafe Concordance analysis

This notebook implements the use scenario described by Thomas Steger-Hartmann in a publication with Matthew Clark.
The idea is to compare animal observations with clinical observations for a set of drugs:

1. Determine the drugs that have been used in both the preclinical and the clinical domain.
2. Compare the individual system organ classes (SOCs) for preclinical and clinical findings.
3. Compute the concordance matrix.
4. Visualize the matrix.

(C) 2021 Erasmus University Medical Center, Rotterdam, The Netherlands
Author: Erik M. van Mulligen, e.vanmulligen@erasmusmc.nl

In [1]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
from knowledgehub.api import KnowledgeHubAPI
from Concordance.condordance_utils import getDrugsMapping, getClinicalDatabases, getPreclinicalDatabases, getSocs, getSocDrugFindings
from Concordance.mapper import Mapper

import ipywidgets as w
from IPython.display import display, Markdown, clear_output, Javascript
from ipypublish import nb_setup
import numpy as np
import seaborn as sns
import pandas
import json
import matplotlib.pyplot as plt
from pprint import pprint
import mysql.connector

In [2]:
api = KnowledgeHubAPI(server='DEV', client_secret='3db5a6d7-4694-48a4-8a2e-e9c30d78f9ab')
mapper = Mapper(api)

## Authenticate for KnowledgeHub

In [5]:
username = w.Text(value='tester',placeholder='Knowledge Hub account', description='username:', disabled=False)
password = w.Password(value='', placeholder='Knowledge Hub password', description='password:', disabled=False)
loginBtn = w.Button(description='Login')
status = w.Output()

def on_button_clicked(_):
    if api.login(username.value, password.value) == False:
        print("Failed to login")
    else:
        print("successfully logged in")
        display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1, IPython.notebook.get_selected_index()+2)'))

loginBtn.on_click(on_button_clicked)
w.VBox([username, password, loginBtn])

successfully logged in


## Authenticate for the data stored in the database

In [6]:
global db

dbhost = w.Text(value='localhost',placeholder='database host', description='host:', disabled=False)
dbdatabase = w.Text(value='concordance',placeholder='database name', description='database:', disabled=False)
dbusername = w.Text(value='root',placeholder='database username', description='username:', disabled=False)
dbpassword = w.Password(value='', placeholder='database password', description='password:', disabled=False)
dbLoginBtn = w.Button(description='Login')
status = w.Output()

def dbLoginBtn_click(_):
    global db
    try:
        db = mysql.connector.connect(host=dbhost.value, database=dbdatabase.value, user=dbusername.value, password=dbpassword.value)
        print("successfully logged in database")
        display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1, IPython.notebook.get_selected_index()+2)'))
    except Exception as e:
        # include the underlying error so a failed connection can be diagnosed
        print(f"failed to log in database: {e}")
        sys.exit(0)
    
dbLoginBtn.on_click(dbLoginBtn_click)
w.VBox([dbhost, dbdatabase, dbusername, dbpassword, dbLoginBtn])

successfully logged in database


## The database
To compute the concordance tables, we constructed a database containing all preclinical and clinical findings for drugs (identified by their InChIKeys) that occur in both the preclinical and the clinical data. For the preclinical data, we kept only findings that are treatment-related and not in the control group. For each preclinical finding, we used the semantic service to check whether an equivalent clinical finding is present among the clinical findings, and vice versa. The result is stored in the database as 'mapped' per finding. For each preclinical and clinical finding, we also derived the system organ class (SOC) it belongs to. This is stored in the database as 'SOC' per finding.
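The layout described above can be illustrated with a small stand-in database. This is only a sketch: an in-memory SQLite database replaces the MySQL instance, and apart from `SOC` and `mapped` every column name (`finding_id`, `inchikey`, `finding`), the placeholder keys `KEY-A` and `KEY-B`, and the helper `socs_for_drug` are assumptions made for illustration, not the actual schema or API.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database used in this notebook.
# Only the SOC and mapped columns are named in the text; the rest is assumed.
conn = sqlite3.connect(':memory:')
conn.execute('''
    CREATE TABLE preclinical_findings (
        finding_id INTEGER PRIMARY KEY,
        inchikey   TEXT,     -- drug identifier (assumed column name)
        finding    TEXT,     -- finding label (assumed column name)
        SOC        TEXT,     -- system organ class of the finding
        mapped     INTEGER   -- 1 if an equivalent clinical finding exists
    )
''')
conn.executemany(
    'INSERT INTO preclinical_findings (inchikey, finding, SOC, mapped) '
    'VALUES (?, ?, ?, ?)',
    [
        ('KEY-A', 'tachycardia', 'Cardiac disorders', 1),
        ('KEY-A', 'alopecia', 'Skin and subcutaneous tissue disorders', 0),
        ('KEY-B', 'anaemia', 'Blood and lymphatic system disorders', 1),
    ],
)

def socs_for_drug(conn, inchikey):
    """Return the set of SOCs with at least one finding for the given drug."""
    rows = conn.execute(
        'SELECT DISTINCT SOC FROM preclinical_findings WHERE inchikey = ?',
        (inchikey,),
    )
    return {row[0] for row in rows}

socs_a = socs_for_drug(conn, 'KEY-A')
print(sorted(socs_a))
```

Reducing each drug to the set of SOCs in which it has findings is the form that the concordance counting needs later on.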

## Drug mapping
We maintain a list of drugs that occur in both the preclinical and the clinical data, each identified by its InChIKey. For each drug, we store the ids of the associated findings per database.
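As a rough illustration of how such a mapping can be handled, the sketch below parses a two-entry excerpt. The field names `inchiKey`, `clinicalName` and `preclinicalName` are the ones this notebook reads; the dictionary keys (`"1"`, `"2"`) and the use of an empty string for a missing preclinical name are assumptions, and the per-database finding ids are left out because their layout is not shown here.

```python
import json

# Hypothetical excerpt of the mapping file; keys and the empty-string
# convention for a missing name are assumptions made for this sketch.
sample_json = '''
{
  "1": {"inchiKey": "MWTBKTRZPHJQLH-UHFFFAOYSA-N",
        "clinicalName": "alcaftadine",
        "preclinicalName": "Alcaftadine"},
  "2": {"inchiKey": "XWTYSIMOBUGWOL-UHFFFAOYSA-N",
        "clinicalName": "terbutaline",
        "preclinicalName": ""}
}
'''
sample = json.loads(sample_json)

# Index the entries by InChIKey so a drug can be looked up directly
by_inchikey = {entry['inchiKey']: entry for entry in sample.values()}

# Drugs for which no preclinical name is recorded
missing = [e['clinicalName'] for e in sample.values() if not e['preclinicalName']]
print(missing)
```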

In [7]:
with open('../data/drugs_mapping.20220305.json', 'r') as drug_file:
    drugs = json.load(drug_file)
print(f'{len(drugs)} drugs found')

206 drugs found


## Overview of the drugs

In [8]:
pd = nb_setup.setup_pandas(escape_latex=False)
# build the overview directly from the mapping, one row per drug
df = pd.DataFrame({
    'inchiKey': [drugs[d]['inchiKey'] for d in drugs],
    'clinicalName': [drugs[d]['clinicalName'] for d in drugs],
    'preclinicalName': [drugs[d]['preclinicalName'] for d in drugs],
})
df

| | inchiKey | clinicalName | preclinicalName |
|---|---|---|---|
| 0 | MWTBKTRZPHJQLH-UHFFFAOYSA-N | alcaftadine | Alcaftadine |
| 1 | KKGQTZUTZRNORY-UHFFFAOYSA-N | fingolimod | Fingolimod |
| 2 | JLKIGFTWXXRPMT-UHFFFAOYSA-N | sulfamethoxazole | Sulfamethoxazole |
| 3 | XWTYSIMOBUGWOL-UHFFFAOYSA-N | terbutaline | |
| 4 | MUMGGOZAMZWBJJ-DYKIIFRCSA-N | testosterone | |
| ... | ... | ... | ... |
| 201 | DALKLAYLIPSCQL-YPYQNWSCSA-N | methylprednisolone aceponate | Methylprednisolone Aceponate |
| 202 | UUOJIACWOAYWEZ-UHFFFAOYSA-N | bopindolol | Bopindolol |
| 203 | QPGGEKPRGVJKQB-UHFFFAOYSA-N | dibenzepin | Dibenzepin |
| 204 | PVLJETXTTWAYEW-UHFFFAOYSA-N | mizolastine | Mizolastine |


## Concordance table
Per drug, retrieve the preclinical and clinical findings, grouped by SOC. For each SOC, every drug is then counted in one of four cells:
- true positive (TP): the SOC is found in both the preclinical and the clinical data for the drug.
- false positive (FP): the SOC is found in the preclinical data but not in the clinical data.
- false negative (FN): the SOC is found in the clinical data but not in the preclinical data.
- true negative (TN): the SOC is found in neither the preclinical nor the clinical data.
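A minimal sketch of this counting for a single SOC, using made-up data: the drug names and their SOC sets are hypothetical, and `classify` is a helper introduced only for this illustration. It follows the per-drug, per-SOC counting used in the cell that follows.

```python
from collections import Counter

# Hypothetical toy data: for each drug, the SOCs with at least one finding
preclinical = {
    'drug_a': {'Cardiac disorders', 'Vascular disorders'},
    'drug_b': {'Cardiac disorders'},
    'drug_c': set(),
    'drug_d': set(),
}
clinical = {
    'drug_a': {'Cardiac disorders'},
    'drug_b': set(),
    'drug_c': {'Cardiac disorders'},
    'drug_d': set(),
}

def classify(soc, drug):
    """Assign one drug to a cell of the 2x2 table for the given SOC."""
    in_preclinical = soc in preclinical[drug]
    in_clinical = soc in clinical[drug]
    if in_preclinical and in_clinical:
        return 'tp'
    if in_preclinical:
        return 'fp'
    if in_clinical:
        return 'fn'
    return 'tn'

counts = Counter(classify('Cardiac disorders', drug) for drug in preclinical)
print(dict(counts))
```

Because every drug lands in exactly one cell, the four counts for a SOC always add up to the number of drugs.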

In [10]:
ClinicalDatabases = getClinicalDatabases(api)
PreclinicalDatabases = getPreclinicalDatabases(api)

groups = {}
preclinical_findings = {}
clinical_findings = {}
for drug in drugs:
    preclinical_findings[drug] = getSocDrugFindings(db=db, drugInfo=drugs[drug], databases=PreclinicalDatabases.keys(), table='preclinical_findings')
    clinical_findings[drug] = getSocDrugFindings(db=db, drugInfo=drugs[drug], databases=ClinicalDatabases.keys(), table='clinical_findings')

# get first the list of SOCs
for soc in getSocs(db, ['preclinical_findings', 'clinical_findings']):
    groups[soc] = {'tp': 0, 'fp': 0, 'fn': 0, 'tn': 0}
    for drug in drugs:
        if soc in preclinical_findings[drug]:
            if soc in clinical_findings[drug]:
                groups[soc]['tp'] += 1
            else:
                groups[soc]['fp'] += 1
        else:
            if soc in clinical_findings[drug]:
                groups[soc]['fn'] += 1
            else:
                groups[soc]['tn'] += 1

## Concordance tables
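For reference, the statistics computed per SOC can be written out as formulas. This restates the standard definitions for a 2x2 table as the code in this section applies them; the symbols $O_{ij}$, $E_{ij}$, $R_i$, $C_j$ and $N$ are notation introduced here and do not appear in the notebook.

$$
\begin{aligned}
\text{sensitivity} &= \frac{TP}{TP + FN}, &
\text{specificity} &= \frac{TN}{FP + TN}, \\
LR^{+} &= \frac{\text{sensitivity}}{1 - \text{specificity}}, &
LR^{-} &= \frac{1 - \text{sensitivity}}{\text{specificity}}, \\
\chi^{2} &= \sum_{i,j} \frac{(O_{ij} - E_{ij})^{2}}{E_{ij}}, &
E_{ij} &= \frac{R_i \, C_j}{N}
\end{aligned}
$$

Here $O_{ij}$ are the observed counts (TP, FP, FN, TN), $R_i$ are the row totals ($TP + FP$ and $FN + TN$), $C_j$ are the column totals ($TP + FN$ and $FP + TN$), and $N$ is the total number of drugs. A statistic is undefined, and reported as missing, whenever its denominator is zero.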

In [13]:
def compute_lrp(group):
    sensitivity = compute_sensitivity(group)
    specificity = compute_specificity(group)
    if specificity is not None and sensitivity is not None:
        return sensitivity / (1 - specificity) if specificity != 1 else None
    else:
        return None

def compute_lrn(group):
    sensitivity = compute_sensitivity(group)
    specificity = compute_specificity(group)
    if specificity is not None and sensitivity is not None:
        return (1 - sensitivity) / specificity if specificity != 0 else None
    else:
        return None
    
def compute_chisquare(group):
    tp = group['tp']
    fp = group['fp']
    fn = group['fn']
    tn = group['tn']
    total = tp + fp + fn + tn
    try:
        # expected counts under independence: row total * column total / grand total
        e11 = ((tp + fp) * (tp + fn)) / total
        e12 = ((tp + fp) * (fp + tn)) / total
        e21 = ((fn + tn) * (tp + fn)) / total
        e22 = ((fn + tn) * (fp + tn)) / total
        return (((tp - e11)**2)/e11) + (((fp - e12)**2)/e12) + (((fn - e21)**2)/e21) + (((tn - e22)**2)/e22)
    except ZeroDivisionError:
        # an empty table or an empty row or column leaves the statistic undefined
        return None

def compute_sensitivity(group):
    tp = group['tp']
    fn = group['fn']
    return tp / (tp + fn) if (tp + fn) > 0 else None

def compute_specificity(group):
    fp = group['fp']
    tn = group['tn']
    return tn / (fp + tn) if (fp + tn) > 0 else None
                
pd.set_option('display.max_rows', None)
pd.set_option('display.colheader_justify', 'left')
pd.options.display.float_format = '{:.2f}'.format

# one row per MedDRA SOC: the 2x2 counts followed by the derived statistics
df = pd.DataFrame({
    'MedDRA SOC': list(groups),
    'TP': [groups[soc]['tp'] for soc in groups],
    'FP': [groups[soc]['fp'] for soc in groups],
    'FN': [groups[soc]['fn'] for soc in groups],
    'TN': [groups[soc]['tn'] for soc in groups],
    'Sensitivity': [compute_sensitivity(groups[soc]) for soc in groups],
    'Specificity': [compute_specificity(groups[soc]) for soc in groups],
    'LR+': [compute_lrp(groups[soc]) for soc in groups],
    # iterate over soc here as well, so that every row gets its own LR- value
    'LR-': [compute_lrn(groups[soc]) for soc in groups],
    'chi-square': [compute_chisquare(groups[soc]) for soc in groups],
})
df = df.sort_values(by=['LR+'], ascending=False)
dfStyler = df.style.set_properties(**{'text-align': 'right'})
dfStyler.set_table_styles([dict(selector='th', props=[('text-align', 'left')])])
dfStyler.set_properties(subset=['MedDRA SOC'], **{'text-align': 'left'}).hide_index()

| MedDRA SOC | TP | FP | FN | TN | Sensitivity | Specificity | LR+ | LR- | chi-square |
|---|---|---|---|---|---|---|---|---|---|
| Injury, poisoning and procedural complications | 30 | 7 | 91 | 78 | 0.247934 | 0.917647 | 3.010626 | 0.819559 | 9.289745 |
| Infections and infestations | 54 | 11 | 84 | 57 | 0.391304 | 0.838235 | 2.418972 | 0.726163 | 11.113175 |
| Ear and labyrinth disorders | 7 | 3 | 96 | 100 | 0.067961 | 0.970874 | 2.333333 | 0.96 | 1.681633 |
| Congenital, familial and genetic disorders | 11 | 16 | 42 | 137 | 0.207547 | 0.895425 | 1.98467 | 0.885002 | 3.664849 |
| Skin and subcutaneous tissue disorders | 57 | 28 | 49 | 72 | 0.537736 | 0.72 | 1.920485 | 0.642034 | 14.103225 |
| Musculoskeletal and connective tissue disorders | 34 | 22 | 60 | 90 | 0.361702 | 0.803571 | 1.841393 | 0.794326 | 7.052459 |
| Vascular disorders | 34 | 18 | 73 | 81 | 0.317757 | 0.818182 | 1.747664 | 0.833853 | 5.035576 |
| Blood and lymphatic system disorders | 64 | 46 | 38 | 58 | 0.627451 | 0.557692 | 1.418585 | 0.668019 | 7.093372 |
| Investigations | 4 | 6 | 62 | 134 | 0.060606 | 0.957143 | 1.414141 | 0.981456 | 0.305933 |
| Cardiac disorders | 64 | 34 | 55 | 53 | 0.537815 | 0.609195 | 1.376174 | 0.758681 | 4.354923 |
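
One way to read the chi-square column is to convert each value into a p-value. The sketch below does this for the rows shown above, assuming 1 degree of freedom (a 2x2 table, no continuity correction, as in `compute_chisquare`) and a conventional 0.05 threshold without any correction for multiple testing. The threshold and the helper `chi2_p_value_1dof` are assumptions introduced here, not part of the notebook.

```python
import math

def chi2_p_value_1dof(chi2):
    """P(X > chi2) for a chi-square distribution with 1 degree of freedom."""
    # With 1 degree of freedom X is the square of a standard normal variable,
    # so the upper tail probability equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

# chi-square values of the SOC rows shown in the table above
chi_square = {
    'Injury, poisoning and procedural complications': 9.289745,
    'Infections and infestations': 11.113175,
    'Ear and labyrinth disorders': 1.681633,
    'Congenital, familial and genetic disorders': 3.664849,
    'Skin and subcutaneous tissue disorders': 14.103225,
    'Musculoskeletal and connective tissue disorders': 7.052459,
    'Vascular disorders': 5.035576,
    'Blood and lymphatic system disorders': 7.093372,
    'Investigations': 0.305933,
    'Cardiac disorders': 4.354923,
}

ALPHA = 0.05  # assumed significance threshold

for soc, value in chi_square.items():
    p = chi2_p_value_1dof(value)
    flag = 'significant' if p < ALPHA else 'not significant'
    print(f'{soc}: chi-square={value:.2f}, p={p:.4f} ({flag})')

significant = [soc for soc, v in chi_square.items() if chi2_p_value_1dof(v) < ALPHA]
```

Under these assumptions a value above roughly 3.84 corresponds to p < 0.05, so a SOC with a high LR+ but a small chi-square, such as "Ear and labyrinth disorders", should be interpreted with caution.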
