# Chexbert

We use CheXbert as the ground truth for 14 medical observations: <b>Fracture</b>, Consolidation, Enlarged Cardiomediastinum, No Finding, Pleural Other, <b>Cardiomegaly</b>, <b>Pneumothorax</b>, <b>Atelectasis</b>, Support Devices, <b>Edema</b>, <b>Pleural Effusion</b>, Lung Lesion, Lung Opacity, Pneumonia.

Of these, 6 medical observations are relevant:

Fracture: Rib fracture, Skull fracture

Cardiomegaly

Pneumothorax

Atelectasis

Edema: Pulmonary edema, Cerebral edema

Pleural Effusion
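The relevant observations and their finer-grained terms can be written down as a lookup table, which makes it easy to roll a specific finding (e.g. Rib fracture) up to the CheXbert observation it is compared against. A minimal sketch; `OBSERVATION_TERMS` and `to_chexbert_observation` are hypothetical names, not part of this project:

```python
from typing import Optional

# Hypothetical lookup: relevant CheXbert observation -> finer-grained terms listed above.
# Observations without sub-terms only match their own name.
OBSERVATION_TERMS = {
    "Fracture": ["Rib fracture", "Skull fracture"],
    "Cardiomegaly": [],
    "Pneumothorax": [],
    "Atelectasis": [],
    "Edema": ["Pulmonary edema", "Cerebral edema"],
    "Pleural Effusion": [],
}


def to_chexbert_observation(term: str) -> Optional[str]:
    """Return the CheXbert observation a term rolls up to, or None if it is unrelated."""
    needle = term.strip().lower()
    for observation, sub_terms in OBSERVATION_TERMS.items():
        if needle == observation.lower() or needle in (t.lower() for t in sub_terms):
            return observation
    return None


print(to_chexbert_observation("Rib fracture"))    # Fracture
print(to_chexbert_observation("cerebral edema"))  # Edema
print(to_chexbert_observation("Stroke"))          # None
```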


In [None]:
import pandas as pd

In [None]:
data = pd.read_csv('~/rads_dispo_lim_2023_02_23.csv')
data.head()

In [None]:
%cd /root/Project-CS224N-ED-Disposition/CheXbert-Labeler

#### Run using conda activate chexbert11 

In [None]:
#!python src/label.py -d=/root/impressions.csv -o=/root/Project-CS224N-ED-Disposition/CheXbert-Labeler -c=/root/chexbert.pth
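The label.py call above reads /root/impressions.csv, which is not created in this section. The CheXbert README asks for an input CSV with the reports in a single column headed "Report Impression"; assuming that format, a minimal sketch that writes it from the Impression column of data. The helper `write_chexbert_input` is hypothetical, and mapping missing impressions to empty strings is our own assumption:

```python
import io

import pandas as pd


def write_chexbert_input(df, path_or_buf, text_col="Impression"):
    """Write df[text_col] as a one-column CSV headed 'Report Impression'.

    Row order is kept, so (assuming the labeler keeps input order) its output
    can be aligned back to df by position. Missing impressions become "".
    """
    out = pd.DataFrame({"Report Impression": df[text_col].fillna("").astype(str).to_numpy()})
    out.to_csv(path_or_buf, index=False)
    return out


# In this notebook it would be called as: write_chexbert_input(data, "/root/impressions.csv")
# Quick in-memory check that does not touch the filesystem:
buf = io.StringIO()
demo = write_chexbert_input(pd.DataFrame({"Impression": ["No acute findings.", None]}), buf)
print(buf.getvalue())
```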

In [None]:
result_chexbert = pd.read_csv('/root/Project-CS224N-ED-Disposition/CheXbert-Labeler/labeled_reports_output.csv')
result_chexbert.head()

# Lbl2Vec

#### Run using conda activate l2v 

With Lbl2Vec (Lbl2TransformerVec) we predict medical observations from the Impression column.
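Roughly, as we understand the lbl2vec documentation, Lbl2TransformerVec embeds keywords and documents with a sentence-transformer model, builds one embedding per label from the documents closest to that label's keywords, and scores each document by cosine similarity to every label embedding; predict_model_docs then reports the best label as most_similar_label together with highest_similarity_score. The sketch below reproduces only that scoring step, with hand-made 3-d vectors standing in for transformer embeddings and each label simplified to the centroid of its keyword vectors. The vectors and `score_documents` are illustrative assumptions, not the library's internals:

```python
import numpy as np
import pandas as pd


def score_documents(doc_vecs, label_vecs, label_names):
    """Cosine similarity of every document to every label, plus the best label per document."""
    docs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    labels = label_vecs / np.linalg.norm(label_vecs, axis=1, keepdims=True)
    sims = docs @ labels.T  # shape (n_docs, n_labels)
    out = pd.DataFrame(sims, columns=label_names)
    out.insert(0, "most_similar_label", out[label_names].idxmax(axis=1))
    out.insert(1, "highest_similarity_score", out[label_names].max(axis=1))
    return out


# Toy "embeddings": each label is the centroid of its keyword vectors (a simplification).
keyword_vecs = {
    "Pneumothorax": np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "Edema": np.array([[0.0, 1.0, 0.0]]),
}
label_names = list(keyword_vecs)
label_vecs = np.stack([v.mean(axis=0) for v in keyword_vecs.values()])
doc_vecs = np.array([[0.8, 0.2, 0.1], [0.1, 0.9, 0.3]])

res = score_documents(doc_vecs, label_vecs, label_names)
print(res[["most_similar_label", "highest_similarity_score"]])
```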

In [None]:
import pandas as pd
from tqdm.auto import tqdm
from lbl2vec import Lbl2TransformerVec

# Enable progress bars for pandas operations (progress_apply / progress_map)
tqdm.pandas()

In [None]:
%cd /root/Project-CS224N-ED-Disposition

In [None]:
keywords = [
    ["Pneumonia"],
    ["Pneumothorax"],
    ["Pleural Effusion"],
    ["Edema"],
    ["Fracture"],
    # ["Pulmonary edema"],
    # ["Rib fracture"],
    ["Infection"],
    ["Aspiration"],
    ["Cardiomegaly"],
    ["Opacities"],
    ["Atelectasis"],
    ["Intracranial hemorrhage"],
    ["Subarachnoid hemorrhage"],
    ["Subdural hemorrhage"],
    ["Epidural hemorrhage"],
    ["Intraparenchymal hemorrhage"],
    ["Intraventricular hemorrhage"],
    # ["Skull fracture"],
    ["Stroke"],
    # ["Cerebral edema"],
    ["Diffuse axonal injury"],
    ["Appendicitis"],
    ["Cholecystitis"],
    ["Abdominal Aortic Aneurysm"],
    ["Small bowel obstruction"],
    ["Pancreatitis"],
    ["Splenic laceration"],
    ["Liver laceration"],
    ["Colitis"],
    ["Pyelonephritis"],
    ["Nephrolithiasis"],
    ["Malignancy"],
    ["Pericardial effusion"],
    ["Aortic dissection"],
]

- Run Lbl2TransformerVec

In [None]:
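# Init model using the default transformer-embedding model ("sentence-transformers/all-MiniLM-L6-v2").
# Each label is named after the first keyword in its list.
label_names = [kw[0] for kw in keywords]

model = Lbl2TransformerVec(
    keywords_list=keywords,
    documents=data["Impression"].fillna("").tolist(),  # plain strings; missing impressions become ""
    label_names=label_names,
)
model.fit()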


In [None]:
# get similarity scores from trained model
result_l2v = model.predict_model_docs()

In [None]:
# Save the predictions to CSV
result_l2v.to_csv('/root/Project-CS224N-ED-Disposition/result_l2v_v1.csv', index=False)

In [None]:
result_l2v.head()

# Compare CheXbert and Lbl2TransformerVec results on common keywords

- Compare the labels with the highest score

- Mapping between CheXbert labels and Lbl2Vec keywords. They share 7 keywords:

Fracture:  Rib fracture, Skull fracture

Cardiomegaly

Pneumothorax

Atelectasis

Edema: Pulmonary edema, Cerebral edema

Pleural Effusion

Pneumonia

In [None]:
#!pip install matplotlib
#!pip install seaborn

In [None]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn

- <b>Result l2v</b>

In [None]:
result_l2v = pd.read_csv('/root/Project-CS224N-ED-Disposition/result_l2v_v1.csv') 
result_l2v.head()

In [None]:
# Select the label with the second-highest similarity score
list_labels = list(result_l2v.columns)
list_labels.remove("doc_key")
list_labels.remove("most_similar_label")
list_labels.remove("highest_similarity_score")
result_l2v['second_similar_label'] = result_l2v[list_labels].columns[np.argpartition(result_l2v[list_labels].values, -2)[:,-2]]
result_l2v[['most_similar_label','second_similar_label']].head()
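`np.argpartition(values, -2)` only guarantees that position -2 of each row holds the index of the second-largest value (the order of the other positions is unspecified), which is why the cell above reads column `[:, -2]`. A toy check with made-up scores:

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame(
    {"Edema": [0.10, 0.70], "Fracture": [0.50, 0.20], "Pneumothorax": [0.40, 0.65]}
)
# Index of the second-largest score in each row, mapped back to its column name.
second_idx = np.argpartition(scores.to_numpy(), -2, axis=1)[:, -2]
second_label = scores.columns[second_idx]
print(list(second_label))  # ['Pneumothorax', 'Pneumothorax']
```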

In [None]:
important_keywords_l2v = ['Fracture', 'Cardiomegaly', 'Pneumothorax', 'Atelectasis',
                          'Edema', 'Pleural Effusion', 'Pneumonia', 'No Finding']

# Set predicted labels that are not in important_keywords_l2v to NaN
# (they are filled with 'No Finding' in the next cell)
for col in ['most_similar_label', 'second_similar_label']:
    result_l2v[col] = result_l2v[col].where(result_l2v[col].isin(important_keywords_l2v))
result_l2v[['most_similar_label', 'second_similar_label']].head(10)


In [None]:
# Combine most_similar_label with second_similar_label: use the top label if it is a
# common keyword, otherwise the runner-up, otherwise 'No Finding'
result_l2v['combine'] = (result_l2v['most_similar_label']
                         .fillna(result_l2v['second_similar_label'])
                         .fillna('No Finding'))

result_l2v['most_similar_label'] = result_l2v['most_similar_label'].fillna('No Finding')
result_l2v[['most_similar_label', 'second_similar_label', 'combine']].head(10)
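The combine column takes the top label when it is one of the common keywords, otherwise the second label if that one is, otherwise 'No Finding'. A toy illustration with made-up values, where NaN marks a label outside the common set:

```python
import numpy as np
import pandas as pd

toy_labels = pd.DataFrame({
    "most_similar_label": ["Edema", np.nan, np.nan],
    "second_similar_label": ["Fracture", "Pneumothorax", np.nan],
})
toy_labels["combine"] = (
    toy_labels["most_similar_label"]
    .fillna(toy_labels["second_similar_label"])  # fall back to the runner-up label
    .fillna("No Finding")                        # neither label is a common keyword
)
print(toy_labels["combine"].tolist())  # ['Edema', 'Pneumothorax', 'No Finding']
```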

In [None]:
important_keywords_l2v = ['Fracture', 'Cardiomegaly', 'Pneumothorax', 'Atelectasis',
                          'Edema', 'Pleural Effusion', 'Pneumonia']
scores = result_l2v[important_keywords_l2v].to_numpy()

# One-hot encode the highest-scoring label of each row (1 for the argmax, 0 elsewhere).
# Building a new array avoids writing through DataFrame.values, which is not guaranteed to update the frame.
one_hot = np.zeros(scores.shape, dtype=int)
one_hot[np.arange(len(scores)), scores.argmax(axis=1)] = 1
result_l2v_multiple = pd.DataFrame(one_hot, index=result_l2v.index, columns=important_keywords_l2v)
result_l2v_multiple.head()

In [None]:
scores = result_l2v[important_keywords_l2v].to_numpy()

# Mark the two highest-scoring labels of each row with 1 and all others with 0.
# argpartition with kth=-2 places the two largest values in the last two positions.
top2 = np.argpartition(scores, -2, axis=1)[:, -2:]
two_hot = np.zeros(scores.shape, dtype=int)
np.put_along_axis(two_hot, top2, 1, axis=1)
result_l2v_multiple_2 = pd.DataFrame(two_hot, index=result_l2v.index, columns=important_keywords_l2v)
result_l2v_multiple_2.head()
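Both cells above turn similarity scores into 0/1 predictions by marking the top 1 or top 2 labels per row. A minimal sketch of that step as a single helper, `top_k_multi_hot` (a hypothetical name), which returns a new integer frame instead of modifying scores in place; writes through `DataFrame.values` may not reach the frame, or may raise, depending on the pandas version and Copy-on-Write settings:

```python
import numpy as np
import pandas as pd


def top_k_multi_hot(scores: pd.DataFrame, k: int) -> pd.DataFrame:
    """Return a 0/1 frame with 1 in the k highest-scoring columns of each row."""
    values = scores.to_numpy()
    # Column indices of the k largest values per row (order within the k is unspecified).
    top_k = np.argpartition(values, -k, axis=1)[:, -k:]
    multi_hot = np.zeros(values.shape, dtype=int)
    np.put_along_axis(multi_hot, top_k, 1, axis=1)
    return pd.DataFrame(multi_hot, index=scores.index, columns=scores.columns)


toy_scores = pd.DataFrame(
    {"Edema": [0.10, 0.70], "Fracture": [0.50, 0.20], "Pneumothorax": [0.40, 0.65]}
)
print(top_k_multi_hot(toy_scores, 1))
print(top_k_multi_hot(toy_scores, 2))
```

In this notebook the two cells would correspond to `top_k_multi_hot(result_l2v[important_keywords_l2v], 1)` and `top_k_multi_hot(result_l2v[important_keywords_l2v], 2)`.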

- <b>Result chexbert</b>

In [None]:
result_chexbert = pd.read_csv('/root/Project-CS224N-ED-Disposition/CheXbert-Labeler/labeled_reports_output.csv')
result_chexbert.head(3)

In [None]:
important_keywords_chexbert = ['Fracture', 'Cardiomegaly', 'Pneumothorax',
                               'Atelectasis', 'Edema', 'Pleural Effusion', 'Pneumonia', 'No Finding']
# Treat uncertain (-1) and not-mentioned (NaN) as negative (0)
result_chexbert.replace(-1, 0, inplace=True)
result_chexbert.fillna(0, inplace=True)

# Take the first positive observation (in list order) as the label; rows with no
# positive observation become 'No Finding' (idxmax alone would return 'Fracture' for an all-zero row)
positives = result_chexbert[important_keywords_chexbert]
result_chexbert['most_similar_label'] = positives.idxmax(axis=1).where(positives.max(axis=1) > 0, 'No Finding')
result_chexbert.head()
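According to the CheXbert README, the labeler outputs 1 (positive), 0 (negative), -1 (uncertain) or blank (not mentioned) for each observation, and the cell above treats uncertain and blank as 0. One detail worth checking: `DataFrame.idxmax` alone returns the first column when every value in a row is 0, so a report with no positive common observation would get the first keyword rather than 'No Finding'. A minimal sketch of the mapping with an explicit fallback; `chexbert_top_label` is a hypothetical helper and the rows are made up:

```python
import numpy as np
import pandas as pd


def chexbert_top_label(labels: pd.DataFrame, columns: list) -> pd.Series:
    """First positive observation per row (in `columns` order), else 'No Finding'.

    Uncertain (-1) and not-mentioned (NaN) values are treated as negative (0).
    """
    binary = labels[columns].replace(-1, 0).fillna(0)
    top = binary.idxmax(axis=1)  # first column holding the row maximum
    return top.where(binary.max(axis=1) > 0, "No Finding")


toy_chexbert = pd.DataFrame({
    "Fracture": [np.nan, -1.0, 0.0],
    "Edema": [1.0, np.nan, 0.0],
    "Pneumothorax": [1.0, np.nan, np.nan],
})
print(chexbert_top_label(toy_chexbert, ["Fracture", "Edema", "Pneumothorax"]).tolist())
# ['Edema', 'No Finding', 'No Finding']
```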

In [None]:
important_keywords_chexbert = ['Fracture', 'Cardiomegaly', 'Pneumothorax', 
                               'Atelectasis','Edema', 'Pleural Effusion', 'Pneumonia']
result_chexbert_multiple = result_chexbert[important_keywords_chexbert]
result_chexbert_multiple

In [None]:
def evaluate(y_test, y_pred):
    """
    Evaluation function. For each of the text in evaluation data, it reads the score from
    the predictions made. And based on this, it calculates the values of
    True positive, True negative, False positive, and False negative.

    :param y_test: true labels
    :param y_pred: predicted labels
    :param labels: list of possible labels
    :return: evaluation metrics for classification like, precision, recall, and f_score
    """
    y_pred = list(y_pred)
    y_test = list(y_test)
    
    labels = list(set(y_test  + y_pred))
    labels = sorted(labels)
    
    confusion = confusion_matrix(y_test, y_pred, labels= labels)
    print('Confusion Matrix\n')
    print(confusion)

    df_cm = pd.DataFrame(confusion, index=[i for i in labels],
                         columns=[i for i in labels])

    plt.figure(figsize=(7, 7))
    sn.heatmap(df_cm, annot=True)
    plt.title('Confusion Matrix')
    plt.xlabel("Predicted label")
    plt.ylabel("True label")

    # Overall accuracy: fraction of reports whose predicted label matches the true label
    Accuracy = accuracy_score(y_test, y_pred)
    print('\nAccuracy: {:.2f}\n'.format(Accuracy))

    Micro_Precision = precision_score(y_test, y_pred, average='micro')
    print('Micro Precision: {:.2f}'.format(Micro_Precision))

    Micro_Recall = recall_score(y_test, y_pred, average='micro')
    print('Micro Recall: {:.2f}'.format(Micro_Recall))

    Micro_F1score = f1_score(y_test, y_pred, average='micro')
    print('Micro F1-score: {:.2f}\n'.format(Micro_F1score))

    Macro_Precision = precision_score(y_test, y_pred, average='macro')
    print('Macro Precision: {:.2f}'.format(Macro_Precision))

    Macro_Recall = recall_score(y_test, y_pred, average='macro')
    print('Macro Recall: {:.2f}'.format(Macro_Recall))

    Macro_F1score = f1_score(y_test, y_pred, average='macro')
    print('Macro F1-score: {:.2f}\n'.format(Macro_F1score))

    Weighted_Precision = precision_score(y_test, y_pred, average='weighted')
    print('Weighted Precision: {:.2f}'.format(Weighted_Precision))

    Weighted_Recall = recall_score(y_test, y_pred, average='weighted')
    print('Weighted Recall: {:.2f}'.format(Weighted_Recall))

    Weighted_F1score = f1_score(y_test, y_pred, average='weighted')
    print('Weighted F1-score: {:.2f}'.format(Weighted_F1score))

    from sklearn.metrics import classification_report
    print('\nClassification Report\n')
    # Pass labels explicitly so that target_names lines up with them
    report = classification_report(y_test, y_pred, labels=labels, target_names=labels)
    print(report)
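In evaluate each report carries exactly one label, so every error counts as one false positive (for the predicted class) and one false negative (for the true class). As a result the micro-averaged precision, recall and F1-score coincide with accuracy, and the macro and weighted averages are the ones that expose per-class differences. A toy check with made-up labels:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

toy_true = ["Edema", "Fracture", "No Finding", "No Finding", "Pneumothorax"]
toy_pred = ["Edema", "No Finding", "No Finding", "Fracture", "Pneumothorax"]

acc = accuracy_score(toy_true, toy_pred)
micro = [
    precision_score(toy_true, toy_pred, average="micro"),
    recall_score(toy_true, toy_pred, average="micro"),
    f1_score(toy_true, toy_pred, average="micro"),
]
print(acc, micro)
```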

- Using only most_similar_label

In [None]:
result_chexbert['most_similar_label']

In [None]:
result_l2v['most_similar_label']

In [None]:
y_test = result_chexbert['most_similar_label']
y_pred = result_l2v['most_similar_label']
evaluate(y_test, y_pred)

- Combining the Lbl2Vec most_similar_label and second_similar_label

In [None]:
y_test = result_chexbert['most_similar_label']
y_pred = result_l2v['combine']
evaluate(y_test, y_pred)

# Multi-Label Classification Techniques

https://mmuratarat.github.io/2020-01-25/multilabel_classification_metrics

In [None]:
import sklearn.metrics

def multi_label_evaluation(y_true, y_pred):
    print('Exact Match Ratio: {0}'.format(sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)))

    print('Hamming loss: {0}'.format(sklearn.metrics.hamming_loss(y_true, y_pred)))

    # average='samples' applies only to multilabel problems: instead of a per-class measure,
    # it computes the metric over the true and predicted labels of each sample and returns
    # their (sample_weight-weighted) average.
    print('Precision: {0}'.format(sklearn.metrics.precision_score(y_true=y_true, y_pred=y_pred, average='samples')))

    print('Recall: {0}'.format(sklearn.metrics.recall_score(y_true=y_true, y_pred=y_pred, average='samples')))

    print('F1 Measure: {0}'.format(sklearn.metrics.f1_score(y_true=y_true, y_pred=y_pred, average='samples')))
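For 0/1 indicator arrays like these, the exact match ratio is the share of rows predicted perfectly, Hamming loss is the share of individual label cells that differ, and the 'samples' averages compute precision and recall per row (overlap divided by the number of predicted, respectively true, labels) before averaging. A toy check of these definitions against scikit-learn, with made-up arrays in which every row has at least one true and one predicted label (so no per-row division by zero):

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss, precision_score, recall_score

toy_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
toy_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 1]])

exact_match = np.mean(np.all(toy_true == toy_pred, axis=1))  # rows predicted perfectly
hamming = np.mean(toy_true != toy_pred)                        # fraction of label cells that differ
overlap = (toy_true & toy_pred).sum(axis=1)                    # |true ∩ pred| per row
sample_precision = np.mean(overlap / toy_pred.sum(axis=1))     # per-row precision, averaged
sample_recall = np.mean(overlap / toy_true.sum(axis=1))        # per-row recall, averaged

print(exact_match, accuracy_score(toy_true, toy_pred))
print(hamming, hamming_loss(toy_true, toy_pred))
print(sample_precision, precision_score(toy_true, toy_pred, average="samples"))
print(sample_recall, recall_score(toy_true, toy_pred, average="samples"))
```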


In [None]:
y_true = np.array(result_chexbert_multiple)
y_pred = np.array(result_l2v_multiple)
#y_true = y_true  + y_pred
#substitute 2 for 1 and the rest for 0
#y_true[y_true != 2] = 0
#y_true[y_true == 2] = 1
multi_label_evaluation(y_true, y_pred)

In [None]:
y_true = np.array(result_chexbert_multiple)
y_pred = np.array(result_l2v_multiple_2)
# Keep as true only the CheXbert labels that are also among the top-2 Lbl2Vec labels
# (elementwise AND of the two 0/1 arrays; same as summing them and keeping the 2s)
y_true = ((y_true == 1) & (y_pred == 1)).astype(int)
multi_label_evaluation(y_true, y_pred)