# Using Metadata to Improve Artificial Intelligence Medical Image Diagnostic Accuracy
**Purpose and Background**
Conduct a descriptive analysis of crowdsourced data extracted from user interactions with a mobile application in which users were tasked with making a binary (yes or no) call on whether a medical image contains an abnormality.

Two user categories were differentiated: experts, medical professionals hired to interact with the application; and the crowd, anyone who downloaded and used the application.

**Show that the crowd agrees with the expert majority more often than individual experts agree with the expert majority**


### Import datasets

In [85]:
import pandas as pd
import numpy as np
results = pd.read_csv('1345_customer_results.csv') #medical case results
admin = pd.read_csv('1345_admin_reads.csv') #raw individual read

### Inspect Customer Results

In [86]:
results.dtypes
results = results.set_index('Case ID')

**Preliminary filtering for security purposes**


In [87]:
results = results.dropna(subset=['Origin']) 
results["Expert: Abnormal Votes"] = results["Origin"].str.extract(r'vote(\d)', expand=False).astype(float)  # single digit following 'vote'
results = results.drop(['Origin Created At','Origin','Content ID','URL'],axis=1)

In [88]:
results.head(2)

Any rows without an expert vote count to extract (i.e. NaN after the extraction above) are dropped.

In [89]:
results = results.dropna(subset=["Expert: Abnormal Votes"])
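As a minimal sketch of the extraction and drop steps above, the following uses made-up source strings (the real string format is not shown in this notebook, so the values here are assumptions) to show how the single digit after `vote` is captured and how rows without a match become NaN and are removed.

```python
import pandas as pd

# Hypothetical source strings; only the 'vote<digit>' fragment matters to the regex
origin = pd.Series(["set1_vote3_case", "set1_vote8_case", "set2_case", None])

# expand=False returns a Series holding the captured digit (NaN when there is no match)
votes = origin.str.extract(r"vote(\d)", expand=False).astype(float)

# Rows without an extracted vote count are dropped
kept = votes.dropna()
print(kept.tolist())
```

Because the pattern captures exactly one digit, a two-digit count would be truncated to its first digit; that is harmless here only because the expert count is capped at 8.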

**Inspect NaN Columns for Content**

In [90]:
results.loc[results['Series'].notna()| results['Series Index'].notna() | results['Patch'].notna() | results['Internal Notes'].notna() | results['Explanation'].notna()]

Dataframe is empty; columns inspected will be dropped

In [91]:
results = results.drop(['Series','Series Index','Patch','Internal Notes','Explanation'],axis=1)

**Inspect Comments for Relevance**

In [80]:
results[results['Comments'] != '[]']


None of the comments seem relevant; comments column will be dropped

In [81]:
results = results.drop(['Comments'],axis=1)

There should only be 8 experts in total; drop cases whose extracted expert vote count is greater than 8

In [82]:
results = results[results["Expert: Abnormal Votes"] <= 8]

### Important columns for analysis; original metadata
Each row corresponds to a medical case 

**Identifiers:** 

Case ID: unique identifier will serve as index

Labeling State: identifies whether an expert consensus has been achieved (yes = Gold Standard, no = In Progress)

URL: source string from which the expert vote count was extracted

**Reads and Annotations**

Qualified Reads: total crowd vote count

Expert: Abnormal Votes: number of experts who thought the case was abnormal

(Note: the total number of experts voting is always 8.)

Correct Label: overall expert consensus 

{yes=case is abnormal, no=case is normal, NaN=no consensus}

Majority Label: overall crowd consensus on each case

**Measures of Confidence**

Difficulty: Qualified Reads *without the Correct Label* divided by total Qualified Reads.

Agreement: Qualified Reads *with the Majority Label* divided by total Qualified Reads.

Nth Choice Answer: crowd answer (First Choice is the Majority Label)
        
Nth Choice Votes: number of crowd votes per answer
        
Nth Choice Weight:
        
        
        



### Add Relevant Columns and Optimize Dataframe




#### Cluster cases categorically based on difficulty

In [83]:
bins=[0,0.2,0.4,0.6,0.8,1]
labels=['very easy','easy','moderate','challenging','very challenging']
results['Difficulty Category'] = pd.cut(results['Difficulty'],bins=bins,labels=labels,include_lowest=True)
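To illustrate how the binning above treats values that fall exactly on a bin edge, here is a small sketch with made-up difficulty values. With the default right-closed intervals, 0.2 falls in the first bin, and `include_lowest=True` keeps 0.0 from being left out.

```python
import pandas as pd

bins = [0, 0.2, 0.4, 0.6, 0.8, 1]
labels = ['very easy', 'easy', 'moderate', 'challenging', 'very challenging']

# Illustrative difficulty values, including both ends of the range and two bin edges
difficulty = pd.Series([0.0, 0.2, 0.21, 0.6, 1.0])
category = pd.cut(difficulty, bins=bins, labels=labels, include_lowest=True)
print(category.tolist())
```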

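In [84]:
df = results  # alias, not a copy: columns added to df also appear in results
# Shift the agreement down by 0.5 for cases where the crowd majority is 'no'.
# The original column names are used because the renamed columns are created further down.
no_mask = df['Majority Label'] == "'no'"
df.loc[no_mask, 'Agreement'] -= 0.5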

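In [None]:
expert_count = 8
df["Expert: Normal Votes"] = expert_count - df["Expert: Abnormal Votes"]
df["Expert Agreement"] = df["Expert: Abnormal Votes"]/expert_count
# The original column names are compared here because the renamed columns are created in a later cell.
# NaN never equals a label, so cases without expert consensus are marked 'no'.
df['Consensus'] = np.where(df['Correct Label'] == df['Majority Label'],'yes','no')
df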

#### Expert: Normal Votes:
I subtracted the number of experts who voted the case as abnormal from the total number of experts.

#### Expert Agreement: 
I divided the number of experts who voted the case as abnormal by the total number of experts to get the proportion of experts who consider the case abnormal.

#### Error Rate: 
For each case, the proportion of experts who did not vote with the expert majority.

#### Consensus:
I indicated cases where the expert majority and the crowd majority agree.
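The column definitions above can be sketched on a three-row toy table. The values are made up, and the column names are the renamed ones used in this notebook; the third row shows that a case without expert consensus (NaN) is never counted as a match.

```python
import numpy as np
import pandas as pd

expert_count = 8
toy = pd.DataFrame({
    "Expert: Abnormal Votes": [7, 1, 4],
    "Expert Majority": ["'yes'", "'no'", np.nan],
    "Crowd Majority": ["'yes'", "'yes'", "'no'"],
})

# Experts who did not vote abnormal voted normal
toy["Expert: Normal Votes"] = expert_count - toy["Expert: Abnormal Votes"]
# Proportion of experts who voted abnormal
toy["Expert Agreement"] = toy["Expert: Abnormal Votes"] / expert_count
# 'yes' only when both majorities carry the same label; NaN compares unequal
toy["Consensus"] = np.where(toy["Expert Majority"] == toy["Crowd Majority"], "yes", "no")
print(toy)
```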

### I will rename some of the original columns for clarity

   #### {Original column --> Renamed Column}
    
    Correct Label --> Expert Majority

    Majority Label --> Crowd Majority

    Difficulty --> Expert/Crowd Disagreement

    Agreement --> Crowd Agreement

In [None]:
df["Expert Majority"] = results["Correct Label"]
df["Crowd Majority"] = results["Majority Label"]
df["Expert/Crowd Disagreement"] = results["Difficulty"] 
df["Crowd Agreement"] = results["Agreement"] 
df = df.drop(columns= ["Correct Label","Majority Label","Difficulty","Agreement"])

#### Expert/Crowd Disagreement 
is the proportion of the crowd disagreeing with the expert consensus (i.e. difficulty)

#### Crowd Agreement
is the proportion of the crowd agreeing with the crowd consensus (i.e. agreement)
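As a worked example of the two ratios above, the sketch below computes them from raw crowd vote counts for a single case. The function name and the tie-breaking rule (a tie counts as "yes") are assumptions made for illustration; the notebook reads these ratios directly from the exported data.

```python
def crowd_ratios(yes_votes, no_votes, expert_label):
    """Return (crowd majority, crowd agreement, expert/crowd disagreement) for one case."""
    total = yes_votes + no_votes
    # Assumption: ties are resolved in favor of 'yes'
    majority = "yes" if yes_votes >= no_votes else "no"
    # Share of the crowd that voted with the crowd majority
    agreement = max(yes_votes, no_votes) / total
    # Share of the crowd that voted against the expert label
    votes_with_expert = yes_votes if expert_label == "yes" else no_votes
    disagreement = (total - votes_with_expert) / total
    return majority, agreement, disagreement


# 6 of 10 crowd readers say 'yes' while the experts say 'no'
print(crowd_ratios(6, 4, "no"))
```

In this example the crowd majority opposes the expert label, so the agreement and the disagreement are the same 6 of 10 readers; when the two majorities match, the ratios sum to 1.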

#### Error rate of experts
I selected the cases in each expert-majority category and calculated the "error rate" as the fraction of experts who did not vote with the expert majority.

In [None]:
# Boolean masks for cases where the expert majority is abnormal ('yes') or normal ('no')
EM_yes = df['Expert Majority'] == "'yes'"
EM_no = df['Expert Majority'] == "'no'"

# Experts in the minority count as errors; cases without consensus stay NaN
df.loc[EM_yes,"Error Rate"] = df.loc[EM_yes,'Expert: Normal Votes']/expert_count
df.loc[EM_no,"Error Rate"] = df.loc[EM_no,'Expert: Abnormal Votes']/expert_count
# Show the expert vote columns and everything after them
beg_index = list(df.columns).index('Expert: Abnormal Votes')
df.iloc[:, beg_index:]
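The error rate calculation can be checked by hand on a toy table. This is a minimal sketch with made-up vote counts: a 6-to-2 abnormal case and a 2-to-6 normal case both have two dissenting experts (error rate 0.25), and a 4-to-4 split has no majority to dissent from, so it stays NaN.

```python
import numpy as np
import pandas as pd

expert_count = 8
toy = pd.DataFrame({
    "Expert Majority": ["'yes'", "'no'", np.nan],
    "Expert: Abnormal Votes": [6, 2, 4],
})
toy["Expert: Normal Votes"] = expert_count - toy["Expert: Abnormal Votes"]

is_yes = toy["Expert Majority"] == "'yes'"
is_no = toy["Expert Majority"] == "'no'"

# Start from NaN so that cases without consensus remain undefined
toy["Error Rate"] = np.nan
toy.loc[is_yes, "Error Rate"] = toy.loc[is_yes, "Expert: Normal Votes"] / expert_count
toy.loc[is_no, "Error Rate"] = toy.loc[is_no, "Expert: Abnormal Votes"] / expert_count
print(toy)
```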



## Exploratory Analysis 1

In [None]:
#%pip install jupyter-dash
import plotly.express as px
import plotly.io as pio
import plotly.figure_factory as ff
pio.renderers.default='notebook'
import matplotlib.pyplot as plt
#ax = plt.subplot()

In [None]:
# dropna=False lists the cases without expert consensus (NaN) alongside 'yes' and 'no'
print(df['Expert Majority'].value_counts(dropna=False))
print(df['Expert Majority'].isnull().any())

12,000 medical cases were judged to be abnormal by experts

12,000 medical cases were judged to be normal by experts

3,000 medical cases failed to reach a consensus (4 experts voted normal and 4 experts voted abnormal)
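The three outcomes above follow from the abnormal vote count alone. The sketch below assumes the expert label is a strict majority of the 8 votes, which matches the 4-to-4 "no consensus" case described here; the function name is hypothetical and not part of the exported data.

```python
def expert_label(abnormal_votes, expert_count=8):
    """Return 'yes', 'no', or None (no consensus) from the abnormal vote count."""
    normal_votes = expert_count - abnormal_votes
    if abnormal_votes > normal_votes:
        return "yes"
    if abnormal_votes < normal_votes:
        return "no"
    # An even split leaves the case without an expert majority
    return None


print([expert_label(v) for v in range(9)])
```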


### Exploratory Analysis

#### How reliable are the individual experts on average?

With 8 experts per case, restricting the analysis to cases with 5 or more qualified reads keeps the crowd and expert vote counts more proportional

In [None]:
filt_df = df[df['Qualified Reads'] >= 5]
fig = px.density_heatmap(filt_df, x="Expert Majority", y='Crowd Majority',text_auto=True)
fig.show()
filt_df.std(axis=0, numeric_only=True)  # label columns are non-numeric
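The counts displayed in the heatmap can also be tabulated directly, which makes the agreement between the two majorities easy to read off. A minimal sketch with made-up labels, using `pd.crosstab`:

```python
import pandas as pd

# Toy labels in the same quoted format used in this notebook
toy = pd.DataFrame({
    "Expert Majority": ["'yes'", "'yes'", "'no'", "'no'", "'no'"],
    "Crowd Majority": ["'yes'", "'no'", "'no'", "'no'", "'yes'"],
})

# Rows are expert labels, columns are crowd labels; the diagonal holds the agreeing cases
counts = pd.crosstab(toy["Expert Majority"], toy["Crowd Majority"])
print(counts)
```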

In [None]:
print(sum(df['Expert: Normal Votes']+df['Expert: Abnormal Votes'])-sum(df['Qualified Reads']))

print(sum(filt_df['Expert: Normal Votes']+filt_df['Expert: Abnormal Votes'])-sum(filt_df['Qualified Reads']))


#### Restricting to cases with 5 or more qualified reads markedly reduces the disparity between the expert vote count and the crowd vote count across all cases.

In [None]:

#consensus_yes = filt_df.index[filt_df['Consensus']=='yes'].tolist()

fig2 = px.histogram(filt_df, 
                    x='Expert Agreement',color='Expert Majority',
                    marginal='violin',color_discrete_map={"'yes'":'purple',"'no'":'red'}, 
                    labels={'x' : 'Agreement Ratio', 'y' : 'Count'},
                   )
fig1 = px.histogram(filt_df, x='Crowd Agreement',color='Crowd Majority',marginal='violin', color_discrete_map={"'yes'":'green',"'no'":'yellow'})
fig2.update_layout(title_text='Experts',
    title_x=0.5, showlegend=True,
    legend_title=None)
fig1.data[0].name="Crowd Majority: Yes"
fig2.data[0].name="Expert Majority: No"
fig1.data[2].name="Crowd Majority: No"
fig2.data[2].name="Expert Majority: Yes"
fig2.add_trace(fig1.data[0])
fig2.add_trace(fig1.data[1])
fig2.add_trace(fig1.data[2])
fig2.add_trace(fig1.data[3])

split_conses = filt_df[filt_df["Expert Agreement"]==0.5]
# Split cases have no expert majority label, so a color map keyed on 'nan' never matches;
# the color of the single trace is set directly instead
fig3 = px.histogram(split_conses, x='Expert Agreement', color_discrete_sequence=['blue'])
fig3.data[0].name="Expert Majority: NaN"
fig3.data[0].showlegend = True
fig2.add_trace(fig3.data[0])

fig2.update_layout(barmode='overlay')
fig2.update_xaxes(dtick=0.2)
fig2.update_traces(opacity=0.32)
import plotly.graph_objects as go
fig2.add_shape(type="rect",x0=0.493,x1=0.525,y0=0,y1=3000,line_width=1,line_dash='dot')
#fig2.add_trace(go.Scatter(filt_df.loc[consensus_no], x)
fig2.show()




#consensus_yes = filt_df.index[filt_df['Consensus']=='yes'].tolist()


fig1 = px.histogram(filt_df, x='Crowd Agreement',color='Crowd Majority', color_discrete_map={"'yes'":'green',"'no'":'yellow'},
                   text_auto=True)

fig1.data[1].name="Crowd Majority: No"


fig1.data[0].name="Crowd Majority: Yes"

# A color map keyed on 'nan' never matches, so the trace color is set directly
fig3 = px.histogram(split_conses, x='Expert Agreement', color_discrete_sequence=['orange'],
                   text_auto=True)
fig3.data[0].name="Expert Majority: NaN"

from plotly.subplots import make_subplots
fig = make_subplots(rows=1,cols=3)
fig.add_trace(fig3.data[0],row=1,col=1)
fig.update_xaxes(title_text='Expert Majority: NaN',row=1,col=1)

fig.add_trace(fig1.data[1],row=1,col=2)
fig.update_xaxes(title_text='Crowd Majority: No',row=1,col=2)
fig.update_xaxes(range=[0.49,0.51])

fig.add_trace(fig1.data[0],row=1,col=3)

fig.add_annotation(text="393",row=1,col=3)
fig.update_xaxes(title_text='Crowd Majority: Yes',range=[0.49,0.61],row=1,col=3)


fig.update_yaxes(range=[0,3000])
fig.update_traces(opacity=0.6)
fig.update_layout(title_text="Frequency of Split Agreement")
fig.add_shape(type="rect",x0=0.493,x1=0.507,y0=0,y1=2950,line_width=1,line_dash='dot', row=1,col=1)
fig.add_shape(type="rect",x0=0.493,x1=0.507,y0=0,y1=2950,line_width=1,line_dash='dot', row=1,col=2)
fig.add_shape(type="rect",x0=0.502,x1=0.602,y0=0,y1=2950,line_width=1,line_dash='dot', row=1,col=3)
fig.show()


When the experts are evenly split on a case (4 of the 8 voting abnormal), the crowd appears to have a more unified opinion. Let's make a histogram examining the cases where consensus is lacking.

## When the crowd disagrees with the experts, how divided are they?

In [None]:
consensus_no = filt_df.index[filt_df['Consensus']=='no'].tolist()

fig2 = px.histogram(filt_df.loc[consensus_no], 
                    x='Expert Agreement',color='Expert Majority',
                    marginal='violin',color_discrete_map={"'yes'":'purple',"'no'":'red'}, 
                    labels={'x' : 'Agreement Ratio', 'y' : 'Count'},
                   )
fig1 = px.histogram(filt_df.loc[consensus_no], x='Crowd Agreement',color='Crowd Majority',marginal='violin', color_discrete_map={"'yes'":'green',"'no'":'yellow'})
fig2.update_layout(title_text='Experts',
    title_x=0.5, showlegend=True,
    legend_title=None)
fig1.data[0].name="Crowd Majority: Yes"
fig2.data[0].name="Expert Majority: No"
fig1.data[2].name="Crowd Majority: No"
fig2.data[2].name="Expert Majority: Yes"
fig2.add_trace(fig1.data[0])
fig2.add_trace(fig1.data[1])
fig2.add_trace(fig1.data[2])
fig2.add_trace(fig1.data[3])


fig2.update_layout(barmode='overlay')
fig2.update_xaxes(dtick=0.2)
fig2.update_traces(opacity=0.32)
import plotly.graph_objects as go
fig2.add_shape(type="rect",x0=0.5,x1=0.55,y0=0,y1=750,line_width=1,line_dash='dot')
#fig2.add_trace(go.Scatter(filt_df.loc[consensus_no], x)
fig2.show()
fig2.update_layout(title_text='ALPACA Queries Left',
    title_x=0.5, showlegend=True,
    legend_title=None)
fig2.show()
fig2.update_xaxes(range=[0.55,1.1])
fig2.update_layout(title_text='ALPACA Queries Right',
    title_x=0.5, showlegend=True,
    legend_title=None)
fig2.show()


## When experts can't decide case status, what does the crowd think?


In [None]:
#fig = px.histogram(filt_df.loc[experts_split], x='Crowd Agreement',color='Crowd Majority',marginal='box')
#fig.show()


split_conses = filt_df.index[filt_df["Expert Agreement"]==0.5].tolist()
# .loc selects rows by index label; plain [] with a list would look for columns
fig3 = px.histogram(filt_df.loc[split_conses], x='Crowd Agreement',color='Crowd Majority')
fig3.show()

Seems uniform; which cases fall into the category? Are they the harder ones?
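A numeric summary can back up the visual impression from the histogram. The sketch below, on made-up values, counts the split cases per crowd majority label and averages their crowd agreement; the column names are the ones used in this notebook, and the numbers are purely illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "Expert Agreement": [0.5, 0.5, 0.5, 0.5, 0.875],
    "Crowd Majority": ["'yes'", "'yes'", "'no'", "'no'", "'yes'"],
    "Crowd Agreement": [0.6, 0.8, 0.7, 0.9, 1.0],
})

# Keep only the cases where the experts are evenly split
split = toy[toy["Expert Agreement"] == 0.5]

# Number of split cases and mean crowd agreement per crowd majority label
summary = split.groupby("Crowd Majority")["Crowd Agreement"].agg(["count", "mean"])
print(summary)
```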

Make these into bubble charts to show proportion: https://plotly.com/python/bubble-charts/

### Problem x User Matrix

#### Isolate Problem_id, User_id, accuracy, chosen answer, and correct answer

In [None]:
admin = pd.read_csv('1345_admin_reads.csv') #raw individual read

In [None]:
admin
#results = results.set_index('Case ID')


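In [None]:
# Keep one row per read, indexed by read_id
PU_df = admin.set_index('read_id')[['problem_id','user_id','accuracy','chosen_answer']].copy()
# Attach the expert majority to each read by matching problem_id to the Case ID index of filt_df
# (assumes problem_id and Case ID hold the same identifiers)
PU_df = PU_df.join(filt_df['Expert Majority'], on='problem_id')
PU_df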
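When a per-case label is attached to per-read rows, the two tables need to be matched on a shared key: `pd.concat` with `axis=1` aligns on the index, which only lines up when both frames share the same index. A minimal sketch of a key-based join, assuming `problem_id` in the reads table corresponds to the `Case ID` index of the case table (the toy identifiers below are made up):

```python
import pandas as pd

# Case-level table indexed by Case ID
cases = pd.DataFrame(
    {"Expert Majority": ["'yes'", "'no'"]},
    index=pd.Index([101, 102], name="Case ID"),
)

# Read-level table: several reads can point at the same case
reads = pd.DataFrame({
    "read_id": [1, 2, 3],
    "problem_id": [101, 101, 102],
    "user_id": ["a", "b", "a"],
    "accuracy": [1.0, 0.0, 1.0],
}).set_index("read_id")

# Look up each read's problem_id in the index of the case table
joined = reads.join(cases["Expert Majority"], on="problem_id")
print(joined)
```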

In [None]:
PU_matrix_accur = PU_df.pivot_table(index='problem_id',columns='user_id',values='accuracy', aggfunc='mean')
PU_matrix_accur

In [None]:
user_accuracy_table = pd.DataFrame(PU_matrix_accur.mean())
user_accuracy_table

Average accuracy of each user
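To make the shape of the problem-by-user matrix concrete, here is a small sketch with made-up reads. User `b` never read problem 2, so that cell is NaN and is skipped when the per-user mean is taken.

```python
import pandas as pd

reads = pd.DataFrame({
    "problem_id": [1, 2, 1],
    "user_id": ["a", "a", "b"],
    "accuracy": [1.0, 0.0, 1.0],
})

# One row per problem, one column per user; unread problems become NaN
matrix = reads.pivot_table(index="problem_id", columns="user_id",
                           values="accuracy", aggfunc="mean")

# Column-wise mean ignores NaN, giving each user's average over the problems they read
user_accuracy = matrix.mean()
print(user_accuracy)
```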

In [None]:
#PU_matrix_corr = PU_df.pivot_table(index='problem_id',columns='user_id',values='Expert Majority', aggfunc='sum')
#PU_matrix_corr
PU_matrix_accur.keys()

### SciKit/Weighted Statistics

In [None]:
%pip install deslib

META-DES references:

R. M. O. Cruz, R. Sabourin, G. D. C. Cavalcanti, T. I. Ren, META-DES: A dynamic ensemble selection framework using meta-learning, Pattern Recognition 48 (5) (2015) 1925–1935.

R. M. O. Cruz, R. Sabourin, G. D. C. Cavalcanti, META-DES.H: A dynamic ensemble selection technique using meta-learning and a dynamic weighting approach, in: 2015 International Joint Conference on Neural Networks (IJCNN), 2015, pp. 1–8.

R. M. O. Cruz, R. Sabourin, G. D. C. Cavalcanti, META-DES.Oracle: Meta-learning and feature selection for dynamic ensemble selection, Information Fusion 38 (2017) 84–103.


This method, Dynamic Ensemble Selection-Kullback-Leibler divergence (DES-KL), estimates the competence of each classifier (here, the ability of a user to answer correctly) and separates the classifiers estimated to answer more accurately than a random classifier from those that do not.

https://github.com/scikit-learn-contrib/DESlib/blob/master/deslib/des/probabilistic/deskl.py
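The core idea of the competence source can be sketched in a few lines of NumPy: measure the KL divergence between a classifier's class supports and a uniform random classifier (1/L per class), and flip the sign when the classifier was wrong. This is a simplified illustration written for this notebook, not DESlib's own `entropy_func`, whose exact normalization may differ; the function name and the toy supports are assumptions.

```python
import numpy as np


def kl_competence_source(supports, is_correct):
    """Signed KL divergence from the uniform distribution, one value per sample."""
    # Clip to avoid log(0); supports has shape (n_samples, n_classes)
    supports = np.clip(np.asarray(supports, dtype=float), 1e-16, 1.0)
    n_classes = supports.shape[1]
    # KL(p || uniform) = sum_i p_i * log(p_i / (1 / L)) = sum_i p_i * log(p_i * L)
    kl = np.sum(supports * np.log(supports * n_classes), axis=1)
    # Negative competence when the sample was misclassified
    sign = np.where(np.asarray(is_correct, dtype=bool), 1.0, -1.0)
    return sign * kl


supports = [[0.9, 0.1], [0.9, 0.1], [0.5, 0.5]]
is_correct = [True, False, True]
c_src = kl_competence_source(supports, is_correct)
print(c_src)
```

A confident correct answer yields a positive value, the same confidence on a wrong answer yields the mirrored negative value, and a 50/50 support is indistinguishable from random guessing and scores zero.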


In [None]:
import numpy as np

from deslib.des.probabilistic import BaseProbabilistic
from deslib.util import entropy_func


class DESKL(BaseProbabilistic):
    
    def __init__(self, pool_classifiers=None, k=None, DFP=False, with_IH=False,
                 safe_k=None, IH_rate=0.30, mode='selection',
                 random_state=None, knn_classifier='knn',
                 knn_metric='minkowski', DSEL_perc=0.5, n_jobs=-1,
                 voting='hard'):
        super(DESKL, self).__init__(pool_classifiers=pool_classifiers,
                                    k=k,
                                    DFP=DFP,
                                    with_IH=with_IH,
                                    safe_k=safe_k,
                                    IH_rate=IH_rate,
                                    mode=mode,
                                    random_state=random_state,
                                    knn_classifier=knn_classifier,
                                    knn_metric=knn_metric,
                                    DSEL_perc=DSEL_perc,
                                    n_jobs=n_jobs,
                                    voting=voting)

        self.selection_threshold = 0.0

    def source_competence(self):
        """Calculates the source of competence using the KL divergence method.

        The source of competence C_src at the validation point
        :math:`\\mathbf{x}_{k}` is calculated by the KL divergence
        between the vector of class supports produced by the base classifier
        and the outputs of a random classifier (RC) RC = 1/L, L being the
        number of classes in the problem. The value of C_src is negative if
        the base classifier misclassified the instance :math:`\\mathbf{x}_{k}`.

        Returns
        ----------
        C_src : array of shape (n_samples, n_classifiers)
            The competence source for each base classifier at each data point.
        """

        C_src = np.zeros((self.n_samples_, self.n_classifiers_))
        for clf_index in range(self.n_classifiers_):
            supports = self.dsel_scores_[:, clf_index, :]
            is_correct = self.DSEL_processed_[:, clf_index]
            C_src[:, clf_index] = entropy_func(self.n_classes_, supports,
                                               is_correct)

        return C_src
    
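# NOTE: pool_classifiers is meant to receive a list of fitted scikit-learn classifiers;
# the user IDs passed here are not classifiers, so fitting this object is expected to fail.
DESKL(PU_matrix_accur.keys(),k=10, knn_metric='minkowski')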