# Using Metadata to Improve Artificial Intelligence Medical Image Diagnostic Accuracy
**Purpose and Background**
Conduct a descriptive analysis of crowdsourced data collected from a mobile application in which users were asked to give a binary (yes or no) answer on whether a medical image shows an abnormality.

Two user categories are distinguished: medical experts, who were hired to use the application, and the crowd, meaning anyone who downloaded and used the application.

**Goal: show that the crowd agrees with the expert majority more often than individual experts agree with the expert majority**


### Import datasets

In [103]:
import pandas as pd
import numpy as np
results = pd.read_csv('1345_customer_results.csv') #medical case results
admin = pd.read_csv('1345_admin_reads.csv') #raw individual read

### Inspect Customer Results

In [104]:
results.dtypes # TODO: make Case ID the index

Case ID                   int64
Origin                   object
Origin Created At        object
Content ID                int64
URL                      object
Labeling State           object
Series                  float64
Series Index            float64
Patch                   float64
Qualified Reads           int64
Correct Label            object
Majority Label           object
Difficulty              float64
Agreement               float64
First Choice Answer      object
First Choice Votes        int64
First Choice Weight     float64
Second Choice Answer     object
Second Choice Votes       int64
Second Choice Weight    float64
Internal Notes          float64
Comments                 object
Explanation             float64
dtype: object

**Preliminary filtering for security purposes**


In [105]:
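# keep only the cases that have an Origin string
results = results.dropna(subset=['Origin'])
# extract the single-digit expert vote count that follows 'vote' in Origin
results["Expert: Abnormal Votes"] = results["Origin"].str.extract(r'vote(\d)').astype(float)
# drop the source columns once the vote count has been extracted
results = results.drop(['Origin Created At','Origin','Content ID','URL'],axis=1)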

In [106]:
results.head(2)

Unnamed: 0,Case ID,Labeling State,Series,Series Index,Patch,Qualified Reads,Correct Label,Majority Label,Difficulty,Agreement,First Choice Answer,First Choice Votes,First Choice Weight,Second Choice Answer,Second Choice Votes,Second Choice Weight,Internal Notes,Comments,Explanation,Expert: Abnormal Votes
0,5888087,Gold Standard,,,,2,'no','no',0.0,1.0,'no',2,1.54,'yes',0,0.0,,[],,2.0
1,5888088,Gold Standard,,,,3,'no','no',0.0,1.0,'no',3,2.34,'yes',0,0.0,,[],,0.0


Rows without an expert vote string to extract (i.e. NaN in `Expert: Abnormal Votes`) are dropped.

Note: a value of 0 means that all experts thought the case was normal.

In [107]:
results = results.dropna(subset=["Expert: Abnormal Votes"])
#results["Expert: Abnormal Votes"].isnull().any()
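The extraction and filtering steps above can be sketched on toy data. The `Origin` strings below are hypothetical, since the notebook never shows the real format of that column; only the `vote<digit>` token is taken from the regular expression used above.

```python
import pandas as pd

# Hypothetical Origin strings; the real format of this column is not shown in the notebook.
toy = pd.DataFrame({
    "Case ID": [1, 2, 3],
    "Origin": ["batch_a/vote0/img_001", "batch_a/vote5/img_002", "batch_b/img_003"],
})

# expand=False returns a Series; rows without a 'vote<digit>' token become NaN.
toy["Expert: Abnormal Votes"] = (
    toy["Origin"].str.extract(r"vote(\d)", expand=False).astype(float)
)

# Drop the rows where no vote count could be extracted.
toy = toy.dropna(subset=["Expert: Abnormal Votes"])
print(toy)
```

Because `\d` captures exactly one digit, this pattern only suits vote counts from 0 to 9, which is enough for a panel of 8 experts.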

**Inspect NaN Columns for Content**

In [108]:
results.loc[results['Series'].notna()| results['Series Index'].notna() | results['Patch'].notna() | results['Internal Notes'].notna() | results['Explanation'].notna()]

Unnamed: 0,Case ID,Labeling State,Series,Series Index,Patch,Qualified Reads,Correct Label,Majority Label,Difficulty,Agreement,First Choice Answer,First Choice Votes,First Choice Weight,Second Choice Answer,Second Choice Votes,Second Choice Weight,Internal Notes,Comments,Explanation,Expert: Abnormal Votes


The resulting DataFrame is empty: none of the inspected columns contain any data, so they are dropped.
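The same check can be written per column instead of per row. The following is a minimal sketch on a made-up frame (the values are not from the dataset), using `isna().all()` to list the columns that hold no data at all.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `results`; the values are made up.
toy = pd.DataFrame({
    "Case ID": [1, 2, 3],
    "Series": [np.nan, np.nan, np.nan],
    "Patch": [np.nan, np.nan, np.nan],
    "Agreement": [1.0, 0.571, np.nan],
})

candidate_cols = ["Series", "Patch", "Agreement"]

# True for each candidate column that contains no data at all.
all_nan = toy[candidate_cols].isna().all()
empty_cols = all_nan[all_nan].index.tolist()
print(empty_cols)

# Drop only the columns that turned out to be empty.
toy = toy.drop(columns=empty_cols)
```

This variant reports which columns are empty by name, so a column with even one value (here `Agreement`) is kept.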

In [109]:
results = results.drop(['Series','Series Index','Patch','Internal Notes','Explanation'],axis=1)

**Inspect Comments for Relevance**

In [110]:
results.loc[results['Comments'] != '[]']

Unnamed: 0,Case ID,Labeling State,Qualified Reads,Correct Label,Majority Label,Difficulty,Agreement,First Choice Answer,First Choice Votes,First Choice Weight,Second Choice Answer,Second Choice Votes,Second Choice Weight,Comments,Expert: Abnormal Votes
4245,5892332,Gold Standard,1,'no','no',0.0,1.0,'no',1,0.8,'yes',0,0.0,['There was rapid and spiky rates so why am I ...,3.0
6029,5894116,Gold Standard,5,'no','yes',1.0,1.0,'yes',5,4.0,'no',0,0.0,['Can someone explain why the answer is “no”?'],0.0
8346,5896433,Gold Standard,3,'yes','no',1.0,1.0,'no',3,2.32,'yes',0,0.0,['??'],5.0
11433,5899520,Gold Standard,2,'yes','no',1.0,1.0,'no',2,1.58,'yes',0,0.0,"[""i can't see any spike in this question so wh...",5.0
12911,5900998,Gold Standard,2,'no','yes',1.0,1.0,'yes',2,1.56,'no',0,0.0,['There is obviously a peak happened in there'],3.0
13827,5901914,Gold Standard,6,'yes','no',1.0,1.0,'no',6,4.72,'yes',0,0.0,['No spike present'],5.0
13953,5902040,Gold Standard,2,'no','yes',1.0,1.0,'yes',2,1.58,'no',0,0.0,['How?'],3.0
16033,5904120,Gold Standard,1,'yes','no',1.0,1.0,'no',1,0.78,'yes',0,0.0,['How? '],6.0
16326,5904413,Gold Standard,3,'yes','no',1.0,1.0,'no',3,2.46,'yes',0,0.0,['Multiple?'],6.0
16385,5904472,Gold Standard,3,'yes','yes',0.333,0.667,'yes',2,1.58,'no',1,0.78,[' Wtf'],5.0


None of the comments seem relevant, so the Comments column will be dropped

In [111]:
results = results.drop(['Comments'],axis=1)

There are only 8 experts in total, so cases with an extracted expert vote count greater than 8 are dropped.

In [112]:
results = results[results["Expert: Abnormal Votes"] <= 8]

### Important columns for analysis; original metadata
Each row corresponds to a medical case 

**Identifiers:** 

Case ID: unique identifier will serve as index

Labeling State: identifies whether an expert consensus has been reached (yes = Gold Standard, no = In Progress)

URL: the expert vote count was extracted from this string (read from the `Origin` column in the code above)
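Using Case ID as the index could look like the sketch below. It runs on a toy frame with three of the case IDs shown earlier; applying it to `results` is an assumption, since the notebook itself keeps the default integer index.

```python
import pandas as pd

# Toy frame with a few case IDs from the outputs above.
toy = pd.DataFrame({
    "Case ID": [5888087, 5888088, 5888089],
    "Labeling State": ["Gold Standard", "Gold Standard", "In Progress"],
})

# verify_integrity=True raises a ValueError if any Case ID is duplicated.
indexed = toy.set_index("Case ID", verify_integrity=True)
print(indexed.loc[5888088, "Labeling State"])
```

The integrity check doubles as a test of the claim that Case ID is a unique identifier.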

**Reads and Annotations**

Qualified Reads: total crowd vote count

Expert: Abnormal Votes: number of experts who thought the case was abnormal

(Note: the total number of experts voting is always 8.)

Correct Label: overall expert consensus 

{yes=case is abnormal, no=case is normal, NaN=no consensus}

Majority Label: overall crowd consensus on each case

**Measures of Confidence**

Difficulty: Qualified Reads *without the Correct Label* divided by total Qualified Reads.

Agreement: Qualified Reads *with the Majority Label* divided by total Qualified Reads.

Nth Choice Answer: crowd answer (First Choice is the Majority Label)
        
Nth Choice Votes: number of crowd votes per answer
        
Nth Choice Weight:
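As a worked example of the Difficulty and Agreement definitions, the sketch below recomputes both for case 5904472, using the vote counts printed in the Comments inspection output (3 qualified reads, 2 "yes", 1 "no", Correct Label "yes"). The variable names are illustrative only.

```python
# Vote counts for case 5904472, copied from the Comments inspection output.
qualified_reads = 3
votes = {"yes": 2, "no": 1}
correct_label = "yes"  # expert consensus

# The Majority Label is the answer with the most crowd votes.
majority_label = max(votes, key=votes.get)

# Difficulty: share of reads that do not match the Correct Label.
difficulty = (qualified_reads - votes[correct_label]) / qualified_reads
# Agreement: share of reads that match the Majority Label.
agreement = votes[majority_label] / qualified_reads

print(round(difficulty, 3), round(agreement, 3))  # 0.333 0.667
```

Both values match the Difficulty (0.333) and Agreement (0.667) reported for that case.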
        
        
        



### Add Additional Relevant Columns 




In [113]:
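df = results.copy() # work on a copy so that results keeps its original columns
expert_count = 8
df["Expert: Normal Votes"] = (expert_count - results["Expert: Abnormal Votes"])
df["Expert/Expert Agreement"] = df["Expert: Abnormal Votes"]/expert_count # fraction of experts voting abnormal (0.5 = even split)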
#split_conses = (df.loc(df["Expert/Expert Agreement"]==0.5))
df.head()

Unnamed: 0,Case ID,Labeling State,Qualified Reads,Correct Label,Majority Label,Difficulty,Agreement,First Choice Answer,First Choice Votes,First Choice Weight,Second Choice Answer,Second Choice Votes,Second Choice Weight,Expert: Abnormal Votes,Expert: Normal Votes,Expert/Expert Agreement
0,5888087,Gold Standard,2,'no','no',0.0,1.0,'no',2,1.54,'yes',0,0.0,2.0,6.0,0.25
1,5888088,Gold Standard,3,'no','no',0.0,1.0,'no',3,2.34,'yes',0,0.0,0.0,8.0,0.0
2,5888089,Gold Standard,2,'no','no',0.0,1.0,'no',2,1.7,'yes',0,0.0,0.0,8.0,0.0
3,5888090,Gold Standard,1,'no','no',0.0,1.0,'no',1,0.82,'yes',0,0.0,0.0,8.0,0.0
4,5888091,In Progress,7,,'yes',,0.571,'yes',4,3.28,'no',3,2.32,4.0,4.0,0.5


I will rename some of the original columns for clarity

In [114]:
df["Expert Majority"] = results["Correct Label"]
df["Crowd Majority"] = results["Majority Label"]
df["Expert/Crowd Disagreement"] = results["Difficulty"] #proportion of crowd disagreeing with expert consensus
df["Crowd/Crowd Agreement"] = results["Agreement"] #proportion of crowd agreeing with crowd consensus
df = df.drop(columns= ["Correct Label","Majority Label","Difficulty","Agreement"])
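An alternative to copying and then dropping the columns is a single `rename` call. The sketch below shows it on a one-row toy frame; it is an equivalent formulation, not the approach used in the cell above.

```python
import pandas as pd

# One-row toy frame with the four original column names.
toy = pd.DataFrame({
    "Case ID": [5888087],
    "Correct Label": ["no"],
    "Majority Label": ["no"],
    "Difficulty": [0.0],
    "Agreement": [1.0],
})

# Map each original name to the clearer name used in the rest of the analysis.
renamed = toy.rename(columns={
    "Correct Label": "Expert Majority",
    "Majority Label": "Crowd Majority",
    "Difficulty": "Expert/Crowd Disagreement",
    "Agreement": "Crowd/Crowd Agreement",
})
print(list(renamed.columns))
```

One difference: `rename` keeps each column in its original position, whereas the copy-and-drop approach moves the renamed columns to the end of the frame.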

In [115]:
df.head()

Unnamed: 0,Case ID,Labeling State,Qualified Reads,First Choice Answer,First Choice Votes,First Choice Weight,Second Choice Answer,Second Choice Votes,Second Choice Weight,Expert: Abnormal Votes,Expert: Normal Votes,Expert/Expert Agreement,Expert Majority,Crowd Majority,Expert/Crowd Disagreement,Crowd/Crowd Agreement
0,5888087,Gold Standard,2,'no',2,1.54,'yes',0,0.0,2.0,6.0,0.25,'no','no',0.0,1.0
1,5888088,Gold Standard,3,'no',3,2.34,'yes',0,0.0,0.0,8.0,0.0,'no','no',0.0,1.0
2,5888089,Gold Standard,2,'no',2,1.7,'yes',0,0.0,0.0,8.0,0.0,'no','no',0.0,1.0
3,5888090,Gold Standard,1,'no',1,0.82,'yes',0,0.0,0.0,8.0,0.0,'no','no',0.0,1.0
4,5888091,In Progress,7,'yes',4,3.28,'no',3,2.32,4.0,4.0,0.5,,'yes',,0.571


#### How reliable are the individual experts on average?

In [128]:
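#fig = px.bar(df, x= "Expert Majority", y="Expert: Abnormal Votes", facet_col)
#fig.show()
# Label cases without an expert consensus as 'split'; fill only this column so the numeric columns keep their NaN values
df['Expert Majority'] = df['Expert Majority'].fillna('split')
df['Expert Majority'].value_counts()
df.head()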

Unnamed: 0,Case ID,Labeling State,Qualified Reads,First Choice Answer,First Choice Votes,First Choice Weight,Second Choice Answer,Second Choice Votes,Second Choice Weight,Expert: Abnormal Votes,Expert: Normal Votes,Expert/Expert Agreement,Expert Majority,Crowd Majority,Expert/Crowd Disagreement,Crowd/Crowd Agreement
0,5888087,Gold Standard,2,'no',2,1.54,'yes',0,0.0,2.0,6.0,0.25,'no','no',0.0,1.0
1,5888088,Gold Standard,3,'no',3,2.34,'yes',0,0.0,0.0,8.0,0.0,'no','no',0.0,1.0
2,5888089,Gold Standard,2,'no',2,1.7,'yes',0,0.0,0.0,8.0,0.0,'no','no',0.0,1.0
3,5888090,Gold Standard,1,'no',1,0.82,'yes',0,0.0,0.0,8.0,0.0,'no','no',0.0,1.0
4,5888091,In Progress,7,'yes',4,3.28,'no',3,2.32,4.0,4.0,0.5,,'yes',,0.571
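One possible way to approach this question, sketched on the five rows printed above: take the share of experts who voted with the expert majority on each case and compare it with the share of crowd reads that matched the expert label (one minus the disagreement). The two new column names are hypothetical, and the sketch assumes the disagreement column is numeric with NaN for cases without consensus.

```python
import numpy as np
import pandas as pd

expert_count = 8

# The five rows of df.head(), restricted to the columns needed here.
toy = pd.DataFrame({
    "Case ID": [5888087, 5888088, 5888089, 5888090, 5888091],
    "Expert: Abnormal Votes": [2.0, 0.0, 0.0, 0.0, 4.0],
    "Expert/Crowd Disagreement": [0.0, 0.0, 0.0, 0.0, np.nan],
})

normal_votes = expert_count - toy["Expert: Abnormal Votes"]
# Share of experts who voted with the expert majority on each case.
toy["Expert With Majority"] = (
    np.maximum(toy["Expert: Abnormal Votes"], normal_votes) / expert_count
)
# Share of crowd reads that matched the expert label on each case.
toy["Crowd With Expert Majority"] = 1 - toy["Expert/Crowd Disagreement"]

# Compare only cases with an expert consensus (non-missing disagreement).
consensus = toy.dropna(subset=["Expert/Crowd Disagreement"])
print(consensus[["Expert With Majority", "Crowd With Expert Majority"]].mean())
```

Five rows are not evidence either way; the sketch only shows the shape of the comparison. A 4-4 split such as case 5888091 has no majority and is excluded by the `dropna` step.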


In [117]:
#pip install jupyter-dash
import plotly.express as px


#### How does crowd agreement change as a function of the expert agreement?

In [118]:
#fig = px.histogram(df, x="Expert/Expert Agreement", y= "Crowd/Crowd Agreement")
#fig.show()

In [129]:
fig = px.density_heatmap(df, x="Expert/Expert Agreement", y='Crowd/Crowd Agreement',text_auto=True,facet_col="Expert Majority")
fig.show()
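The heatmap can be complemented by a numeric summary. The sketch below groups by expert agreement and reports the case count and mean crowd agreement per level; it uses only the five rows printed by `df.head()`, so the numbers are illustrative and not a result.

```python
import pandas as pd

# Values copied from the five rows printed by df.head().
toy = pd.DataFrame({
    "Expert/Expert Agreement": [0.25, 0.0, 0.0, 0.0, 0.5],
    "Crowd/Crowd Agreement": [1.0, 1.0, 1.0, 1.0, 0.571],
})

# Number of cases and mean crowd agreement at each level of expert agreement.
summary = (
    toy.groupby("Expert/Expert Agreement")["Crowd/Crowd Agreement"]
    .agg(["count", "mean"])
)
print(summary)
```

With 8 experts there are at most nine distinct levels (0/8 through 8/8), so the table stays small on the full data as well.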


## Exploratory Analysis

In [121]:
print(results["Difficulty"].describe())

count    23758.000000
mean         0.244252
std          0.334227
min          0.000000
25%          0.000000
50%          0.000000
75%          0.429000
max          1.000000
Name: Difficulty, dtype: float64
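Since Difficulty is the share of crowd reads that miss the Correct Label, its complement gives the share that match it. The short calculation below applies this to the mean reported above; note that it is an unweighted per-case mean, so a case with one read counts as much as a case with seven.

```python
# Mean Difficulty copied from the describe() output above.
mean_difficulty = 0.244252

# Complement: average per-case share of crowd reads that match the Correct Label.
mean_crowd_match = 1 - mean_difficulty
print(f"{mean_crowd_match:.3f}")  # 0.756
```

The median of 0 in the same output means that on at least half of the cases with a Difficulty value, no crowd read disagreed with the expert label.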
