# Using Metadata to Improve Artificial Intelligence Medical Image Diagnostic Accuracy
**Purpose and Background**
Conduct a descriptive analysis of crowdsourced data extracted from user interactions with a mobile application in which users were tasked with making a binary (yes or no) identification of abnormalities in medical images.

Two user categories are distinguished: medical experts, who were hired to interact with the application, and the crowd, meaning anyone who downloaded and used the application.

**Show that the crowd agrees with the expert majority more often than individual experts agree with the expert majority**


### Import datasets

In [839]:
import pandas as pd
import numpy as np
results = pd.read_csv('1345_customer_results.csv') #medical case results
admin = pd.read_csv('1345_admin_reads.csv') #raw individual read

### Inspect Customer Results

In [840]:
results.dtypes
results = results.set_index('Case ID')

**Preliminary filtering for security purposes**


In [841]:
results = results.dropna(subset=['Origin']) 
results["Expert: Abnormal Votes"] = results["Origin"].str.extract(r'vote(\d)').astype(float)
results = results.drop(['Origin Created At','Origin','Content ID','URL'],axis=1)

In [842]:
results.head(2)

Case ID,Labeling State,Series,Series Index,Patch,Qualified Reads,Correct Label,Majority Label,Difficulty,Agreement,First Choice Answer,First Choice Votes,First Choice Weight,Second Choice Answer,Second Choice Votes,Second Choice Weight,Internal Notes,Comments,Explanation,Expert: Abnormal Votes
5888087,Gold Standard,,,,2,'no','no',0.0,1.0,'no',2,1.54,'yes',0,0.0,,[],,2.0
5888088,Gold Standard,,,,3,'no','no',0.0,1.0,'no',3,2.34,'yes',0,0.0,,[],,0.0


Rows whose `Origin` URL did not contain an expert vote string were dropped (the extracted value for those rows is NaN).

Note: 0 means all experts thought the case was normal.
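As a minimal sketch of the extraction step above, the snippet below applies the same `vote(\d)` pattern to a few made-up strings. The URL layout shown is hypothetical (the real `Origin` values are not reproduced here); only the regular expression comes from the notebook.

```python
import pandas as pd

# Hypothetical Origin strings; the real URL layout is an assumption.
origin = pd.Series([
    "https://example.com/cases/vote3/img_001.png",
    "https://example.com/cases/vote0/img_002.png",
    "https://example.com/cases/img_003.png",  # no vote token, so no match
])

# Same pattern as in the notebook: capture the single digit after "vote".
votes = origin.str.extract(r"vote(\d)")[0].astype(float)

# Rows without a match become NaN and can be dropped afterwards.
kept = votes.dropna()
print(kept.tolist())
```

Because `\d` captures exactly one digit, the pattern is sufficient for vote counts from 0 to 8 but would truncate a two-digit count.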

In [843]:
results = results.dropna(subset=["Expert: Abnormal Votes"])
#results["Expert: Abnormal Votes"].isnull().any()

**Inspect NaN Columns for Content**

In [844]:
results.loc[results['Series'].notna()| results['Series Index'].notna() | results['Patch'].notna() | results['Internal Notes'].notna() | results['Explanation'].notna()]

Case ID,Labeling State,Series,Series Index,Patch,Qualified Reads,Correct Label,Majority Label,Difficulty,Agreement,First Choice Answer,First Choice Votes,First Choice Weight,Second Choice Answer,Second Choice Votes,Second Choice Weight,Internal Notes,Comments,Explanation,Expert: Abnormal Votes


The resulting DataFrame is empty: none of the inspected columns contain any data, so they can be dropped.

In [845]:
results = results.drop(['Series','Series Index','Patch','Internal Notes','Explanation'],axis=1)

**Inspect Comments for Relevance**

In [846]:
results[results['Comments'] != '[]']


Case ID,Labeling State,Qualified Reads,Correct Label,Majority Label,Difficulty,Agreement,First Choice Answer,First Choice Votes,First Choice Weight,Second Choice Answer,Second Choice Votes,Second Choice Weight,Comments,Expert: Abnormal Votes
5892332,Gold Standard,1,'no','no',0.0,1.0,'no',1,0.8,'yes',0,0.0,['There was rapid and spiky rates so why am I ...,3.0
5894116,Gold Standard,5,'no','yes',1.0,1.0,'yes',5,4.0,'no',0,0.0,['Can someone explain why the answer is “no”?'],0.0
5896433,Gold Standard,3,'yes','no',1.0,1.0,'no',3,2.32,'yes',0,0.0,['??'],5.0
5899520,Gold Standard,2,'yes','no',1.0,1.0,'no',2,1.58,'yes',0,0.0,"[""i can't see any spike in this question so wh...",5.0
5900998,Gold Standard,2,'no','yes',1.0,1.0,'yes',2,1.56,'no',0,0.0,['There is obviously a peak happened in there'],3.0
5901914,Gold Standard,6,'yes','no',1.0,1.0,'no',6,4.72,'yes',0,0.0,['No spike present'],5.0
5902040,Gold Standard,2,'no','yes',1.0,1.0,'yes',2,1.58,'no',0,0.0,['How?'],3.0
5904120,Gold Standard,1,'yes','no',1.0,1.0,'no',1,0.78,'yes',0,0.0,['How? '],6.0
5904413,Gold Standard,3,'yes','no',1.0,1.0,'no',3,2.46,'yes',0,0.0,['Multiple?'],6.0
5904472,Gold Standard,3,'yes','yes',0.333,0.667,'yes',2,1.58,'no',1,0.78,[' Wtf'],5.0


None of the comments seem relevant, so the Comments column will be dropped

In [847]:
results = results.drop(['Comments'],axis=1)

There should be only 8 experts in total, so cases with an expert vote count greater than 8 were dropped.

In [848]:
results = results[results["Expert: Abnormal Votes"] <= 8]

### Important columns for analysis; original metadata
Each row corresponds to a medical case 

**Identifiers:** 

Case ID: unique identifier will serve as index

Labeling State: identifies whether an expert consensus has been achieved (yes = Gold Standard, no = In Progress)

URL: the expert vote count was extracted from the URL string (the `Origin` column in the code above)

**Reads and Annotations**

Qualified Reads: total crowd vote count

Expert: Abnormal Votes: number of experts who thought the case was abnormal

(Note: the total number of experts voting is always 8.)

Correct Label: overall expert consensus 

{yes=case is abnormal, no=case is normal, NaN=no consensus}

Majority Label: overall crowd consensus on each case

**Measures of Confidence**

Difficulty: Qualified Reads *without the Correct Label* divided by total Qualified Reads.

Agreement: Qualified Reads *with the Majority Label* divided by total Qualified Reads.

Nth Choice Answer: crowd answer (First Choice is the Majority Label)
        
Nth Choice Votes: number of crowd votes per answer
        
Nth Choice Weight:
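For the Difficulty and Agreement definitions above, a small worked example may help. The helper names `difficulty` and `agreement` are hypothetical, and returning `None` for undefined cases is an assumption that mirrors the NaN values seen in the data. The weight columns are left out because their definition is not given.

```python
def difficulty(votes, correct_label):
    """Share of qualified reads that did NOT pick the correct label."""
    total = sum(votes.values())
    if total == 0 or correct_label is None:
        return None  # undefined without reads or without an expert consensus
    return (total - votes.get(correct_label, 0)) / total


def agreement(votes):
    """Share of qualified reads that picked the most-voted (majority) label."""
    total = sum(votes.values())
    if total == 0:
        return None  # undefined without reads
    return max(votes.values()) / total


# Case 5904472 from the comments table: 2 'yes' reads, 1 'no' read, correct label 'yes'.
votes = {"yes": 2, "no": 1}
print(round(difficulty(votes, "yes"), 3))  # 0.333
print(round(agreement(votes), 3))          # 0.667
```

These values match the Difficulty (0.333) and Agreement (0.667) reported for that case.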
        
        
        



### Add Additional Relevant Columns 




In [849]:
df = results 
expert_count = 8
df["Expert: Normal Votes"] = (expert_count - results["Expert: Abnormal Votes"])
df["Expert Agreement"] = df["Expert: Abnormal Votes"]/expert_count
#split_conses = (df.loc(df["Expert/Expert Agreement"]==0.5))
df

Case ID,Labeling State,Qualified Reads,Correct Label,Majority Label,Difficulty,Agreement,First Choice Answer,First Choice Votes,First Choice Weight,Second Choice Answer,Second Choice Votes,Second Choice Weight,Expert: Abnormal Votes,Expert: Normal Votes,Expert Agreement
5888087,Gold Standard,2,'no','no',0.000,1.000,'no',2,1.54,'yes',0,0.00,2.0,6.0,0.250
5888088,Gold Standard,3,'no','no',0.000,1.000,'no',3,2.34,'yes',0,0.00,0.0,8.0,0.000
5888089,Gold Standard,2,'no','no',0.000,1.000,'no',2,1.70,'yes',0,0.00,0.0,8.0,0.000
5888090,Gold Standard,1,'no','no',0.000,1.000,'no',1,0.82,'yes',0,0.00,0.0,8.0,0.000
5888091,In Progress,7,,'yes',,0.571,'yes',4,3.28,'no',3,2.32,4.0,4.0,0.500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5918375,Gold Standard,2,'no','yes',1.000,1.000,'yes',2,1.56,'no',0,0.00,2.0,6.0,0.250
5918376,Gold Standard,3,'no','yes',0.667,0.667,'yes',2,1.56,'no',1,0.76,3.0,5.0,0.375
5918377,In Progress,6,,'yes',,1.000,'yes',6,4.78,'no',0,0.00,4.0,4.0,0.500
5918378,Gold Standard,0,'yes',,,,'yes',0,0.00,'no',0,0.00,5.0,3.0,0.625


#### I will rename some of the original columns for clarity
JUST KIDDING :) (go back and change it, you can always make a mental note)

    {Original column --> Renamed Column}
    
    Correct Label --> Expert Majority

    Majority Label --> Crowd Majority

    Difficulty --> Expert/Crowd Disagreement

    Agreement --> Crowd Agreement

In [850]:
df["Expert Majority"] = results["Correct Label"]
df["Crowd Majority"] = results["Majority Label"]
df["Expert/Crowd Disagreement"] = results["Difficulty"] #proportion of crowd disagreeing with expert consensus
df["Crowd Agreement"] = results["Agreement"] #proportion of crowd agreeing with crowd consensus
df = df.drop(columns= ["Correct Label","Majority Label","Difficulty","Agreement"])


df['Consensus'] = np.where(df['Expert Majority'] == df['Crowd Majority'],'yes','no')

#### Error rate of experts
I extracted the index labels for each expert-majority category and calculated the "error rate": the fraction of experts who did not vote with the expert majority.

In [851]:
EM_yes = df.index[df['Expert Majority'] == "'yes'"].tolist()

EM_no = df.index[df['Expert Majority'] == "'no'"].tolist()

df.loc[EM_yes,"Error Rate"]= df['Expert: Normal Votes'][EM_yes]/expert_count
df.loc[EM_no,"Error Rate"]= df['Expert: Abnormal Votes'][EM_no]/expert_count
df.fillna('', inplace=True)
beg_index = list(df.columns).index('Expert: Abnormal Votes') #9
df.iloc[ : , 13:]


Case ID,Expert/Crowd Disagreement,Crowd Agreement,Consensus,Error Rate
5888087,0.0,1.0,yes,0.25
5888088,0.0,1.0,yes,0.0
5888089,0.0,1.0,yes,0.0
5888090,0.0,1.0,yes,0.0
5888091,,0.571,no,
...,...,...,...,...
5918375,1.0,1.0,no,0.25
5918376,0.667,0.667,no,0.375
5918377,,1.0,no,
5918378,,,no,0.375
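The error-rate calculation above can also be sketched in vectorized form. This is an alternative formulation on toy rows, not the notebook's own code; it uses `np.select` and leaves tied cases as NaN rather than blank strings.

```python
import numpy as np
import pandas as pd

expert_count = 8  # total experts per case, as stated in the notebook

# Toy rows mirroring cases from the tables above; labels keep the quoted form used in the data.
toy = pd.DataFrame(
    {
        "Expert Majority": ["'no'", "'yes'", np.nan],
        "Expert: Abnormal Votes": [2.0, 5.0, 4.0],
    },
    index=pd.Index([5888087, 5918378, 5888091], name="Case ID"),
)

abnormal = toy["Expert: Abnormal Votes"]
normal = expert_count - abnormal

# Dissenting votes are the "normal" votes when the majority is 'yes',
# the "abnormal" votes when it is 'no', and undefined for a 4-4 tie.
toy["Error Rate"] = np.select(
    [toy["Expert Majority"] == "'yes'", toy["Expert Majority"] == "'no'"],
    [normal / expert_count, abnormal / expert_count],
    default=np.nan,
)
print(toy["Error Rate"].tolist())
```

Keeping NaN for undecided cases means the column stays numeric, so later calls such as `mean()` or `std()` can include it.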


## Exploratory Analysis 1

In [852]:
#%pip install jupyter-dash
import plotly.express as px
import plotly.io as pio
import plotly.figure_factory as ff
pio.renderers.default='notebook'
import matplotlib.pyplot as plt
#ax = plt.subplot()

In [853]:
#FIX NaN; currently is blank
df.fillna('', inplace=True)
print(df['Expert Majority'].value_counts())

'no'     12000
'yes'    12000
          3000
Name: Expert Majority, dtype: int64


12,000 medical cases were judged to be abnormal by experts

12,000 medical cases were judged to be normal by experts

3,000 medical cases failed to reach a consensus (4 experts voted for normal and 4 experts voted for abnormal)
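One possible way to address the blank label in the counts is to count missing values directly instead of filling them with empty strings. The sketch below uses a toy Series standing in for `df['Expert Majority']`; the label text "no consensus" is an illustrative choice.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df['Expert Majority'] before any fillna('') call.
expert_majority = pd.Series(
    ["'no'", "'yes'", np.nan, "'no'", np.nan, "'yes'", "'yes'"]
)

# dropna=False keeps missing labels as their own NaN row in the counts.
counts = expert_majority.value_counts(dropna=False)

# Alternatively, give the no-consensus cases an explicit label.
labelled = expert_majority.fillna("no consensus").value_counts()

print(counts)
print(labelled)
```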


### Exploratory Analysis

In [854]:
fig = px.histogram(df,x= "Qualified Reads", color="Error Rate")
fig.update_layout(xaxis_range=[0,15])
fig.show()

In [855]:
print(len(df[df['Qualified Reads'] < 5].value_counts()))


3639


#### How reliable are the individual experts on average?

In [856]:

fig = px.density_heatmap(df, x="Expert Majority", y='Crowd Majority',text_auto=True)
fig.show()
df.std(axis=0)






Qualified Reads           2.549289
First Choice Votes        2.360711
First Choice Weight       1.888224
Second Choice Votes       0.959345
Second Choice Weight      0.764048
Expert: Abnormal Votes    2.582037
Expert: Normal Votes      2.582037
Expert Agreement          0.322755
dtype: float64

When the experts are undecided on a case (4 of the 8 voted abnormal), the crowd appears to hold a more unified opinion. Let's make a histogram examining the cases that lack consensus.

Because there are 8 experts, restricting the analysis to cases with 5 or more qualified reads keeps the crowd and expert vote counts more proportional.

In [857]:
filt_df = df[df['Qualified Reads'] >= 5]
fig = px.density_heatmap(filt_df, x="Expert Majority", y='Crowd Majority',text_auto=True)
fig.show()
filt_df.std(axis=0)





Qualified Reads           1.942279
First Choice Votes        1.961178
First Choice Weight       1.569474
Second Choice Votes       1.137077
Second Choice Weight      0.905635
Expert: Abnormal Votes    2.456104
Expert: Normal Votes      2.456104
Expert Agreement          0.307013
Crowd Agreement           0.153559
dtype: float64

In [858]:
fig = px.scatter(filt_df,x= "Expert Agreement", y="Crowd Agreement")
#fig.update_layout(yaxis_range=[0.4,1.1])
fig.show()

This doesn't tell us much; perhaps calculate an average?
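One way to compute such an average is to group by the expert agreement level and take the mean crowd agreement in each group. The sketch below uses toy values shaped like `filt_df`; the output column name `Mean Crowd Agreement` is an illustrative choice.

```python
import pandas as pd

# Toy values shaped like filt_df; not taken from the real data.
toy = pd.DataFrame({
    "Expert Agreement": [0.0, 0.0, 0.5, 0.5, 1.0, 1.0],
    "Crowd Agreement": [1.0, 0.8, 0.6, 0.7, 0.9, 1.0],
})

# Mean crowd agreement at each level of expert agreement.
mean_crowd = (
    toy.groupby("Expert Agreement")["Crowd Agreement"]
    .mean()
    .reset_index(name="Mean Crowd Agreement")
)
print(mean_crowd)
```

If the real column contains blank strings from an earlier `fillna('')`, it would need converting first, for example with `pd.to_numeric(..., errors='coerce')`.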

In [859]:
consensus_no = filt_df.index[filt_df['Consensus']=='no'].tolist()
fig = px.histogram(filt_df.loc[consensus_no], x='Crowd Agreement',color='Crowd Majority',marginal='box')
fig.update_layout(xaxis_range=[-0.1,1.1])
fig.show()

Move red to left side of the graph

In [860]:
fig = px.histogram(filt_df.loc[consensus_no], x='Expert Agreement',color='Expert Majority',marginal='box')
fig.show()

### Let's check the trends when experts are split in their opinion:
What about when the crowd is split?

In [861]:
experts_split = filt_df.index[filt_df['Expert Agreement']==0.500].tolist()
fig = px.histogram(filt_df.loc[experts_split], x='Crowd Agreement',color='Crowd Majority',marginal='box')
fig.update_layout(xaxis_range=[0.45,1.05])
fig.show()

## Trying to aggregate case ids

In [862]:

filt_df = df[df['Qualified Reads'] > 5]
fig = px.histogram(filt_df,x= "Case ID")
#fig.update_layout(yaxis_range=[0.4,1.1])
fig.show()

ValueError: Value of 'x' is not the name of a column in 'data_frame'. Expected one of ['Labeling State', 'Qualified Reads', 'First Choice Answer', 'First Choice Votes', 'First Choice Weight', 'Second Choice Answer', 'Second Choice Votes', 'Second Choice Weight', 'Expert: Abnormal Votes', 'Expert: Normal Votes', 'Expert Agreement', 'Expert Majority', 'Crowd Majority', 'Expert/Crowd Disagreement', 'Crowd Agreement', 'Consensus', 'Error Rate'] but received: Case ID
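The error arises because `Case ID` was set as the index earlier, so it is no longer a regular column. A minimal sketch of one fix, on a toy frame, is to move the index back into a column with `reset_index()` before plotting.

```python
import pandas as pd

# Toy frame indexed by Case ID, like filt_df.
toy = pd.DataFrame(
    {"Qualified Reads": [6, 7]},
    index=pd.Index([5888091, 5918377], name="Case ID"),
)

print("Case ID" in toy.columns)      # False: it is only the index name
plot_df = toy.reset_index()          # moves the index into a regular column
print("Case ID" in plot_df.columns)  # True

# plot_df can then be passed to plotly, e.g.
# px.histogram(plot_df, x="Case ID")
```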

Per case ID:
sum the instances and show the proportions of each error rate (aggregate by case ID).

In [None]:
import statistics
#filt_df['aggreg_error_rate'] = filt_df.groupby(['Case ID'], as_index = False)['Error Rate'].mean()
#filt_df['aggreg_error_rate'] = filt_df.groupby(['Case ID'], as_index = False).apply(lambda x: statistics.mean(x))
def average(x):
    # Divide each value by the total so the results are proportions summing to 1
    return x/sum(x)
#print(average(df['Expert: Normal Votes']))
#caseid_agg = df.groupby(['Case ID'])['Expert Agreement'].aggregate{}
# 'Case ID' is the index rather than a column, so read it from df.index
print(df.index.unique())
#fig = px.histogram(caseid_agg)
#fig.show()
#print(sum(caseid_agg))
# Difference between the last and the second Case ID in the index
print(df.index[-1] - df.index[1])


In [None]:

filt_df = df[df['Qualified Reads'] > 5]
# 'Case ID' is the index, so reset it into a column before plotting
fig = px.histogram(filt_df.reset_index(),x="Case ID", color="Error Rate")
#fig.update_layout(yaxis_range=[0.45,1.1])
fig.show()

Seems uniform; which cases fall into the category? Are they the harder ones?

Make these into bubble charts to show proportion: https://plotly.com/python/bubble-charts/

### Problem x User Matrix

#### Isolate Problem_id, User_id, accuracy, chosen answer, and correct answer

In [None]:
admin = pd.read_csv('1345_admin_reads.csv') #raw individual read

In [870]:
admin
#results = results.set_index('Case ID')


Unnamed: 0,topic_id,problem_id,user_id,read_id,labeling_state,patch,score,accuracy,contest_id,mission_id,content_id,chosen_answer,origin,origin_created_at,series,series_index,answerChoiceIds,response_submitted_at,problem_appeared_at
0,1345,5888087,55058,132610888,gold_standard,,100,0.78,8011,,3264386,['no'],https://centaur-customer-uploads.s3.us-east-1....,2021-07-26 21:41:47.756010+00:00,,,[25796611],2022-04-12 09:20:24.912000+00:00,2022-04-12 09:20:24.415999+00:00
1,1345,5888087,248277,107696869,gold_standard,,100,0.76,6437,,3264386,['no'],https://centaur-customer-uploads.s3.us-east-1....,2021-07-26 21:41:47.756010+00:00,,,[25796611],2021-12-04 15:00:58.596999+00:00,2021-12-04 15:00:56.760999+00:00
2,1345,5888088,19769,60673858,gold_standard,,100,0.78,5183,,3264387,['no'],https://centaur-customer-uploads.s3.us-east-1....,2021-07-26 21:41:47.824498+00:00,,,[25796612],2021-08-09 07:41:17.754000+00:00,2021-08-09 07:41:16.961000+00:00
3,1345,5888088,237039,99676200,gold_standard,,100,0.80,6151,,3264387,['no'],https://centaur-customer-uploads.s3.us-east-1....,2021-07-26 21:41:47.824498+00:00,,,[25796612],2021-11-01 12:05:16.948999+00:00,2021-11-01 12:05:16.076000+00:00
4,1345,5888088,280445,141519169,gold_standard,,100,0.76,9076,,3264387,['no'],https://centaur-customer-uploads.s3.us-east-1....,2021-07-26 21:41:47.824498+00:00,,,[25796612],2022-06-10 12:16:45.950000+00:00,2022-06-10 12:16:45.112000+00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
152055,1345,5918379,19769,60689328,gold_standard,,100,0.82,5183,,3294678,['yes'],https://centaur-customer-uploads.s3.us-east-1....,2021-07-26 22:00:15.397399+00:00,,,[25796257],2021-08-09 10:17:10.168000+00:00,2021-08-09 10:17:09.165000+00:00
152056,1345,5918379,53129,61096537,gold_standard,,100,0.78,5183,,3294678,['yes'],https://centaur-customer-uploads.s3.us-east-1....,2021-07-26 22:00:15.397399+00:00,,,[25796257],2021-08-11 18:34:25.122999+00:00,2021-08-11 18:34:23.250999+00:00
152057,1345,5918379,102777,99217212,gold_standard,,100,0.84,6151,,3294678,['yes'],https://centaur-customer-uploads.s3.us-east-1....,2021-07-26 22:00:15.397399+00:00,,,[25796257],2021-10-31 00:28:19.111000+00:00,2021-10-31 00:28:17.107000+00:00
152058,1345,5918379,137347,114399651,gold_standard,,100,0.76,6810,,3294678,['yes'],https://centaur-customer-uploads.s3.us-east-1....,2021-07-26 22:00:15.397399+00:00,,,[25796257],2022-01-07 16:10:33.714999+00:00,2022-01-07 16:10:32.913000+00:00


In [876]:
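# Keep one row per individual read, indexed by read_id
PU_df = admin.set_index('read_id')[['problem_id','user_id','accuracy','chosen_answer']].copy()
# Attach each case's expert majority by matching problem_id to the Case ID index
# (assumes problem_id and Case ID refer to the same case, as the sample rows suggest)
PU_df = PU_df.join(filt_df[['Expert Majority']], on='problem_id')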
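A problem-by-user matrix could then be built by pivoting the individual reads. The sketch below is one possible approach on toy data: the `expert_majority` Series, the `answer` and `correct` columns, and the label formats are illustrative assumptions, and the real labels would need the same kind of normalizing before comparison.

```python
import pandas as pd

# Toy reads shaped like the admin table; values are illustrative only.
reads = pd.DataFrame({
    "problem_id": [5888087, 5888087, 5888088, 5888088],
    "user_id": [55058, 248277, 55058, 19769],
    "chosen_answer": ["['no']", "['no']", "['yes']", "['no']"],
})

# Hypothetical expert majority per problem, indexed by problem_id.
expert_majority = pd.Series({5888087: "no", 5888088: "no"}, name="expert_majority")

# Strip the list-style wrapper so "['no']" becomes "no".
reads["answer"] = reads["chosen_answer"].str.strip("[]'")

# Look up the expert majority for each read and mark whether the read matches it.
reads = reads.join(expert_majority, on="problem_id")
reads["correct"] = (reads["answer"] == reads["expert_majority"]).astype(int)

# Rows are problems, columns are users; NaN marks problems a user never read.
matrix = reads.pivot_table(index="problem_id", columns="user_id", values="correct")
print(matrix)
```

`pivot_table` averages duplicate problem/user pairs by default, so a user who read the same problem more than once would get a fractional value.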
