In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics import cohen_kappa_score

In [41]:
interp_df = pd.read_csv("valid_interp.csv", usecols=[1,2])

In [42]:
interp_df.head()

|   | FilePath | Label |
|---|---|---|
| 0 | MURA-v1.1/valid/XR_WRIST/patient11185/study1_p... | 1 |
| 1 | MURA-v1.1/valid/XR_WRIST/patient11185/study1_p... | 1 |
| 2 | MURA-v1.1/valid/XR_WRIST/patient11185/study1_p... | 1 |
| 3 | MURA-v1.1/valid/XR_WRIST/patient11185/study1_p... | 1 |
| 4 | MURA-v1.1/valid/XR_WRIST/patient11186/study1_p... | 1 |


In [43]:
interp_df.FilePath.str.split('/', expand=True).head()

|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | MURA-v1.1 | valid | XR_WRIST | patient11185 | study1_positive | image1.png |
| 1 | MURA-v1.1 | valid | XR_WRIST | patient11185 | study1_positive | image2.png |
| 2 | MURA-v1.1 | valid | XR_WRIST | patient11185 | study1_positive | image3.png |
| 3 | MURA-v1.1 | valid | XR_WRIST | patient11185 | study1_positive | image4.png |
| 4 | MURA-v1.1 | valid | XR_WRIST | patient11186 | study1_positive | image1.png |


In [44]:
interp_df[['Location', 'Patient', 'Study']] = interp_df.FilePath.str.split('/', expand=True)[[2, 3, 4]]

In [45]:
interp_df.head()

|   | FilePath | Label | Location | Patient | Study |
|---|---|---|---|---|---|
| 0 | MURA-v1.1/valid/XR_WRIST/patient11185/study1_p... | 1 | XR_WRIST | patient11185 | study1_positive |
| 1 | MURA-v1.1/valid/XR_WRIST/patient11185/study1_p... | 1 | XR_WRIST | patient11185 | study1_positive |
| 2 | MURA-v1.1/valid/XR_WRIST/patient11185/study1_p... | 1 | XR_WRIST | patient11185 | study1_positive |
| 3 | MURA-v1.1/valid/XR_WRIST/patient11185/study1_p... | 1 | XR_WRIST | patient11185 | study1_positive |
| 4 | MURA-v1.1/valid/XR_WRIST/patient11186/study1_p... | 1 | XR_WRIST | patient11186 | study1_positive |


In [46]:
interp_df.drop(columns='FilePath', inplace=True)

In [47]:
interp_df.head()

|   | Label | Location | Patient | Study |
|---|---|---|---|---|
| 0 | 1 | XR_WRIST | patient11185 | study1_positive |
| 1 | 1 | XR_WRIST | patient11185 | study1_positive |
| 2 | 1 | XR_WRIST | patient11185 | study1_positive |
| 3 | 1 | XR_WRIST | patient11185 | study1_positive |
| 4 | 1 | XR_WRIST | patient11186 | study1_positive |


## Poor Fellows who broke multiple bones

Last time I tried this, I kept only the 'Patient' and 'Study' columns and dropped the 'Location' column as well. It turned out that some patients may have had X-rays taken at multiple locations, so grouping by patient and study alone could have wrongly merged a patient's studies from different locations into one.

To verify this hypothesis, let's check whether there are indeed patients who have had studies at multiple locations.

In [48]:
poor_df = interp_df.drop(columns=['Label', 'Study'])

In [49]:
len(poor_df)

3197

So there are 3197 rows in total, one per image (not per study). Since we are now keeping only the "Location" and "Patient" columns, there will be many duplicates. For example, if patient N had 3 wrist images taken, there will be three identical rows. See below:

In [50]:
poor_df[poor_df.Patient == 'patient11185']

|   | Location | Patient |
|---|---|---|
| 0 | XR_WRIST | patient11185 |
| 1 | XR_WRIST | patient11185 |
| 2 | XR_WRIST | patient11185 |
| 3 | XR_WRIST | patient11185 |


Our goal here is to keep only unique location-patient pairs so that we can accurately count how many locations each patient has had X-rayed. To do so, we first drop all the duplicates.

In [51]:
poor_df.drop_duplicates(inplace=True)
poor_df.head()

|   | Location | Patient |
|---|---|---|
| 0 | XR_WRIST | patient11185 |
| 4 | XR_WRIST | patient11186 |
| 12 | XR_WRIST | patient11187 |
| 13 | XR_WRIST | patient11188 |
| 17 | XR_WRIST | patient11189 |


The next step is technically unnecessary, but I did it out of habit. If you look carefully at the data frame above, you will notice that the index jumps. This is because every row keeps its original index, so the dropped rows leave gaps. All the next step does is re-index the rows from 0 to the end.

In [52]:
poor_df.reset_index(drop=True, inplace=True)
poor_df.head()

|   | Location | Patient |
|---|---|---|
| 0 | XR_WRIST | patient11185 |
| 1 | XR_WRIST | patient11186 |
| 2 | XR_WRIST | patient11187 |
| 3 | XR_WRIST | patient11188 |
| 4 | XR_WRIST | patient11189 |


In [53]:
len(poor_df)

1118

Now we group the data frame by patient and count how many rows each patient has, which here equals the number of locations each patient has had X-rayed.

In [54]:
location_count_df = poor_df.groupby(by=['Patient']).count().reset_index()

In [55]:
location_count_df.head()

|   | Patient | Location |
|---|---|---|
| 0 | patient11185 | 1 |
| 1 | patient11186 | 4 |
| 2 | patient11187 | 2 |
| 3 | patient11188 | 3 |
| 4 | patient11189 | 2 |
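As a side note, the drop-duplicates-then-count steps can be collapsed into a single `nunique` call, which counts distinct locations per patient directly. The sketch below uses a small hypothetical toy frame (not the MURA data) and checks that both approaches agree; the analogous call on the real data would be `interp_df.groupby('Patient')['Location'].nunique()`.

```python
import pandas as pd

# Hypothetical toy data shaped like poor_df: one row per image,
# so (Location, Patient) pairs repeat.
toy = pd.DataFrame({
    "Location": ["XR_WRIST", "XR_WRIST", "XR_ELBOW", "XR_WRIST", "XR_HAND", "XR_HAND"],
    "Patient": ["p1", "p1", "p1", "p2", "p3", "p3"],
})

# Approach used in this notebook: de-duplicate pairs, then count rows per patient.
via_dedup = toy.drop_duplicates().groupby("Patient")["Location"].count()

# One-step alternative: count distinct locations per patient.
via_nunique = toy.groupby("Patient")["Location"].nunique()

print(via_nunique.to_dict())  # {'p1': 2, 'p2': 1, 'p3': 1}
```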


It is immediately clear that many patients have had images taken at multiple locations. That would explain why, last time, when we omitted location and did the grouping, the final data frame had fewer rows than expected: some rows were likely dropped as duplicates, and studies from different locations that share a study name (such as `study1_positive`) were grouped together as one single study.
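To make this failure mode concrete, here is a minimal sketch on hypothetical toy rows in which one patient has a `study1_positive` at two locations. This pattern does occur in the real outputs of this notebook: patient11186, for example, has a `study1_positive` under both XR_WRIST and XR_ELBOW. Grouping without `Location` merges the two studies into one.

```python
import pandas as pd

# Hypothetical image-level rows: patient "p1" has a study named
# "study1_positive" at two different locations (wrist and elbow).
images = pd.DataFrame({
    "Location": ["XR_WRIST", "XR_WRIST", "XR_ELBOW"],
    "Patient": ["p1", "p1", "p1"],
    "Study": ["study1_positive"] * 3,
    "Label": [1, 0, 1],
})

# Grouping without Location wrongly collapses both studies into one group.
without_location = images.groupby(["Patient", "Study"], as_index=False)["Label"].max()

# Including Location keeps the wrist and elbow studies separate.
with_location = images.groupby(["Location", "Patient", "Study"], as_index=False)["Label"].max()

print(len(without_location), len(with_location))  # 1 2
```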

Next, we keep only the patients who have had X-rays taken at multiple locations.

In [56]:
poor_guys = location_count_df[location_count_df.Location > 1]
poor_guys.head()

|   | Patient | Location |
|---|---|---|
| 1 | patient11186 | 4 |
| 2 | patient11187 | 2 |
| 3 | patient11188 | 3 |
| 4 | patient11189 | 2 |
| 5 | patient11190 | 2 |


In [57]:
len(poor_guys)

248

Well, it seems we have 248 poor guys who had X-rays taken at more than one location, whether because of multiple broken bones or other suspected abnormalities.

It is tempting to see how the poorest guy fares.

In [58]:
np.max(poor_guys.Location)

5

Damn. This guy had 5 different upper-extremity locations X-rayed! How many locations are there in total?

In [59]:
locations = np.unique(interp_df.Location)
locations

array(['XR_ELBOW', 'XR_FINGER', 'XR_FOREARM', 'XR_HAND', 'XR_HUMERUS',
       'XR_SHOULDER', 'XR_WRIST'], dtype=object)

In [60]:
len(locations)

7

So this guy had 5 out of 7 of them checked.

I really feel a sense of pity. What happened to this poor fellow? To look deeper, we need to first find which patient got the highest "mark".

In [61]:
poor_guys.Location.idxmax()

50

In [62]:
poor_guys.Patient[50]

'patient11235'
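One caveat: `idxmax()` returns only the *first* index label holding the maximum, so if several patients were tied at 5 locations, only one of them would show up here. Whether such ties exist in `poor_guys` is not shown in this notebook. The sketch below, on hypothetical toy counts shaped like `poor_guys`, contrasts `idxmax()` with a boolean mask that keeps every tied patient.

```python
import pandas as pd

# Hypothetical per-patient location counts, shaped like poor_guys.
counts = pd.DataFrame({
    "Patient": ["p1", "p2", "p3", "p4"],
    "Location": [2, 5, 3, 5],
})

# idxmax() gives only the first index label where the maximum occurs.
first_label = counts["Location"].idxmax()
first_patient = counts.loc[first_label, "Patient"]

# A boolean mask keeps every patient tied at the maximum.
top = counts[counts["Location"] == counts["Location"].max()]

print(first_patient, top["Patient"].tolist())  # p2 ['p2', 'p4']
```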

So it is patient 11235. Let's take a look at his images and see what happened to him.

**TODO**: Download the dataset and display the images of the patients below.

## Correct Inference

In [63]:
interp_df.head()

|   | Label | Location | Patient | Study |
|---|---|---|---|---|
| 0 | 1 | XR_WRIST | patient11185 | study1_positive |
| 1 | 1 | XR_WRIST | patient11185 | study1_positive |
| 2 | 1 | XR_WRIST | patient11185 | study1_positive |
| 3 | 1 | XR_WRIST | patient11185 | study1_positive |
| 4 | 1 | XR_WRIST | patient11186 | study1_positive |


For each (location, patient, study) tuple, there are multiple rows, because each study contains multiple images. For each study, we take the maximum of all the individual image labels. In effect, if any image in a study is positive, the entire study is positive. This makes sense: if the bone looks abnormal from any view, it should be deemed abnormal.
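A tiny sketch of this rule on hypothetical toy data (not from MURA): with 0/1 labels, the group-wise `max` gives the same result as asking whether `any` image in the study is positive.

```python
import pandas as pd

keys = ["Location", "Patient", "Study"]

# Hypothetical image-level 0/1 predictions: three wrist images, two elbow images.
images = pd.DataFrame({
    "Location": ["XR_WRIST"] * 3 + ["XR_ELBOW"] * 2,
    "Patient": ["p1"] * 3 + ["p2"] * 2,
    "Study": ["study1"] * 5,
    "Label": [0, 1, 0, 0, 0],
})

# Study label = max over its image labels (groupby sorts keys, so XR_ELBOW comes first).
study_max = images.groupby(keys)["Label"].max()

# Equivalent "any image positive" formulation, cast back to 0/1.
study_any = images.groupby(keys)["Label"].any().astype(int)

print(study_max.tolist(), study_any.tolist())  # [0, 1] [0, 1]
```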

Here is how we do it.

In [64]:
grouped = interp_df.groupby(['Location', 'Patient', 'Study'], as_index=False)

In [65]:
result_df = grouped.max()

In [66]:
result_df.head()

|   | Location | Patient | Study | Label |
|---|---|---|---|---|
| 0 | XR_ELBOW | patient11186 | study1_positive | 1 |
| 1 | XR_ELBOW | patient11189 | study1_positive | 1 |
| 2 | XR_ELBOW | patient11204 | study1_negative | 0 |
| 3 | XR_ELBOW | patient11205 | study1_negative | 0 |
| 4 | XR_ELBOW | patient11217 | study1_negative | 0 |


At first glance, it might look about the same as `interp_df`, but if you look closely you will see the difference. Here are a few comparisons for convenience.

In [67]:
len(interp_df), len(result_df)

(3197, 1199)

In [68]:
interp_df[interp_df.Patient == 'patient11185']

|   | Label | Location | Patient | Study |
|---|---|---|---|---|
| 0 | 1 | XR_WRIST | patient11185 | study1_positive |
| 1 | 1 | XR_WRIST | patient11185 | study1_positive |
| 2 | 1 | XR_WRIST | patient11185 | study1_positive |
| 3 | 1 | XR_WRIST | patient11185 | study1_positive |


In [69]:
result_df[result_df.Patient == 'patient11185']

|   | Location | Patient | Study | Label |
|---|---|---|---|---|
| 962 | XR_WRIST | patient11185 | study1_positive | 1 |


We can now check whether `result_df` has the expected number of rows, i.e. one row per study in the labeled validation file.

In [70]:
len(result_df)

1199

In [71]:
valid_label_df = pd.read_csv("./valid_labeled_studies.csv", header=None, names=['FilePath', 'Label'])

In [72]:
valid_label_df[['Location', 'Patient', 'Study']] = valid_label_df.FilePath.str.split('/', expand=True)[[2, 3, 4]]

In [73]:
valid_label_df.drop(columns="FilePath", inplace=True)
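The same path-splitting steps have now been applied to a second file. As an optional refactor, they could be wrapped in a small helper. `add_study_keys` below is a hypothetical name, and the sketch assumes the path layout seen above, where components 2, 3 and 4 are the location, patient and study; the study-level path in the toy data is assumed to end with a slash.

```python
import pandas as pd

def add_study_keys(df, path_col="FilePath"):
    """Hypothetical helper: split MURA-style paths into Location/Patient/Study
    columns (path components 2, 3, 4) and drop the original path column."""
    parts = df[path_col].str.split("/", expand=True)
    out = df.drop(columns=path_col)
    # to_numpy() assigns by position, so the numeric split columns map onto the new names.
    out[["Location", "Patient", "Study"]] = parts[[2, 3, 4]].to_numpy()
    return out

# Toy rows: an image-level path and a study-level path (assumed trailing slash).
toy = pd.DataFrame({
    "FilePath": [
        "MURA-v1.1/valid/XR_WRIST/patient11185/study1_positive/image1.png",
        "MURA-v1.1/valid/XR_ELBOW/patient11186/study1_positive/",
    ],
    "Label": [1, 1],
})
keyed = add_study_keys(toy)
print(keyed)
```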

In [74]:
valid_label_df.head()

|   | Label | Location | Patient | Study |
|---|---|---|---|---|
| 0 | 1 | XR_WRIST | patient11185 | study1_positive |
| 1 | 1 | XR_WRIST | patient11186 | study1_positive |
| 2 | 1 | XR_WRIST | patient11186 | study2_positive |
| 3 | 1 | XR_WRIST | patient11186 | study3_positive |
| 4 | 1 | XR_WRIST | patient11187 | study1_positive |


In [75]:
len(valid_label_df)

1199

Good. The ground-truth file has the same number of studies (1199) as our study-level predictions.

In [76]:
valid_label_df.head()

|   | Label | Location | Patient | Study |
|---|---|---|---|---|
| 0 | 1 | XR_WRIST | patient11185 | study1_positive |
| 1 | 1 | XR_WRIST | patient11186 | study1_positive |
| 2 | 1 | XR_WRIST | patient11186 | study2_positive |
| 3 | 1 | XR_WRIST | patient11186 | study3_positive |
| 4 | 1 | XR_WRIST | patient11187 | study1_positive |


In [77]:
result_df.head()

|   | Location | Patient | Study | Label |
|---|---|---|---|---|
| 0 | XR_ELBOW | patient11186 | study1_positive | 1 |
| 1 | XR_ELBOW | patient11189 | study1_positive | 1 |
| 2 | XR_ELBOW | patient11204 | study1_negative | 0 |
| 3 | XR_ELBOW | patient11205 | study1_negative | 0 |
| 4 | XR_ELBOW | patient11217 | study1_negative | 0 |


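So how do we "merge" these two data frames? Their rows are in different orders (`result_df` is sorted by location first, because `groupby` sorts its keys), so we need to match rows on the shared key columns rather than by position.

To tell the two label columns apart, we should first give them distinct names.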
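Renaming is one option. For comparison, the sketch below (hypothetical one-row toy frames) shows what `merge` does when both sides keep a column called `Label`: pandas appends its default suffixes `_x` and `_y`, and the `suffixes=` parameter lets you choose readable names in the same call instead of renaming beforehand.

```python
import pandas as pd

keys = ["Location", "Patient", "Study"]

# Hypothetical one-row frames that both carry a "Label" column.
left = pd.DataFrame({"Location": ["XR_WRIST"], "Patient": ["p1"],
                     "Study": ["study1_positive"], "Label": [1]})
right = pd.DataFrame({"Location": ["XR_WRIST"], "Patient": ["p1"],
                      "Study": ["study1_positive"], "Label": [0]})

# Without renaming, the clashing column gets the default "_x"/"_y" suffixes.
default = left.merge(right, on=keys)
print(default.columns.tolist())

# suffixes= names the two label columns explicitly in one step.
named = left.merge(right, on=keys, suffixes=("_pred", "_target"))
print(named.columns.tolist())
```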

In [78]:
valid_label_df.rename(columns={'Label': 'Target'}, inplace=True)
valid_label_df.head()

|   | Target | Location | Patient | Study |
|---|---|---|---|---|
| 0 | 1 | XR_WRIST | patient11185 | study1_positive |
| 1 | 1 | XR_WRIST | patient11186 | study1_positive |
| 2 | 1 | XR_WRIST | patient11186 | study2_positive |
| 3 | 1 | XR_WRIST | patient11186 | study3_positive |
| 4 | 1 | XR_WRIST | patient11187 | study1_positive |


In [79]:
result_df.rename(columns={'Label': 'Prediction'}, inplace=True)
result_df.head()

|   | Location | Patient | Study | Prediction |
|---|---|---|---|---|
| 0 | XR_ELBOW | patient11186 | study1_positive | 1 |
| 1 | XR_ELBOW | patient11189 | study1_positive | 1 |
| 2 | XR_ELBOW | patient11204 | study1_negative | 0 |
| 3 | XR_ELBOW | patient11205 | study1_negative | 0 |
| 4 | XR_ELBOW | patient11217 | study1_negative | 0 |


In [80]:
len(valid_label_df), len(result_df)

(1199, 1199)

In [81]:
result_df.columns.tolist()[:-1]

['Location', 'Patient', 'Study']

In [82]:
final_df = result_df.merge(valid_label_df, on=result_df.columns.tolist()[:-1])
final_df.head()

|   | Location | Patient | Study | Prediction | Target |
|---|---|---|---|---|---|
| 0 | XR_ELBOW | patient11186 | study1_positive | 1 | 1 |
| 1 | XR_ELBOW | patient11189 | study1_positive | 1 | 1 |
| 2 | XR_ELBOW | patient11204 | study1_negative | 0 | 0 |
| 3 | XR_ELBOW | patient11205 | study1_negative | 0 | 0 |
| 4 | XR_ELBOW | patient11217 | study1_negative | 0 | 0 |
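Because `merge` defaults to an inner join, any study whose keys failed to match would be silently dropped, and the length of `final_df` is not printed above. A check along the lines of the sketch below (on hypothetical toy frames) may be worth adding: `validate="one_to_one"` makes pandas raise a `MergeError` if a key combination is duplicated on either side, and an outer join with `indicator=True` adds a `_merge` column that reveals any unmatched rows.

```python
import pandas as pd

keys = ["Location", "Patient", "Study"]

# Hypothetical predictions and targets, deliberately in different row orders.
pred = pd.DataFrame({
    "Location": ["XR_ELBOW", "XR_WRIST"],
    "Patient": ["p2", "p1"],
    "Study": ["study1_negative", "study1_positive"],
    "Prediction": [0, 1],
})
target = pd.DataFrame({
    "Target": [1, 0],
    "Location": ["XR_WRIST", "XR_ELBOW"],
    "Patient": ["p1", "p2"],
    "Study": ["study1_positive", "study1_negative"],
})

# validate= raises if keys repeat; indicator= records where each row came from.
merged = pred.merge(target, on=keys, how="outer",
                    validate="one_to_one", indicator=True)

# Every study should be present on both sides.
assert (merged["_merge"] == "both").all()
print(merged)
```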


Finally, we have what we need to calculate the kappa score of our predictions.

In [83]:
cohen_kappa_score(final_df.Prediction.values, final_df.Target.values)

0.573271793457848
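For reference, unweighted Cohen's kappa is `(p_o - p_e) / (1 - p_e)`, where `p_o` is the observed agreement and `p_e` is the agreement expected by chance from each rater's label frequencies. The sketch below computes it by hand on hypothetical toy labels and compares the result with `cohen_kappa_score`; `kappa_from_scratch` is an illustrative name, not part of any library.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def kappa_from_scratch(y_pred, y_true):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    labels = np.union1d(y_pred, y_true)
    # Observed agreement: fraction of items on which both label sets agree.
    p_o = np.mean(y_pred == y_true)
    # Chance agreement: sum over labels of the product of marginal frequencies.
    p_e = sum(np.mean(y_pred == k) * np.mean(y_true == k) for k in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical toy study-level labels.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
true = [1, 0, 0, 0, 1, 1, 0, 0]

manual = kappa_from_scratch(pred, true)
library = cohen_kappa_score(pred, true)
print(manual, library)
```

Here `p_o = 6/8` and `p_e = (3/8)^2 + (5/8)^2 = 34/64`, so both values should come out to 7/15.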

Our original kappa score was around 0.53. It is not as exciting an improvement as I had hoped, but at least it tells me that inference-stage hacking is not enough to save our model. We really need to do some more serious preprocessing of the data, or even come up with a customized CNN architecture, to compete with other researchers' models. Yes, fastai won't let you no-brainer your way through everything.

To be continued.