## X-Ray Abnormality Detection with CNNs

> Antonopoulos Ilias (p3352004) <br />
> Ndoja Silva (p3352017) <br />
> MSc Data Science AUEB

## Table of Contents

- [Data Loading](#Data-Loading)
- [Exploratory Data Analysis](#Exploratory-Data-Analysis)
- [Hyperparameter Tuning](#Hyperparameter-Tuning)
- [Model Selection](#Model-Selection)
- [Evaluation](#Evaluation)

In [None]:
# TODO: remove
# !pip install scipy
# !pip install git+https://github.com/keras-team/keras-tuner

In [116]:
import gc
import os
import pathlib
import random
from glob import glob

# import keras_tuner as kt
import pandas as pd
import tensorflow as tf


pd.set_option("display.max_colwidth", None)  # show full image paths in DataFrame previews
# from scipy import optimize  # TODO: remove

In [71]:
SEED = 123456

random.seed(SEED)
tf.random.set_seed(SEED)  # also seed TensorFlow's global RNG, not just Python's

In [72]:
print(tf.__version__)

2.8.0


In [73]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices("GPU")))

Num GPUs Available:  1


### Data Loading

In [74]:
def inspect_df(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Helper method to easily inspect DataFrames."""

    print(f"shape: {df.shape}")

    return df.head(n)

In [75]:
DATASET_DIR = "data/MURA-v1.1/"  # defined before first use so the notebook runs top to bottom
random.choices(glob(os.path.join(DATASET_DIR, "*", "*", "*", "*", "*.png")), k=10)

['data/MURA-v1.1/train/XR_WRIST/patient08677/study1_negative/image1.png',
 'data/MURA-v1.1/train/XR_WRIST/patient08822/study1_negative/image3.png',
 'data/MURA-v1.1/train/XR_ELBOW/patient05309/study1_positive/image2.png',
 'data/MURA-v1.1/train/XR_FINGER/patient04573/study1_negative/image1.png',
 'data/MURA-v1.1/train/XR_ELBOW/patient05276/study1_positive/image2.png',
 'data/MURA-v1.1/train/XR_FOREARM/patient09254/study1_positive/image2.png',
 'data/MURA-v1.1/train/XR_ELBOW/patient03454/study1_negative/image1.png',
 'data/MURA-v1.1/train/XR_HAND/patient07839/study1_negative/image2.png',
 'data/MURA-v1.1/train/XR_ELBOW/patient06081/study1_negative/image1.png',
 'data/MURA-v1.1/train/XR_FINGER/patient03997/study1_negative/image1.png']

In [79]:
DATASET_DIR = "data/MURA-v1.1/"

image_count = len(list(pathlib.Path(DATASET_DIR).glob("*/*/*/*/*.png")))

print(f"Total PNG images found in dir {DATASET_DIR}: {image_count}")

Total PNG images found in dir data/MURA-v1.1/: 40009


In [117]:
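# NOTE: skiprows=[0] discards the first line of the CSV. The preview below starts at
# patient00001/.../image2.png, so that line may be a real record rather than a header.
train_image_paths = pd.read_csv(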
    os.path.join(DATASET_DIR, "train_image_paths.csv"),
    names=["image_path"],
    header=None,
    skiprows=[0],
    index_col=False,
)

inspect_df(train_image_paths)

shape: (36807, 1)


|   | image_path |
|---|---|
| 0 | MURA-v1.1/train/XR_SHOULDER/patient00001/study1_positive/image2.png |
| 1 | MURA-v1.1/train/XR_SHOULDER/patient00001/study1_positive/image3.png |
| 2 | MURA-v1.1/train/XR_SHOULDER/patient00002/study1_positive/image1.png |
| 3 | MURA-v1.1/train/XR_SHOULDER/patient00002/study1_positive/image2.png |
| 4 | MURA-v1.1/train/XR_SHOULDER/patient00002/study1_positive/image3.png |


In [121]:
train_image_paths["study_type"] = train_image_paths["image_path"].map(
    lambda x: x.split("/")[2]
)
train_image_paths["patient"] = train_image_paths["image_path"].map(
    lambda x: x.split("/")[3]
)
train_image_paths["study"] = train_image_paths["image_path"].map(
    lambda x: x.split("/")[4]
)

In [123]:
inspect_df(train_image_paths)

shape: (36807, 4)


|   | image_path | study_type | patient | study |
|---|---|---|---|---|
| 0 | MURA-v1.1/train/XR_SHOULDER/patient00001/study1_positive/image2.png | XR_SHOULDER | patient00001 | study1_positive |
| 1 | MURA-v1.1/train/XR_SHOULDER/patient00001/study1_positive/image3.png | XR_SHOULDER | patient00001 | study1_positive |
| 2 | MURA-v1.1/train/XR_SHOULDER/patient00002/study1_positive/image1.png | XR_SHOULDER | patient00002 | study1_positive |
| 3 | MURA-v1.1/train/XR_SHOULDER/patient00002/study1_positive/image2.png | XR_SHOULDER | patient00002 | study1_positive |
| 4 | MURA-v1.1/train/XR_SHOULDER/patient00002/study1_positive/image3.png | XR_SHOULDER | patient00002 | study1_positive |
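
The `map` calls above split the same path once per derived column. One possible alternative, sketched here on a small hand-made frame (the two sample paths are copied from the preview above), is to split each path once with `Series.str.split(..., expand=True)` and pick the components by position. The helper name `add_path_components` is hypothetical, and the sketch assumes every path follows the `MURA-v1.1/<split>/<study_type>/<patient>/<study>/...` layout.

```python
import pandas as pd


def add_path_components(df: pd.DataFrame, path_col: str) -> pd.DataFrame:
    """Return a copy of df with study_type, patient and study columns.

    Hypothetical helper: assumes paths look like
    MURA-v1.1/<split>/<study_type>/<patient>/<study>/...
    """
    # Split every path once; the result has one integer-named column per component.
    parts = df[path_col].str.split("/", expand=True)

    out = df.copy()
    out["study_type"] = parts[2]
    out["patient"] = parts[3]
    out["study"] = parts[4]
    return out


sample = pd.DataFrame(
    {
        "image_path": [
            "MURA-v1.1/train/XR_SHOULDER/patient00001/study1_positive/image2.png",
            "MURA-v1.1/train/XR_SHOULDER/patient00002/study1_positive/image1.png",
        ]
    }
)

parsed = add_path_components(sample, "image_path")
print(parsed[["study_type", "patient", "study"]])
```

Because the helper takes the column name as an argument, the same call should also work for a `study_path` column, where the trailing slash only produces an extra empty component.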


In [118]:
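# NOTE: skiprows=[0] discards the first line of the CSV. patient00001 appears in the
# image list but not in the preview below, so that line may be a real record, not a header.
train_labeled_studies = pd.read_csv(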
    os.path.join(DATASET_DIR, "train_labeled_studies.csv"),
    names=["study_path", "label"],
    header=None,
    skiprows=[0],
    index_col=False,
)

inspect_df(train_labeled_studies)

shape: (13456, 2)


|   | study_path | label |
|---|---|---|
| 0 | MURA-v1.1/train/XR_SHOULDER/patient00002/study1_positive/ | 1 |
| 1 | MURA-v1.1/train/XR_SHOULDER/patient00003/study1_positive/ | 1 |
| 2 | MURA-v1.1/train/XR_SHOULDER/patient00004/study1_positive/ | 1 |
| 3 | MURA-v1.1/train/XR_SHOULDER/patient00005/study1_positive/ | 1 |
| 4 | MURA-v1.1/train/XR_SHOULDER/patient00006/study1_positive/ | 1 |


In [124]:
train_labeled_studies["study_type"] = train_labeled_studies["study_path"].map(
    lambda x: x.split("/")[2]
)
train_labeled_studies["patient"] = train_labeled_studies["study_path"].map(
    lambda x: x.split("/")[3]
)
train_labeled_studies["study"] = train_labeled_studies["study_path"].map(
    lambda x: x.split("/")[4]
)

In [125]:
inspect_df(train_labeled_studies)

shape: (13456, 5)


|   | study_path | label | study_type | patient | study |
|---|---|---|---|---|---|
| 0 | MURA-v1.1/train/XR_SHOULDER/patient00002/study1_positive/ | 1 | XR_SHOULDER | patient00002 | study1_positive |
| 1 | MURA-v1.1/train/XR_SHOULDER/patient00003/study1_positive/ | 1 | XR_SHOULDER | patient00003 | study1_positive |
| 2 | MURA-v1.1/train/XR_SHOULDER/patient00004/study1_positive/ | 1 | XR_SHOULDER | patient00004 | study1_positive |
| 3 | MURA-v1.1/train/XR_SHOULDER/patient00005/study1_positive/ | 1 | XR_SHOULDER | patient00005 | study1_positive |
| 4 | MURA-v1.1/train/XR_SHOULDER/patient00006/study1_positive/ | 1 | XR_SHOULDER | patient00006 | study1_positive |
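
Labels are given per study, while images are listed individually. One way to obtain a per-image label is to join the two frames on `study_type`, `patient` and `study`. The sketch below uses toy frames with the same column names as above; the paths and values are illustrative, not taken from the dataset, and a `label` of 0 for the `_negative` study is an assumption.

```python
import pandas as pd

# Toy stand-ins for train_image_paths and train_labeled_studies (illustrative values).
images = pd.DataFrame(
    {
        "image_path": ["a/image1.png", "a/image2.png", "b/image1.png"],
        "study_type": ["XR_WRIST", "XR_WRIST", "XR_ELBOW"],
        "patient": ["patient00010", "patient00010", "patient00020"],
        "study": ["study1_negative", "study1_negative", "study1_positive"],
    }
)
studies = pd.DataFrame(
    {
        "study_type": ["XR_WRIST", "XR_ELBOW"],
        "patient": ["patient00010", "patient00020"],
        "study": ["study1_negative", "study1_positive"],
        "label": [0, 1],
    }
)

keys = ["study_type", "patient", "study"]

# Left join keeps every image; validate raises if a study appears twice in `studies`.
labeled_images = images.merge(studies, on=keys, how="left", validate="many_to_one")

# Images whose study has no matching label would carry NaN in `label`.
n_unlabeled = int(labeled_images["label"].isna().sum())

print(labeled_images[["image_path", "label"]])
print(f"images without a label: {n_unlabeled}")
```

Counting unlabeled images after a left join is a cheap consistency check between the two CSV files.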


### Exploratory Data Analysis

Each study contains one or more views (images) and is labeled as either normal (`label` 0, folder suffix `_negative`) or abnormal (`label` 1, folder suffix `_positive`).

The training set, as loaded above, contains `13456` studies and `36807` images in total.
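
The two quantities mentioned here, views per study and the normal/abnormal split, can be summarised with plain `groupby` calls. The following is a minimal sketch on toy frames that mimic the columns of `train_image_paths` and `train_labeled_studies`; all values are illustrative and the variable names are hypothetical.

```python
import pandas as pd

# Toy stand-ins with the same columns as the frames built during data loading.
images = pd.DataFrame(
    {
        "study_type": ["XR_WRIST"] * 4 + ["XR_ELBOW"] * 2,
        "patient": ["patient00010"] * 3 + ["patient00011"] + ["patient00020"] * 2,
        "study": ["study1_negative"] * 3 + ["study1_positive"] * 3,
    }
)
studies = pd.DataFrame(
    {
        "study_type": ["XR_WRIST", "XR_WRIST", "XR_ELBOW"],
        "patient": ["patient00010", "patient00011", "patient00020"],
        "study": ["study1_negative", "study1_positive", "study1_positive"],
        "label": [0, 1, 1],
    }
)

keys = ["study_type", "patient", "study"]

# Number of images (views) in each study.
views_per_study = images.groupby(keys).size()

# Number of studies and share of abnormal studies per body part.
studies_per_type = studies.groupby("study_type").size()
abnormal_share = studies.groupby("study_type")["label"].mean()

print(views_per_study.describe())
print(studies_per_type)
print(abnormal_share)
```

Because `label` is 0 or 1, its mean per group is the fraction of abnormal studies, which gives a quick view of class balance per study type.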

### Hyperparameter Tuning

### Model Selection

### Evaluation