# BoneawareAI

Authors: Karthik Subramanian, Charles Green, Sai Anurag Pichika, Saarang Prabhuram


## Setup

### Load Extensions

Before getting started, we need to run some standard code to set up our environment. You'll need to run this code again each time you start the notebook.

First, run this cell to load the [autoreload](https://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html?highlight=autoreload) extension. With `%autoreload 2`, changes to imported `.py` source files are reloaded automatically before each cell runs, so you can edit and debug the source without restarting the kernel.

In [61]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Google Colab Setup
Next, we need to run a few commands to set up our environment on Google Colab. If you are running this notebook on a local machine, you can skip this section.

Run the following cell to mount your Google Drive. Follow the link and sign in to your Google account (the same account you used to store this notebook).

In [62]:
from google.colab import drive
drive.mount('/content/drive')

ModuleNotFoundError: No module named 'google.colab'

In [63]:
import os
PROJECT_PATH = 'BoneawareAI'
GOOGLE_DRIVE_PATH = f'/content/drive/MyDrive/{PROJECT_PATH}'
os.chdir(GOOGLE_DRIVE_PATH)
os.getcwd()

FileNotFoundError: [WinError 3] The system cannot find the path specified: '/content/drive/MyDrive/BoneawareAI'

In [64]:
import sys
sys.path.append(GOOGLE_DRIVE_PATH) # this is important for the imports in the .py files to work

In [65]:
!pip install pyyaml==5.4.1
!pip install boto3
!pip install configparser
!pip install torch



### Local Setup OR Google Drive
Run the cell below regardless of whether you are using Google Drive or a local setup.

In [66]:
# When running locally, point GOOGLE_DRIVE_PATH at the current directory
import sys
if 'google.colab' in sys.modules:
  print(f'Running in google colab. Our path is `{GOOGLE_DRIVE_PATH}`')
else:
  GOOGLE_DRIVE_PATH = '.'
  print('Running locally.')

Running locally.
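
The Colab cells above raise `ModuleNotFoundError` and `FileNotFoundError` when run on a local machine. One way to avoid that is to fold the environment check and the path choice into a single helper. The following is a minimal sketch, not part of the project code: `in_colab` and `resolve_project_root` are hypothetical names, and the Drive path simply mirrors the one used above.

```python
def in_colab() -> bool:
    """Return True only when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (only available inside Colab)
    except ImportError:
        return False
    return True


def resolve_project_root(project_name: str = "BoneawareAI") -> str:
    """Mount Google Drive on Colab; fall back to the current directory locally."""
    if in_colab():
        from google.colab import drive
        drive.mount("/content/drive")
        return f"/content/drive/MyDrive/{project_name}"
    return "."


GOOGLE_DRIVE_PATH = resolve_project_root()
print(f"Project root: {GOOGLE_DRIVE_PATH}")
```

Because the Colab import happens inside the helper, the same cell can run unchanged in both environments.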


### Imports

In [67]:
import sys
sys.path.append('../src')  # Add the 'src' folder to Python's module search path
sys.path.append('../datasets')  # Add the 'datasets' folder to Python's module search path
sys.path.append('../notebooks')  # Add the 'notebooks' folder to Python's module search path

In [68]:
from image_utils import (
    set_seed, MURADataset, get_transforms, load_data,
    confirm_images_and_labels, count_body_parts,
    count_positive_negative, count_body_parts_with_augmentations,
)

In [69]:
import torch

device = 'mps' if torch.backends.mps.is_available() else ('cuda' if torch.cuda.is_available() else 'cpu')
print("Using device = " + device)
if device == 'cpu':
    print("WARNING: Using CPU will cause slower train times")


Using device = cpu


#### Set Seed

Set the seed so that results can be reproduced, and make sure the seed is set in the `image_utils.py` file. If you want a random seed instead, import `random` and pass a random number.

In [70]:
set_seed(42)
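
For readers who cannot open `image_utils.py`, the sketch below illustrates what a seeding helper of this kind commonly does. It is an assumption about the general technique, not the project's `set_seed`: `seed_everything` is a hypothetical name, and NumPy and PyTorch are seeded only if they are installed.

```python
import random


def seed_everything(seed: int = 42) -> None:
    """Seed the random number generators that are available in this environment."""
    random.seed(seed)  # Python's built-in generator
    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's global generator
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # PyTorch CPU generator
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)  # every visible CUDA device
    except ImportError:
        pass


# Seeding twice with the same value yields the same draw.
seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)
```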

In [71]:
import torch
import random
import numpy as np
import pandas as pd

## Data Preprocessing
Download the dataset and apply data augmentation to produce the final MURA dataset.

In [16]:
# Download the MURA dataset and unzip it (this step takes a while)
import os  # needed here when the Colab cells above were skipped
from src.data_loader import download_dataset
from src.constants import DATASETS_FOLDER, MURA_DATASET
from src.helpers.utils import unzip_file
download_dataset(MURA_DATASET, DATASETS_FOLDER)
unzip_file(os.path.join(os.getcwd(), DATASETS_FOLDER, MURA_DATASET))

File downloaded successfully to datasets\MURA-v1.1.zip
successfully unzipped the file at path c:\code\BoneawareAI\datasets\MURA-v1.1.zip


In [72]:
# Takes about 17 minutes to load on a local machine
data_dir = "../datasets/MURA-v1.1"
batch_size = 32

# Load training and validation data
train_loader, valid_loader = load_data(data_dir, batch_size=batch_size)

Loaded 147232 training samples and 3197 validation samples.


In [79]:
print("Training Data:")
for images, labels in train_loader:
    print(f"Batch size: {len(images)}, Labels: {labels}")
    break

# Test the validation DataLoader
print("Validation Data:")
for images, labels in valid_loader:
    print(f"Batch size: {len(images)}, Labels: {labels}")
    break

Training Data:


FileNotFoundError: Image file not found: ..\datasets\MURA-v1.1\train\MURA-v1.1\train\XR_WRIST\patient04318\study1_positive\image2.png
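
The path in the error above contains `MURA-v1.1\train` twice, which suggests that a listed image path that already starts with the dataset folder was joined onto a root that also ends with it. The sketch below shows one way to join the two without repeating the shared folder. It is an illustration under assumptions: `resolve_image_path` is a hypothetical helper, not the logic in `image_utils.py`, and the listed path format (starting with `MURA-v1.1/`) is assumed rather than confirmed by this notebook.

```python
from pathlib import PurePosixPath


def resolve_image_path(data_dir: str, listed_path: str) -> PurePosixPath:
    """Join a dataset root and a listed path without repeating the shared folder."""
    # Normalise Windows separators so the comparison works on any platform.
    root = PurePosixPath(data_dir.replace("\\", "/"))
    listed = PurePosixPath(listed_path.replace("\\", "/"))
    # Drop the leading folder when it repeats the last folder of the root.
    if listed.parts and listed.parts[0] == root.name:
        listed = PurePosixPath(*listed.parts[1:])
    return root / listed


fixed = resolve_image_path(
    "../datasets/MURA-v1.1",
    "MURA-v1.1/train/XR_WRIST/patient04318/study1_positive/image2.png",  # assumed format
)
print(fixed)
```

Checking that a handful of resolved paths exist before building the `DataLoader` is far quicker than discovering the problem in the first batch.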

In [26]:
# Access the datasets from the DataLoaders
train_dataset = train_loader.dataset
valid_dataset = valid_loader.dataset

# Example: Print the length of the datasets
print(f"Number of samples in the training dataset: {len(train_dataset)}")
print(f"Number of samples in the validation dataset: {len(valid_dataset)}")

Number of samples in the training dataset: 147232
Number of samples in the validation dataset: 3197


In [27]:
# Takes about 16 minutes locally; optional, since the dataset itself can also be used to confirm
confirm_images_and_labels(train_loader, "train")
confirm_images_and_labels(valid_loader, "valid")

Checking train dataset...


FileNotFoundError: [Errno 2] No such file or directory: 'C:\\code\\BoneawareAI\\datasets\\MURA-v1.1\\train\\MURA-v1.1\\train\\XR_ELBOW\\patient04970\\study1_positive\\image1.png'

In [29]:
count_body_parts(train_dataset, "train")
count_body_parts(valid_dataset, "valid")

AttributeError: 'MURADataset' object has no attribute 'image_paths'
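
`count_body_parts` fails here because it expects an `image_paths` attribute on the dataset. However the dataset stores its samples, the count itself only needs the `XR_*` folder name in each path. A minimal sketch, assuming a plain list of path strings; `body_part_counts` is a hypothetical helper and not the project's implementation:

```python
import re
from collections import Counter

BODY_PART = re.compile(r"XR_[A-Z]+")  # matches folder names such as XR_WRIST


def body_part_counts(paths):
    """Count how many image paths belong to each XR_* body-part folder."""
    counts = Counter()
    for path in paths:
        match = BODY_PART.search(path)
        if match:  # paths without an XR_* folder are ignored
            counts[match.group(0)] += 1
    return counts


sample_paths = [
    "MURA-v1.1/train/XR_WRIST/patient04318/study1_positive/image2.png",
    "MURA-v1.1/train/XR_ELBOW/patient04970/study1_positive/image1.png",
]
counts = body_part_counts(sample_paths)
print(dict(counts))
```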

In [60]:
# Example usage with 3 augmentations; adjust num_augmentations as needed
count_body_parts_with_augmentations(train_dataset, "train", num_augmentations=3)
count_body_parts_with_augmentations(valid_dataset, "valid", num_augmentations=3)

AttributeError: 'MURADataset' object has no attribute 'image_paths'

In [22]:
# Count positive/negative cases in the training dataset (with 3 augmentations)
count_positive_negative(train_dataset, "train", num_augmentations=3)

# Count positive/negative cases in the validation dataset (with 3 augmentations)
count_positive_negative(valid_dataset, "valid", num_augmentations=3)

Train dataset positive/negative distribution (with augmentations):
      BodyPart  Negative  Positive  AugmentedNegative  AugmentedPositive
0     XR_ELBOW      2925      2006              11700               8024
1    XR_FINGER      3138      1968              12552               7872
2   XR_FOREARM      1164       661               4656               2644
3      XR_HAND      4059      1484              16236               5936
4   XR_HUMERUS       673       599               2692               2396
5  XR_SHOULDER      4211      4168              16844              16672
6     XR_WRIST      5765      3987              23060              15948
Valid dataset positive/negative distribution (with augmentations):
      BodyPart  Negative  Positive  AugmentedNegative  AugmentedPositive
0     XR_ELBOW       235       230                940                920
1    XR_FINGER       214       247                856                988
2   XR_FOREARM       150       151                600          
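
In the training table above, each augmented count equals the raw count multiplied by `1 + num_augmentations`, that is, the original image plus three augmented copies. This relationship is inferred from the printed numbers, not from the source of `count_positive_negative`. The short check below reproduces it from the raw training counts and compares the total with the number of training samples reported by `load_data`.

```python
num_augmentations = 3
factor = 1 + num_augmentations  # original image plus its augmented copies

# Raw (negative, positive) training counts copied from the table above.
train_counts = {
    "XR_ELBOW": (2925, 2006),
    "XR_FINGER": (3138, 1968),
    "XR_FOREARM": (1164, 661),
    "XR_HAND": (4059, 1484),
    "XR_HUMERUS": (673, 599),
    "XR_SHOULDER": (4211, 4168),
    "XR_WRIST": (5765, 3987),
}

# Scale every count by the same factor.
augmented = {
    part: (neg * factor, pos * factor)
    for part, (neg, pos) in train_counts.items()
}
total_raw = sum(neg + pos for neg, pos in train_counts.values())
total_augmented = total_raw * factor

print(augmented["XR_ELBOW"])
print(total_raw, total_augmented)
```

The augmented total matches the 147232 training samples reported when the data was loaded.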

### Other Datasets

# Model