# BoneawareAI

Authors: Karthik Subramanian, Charles Green, Sai Anurag Pichika, Saarang Prabhuram


## Setup

### Load Extensions

Before getting started, we need to run some standard setup code. You'll need to execute it again each time you start the notebook.

First, run this cell to load the [autoreload](https://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html?highlight=autoreload) extension. With `%autoreload 2`, edits to `.py` source files are picked up automatically before each cell runs, so we can modify and debug the source without restarting the kernel.

In [31]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
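`%autoreload 2` re-imports modules before each cell runs. Outside IPython, or to refresh a single module on demand, the standard library's `importlib.reload` does the same job explicitly. The sketch below writes a throwaway module (the name `demo_module` is hypothetical) to a temporary directory, edits it, and reloads it:

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # for this demo: avoid cached .pyc files masking the edit

tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "demo_module.py"  # hypothetical throwaway module
module_file.write_text("VALUE = 1\n")
sys.path.insert(0, str(tmp_dir))

import demo_module

print(demo_module.VALUE)  # 1

# Simulate editing the .py file while the session keeps running
module_file.write_text("VALUE = 2  # edited\n")
importlib.invalidate_caches()
demo_module = importlib.reload(demo_module)
print(demo_module.VALUE)  # 2
```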


### Google Colab Setup
Next, we need to run a few commands to set up our environment on Google Colab. If you are running this notebook on a local machine, you can skip this section.

Run the following cell to mount your Google Drive. When prompted, follow the link and sign in to the same Google account you used to store this notebook.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

ModuleNotFoundError: No module named 'google.colab'

In [3]:
import os
PROJECT_PATH = 'BoneawareAI'
GOOGLE_DRIVE_PATH = f'/content/drive/MyDrive/{PROJECT_PATH}'
os.chdir(GOOGLE_DRIVE_PATH)
os.getcwd()

FileNotFoundError: [WinError 3] The system cannot find the path specified: '/content/drive/MyDrive/BoneawareAI'

In [4]:
import sys
sys.path.append(GOOGLE_DRIVE_PATH) # this is important for the imports in the .py files to work

In [5]:
!pip install pyyaml==5.4.1
!pip install boto3
!pip install configparser
!pip install torch



### Local Setup OR Google Drive
Run the cell below regardless of whether you are using Google Drive or a local setup.

In [6]:
# If running locally, use the current directory as GOOGLE_DRIVE_PATH
import sys
if 'google.colab' in sys.modules:
  print(f'Running in google colab. Our path is `{GOOGLE_DRIVE_PATH}`')
else:
  GOOGLE_DRIVE_PATH = '.'
  print('Running locally.')

Running locally.


### Imports

In [7]:
import sys
sys.path.append('../src')  # Add the 'src' folder to Python's module search path
sys.path.append('../datasets')  # Add the 'datasets' folder to Python's module search path
sys.path.append('../notebooks')  # Add the 'notebooks' folder to Python's module search path

In [32]:
from image_utils import set_seed, MURADataset, get_transforms, load_data, confirm_images_and_labels, count_body_parts, count_positive_negative, count_body_parts_with_augmentations

#### Set Seed

Setting a fixed seed makes the results reproducible. The seeding logic lives in `set_seed` in `image_utils.py`; if you want a different seed on each run, import `random` and pass a randomly generated number instead of 42.

In [9]:
set_seed(42)
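The body of `set_seed` isn't shown in this notebook. A typical implementation seeds every random number generator the pipeline touches; the sketch below, named `set_seed_sketch` to mark it as hypothetical, assumes the project's version does roughly this. NumPy and PyTorch are imported optionally so the sketch runs even where they aren't installed.

```python
import os
import random

# NumPy and PyTorch are optional here so the sketch runs without them
try:
    import numpy as np
except ImportError:
    np = None

try:
    import torch
except ImportError:
    torch = None


def set_seed_sketch(seed: int = 42) -> None:
    """Hypothetical stand-in for image_utils.set_seed: seed every RNG in use."""
    # Only affects Python subprocesses started after this point
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)  # default generator on all devices
        torch.cuda.manual_seed_all(seed)  # silently ignored if CUDA is unavailable
        # Trade some cuDNN speed for run-to-run determinism
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


# Usage: two runs with the same seed draw the same numbers
set_seed_sketch(42)
first = [random.random() for _ in range(3)]
set_seed_sketch(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

For a fresh but still recoverable seed, you could draw one with `random.SystemRandom().randrange(2**32)`, print it, and pass it to `set_seed`, so a good run can be repeated later.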

In [10]:
import torch
import random
import numpy as np

## Data Preprocessing
Download the MURA dataset, then apply data augmentation to build the final MURA dataset.

In [16]:
# Download the MURA dataset and unzip it (this step takes a while)
from src.data_loader import download_dataset
from src.constants import DATASETS_FOLDER, MURA_DATASET
from src.helpers.utils import unzip_file
download_dataset(MURA_DATASET, DATASETS_FOLDER)
unzip_file(os.path.join(os.getcwd(), DATASETS_FOLDER, MURA_DATASET))

File downloaded successfully to datasets\MURA-v1.1.zip
successfully unzipped the file at path c:\code\BoneawareAI\datasets\MURA-v1.1.zip
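The `unzip_file` helper in `src/helpers/utils.py` isn't shown here. Judging from its log message, it extracts the archive in place; a minimal sketch of that behaviour with the standard library's `zipfile` module could look like this (`unzip_file_sketch` and the choice to extract into the archive's own folder are assumptions):

```python
import os
import tempfile
import zipfile


def unzip_file_sketch(zip_path: str) -> str:
    """Hypothetical stand-in for src.helpers.utils.unzip_file.

    Extracts every member of the archive into the folder that contains
    the .zip file and returns that folder.
    """
    target_dir = os.path.dirname(os.path.abspath(zip_path))
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    print(f"successfully unzipped the file at path {zip_path}")
    return target_dir


# Usage with a tiny throwaway archive (not the real MURA zip)
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("MURA-v1.1/train/readme.txt", "placeholder")
extracted_to = unzip_file_sketch(demo_zip)
```

Applied to `datasets/MURA-v1.1.zip`, this would leave the extracted folder next to the archive, assuming the archive's top-level folder is `MURA-v1.1` (consistent with the `data_dir` used in the next cell).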


In [79]:
# Takes about 17 minutes to load on a local machine
data_dir = "../datasets/MURA-v1.1"
batch_size = 32

# Load training and validation data
train_loader, valid_loader = load_data(data_dir, batch_size=batch_size)

Found 147232 validated image filenames belonging to 2 classes in the training set.
Found 12788 validated image filenames belonging to 2 classes in the validation set.


In [78]:
# Test the training DataLoader
print("Training Data:")
for images, labels in train_loader:
    print(f"Batch size: {len(images)}, Labels: {labels}")
    break

# Test the validation DataLoader
print("Validation Data:")
for images, labels in valid_loader:
    print(f"Batch size: {len(images)}, Labels: {labels}")
    break

Training Data:
Batch size: 32, Labels: tensor([1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,
        0, 0, 0, 0, 1, 1, 0, 1])
Validation Data:
Batch size: 32, Labels: tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1])


In [80]:
# Access the datasets from the DataLoaders
train_dataset = train_loader.dataset
valid_dataset = valid_loader.dataset

# Example: Print the length of the datasets
print(f"Number of samples in the training dataset: {len(train_dataset)}")
print(f"Number of samples in the validation dataset: {len(valid_dataset)}")

Number of samples in the training dataset: 147232
Number of samples in the validation dataset: 12788


In [81]:
# Takes about 16 minutes locally. Optional: you can also confirm the images and labels by inspecting the datasets directly
confirm_images_and_labels(train_loader, "train")
confirm_images_and_labels(valid_loader, "valid")

Checking train dataset...
Total train images: 147232
Unique labels in train dataset: [0, 1]

Checking valid dataset...
Total valid images: 12788
Unique labels in valid dataset: [0, 1]
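The report printed above (a total image count and the set of unique labels) can be reproduced with a short loop over any iterable of `(images, labels)` batches. The sketch below is a hypothetical re-implementation, not the project's `confirm_images_and_labels`; it accepts PyTorch tensors (via `.tolist()`) as well as plain lists. Fed the same DataLoader it will be no faster, since the loader still decodes every image.

```python
from typing import Iterable, List, Sequence, Tuple


def summarize_batches(
    batches: Iterable[Tuple[Sequence, Sequence]], split: str
) -> Tuple[int, List[int]]:
    """Hypothetical sketch: count images and collect unique labels across batches."""
    print(f"Checking {split} dataset...")
    total = 0
    unique_labels = set()
    for images, labels in batches:
        total += len(images)
        # Tensors expose .tolist(); plain sequences are converted with list()
        label_list = labels.tolist() if hasattr(labels, "tolist") else list(labels)
        unique_labels.update(label_list)
    print(f"Total {split} images: {total}")
    print(f"Unique labels in {split} dataset: {sorted(unique_labels)}")
    return total, sorted(unique_labels)


# Usage with two fake batches standing in for a DataLoader
fake_batches = [(["img"] * 32, [1, 0] * 16), (["img"] * 4, [0, 0, 1, 1])]
total, labels = summarize_batches(fake_batches, "demo")
```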



In [82]:
count_body_parts(train_dataset, "train")
count_body_parts(valid_dataset, "valid")

Train dataset body part distribution:
XR_SHOULDER: 8379
XR_HUMERUS: 1272
XR_FINGER: 5106
XR_ELBOW: 4931
XR_WRIST: 9752
XR_FOREARM: 1825
XR_HAND: 5543

Valid dataset body part distribution:
XR_WRIST: 659
XR_FOREARM: 301
XR_HAND: 460
XR_HUMERUS: 288
XR_SHOULDER: 563
XR_ELBOW: 465
XR_FINGER: 461
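MURA organises its images as `<split>/<XR_BODYPART>/<patient>/<study>_<positive|negative>/<image>.png`, so body-part counts can be derived from file paths alone, without opening any image. The sketch below assumes you can obtain a list of image paths from the dataset (how `MURADataset` stores them is not shown here); `count_body_parts_from_paths` is a hypothetical helper and the sample paths are illustrative.

```python
import re
from collections import Counter
from typing import Iterable

BODY_PART_PATTERN = re.compile(r"XR_[A-Z]+")


def count_body_parts_from_paths(paths: Iterable[str]) -> Counter:
    """Hypothetical sketch: tally MURA body parts (XR_*) found in image paths."""
    counts = Counter()
    for path in paths:
        match = BODY_PART_PATTERN.search(path)  # works with / or \ separators
        if match:
            counts[match.group(0)] += 1
    return counts


# Illustrative paths following the MURA folder layout
sample_paths = [
    "MURA-v1.1/train/XR_SHOULDER/patient00001/study1_positive/image1.png",
    "MURA-v1.1/train/XR_SHOULDER/patient00001/study1_positive/image2.png",
    "MURA-v1.1/train/XR_WRIST/patient06359/study1_negative/image1.png",
]
counts = count_body_parts_from_paths(sample_paths)
print(counts)
```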



In [83]:
# Example usage with 3 augmentations; adjust num_augmentations as needed
count_body_parts_with_augmentations(train_dataset, "train", num_augmentations=3)
count_body_parts_with_augmentations(valid_dataset, "valid", num_augmentations=3)

Train dataset body part distribution (with augmentations):
XR_SHOULDER: Original: 8379, Augmented: 33516
XR_HUMERUS: Original: 1272, Augmented: 5088
XR_FINGER: Original: 5106, Augmented: 20424
XR_ELBOW: Original: 4931, Augmented: 19724
XR_WRIST: Original: 9752, Augmented: 39008
XR_FOREARM: Original: 1825, Augmented: 7300
XR_HAND: Original: 5543, Augmented: 22172

Valid dataset body part distribution (with augmentations):
XR_WRIST: Original: 659, Augmented: 2636
XR_FOREARM: Original: 301, Augmented: 1204
XR_HAND: Original: 460, Augmented: 1840
XR_HUMERUS: Original: 288, Augmented: 1152
XR_SHOULDER: Original: 563, Augmented: 2252
XR_ELBOW: Original: 465, Augmented: 1860
XR_FINGER: Original: 461, Augmented: 1844



In [84]:
# Count positive/negative cases in the training dataset (with 3 augmentations)
count_positive_negative(train_dataset, "train", num_augmentations=3)

# Count positive/negative cases in the validation dataset (with 3 augmentations)
count_positive_negative(valid_dataset, "valid", num_augmentations=3)

Train dataset positive/negative distribution (with augmentations):
XR_ELBOW: Positive: 2006 (Augmented: 8024), Negative: 2925 (Augmented: 11700)
XR_FINGER: Positive: 1968 (Augmented: 7872), Negative: 3138 (Augmented: 12552)
XR_FOREARM: Positive: 661 (Augmented: 2644), Negative: 1164 (Augmented: 4656)
XR_HAND: Positive: 1484 (Augmented: 5936), Negative: 4059 (Augmented: 16236)
XR_HUMERUS: Positive: 599 (Augmented: 2396), Negative: 673 (Augmented: 2692)
XR_SHOULDER: Positive: 4168 (Augmented: 16672), Negative: 4211 (Augmented: 16844)
XR_WRIST: Positive: 3987 (Augmented: 15948), Negative: 5765 (Augmented: 23060)

Valid dataset positive/negative distribution (with augmentations):
XR_ELBOW: Positive: 230 (Augmented: 920), Negative: 235 (Augmented: 940)
XR_FINGER: Positive: 247 (Augmented: 988), Negative: 214 (Augmented: 856)
XR_FOREARM: Positive: 151 (Augmented: 604), Negative: 150 (Augmented: 600)
XR_HAND: Positive: 189 (Augmented: 756), Negative: 271 (Augmented: 1084)
XR_HUMERUS: Positive
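In MURA the label belongs to the study and is encoded in the study folder name (`study1_positive`, `study2_negative`, and so on). The counts above are per image, which matches the body-part totals (2,006 + 2,925 = 4,931 elbow images). Assuming image paths are available, the same breakdown can be computed from paths alone; `count_labels_from_paths` is a hypothetical helper, the sample paths are illustrative, and the augmented figure assumes one original plus `num_augmentations` copies, as in the output above.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

PATH_PATTERN = re.compile(r"(XR_[A-Z]+).*?study\d+_(positive|negative)")


def count_labels_from_paths(
    paths: Iterable[str], num_augmentations: int = 3
) -> Dict[str, Dict[str, int]]:
    """Hypothetical sketch: per-body-part positive/negative image counts from MURA paths."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for path in paths:
        match = PATH_PATTERN.search(path)
        if match:
            body_part, label = match.groups()
            counts[body_part][label] += 1
    factor = 1 + num_augmentations
    for body_part in sorted(counts):
        pos, neg = counts[body_part]["positive"], counts[body_part]["negative"]
        print(f"{body_part}: Positive: {pos} (Augmented: {pos * factor}), "
              f"Negative: {neg} (Augmented: {neg * factor})")
    return dict(counts)


# Illustrative paths following the MURA folder layout
sample_paths = [
    "MURA-v1.1/valid/XR_ELBOW/patient11186/study1_positive/image1.png",
    "MURA-v1.1/valid/XR_ELBOW/patient11186/study1_positive/image2.png",
    "MURA-v1.1/valid/XR_ELBOW/patient11187/study1_negative/image1.png",
]
result = count_labels_from_paths(sample_paths)
```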

### Other Datasets

## Model