# BoneawareAI

Authors: Karthik Subramanian, Charles Green, Sai Anurag Pichika, Saarang Prabhuram


## Setup

### Load Extensions

Before getting started we need to run some standard code to set up our environment. You'll need to execute this code again each time you start the notebook.

First, run this cell to load the [autoreload](https://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html?highlight=autoreload) extension. This enables us to modify `.py` source files and reintegrate them into the notebook, ensuring a smooth editing and debugging experience.

In [2]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Google Colab Setup
Next we need to run a few commands to set up our environment on Google Colab. If you are running this notebook on a local machine you can skip this section.

Run the following cell to mount your Google Drive. Follow the link and sign in to your Google account (the same account you used to store this notebook).

In [3]:
from google.colab import drive
drive.mount('/content/drive')

ModuleNotFoundError: No module named 'google'

In [4]:
import os
PROJECT_PATH = 'BoneawareAI'
GOOGLE_DRIVE_PATH = f'/content/drive/MyDrive/{PROJECT_PATH}'
os.chdir(GOOGLE_DRIVE_PATH)
os.getcwd()

FileNotFoundError: [WinError 3] The system cannot find the path specified: '/content/drive/MyDrive/BoneawareAI'

In [3]:
import sys
sys.path.append(GOOGLE_DRIVE_PATH) # this is important for the imports in the .py files to work

NameError: name 'GOOGLE_DRIVE_PATH' is not defined
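The errors above are what these Colab-only cells produce on a local machine. As a minimal sketch (not part of the project code), the environment check and path choice could be wrapped so the same cell runs in both places. The helper names `running_in_colab` and `resolve_project_path` are hypothetical; the Drive path is the one used in the cells above.

```python
def running_in_colab() -> bool:
    """Return True only when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (only available on Colab runtimes)
    except ImportError:
        return False
    return True


def resolve_project_path(in_colab: bool, project_name: str = "BoneawareAI") -> str:
    """Pick the Drive folder on Colab and the current directory elsewhere."""
    if in_colab:
        return f"/content/drive/MyDrive/{project_name}"
    return "."


# Hypothetical usage: compute the path once, then reuse it in later cells.
project_path = resolve_project_path(running_in_colab())
print(project_path)
```

On Colab the Drive would still need to be mounted with `drive.mount('/content/drive')` before changing into that folder.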

In [4]:
%pip install pyyaml==5.4.1
%pip install boto3
%pip install configparser
%pip install torch

Note: you may need to restart the kernel to use updated packages.


### Local Setup OR Google Drive
Run the cell below regardless of whether you are using Google Drive or a local setup.

In [12]:
# When running locally, point GOOGLE_DRIVE_PATH at the current directory
import sys
isLocal = False
if 'google.colab' in sys.modules:
  print(f'Running in google colab. Our path is `{GOOGLE_DRIVE_PATH}`')
else:
  GOOGLE_DRIVE_PATH = '.'
  print('Running locally.')
  isLocal = True

Running locally.


### Imports

In [13]:
# RUN LOCALLY
import sys
if isLocal:
    sys.path.append('../src')  # Add the 'src' folder to Python's module search path
    sys.path.append('../datasets')  # Add the 'datasets' folder to Python's module search path
    sys.path.append('../notebooks')  # Add the 'notebooks' folder to Python's module search path
    print('Modules added correctly, locally.')
else:
    sys.path.append('src')  # Add the 'src' folder to Python's module search path
    sys.path.append('datasets')  # Add the 'datasets' folder to Python's module search path
    sys.path.append('notebooks')  # Add the 'notebooks' folder to Python's module search path
    print('Modules added correctly on colab.')

Modules added correctly, locally.


In [14]:
from image_utils import (
    set_seed,
    MURADataset,
    load_data,
    confirm_images_and_labels,
    count_body_parts,
    count_positive_negative,
    count_body_parts_with_augmentations,
)

In [15]:
import torch

device = 'mps' if torch.backends.mps.is_available() else ('cuda' if torch.cuda.is_available() else 'cpu')
print("Using device = " + device)
if device == 'cpu':
    print("WARNING: Using CPU will cause slower train times")


Using device = cpu


#### Set Seed

Setting the seed makes the results reproducible. Ensure that the seed is set through `set_seed` in the `image_utils.py` file. If you want a random seed instead, import `random` and pass a randomly generated number.

In [16]:
set_seed(42)
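The body of `set_seed` lives in `image_utils.py` and is not shown in this notebook. As a hedged illustration only, a seeding helper of this kind typically seeds each random number generator in use. The name `set_seed_sketch` is hypothetical, and NumPy and PyTorch are seeded only if they are installed.

```python
import random


def set_seed_sketch(seed: int) -> None:
    """Seed the common RNGs (hypothetical stand-in, not image_utils.set_seed)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


# Re-seeding with the same value reproduces the same random draws.
set_seed_sketch(42)
first = [random.random() for _ in range(3)]
set_seed_sketch(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

For a different seed on each run, a value such as `random.randint(0, 2**32 - 1)` could be passed instead of `42`.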

In [17]:
import os
import torch
import random
import numpy as np
import pandas as pd

## Data Preprocessing
Download the dataset and apply data augmentation to produce the finalized MURA dataset.

In [None]:
# Downloading MURA dataset and unzipping the file (this one takes time)
from data_loader import download_dataset
from constants import DATASETS_FOLDER, MURA_DATASET
from helpers.utils import unzip_file
# Define the parent directory and dataset path
parent_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))  # Go to the parent directory
datasets_folder = os.path.join(parent_dir, DATASETS_FOLDER)   # Define datasets folder in the parent directory
dataset_path = os.path.join(datasets_folder, MURA_DATASET)    # Full path to the dataset file

# Ensure the datasets folder exists
os.makedirs(datasets_folder, exist_ok=True)

# Check if the dataset is already downloaded
if not os.path.exists(dataset_path):
    print(f"{MURA_DATASET} not found in {DATASETS_FOLDER}. Downloading and extracting...")
    # Download and unzip the dataset
    download_dataset(MURA_DATASET, datasets_folder)
    unzip_file(dataset_path)
else:
    print(f"{MURA_DATASET} already exists in {DATASETS_FOLDER}. Skipping download.")

c:\Users\saara\Desktop\Masters\CS 7643 Deep Learning\Project\BoneawareAI
MURA-v1.1.zip already exists in datasets. Skipping download.


In [33]:

data_dir = "../datasets/MURA-v1.1"
batch_size = 32

# Load training and validation data
train_loader, valid_loader = load_data(data_dir, batch_size=batch_size)

Loaded 147232 training samples and 3197 validation samples.


In [34]:
print("Training Data:")
for images, labels in train_loader:
    print(f"Batch size: {len(images)}, Labels: {labels}")
    break

# Test the validation DataLoader
print("Validation Data:")
for images, labels in valid_loader:
    print(f"Batch size: {len(images)}, Labels: {labels}")
    break

Training Data:
Batch size: 32, Labels: tensor([0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 1, 0])
Validation Data:
Batch size: 32, Labels: tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1])


In [35]:
# Access the datasets from the DataLoaders
train_dataset = train_loader.dataset
valid_dataset = valid_loader.dataset

# Example: Print the length of the datasets
print(f"Number of samples in the training dataset: {len(train_dataset)}")
print(f"Number of samples in the validation dataset: {len(valid_dataset)}")

Number of samples in the training dataset: 147232
Number of samples in the validation dataset: 3197


In [None]:
# Takes about 16 minutes locally. Optional: the dataset itself can also be used to confirm images and labels.
#confirm_images_and_labels(train_dataset, "train")
#confirm_images_and_labels(valid_dataset, "valid")

In [30]:
count_body_parts(train_dataset, "train")
count_body_parts(valid_dataset, "valid")

Train dataset body part distribution:


|   | BodyPart | Count |
|---|----------|-------|
| 0 | XR_WRIST | 9752 |
| 1 | XR_SHOULDER | 8379 |
| 2 | XR_HAND | 5543 |
| 3 | XR_FINGER | 5106 |
| 4 | XR_ELBOW | 4931 |
| 5 | XR_FOREARM | 1825 |
| 6 | XR_HUMERUS | 1272 |


Valid dataset body part distribution:


|   | BodyPart | Count |
|---|----------|-------|
| 0 | XR_WRIST | 659 |
| 1 | XR_SHOULDER | 563 |
| 2 | XR_ELBOW | 465 |
| 3 | XR_FINGER | 461 |
| 4 | XR_HAND | 460 |
| 5 | XR_FOREARM | 301 |
| 6 | XR_HUMERUS | 288 |


In [31]:
# Example usage with 3 augmentations; adjust num_augmentations as needed
count_body_parts_with_augmentations(train_dataset, "train", num_augmentations=3)
count_body_parts_with_augmentations(valid_dataset, "valid", num_augmentations=3)

Train dataset body part distribution (with augmentations):


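|   | BodyPart | OriginalCount | AugmentedCount |
|---|----------|---------------|----------------|
| 0 | XR_WRIST | 9752 | 39008 |
| 1 | XR_SHOULDER | 8379 | 33516 |
| 2 | XR_HAND | 5543 | 22172 |
| 3 | XR_FINGER | 5106 | 20424 |
| 4 | XR_ELBOW | 4931 | 19724 |
| 5 | XR_FOREARM | 1825 | 7300 |
| 6 | XR_HUMERUS | 1272 | 5088 |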


Valid dataset body part distribution (with augmentations):


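|   | BodyPart | OriginalCount | AugmentedCount |
|---|----------|---------------|----------------|
| 0 | XR_WRIST | 659 | 2636 |
| 1 | XR_SHOULDER | 563 | 2252 |
| 2 | XR_ELBOW | 465 | 1860 |
| 3 | XR_FINGER | 461 | 1844 |
| 4 | XR_HAND | 460 | 1840 |
| 5 | XR_FOREARM | 301 | 1204 |
| 6 | XR_HUMERUS | 288 | 1152 |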
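The augmented counts above are consistent with each original image being kept alongside `num_augmentations` augmented copies, so every count is multiplied by `1 + num_augmentations`. The quick check below assumes that relationship. The helper `augmented_count` is hypothetical and is not the project's `count_body_parts_with_augmentations`. The numbers are copied from the training table.

```python
# Original training counts per body part, copied from the table above.
train_counts = {
    "XR_WRIST": 9752,
    "XR_SHOULDER": 8379,
    "XR_HAND": 5543,
    "XR_FINGER": 5106,
    "XR_ELBOW": 4931,
    "XR_FOREARM": 1825,
    "XR_HUMERUS": 1272,
}


def augmented_count(original: int, num_augmentations: int) -> int:
    """Original image plus num_augmentations augmented copies (assumed rule)."""
    return original * (1 + num_augmentations)


augmented = {part: augmented_count(n, 3) for part, n in train_counts.items()}
print(augmented["XR_WRIST"])        # 39008, as in the table
print(sum(train_counts.values()))   # 36808 original training images
print(sum(augmented.values()))      # 147232, the training sample count reported by load_data
```

The total of 147232 matches the "Loaded 147232 training samples" message earlier, which suggests the training loader already includes the augmented copies, while the 3197 validation samples equal the sum of the original validation counts.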


In [26]:
# Count positive/negative cases in the training dataset (with 3 augmentations)
count_positive_negative(train_dataset, "train", num_augmentations=3)

# Count positive/negative cases in the validation dataset (with 3 augmentations)
count_positive_negative(valid_dataset, "valid", num_augmentations=3)

Train dataset positive/negative distribution (with augmentations):


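|   | BodyPart | Negative | Positive | AugmentedNegative | AugmentedPositive |
|---|----------|----------|----------|-------------------|-------------------|
| 0 | XR_ELBOW | 2925 | 2006 | 11700 | 8024 |
| 1 | XR_FINGER | 3138 | 1968 | 12552 | 7872 |
| 2 | XR_FOREARM | 1164 | 661 | 4656 | 2644 |
| 3 | XR_HAND | 4059 | 1484 | 16236 | 5936 |
| 4 | XR_HUMERUS | 673 | 599 | 2692 | 2396 |
| 5 | XR_SHOULDER | 4211 | 4168 | 16844 | 16672 |
| 6 | XR_WRIST | 5765 | 3987 | 23060 | 15948 |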


Valid dataset positive/negative distribution (with augmentations):


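|   | BodyPart | Negative | Positive | AugmentedNegative | AugmentedPositive |
|---|----------|----------|----------|-------------------|-------------------|
| 0 | XR_ELBOW | 235 | 230 | 940 | 920 |
| 1 | XR_FINGER | 214 | 247 | 856 | 988 |
| 2 | XR_FOREARM | 150 | 151 | 600 | 604 |
| 3 | XR_HAND | 271 | 189 | 1084 | 756 |
| 4 | XR_HUMERUS | 148 | 140 | 592 | 560 |
| 5 | XR_SHOULDER | 285 | 278 | 1140 | 1112 |
| 6 | XR_WRIST | 364 | 295 | 1456 | 1180 |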
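To read the class balance off these tables, the positive fraction per body part can be computed directly from the training counts. This is a small illustrative calculation, not project code. The helper `positive_fraction` is hypothetical and the numbers are copied from the training table above.

```python
import math

# (negative, positive) training counts per body part, copied from the table above.
train_labels = {
    "XR_ELBOW": (2925, 2006),
    "XR_FINGER": (3138, 1968),
    "XR_FOREARM": (1164, 661),
    "XR_HAND": (4059, 1484),
    "XR_HUMERUS": (673, 599),
    "XR_SHOULDER": (4211, 4168),
    "XR_WRIST": (5765, 3987),
}


def positive_fraction(negative: int, positive: int) -> float:
    """Share of positive studies among all studies for one body part."""
    return positive / (negative + positive)


fractions = {part: positive_fraction(neg, pos) for part, (neg, pos) in train_labels.items()}
for part, frac in sorted(fractions.items(), key=lambda item: item[1]):
    print(f"{part}: {frac:.3f}")

# Augmentation multiplies both classes by the same factor, so the ratio is unchanged.
same_ratio = math.isclose(positive_fraction(4059, 1484), positive_fraction(16236, 5936))
print(same_ratio)  # True
```

Under these counts `XR_HAND` has the lowest positive share and `XR_SHOULDER` is close to balanced. Because the augmented columns scale both classes equally, augmentation as counted here does not change the class balance.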


### Other Datasets


## Model