Dog data
https://data.mendeley.com/datasets/ktx4cj55pn/1
https://www.imaios.com/en/vet-anatomy/dog/dog-osteology

Chest data
https://www.kaggle.com/datasets/nih-chest-xrays/data?select=Data_Entry_2017.csv

Random sample of the chest data (the sample_labels.csv subset used in this notebook)
https://www.kaggle.com/datasets/nih-chest-xrays/sample?resource=download

###### Import Libraries: Import the libraries needed for data handling (pandas) and file-path handling (os).

In [1]:
import pandas as pd
import os

###### Define Paths: Define the paths to your CSV and image folder.

In [21]:
# IMPORTANT: Use the correct file name here
csv_path = '../data/sample_labels.csv'
images_dir = '../data/images' # Assumes all images sit directly in this folder (no subfolders). Adjust if your layout differs.

###### Load CSV: Load the sample_labels.csv file into a Pandas DataFrame.

In [22]:
df = pd.read_csv(csv_path)
print(df.head())

        Image Index                                     Finding Labels  \
0  00000013_005.png  Emphysema|Infiltration|Pleural_Thickening|Pneu...   
1  00000013_026.png                             Cardiomegaly|Emphysema   
2  00000017_001.png                                         No Finding   
3  00000030_001.png                                        Atelectasis   
4  00000032_001.png                        Cardiomegaly|Edema|Effusion   

   Follow-up #  Patient ID Patient Age Patient Gender View Position  \
0            5          13        060Y              M            AP   
1           26          13        057Y              M            AP   
2            1          17        077Y              M            AP   
3            1          30        079Y              M            PA   
4            1          32        055Y              F            AP   

   OriginalImageWidth  OriginalImageHeight  OriginalImagePixelSpacing_x  \
0                3056                 2544           
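
Before building datasets, it can help to confirm that every filename listed in Image Index actually exists under images_dir, since a single missing file will only surface as an error deep inside training. A minimal sketch, assuming (as the path comment above does) that all images sit directly in images_dir; find_missing_images is a hypothetical helper, not part of pandas.

```python
import os
import pandas as pd


def find_missing_images(df: pd.DataFrame, images_dir: str, col: str = "Image Index") -> list:
    """Return the filenames in `col` that have no matching file in `images_dir`.

    Assumes every image sits directly in `images_dir` (no subfolders).
    """
    present = set(os.listdir(images_dir))  # one directory listing instead of one check per row
    return [name for name in df[col] if name not in present]


# Usage with df and images_dir from the cells above (hypothetical, not run here):
# missing = find_missing_images(df, images_dir)
# print(f"{len(missing)} of {len(df)} images are missing")
```

Listing the directory once and checking membership in a set keeps this fast even for the full NIH dataset.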

###### Data Cleaning & Binary Labeling:
- Create a new is_abnormal column and initialize it to 1 (abnormal) for every row.
- Use .loc to set it to 0 for rows where Finding Labels is exactly 'No Finding'.
- Every row with one or more findings therefore keeps the value 1.

In [23]:
df['is_abnormal'] = 1
df.loc[df['Finding Labels'] == 'No Finding', 'is_abnormal'] = 0
print(df['is_abnormal'].value_counts())

is_abnormal
0    3044
1    2562
Name: count, dtype: int64
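
The counts above are mildly imbalanced (3044 normal vs 2562 abnormal). One common option, not used in this notebook, is to pass pos_weight = n_negative / n_positive to nn.BCEWithLogitsLoss so that positive (abnormal) examples contribute proportionally more to the loss. A minimal sketch of the arithmetic, assuming the is_abnormal column defined above; positive_class_weight is a hypothetical helper.

```python
import pandas as pd


def positive_class_weight(labels: pd.Series) -> float:
    """Ratio of negatives (0) to positives (1), the usual pos_weight for binary BCE."""
    counts = labels.value_counts()
    return counts.get(0, 0) / counts.get(1, 0)


# Rebuild a Series with the counts printed above: 3044 normal, 2562 abnormal
labels = pd.Series([0] * 3044 + [1] * 2562, name="is_abnormal")
w = positive_class_weight(labels)
print(f"abnormal fraction: {labels.mean():.3f}, pos_weight: {w:.3f}")

# In PyTorch the value would be passed as a one-element tensor, e.g.
# criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([w]))
```

With a ratio this close to 1 the effect is small; it matters more for heavily imbalanced label sets.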


###### Create Two Datasets:
- Images Dataset: Create a DataFrame that links each image's full file path (images_dir joined with the filename) to its is_abnormal label.

In [24]:
image_data = df[['Image Index', 'is_abnormal']].copy()
image_data['Image Index'] = image_data['Image Index'].apply(lambda x: os.path.join(images_dir, x))
print(image_data.head())

                       Image Index  is_abnormal
0  ../data/images/00000013_005.png            1
1  ../data/images/00000013_026.png            1
2  ../data/images/00000017_001.png            0
3  ../data/images/00000030_001.png            1
4  ../data/images/00000032_001.png            1


- Structured Data: Create a separate DataFrame for the structured features.

In [25]:
structured_data = df[['Patient Age', 'Patient Gender', 'is_abnormal']].copy()
print(structured_data.head())

  Patient Age Patient Gender  is_abnormal
0        060Y              M            1
1        057Y              M            1
2        077Y              M            0
3        079Y              M            1
4        055Y              F            1
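
Patient Age is stored as a zero-padded string with a unit suffix (e.g. 060Y) and Patient Gender as M/F, so neither column can be fed to a model directly. A minimal sketch that converts both to numbers; the handling of M (months) and D (days) suffixes is an assumption included defensively, since only Y values appear in the rows shown above, and parse_age / encode_structured are hypothetical helpers.

```python
import pandas as pd

# Assumed unit suffixes: Y = years, M = months, D = days (only "Y" is confirmed above)
_UNIT_TO_YEARS = {"Y": 1.0, "M": 1.0 / 12.0, "D": 1.0 / 365.25}


def parse_age(age: str) -> float:
    """Convert an age string such as '060Y' into a float number of years."""
    value, unit = int(age[:-1]), age[-1].upper()
    return value * _UNIT_TO_YEARS[unit]


def encode_structured(structured: pd.DataFrame) -> pd.DataFrame:
    """Return a numeric copy: age in years, gender as 1 for 'M' and 0 for anything else."""
    out = structured.copy()
    out["Patient Age"] = out["Patient Age"].map(parse_age)
    out["Patient Gender"] = (out["Patient Gender"] == "M").astype(int)
    return out


# Toy rows in the same format as structured_data
sample = pd.DataFrame({
    "Patient Age": ["060Y", "057Y", "055Y"],
    "Patient Gender": ["M", "M", "F"],
    "is_abnormal": [1, 1, 1],
})
print(encode_structured(sample))
```

An unexpected suffix raises a KeyError rather than being silently misread, which makes bad rows easy to spot.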


###### Vision Model Training (PyTorch)
1. Import Libraries: Add PyTorch and torchvision imports.

In [26]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torchvision import models, transforms
from PIL import Image

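2. Create a Custom Dataset Class: Subclass torch.utils.data.Dataset so that each item is a transformed image tensor paired with its float is_abnormal label, the format that DataLoader and BCEWithLogitsLoss expect.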

In [27]:
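class XrayDataset(Dataset):
    """Wraps a DataFrame of (image path, label) rows for use with a DataLoader."""

    def __init__(self, dataframe, transform=None):
        self.dataframe = dataframe
        self.transform = transform

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, idx):
        # Column 0 holds the full image path, column 1 the is_abnormal label
        img_name = self.dataframe.iloc[idx, 0]
        # Convert to 3-channel RGB because the pretrained MobileNetV2 expects 3 input channels
        image = Image.open(img_name).convert('RGB')
        label = self.dataframe.iloc[idx, 1]

        if self.transform is not None:
            image = self.transform(image)

        # BCEWithLogitsLoss needs float targets
        return image, torch.tensor(label, dtype=torch.float32)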

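3. Define Transforms: Resize the shorter side to 256 pixels, center-crop to 224×224, convert to a tensor, and normalize with the ImageNet mean and standard deviation that the pretrained MobileNetV2 weights expect.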

In [28]:
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
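
ToTensor scales 8-bit pixel values from [0, 255] to [0, 1], and Normalize then computes (x - mean) / std separately for each channel. A small NumPy sketch that mirrors this arithmetic for a single pixel (it reproduces the math rather than calling torchvision; normalize_pixel is a hypothetical helper):

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])  # ImageNet per-channel means (R, G, B)
STD = np.array([0.229, 0.224, 0.225])   # ImageNet per-channel standard deviations


def normalize_pixel(rgb_uint8):
    """Apply ToTensor-style scaling, then Normalize-style standardization, to one RGB pixel."""
    x = np.asarray(rgb_uint8, dtype=np.float64) / 255.0  # ToTensor: [0, 255] -> [0, 1]
    return (x - MEAN) / STD                               # Normalize: (x - mean) / std


# After .convert('RGB') a grayscale X-ray has identical R, G, B values,
# yet each channel ends up with a different normalized value.
print(normalize_pixel([255, 255, 255]))
print(normalize_pixel([0, 0, 0]))
```

For a white pixel the red channel becomes (1 - 0.485) / 0.229, roughly 2.25, so normalized inputs are no longer confined to [0, 1].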

4. Instantiate Dataset and DataLoader:

In [29]:
dataset = XrayDataset(dataframe=image_data, transform=transform)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
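
This DataLoader feeds every image into training, leaving no held-out data to measure generalization. Because one patient can contribute several images (e.g. 00000013_005.png and 00000013_026.png above), a split should keep all of a patient's images on the same side. A minimal pandas/NumPy sketch, assuming the patient ID is the part of the filename before the underscore (consistent with the Patient ID column printed earlier); patient_split is a hypothetical helper, not a library function.

```python
import os
import numpy as np
import pandas as pd


def patient_split(image_data: pd.DataFrame, val_frac: float = 0.2, seed: int = 0):
    """Split image_data into train/val DataFrames with no patient appearing in both.

    Assumes column 0 holds image paths named like '<patientID>_<followup>.png'.
    """
    paths = image_data.iloc[:, 0]
    patient_ids = paths.map(lambda p: os.path.basename(p).split("_")[0])
    unique_ids = patient_ids.unique()
    rng = np.random.default_rng(seed)
    n_val = int(round(len(unique_ids) * val_frac))
    val_ids = unique_ids[rng.permutation(len(unique_ids))[:n_val]]  # random subset of patients
    is_val = patient_ids.isin(val_ids)
    # reset_index gives each split a clean 0..n-1 index
    return (image_data[~is_val].reset_index(drop=True),
            image_data[is_val].reset_index(drop=True))


# Usage with the objects defined above (hypothetical, not run in this notebook):
# train_df, val_df = patient_split(image_data, val_frac=0.2)
# train_loader = DataLoader(XrayDataset(train_df, transform), batch_size=32, shuffle=True)
# val_loader = DataLoader(XrayDataset(val_df, transform), batch_size=32, shuffle=False)
```

Splitting by patient rather than by image avoids leakage, where near-identical follow-up scans of one patient land in both training and validation and inflate the validation score.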

5. Load Pre-trained Model: Load MobileNetV2 and modify the final layer for binary classification.

In [30]:
model = models.mobilenet_v2(weights='DEFAULT')
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 1)

# Use Apple's MPS backend (Apple Silicon GPU) if available, otherwise fall back to CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model.to(device)
print(f"Using device: {device}")

Using device: mps


6. Train the Model (Simplified): A minimal training loop for a proof of concept (one epoch, no validation).

In [31]:
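criterion = nn.BCEWithLogitsLoss()  # applies the sigmoid internally, so the model outputs raw logits
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

model.train()  # training-mode behavior for dropout and batch-norm layers
for epoch in range(1): # Train for 1 epoch for the MVP
    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        # squeeze(1) turns [batch, 1] into [batch]; a bare squeeze() would also drop
        # the batch dimension if the last batch held a single image
        loss = criterion(outputs.squeeze(1), labels)
        loss.backward()
        optimizer.step()

    # Note: this reports the loss of the final batch only, not the epoch average
    print(f"Epoch {epoch+1}, Loss: {loss.item()}")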

Epoch 1, Loss: 0.8526185154914856
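
Because the model outputs raw logits (the sigmoid lives inside BCEWithLogitsLoss), turning its outputs into predictions requires applying the sigmoid and a threshold. A framework-free sketch of that step plus accuracy, assuming a 0.5 threshold (equivalent to logit >= 0); the logits and labels below are made-up illustrative values, and logits_to_predictions / accuracy are hypothetical helpers.

```python
import numpy as np


def logits_to_predictions(logits, threshold: float = 0.5):
    """Map raw model outputs (logits) to probabilities and 0/1 predictions."""
    logits = np.asarray(logits, dtype=np.float64)
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, as applied inside BCEWithLogitsLoss
    return probs, (probs >= threshold).astype(int)


def accuracy(preds, labels) -> float:
    """Fraction of predictions that match the 0/1 labels."""
    return float(np.mean(np.asarray(preds) == np.asarray(labels)))


# Illustrative logits for four images and their true labels
probs, preds = logits_to_predictions([-2.0, 0.3, 1.5, -0.1])
print(probs.round(3), preds, accuracy(preds, [0, 1, 0, 0]))

# In the notebook, logits would come from model(images).squeeze(1) with the model in
# model.eval() mode inside torch.no_grad(), ideally on a held-out validation loader.
```

Accuracy alone can mislead when classes are imbalanced, so it is worth reporting alongside per-class metrics.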
