In this project, we work with the MedMNIST collection, specifically the PneumoniaMNIST dataset from https://medmnist.com/ . The task is to classify chest X-ray images as showing pneumonia or not. Run the code below to install the appropriate data loaders and visualize the data.

In [None]:
!pip install -qqq medmnist

[?25l[K     |███▊                            | 10 kB 22.2 MB/s eta 0:00:01[K     |███████▌                        | 20 kB 25.0 MB/s eta 0:00:01[K     |███████████▏                    | 30 kB 26.2 MB/s eta 0:00:01[K     |███████████████                 | 40 kB 7.7 MB/s eta 0:00:01[K     |██████████████████▊             | 51 kB 6.8 MB/s eta 0:00:01[K     |██████████████████████▍         | 61 kB 8.0 MB/s eta 0:00:01[K     |██████████████████████████▏     | 71 kB 8.0 MB/s eta 0:00:01[K     |██████████████████████████████  | 81 kB 8.8 MB/s eta 0:00:01[K     |████████████████████████████████| 87 kB 3.8 MB/s 
[?25h  Building wheel for fire (setup.py) ... [?25l[?25hdone


In [None]:
import random
import re

import numpy as np
from numpy.random import RandomState
from tqdm import tqdm

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data as data
from torch.utils.data import Subset
import torchvision.models as models
from torchvision import datasets, transforms

import medmnist
from medmnist import INFO, Evaluator

In [None]:
def train(model, device, train_loader, optimizer, epoch, display=True):
    """Run one training epoch with binary cross-entropy on the logits."""
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.binary_cross_entropy_with_logits(output, target.float())
        loss.backward()
        optimizer.step()
    # Report once per epoch, using the statistics of the last batch.
    if display:
        print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
            epoch, batch_idx * len(data), len(train_loader.dataset),
            100. * batch_idx / len(train_loader), loss.item()))

def test(model, device, test_loader, name="\nVal"):
    """Evaluate the model and return the accuracy in percent."""
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # Sum up the batch loss; reduction='sum' replaces the deprecated size_average=False.
            test_loss += F.binary_cross_entropy_with_logits(output, target.float(), reduction='sum').item()
            # NOTE: the cut-off is applied to the raw logit, not to a probability.
            pred = output >= 0.5
            correct += pred.eq(target.view_as(pred)).sum().item()

    # Average the summed loss over all evaluated samples.
    test_loss /= len(test_loader.dataset)
    print('{} set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        name, test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    return 100. * correct / len(test_loader.dataset)
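
The accuracy computation in `test` compares the raw model output with 0.5. Because `binary_cross_entropy_with_logits` interprets that output as a logit, a probability of 0.5 corresponds to a logit of 0, so a cut-off of 0.5 on the logit is a somewhat stricter rule for predicting the positive class. The sketch below uses only the standard library to show the mapping; the helper name `sigmoid` is ours and is not part of the testbed.

```python
import math

def sigmoid(logit):
    """Map a logit to a probability, as the BCE-with-logits loss does internally."""
    return 1.0 / (1.0 + math.exp(-logit))

# Probability implied by the cut-off used in test() (logit >= 0.5).
p_at_testbed_cutoff = sigmoid(0.5)

# Probability implied by a cut-off at logit 0.
p_at_zero = sigmoid(0.0)

print(f"logit 0.5 -> probability {p_at_testbed_cutoff:.4f}")
print(f"logit 0.0 -> probability {p_at_zero:.4f}")
```

If a probability cut-off of exactly 0.5 is wanted, comparing the logit against 0 (or applying `torch.sigmoid` before comparing against 0.5) would be the equivalent rule; the accuracies reported in this notebook were produced with the cut-off as written in `test`.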

***Challenge 2***

You may use the same testbed, but without the constraints on external datasets or on models trained on external datasets. See the full project description for the constraints on external data and models. You may not, however, use any of the PneumoniaMNIST training set.

In [None]:
import torchvision.models as models

Pretrained AlexNet

In [None]:
%%time
torch.manual_seed(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
random.seed(0)
np.random.seed(0)

# preprocessing
data_flag = 'pneumoniamnist'
download = True

info = INFO[data_flag]
n_classes = len(info['label'])
DataClass = getattr(medmnist, info['python_class'])

data_transform = transforms.Compose([
      transforms.Resize(224),
      transforms.ToTensor(),
      transforms.Normalize(mean=[.5], std=[.5]),
      transforms.Lambda(lambda x: x.repeat(3, 1, 1) )
      ])

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# load the data
train_dataset = DataClass(split='train', transform=data_transform, download=download)
val_dataset = DataClass(split='train', transform=data_transform, download=download)

accs_val = []

for seed in  range(0, 50):
  prng = RandomState(seed)
  random_permute = prng.permutation(np.arange(0, 1000))
  train_top = 10//n_classes
  val_top = 1000//n_classes
  indx_train = np.concatenate([np.where(train_dataset.labels == label)[0][random_permute[0:train_top]] for label in range(0, n_classes)])
  indx_val = np.concatenate([np.where(train_dataset.labels == label)[0][random_permute[train_top:train_top + val_top]] for label in range(0, n_classes)])

  train_data = Subset(train_dataset, indx_train)
  val_data = Subset(val_dataset, indx_val)

  print('Num Samples For Training %d Num Samples For Val %d'%(train_data.indices.shape[0],val_data.indices.shape[0]))

  train_loader = torch.utils.data.DataLoader(train_data,
                                             batch_size=32, 
                                             shuffle=True)

  val_loader = torch.utils.data.DataLoader(val_data,
                                             batch_size=128, 
                                             shuffle=False)
  model = models.alexnet(pretrained=True)
  model.classifier = nn.Linear(256 * 6 * 6, 1)
  
  model.to(device) 
  optimizer = torch.optim.Adam(model.classifier.parameters(),lr=1e-3)

  for epoch in range(10):
    train(model, device, train_loader, optimizer, epoch, display=epoch%5==0)
  accs_val.append(test(model, device, val_loader))

accs_val = np.array(accs_val)

print('Val acc over %d instances on dataset: %s %.2f +- %.2f (var: %.2f)'%(len(accs_val), data_flag, accs_val.mean(), accs_val.std(), accs_val.var()))

Using downloaded and verified file: /root/.medmnist/pneumoniamnist.npz
Using downloaded and verified file: /root/.medmnist/pneumoniamnist.npz
Num Samples For Training 10 Num Samples For Val 1000





Val set: Average loss: 0.2790, Accuracy: 879/1000 (87.90%)

Num Samples For Training 10 Num Samples For Val 1000

Val set: Average loss: 0.3121, Accuracy: 842/1000 (84.20%)

Num Samples For Training 10 Num Samples For Val 1000

Val set: Average loss: 0.3460, Accuracy: 843/1000 (84.30%)

Num Samples For Training 10 Num Samples For Val 1000

Val set: Average loss: 0.2755, Accuracy: 897/1000 (89.70%)

Num Samples For Training 10 Num Samples For Val 1000

Val set: Average loss: 0.3265, Accuracy: 842/1000 (84.20%)

Num Samples For Training 10 Num Samples For Val 1000

Val set: Average loss: 0.3921, Accuracy: 780/1000 (78.00%)

Num Samples For Training 10 Num Samples For Val 1000

Val set: Average loss: 0.2922, Accuracy: 892/1000 (89.20%)

Num Samples For Training 10 Num Samples For Val 1000

Val set: Average loss: 0.3110, Accuracy: 845/1000 (84.50%)

Num Samples For Training 10 Num Samples For Val 1000

Val set: Average loss: 0.2884, Accuracy: 877/1000 (87.70%)

Num Samples For Training 10
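
The counts in the log line `Num Samples For Training 10 Num Samples For Val 1000` follow from how the cell builds its index arrays: one permutation of the positions 0..999 is drawn per seed, and the same positions are then looked up inside each class's list of sample indices (the first 5 positions for training, the next 500 for validation). The sketch below reproduces that indexing logic with the standard library only. `random.Random(seed).shuffle` is a stand-in for `RandomState.permutation`, so the selected indices will differ from the notebook's, and `balanced_split` and the toy label list are hypothetical names and data used for illustration.

```python
import random

def balanced_split(labels, seed, n_train=10, n_val=1000, pool=1000, n_classes=2):
    """Pick the same shuffled positions inside every class's index list."""
    positions = list(range(pool))
    random.Random(seed).shuffle(positions)  # stand-in for RandomState.permutation
    train_top = n_train // n_classes        # training samples per class
    val_top = n_val // n_classes            # validation samples per class
    train_idx, val_idx = [], []
    for label in range(n_classes):
        # Dataset indices of all samples carrying this label.
        class_idx = [i for i, y in enumerate(labels) if y == label]
        train_idx += [class_idx[p] for p in positions[:train_top]]
        val_idx += [class_idx[p] for p in positions[train_top:train_top + val_top]]
    return train_idx, val_idx

# Hypothetical toy labels: 1200 samples of class 0 followed by 1500 of class 1.
toy_labels = [0] * 1200 + [1] * 1500
train_idx, val_idx = balanced_split(toy_labels, seed=0)
print(len(train_idx), len(val_idx))  # 10 1000
```

Two consequences of this scheme are worth keeping in mind: every class needs at least 1000 samples for the lookups to succeed, and only the first 1000 samples of each class can ever be selected. Training and validation indices never overlap because they come from disjoint slices of the same permutation.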

Pretrained ResNet152

In [None]:
def resNet152():
    resNet152 = torch.hub.load('pytorch/vision:v0.10.0', 'resnet152', pretrained=True)
    resNet152.fc = nn.Linear(in_features=2048, out_features=1, bias=True)
    return resNet152

In [None]:
%%time
from random import randint

torch.manual_seed(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
random.seed(0)
np.random.seed(0)

# preprocessing
data_flag = 'pneumoniamnist'

download = True

info = INFO[data_flag]
n_classes = len(info['label'])
DataClass = getattr(medmnist, info['python_class'])

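data_transform = transforms.Compose([transforms.ToTensor(),
                                     transforms.Normalize(mean=[.5], std=[.5]),
                                     # ResNet expects 3-channel input; repeat the grayscale channel as in the AlexNet cell
                                     transforms.Lambda(lambda x: x.repeat(3, 1, 1))])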

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# load the data
train_dataset = DataClass(split='train', transform=data_transform, download=download)
val_dataset = DataClass(split='train', transform=data_transform, download=download)

accs_val = []
loss_val = []
seed = randint(0,50)
prng = RandomState(seed)
random_permute = prng.permutation(np.arange(0, 1000))
train_top = 10//n_classes
val_top = 1000//n_classes
indx_train = np.concatenate([np.where(train_dataset.labels == label)[0][random_permute[0:train_top]] for label in range(0, n_classes)])
indx_val = np.concatenate([np.where(train_dataset.labels == label)[0][random_permute[train_top:train_top + val_top]] for label in range(0, n_classes)])

train_data = Subset(train_dataset, indx_train)
val_data = Subset(val_dataset, indx_val)

print('Num Samples For Training %d Num Samples For Val %d'%(train_data.indices.shape[0],val_data.indices.shape[0]))

train_loader = torch.utils.data.DataLoader(train_data,
                                            batch_size=32, 
                                            shuffle=True)

val_loader = torch.utils.data.DataLoader(val_data,
                                            batch_size=128, 
                                            shuffle=False)

model = resNet152()
model.to(device)

optimizer = torch.optim.Adam(model.parameters(),lr=1e-3)

for epoch in range(250):
    # train() returns nothing, so there is no (loss, accuracy) pair to unpack here
    train(model, device, train_loader, optimizer, epoch, display=epoch%5==0)

print('Val set: %.2f'%(test(model, device, val_loader)), '%')

Using downloaded and verified file: /root/.medmnist/pneumoniamnist.npz
Using downloaded and verified file: /root/.medmnist/pneumoniamnist.npz
Num Samples For Training 10 Num Samples For Val 1000


Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.10.0







Val set: Average loss: 2.6910, Accuracy: 653/1000 (65.30%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.7002, Accuracy: 838/1000 (83.80%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.6696, Accuracy: 834/1000 (83.40%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 4.9410, Accuracy: 534/1000 (53.40%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.7099, Accuracy: 813/1000 (81.30%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 2.5691, Accuracy: 614/1000 (61.40%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 1.2649, Accuracy: 727/1000 (72.70%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 1.4528, Accuracy: 672/1000 (67.20%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.6792, Accuracy: 880/1000 (88.00%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 2.8080, Accuracy: 633/1000 (63.30%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 2.4346, Accuracy: 638/1000 (63.80%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 1.0281, Accuracy: 775/1000 (77.50%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 2.0283, Accuracy: 663/1000 (66.30%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 1.1966, Accuracy: 776/1000 (77.60%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 2.2007, Accuracy: 691/1000 (69.10%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 1.1451, Accuracy: 698/1000 (69.80%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 3.5119, Accuracy: 522/1000 (52.20%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 3.7777, Accuracy: 508/1000 (50.80%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.9807, Accuracy: 765/1000 (76.50%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.9140, Accuracy: 863/1000 (86.30%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 6.4807, Accuracy: 504/1000 (50.40%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 5.5398, Accuracy: 523/1000 (52.30%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 1.4942, Accuracy: 605/1000 (60.50%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.7993, Accuracy: 839/1000 (83.90%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 1.2321, Accuracy: 686/1000 (68.60%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.5176, Accuracy: 862/1000 (86.20%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.8444, Accuracy: 848/1000 (84.80%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.9207, Accuracy: 832/1000 (83.20%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 1.4275, Accuracy: 691/1000 (69.10%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 1.9575, Accuracy: 549/1000 (54.90%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.7306, Accuracy: 770/1000 (77.00%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.7269, Accuracy: 828/1000 (82.80%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 7.4814, Accuracy: 500/1000 (50.00%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 6.9555, Accuracy: 533/1000 (53.30%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 2.9887, Accuracy: 554/1000 (55.40%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.5382, Accuracy: 873/1000 (87.30%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.7336, Accuracy: 864/1000 (86.40%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 2.0407, Accuracy: 647/1000 (64.70%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 1.8066, Accuracy: 696/1000 (69.60%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 3.2783, Accuracy: 520/1000 (52.00%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 3.2119, Accuracy: 477/1000 (47.70%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 2.7293, Accuracy: 610/1000 (61.00%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.7975, Accuracy: 767/1000 (76.70%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 2.8771, Accuracy: 542/1000 (54.20%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 9.4009, Accuracy: 500/1000 (50.00%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 1.4976, Accuracy: 739/1000 (73.90%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 3.4124, Accuracy: 536/1000 (53.60%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.8542, Accuracy: 852/1000 (85.20%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 0.7649, Accuracy: 858/1000 (85.80%)

Num Samples For Training 10 Num Samples For Val 1000



Val set: Average loss: 2.3518, Accuracy: 675/1000 (67.50%)

Val acc over 5 instances on dataset: pneumoniamnist 68.75 +- 12.86 (var: 165.46)
CPU times: user 8h 17min 29s, sys: 11min 47s, total: 8h 29min 17s
Wall time: 8h 28min 58s
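
The summary line above can be re-derived from the individual `Val set` lines in this log. Although the message says "5 instances", the log contains 50 runs, and the reported spread matches the population standard deviation, which is what `np.std` computes by default (`ddof=0`). The sketch below uses the standard-library `statistics` module in place of NumPy; the counts are copied from the log, and the variable names are ours.

```python
import statistics

# Correct-prediction counts (out of 1000) copied from the 50 "Val set" lines above.
correct = [
    653, 838, 834, 534, 813, 614, 727, 672, 880, 633,
    638, 775, 663, 776, 691, 698, 522, 508, 765, 863,
    504, 523, 605, 839, 686, 862, 848, 832, 691, 549,
    770, 828, 500, 533, 554, 873, 864, 647, 696, 520,
    477, 610, 767, 542, 500, 739, 536, 852, 858, 675,
]
accs = [c / 10 for c in correct]      # accuracies in percent

mean = statistics.fmean(accs)
std = statistics.pstdev(accs)         # population std, like np.std's default (ddof=0)
var = statistics.pvariance(accs)      # population variance, like np.var's default

print(f"{len(accs)} runs: {mean:.2f} +- {std:.2f} (var: {var:.2f})")
# 50 runs: 68.75 +- 12.86 (var: 165.46)
```

With a spread of almost 13 percentage points across seeds, a single run of this configuration says little about its expected accuracy, which is why the per-seed loop and the aggregate statistics matter for comparing models here.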
