# First test

In this file, we do not apply any transformations to the data. We only rescale images and pass them through a pretrained CNN model.
The objective is to determine whether a pretrained model is already capable of finding relevant information in these images, without transforming them, and in 224x224.

## Import Dataset from Kaggle

In [2]:
! pip install -q kaggle

In [3]:
from google.colab import files

In [None]:
files.upload()

In [5]:
! mkdir ~/.kaggle

In [6]:
! cp kaggle.json ~/.kaggle/

In [7]:
! chmod 600 ~/.kaggle/kaggle.json

In [9]:
! kaggle datasets download -d paultimothymooney/chest-xray-pneumonia

Downloading chest-xray-pneumonia.zip to /content
100% 2.29G/2.29G [00:21<00:00, 155MB/s]
100% 2.29G/2.29G [00:21<00:00, 117MB/s]


In [None]:
! unzip chest-xray-pneumonia.zip -d data/

## Load model

In [11]:
from torchvision.models import resnet50, ResNet50_Weights
import torch

In [12]:
pretrained = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
100%|██████████| 97.8M/97.8M [00:01<00:00, 71.7MB/s]


In [13]:
pretrained.fc = torch.nn.Identity()
pretrained.avgpool = torch.nn.Identity()

In [14]:
class Model(torch.nn.Module):
    def __init__(self, pretrained):
        super(Model, self).__init__()
        self.pretrained = pretrained
        self.linear1 = torch.nn.Linear(224*224*2, 10000)
        self.linear2 = torch.nn.Linear(10000, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.pretrained(x)
        for (i, batch) in enumerate(x):
          xtemp = self.linear1(batch)
          xtemp = self.linear2(xtemp)
          xtemp = self.sigmoid(xtemp)
          x[i] = xtemp
        return x

In [15]:
model = Model(pretrained)

In [16]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [17]:
model = model.to(device)

## Load Data and Preprocess


In this test, the only preprocessing applied is a basic resize to (224, 224, 3), and a normalization using Pytorch Transforms.  
We do not modify the images. Aspect ratio is taken into account during the resize process.

In [39]:
import math
import os
import cv2
import torchvision.transforms as transforms
from Dataset import Dataset
from data_loading import open_preprocess_photos

In [40]:
normal_dir: str = r'./data/chest_xray/test/NORMAL'
pneumo_dir: str = r'./data/chest_xray/test/PNEUMONIA'

assert os.path.exists(normal_dir) and os.path.isdir(normal_dir), "Normal dir isn't found or isn't a directory"
assert os.path.exists(pneumo_dir) and os.path.isdir(pneumo_dir), "Pneumonia dir isn't found or isn't a directory"

In [41]:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

transform = transforms.Compose([
    transforms.ToTensor(),
    normalize,
])

In [42]:
normal = open_preprocess_photos(normal_dir, transform, (224, 224))
pneumonia = open_preprocess_photos(pneumo_dir, transform, (224, 224))

In [43]:
dataset = Dataset(normal, pneumonia, 0, 1, 128)

In [45]:
len(dataset[0][0])  # Should be as big as the given batch_size, so 128

128

## Training

In [25]:
model.forward(dataset[0].to(device).unsqueeze(0))

tensor([[0.4623, 0.4623, 0.4623,  ..., 0.4623, 0.4623, 0.4623]],
       device='cuda:0', grad_fn=<AsStridedBackward0>)

In [104]:
lr=1e-2
momentum=0.9
batch_size=1024

optimizer = torch.optim.SGD(model.parameters(), lr, momentum)
criterion = torch.nn.BCELoss()              # Check for data imbalance if weight is needed

1