# First test

In this notebook, we apply no transformations to the data. We only resize the images and pass them through a pretrained CNN model.
The objective is to determine whether a pretrained model can already extract relevant information from these images at 224x224, without any further transformation.

## Import Dataset from Kaggle

In [1]:
! pip install -q kaggle

In [2]:
from google.colab import files

In [None]:
files.upload()

In [5]:
! mkdir -p ~/.kaggle

In [6]:
! cp kaggle.json ~/.kaggle/

In [7]:
! chmod 600 ~/.kaggle/kaggle.json

In [8]:
! kaggle datasets download -d paultimothymooney/chest-xray-pneumonia

Downloading chest-xray-pneumonia.zip to /content
100% 2.29G/2.29G [01:45<00:00, 21.7MB/s]
100% 2.29G/2.29G [01:45<00:00, 23.3MB/s]


In [None]:
! unzip chest-xray-pneumonia.zip -d data/

## Load model

In [1]:
from torchvision.models import resnet50, ResNet50_Weights
import torch

In [2]:
pretrained = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

In [3]:
pretrained.fc = torch.nn.Identity()
pretrained.avgpool = torch.nn.Identity()
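
As a sanity check on the shapes involved: with `avgpool` and `fc` both replaced by `Identity`, torchvision's ResNet forward pass still flattens the last convolutional feature map, so the backbone returns one long vector per image. The sketch below computes its length by hand, assuming the standard ResNet-50 layout (2048 output channels, overall stride of 32) and input sides divisible by 32. The helper name `flat_feature_size` is illustrative and not part of torchvision.

```python
def flat_feature_size(height: int, width: int,
                      channels: int = 2048, total_stride: int = 32) -> int:
    """Length of the flattened final feature map for one image.

    Assumes the standard ResNet-50 layout: 2048 channels after the last
    stage and an overall downsampling factor of 32.
    """
    if height % total_stride or width % total_stride:
        raise ValueError("this sketch only handles sides divisible by the stride")
    return channels * (height // total_stride) * (width // total_stride)


features = flat_feature_size(224, 224)
print(features)                   # 2048 * 7 * 7 = 100352
print(features == 224 * 224 * 2)  # True: same value as in_features of linear1
```

The equality with `224*224*2` holds for this input size only; a different resolution would require a different `in_features` for the first linear layer.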

In [4]:
class Model(torch.nn.Module):
    """Pretrained backbone followed by a two-layer binary classification head."""

    def __init__(self, pretrained):
        super().__init__()
        self.pretrained = pretrained
        # With avgpool and fc replaced by Identity, the backbone returns the
        # flattened 2048 x 7 x 7 feature map: 100352 values (= 224*224*2).
        self.linear1 = torch.nn.Linear(2048 * 7 * 7, 10000)
        self.linear2 = torch.nn.Linear(10000, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.pretrained(x)  # (batch, 100352)
        # Note: there is no nonlinearity between linear1 and linear2.
        x = self.linear1(x)     # (batch, 10000)
        x = self.linear2(x)     # (batch, 1)
        return self.sigmoid(x)  # probability in [0, 1]

In [5]:
model = Model(pretrained)

In [6]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [7]:
model = model.to(device)

## Load Data and Preprocess


In this test, the only preprocessing is a basic resize to (224, 224, 3), followed by normalization using PyTorch transforms.  
The image content is otherwise left unmodified, and the resize takes the aspect ratio into account.
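
How `open_preprocess_photos` preserves the aspect ratio is not shown in this notebook. One common approach is a letterbox resize: scale the image so that it fits inside the target, then pad the remaining side. The sketch below only computes the scaled size and the padding. `letterbox_geometry` is a hypothetical helper and may differ from what `data_loading` actually does.

```python
def letterbox_geometry(src_w: int, src_h: int, dst_w: int = 224, dst_h: int = 224):
    """Return ((new_w, new_h), (left, top, right, bottom)) for a letterbox resize.

    Hypothetical helper: scales the source to fit inside the target while
    keeping its aspect ratio, then splits the leftover space into padding.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    # Clamp to guard against floating-point rounding at the edges.
    new_w = min(dst_w, max(1, round(src_w * scale)))
    new_h = min(dst_h, max(1, round(src_h * scale)))
    pad_x, pad_y = dst_w - new_w, dst_h - new_h
    left, top = pad_x // 2, pad_y // 2
    return (new_w, new_h), (left, top, pad_x - left, pad_y - top)


# A 1000x500 image becomes 224x112, with 56 rows of padding above and below.
size, padding = letterbox_geometry(1000, 500)
print(size, padding)
```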

In [8]:
import math
import os
import cv2
import torchvision.transforms as transforms
from Dataset import Dataset
from data_loading import open_preprocess_photos

In [9]:
normal_dir: str = r'./data/chest_xray/test/NORMAL'
pneumo_dir: str = r'./data/chest_xray/test/PNEUMONIA'

assert os.path.exists(normal_dir) and os.path.isdir(normal_dir), "Normal dir isn't found or isn't a directory"
assert os.path.exists(pneumo_dir) and os.path.isdir(pneumo_dir), "Pneumonia dir isn't found or isn't a directory"

In [10]:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

transform = transforms.Compose([
    transforms.ToTensor(),
    normalize,
])
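
For reference, `ToTensor` scales 8-bit pixel values to the range [0, 1], and `Normalize` then applies `(x - mean) / std` per channel. The mean and standard deviation above are the usual ImageNet statistics used with torchvision's pretrained weights. A plain-Python sketch of the arithmetic for a single pixel follows; the helper name `normalize_pixel` is illustrative.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8):
    """Apply the ToTensor scaling, then per-channel normalization, to one pixel."""
    scaled = [v / 255.0 for v in rgb_uint8]  # 0..255 -> 0..1
    return [(s - m) / d for s, m, d in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print([round(v, 3) for v in black])  # all channels negative
print([round(v, 3) for v in white])  # all channels positive
```

After normalization, values fall roughly between -2.1 and 2.6 rather than between 0 and 1.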

In [11]:
normal = open_preprocess_photos(normal_dir, transform, (224, 224))
pneumonia = open_preprocess_photos(pneumo_dir, transform, (224, 224))

In [12]:
dataset = Dataset(normal, pneumonia, 0, 1, 128)

In [13]:
dataset[0][0].shape  # Should be as big as the given batch_size, so 128

torch.Size([128, 3, 224, 224])

## Training

In [16]:
data = dataset[0][0].to(device)
model(data).shape  # call the module directly rather than .forward()

OutOfMemoryError: CUDA out of memory. Tried to allocate 392.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 345.06 MiB is free. Process 64691 has 14.41 GiB memory in use. Of the allocated memory 14.16 GiB is allocated by PyTorch, and 116.17 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
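
A rough parameter count suggests where part of the memory goes. The first linear layer maps 100352 inputs to 10000 outputs, which is about a billion weights on its own. The back-of-the-envelope sketch below assumes float32 parameters and counts only the two linear layers of the head. The ResNet-50 backbone and the activations stored for a batch of 128 come on top of this.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Number of weights plus biases in a fully connected layer."""
    return in_features * out_features + out_features


head_params = linear_params(224 * 224 * 2, 10000) + linear_params(10000, 1)
gib = head_params * 4 / 2**30  # 4 bytes per float32 parameter

print(f"{head_params:,} parameters in the head")
print(f"{gib:.2f} GiB for the weights alone")
print(f"{3 * gib:.2f} GiB once gradients and SGD momentum buffers exist")
```

Under these assumptions the head needs roughly 3.7 GiB before any data is processed, and about three times that during training with momentum. This is consistent with a 14.75 GiB card running out of memory. Restoring `avgpool` so the backbone emits 2048 features per image, or running this shape check under `torch.no_grad()` with a smaller batch, are possible ways to reduce the footprint.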

In [15]:
del data

In [None]:
lr = 1e-2
momentum = 0.9
batch_size = 1024  # note: the Dataset above was built with a batch size of 128

optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
criterion = torch.nn.BCELoss()  # May need a weight if the classes are imbalanced
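
On the imbalance comment above: if one class dominates, a common remedy is to weight the positive term of the binary cross-entropy by the ratio of negative to positive samples. In PyTorch this is usually done with the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`, which expects raw logits rather than sigmoid outputs. The plain-Python sketch below shows the arithmetic only. The class counts are made-up placeholders, not measurements from this dataset.

```python
import math


def pos_weight_from_counts(n_negative: int, n_positive: int) -> float:
    """Weight for the positive class so both classes contribute equally overall."""
    return n_negative / n_positive


def weighted_bce(prob: float, target: int, pos_weight: float = 1.0) -> float:
    """Binary cross-entropy for one sample, with a weight on the positive term."""
    eps = 1e-12
    prob = min(max(prob, eps), 1 - eps)  # avoid log(0)
    return -(pos_weight * target * math.log(prob)
             + (1 - target) * math.log(1 - prob))


# Hypothetical counts: 100 negatives (label 0) and 300 positives (label 1).
w = pos_weight_from_counts(n_negative=100, n_positive=300)
loss_pos = weighted_bce(0.5, 1, w)  # positive sample, down-weighted
loss_neg = weighted_bce(0.5, 0, w)  # negative sample, unweighted
print(w, loss_pos, loss_neg)
```

With these placeholder counts, an equally uncertain prediction costs a third as much on a positive sample as on a negative one, which offsets the three-to-one class ratio.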
