# First test

In this notebook, the data is not transformed or augmented. We only resize the images and pass them through a pretrained CNN model.
The objective is to determine whether a pretrained model can already extract relevant information from these images at 224x224, without any further transformation.

## Import Dataset from Kaggle

In [3]:
! pip install -q kaggle

In [4]:
from google.colab import files

In [5]:
files.upload()

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"martonroux","key":"<redacted>"}'}

In [6]:
! mkdir -p ~/.kaggle

In [7]:
! cp kaggle.json ~/.kaggle/

In [9]:
! chmod 600 ~/.kaggle/kaggle.json

In [None]:
! kaggle datasets list

In [None]:
! kaggle datasets download -d paultimothymooney/chest-xray-pneumonia

In [None]:
! unzip chest-xray-pneumonia.zip -d data/

## Load model

In [13]:
from torchvision.models import resnet50, ResNet50_Weights
import torch

In [14]:
pretrained = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

In [15]:
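# Drop the ImageNet classifier and the global average pooling, so the model
# returns the flattened 2048 x 7 x 7 feature map of the last stage.
pretrained.fc = torch.nn.Identity()
pretrained.avgpool = torch.nn.Identity()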
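
With both `avgpool` and `fc` replaced by `Identity`, torchvision's ResNet forward pass still flattens the output of the last convolutional stage, so each image yields one long feature vector. A quick sketch of the arithmetic, assuming the standard ResNet-50 layout (2048 output channels, total stride 32) and an input size that is a multiple of 32; the helper name is hypothetical:

```python
def flattened_feature_size(input_size: int, channels: int = 2048, stride: int = 32) -> int:
    """Length of the flattened feature map for a square input.

    Assumes the standard ResNet-50 layout: the last stage outputs
    `channels` feature maps, each downsampled by a factor of `stride`.
    """
    assert input_size % stride == 0, "sketch only covers multiples of the stride"
    spatial = input_size // stride  # 224 // 32 = 7
    return channels * spatial * spatial


print(flattened_feature_size(224))  # 2048 * 7 * 7 = 100352
```

This equals 224 * 224 * 2, which is why a first linear layer with `224*224*2` input features matches the backbone output.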

In [99]:
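class Model(torch.nn.Module):
    def __init__(self, pretrained):
        super().__init__()
        self.pretrained = pretrained
        # 224*224*2 = 100352 = 2048*7*7, the flattened backbone output for a 224x224 input
        self.linear1 = torch.nn.Linear(224*224*2, 10000)
        self.linear2 = torch.nn.Linear(10000, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.pretrained(x)  # shape: (batch, 100352)
        for (i, batch) in enumerate(x):
            xtemp = self.linear1(batch)
            xtemp = self.linear2(xtemp)
            xtemp = self.sigmoid(xtemp)  # shape: (1,)
            # NOTE: this assignment broadcasts the single probability over the
            # whole row, so the returned tensor has shape (batch, 100352).
            x[i] = xtemp
        return x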
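
The loop above writes each image's single probability back into `x[i]`, which broadcasts it over the whole feature row, so the model returns a `(batch, 100352)` tensor of repeated values rather than one prediction per image. A minimal sketch of a head that returns shape `(batch, 1)`, assuming the backbone already outputs flattened features; `BinaryHead` and its default sizes are illustrative names, not part of the notebook:

```python
import torch


class BinaryHead(torch.nn.Module):
    """Hypothetical classification head: flattened features -> one probability."""

    def __init__(self, in_features: int = 2048 * 7 * 7, hidden: int = 10000):
        super().__init__()
        self.linear1 = torch.nn.Linear(in_features, hidden)
        self.linear2 = torch.nn.Linear(hidden, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Linear layers act on the last dimension, so the whole batch is
        # processed at once and no per-sample loop is needed.
        x = self.linear1(features)
        x = self.linear2(x)
        return self.sigmoid(x)  # shape: (batch, 1)


# Small sizes here only to keep the demonstration cheap.
head = BinaryHead(in_features=16, hidden=8)
probs = head(torch.randn(4, 16))
print(probs.shape)  # torch.Size([4, 1])
```

Because no nonlinearity separates the two linear layers, they compose into a single linear map; inserting an activation such as `torch.nn.ReLU` between them is a common choice.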

In [100]:
model = Model(pretrained)

In [101]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [102]:
model = model.to(device)

## Load Data

In [36]:
import math
import os
import cv2
import torchvision.transforms as transforms

In [25]:
data_dir = r'./data/chest_xray/test/NORMAL'

assert os.path.exists(data_dir) and os.path.isdir(data_dir), "data dir does not exist"

In [27]:
# Full paths of all images in the directory, in a deterministic order
img_list = sorted(os.path.join(data_dir, name) for name in os.listdir(data_dir))
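
The cell above lists only the `NORMAL` images of the test split, so no labels exist yet. A minimal sketch of collecting both classes with labels, assuming the split also contains a `PNEUMONIA` folder next to `NORMAL` (not shown in this notebook); `list_labeled_images` and the label encoding are hypothetical:

```python
import os

# Assumed label encoding: 0 = NORMAL, 1 = PNEUMONIA.
CLASS_TO_LABEL = {"NORMAL": 0, "PNEUMONIA": 1}


def list_labeled_images(split_dir):
    """Return a sorted list of (image_path, label) pairs for one split."""
    samples = []
    for class_name, label in CLASS_TO_LABEL.items():
        class_dir = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip classes that are missing from this split
        for file_name in os.listdir(class_dir):
            samples.append((os.path.join(class_dir, file_name), label))
    samples.sort()
    return samples
```

For example, `list_labeled_images('./data/chest_xray/test')` would return the paths under both class folders, each paired with its label.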

## Preprocess

In this test, the only preprocessing applied is a center crop to a square (so that resizing does not distort the aspect ratio), a resize to 224 x 224 x 3, and a normalization of the pixel values.

In [30]:
images = []

for img in img_list:
    img_data = cv2.imread(img)  # BGR, uint8, shape (H, W, 3)
    assert img_data is not None, f"could not read {img}"
    # The ImageNet statistics used later are given in RGB order
    img_data = cv2.cvtColor(img_data, cv2.COLOR_BGR2RGB)
    height, width = img_data.shape[:2]

    # Center-crop the longer side so the image is exactly square,
    # including when the difference between the sides is odd.
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    img_data = img_data[top : top + side, left : left + side, :]

    img_data = cv2.resize(img_data, (224, 224))
    images.append(img_data)

In [35]:
images[0].shape

(224, 224, 3)

Convert to Tensor and normalize values

In [37]:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

transform = transforms.Compose([
    transforms.ToTensor(),
    normalize,
])
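
For reference, `ToTensor` followed by `Normalize` scales each 8-bit pixel to [0, 1], moves the channel axis first, and then applies (x - mean) / std per channel. A NumPy sketch of the same arithmetic, for illustration only; `normalize_like_torchvision` is a hypothetical helper and not part of torchvision:

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_like_torchvision(img_uint8):
    """Mimic ToTensor + Normalize for an (H, W, 3) uint8 image."""
    scaled = img_uint8.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    chw = np.transpose(scaled, (2, 0, 1))          # (H, W, C) -> (C, H, W)
    return (chw - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]


white = np.full((224, 224, 3), 255, dtype=np.uint8)
out = normalize_like_torchvision(white)
print(out.shape)  # (3, 224, 224)
```

A fully white pixel therefore maps to (1 - 0.485) / 0.229 in the first channel, and a fully black one to -0.485 / 0.229.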

In [38]:
# Each entry is a normalized float tensor of shape (3, 224, 224)
dataset = [transform(img) for img in images]

## Training

In [103]:
model(dataset[0].to(device).unsqueeze(0))  # add a batch dimension: (1, 3, 224, 224)

tensor([[0.5977, 0.5977, 0.5977,  ..., 0.5977, 0.5977, 0.5977]],
       device='cuda:0', grad_fn=<AsStridedBackward0>)

In [104]:
lr = 1e-2
momentum = 0.9

optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
criterion = torch.nn.BCELoss()  # Unweighted for now; pass `weight` if the classes turn out to be imbalanced
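
This notebook stops before any training loop. As a rough sketch of how an SGD optimizer and `BCELoss` criterion like the ones above would be used, here is one gradient step on random stand-in data; the tiny `Sequential` model, the batch size, and the feature size are placeholders, not the model defined earlier:

```python
import torch

torch.manual_seed(0)

# Stand-in for the real model: any module that outputs probabilities of shape (batch, 1).
toy_model = torch.nn.Sequential(torch.nn.Linear(8, 1), torch.nn.Sigmoid())
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=1e-2, momentum=0.9)
toy_criterion = torch.nn.BCELoss()

inputs = torch.randn(4, 8)                            # placeholder features
targets = torch.tensor([[0.0], [1.0], [1.0], [0.0]])  # BCELoss expects float targets in [0, 1]

toy_optimizer.zero_grad()                 # clear gradients from any previous step
loss = toy_criterion(toy_model(inputs), targets)
loss.backward()                           # backpropagate through the model
toy_optimizer.step()                      # update the weights
print(loss.item())
```

Note that the predictions and the targets must have the same shape, here `(batch, 1)`. If the classes turn out to be imbalanced, `BCELoss` accepts a per-element `weight` tensor; `torch.nn.BCEWithLogitsLoss` offers a `pos_weight` argument but expects raw logits, so the final sigmoid would have to be removed.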

