

## **Deep Learning in Biomedicine**
#### **Final Project:** Classification of Chest X-ray Images and Patient Metadata Using a Multi-Modal Model
**Team 4**: Alexander Sternfeld, Silvia Romanato and Antoine Bonnet


We will train the following seven models, each combining an optional vision encoder with an optional tabular encoder:

| Model | Vision | Tabular |
| --- | --- | --- |
| 1 | - | FCN |
| 2 | ResNet50 (CNN) | FCN |
| 3 | ResNet50 (CNN) | - |
| 4 | DenseNet (CNN) | FCN |
| 5 | DenseNet (CNN) | - |
| 6 | Vision Transformer (ViT) | FCN |
| 7 | Vision Transformer (ViT) | - |

We use the same fully-connected network (FCN) as the tabular encoder in every model that takes patient metadata (models 1, 2, 4 and 6).
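
As a rough illustration of how a vision encoder and the shared FCN could be combined, here is a minimal sketch that assumes late fusion: the two embeddings are concatenated and passed to a linear classification head. The class names `TabularFCN` and `JointClassifier`, the layer sizes, the number of tabular features (5) and the number of classes (4) are placeholders chosen for the example, not our final architecture. The vision encoder is a toy stand-in so that the sketch runs without pretrained weights.

```python
import torch
from torch import nn


class TabularFCN(nn.Module):
    """Small fully-connected encoder for patient metadata (sizes are placeholders)."""

    def __init__(self, num_features: int, embed_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, 64),
            nn.ReLU(),
            nn.Linear(64, embed_dim),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class JointClassifier(nn.Module):
    """Concatenates whichever embeddings are available and classifies them."""

    def __init__(self, num_classes, vision_encoder=None, vision_dim=0,
                 tabular_encoder=None, tabular_dim=0):
        super().__init__()
        if vision_encoder is None and tabular_encoder is None:
            raise ValueError("At least one encoder is required.")
        self.vision_encoder = vision_encoder
        self.tabular_encoder = tabular_encoder
        self.head = nn.Linear(vision_dim + tabular_dim, num_classes)

    def forward(self, image=None, tabular=None):
        parts = []
        if self.vision_encoder is not None:
            parts.append(self.vision_encoder(image))
        if self.tabular_encoder is not None:
            parts.append(self.tabular_encoder(tabular))
        # Late fusion: concatenate along the feature dimension.
        return self.head(torch.cat(parts, dim=1))


# Toy vision encoder (hypothetical): global average pool, then a linear layer.
toy_vision = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 16))
fcn = TabularFCN(num_features=5, embed_dim=32)

# Vision + tabular (as in models 2, 4 and 6) and tabular only (as in model 1).
joint = JointClassifier(num_classes=4, vision_encoder=toy_vision, vision_dim=16,
                        tabular_encoder=fcn, tabular_dim=32)
tabular_only = JointClassifier(num_classes=4, tabular_encoder=fcn, tabular_dim=32)

images = torch.randn(2, 3, 224, 224)   # dummy image batch
metadata = torch.randn(2, 5)           # dummy metadata batch
print(joint(image=images, tabular=metadata).shape)
print(tabular_only(tabular=metadata).shape)
```

Swapping `toy_vision` for a real backbone only requires that the backbone return a flat `(batch, vision_dim)` tensor.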

### **Some ideas for the encoders**

https://github.com/naity/image_tabular 

<img src="../figures/joint_encoders.png" alt="drawing" width="800"/>



In [1]:
from data import *
from utils import *
from cnn import *
from vit import *

## 2. **Visual Encoders**

### 2.1. **ResNet** (CNN)

In [4]:
from transformers import AutoImageProcessor, ResNetForImageClassification
import torch
from datasets import load_dataset

processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
model = ResNetForImageClassification.from_pretrained("microsoft/resnet-50")

dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# model predicts one of the 1000 ImageNet classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])

Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.



In [5]:
model

ResNetForImageClassification(
  (resnet): ResNetModel(
    (embedder): ResNetEmbeddings(
      (embedder): ResNetConvLayer(
        (convolution): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
        (normalization): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activation): ReLU()
      )
      (pooler): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    )
    (encoder): ResNetEncoder(
      (stages): ModuleList(
        (0): ResNetStage(
          (layers): Sequential(
            (0): ResNetBottleNeckLayer(
              (shortcut): ResNetShortCut(
                (convolution): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                (normalization): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              )
              (layer): Sequential(
                (0): ResNetConvLayer(
                  (convolution): Conv2d(64
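
For the multi-modal models we need image embeddings rather than ImageNet logits. A minimal sketch, assuming we use the headless `ResNetModel` from `transformers` and take its pooled output as the embedding: to avoid a download, the example builds a randomly initialised network from the default `ResNetConfig`, which to our understanding matches the ResNet-50 layout. With pretrained weights one would instead call `ResNetModel.from_pretrained("microsoft/resnet-50")`. The dummy batch is a placeholder for preprocessed X-rays.

```python
import torch
from transformers import ResNetConfig, ResNetModel

# Random weights, default (ResNet-50 style) configuration; nothing is downloaded.
backbone = ResNetModel(ResNetConfig())
backbone.eval()

pixel_values = torch.randn(2, 3, 224, 224)  # dummy batch of two RGB images

with torch.no_grad():
    # pooler_output is the globally average-pooled feature map: (batch, 2048, 1, 1)
    pooled = backbone(pixel_values=pixel_values).pooler_output

# Drop the trailing spatial dimensions to get one vector per image.
image_embedding = pooled.flatten(start_dim=1)
print(image_embedding.shape)
```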

### 2.2. **DenseNet** (CNN)

https://github.com/liuzhuang13/DenseNet
https://huggingface.co/docs/timm/models/densenet

In [2]:
import timm
model = timm.create_model('densenet121', pretrained=True)
model

Downloading: "https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/densenet121_ra-50efcf5c.pth" to /Users/abonnet/.cache/torch/hub/checkpoints/densenet121_ra-50efcf5c.pth


DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNormAct2d(
      64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True
      (drop): Identity()
      (act): ReLU(inplace=True)
    )
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): DenseBlock(
      (denselayer1): DenseLayer(
        (norm1): BatchNormAct2d(
          64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): ReLU(inplace=True)
        )
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNormAct2d(
          128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True
          (drop): Identity()
          (act): ReLU(inplace=True)
        )
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
  

### 2.3. **Vision Transformer** (ViT)

In [None]:
import requests
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTForImageClassification

# Sample image from the COCO validation set
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')
model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')

inputs = processor(images=image, return_tensors="pt")

# Inference only, so gradients are not needed (same pattern as the ResNet cell)
with torch.no_grad():
    logits = model(**inputs).logits

# model predicts one of the 1000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
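
As with the CNNs, the multi-modal models need an embedding rather than class logits. A minimal sketch, assuming we take the final hidden state of the `[CLS]` token from the headless `ViTModel` as the image embedding (other poolings are possible): the model is built from the default `ViTConfig` with random weights so that nothing is downloaded. With pretrained weights one would call `ViTModel.from_pretrained('google/vit-base-patch16-224')` instead.

```python
import torch
from transformers import ViTConfig, ViTModel

# Random weights, default configuration (224x224 input, 16x16 patches, hidden size 768).
backbone = ViTModel(ViTConfig())
backbone.eval()

pixel_values = torch.randn(2, 3, 224, 224)  # dummy batch of two RGB images

with torch.no_grad():
    # One vector per token: 196 patch tokens plus the [CLS] token.
    hidden = backbone(pixel_values=pixel_values).last_hidden_state

# The [CLS] token sits at position 0 and summarises the whole image.
image_embedding = hidden[:, 0]
print(hidden.shape, image_embedding.shape)
```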