# ResNet


Original paper: [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)

![resnet](https://miro.medium.com/v2/resize:fit:1400/1*tFlMRm_wjBDrgOQMhEM0cQ.png)

The ResNet50 architecture is a variant of the ResNet (Residual Network) architecture, which is known for its ability to train very deep neural networks effectively. ResNet50 specifically has 50 layers (hence the name) and is widely used for tasks like image classification. Here's a detailed rundown of its architecture:

1. **Input Layer**: This layer takes as input the image data, typically represented as a three-dimensional array of pixel values (height, width, channels).

2. **Initial Convolutional Layer**: The input image passes through an initial convolutional layer with 64 filters (also known as kernels or feature detectors) of size 7x7 and a stride of 2. This layer is followed by batch normalization and ReLU activation.

3. **Pooling Layer**: After the initial convolution, a max-pooling layer with a 3x3 filter and a stride of 2 is applied to downsample the spatial dimensions of the feature maps.

4. **Residual Blocks**: ResNet50 stacks 16 residual blocks, grouped into four stages of 3, 4, 6, and 3 blocks. Each block is a bottleneck of three convolutions (1x1, 3x3, 1x1) whose output is added to the block's input through a skip connection (also called a shortcut connection). Because the shortcut gives gradients a direct path around the convolutions, very deep networks become easier to optimize. The original paper frames this as addressing the degradation problem, where accuracy gets worse as plain networks get deeper.

5. **Global Average Pooling**: Towards the end of the network, after several residual blocks, global average pooling is applied to reduce the spatial dimensions of the feature maps to 1x1. This operation calculates the average value of each feature map, resulting in a fixed-size vector regardless of the input image size.

6. **Fully Connected Layer**: Following global average pooling, a fully connected layer is used for classification. In ResNet50, this layer has 1000 units corresponding to the 1000 ImageNet classes.

7. **Softmax Activation**: The output of the fully connected layer is passed through a softmax activation function, which converts the raw scores into probabilities, indicating the likelihood of each class.

8. **Output Layer**: The final output layer presents the predicted probabilities for each class in the classification task.
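
The skip connection described in step 4 can be written compactly. Following the formulation in the original paper, a block with input $x$ learns a residual mapping $\mathcal{F}$ and adds the input back:

$$
y = \mathcal{F}(x, \{W_i\}) + x
$$

When the block changes the number of channels or the spatial size, $x$ and $\mathcal{F}(x)$ no longer have the same shape, so the shortcut applies a linear projection $W_s$ (in practice a 1x1 convolution) before the addition:

$$
y = \mathcal{F}(x, \{W_i\}) + W_s x
$$

In the model code further down, the `downsample` branch of `ResidualBlock` plays the role of $W_s$.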

Overall, ResNet50's architecture is characterized by its depth, its residual blocks, and their skip connections. The ResNet family achieved state-of-the-art ImageNet results when it was published, winning the ILSVRC 2015 classification task, and it remains a common backbone for computer vision tasks.
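
To make the rundown above concrete, here is a minimal sketch that traces tensor shapes through the stem (steps 2 and 3) and the classification head (steps 5 to 7). It assumes a 224x224 input and a padding of 1 on the max-pooling layer (the padding is not stated in the list above). The residual stages are replaced by a random stand-in tensor, so this only illustrates shapes, not a working network.

```python
import torch
import torch.nn as nn

# Dummy batch: one 224x224 RGB image (assumed input size)
x = torch.randn(1, 3, 224, 224)

# Steps 2-3: 7x7 convolution with stride 2, then 3x3 max pooling with stride 2
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # padding=1 is an assumption

stem.eval()
with torch.no_grad():
    stem_out = stem(x)         # spatial size halves: 224 -> 112
    pool_out = pool(stem_out)  # and halves again: 112 -> 56

    # Stand-in for the output of the residual stages (2048 channels at 7x7)
    features = torch.randn(1, 2048, 7, 7)

    # Steps 5-7: global average pooling, fully connected layer, softmax
    pooled = nn.AdaptiveAvgPool2d((1, 1))(features)
    logits = nn.Linear(2048, 1000)(torch.flatten(pooled, 1))
    probs = torch.softmax(logits, dim=1)

print(stem_out.shape)  # torch.Size([1, 64, 112, 112])
print(pool_out.shape)  # torch.Size([1, 64, 56, 56])
print(pooled.shape)    # torch.Size([1, 2048, 1, 1])
print(probs.shape)     # torch.Size([1, 1000])
```

Because the pooling before the classifier is adaptive, the head receives a 2048-dimensional vector whatever the spatial size of `features` is.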

## [TODO do the 1.5 archi](https://catalog.ngc.nvidia.com/orgs/nvidia/resources/resnet_50_v1_5_for_pytorch)

## Data

[Dataset Card](https://huggingface.co/datasets/bastienp/visible-watermark-pita)

In this notebook we will use the pita-watermark dataset, which contains images from the COCO dataset with watermarks added on top of them.

The goal is to:
- Detect watermarks 
- Remove watermarks 


#### TODO: specify goals

In [1]:
import torch

# Import torch dataset and create custom dataset that loads the data
from torch.utils.data import Dataset
import os
from PIL import Image
import pandas as pd

class WatermarkDataset(Dataset):
    def __init__(self, name="visible_watermark_pita", split="train", transform=None):
        self.name = name
        self.split = split
        self.transform = transform
        self.data = pd.read_csv(f"{name}/{split}.csv")

    def __len__(self):
        return len(self.data)
    
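    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        # Index the column first so image_id keeps its own dtype
        # (a whole-row lookup can upcast integer ids to float, e.g. "12.0")
        image_id = self.data["image_id"].iloc[idx]
        img_name = os.path.join(self.name, self.split, f"{image_id}.png")
        # Force 3 channels so grayscale or RGBA files match the model input
        image = Image.open(img_name).convert("RGB")
        label = int(self.data["category_id"].iloc[idx])
        if self.transform:
            image = self.transform(image)

        return image, label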
    
    def __repr__(self) -> str:
        return f"{self.name} - {self.split}: {len(self.data)}\n {self.data.head()}"

In [None]:
dataset = WatermarkDataset()
dataset

In [None]:
dataset[0]
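
The `WatermarkDataset` class above reads `<name>/<split>.csv` and loads images from `<name>/<split>/<image_id>.png`. The sketch below builds a tiny synthetic copy of that layout in a temporary directory and reads one sample back the same way. The image sizes, the two rows, and the 1-based `category_id` values are made up for illustration; only the column names and the path pattern come from the class.

```python
import os
import tempfile

import pandas as pd
from PIL import Image

# Hypothetical root folder; the notebook itself reads relative to the working directory
root = tempfile.mkdtemp()
name, split = "visible_watermark_pita", "train"
split_dir = os.path.join(root, name, split)
os.makedirs(split_dir)

# Two made-up samples; category_id is assumed to be 1-based
rows = [
    {"image_id": 1, "category_id": 1},
    {"image_id": 2, "category_id": 2},
]
for row in rows:
    img = Image.new("RGB", (64, 48), color=(row["image_id"] * 40, 0, 0))
    img.save(os.path.join(split_dir, f"{row['image_id']}.png"))
pd.DataFrame(rows).to_csv(os.path.join(root, name, f"{split}.csv"), index=False)

# Read the first sample back the way the dataset class does
data = pd.read_csv(os.path.join(root, name, f"{split}.csv"))
image_id = data["image_id"].iloc[0]
label = int(data["category_id"].iloc[0])
image = Image.open(os.path.join(split_dir, f"{image_id}.png")).convert("RGB")

print(len(data), image.size, label)
```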

## Model

In [None]:
import torch
import torch.nn as nn

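class ResidualBlock(nn.Module):
    """Bottleneck block: 1x1 -> 3x3 -> 1x1 convolutions plus a shortcut.

    The block outputs `out_channels * 4` channels. The stride is applied in the
    3x3 convolution, which is the placement used by ResNet v1.5.
    """
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()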
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        
        self.conv3 = nn.Conv2d(out_channels, out_channels * 4, kernel_size=1, stride=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * 4)
        
        self.downsample = nn.Sequential()
        if stride != 1 or in_channels != out_channels * 4:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels * 4, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * 4)
            )
        
    def forward(self, x):
        identity = x
        
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        
        out = self.conv3(out)
        out = self.bn3(out)
        
        identity = self.downsample(identity)
        
        out += identity
        out = self.relu(out)
        
        return out



In [None]:
class Resnet50(torch.nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(3, 64, 7, 2, 3, bias=False)
        self.bn1 = torch.nn.BatchNorm2d(64)
        self.relu = torch.nn.ReLU()
        self.maxpool = torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
    
        self.residuals = torch.nn.Sequential(
            # Stage 1: 3 blocks, 64 bottleneck channels (256 out), stride 1
            ResidualBlock(in_channels=64, out_channels=64),
            ResidualBlock(in_channels=256, out_channels=64),
            ResidualBlock(in_channels=256, out_channels=64),

            # Stage 2: 4 blocks, 128 bottleneck channels (512 out), first block has stride 2
            ResidualBlock(in_channels=256, out_channels=128, stride=2),
            ResidualBlock(in_channels=512, out_channels=128),
            ResidualBlock(in_channels=512, out_channels=128),
            ResidualBlock(in_channels=512, out_channels=128),

            # Stage 3: 6 blocks, 256 bottleneck channels (1024 out), first block has stride 2
            ResidualBlock(in_channels=512, out_channels=256, stride=2),
            ResidualBlock(in_channels=1024, out_channels=256),
            ResidualBlock(in_channels=1024, out_channels=256),
            ResidualBlock(in_channels=1024, out_channels=256),
            ResidualBlock(in_channels=1024, out_channels=256),
            ResidualBlock(in_channels=1024, out_channels=256),

            # Stage 4: 3 blocks, 512 bottleneck channels (2048 out), first block has stride 2
            ResidualBlock(in_channels=1024, out_channels=512, stride=2),
            ResidualBlock(in_channels=2048, out_channels=512),
            ResidualBlock(in_channels=2048, out_channels=512),
        )


        self.avgpool = torch.nn.AdaptiveAvgPool2d((1, 1))
        self.fc = torch.nn.Linear(2048, num_classes) # TODO: Change fc size to same as paper

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.residuals(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1) # TODO: Change flatten to conv layer
        x = self.fc(x)
        return x

model = Resnet50(2)
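
The "50" in the name can be checked with a little arithmetic. The sketch below counts only the layers with weights on the main path (convolutions and the final fully connected layer), assuming the 3, 4, 6, 3 block layout from the original paper. Shortcut projections, batch normalization, and pooling layers are not counted.

```python
# Number of bottleneck blocks in each of the four stages (layout from the paper)
blocks_per_stage = [3, 4, 6, 3]
convs_per_block = 3  # 1x1, 3x3, 1x1

stem_convs = 1  # the initial 7x7 convolution
fc_layers = 1   # the final fully connected layer

total_blocks = sum(blocks_per_stage)
total_layers = stem_convs + convs_per_block * total_blocks + fc_layers

print(total_blocks)  # 16
print(total_layers)  # 50
```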

In [None]:
# Import summary from torchsummary and print the model summary
from torchsummary import summary

summary(model, (3, 224, 224), device='cpu')

In [None]:
from torchsummary import summary

# Import torchvision's reference ResNet50 and print its summary
import torchvision.models as models
resnet50 = models.resnet50(weights=None)  # `pretrained=False` is deprecated since torchvision 0.13
summary(resnet50, (3, 224, 224), device='cpu')

In [None]:
# Load image
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import requests

url = "https://raw.githubusercontent.com/pytorch/hub/master/images/dog.jpg"
im = Image.open(requests.get(url, stream=True).raw)
# im

In [None]:
# Convert to tensor
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

im = preprocess(im)
im = im.unsqueeze(0)

# Predict
model.eval()
with torch.no_grad():
    prediction = model(im)
    prediction = torch.nn.functional.softmax(prediction[0], dim=0)
    print(prediction)


## Train to classify watermarks

In [None]:
# Create a custom training loop
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import transforms
from tqdm import tqdm

BATCH_SIZE = 48

# Preprocess the images
preprocess = transforms.Compose([
    # Resize to a fixed square so every image in a batch has the same shape
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
# Load the dataset
train_dataset = WatermarkDataset(split="train", transform=preprocess)
val_dataset = WatermarkDataset(split="val", transform=preprocess)
test_dataset = WatermarkDataset(split="test", transform=preprocess)

# Create a DataLoader
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=4, pin_memory=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=4, pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=4, pin_memory=True)

# Create the model
model = Resnet50(2)

# Loss and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training step: one forward/backward pass and one optimizer update
def train_step(model, inputs, target, criterion, optimizer):
    optimizer.zero_grad()
    output = model(inputs)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    return loss

# Validation step: forward pass only, without tracking gradients
def val_step(model, inputs, target, criterion):
    with torch.no_grad():
        output = model(inputs)
        loss = criterion(output, target)
    return loss

In [None]:
import torch.nn.functional as F

device = torch.device( "cuda" if torch.cuda.is_available() else "cpu" )
model.to(device)

def train(num_epochs, train_loader, val_loader, model, criterion, optimizer, device):
    losses, val_losses = [], []
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for inputs, targets in tqdm(train_loader, total=len(train_loader)):
            inputs = inputs.float().to(device)
            # category_id is treated as 1-based; shift it to class indices 0/1,
            # which CrossEntropyLoss accepts directly (no one-hot encoding needed)
            targets = (targets - 1).long().to(device)
            loss = train_step(model, inputs, targets, criterion, optimizer)
            running_loss += loss.item()
        print(f"Epoch {epoch+1}, loss: {running_loss/len(train_loader)}")
        losses.append(running_loss/len(train_loader))

        model.eval()
        running_loss = 0.0
        with torch.no_grad():
            for inputs, targets in tqdm(val_loader, total=len(val_loader)):
                inputs = inputs.float().to(device)
                targets = (targets - 1).long().to(device)
                loss = val_step(model, inputs, targets, criterion)
                running_loss += loss.item()
        print(f"Validation loss: {running_loss/len(val_loader)}")
        val_losses.append(running_loss/len(val_loader))
    
    return model, losses, val_losses
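
`CrossEntropyLoss` accepts targets either as class indices or as per-class probabilities such as one-hot vectors (the latter requires PyTorch 1.10 or newer). The sketch below checks on random logits that both forms give the same loss. The 1-based `category_ids` mirror the `targets - 1` shift used in the training loop and are made-up values.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(4, 2)                 # raw scores for 4 samples, 2 classes
category_ids = torch.tensor([1, 2, 2, 1])  # made-up labels, assumed 1-based
class_idx = category_ids - 1               # shift to 0-based class indices

criterion = torch.nn.CrossEntropyLoss()

loss_from_indices = criterion(logits, class_idx)
loss_from_one_hot = criterion(logits, F.one_hot(class_idx, num_classes=2).float())

print(torch.allclose(loss_from_indices, loss_from_one_hot))  # True
```

Class indices are the more common choice: they skip the one-hot allocation and make it harder to pass targets with the wrong dtype.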


In [None]:
torch.cuda.empty_cache()

In [None]:

# Training loop
num_epochs = 20

model, losses, val_losses = train(num_epochs, train_loader, val_loader, model, criterion, optimizer, device)
losses, val_losses

In [None]:
# Plot the losses
plt.plot(losses, label="train")
plt.plot(val_losses, label="val")
plt.legend()
plt.show()

In [None]:
# Compute accuracy on the test set
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in tqdm(test_loader, total=len(test_loader)):
        images = images.float().to(device)
        labels = (labels - 1).to(device)  # shift 1-based category_id to class indices
        outputs = model(images)
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Accuracy: {100 * correct / total:.2f}%")

# Save the model
torch.save(model.state_dict(), "resnet50.pth")
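
A file written with `torch.save(model.state_dict(), ...)` holds only the weights, so loading it back means building the same architecture first and then calling `load_state_dict`. The sketch below shows the round trip on a tiny stand-in model (`nn.Linear`) and a temporary file, both chosen only to keep the example fast. For the notebook's model the same pattern would apply to `Resnet50(2)` and `"resnet50.pth"`.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Tiny stand-in model; any nn.Module works the same way
model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "tiny.pth")

# Save only the parameters
torch.save(model.state_dict(), path)

# Rebuild the same architecture, then load the saved parameters into it
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()

# Both models should now produce identical outputs
x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))

print(same)  # True
```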