# ResNet

*Author: PyTorch Team*

**Deep residual networks pre-trained on ImageNet**

<img src="https://pytorch.org/assets/images/resnet.png" alt="alt" width="50%"/>

- https://pytorch.org/hub/pytorch_vision_resnet/
- https://arxiv.org/pdf/1512.03385

In [None]:
# Check current library versions
import torch, torchvision
print('torch:', torch.__version__)
print('torchvision:', torchvision.__version__)

In [None]:
# Load pretrained ResNet18 via torch.hub
# Note: torch.hub can raise ImportError if your PyTorch/torchvision versions are incompatible.
# - The v0.10.0 hub snapshot expects pretrained=True (it does not accept a weights argument).
# - pretrained=True is deprecated in newer torchvision; the recommended way there is the Weights enum
#   with torchvision.models (shown below as a commented alternative).
import torch
model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)

# Examples for other ResNet variants (same hub snapshot):
# model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet34', pretrained=True)
# model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)
# model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet101', pretrained=True)
# model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet152', pretrained=True)

# Switch to evaluation mode (BatchNorm/Dropout behave accordingly)
model.eval()

# Recommended alternative (commented): avoid torch.hub and use torchvision.models with weight enums
# from torchvision import models
# from torchvision.models import ResNet18_Weights
# model = models.resnet18(weights=ResNet18_Weights.DEFAULT)
# model.eval()
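
Which loading style applies depends on the installed torchvision version printed in the first cell. The helper below is a minimal sketch for making that choice explicit. `supports_weights_enum` is a hypothetical name, and the `0.13` cutoff is an assumption about when the `weights=` enum API became available, so check it against your own environment.

```python
import re

def supports_weights_enum(torchvision_version: str) -> bool:
    """Return True if the version string is 0.13 or newer.

    Assumption: the `weights=` enum API is available from torchvision 0.13 onward.
    """
    # Keep only the leading "major.minor" part, e.g. "0.15.2+cu118" -> (0, 15)
    match = re.match(r"(\d+)\.(\d+)", torchvision_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {torchvision_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (0, 13)

print(supports_weights_enum("0.10.0"))        # False
print(supports_weights_enum("0.15.2+cu118"))  # True
```

In the notebook you would pass `torchvision.__version__` and pick the `weights=` enum or the `pretrained=True` call accordingly.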

All pre-trained models expect input images normalized in the same way,
i.e. mini-batches of 3-channel RGB images of shape `(3 x H x W)`, where `H` and `W` are expected to be at least `224`.
The images have to be loaded into a range of `[0, 1]` and then normalized using `mean = [0.485, 0.456, 0.406]`
and `std = [0.229, 0.224, 0.225]`.
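
To make the arithmetic concrete, here is a small plain-Python sketch that applies the two steps (scale to `[0, 1]`, then subtract the mean and divide by the std) to single pixels. The pixel values and the `normalize_pixel` helper are illustrative assumptions; only the mean and std come from the text above.

```python
# ImageNet channel statistics quoted above (RGB order)
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [value / 255.0 for value in rgb_uint8]
    return [(x - m) / s for x, m, s in zip(scaled, MEAN, STD)]

# Hypothetical pixels: black, a mid-tone close to the channel means, and white
for pixel in [(0, 0, 0), (124, 116, 104), (255, 255, 255)]:
    print(pixel, [round(v, 3) for v in normalize_pixel(pixel)])
```

A pixel close to the channel means maps to values near zero, while black and white land at roughly -2 and +2.5, which is the input range the pretrained weights expect.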

Here's a sample execution.

In [None]:
# Download an example image (PyTorch Hub sample)
# - Requires network access. It may fail behind firewalls or proxies.
import urllib.request

# GitHub sample image URL and local filename
url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")

# Python 3: urlretrieve lives in the urllib.request submodule, which must be imported explicitly.
# The legacy urllib.URLopener class does not exist in Python 3, so no fallback is needed.
urllib.request.urlretrieve(url, filename)

# After saving, we load and display it in the next cell with PIL.
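
If you rerun the notebook often, it can help to skip the download when the file is already present and to set a timeout so the cell does not hang behind a proxy. The following is a minimal sketch using only the standard library. `download_if_missing` and its `timeout` default are hypothetical choices, not part of the original notebook.

```python
import os
import shutil
import urllib.request

def download_if_missing(url, filename, timeout=30):
    """Download `url` to `filename` unless the file already exists.

    Hypothetical helper for illustration; `timeout` is in seconds.
    """
    if os.path.exists(filename):
        return filename  # reuse the local copy
    # urlopen raises an exception (e.g. URLError) if the request fails
    with urllib.request.urlopen(url, timeout=timeout) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
    return filename

# Usage with the sample image URL (requires network access):
# download_if_missing("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
```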

In [None]:
# Preview the downloaded image
import PIL.Image
import matplotlib.pyplot as plt

# Open the saved image file (RGB)
input_image = PIL.Image.open(filename)

# Display with matplotlib for a quick sanity check
plt.imshow(input_image)
# plt.axis('off')  # Uncomment to hide axes
plt.title('Downloaded example image')
plt.show()

## Why normalize with ImageNet statistics?

Pretrained models (e.g., ResNet) were trained with input images normalized per-channel by ImageNet mean and std. Applying the same normalization at inference time aligns the input distribution with training.

- Match training distribution: Use the same mean/std to avoid performance drops from distribution shift.
- Correct channel bias: R/G/B channels have different average brightness/variance; (x - mean) / std balances scale per channel.
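- Numerical stability: After normalization each channel is roughly zero-mean with unit variance, so early-layer activations stay in the range the pretrained weights and BatchNorm statistics were fitted to.
- Reproducibility: Fixed constants (rather than per-image statistics) mean every image is preprocessed identically, so results are comparable across images and runs.

Recommended ImageNet statistics (RGB order, for inputs scaled to `[0, 1]`):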
- mean = [0.485, 0.456, 0.406]
- std  = [0.229, 0.224, 0.225]

Code (same as in this notebook):
```python
transforms.Normalize(mean=[0.485, 0.456, 0.406],
                     std=[0.229, 0.224, 0.225])
```

Notes
- When reusing ImageNet-pretrained weights, use the ImageNet stats above.
- If training from scratch on your own dataset, consider computing and using your dataset's mean/std.
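
For the from-scratch case, per-channel statistics are just the mean and standard deviation of every pixel value in each channel. The sketch below shows the computation on a tiny, made-up list of pixels in plain Python. The data is hypothetical, and on a real dataset you would run the same reduction over all training images with tensor operations.

```python
from statistics import fmean, pstdev

# Hypothetical toy "dataset": a flat list of RGB pixels already scaled to [0, 1].
# A real dataset would contribute every pixel of every training image.
pixels = [
    (0.10, 0.20, 0.30),
    (0.30, 0.40, 0.50),
    (0.50, 0.60, 0.70),
    (0.70, 0.80, 0.90),
]

# Regroup the pixels by channel: one sequence for R, one for G, one for B
channels = list(zip(*pixels))

mean = [fmean(channel) for channel in channels]
std = [pstdev(channel) for channel in channels]  # population standard deviation

print("mean:", [round(m, 4) for m in mean])
print("std: ", [round(s, 4) for s in std])
```

The resulting `mean` and `std` lists would then replace the ImageNet constants in `transforms.Normalize`.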

## Preprocessing

In [None]:
# Build preprocessing pipeline and run inference
# - ImageNet pretrained models expect RGB images with shape (3, 224, 224).
# - Process input images with Resize → CenterCrop → ToTensor → Normalize.
from PIL import Image
from torchvision import transforms

# Open image from file and force 3-channel RGB (grayscale or RGBA files would otherwise break Normalize)
input_image = Image.open(filename).convert("RGB")

# Preprocessing pipeline
preprocess = transforms.Compose([
    transforms.Resize(256),                      # Resize the shorter side to 256
    transforms.CenterCrop(224),                  # Center crop to 224x224
    transforms.ToTensor(),                       # PIL [0..255] → float tensor [0..1] (C, H, W)
    transforms.Normalize(                        # Normalize with ImageNet channel-wise mean/std
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

# Create (3, 224, 224) tensor
input_tensor = preprocess(input_image)
print(input_tensor.shape)

# Add batch dimension → (1, 3, 224, 224)
input_batch = input_tensor.unsqueeze(0)  # The model expects batched inputs
print(input_batch.shape)

# Move to GPU if available (for speed)
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')
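
The geometry of `Resize(256)` followed by `CenterCrop(224)` can be worked out by hand. The sketch below reproduces it in plain Python for a hypothetical 640x480 image. Both helper functions are illustrative stand-ins, and their rounding may differ from torchvision's by a pixel.

```python
def resize_shorter_side(width, height, target=256):
    """Scale (width, height) so the shorter side equals `target`, keeping aspect ratio."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target

def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centered crop-by-crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop

# Hypothetical 640x480 landscape photo
resized = resize_shorter_side(640, 480)
box = center_crop_box(*resized)
print(resized)  # (341, 256)
print(box)      # (58, 16, 282, 240)
```

Resizing first keeps most of the scene in view. The center crop then trims the longer side so the network always receives a square 224x224 input.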

### Inference

In [None]:
# Inference (disable grad for speed/memory)
with torch.no_grad():
    output = model(input_batch)

# Output tensor: (1, 1000) — logits for ImageNet's 1000 classes
print(output[0])
print(output[0].shape)

# Convert logits to probabilities (sum to 1). dim=0 applies over the class dimension
probabilities = torch.nn.functional.softmax(output[0], dim=0)
print(probabilities)
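
To see what the softmax call computes, here is a plain-Python sketch on a hypothetical 4-class logit vector. It subtracts the maximum before exponentiating, a common trick to avoid overflow. The function and values are illustrative, not the library's implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 4-class problem
logits = [2.0, 1.0, 0.1, -1.0]
probs = softmax(logits)

print([round(p, 4) for p in probs])
print("sum:", round(sum(probs), 6))
print("argmax unchanged:", probs.index(max(probs)) == logits.index(max(logits)))
```

The probabilities are positive, sum to 1, and keep the ordering of the logits, so the predicted class is the same before and after softmax.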

In [None]:
# Download ImageNet labels
!wget https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt

In [None]:
# Load ImageNet class labels and print Top-5 results
# - Assumes imagenet_classes.txt was downloaded via wget in the previous cell.
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]  # strip newline per line

# Get top-5 probabilities and their class indices from the probability vector
# torch.topk returns (values, indices); we request k=5
top5_prob, top5_catid = torch.topk(probabilities, 5)

print('Top-5 predictions:')
for i in range(top5_prob.size(0)):
    class_name = categories[top5_catid[i]]
    prob_value = top5_prob[i].item()
    print(f"{i+1}. {class_name:25s}  prob={prob_value:.4f}")

# Note: Alternatively, you can take top-5 from logits via torch.topk(output[0], 5)
# and then apply softmax to compute probabilities.

## Top-N error

- Top-1 error
  - Definition: The fraction of images where the model’s single most-confident prediction (argmax) is NOT the ground-truth label.
  - Relation to accuracy: top-1 error = 1 − top-1 accuracy.
  - Example: If top-1 error is 30.24%, then top-1 accuracy is 69.76%.

- Top-5 error
  - Definition: The fraction of images where the ground-truth label is NOT among the model’s five most-confident predictions.
  - Relation to accuracy: top-5 error = 1 − top-5 accuracy.
  - Intuition: More forgiving for fine-grained classes (e.g., different dog breeds). If the true class appears anywhere in the top 5 guesses, it counts as correct for top-5.

- Lower is better for both. A top-5 error of 10.92% means in 89.08% of images, the correct class is within the model’s top 5 predictions.


```python
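import torch

# Illustrative stand-ins so the snippet runs on its own:
# logits: torch.Tensor of shape (N, 1000); target: torch.Tensor of shape (N,) with class indices
logits = torch.randn(8, 1000)
target = torch.randint(0, 1000, (8,))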

top1_correct = (logits.argmax(dim=1) == target).float().mean()
top5_correct = logits.topk(5, dim=1).indices.eq(target.unsqueeze(1)).any(dim=1).float().mean()

top1_error = 1.0 - top1_correct.item()
top5_error = 1.0 - top5_correct.item()
print(f"top-1 error: {top1_error:.4f}, top-5 error: {top5_error:.4f}")
```
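
The same definitions can be checked by hand on a tiny example. The sketch below uses plain Python lists and made-up scores for 4 samples over 4 classes. `topk_error` is a hypothetical helper that mirrors the definitions above.

```python
def topk_error(scores, targets, k):
    """Fraction of samples whose true class is NOT among the k highest scores."""
    misses = 0
    for row, target in zip(scores, targets):
        # Indices of the k largest scores in this row
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if target not in topk:
            misses += 1
    return misses / len(targets)

# Hypothetical scores for 4 samples over 4 classes
scores = [
    [0.1, 0.7, 0.1, 0.1],  # top-1 = class 1
    [0.4, 0.3, 0.2, 0.1],  # top-1 = class 0, runner-up = class 1
    [0.1, 0.2, 0.3, 0.4],  # top-1 = class 3, runner-up = class 2
    [0.5, 0.1, 0.3, 0.1],  # top-1 = class 0, runner-up = class 2
]
targets = [1, 1, 0, 2]

print(topk_error(scores, targets, k=1))  # 0.75: only the first sample is right
print(topk_error(scores, targets, k=2))  # 0.25: only the third sample is still missed
```

Increasing `k` can only lower the error, which is why top-5 error is always at most the top-1 error.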

## Top-5 accuracy vs. "1 − sum of top‑5 probabilities"

Top‑5 accuracy is a 0/1 correctness check per sample: it is 1 if the ground‑truth class index is among the top‑5 predicted classes (by score), else 0. The dataset/batch top‑5 accuracy is the average of those 0/1 values.

It is NOT computed as 1 minus the sum of the top‑5 probabilities. The sum of top‑5 probabilities can be large even when the true label is not in the top‑5 (accuracy=0), and it can be smaller even when the true label is inside the top‑5 (accuracy=1).

Two toy examples:
- Example A (top‑5 miss, but the top‑5 probabilities sum to a large value):
  - probs = [0.20, 0.19, 0.18, 0.17, 0.16, 0.10], true label = index 5
  - top‑5 indices = [0,1,2,3,4] → ground truth NOT in top‑5 → top‑5 accuracy = 0
  - sum(top‑5 probs) = 0.90 → 1 − 0.90 = 0.10 (this is NOT the top‑5 error)
- Example B (top‑5 hit, but the top‑5 probabilities sum to a smaller value):
  - probs = [0.20, 0.18, 0.17, 0.16, 0.15, 0.14], true label = index 4
  - top‑5 indices = [0,1,2,3,4] → ground truth IS in top‑5 → top‑5 accuracy = 1
  - sum(top‑5 probs) = 0.86 → 1 − 0.86 = 0.14 (still NOT the top‑5 error)

In short: top‑5 accuracy checks membership of the true class in the top‑5 set (rank-based), not the magnitude of probabilities. Softmax is only needed if you want probabilities for display; the ranking (top‑k) is identical on logits and on softmaxed probabilities.
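
The distinction can be verified with a few lines of plain Python. The sketch below uses Example A from above plus a second, hypothetical vector with lower top-5 mass, and then checks that ranking on logits matches ranking on softmax probabilities. The helper names and the logit values are illustrative assumptions.

```python
import math

def top_k_indices(values, k=5):
    """Indices of the k largest entries, highest first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]

def describe(probs, true_label, k=5):
    """Return (hit, mass): the 0/1 top-k correctness and the summed top-k probability."""
    top = top_k_indices(probs, k)
    hit = int(true_label in top)                  # the 0/1 value that top-k accuracy averages
    mass = round(sum(probs[i] for i in top), 2)   # sum of the top-k probabilities
    return hit, mass

# Example A from the text: top-5 mass is high, but the true class is ranked last
print(describe([0.20, 0.19, 0.18, 0.17, 0.16, 0.10], true_label=5))  # (0, 0.9)

# A second, hypothetical vector: lower top-5 mass, yet the true class is in the top 5
print(describe([0.20, 0.18, 0.17, 0.16, 0.15, 0.14], true_label=4))  # (1, 0.86)

# Ranking is the same on logits and on softmax probabilities (softmax is monotonic)
logits = [1.5, -0.3, 2.2, 0.0, 0.7, -1.1]  # hypothetical logits
exps = [math.exp(x - max(logits)) for x in logits]
probs = [e / sum(exps) for e in exps]
print(top_k_indices(logits, 3) == top_k_indices(probs, 3))  # True
```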

### Model Description

ResNet models were proposed in "Deep Residual Learning for Image Recognition".
Five versions are available here, containing 18, 34, 50, 101, and 152 layers respectively.
Detailed model architectures can be found in Table 1 of the paper.
Their 1-crop error rates on the ImageNet dataset with pretrained models are listed below.

| Model structure | Top-1 error (%) | Top-5 error (%) |
| --------------- | --------------- | --------------- |
| resnet18        | 30.24           | 10.92           |
| resnet34        | 26.70           | 8.58            |
| resnet50        | 23.85           | 7.13            |
| resnet101       | 22.63           | 6.44            |
| resnet152       | 21.69           | 5.94            |
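
Since accuracy is simply 100 minus the error rate, the table can be converted with a few lines of Python. This is a small illustrative sketch. The numbers are copied from the table above, and the dictionary layout is an arbitrary choice.

```python
# Error rates in percent, copied from the table above: (top-1, top-5)
errors = {
    "resnet18": (30.24, 10.92),
    "resnet34": (26.70, 8.58),
    "resnet50": (23.85, 7.13),
    "resnet101": (22.63, 6.44),
    "resnet152": (21.69, 5.94),
}

# accuracy (%) = 100 - error (%), rounded to hide floating-point noise
accuracy = {
    name: (round(100 - top1, 2), round(100 - top5, 2))
    for name, (top1, top5) in errors.items()
}

for name, (top1_acc, top5_acc) in accuracy.items():
    print(f"{name:10s} top-1 acc = {top1_acc:.2f}%  top-5 acc = {top5_acc:.2f}%")
```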

### References

 - [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)