### Introduction

In this notebook, we install the required packages and download the dataset. We then generate predictions from the VGG-19 model on the dataset and save them to a file that you can upload to the EvalAI server for evaluation.

### Prerequisites

#### Install timm 

In [3]:
!pip install timm

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting timm
  Downloading timm-0.6.12-py3-none-any.whl (549 kB)
Collecting huggingface-hub
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Installing collected packages: huggingface-hub, timm
Successfully installed huggingface-hub-0.11.1 timm-0.6.12


#### Download the dataset

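Next, we download all images in the VizWiz dataset. The three archives total about 18.8 GB, so this step may take a while. The cell below fetches the images only; the annotation file is expected at `dataset/annotations.json` (see `ann_path` in the "Set variables" section).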
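Because the archives are large, it can help to check free disk space before starting the download. The following is a minimal sketch, not part of the original notebook: the helper name `has_enough_space` and the `safety_factor` of 2 are assumptions, and the byte counts are copied from the `wget` log shown in this section.

```python
import shutil

# Archive sizes in bytes, copied from the wget log in this notebook.
ARCHIVE_BYTES = {
    "train.zip": 11298421598,
    "val.zip": 3488913457,
    "test.zip": 3975272799,
}

def has_enough_space(path=".", safety_factor=2.0):
    """Return True if `path` has room for the archives and their extracted copies.

    `safety_factor` is an assumption: the zip files and the extracted images
    coexist on disk until the archives are deleted.
    """
    required = sum(ARCHIVE_BYTES.values()) * safety_factor
    free = shutil.disk_usage(path).free
    return free >= required

print("Enough disk space:", has_enough_space())
```

If the check fails, one option is to download, extract, and delete the archives one at a time instead of all at once.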

In [3]:
!mkdir -p dataset/images predictions
!wget https://vizwiz.cs.colorado.edu/VizWiz_final/images/train.zip \
      https://vizwiz.cs.colorado.edu/VizWiz_final/images/val.zip \
      https://vizwiz.cs.colorado.edu/VizWiz_final/images/test.zip

--2023-01-11 08:28:52--  https://vizwiz.cs.colorado.edu/VizWiz_final/images/train.zip
Resolving vizwiz.cs.colorado.edu (vizwiz.cs.colorado.edu)... 198.59.7.50
Connecting to vizwiz.cs.colorado.edu (vizwiz.cs.colorado.edu)|198.59.7.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11298421598 (11G) [application/zip]
Saving to: ‘train.zip’


2023-01-11 08:40:27 (15.5 MB/s) - ‘train.zip’ saved [11298421598/11298421598]

--2023-01-11 08:40:27--  https://vizwiz.cs.colorado.edu/VizWiz_final/images/val.zip
Reusing existing connection to vizwiz.cs.colorado.edu:443.
HTTP request sent, awaiting response... 200 OK
Length: 3488913457 (3.2G) [application/zip]
Saving to: ‘val.zip’


2023-01-11 08:44:01 (15.5 MB/s) - ‘val.zip’ saved [3488913457/3488913457]

--2023-01-11 08:44:01--  https://vizwiz.cs.colorado.edu/VizWiz_final/images/test.zip
Reusing existing connection to vizwiz.cs.colorado.edu:443.
HTTP request sent, awaiting response... 200 OK
Length: 3975272799 (3.7G) [app

In [4]:
!unzip -q -o train.zip -d dataset/images
!unzip -q -o val.zip -d dataset/images
!unzip -q -o test.zip -d dataset/images

In [11]:
!rm train.zip val.zip test.zip

In [None]:
!mv -v dataset/images/train/* dataset/images/
!mv -v dataset/images/val/* dataset/images/
!mv -v dataset/images/test/* dataset/images/
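After flattening the directory, a quick check that the expected files are present can catch an incomplete download or extraction early. The sketch below is illustrative only: `find_missing_images` is a hypothetical helper, the file names are made up, and it assumes that image names are listed under the `images` key of the annotation file, as the dataset class later in this notebook does.

```python
import os

def find_missing_images(image_names, images_dir="dataset/images"):
    """Return the names in `image_names` that are not files in `images_dir`."""
    return [
        name for name in image_names
        if not os.path.isfile(os.path.join(images_dir, str(name)))
    ]

# Example usage with hypothetical file names; in the notebook, pass
# annotations['images'] once the annotation file has been loaded.
missing = find_missing_images(["example_0001.jpg", "example_0002.jpg"])
print(f"{len(missing)} image(s) missing")
```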

### Get predictions

#### Import libraries

In [46]:
import os
import argparse
import json
from datetime import datetime

import numpy as np
from PIL import Image

import torch
import torch.nn as nn
from torch.utils.data import Dataset
import torchvision
from torchvision import transforms

import timm 

#### Set variables


In [89]:
ann_path = 'dataset/annotations.json'
images_path = 'dataset/images'
prediction_path = 'predictions/'
model_name = 'vgg19'
batch_size = 64

#### Load annotation file

In [58]:
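# Use a context manager so the file handle is closed after reading.
with open(ann_path) as f:
    annotations = json.load(f)
# Category ids, used later to select columns of the model output.
indices_in_1k = [d['id'] for d in annotations['categories']]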
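Judging from how `indices_in_1k` is used later, each category `id` must be a valid index into the model's output vector. A small validation step can surface a malformed annotation file before inference starts. This is a sketch under assumptions: `extract_class_indices` is a hypothetical helper, the default of 1000 output classes is assumed from the variable name, and the toy annotation dictionary is not real VizWiz data.

```python
def extract_class_indices(annotations, num_model_classes=1000):
    """Collect category ids and check that they are valid output indices."""
    ids = [category["id"] for category in annotations["categories"]]
    out_of_range = [i for i in ids if not 0 <= i < num_model_classes]
    if out_of_range:
        raise ValueError(f"category ids outside model output range: {out_of_range}")
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate category ids")
    return ids

# Hypothetical, minimal annotation structure (not real VizWiz data).
toy_annotations = {
    "images": ["example_0001.jpg", "example_0002.jpg"],
    "categories": [{"id": 5}, {"id": 42}, {"id": 999}],
}
print(extract_class_indices(toy_annotations))  # prints [5, 42, 999]
```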

#### Set device

In [59]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

#### Create dataset class and dataloader

In [90]:
test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])
    ])       

class VizWizClassification(Dataset):
    def __init__(self, annotations, transform=None):
        self.images = [os.path.join(images_path,str(path)) for path in annotations['images']]
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = Image.open(self.images[idx]).convert('RGB')
        if self.transform:
            image = self.transform(image)
        # Return the image tensor and its file name (used as the prediction key).
        return image, os.path.basename(self.images[idx])

dataset = VizWizClassification(annotations, test_transform)
vizwiz_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)

#### Load the model

In [83]:
model = timm.create_model(model_name, pretrained=True).to(device)
model.eval()

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padd

#### Get predictions

In [91]:
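results = {}
with torch.no_grad():
    # Use a loop variable that does not shadow the global `images_path`.
    for images, image_names in vizwiz_loader:
        images = images.to(device)
        # Keep only the logits of the classes listed in the annotation file.
        outputs = model(images)[:, indices_in_1k]
        # Position of the best remaining class for each image in the batch.
        pred = outputs.argmax(dim=1).cpu().tolist()
        for name, p in zip(image_names, pred):
            # Map the position back to the original class id.
            results[name] = indices_in_1k[p]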
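The loop above restricts the model output to the dataset's classes before taking the argmax, then maps the winning position back to the original class id. The same logic can be illustrated without PyTorch on plain lists. This is a toy sketch: `predict_restricted` is a hypothetical helper and the logits are invented for illustration.

```python
def predict_restricted(logits, class_indices):
    """Return, per row of `logits`, the best class id among `class_indices`."""
    predictions = []
    for row in logits:
        # Same role as outputs[:, indices_in_1k] in the notebook.
        restricted = [row[i] for i in class_indices]
        # Argmax over the restricted scores.
        best_pos = max(range(len(restricted)), key=restricted.__getitem__)
        # Map the position back to the original class id.
        predictions.append(class_indices[best_pos])
    return predictions

# Toy example: 6 model classes, only classes 1, 3, and 4 exist in the dataset.
toy_logits = [
    [9.0, 0.2, 0.1, 0.5, 0.3, 0.0],  # global argmax is class 0, which is excluded
    [0.0, 0.1, 7.0, 0.2, 0.9, 0.0],  # global argmax is class 2, which is excluded
]
print(predict_restricted(toy_logits, [1, 3, 4]))  # prints [3, 4]
```

Restricting before the argmax means a prediction can never fall on a class that is absent from the annotation file, even when such a class has the highest overall score.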

### Save the prediction file for EvalAI server

In [92]:
file_path = os.path.join(prediction_path, datetime.now().strftime("prediction-%m-%d-%Y-%H:%M:%S.json"))
with open(file_path, 'w') as outfile:
    json.dump(results, outfile)
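Before uploading, it may be worth reading the file back and checking its structure. The sketch below only verifies what this notebook produces, a JSON object mapping image file names to integer class ids; the exact format the EvalAI server requires is not stated here, so treat this as an assumption. `validate_predictions` is a hypothetical helper, and the toy names and ids are not real predictions.

```python
import json
import os
import tempfile

def validate_predictions(file_path, allowed_ids):
    """Check that the file maps image names to allowed integer class ids."""
    with open(file_path) as infile:
        predictions = json.load(infile)
    if not isinstance(predictions, dict):
        raise ValueError("expected a JSON object mapping image name to class id")
    allowed = set(allowed_ids)
    bad = [
        name for name, class_id in predictions.items()
        if not isinstance(class_id, int) or class_id not in allowed
    ]
    if bad:
        raise ValueError(f"{len(bad)} prediction(s) use ids outside the allowed set")
    return len(predictions)

# Toy round trip with hypothetical names and ids (not real predictions).
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "prediction.json")
    with open(toy_path, "w") as outfile:
        json.dump({"example_0001.jpg": 5, "example_0002.jpg": 42}, outfile)
    print(validate_predictions(toy_path, [5, 42, 999]), "predictions look well formed")
```

In the notebook, the equivalent call would be `validate_predictions(file_path, indices_in_1k)`.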

Now you can upload this file to the EvalAI server.