### Introduction

This notebook sets up the environment and produces the baseline result using ResNet-50. Run the **Prerequisites** section once to download the VizWiz dataset. **Prerequisites** also contains two sections for installing the *timm* library, which provides the pre-trained models: one for a local installation if you are running this on your laptop, and one for a Google Colab installation that keeps the library on Google Drive so it does not have to be reinstalled every session.

### Mount Google Drive

In [None]:
# Show button and code
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Prerequisites

#### Download the dataset

In this stage, we download all images in the VizWiz dataset. The download may take a while: the training archive alone is about 11 GB.

In [None]:
!mkdir -p dataset/images predictions
!wget https://vizwiz.cs.colorado.edu/VizWiz_final/images/train.zip \
      https://vizwiz.cs.colorado.edu/VizWiz_final/images/val.zip \
      https://vizwiz.cs.colorado.edu/VizWiz_final/images/test.zip

--2024-04-14 15:18:42--  https://vizwiz.cs.colorado.edu/VizWiz_final/images/train.zip
Resolving vizwiz.cs.colorado.edu (vizwiz.cs.colorado.edu)... 198.59.7.50
Connecting to vizwiz.cs.colorado.edu (vizwiz.cs.colorado.edu)|198.59.7.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11298421598 (11G) [application/zip]
Saving to: ‘train.zip’


2024-04-14 15:30:45 (14.9 MB/s) - ‘train.zip’ saved [11298421598/11298421598]

--2024-04-14 15:30:45--  https://vizwiz.cs.colorado.edu/VizWiz_final/images/val.zip
Reusing existing connection to vizwiz.cs.colorado.edu:443.
HTTP request sent, awaiting response... 200 OK
Length: 3488913457 (3.2G) [application/zip]
Saving to: ‘val.zip’


2024-04-14 15:34:35 (14.5 MB/s) - ‘val.zip’ saved [3488913457/3488913457]

--2024-04-14 15:34:35--  https://vizwiz.cs.colorado.edu/VizWiz_final/images/test.zip
Reusing existing connection to vizwiz.cs.colorado.edu:443.
HTTP request sent, awaiting response... 200 OK
Length: 3975272799 (3.7G) [app

In [None]:
!unzip -q -o train.zip -d dataset/images
!unzip -q -o val.zip -d dataset/images
!unzip -q -o test.zip -d dataset/images

In [None]:
!rm train.zip val.zip test.zip
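
A quick sanity check on the extracted folders can catch an interrupted download or a failed unzip. The sketch below assumes the archives unpack into `dataset/images/train`, `dataset/images/val` and `dataset/images/test`, which is the layout the dataset class later in this notebook expects. The helper name `count_images` is hypothetical.

```python
import os

def count_images(root, splits=("train", "val", "test")):
    """Return {split: number of files} for each split folder under root."""
    counts = {}
    for split in splits:
        split_dir = os.path.join(root, split)
        if os.path.isdir(split_dir):
            counts[split] = sum(
                1 for name in os.listdir(split_dir)
                if os.path.isfile(os.path.join(split_dir, name))
            )
        else:
            # A missing folder is reported as zero so the problem is visible
            counts[split] = 0
    return counts

print(count_images("dataset/images"))
```

A split that reports zero files most likely means the corresponding archive was not extracted.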

#### Install timm (local)

In [None]:
!pip install timm

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting timm
  Downloading timm-0.6.12-py3-none-any.whl (549 kB)
Collecting huggingface-hub
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Installing collected packages: huggingface-hub, timm
Successfully installed huggingface-hub-0.11.1 timm-0.6.12


#### Install timm (colab)

In [None]:
!pip3 install virtualenv
!virtualenv "/content/drive/MyDrive/CMSC_472/CMSC 472 Project/Code/virtual_env"
!source "/content/drive/MyDrive/CMSC_472/CMSC 472 Project/Code/virtual_env/bin/activate"; pip install timm

Collecting virtualenv
  Downloading virtualenv-20.26.1-py3-none-any.whl (3.9 MB)
Collecting distlib<1,>=0.3.7 (from virtualenv)
  Downloading distlib-0.3.8-py2.py3-none-any.whl (468 kB)
Installing collected packages: distlib, virtualenv
Successfully installed distlib-0.3.8 virtualenv-20.26.1
created virtual environment CPython3.10.12.final.0-64 in 22843ms
  creator CPython3Posix(dest=/content/drive/MyDrive/CMSC_472/CMSC 472 Project/Code/virtual_env, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
    added seed packages: pip==24.0, setuptools==69.5.1, wheel==0.43.0
  activators BashActivator,CShellActivator,F

### Get predictions

#### Import libraries (local)

In [None]:
import os
import argparse
import json
from datetime import datetime

import numpy as np
from PIL import Image

import torch
import torch.nn as nn
from torch.utils.data import Dataset
import torchvision
from torchvision import transforms

import timm

#### Import libraries (Colab)

In [None]:
import os
import argparse
import json
from datetime import datetime

import numpy as np
from PIL import Image

import torch
import torch.nn as nn
from torch.utils.data import Dataset
import torchvision
from torchvision import transforms

# Change the working directory to the shared CMSC 472 project folder
os.chdir("/content/drive/MyDrive/CMSC_472/CMSC 472 Project/Code/")

import sys
# Add the virtual environment's site-packages to the Colab system path
sys.path.append("virtual_env/lib/python3.10/site-packages")
import timm
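
Because the Colab setup relies on a path appended to `sys.path`, it can help to confirm that a package is discoverable before importing it. This is a minimal sketch using the standard library; the helper name `is_importable` is hypothetical.

```python
import importlib.util

def is_importable(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None

print(is_importable("json"))   # a standard-library module, always found
print(is_importable("timm"))   # depends on whether the installation step worked
```

If the second check reports `False`, the site-packages path or the installation step is the first thing to revisit.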

#### Set variables


In [None]:
ann_path = 'dataset/annotations.json'
images_path = 'dataset/images'
prediction_path = 'predictions/'

# change model name based on what you are training
#model_name = 'resnet50'
model_name = 'vit_base_patch32_224'

batch_size = 64

#### Load annotation file

In [None]:
with open(ann_path) as f:
    annotations = json.load(f)
# Category ids, used below to index the model's output classes
indices_in_1k = [d['id'] for d in annotations['categories']]

#### Set device

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

#### Create dataset class and dataloader

In [None]:
# Normalization statistics must match the pre-trained weights:
# 0.5 mean/std for ViT, ImageNet statistics for ResNet
if model_name.startswith('vit'):
    norm_mean, norm_std = [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]
else:
    norm_mean, norm_std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]

test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=norm_mean, std=norm_std),
])
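
To make the effect of the normalization step concrete, the sketch below applies the per-channel formula `(x - mean) / std` to single RGB pixels in plain Python. The helper `normalize` is hypothetical and only mirrors the arithmetic; it is not the torchvision implementation.

```python
def normalize(pixel, mean, std):
    """Apply (x - mean) / std to each channel of one RGB pixel."""
    return [(x - m) / s for x, m, s in zip(pixel, mean, std)]

# Pixel values after ToTensor() are already scaled to [0, 1]
black, white = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]

vit_stats = ([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
resnet_stats = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

print(normalize(black, *vit_stats))   # [-1.0, -1.0, -1.0]
print(normalize(white, *vit_stats))   # [1.0, 1.0, 1.0]
# A pixel equal to the ImageNet mean maps to zero in every channel
print(normalize([0.485, 0.456, 0.406], *resnet_stats))  # [0.0, 0.0, 0.0]
```

With the ViT statistics the input range [0, 1] maps to [-1, 1], while the ResNet statistics center each channel on the ImageNet mean.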

class VizWizClassification(Dataset):
    """Yields (transformed image, file name) pairs for every annotated image."""

    def __init__(self, annotations, transform=None):
        self.images = []
        for img in annotations['images']:
            # The split named in the file name is also the folder name
            for split in ('train', 'val', 'test'):
                if split in img:
                    self.images.append(os.path.join(images_path, split, str(img)))
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = Image.open(self.images[idx]).convert('RGB')
        if self.transform:
            image = self.transform(image)
        # Return the bare file name, which is the key in the prediction file
        return image, os.path.basename(self.images[idx])

dataset = VizWizClassification(annotations, test_transform)
vizwiz_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)

#### Load the model

In [None]:
model = timm.create_model(model_name, pretrained=True).to(device)
model.eval()

VisionTransformer(
  (patch_embed): PatchEmbed(
    (proj): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32))
    (norm): Identity()
  )
  (pos_drop): Dropout(p=0.0, inplace=False)
  (patch_drop): Identity()
  (norm_pre): Identity()
  (blocks): Sequential(
    (0): Block(
      (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=768, out_features=2304, bias=True)
        (q_norm): Identity()
        (k_norm): Identity()
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=768, out_features=768, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
      )
      (ls1): Identity()
      (drop_path1): Identity()
      (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=768, out_features=3072, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (norm): Identity(

#### Get predictions

In [None]:
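results = {}
with torch.no_grad():
    # image_names avoids shadowing the global images_path used by the dataset
    for images, image_names in vizwiz_loader:
        images = images.to(device)
        # Keep only the logits of the annotated categories
        outputs = model(images)[:, indices_in_1k]
        # Position of the best category per image, mapped back to its id below
        pred = outputs.argmax(dim=1).tolist()
        for name, p in zip(image_names, pred):
            results[name] = indices_in_1k[p]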
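
The indexing in the loop above restricts the prediction to the annotated categories and then maps the winning position back to its original class id. The following plain-Python sketch illustrates that logic for a single image; the scores and ids are toy values, and `predict_restricted` is a hypothetical helper.

```python
def predict_restricted(logits, allowed_ids):
    """Return the allowed class id with the highest logit."""
    # Keep only the logits of the allowed classes, in the order of allowed_ids
    restricted = [logits[i] for i in allowed_ids]
    # Position of the best score inside the restricted list
    best_position = max(range(len(restricted)), key=restricted.__getitem__)
    # Map that position back to the original class id
    return allowed_ids[best_position]

logits = [0.1, 2.5, 0.3, 9.0, 1.7, 0.2]  # toy scores for 6 classes
allowed_ids = [1, 4, 5]                   # toy category ids

print(predict_restricted(logits, allowed_ids))  # 1
```

Class 3 has the highest score overall, but it is not an allowed category, so the prediction falls to class 1.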

### Save the prediction file for EvalAI server

In [None]:
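# Create the output folder if needed, and avoid ':' in the file name,
# which some file systems do not allow
os.makedirs(prediction_path, exist_ok=True)
file_path = os.path.join(prediction_path, datetime.now().strftime("prediction-%m-%d-%Y-%H-%M-%S.json"))
with open(file_path, 'w') as outfile:
    json.dump(results, outfile)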

You can now upload this file to the EvalAI server.
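
Before uploading, it may be worth checking that every predicted id is one of the annotated category ids. This is a minimal sketch, assuming the prediction file is a JSON object mapping file names to class ids as written above; `check_predictions` is a hypothetical helper, not part of any EvalAI tooling.

```python
import json

def check_predictions(path, allowed_ids):
    """Return the image names whose predicted id is not an allowed category id."""
    with open(path) as f:
        predictions = json.load(f)
    allowed = set(allowed_ids)
    return [name for name, class_id in predictions.items()
            if class_id not in allowed]
```

In this notebook it could be called as `check_predictions(file_path, indices_in_1k)`; an empty list means every prediction uses a valid category id.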