
Mount Google Drive and `cd` to the project directory

In [None]:
# Mount Google Drive and change to the project directory
from google.colab import drive

drive.mount('/content/drive')
%cd /content/drive/MyDrive/AdvancedML/AMLProject/diht

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/AdvancedML/AMLProject/diht


Install the requirements and the `diht` package itself (in editable mode)

In [None]:
!pip install -r requirements.txt
!pip install -e .

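For some reason we need a specific version of Pillow (9.0.0). Note that pip reports a conflict with the `pillow==9.4.0` requirement declared by `diht` 1.0, but the install still goes through.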

In [None]:
!pip install Pillow==9.0.0

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting Pillow==9.0.0
  Downloading Pillow-9.0.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.3/4.3 MB 26.9 MB/s eta 0:00:00
Installing collected packages: Pillow
  Attempting uninstall: Pillow
    Found existing installation: Pillow 9.4.0
    Uninstalling Pillow-9.4.0:
      Successfully uninstalled Pillow-9.4.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
diht 1.0 requires pillow==9.4.0, but you have pillow 9.0.0 which is incompatible.
Successfully installed Pillow-9.0.0

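To double-check which Pillow version actually ended up installed after the downgrade, a small helper based on the standard library's `importlib.metadata` can be used. This is only a sketch: `installed_version` is a hypothetical helper name, and it uses `typing.Optional` instead of `str | None` because the `cp38` wheel above suggests the runtime is Python 3.8.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Version pinned in the cell above (diht 1.0 itself declares pillow==9.4.0)
    expected = "9.0.0"
    found = installed_version("Pillow")
    if found is None:
        print("Pillow is not installed")
    elif found != expected:
        print(f"Pillow {found} is installed, expected {expected}")
    else:
        print(f"Pillow {found} is installed as expected")
```

Note that `importlib.metadata` reports what is installed on disk. If `PIL` was already imported before the downgrade, the running kernel may keep using the old version until the runtime is restarted.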

The following models are available in the code:

In [None]:
import diht
print(diht.available_models())

['diht_vitb32_224px', 'diht_vitl14_336px', 'diht_vitb16_224px']


In [None]:
import torch
import diht

from diht import model_zoo
from PIL import Image

# Load the text tokenizer, image preprocessing transform and model for inference
text_tokenizer, image_transform, model = model_zoo.load_model(
    "diht_vitl14_336px", is_train=False
)

# Preprocess the image and add a batch dimension
image = Image.open("infer_image.png").convert("RGB")
image = image_transform(image).unsqueeze(0)
text_captions = ["a mountain", "a beach", "a desert"]
text = text_tokenizer(text_captions)

with torch.no_grad():
    image_features, text_features, logit_scale = model(image, text)
    # Scaled image-caption similarities, turned into one probability per caption
    logits_per_image = logit_scale * image_features @ text_features.T
    probs = logits_per_image.softmax(dim=-1).numpy()

print(f"text captions: {text_captions}")
print(f"text caption probs: {probs}")

100%|█████████████████████████████████████| 1.59G/1.59G [01:51<00:00, 15.3MiB/s]


text captions: ['a mountain', 'a beach', 'a desert']
text caption probs: [[0.99370664 0.00514016 0.00115325]]
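
The scoring step above is CLIP-style zero-shot classification. Each caption's logit is the scaled similarity between the image embedding and that caption's embedding, and a softmax over the captions turns the logits into probabilities. Below is a minimal NumPy sketch of the same computation. It assumes, as in CLIP-style models, that the features should be L2-normalized so that the dot product is a cosine similarity. `zero_shot_probs` and the toy vectors are hypothetical and are not part of `diht`.

```python
import numpy as np


def zero_shot_probs(image_features, text_features, logit_scale):
    """Softmax over captions of logit_scale * cosine(image, caption)."""
    # Normalize to unit length (a no-op if the model already did this)
    img = image_features / np.linalg.norm(image_features, axis=-1, keepdims=True)
    txt = text_features / np.linalg.norm(text_features, axis=-1, keepdims=True)
    logits = logit_scale * img @ txt.T            # shape: (num_images, num_captions)
    logits -= logits.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


# Toy example: one 3-d "image" embedding and three "caption" embeddings
image_features = np.array([[1.0, 0.2, 0.0]])
text_features = np.array([
    [1.0, 0.0, 0.0],  # closest to the image
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
probs = zero_shot_probs(image_features, text_features, logit_scale=100.0)
print(probs.round(4))
```

A large `logit_scale` makes the softmax very peaked. This is likely one reason the probabilities in this notebook sit so close to 0 or 1.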


The sample code works. Now let's do some tinkering...

In [None]:
text_captions2 = ["a mountain", "a lake", "a mountain and a lake"]
texts = text_tokenizer(text_captions2)

with torch.no_grad():
    image_features, text_features, logit_scale = model(image, texts)
    logits_per_image = logit_scale * image_features @ text_features.T
    probs = logits_per_image.softmax(dim=-1).numpy()

print(f"text captions: {text_captions2}")
print(f"text caption probs: {probs}")

text captions: ['a mountain', 'a lake', 'a mountain and a lake']
text caption probs: [[0.01392776 0.01522721 0.97084504]]


The model seems to be smart :) It puts about 97% of the probability on the combined caption "a mountain and a lake".

Now let's import Google's [Conceptual Captions](https://ai.google.com/research/ConceptualCaptions/) dataset...  
Edit: Since the dataset only contains image URLs, we need to fetch the images ourselves. This might take a lot of time, like, a lot!  
The following approach uses `Dataset.map` to download the images batch by batch and add them to the dataset. It failed once and I had to restart everything from scratch, which seems very inefficient. Skip this.

In [None]:
!pip install datasets

In [None]:
from concurrent.futures import ThreadPoolExecutor
from functools import partial
import io
import urllib.request

import PIL.Image

from datasets import load_dataset
from datasets.utils.file_utils import get_datasets_user_agent


USER_AGENT = get_datasets_user_agent()


def fetch_single_image(image_url, timeout=None, retries=0):
    for _ in range(retries + 1):
        try:
            request = urllib.request.Request(
                image_url,
                data=None,
                headers={"user-agent": USER_AGENT},
            )
            with urllib.request.urlopen(request, timeout=timeout) as req:
                image = PIL.Image.open(io.BytesIO(req.read()))
            break
        except Exception:
            image = None
    return image


def fetch_images(batch, num_threads, timeout=None, retries=0):
    fetch_single_image_with_args = partial(fetch_single_image, timeout=timeout, retries=retries)
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        batch["image"] = list(executor.map(fetch_single_image_with_args, batch["image_url"]))
    return batch


num_threads = 20
dset = load_dataset("conceptual_captions")
dset = dset.map(fetch_images, batched=True, batch_size=100, fn_kwargs={"num_threads": num_threads})
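
The restart problem described above could be reduced by writing each image to disk as soon as it is downloaded and skipping files that already exist. A crashed run can then simply be resumed. A minimal standard-library sketch of that idea follows. The SHA-1-based file naming, the `.part` temporary files and the default `User-Agent` string are assumptions made for illustration. They are not what `datasets` or the DownloadConceptualCaptions script actually do.

```python
import hashlib
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical User-Agent; some image hosts reject urllib's default one
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; cc-downloader-sketch)"


def url_to_filename(url: str) -> str:
    """Derive a stable, filesystem-safe file name from the URL."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def download_one(url: str, out_dir: Path, timeout: float = 10.0,
                 user_agent: str = DEFAULT_USER_AGENT) -> str:
    target = out_dir / url_to_filename(url)
    if target.exists():
        return "skipped"  # already fetched in an earlier run
    tmp = target.with_suffix(".part")
    try:
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            tmp.write_bytes(resp.read())
        tmp.replace(target)  # rename only after the download is complete
        return "ok"
    except Exception:
        tmp.unlink(missing_ok=True)  # drop any half-written file
        return "failed"


def download_all(urls, out_dir, num_threads: int = 20, timeout: float = 10.0) -> dict:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Deduplicate so two threads never write the same temporary file
    unique_urls = list(dict.fromkeys(urls))
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        statuses = list(executor.map(lambda u: download_one(u, out, timeout), unique_urls))
    return {s: statuses.count(s) for s in ("ok", "skipped", "failed")}
```

Rerunning `download_all(urls, "images/")` after a crash only fetches the images that are still missing. Failed URLs are retried on every rerun, which may be wasteful for dead links.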


Found another way: https://github.com/igorbrigadir/DownloadConceptualCaptions

In [None]:
%cd ..

/content/drive/MyDrive/AdvancedML/AMLProject


In [None]:
# Uncomment on the first run to clone the repository
#!git clone https://github.com/igorbrigadir/DownloadConceptualCaptions.git
%cd DownloadConceptualCaptions/

/content/drive/MyDrive/AdvancedML/AMLProject/DownloadConceptualCaptions


In [None]:
!pip install python-magic

The following step is problematic: since the folder already contains many images, mounting Google Drive fails with a "timed out" error :(

In [None]:
!python download_data.py
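
One possible workaround for the time-out is to spread the images over nested subfolders keyed by a hash of the file name, so that no single folder grows very large. This assumes the time-out is caused by one Drive folder holding too many files, which has not been verified here. The helper names below are hypothetical. With `levels=2` and `width=2` there are 16^4 = 65,536 leaf folders, so one million images would average about 15 files per folder.

```python
import hashlib
from pathlib import Path


def sharded_path(root, filename: str, levels: int = 2, width: int = 2) -> Path:
    """Map a file name to root/ab/cd/filename using an MD5 prefix of the name."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root, *parts, filename)


def shard_existing(root, levels: int = 2, width: int = 2) -> int:
    """Move the files sitting directly in `root` into sharded subfolders."""
    root = Path(root)
    moved = 0
    # Materialize the listing first, since subfolders are created while iterating
    for path in list(root.iterdir()):
        if not path.is_file():
            continue
        target = sharded_path(root, path.name, levels, width)
        target.parent.mkdir(parents=True, exist_ok=True)
        path.replace(target)
        moved += 1
    return moved
```

Running `shard_existing` on the local copy before uploading would also avoid having Colab list the huge flat folder through the Drive mount.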

So I managed to download the data on my machine. Now the problem is: how do we test on this data? I was planning to use Colab, but given the sheer size of the data, Drive can't be used to store it.  
So now our task is to find an efficient way to test on this gigantic dataset, either in the cloud or on our own machines.
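
One way to make testing manageable is to start from a random subset instead of the full dataset. Reservoir sampling (Algorithm R) draws a uniform sample of k records in a single pass, without knowing the total count and without loading everything into memory. The sketch below assumes the records (for example, caption and image reference) are stored one per line in a text file. `reservoir_sample` and the file name `captions.tsv` in the usage note are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return k items chosen uniformly at random from a stream of unknown length."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # uniform over 0..i (inclusive)
            if j < k:
                sample[j] = item  # keep item i with probability k / (i + 1)
    return sample
```

For example, passing an open file object such as `open("captions.tsv", encoding="utf-8")` with `k=1000` yields 1,000 random lines. A fixed `seed` makes the subset reproducible across machines, so everyone tests on the same images.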