# Introduction
This notebook contains a portion of the results from our NeurIPS 2023 submission. We attempt training on chunk and token embeddings and explore how these embeddings may carry explanatory signal for classification. Our hypothesis is that the embeddings encode positional information well enough that we can learn directly on the concatenated embeddings.

This notebook covers Step 1 of the process: gathering embeddings and creating chunk-embedded sequences.

# 1. Sequence-level Dataset Construction
- Run inference with the trained $f_\theta$ to construct the embedded sequences $\mathbf{Z}$ for both the train and test sets
- Show part of the process before we dive into full-sequence training, to get an idea of what clustering over token embeddings looks like before the conversion to $\mathbf{C}$ sequence mosaics


In [1]:
%load_ext autoreload
%autoreload 2
import torch
from embed_patches import *



In [2]:
print(torch.version.cuda)

USE_GPU = True
print("GPU detected?", torch.cuda.is_available())
if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
    print("\nNote: gpu available & selected!")
else:
    device = torch.device('cpu')
    print("\nNote: gpu NOT available!")

11.7
GPU detected? False

Note: gpu NOT available!


## 1A. Get train set stats/info

In [3]:
patch_dir = "/home/data/tinycam/train/train.hdf5"
label_dict_path = "/home/lofi/lofi/src/outputs/train-cam-cam16-224-background-labeldict.obj"
image_names = print_X_names(label_dict_path)
train_dim_dict = gather_Z_dims(patch_dir, image_names)

Gathering dimensions...
done!


In [4]:
i_max = 0
j_max = 0
for v in train_dim_dict.values():
    if v[0] > i_max:
        i_max = v[0]
    if v[1] > j_max:
        j_max = v[1]

print("max dims (gathered from extracted patches) are:", i_max, j_max)

max dims (gathered from extracted patches) are: 102 108
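
The same max-dimension loop is repeated for each dimension dictionary in this notebook. As a minimal sketch, it could be factored into one helper; the name `max_grid_dims` and the toy dictionary below are hypothetical and only illustrate the computation.

```python
def max_grid_dims(dim_dict):
    """Return the largest (rows, cols) over all (rows, cols) tuples in dim_dict."""
    if not dim_dict:
        # Matches the loops in this notebook, which start from 0
        return 0, 0
    i_max = max(v[0] for v in dim_dict.values())
    j_max = max(v[1] for v in dim_dict.values())
    return i_max, j_max

# Hypothetical toy dictionary (not real slide dimensions)
toy_dims = {"slide_a": (102, 90), "slide_b": (80, 108)}
print(max_grid_dims(toy_dims))  # (102, 108)
```

The two maxima are taken independently, so they may come from different images, exactly as in the loops in this notebook.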


In [5]:
from utils import deserialize
custom_train_dict_path = "/home/data/tinycam/test/cam16-eval/my_data/cam16_train_dim_dict.obj"
train_dims = deserialize(custom_train_dict_path)

image_names = []
train_dim_dict = {}
for key in train_dims.keys():
    im_id = key.split(".tif")[0]
    image_names.append(im_id)
    train_dim_dict[im_id] = (train_dims[key][3][1], train_dims[key][3][0]) # swap dims

In [9]:
i_max = 0
j_max = 0
for v in train_dim_dict.values():
    if v[0] > i_max:
        i_max = v[0]
    if v[1] > j_max:
        j_max = v[1]

print("Using original image dimensions, max sizes are:", i_max, j_max)

Using original image dimensions, max sizes are: 123 123


## 1B. Get test set stats/info

In [10]:
import pandas as pd
ref_path = "/home/data/tinycam/test/cam16-eval/csnaftp_gdrive-16/reference.csv"
ref_df = pd.read_csv(ref_path, header=None, names=["id", "class", "meta1", "meta2"])
ref_df.head()

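|   | id | class | meta1 | meta2 |
|---|----------|--------|-------|-------|
| 0 | test_001 | Tumor  | IDC   | Macro |
| 1 | test_002 | Tumor  | ILC   | Macro |
| 2 | test_003 | Normal | DCIS  |       |
| 3 | test_004 | Tumor  | IDC   | Micro |
| 4 | test_005 | Normal | DCIS  |       |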


In [11]:
# Binary labels: 1 for "Tumor", 0 for everything else
test_classes = [1 if el == "Tumor" else 0 for el in ref_df["class"]]
test_label_dict = dict(zip(ref_df["id"], test_classes))

In [12]:
import os  # needed for os.listdir below
from utils import deserialize
custom_test_dict_path = "/home/data/tinycam/test/cam16-eval/my_data/cam16_test_dim_dict.obj"
test_dims = deserialize(custom_test_dict_path)

gt_path = "/home/data/tinycam/test/cam16-eval/csnaftp_gdrive-16/lesion_annotations"
gt_files = os.listdir(gt_path)

image_names = []
test_dim_dict = {}
for key in test_dims.keys():
    im_id = key.split(".tif")[0]
    image_names.append(im_id)
    test_dim_dict[im_id] = (test_dims[key][3][1], test_dims[key][3][0]) # swap dims

In [13]:
i_max = 0
j_max = 0
for v in test_dim_dict.values():
    if v[0] > i_max:
        i_max = v[0]
    if v[1] > j_max:
        j_max = v[1]

print("max sizes are:", i_max, j_max)

max sizes are: 123 118


In [14]:
from utils import serialize
label_dict_path_test = "/home/lofi/lofi/src/outputs/test-cam-cam16-224-background-labeldict.obj"
# Call the imported function directly; the `utils` module itself is not imported here
serialize(test_label_dict, label_dict_path_test)

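*Note the values above:* the largest grids are 123 x 123 for the train set (using original image dimensions) and 123 x 118 for the test set. For any downstream learning we want all images to be the same size, so we pad them to a common size, say 124 x 124.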
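
As a minimal sketch of this padding step, the snippet below pads one embedding grid to 124 x 124 with NumPy. The helper name `pad_to_square`, the zero fill value, and the choice to pad on the bottom and right are assumptions made for illustration; the notebook does not specify how padding is applied downstream. The embedding depth of 128 matches the `z_dim` used for the tile2vec model later in this notebook.

```python
import numpy as np

def pad_to_square(Z, target=124, fill=0.0):
    """Pad an (H, W, D) grid to (target, target, D) on the bottom and right.

    Padding side and fill value are assumptions, not the notebook's method.
    """
    h, w = Z.shape[:2]
    if h > target or w > target:
        raise ValueError(f"grid {h}x{w} is larger than target {target}")
    # Pad only the two spatial axes; leave any trailing axes untouched
    pad_width = [(0, target - h), (0, target - w)] + [(0, 0)] * (Z.ndim - 2)
    return np.pad(Z, pad_width, mode="constant", constant_values=fill)

# Toy grid with the largest test-set size reported above (123 x 118)
Z_toy = np.ones((123, 118, 128), dtype=np.float32)
Z_padded = pad_to_square(Z_toy)
print(Z_padded.shape)  # (124, 124, 128)
```

Choosing a target one larger than the maximum observed dimension guarantees that every image receives at least some padding on each spatial axis.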

## 1C. Generating Training Masks
- Relate NumPy coordinates to salient objects

In [15]:
from cam_process import computeEvaluationMaskXML_lowres
import numpy as np
import matplotlib.pyplot as plt

In [18]:
gt_path = "/home/data/tinycam/test/cam16-eval/gigadb-16-17/lesion_annotations (1)"
gt_save_path = "/home/data/tinycam/train/gt_masks"
#-------rerun if needed: can take 15-20min-----------
# level, resolution = 5, None
# print("we have", len(os.listdir(gt_path)), "masks to generate!")
# for i, mask in enumerate(os.listdir(gt_path)):
#     id = mask.split(".xml")[0]
#     print("started processing mask", i, "| ID:", id)
#     og_dims = (train_dims[id + ".tif"][0][1], train_dims[id + ".tif"][0][0]) # swap dims
#     # og_dims = test_dims[id + ".tif"][0]
#     mask_np = computeEvaluationMaskXML_lowres(gt_path + "/" + mask, og_dims, resolution, level)
#     if mask_np is None:
#         break
#     np.save(gt_save_path + "/" + id + "_gt", mask_np)
#     print("finished processing mask", i, "| ID:", id)

Next, we create a lookup dictionary of salient objects that maps each grid coordinate to its mask value.

In [19]:
# ----- run again if you want, 2 min------
# gt_dict = {}
# gt_path = "/home/data/tinycam/train/gt_masks/"
# for idx, gt_file in enumerate(os.listdir(gt_path)):
#     gt_mask = np.load(gt_path + "/" + gt_file)
#     gt_id = gt_file.split("_gt.npy")[0] 
#     new_dims = train_dims[gt_id + ".tif"][3]
#     gt_mask_sm = cv2.resize(gt_mask, (new_dims[1], new_dims[0]), interpolation=cv2.INTER_AREA)
#     to_add = dict(((j,i), int(gt_mask_sm[i][j])) for i in range(len(gt_mask_sm)) for j in range(len(gt_mask_sm[0])))
#     gt_dict[gt_id] = to_add

# utils.serialize(gt_dict, "outputs/train_so_dict.obj")
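
The per-image lookup built in the commented-out cell above can be illustrated on a toy mask. This is a simplified sketch: it skips the `cv2.resize` step, and the helper name `mask_to_lookup` and the toy mask are hypothetical. It keeps the same `(j, i)` key order, that is, (column, row).

```python
import numpy as np

def mask_to_lookup(mask):
    """Map (col, row) -> int mask value, mirroring the (j, i) key order above."""
    n_rows, n_cols = mask.shape
    return {(j, i): int(mask[i, j]) for i in range(n_rows) for j in range(n_cols)}

# Toy 3 x 4 mask with a single salient cell at row 1, column 2
toy_mask = np.zeros((3, 4), dtype=np.uint8)
toy_mask[1, 2] = 1

lookup = mask_to_lookup(toy_mask)
print(lookup[(2, 1)], lookup[(1, 2)])  # 1 0
```

Because keys are (column, row), the salient cell at `toy_mask[1, 2]` is found under `(2, 1)`, which is easy to get backwards when indexing NumPy arrays directly.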

## 1D. Model Inference on Train Set
Let's now load and set up model $f_\theta$. Choices for Camelyon16 data include:
- `"tile2vec"`: an unsupervised learning model, ResNet-18 trained from scratch
- `"vit_iid"`: a (weakly) supervised learning model, ViT trained from scratch on IID fuzzy targets
- `"hipt"`: a self-supervised and weakly supervised learning model, a hierarchical ViT, pre-trained and used out of the box
- `"plip"`: a Foundation Model, pre-trained and used out of the box
- `None`: skip inference

In [20]:
# set model you want to run for inference
modelstr = None  # options: "tile2vec", "plip", "hipt", "vit_iid", or None to skip

In [21]:
if modelstr == "tile2vec":
    from models import ResNet18 
    model = ResNet18(n_classes=2, in_channels=3, z_dim=128, supervised=False, no_relu=False, loss_type='triplet', tile_size=224, activation='relu')
    chkpt = "/home/lofi/lofi/models/cam/to-port/ResNet18-hdf5_triplets_random_loading-224-label_selfsup-custom_loss-on_cam-cam16-filtration_background.sd"
    checkpoint = torch.load(chkpt, map_location=device)
    model.load_state_dict(checkpoint['model_state_dict'])
    model.to(device)
    prev_epoch = checkpoint['epoch']
    loss = checkpoint['loss']
elif modelstr == "plip":
    from transformers import AutoProcessor, AutoTokenizer, AutoModelForZeroShotImageClassification
    tokenizer = AutoTokenizer.from_pretrained("vinid/plip")
    processor = AutoProcessor.from_pretrained("vinid/plip")
    model_plip = AutoModelForZeroShotImageClassification.from_pretrained("vinid/plip")
elif modelstr is None:
    print("No model selected for inference! Skipping inference...")
else:
    print("Not yet supported for inference! Skipping inference...")

No model selected for inference! Skipping inference...


Now we run inference on the training set. We process every patch in the training set so that we can then attempt to learn from the concatenated embeddings.

*Note:* tile2vec should take roughly 20 min on a single T4 GPU; plip takes closer to 60 min.
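
To make the target data structure concrete, here is a hedged sketch of how patch embeddings could be arranged into a $\mathbf{Z}$ grid. It is not the implementation of `construct_Zs_efficient`, whose internals are not shown in this notebook. The names `assemble_Z` and `toy_embed` are hypothetical, the per-channel mean is only a stand-in for $f_\theta$, and leaving missing positions at zero is an assumption.

```python
import numpy as np

def assemble_Z(patches, dims, embed_fn, z_dim):
    """Place each patch embedding at its (row, col) position in an (H, W, z_dim) grid."""
    Z = np.zeros((dims[0], dims[1], z_dim), dtype=np.float32)
    for (i, j), patch in patches.items():
        Z[i, j] = embed_fn(patch)
    return Z

def toy_embed(patch):
    # Stand-in for f_theta (NOT a real model): per-channel mean of the patch
    return patch.mean(axis=(0, 1))

rng = np.random.default_rng(0)
# Two hypothetical 224 x 224 RGB patches at grid positions (0, 0) and (1, 2)
toy_patches = {
    (0, 0): rng.random((224, 224, 3)),
    (1, 2): rng.random((224, 224, 3)),
}
Z = assemble_Z(toy_patches, dims=(2, 3), embed_fn=toy_embed, z_dim=3)
print(Z.shape)  # (2, 3, 3)
```

Grid positions without a patch stay at zero in this sketch, which keeps every $\mathbf{Z}$ rectangular regardless of how many patches an image contributes.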

In [22]:
if modelstr == "tile2vec":
    patch_dir = "/home/data/tinycam/train/train.hdf5"
    save_dir = "/home/data/tinycam/train/Zs"
    construct_Zs_efficient(model, patch_dir, train_dim_dict, save_dir, device, scope="all")
elif modelstr == "plip":
    patch_dir = "/home/data/tinycam/train/train.hdf5"
    save_dir = "/home/data/tinycam/train/Zs_plip"
    construct_Zs_efficient(model_plip, patch_dir, train_dim_dict, save_dir, device, scope="all", modelstr="plip", processor=processor, tokenizer=tokenizer)
elif modelstr is None:
    print("No model selected for inference! Skipping inference...")
else:
    print("Not yet supported for inference! Skipping inference...")

No model selected for inference! Skipping inference...


## 1E. Model Inference on Test Set
Expect 20-45 min of GPU computation, depending on the model architecture.

In [23]:
if modelstr == "tile2vec":
    patch_dir = "/home/data/tinycam/test/test.hdf5"
    save_dir = "/home/data/tinycam/test/Zs"
    construct_Zs_efficient(model, patch_dir, test_dim_dict, save_dir, device, scope="all", arm="test") 
elif modelstr == "plip":
    patch_dir = "/home/data/tinycam/test/test.hdf5"
    save_dir = "/home/data/tinycam/test/Zs_plip"
    construct_Zs_efficient(model_plip, patch_dir, test_dim_dict, save_dir, device, scope="all", modelstr="plip", processor=processor, tokenizer=tokenizer, arm="test")
elif modelstr is None:
    print("No model selected for inference! Skipping inference...")
else:
    print("Not yet supported for inference! Skipping inference...")

No model selected for inference! Skipping inference...


Great! Now that inference is complete, we can take a look at the data sprites in `Cam-Step2-Viz.ipynb`.