## Title: Deep Learning Solution for Automating the Processing of Observation Segmentation on Biodiversity Image Data


This notebook contains the experiments for my final-year project. The project uses CLIP, a state-of-the-art deep learning model, to perform interpretable observation segmentation on biodiversity image data.

Some useful markdown in Jupyter notebook: https://gtribello.github.io/mathNET/assets/notebook-writing.html

# Setting Up

## Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

In [None]:
! ls 

In [None]:
%cd /content/gdrive/My Drive/Colab Notebooks/data-1

## CLIP Set Up 

The code below installs the clip package and its dependencies, then prints the installed PyTorch version so that it can be checked against the required 1.7.1 or later.
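The version requirement can also be checked programmatically. The following is a minimal sketch, assuming version strings of the form major.minor.patch with an optional suffix such as +cu118. The helper name `version_at_least` is hypothetical and is not part of PyTorch or CLIP.

```python
import re

def version_at_least(version: str, minimum: tuple) -> bool:
    """Return True if `version` (e.g. '2.1.0+cu118') is >= `minimum`."""
    # Keep only the leading numeric components; ignore suffixes such as '+cu118'.
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    parts = tuple(int(g) if g is not None else 0 for g in match.groups())
    return parts >= minimum

# Literal strings are used here; in the notebook one would pass torch.__version__.
print(version_at_least("1.7.1", (1, 7, 1)))
print(version_at_least("1.6.0", (1, 7, 1)))
```

Comparing integer tuples avoids the pitfalls of comparing version strings lexicographically (for example, "1.10" sorting before "1.7").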

In [None]:
! pip install ftfy regex tqdm
! pip install git+https://github.com/openai/CLIP.git

In [None]:
import numpy as np
import torch

print("Torch version:", torch.__version__)

**Loading & Evaluating the Model**

We can list all the available CLIP models by using clip.available_models()

In [None]:
import clip

clip.available_models()

In [None]:
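# Fall back to the CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14@336px", device=device)
model.eval()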
input_resolution = model.visual.input_resolution
context_length = model.context_length
vocab_size = model.vocab_size

print("Model parameters:", f"{np.sum([int(np.prod(p.shape)) for p in model.parameters()]):,}")
print("Input resolution:", input_resolution)
print("Context length:", context_length)
print("Vocab size:", vocab_size)

**Image Preprocessing in CLIP**

- Resize the input images and center-crop them to match the image resolution that the model expects. The pixel intensities are then normalized using the dataset mean and standard deviation.
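The geometry and arithmetic of these steps can be sketched in plain Python. This is an illustration only: the helper names are hypothetical, the rounding used by the real torchvision transforms may differ slightly, and the 336-pixel target size is taken from the model name ViT-L/14@336px.

```python
# Channel statistics used by CLIP's preprocessing (RGB order).
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)

def resized_shape(width, height, size):
    """Scale so the shorter side equals `size`, keeping the aspect ratio."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size

def center_crop_box(width, height, size):
    """Return (left, top, right, bottom) of a centred size x size crop."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size

def normalize_pixel(rgb):
    """Map an RGB pixel in [0, 255] to normalised float channels."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, CLIP_MEAN, CLIP_STD))

# A hypothetical 1024 x 768 photo prepared for a 336-pixel model input.
new_width, new_height = resized_shape(1024, 768, 336)
print(new_width, new_height)
print(center_crop_box(new_width, new_height, 336))
print(normalize_pixel((128, 128, 128)))
```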

In [None]:
preprocess

**Text Preprocessing in CLIP**
- CLIP uses a case-insensitive tokenizer, which can be invoked using clip.tokenize(). By default, the outputs are padded to 77 tokens, which is the length the CLIP models expect.
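The padding behaviour can be illustrated with a small sketch. The function `pad_to_context` is hypothetical and the three input ids are placeholders rather than real BPE output; only the start and end marker ids and the context length of 77 follow CLIP's tokenizer.

```python
SOT_TOKEN = 49406  # <|startoftext|> id in CLIP's vocabulary
EOT_TOKEN = 49407  # <|endoftext|> id in CLIP's vocabulary

def pad_to_context(token_ids, context_length=77):
    """Wrap BPE ids with start/end markers and zero-pad to a fixed length."""
    sequence = [SOT_TOKEN] + list(token_ids) + [EOT_TOKEN]
    if len(sequence) > context_length:
        raise ValueError("input is too long for the context length")
    return sequence + [0] * (context_length - len(sequence))

# Placeholder ids standing in for the BPE encoding of a short caption.
padded = pad_to_context([101, 102, 103])
print(len(padded), padded[:6])
```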

In [None]:
clip.tokenize("hello world !")

# Experiment 1 and Experiment 2 - Proof of Concept

Proof of concept: experiment with CLIP's image-caption matching to assess its feasibility and potential impact on this task.
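The scoring step behind this proof of concept can be illustrated without loading the model. The sketch below uses made-up three-dimensional vectors in place of real CLIP embeddings. It normalizes the vectors, scales the cosine similarities by 100, and applies a softmax to rank the captions. All function names are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rank_captions(image_vec, text_vecs, captions, scale=100.0):
    """Return (caption, probability) pairs sorted from most to least similar."""
    image_unit = l2_normalize(image_vec)
    logits = []
    for text_vec in text_vecs:
        text_unit = l2_normalize(text_vec)
        logits.append(scale * sum(a * b for a, b in zip(image_unit, text_unit)))
    probs = softmax(logits)
    return sorted(zip(captions, probs), key=lambda pair: pair[1], reverse=True)

# Toy embeddings: the image vector points mostly along the first caption's vector.
toy_image = [0.9, 0.1, 0.2]
toy_texts = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.2, 0.1, 1.0]]
ranking = rank_captions(toy_image, toy_texts, ["beetle", "fly", "spider"])
for caption, prob in ranking:
    print(f"{caption:>16s}: {100 * prob:.2f}%")
```

Because of the scale factor of 100, small differences in cosine similarity become large differences in probability, which is why the top prediction is often close to 100%.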

In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 
from torchvision.datasets import CIFAR100


resultList = []
device = "cuda" if torch.cuda.is_available() else "cpu"
# model, preprocess = clip.load("RN50", device=device)
# model, preprocess = clip.load("RN101", device=device)
# model, preprocess = clip.load("RN50x4", device=device)
# model, preprocess = clip.load("RN50x16", device=device)
# model, preprocess = clip.load("RN50x64", device=device)
# model, preprocess = clip.load("ViT-B/32", device=device)
# model, preprocess = clip.load("ViT-B/16", device=device)
# model, preprocess = clip.load("ViT-L/14", device=device)
model, preprocess = clip.load("ViT-L/14@336px", device=device)  #loading the ViT-L/14@336px model

# download CIFAR100 dataset
cifar100 = CIFAR100(root=os.path.expanduser("~/.cache"), download=True, train=False)

# loading image data
source_path = '/content/gdrive/MyDrive/Colab Notebooks/data-1/imgs/'
dir_contents = os.listdir(source_path) # returns list
dir_contents.sort()
num_pics = len(dir_contents)   # return the number of pictures in directory

for i in range(num_pics):
#Load and preprocess images and texts 
  print("\nImage Title: ",dir_contents[i]);
  image = preprocess(Image.open(source_path + dir_contents[i])).unsqueeze(0).to(device)
  text_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in cifar100.classes]).to(device)

# encode images and text via image encoder and transformer
  with torch.no_grad():
      image_features = model.encode_image(image)
      text_features = model.encode_text(text_inputs)

# compute the similarity between the images and captions, and return the most similar
  image_features /= image_features.norm(dim=-1, keepdim=True)
  text_features /= text_features.norm(dim=-1, keepdim=True)
  similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
  values, indices = similarity[0].topk(1)

# print the top prediction for each image                                   
  print("\nTop prediction:\n")
  for value, index in zip(values, indices):
      print(f"{cifar100.classes[index]:>16s}: {100 * value.item():.2f}%")
      resultList.append(f"{cifar100.classes[index]:>16s}")

    



The code below repeats the experiment with low-level captions to investigate their impact.
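Loading the captions and turning them into prompts can be sketched as follows. The helper names and the example captions are made up for illustration, and a temporary file stands in for the real captions file. Unlike a plain rstrip(), this version also skips blank lines, which would otherwise become empty prompts.

```python
import os
import tempfile

def load_captions(path):
    """Read one caption per line, dropping blank lines and surrounding spaces."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]

def build_prompts(captions, template="a photo of a {}"):
    """Insert each caption into the prompt template."""
    return [template.format(caption) for caption in captions]

# Demonstration with a temporary file; the captions are invented.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("green leaf\n\nbrown soil \n")
    demo_path = tmp.name

demo_prompts = build_prompts(load_captions(demo_path))
os.remove(demo_path)
print(demo_prompts)
```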

In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 

resultList = []
device = "cuda" if torch.cuda.is_available() else "cpu"
# model, preprocess = clip.load("RN50", device=device)
# model, preprocess = clip.load("RN101", device=device)
# model, preprocess = clip.load("RN50x4", device=device)
# model, preprocess = clip.load("RN50x16", device=device)
# model, preprocess = clip.load("RN50x64", device=device)
# model, preprocess = clip.load("ViT-B/32", device=device)
# model, preprocess = clip.load("ViT-B/16", device=device)
# model, preprocess = clip.load("ViT-L/14", device=device)
model, preprocess = clip.load("ViT-L/14@336px", device=device)

# load the captions from txt file into list
with open('/content/gdrive/MyDrive/Colab Notebooks/lowlevelic.txt') as f:
    image_descriptions = [line.rstrip() for line in f]

source_path = '/content/gdrive/MyDrive/Colab Notebooks/data-1/imgs/'
dir_contents = os.listdir(source_path) # returns list
dir_contents.sort()
num_pics = len(dir_contents)   # find number of pictures in directory

for i in range(num_pics):
#Load and preprocess images and texts
  print("\nImage Title: ",dir_contents[i]);
  image = preprocess(Image.open(source_path + dir_contents[i])).unsqueeze(0).to(device)
  text_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in image_descriptions]).to(device)

  with torch.no_grad():
      image_features = model.encode_image(image)
      text_features = model.encode_text(text_inputs)

  image_features /= image_features.norm(dim=-1, keepdim=True)
  text_features /= text_features.norm(dim=-1, keepdim=True)
  similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
  values, indices = similarity[0].topk(1)
                                     
  print("\nTop predictions:\n")
  for value, index in zip(values, indices):
      print(f"{image_descriptions[index]:>16s}: {100 * value.item():.2f}%")
      resultList.append(f"{image_descriptions[index]:>16s}")


Convert the predictions into a list of 0s and 1s, where 1 marks a boundary between two observations.
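As a minimal sketch of this conversion, the hypothetical function below marks image i as a boundary when its predicted label differs from that of image i+1. The last image has no successor, so it is never marked.

```python
def boundaries_from_labels(labels):
    """Return a 0/1 list; 1 marks an image whose label differs from the next one."""
    flags = []
    for i in range(len(labels)):
        is_last = i == len(labels) - 1
        # `is_last` is checked first, so labels[i + 1] is never read out of range.
        flags.append(0 if is_last or labels[i] == labels[i + 1] else 1)
    return flags

# Toy predictions for five consecutive images.
print(boundaries_from_labels(["bee", "bee", "fly", "fly", "ant"]))
```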

In [None]:
boundariesList = []

x = len(resultList)
print(f"\nNumber of Pics: {num_pics}")
print(f"Length of List: {x}" )  
for i in range(num_pics):
  if i == x - 1:    # the last image has no successor, so it is never a boundary
    boundariesList.append(0)

  elif(resultList[i] == resultList[i+1]):
      boundariesList.append(0)

  else:
    boundariesList.append(1)


for i in range(num_pics):
   print(f"{i+1}. {resultList[i]} {boundariesList[i]}")

Processing the actual boundaries file
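The ground-truth conversion can be sketched as below. It assumes that the boundaries file holds a single line of comma-separated, 1-based image numbers in ascending order, which is inferred from how the notebook reads the file. Both function names are hypothetical.

```python
def parse_boundary_line(line):
    """Turn '2,5,7' into [2, 5, 7], ignoring empty fields."""
    return [int(field) for field in line.split(",") if field.strip()]

def true_boundary_flags(boundary_numbers, num_images):
    """Return a 0/1 list in which 1 marks each image number listed as a boundary."""
    marked = set(boundary_numbers)
    return [1 if (j + 1) in marked else 0 for j in range(num_images)]

# Toy example: six images with boundaries after images 2 and 5.
numbers = parse_boundary_line("2,5")
print(true_boundary_flags(numbers, 6))
```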

In [None]:
import pandas as pd

# load boundaries.txt file to act as true label
boundaries_path_file = "/content/gdrive/MyDrive/Colab Notebooks/data-1/Boundaries.txt"
boundaries_df = pd.read_csv(boundaries_path_file)
bound_strings = boundaries_df.columns.tolist()
num_bound = len(bound_strings)
bound_int = []

#convert string list to int list
for i in range(0,len(bound_strings)):
    bound_strings[i] = int(bound_strings[i])

trueboundariesList = []


for j in range(num_pics):                 # num_pics = 273
  if j == bound_strings[num_bound -1]:    # -1 cause j start from 0, when 34 meet 34 is last
      trueboundariesList.append(0)
  elif any(j +1 == y  for y in bound_strings):
    trueboundariesList.append(1)
  else:
     trueboundariesList.append(0)

for i in range(num_pics):
   print(f"{i+1}. {trueboundariesList[i]}")

# 35 boundaries are detected, which matches Boundaries.txt for data-1
count = len([elem for elem in trueboundariesList if elem == 1])
  

Calculate F1-Score
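The metrics follow the usual definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 * precision * recall / (precision + recall). A compact sketch of the same calculation, wrapped in a hypothetical helper function so it can be reused, could look like this:

```python
def boundary_f1(actual, predicted):
    """Compute precision, recall and F1 for two equal-length 0/1 lists."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    # Each ratio falls back to 0 when its denominator is 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}

# Toy example: one hit, one false alarm, one miss.
print(boundary_f1([1, 0, 1, 0, 0], [1, 1, 0, 0, 0]))
```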

In [None]:
# initializations
# Compute TP, FP, TN, FN
tp = 0; fp = 0; tn = 0; fn = 0;
for bi in range(num_pics):
    # If actual==1 and pred==1, increment true positives
  if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
    tp = tp + 1
    # If actual==1 and pred==0, increment false negatives
  if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
    fn = fn + 1
    # If actual==0 and pred==1, increment false positives
  if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
    fp = fp + 1
    # If actual==0 and pred==0, increment true negatives
  if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
    tn = tn + 1

# Display tp, fp, tn, fn
print('True positives: ', tp)
print('False positives: ', fp)
print('True negatives: ', tn)
print('False negatives: ', fn)

# Compute precision and recall
denom = (tp + fp)
if denom > 0:
    precision = tp / denom
else:
    precision = 0
denom = (tp + fn)
if denom > 0:
    recall = tp / denom
else:
    recall = 0
# Compute F1 score

denom = (precision+recall)
if denom > 0:
    f1 = 2 * ((precision*recall)/denom)
else:
    f1 = 0
# Return all metrics
res = {
        'tp': tp,
        'fp': fp,
        'tn': tn,
        'fn': fn,
        'precision': precision,
        'recall': recall,
        'f1': f1
}

print(res)
 

# Experiment 3

## Alternative Approach 1: Boundaries Condition

Instead of just relying on the CLIP model to classify and make the prediction on boundaries, this experiment attempts to add one more criterion on top of CLIP prediction.
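The additional criterion is based on the gap between the two highest caption probabilities for an image. A minimal sketch of that quantity, using a hypothetical helper and made-up probabilities:

```python
def top_two_margin(probabilities):
    """Difference between the largest and second-largest probability."""
    if len(probabilities) < 2:
        raise ValueError("at least two probabilities are required")
    ordered = sorted(probabilities, reverse=True)
    return ordered[0] - ordered[1]

# A confident prediction has a large margin; an ambiguous one has a small margin.
print(top_two_margin([0.10, 0.70, 0.20]))
print(top_two_margin([0.34, 0.33, 0.33]))
```

A large change in this margin between neighbouring images suggests that the model's confidence has shifted, even when the top caption stays the same.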

In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 
from torchvision.datasets import CIFAR100


resultList = []
diff_list = []

device = "cuda" if torch.cuda.is_available() else "cpu"

model, preprocess = clip.load("ViT-L/14@336px", device=device)

cifar100 = CIFAR100(root=os.path.expanduser("~/.cache"), download=True, train=False)

source_path = '/content/gdrive/MyDrive/Colab Notebooks/data-1/imgs/'
dir_contents = os.listdir(source_path) # returns list
dir_contents.sort()
num_pics = len(dir_contents)   # find number of pictures in directory

for i in range(num_pics):
#Load and prepare images
  print("\nImage Title: ",dir_contents[i]);
  image = preprocess(Image.open(source_path + dir_contents[i])).unsqueeze(0).to(device)
  text_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in cifar100.classes]).to(device)

  with torch.no_grad():
      image_features = model.encode_image(image)
      text_features = model.encode_text(text_inputs)

  image_features /= image_features.norm(dim=-1, keepdim=True)
  text_features /= text_features.norm(dim=-1, keepdim=True)
  similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
  values, indices = similarity[0].topk(3)
  topOneCaption = values[0].tolist()
  topTwoCaption = values[1].tolist()
  #print(topOneCaption - topTwoCaption)
  # subtract the top 1 and top 2 captions, and saved in diff_list
  diff_list.append(topOneCaption - topTwoCaption)
                                     
  print("\nTop predictions:\n")
  for value, index in zip(values, indices):
      print(f"{cifar100.classes[index]:>16s}: {value.item():.2f}")
  
  # store all the top 1 caption for each image in resultList
  values, indices = similarity[0].topk(1)
  for value, index in zip(values, indices):
      resultList.append(f"{cifar100.classes[index]:>16s}")

print(diff_list)

The code below retrieves the top 5 captions for each image after combining the ImageNet and CIFAR-100 classes.

In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 

resultList = []
diff_list = []
top5List = []
device = "cuda" if torch.cuda.is_available() else "cpu"

model, preprocess = clip.load("ViT-L/14@336px", device=device)

with open('/content/gdrive/MyDrive/Colab Notebooks/ImageCIFARTOP5.txt') as f:
    image_descriptions = [line.rstrip() for line in f]

source_path = '/content/gdrive/MyDrive/Colab Notebooks/data-1/imgs/'
dir_contents = os.listdir(source_path) # returns list
dir_contents.sort()
num_pics = len(dir_contents)   # find number of pictures in directory

for i in range(num_pics):
#Load and prepare images
  print("\nImage Title: ",dir_contents[i]);
  image = preprocess(Image.open(source_path + dir_contents[i])).unsqueeze(0).to(device)
  text_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in image_descriptions]).to(device)

  with torch.no_grad():
      image_features = model.encode_image(image)
      text_features = model.encode_text(text_inputs)

  image_features /= image_features.norm(dim=-1, keepdim=True)
  text_features /= text_features.norm(dim=-1, keepdim=True)
  similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
  values, indices = similarity[0].topk(5)
  topOneCaption = values[0].tolist()
  topTwoCaption = values[1].tolist()
  diff_list.append(topOneCaption - topTwoCaption)
                                     
                                     
  print("\nTop predictions:\n")
  for value, index in zip(values, indices):
      print(f"{image_descriptions[index]:>16s}: {value.item():.4f}")
      # top5List will save all the top 5 captions results
      #top5List.append(f"{image_descriptions[index]:>16s}")
  
  # store the index which is top 1 caption for result List to save
  values, indices = similarity[0].topk(1)
  for value, index in zip(values, indices):
      resultList.append(f"{image_descriptions[index]:>16s}")

#print(top5List)

In [None]:
# Union of the top-5 classes of CIFAR-100 and ImageNet (can be skipped if not collecting the union of captions)
# Remove duplicates while preserving order
top5List = list(dict.fromkeys(top5List))
top5List
len(top5List)
#for x in range(len(top5List)):
  #print(top5List[x])

Convert the predictions into a list of 0s and 1s that mark the boundaries. This cell is also used to find the best threshold values for o and p.
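The decision rule can be written as a function with the two thresholds exposed as parameters, which makes it easier to tune them. This is a sketch under assumptions: the parameter names are hypothetical (the notebook does not state which of o and p is which), and the defaults of 0.95 and 0.35 are the values used in the cell below.

```python
def predict_boundaries(labels, margins, same_label_threshold=0.95,
                       new_label_threshold=0.35):
    """Return 0/1 flags from top-1 labels and top-1/top-2 margins."""
    flags = []
    last = len(labels) - 1
    for i in range(len(labels)):
        if i == last:
            flags.append(0)  # the last image has no successor
            continue
        jump = abs(margins[i] - margins[i + 1])
        if labels[i] == labels[i + 1]:
            # Same caption: only a very large change in margin marks a boundary.
            flags.append(1 if jump > same_label_threshold else 0)
        else:
            # Different caption: a moderate change in margin is enough.
            flags.append(1 if jump >= new_label_threshold else 0)
    return flags

# Toy labels and margins for four consecutive images.
print(predict_boundaries(["ant", "ant", "bee", "bee"], [0.9, 0.8, 0.2, 0.1]))
```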

In [None]:
# process the input image data into a list with 0, 1 act as the boundaries

#o = 0.00
#p = 0.00

#diffO = []
#diffF1 = []

#while o <= 1.00:

boundariesList = []

x = len(resultList)
#print(f"\nNumber of Pics: {num_pics}")
#print(f"Length of List: {x}" )  
for i in range(num_pics):

  if i == x - 1:    # the last image has no successor, so it is never a boundary
    boundariesList.append(0)
    #print(f"{i+1}. {resultList[i]} 0")
  elif(resultList[i] == resultList[i+1] and (abs(diff_list[i] - diff_list[i+1])) <= 0.95): 
      boundariesList.append(0)
      # same caption, but a large jump in the top-1/top-2 margin still marks a boundary
  elif(resultList[i] == resultList[i+1] and (abs(diff_list[i] - diff_list[i+1])) > 0.95):
      boundariesList.append(1)
    #print(f"{i+1}. {resultList[i]} 0")
  elif(resultList[i] != resultList[i+1] and (abs(diff_list[i] - diff_list[i+1])) >= 0.35):
      boundariesList.append(1) 
  else:
    boundariesList.append(0)

#for i in range(num_pics):
#    print(f"{i+1}. {resultList[i]}    {diff_list[i]}      {boundariesList[i]}")

##############################
# Calculating F1-score
tp = 0; fp = 0; tn = 0; fn = 0;
for bi in range(num_pics):
    # If actual==1 and pred==1, increment true positives
  if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
    tp = tp + 1
    # If actual==1 and pred==0, increment false negatives
  if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
    fn = fn + 1
    # If actual==0 and pred==1, increment false positives
  if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
    fp = fp + 1
    # If actual==0 and pred==0, increment true negatives
  if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
    tn = tn + 1


# Compute precision and recall
denom = (tp + fp)
if denom > 0:
    precision = tp / denom
else:
    precision = 0
denom = (tp + fn)
if denom > 0:
    recall = tp / denom
else:
    recall = 0

denom = (precision+recall)
if denom > 0:
    f1 = 2 * ((precision*recall)/denom)
else:
    f1 = 0
# Return all metrics
res = {
        'tp': tp,
        'fp': fp,
        'tn': tn,
        'fn': fn,
        'precision': precision,
        'recall': recall,
        'f1': f1
}
#diffO.append(o)
#diffF1.append(res['f1'])
#o += 0.05

#print(diffO)
print(res)




Using matplotlib to plot the relationship between the threshold and the F1-score

In [None]:
import matplotlib.pyplot as plt

plt.plot(diffO, diffF1)
plt.xlabel("Diff_value (O)")
plt.ylabel("F1-score")
plt.xlim(0, 1.00)
plt.ylim(0, 1.00)
plt.show()

In [None]:
import matplotlib.pyplot as plt

plt.plot(diffO, diffF1)
plt.xlabel("Diff_value (P)")
plt.ylabel("F1-score")
plt.xlim(0, 1.00)
plt.ylim(0, 1.00)
plt.show()

In [None]:
# Processing the actual boundaries file
import pandas as pd


boundaries_path_file = "/content/gdrive/MyDrive/Colab Notebooks/data-1/Boundaries.txt"
boundaries_df = pd.read_csv(boundaries_path_file)
bound_strings = boundaries_df.columns.tolist()
num_bound = len(bound_strings)
bound_int = []

#convert string list to int list
for i in range(0,len(bound_strings)):
    bound_strings[i] = int(bound_strings[i])

trueboundariesList = []


for j in range(num_pics):                 # num_pics = 273
  if j == bound_strings[num_bound -1]:    # -1 cause j start from 0, when 34 meet 34 is last
      trueboundariesList.append(0)
  elif any(j +1 == y  for y in bound_strings):
    trueboundariesList.append(1)
  else:
     trueboundariesList.append(0)

for i in range(num_pics):
   print(f"{i+1}. {trueboundariesList[i]}")

# 35 boundaries are detected, which matches Boundaries.txt for data-1
count = len([elem for elem in trueboundariesList if elem == 1])

In [None]:
# Calculate F-1 

# initializations
# Compute TP, FP, TN, FN
tp = 0; fp = 0; tn = 0; fn = 0;
for bi in range(num_pics):
    # If actual==1 and pred==1, increment true positives
  if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
    tp = tp + 1
    # If actual==1 and pred==0, increment false negatives
  if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
    fn = fn + 1
    # If actual==0 and pred==1, increment false positives
  if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
    fp = fp + 1
    # If actual==0 and pred==0, increment true negatives
  if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
    tn = tn + 1

# Display tp, fp, tn, fn
print('True positives: ', tp)
print('False positives: ', fp)
print('True negatives: ', tn)
print('False negatives: ', fn)

# Compute precision and recall
denom = (tp + fp)
if denom > 0:
    precision = tp / denom
else:
    precision = 0
denom = (tp + fn)
if denom > 0:
    recall = tp / denom
else:
    recall = 0
# Compute F1 score

denom = (precision+recall)
if denom > 0:
    f1 = 2 * ((precision*recall)/denom)
else:
    f1 = 0
# Return all metrics
res = {
        'tp': tp,
        'fp': fp,
        'tn': tn,
        'fn': fn,
        'precision': precision,
        'recall': recall,
        'f1': f1
}

print(res)

## Alternative Approach 2: Vector Distances between Adjacent images

1) Use 5 text queries.

2) Select the 5 text queries from ImageCIFARTOP5 (which has 68 classes in total): grasshopper, fly, spider, beetle, caterpillar.

3) Compute the Euclidean distance between the score vectors of adjacent images.

4) Apply a threshold to the distance to determine the boundaries.
- Points 2 and 4 can be optimised.
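Steps 3 and 4 can be sketched in a simplified form. Each image is represented by its vector of caption scores, the Euclidean distance is taken between neighbouring vectors, and a distance above the threshold marks a boundary. Note that this sketch thresholds each distance directly, whereas the cells below compare consecutive distances. The function names, scores and threshold are illustrative.

```python
import math

def adjacent_distances(score_vectors):
    """Euclidean distance between each image's scores and the next image's."""
    return [math.dist(a, b) for a, b in zip(score_vectors, score_vectors[1:])]

def flag_boundaries(distances, threshold):
    """Mark a boundary where the distance exceeds the threshold; the last image gets 0."""
    return [1 if d > threshold else 0 for d in distances] + [0]

# Toy two-caption scores for three consecutive images.
scores = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
distances = adjacent_distances(scores)
print(distances)
print(flag_boundaries(distances, threshold=0.40))
```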

In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 

resultList = []
firstCaption = []
secondCaption = []
thirdCaption = []
fourthCaption = []
fifthCaption = []

device = "cuda" if torch.cuda.is_available() else "cpu"

model, preprocess = clip.load("ViT-L/14@336px", device=device)
# list down the text queries in caption list
caption = ["grasshopper", "fly", "spider", "beetle", "caterpillar"]
caption.sort()  # sort captions in alphabetical order

source_path = '/content/gdrive/MyDrive/Colab Notebooks/data-1/imgs/'
dir_contents = os.listdir(source_path) # returns list
dir_contents.sort()
num_pics = len(dir_contents)   # find number of pictures in directory

for i in range(num_pics):
#Load and prepare images
  print("\nImage Title: ",dir_contents[i]);
  image = preprocess(Image.open(source_path + dir_contents[i])).unsqueeze(0).to(device)
  caption_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in caption]).to(device)
 
  with torch.no_grad():
      image_features = model.encode_image(image)
      text_features = model.encode_text(caption_inputs)

  image_features /= image_features.norm(dim=-1, keepdim=True)
  text_features /= text_features.norm(dim=-1, keepdim=True)
  similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
  # save the similarity values and captions for top 5
  values, indices = similarity[0].topk(5, sorted = False)

  # Create a list of tuples containing the caption and its corresponding score
  captions_and_scores = [(caption[idx], value.item()) for value, idx in zip(values, indices)]
  # Sort the list based on the original order of the captions
  captions_and_scores.sort(key=lambda x: caption.index(x[0]))

  firstCaption.append(captions_and_scores[0][1])
  secondCaption.append(captions_and_scores[1][1])
  thirdCaption.append(captions_and_scores[2][1])
  fourthCaption.append(captions_and_scores[3][1])
  fifthCaption.append(captions_and_scores[4][1])

  print("\nPredictions:\n")
  for caption_score in captions_and_scores:
      print(f"{caption_score[0]:>16s}: {caption_score[1]:.4f}")



Calculate the Euclidean distance between the score vectors of adjacent images

In [None]:
import numpy as np

euclidList = []
x = len(firstCaption)
# initializing points in numpy arrays
for i in range (num_pics):
  if i == x - 1:    # the last image has no successor, so its distance is set to 0
    dist = 0.00

  else:
    image1 = np.array((firstCaption [i], secondCaption [i], thirdCaption [i], fourthCaption [i],fifthCaption [i]))
    image2 = np.array((firstCaption [i+1], secondCaption [i+1], thirdCaption [i+1], fourthCaption [i+1],fifthCaption [i+1]))
    dist = np.linalg.norm(image1 - image2)  
    # calculating Euclidean distance using linalg.norm()

  euclidList.append(dist)
 
# printing Euclidean distance
print(f"Length: {len(euclidList)} ")


Convert the distances into a list of 0s and 1s that mark the boundaries

In [None]:
boundariesList = []

x = len(euclidList)
print(f"\nNumber of Pics: {num_pics}")
print(f"Length of List: {x}" )  
for i in range(num_pics):
  if i == x - 1:    # the last image has no successor, so it is never a boundary
    boundariesList.append(0)
  elif(abs(euclidList[i] - euclidList[i+1]) > 0.40):
      boundariesList.append(1)
  else:
    boundariesList.append(0)

for i in range(num_pics):
   print(f"{i+1}. {euclidList[i]} {boundariesList[i]}")

Processing the actual boundaries file

In [None]:
import pandas as pd


boundaries_path_file = "/content/gdrive/MyDrive/Colab Notebooks/data-1/Boundaries.txt"
boundaries_df = pd.read_csv(boundaries_path_file)
bound_strings = boundaries_df.columns.tolist()
num_bound = len(bound_strings)
bound_int = []

#convert string list to int list
for i in range(0,len(bound_strings)):
    bound_strings[i] = int(bound_strings[i])

trueboundariesList = []


for j in range(num_pics):                 # num_pics = 273
  if j == bound_strings[num_bound -1]:    # -1 cause j start from 0, when 34 meet 34 is last
      trueboundariesList.append(0)
  elif any(j +1 == y  for y in bound_strings):
    trueboundariesList.append(1)
  else:
     trueboundariesList.append(0)

for i in range(num_pics):
   print(f"{i+1}. {trueboundariesList[i]}")

# 35 boundaries are detected, which matches Boundaries.txt for data-1
count = len([elem for elem in trueboundariesList if elem == 1])

In [None]:
# Calculate F-1 

# initializations
# Compute TP, FP, TN, FN
tp = 0; fp = 0; tn = 0; fn = 0;
for bi in range(num_pics):
    # If actual==1 and pred==1, increment true positives
  if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
    tp = tp + 1
    # If actual==1 and pred==0, increment false negatives
  if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
    fn = fn + 1
    # If actual==0 and pred==1, increment false positives
  if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
    fp = fp + 1
    # If actual==0 and pred==0, increment true negatives
  if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
    tn = tn + 1

# Display tp, fp, tn, fn
print('True positives: ', tp)
print('False positives: ', fp)
print('True negatives: ', tn)
print('False negatives: ', fn)

# Compute precision and recall
denom = (tp + fp)
if denom > 0:
    precision = tp / denom
else:
    precision = 0
denom = (tp + fn)
if denom > 0:
    recall = tp / denom
else:
    recall = 0
# Compute F1 score

denom = (precision+recall)
if denom > 0:
    f1 = 2 * ((precision*recall)/denom)
else:
    f1 = 0
# Return all metrics
res = {
        'tp': tp,
        'fp': fp,
        'tn': tn,
        'fn': fn,
        'precision': precision,
        'recall': recall,
        'f1': f1
}

print(res)
 

This block of code adds a loop to find the best threshold value.
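A threshold sweep of this kind can be sketched as follows. The thresholds are generated from an integer counter rather than by repeatedly adding 0.05, which avoids floating-point drift at the upper end of the range. The function names and the toy signal are hypothetical.

```python
def f1_score(actual, predicted):
    """F1-score for two equal-length 0/1 lists (0 when undefined)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def sweep_thresholds(signal, actual, steps=20, upper=1.0):
    """Return (threshold, F1) pairs for evenly spaced thresholds in [0, upper]."""
    results = []
    for k in range(steps + 1):
        threshold = upper * k / steps  # integer stepping avoids accumulated error
        predicted = [1 if value >= threshold else 0 for value in signal]
        results.append((threshold, f1_score(actual, predicted)))
    return results

# Toy per-image signal and ground truth.
toy_signal = [0.1, 0.9, 0.2, 0.8, 0.0]
toy_actual = [0, 1, 0, 1, 0]
results = sweep_thresholds(toy_signal, toy_actual)
best = max(results, key=lambda pair: pair[1])
print(best)
```

The resulting pairs can be passed straight to a plotting call to visualise how the F1-score changes with the threshold.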

In [None]:
o = 0.00
o_list = []
f1_list = []

while o <= 1.00:

  boundariesList = []

  x = len(euclidList)
  for i in range(num_pics):
    if i == x - 1:    # the last image has no successor, so it is never a boundary
      boundariesList.append(0)
    elif(abs(euclidList[i] - euclidList[i+1]) >= o):
        boundariesList.append(1)
    else:
      boundariesList.append(0)


  ################################################################
  tp = 0; fp = 0; tn = 0; fn = 0
  for bi in range(num_pics):
      # If actual==1 and pred==1, increment true positives
    if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
      tp = tp + 1
      # If actual==1 and pred==0, increment false negatives
    if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
      fn = fn + 1
      # If actual==0 and pred==1, increment false positives
    if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
      fp = fp + 1
      # If actual==0 and pred==0, increment true negatives
    if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
      tn = tn + 1


  # Compute precision and recall
  denom = (tp + fp)
  if denom > 0:
      precision = tp / denom
  else:
      precision = 0
  denom = (tp + fn)
  if denom > 0:
      recall = tp / denom
  else:
      recall = 0

  # Compute F1 score
  denom = (precision+recall)
  if denom > 0:
      f1 = 2 * ((precision*recall)/denom)
  else:
      f1 = 0
  # Return all metrics
  res = {
          'tp': tp,
          'fp': fp,
          'tn': tn,
          'fn': fn,
          'precision': precision,
          'recall': recall,
          'f1': f1
  }
  o_list.append(o)
  f1_list.append(res['f1'])
  o = round(o + 0.05, 2)  # rounding avoids floating-point drift in the loop condition

print(o_list)
print(f1_list)



Using matplotlib to plot the relationship between the threshold and the F1-score

In [None]:
import matplotlib.pyplot as plt

plt.plot(o_list, f1_list)
plt.xlabel("Diff_value (O)")
plt.ylabel("F1-score")
plt.xlim(0, 1.00)
plt.ylim(0, 1.00)
plt.show()

# Experiment 3.2 - Genetic Algorithm


This experiment uses a genetic algorithm to select the best set of queries on top of Alternative Approach 2. This section is used to generate results for both ImageCIFARTOP5 and insectDomainClass.
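The overall shape of such a search can be sketched with a toy fitness function. This is a generic, simplified genetic algorithm (elitist selection, single-point crossover, random mutation) and not necessarily the exact scheme used later in this notebook. The toy fitness stands in for the F1-score of a query set, and all names and parameter values are illustrative.

```python
import random

def make_population(rng, size, genes, num_classes):
    """Create `size` chromosomes, each holding `genes` 1-based class indices."""
    return [[rng.randint(1, num_classes) for _ in range(genes)] for _ in range(size)]

def crossover(rng, parent_a, parent_b):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(rng, chromosome, num_classes, rate=0.1):
    """Replace each gene with a random class index with probability `rate`."""
    return [rng.randint(1, num_classes) if rng.random() < rate else gene
            for gene in chromosome]

def evolve(fitness, num_classes, generations=30, size=6, genes=5, seed=0):
    """Keep the fitter half each generation and refill with mutated offspring."""
    rng = random.Random(seed)
    population = make_population(rng, size, genes, num_classes)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: size // 2]
        children = []
        while len(parents) + len(children) < size:
            parent_a, parent_b = rng.sample(parents, 2)
            children.append(mutate(rng, crossover(rng, parent_a, parent_b), num_classes))
        population = parents + children
    return max(population, key=fitness)

def toy_fitness(chromosome):
    """Placeholder fitness: counts even-numbered genes."""
    return sum(1 for gene in chromosome if gene % 2 == 0)

best_chromosome = evolve(toy_fitness, num_classes=68)
print(best_chromosome, toy_fitness(best_chromosome))
```

Because the fitter half of the population is carried over unchanged, the best fitness never decreases from one generation to the next.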

Useful resources:
  https://towardsdatascience.com/introduction-to-genetic-algorithms-including-example-code-e396e98d8bf3

  https://www.kaggle.com/code/aaawnrahman/genetic-algorithm/notebook

  https://anderfernandez.com/en/blog/genetic-algorithm-in-python/

In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 
import random
import numpy as np
import pandas as pd

device = "cuda" if torch.cuda.is_available() else "cpu"

# load the captions file
with open('/content/gdrive/MyDrive/Colab Notebooks/insectDomainClass.txt') as f:
    image_descriptions = [line.rstrip() for line in f]

model, preprocess = clip.load("ViT-L/14@336px", device=device)
source_path = '/content/gdrive/MyDrive/Colab Notebooks/data-1/imgs/'
dir_contents = os.listdir(source_path) 
dir_contents.sort()
num_pics = len(dir_contents)   

# Here convert list into dictionary with 'int' as key
list1 = list(range(1,len(image_descriptions)+1))
list2 = []

# here append queries into list
for c in image_descriptions:
  list2.append(c)

class_dictionary = dict(zip(list1, list2))
print(class_dictionary)

# here can access the value from key
#x = class_dictionary.get()
#y = class_dictionary.get()
#print(x)
#print(y)

Process the actual boundaries file

In [None]:
import pandas as pd


boundaries_path_file = "/content/gdrive/MyDrive/Colab Notebooks/data-1/Boundaries.txt"
boundaries_df = pd.read_csv(boundaries_path_file)
bound_strings = boundaries_df.columns.tolist()
num_bound = len(bound_strings)
bound_int = []

#convert string list to int list
for i in range(0,len(bound_strings)):
    bound_strings[i] = int(bound_strings[i])

trueboundariesList = []


for j in range(num_pics):                 # num_pics = 273
  if j == bound_strings[num_bound -1]:    # -1 cause j start from 0, when 34 meet 34 is last
      trueboundariesList.append(0)
  elif any(j +1 == y  for y in bound_strings):
    trueboundariesList.append(1)
  else:
     trueboundariesList.append(0)

for i in range(num_pics):
   print(f"{i+1}. {trueboundariesList[i]}")

# 35 boundaries are detected, which matches Boundaries.txt for data-1
count = len([elem for elem in trueboundariesList if elem == 1])

Perform optimisation to search for optimal sets of text queries using a genetic algorithm
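In this encoding, each chromosome is a list of integer keys that map to text queries. A small sketch of the encoding and decoding steps is shown below. Drawing the genes with random.sample, so that a chromosome cannot contain the same query twice, is a suggestion rather than something the cell below does. The function names and the class list are made up.

```python
import random

def random_chromosome(rng, num_classes, genes=5):
    """Draw `genes` distinct 1-based class indices."""
    return rng.sample(range(1, num_classes + 1), genes)

def decode(chromosome, class_names):
    """Map 1-based indices to their query strings, sorted alphabetically."""
    return sorted(class_names[index - 1] for index in chromosome)

# Invented class list for illustration.
toy_classes = ["ant", "bee", "cricket", "dragonfly", "earwig", "firefly", "gnat"]
rng = random.Random(1)
chromosome = random_chromosome(rng, len(toy_classes))
print(chromosome, decode(chromosome, toy_classes))
```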

In [None]:
# Generate initial chromosomes
from operator import itemgetter

def generate_initial_population():
    initial_population = []
    for x in range(6):
        new_chromosome = []
        for y in range(5):
            new_chromosome.append(random.randint(1,len(image_descriptions))) 
        initial_population.append(new_chromosome)
    return initial_population

# Evaluate fitness of chromosome
def evaluate_chromosome(chromosome):
    # Map each gene (an integer key) to its text query
    queriesList = [class_dictionary.get(gene) for gene in chromosome]
    queriesList.sort()

    firstCaption = []
    secondCaption = []
    thirdCaption = []
    fourthCaption = []
    fifthCaption = []

    dir_contents = os.listdir(source_path) # returns list
    dir_contents.sort()
    num_pics = len(dir_contents)   # find number of pictures in directory

    for i in range(num_pics):
    #Load and prepare images
      image = preprocess(Image.open(source_path + dir_contents[i])).unsqueeze(0).to(device)
      caption_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in queriesList]).to(device)

      with torch.no_grad():
          image_features = model.encode_image(image)
          text_features = model.encode_text(caption_inputs)

      image_features /= image_features.norm(dim=-1, keepdim=True)
      text_features /= text_features.norm(dim=-1, keepdim=True)
      similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
      values, indices = similarity[0].topk(5, sorted = False)

      # Create a list of tuples containing the caption and its corresponding score
      captions_and_scores = [(queriesList[idx], value.item()) for value, idx in zip(values, indices)]
      # Sort the list based on the original order of the captions
      captions_and_scores.sort(key=lambda x: queriesList.index(x[0]))

      firstCaption.append(captions_and_scores[0][1])
      secondCaption.append(captions_and_scores[1][1])
      thirdCaption.append(captions_and_scores[2][1])
      fourthCaption.append(captions_and_scores[3][1])
      fifthCaption.append(captions_and_scores[4][1])

    euclidList = []
    x = len(firstCaption)
    # Euclidean distance between each image's score vector and the next image's
    for i in range(num_pics):
      if i == x - 1:   # the last image has no successor
        dist = 0.00
      else:
        image1 = np.array((firstCaption [i], secondCaption [i], thirdCaption [i], fourthCaption [i],fifthCaption [i]))
        image2 = np.array((firstCaption [i+1], secondCaption [i+1], thirdCaption [i+1], fourthCaption [i+1],fifthCaption [i+1]))
        dist = np.linalg.norm(image1 - image2)

      euclidList.append(dist)

    boundariesList = []

    x = len(euclidList)
    for i in range(num_pics):
      if euclidList[i] == euclidList[x-1]:    # euclidList[x-1] is 0 (last image), so zero distances never mark a boundary
        boundariesList.append(0)
      elif euclidList[i] - euclidList[i+1] > 0.40:   # boundary when the distance drops by more than the threshold
        boundariesList.append(1)
      else:
        boundariesList.append(0)

    tp = 0; fp = 0; tn = 0; fn = 0;
    for bi in range(num_pics):
        # If actual==1 and pred==1, increment true positives
      if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
        tp = tp + 1
        # If actual==1 and pred==0, increment false negatives
      if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
        fn = fn + 1
        # If actual==0 and pred==1, increment false positives
      if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
        fp = fp + 1
        # If actual==0 and pred==0, increment true negatives
      if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
        tn = tn + 1

    # Compute precision and recall
    denom = (tp + fp)
    if denom > 0:
        precision = tp / denom
    else:
        precision = 0
    denom = (tp + fn)
    if denom > 0:
        recall = tp / denom
    else:
        recall = 0
    # Compute F1 score

    denom = (precision+recall)
    if denom > 0:
        f1 = 2 * ((precision*recall)/denom)
    else:
        f1 = 0
    # Return all metrics
    res = {
            'tp': tp,
            'fp': fp,
            'tn': tn,
            'fn': fn,
            'precision': precision,
            'recall': recall,
            'f1': f1
    }
    fitness = res['f1']
    return fitness

# Evaluate fitness of population
def evaluate_population(population):
    evaluated_population = []
    for chromosome in population:
        fitness = evaluate_chromosome(chromosome)
        evaluated_population.append((chromosome, fitness))
    return evaluated_population

# Keep the 3 fittest chromosomes found so far
def evaluate_fittest(old_list, new_list):
    # Adds all data to same list
    improved_list = list(old_list)
    improved_list.extend(x for x in new_list if x not in improved_list)
    # Sort list
    improved_list.sort(key=lambda x: x[1], reverse=True)
    # Return top 3 chromosomes only
    return improved_list[:3]

# Tournament selection for parents
def tournament_selection(evaluated_population):
    new_parents = []
    for x in range(6):
        # Tournament of size 3: sample three evaluated chromosomes at random
        random_sample = random.sample(evaluated_population, 3)
        new_parent = max(random_sample, key=itemgetter(1))[0]     # compare by fitness (index 1), keep the winning chromosome (index 0)
        new_parents.append(new_parent)
    return new_parents

# Chromosome crossover
def crossover(first_parent, second_parent):
    # No crossover occurs
    if random.random() > CROSSOVER_RATE:
        chromosomes = [first_parent.copy(), second_parent.copy()]
    else:
        # Single crossover
        chromosomes = single_crossover(first_parent, second_parent)
    return chromosomes

# Single-point crossover
def single_crossover(first_parent, second_parent):
    # Get crossover point
    crossover_point = random.randint(1, len(first_parent) - 1)   # randint is inclusive, so every interior cut point is possible
    # Perform crossover
    first_chromosome = first_parent[:crossover_point] + second_parent[crossover_point:]
    second_chromosome = second_parent[:crossover_point] + first_parent[crossover_point:]
    return [first_chromosome, second_chromosome]

# Mutate chromosome
def mutate_chromosome(chromosome):
    for x in range(len(chromosome)):   # visit every gene, including the last one
        if random.random() < MUTATION_RATE:
            chromosome[x] = random.randint(1,len(image_descriptions))

# Genetic algorithm
def genetic_algorithm():
    # Generate initial population
    population = generate_initial_population()
    # Top 3 fittest chromosomes
    fittest_chromosomes = []
    # Run for 10 generations
    for generation in range(10):
        # Evaluate population
        evaluated_population = evaluate_population(population)
        # Reevaluate fittest chromosome
        fittest_chromosomes = evaluate_fittest(fittest_chromosomes, evaluated_population)
        print([x[1] for x in fittest_chromosomes])
        # Attain new parents
        new_parents = tournament_selection(evaluated_population)
        # New population
        new_population = []
        # Generate new population
        for x in range(0, 6, 2):
            # Get new parents
            first_parent, second_parent = new_parents[x], new_parents[x + 1]
            # Mutate new chromosomes
            for chromosome in crossover(first_parent, second_parent):
                # Mutate chromosome
                mutate_chromosome(chromosome)
                # Get mutated chromosome
                new_population.append(chromosome)
        # Replace population
        population = new_population
    # Return top 3 fittest chromosomes
    return fittest_chromosomes

CROSSOVER_RATE = 0.80
MUTATION_RATE = 0.20 

genetic_algorithm()



In [None]:
x = class_dictionary.get(44)
y = class_dictionary.get(23)
z = class_dictionary.get(58)
a = class_dictionary.get(24)
b = class_dictionary.get(12)
print(f"{x},{y},{z},{a},{b}")
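The manual look-ups above can also be written as one helper that decodes a whole chromosome. This is a minimal sketch: `decode_chromosome` and the three-entry `example_dictionary` are hypothetical names and data used only for illustration, not part of the notebook.

```python
def decode_chromosome(chromosome, class_dictionary):
    """Map each gene (an integer key) to its text query."""
    return [class_dictionary.get(gene) for gene in chromosome]


# Hypothetical stand-in for class_dictionary, which maps 1-based ids to class names
example_dictionary = {1: "ant", 2: "bee", 3: "cricket"}
decoded = decode_chromosome([3, 1, 2], example_dictionary)
print(",".join(decoded))
```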


# Experiment 3.3 - Number of Text Queries


This section examines how the number of text queries affects the results. The focus here is on 10 and 15 text queries.

In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 
import random
import numpy as np
import pandas as pd

device = "cuda" if torch.cuda.is_available() else "cpu"

# some initial set up
with open('/content/gdrive/MyDrive/Colab Notebooks/insectDomainClass.txt') as f:
    image_descriptions = [line.rstrip() for line in f]

model, preprocess = clip.load("ViT-L/14@336px", device=device)
source_path = '/content/gdrive/MyDrive/Colab Notebooks/data-1/imgs/'
dir_contents = os.listdir(source_path) 
dir_contents.sort()
num_pics = len(dir_contents)   

# Here convert list into dictionary with 'int' as key
list1 = list(range(1,len(image_descriptions)+1))
list2 = []

# here append queries into list
for c in image_descriptions:
  list2.append(c)

class_dictionary = dict(zip(list1, list2))
print(class_dictionary)


In [None]:
# Processing the actual boundaries file
import pandas as pd


boundaries_path_file = "/content/gdrive/MyDrive/Colab Notebooks/data-1/Boundaries.txt"
boundaries_df = pd.read_csv(boundaries_path_file)
bound_strings = boundaries_df.columns.tolist()
num_bound = len(bound_strings)
bound_int = []

#convert string list to int list
for i in range(0,len(bound_strings)):
    bound_strings[i] = int(bound_strings[i])

trueboundariesList = []


for j in range(num_pics):                 # num_pics = 273
  if j == bound_strings[num_bound -1]:    # -1 cause j start from 0, when 34 meet 34 is last
      trueboundariesList.append(0)
  elif any(j +1 == y  for y in bound_strings):
    trueboundariesList.append(1)
  else:
     trueboundariesList.append(0)

for i in range(num_pics):
   print(f"{i+1}. {trueboundariesList[i]}")

# Count the true boundaries; for data-1 this should be 35, matching Boundaries.txt
count = len([elem for elem in trueboundariesList if elem == 1])
print(f"Number of true boundaries: {count}")
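
The same ground-truth flags can be built with a set lookup. This is a sketch under two assumptions that the notebook does not confirm: the boundary file holds a single comma-separated line of 1-based image numbers, and those numbers are in ascending order. `parse_boundaries` and `boundary_flags` are hypothetical helper names.

```python
def parse_boundaries(line):
    """Turn a comma-separated line such as '3,5' into a list of ints."""
    return [int(token) for token in line.split(",") if token.strip()]


def boundary_flags(boundaries, num_pics):
    """Return a 0/1 list in which image number n (1-based) is flagged if n is a boundary."""
    boundary_set = set(boundaries)
    return [1 if (j + 1) in boundary_set else 0 for j in range(num_pics)]


# Made-up example: boundaries at images 3 and 5 in a sequence of 6 images
example_flags = boundary_flags(parse_boundaries("3,5"), num_pics=6)
print(example_flags)
```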

The genetic algorithm below is used to find the optimal text queries for 10 and 15 queries. I have commented out the code for the 10-query run, so the cell as shown searches for 15 text queries.
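
One way to avoid commenting code in and out when switching between 10 and 15 queries is to derive every size from a single constant. The sketch below is an illustration under assumptions rather than the notebook's implementation: `NUM_QUERIES`, `NUM_CLASSES`, `random_chromosome`, `single_point_crossover`, and `mutate` are hypothetical names, and `NUM_CLASSES = 60` is a placeholder for `len(image_descriptions)`.

```python
import random

NUM_QUERIES = 15   # set to 10 for the 10-query run
NUM_CLASSES = 60   # placeholder; the notebook would use len(image_descriptions)


def random_chromosome(num_queries=NUM_QUERIES, num_classes=NUM_CLASSES):
    """One chromosome: num_queries class ids drawn from 1..num_classes."""
    return [random.randint(1, num_classes) for _ in range(num_queries)]


def single_point_crossover(first_parent, second_parent):
    """Swap the tails of two parents at a random interior cut point."""
    point = random.randint(1, len(first_parent) - 1)
    return [first_parent[:point] + second_parent[point:],
            second_parent[:point] + first_parent[point:]]


def mutate(chromosome, mutation_rate, num_classes=NUM_CLASSES):
    """Replace each gene with a random class id with probability mutation_rate."""
    for position in range(len(chromosome)):
        if random.random() < mutation_rate:
            chromosome[position] = random.randint(1, num_classes)


random.seed(0)
parent_a, parent_b = random_chromosome(), random_chromosome()
children = single_point_crossover(parent_a, parent_b)
for child in children:
    mutate(child, mutation_rate=1 / NUM_QUERIES)
print(children)
```

With a mutation rate of `1 / NUM_QUERIES`, one gene per chromosome is expected to change on average, which is in line with the rates of 0.067 and 0.10 used for 15 and 10 queries.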

In [None]:
# Generate initial chromosomes
from operator import itemgetter

def generate_initial_population():
    initial_population = []
    for x in range(6):
        new_chromosome = []
        #for y in range(10):
        for y in range(15):
            new_chromosome.append(random.randint(1,len(image_descriptions))) 
        initial_population.append(new_chromosome)
    return initial_population

# Evaluate fitness of chromosome
def evaluate_chromosome(chromosome):
    # Look up the text query for every gene in the chromosome
    queriesList = [class_dictionary.get(gene) for gene in chromosome]

    queriesList.sort()
    
    firstCaption = []
    secondCaption = []
    thirdCaption = []
    fourthCaption = []
    fifthCaption = []
    sixthCaption = []
    seventhCaption = []
    eighthCaption = []
    ninthCaption = []
    tenthCaption = []

    elevenCaption = []
    twelveCaption = []
    thirteenCaption = []
    fourteenCaption = []
    fifteenCaption = []

    dir_contents = os.listdir(source_path) # returns list
    dir_contents.sort()
    num_pics = len(dir_contents)   # find number of pictures in directory

    for i in range(num_pics):
    #Load and prepare images
      image = preprocess(Image.open(source_path + dir_contents[i])).unsqueeze(0).to(device)
      caption_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in queriesList]).to(device)

      with torch.no_grad():
          image_features = model.encode_image(image)
          text_features = model.encode_text(caption_inputs)

      image_features /= image_features.norm(dim=-1, keepdim=True)
      text_features /= text_features.norm(dim=-1, keepdim=True)
      similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
      #values, indices = similarity[0].topk(10, sorted = False)
      values, indices = similarity[0].topk(15, sorted = False)

      # Pair each caption with its own score; topk returns values and indices in matching order
      captions_and_scores = [(queriesList[idx], val.item()) for val, idx in zip(values, indices)]
      # Sort the list based on the original order of the captions
      captions_and_scores.sort(key=lambda x: queriesList.index(x[0]))

      firstCaption.append(captions_and_scores[0][1])
      secondCaption.append(captions_and_scores[1][1])
      thirdCaption.append(captions_and_scores[2][1])
      fourthCaption.append(captions_and_scores[3][1])
      fifthCaption.append(captions_and_scores[4][1])
      sixthCaption.append(captions_and_scores[5][1])
      seventhCaption.append(captions_and_scores[6][1])
      eighthCaption.append(captions_and_scores[7][1])
      ninthCaption.append(captions_and_scores[8][1])
      tenthCaption.append(captions_and_scores[9][1])

      elevenCaption.append(captions_and_scores[10][1])
      twelveCaption.append(captions_and_scores[11][1])
      thirteenCaption.append(captions_and_scores[12][1])
      fourteenCaption.append(captions_and_scores[13][1])
      fifteenCaption.append(captions_and_scores[14][1])

    euclidList = []
    x = len(firstCaption)
    # Euclidean distance between each image's score vector and the next image's
    for i in range(num_pics):
      if i == x - 1:   # the last image has no successor
        dist = 0.00
      else:
        image1 = np.array((firstCaption [i], secondCaption [i], thirdCaption [i], fourthCaption [i],fifthCaption [i],sixthCaption [i], seventhCaption [i], eighthCaption [i], ninthCaption [i],tenthCaption [i],\
                           elevenCaption [i], twelveCaption [i], thirteenCaption [i], fourteenCaption [i],fifteenCaption [i]))
        image2 = np.array((firstCaption [i+1], secondCaption [i+1], thirdCaption [i+1], fourthCaption [i+1],fifthCaption [i+1],sixthCaption [i+1], seventhCaption [i+1], eighthCaption [i+1], ninthCaption [i+1],tenthCaption [i+1],\
                           elevenCaption [i+1], twelveCaption [i+1], thirteenCaption [i+1], fourteenCaption [i+1],fifteenCaption [i+1]))
        dist = np.linalg.norm(image1 - image2)

      euclidList.append(dist)

    boundariesList = []

    x = len(euclidList)
    for i in range(num_pics):
      if euclidList[i] == euclidList[x-1]:    # euclidList[x-1] is 0 (last image), so zero distances never mark a boundary
        boundariesList.append(0)
      elif euclidList[i] - euclidList[i+1] > 0.40:   # boundary when the distance drops by more than the threshold
        boundariesList.append(1)
      else:
        boundariesList.append(0)

    tp = 0; fp = 0; tn = 0; fn = 0;
    for bi in range(num_pics):
        # If actual==1 and pred==1, increment true positives
      if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
        tp = tp + 1
        # If actual==1 and pred==0, increment false negatives
      if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
        fn = fn + 1
        # If actual==0 and pred==1, increment false positives
      if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
        fp = fp + 1
        # If actual==0 and pred==0, increment true negatives
      if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
        tn = tn + 1

    # Compute precision and recall
    denom = (tp + fp)
    if denom > 0:
        precision = tp / denom
    else:
        precision = 0
    denom = (tp + fn)
    if denom > 0:
        recall = tp / denom
    else:
        recall = 0
    # Compute F1 score

    denom = (precision+recall)
    if denom > 0:
        f1 = 2 * ((precision*recall)/denom)
    else:
        f1 = 0
    # Return all metrics
    res = {
            'tp': tp,
            'fp': fp,
            'tn': tn,
            'fn': fn,
            'precision': precision,
            'recall': recall,
            'f1': f1
    }
    fitness = res['f1']
    return fitness

# Evaluate fitness of population
def evaluate_population(population):
    evaluated_population = []
    for chromosome in population:
        fitness = evaluate_chromosome(chromosome)
        evaluated_population.append((chromosome, fitness))
    return evaluated_population

# Keep the 3 fittest chromosomes found so far
def evaluate_fittest(old_list, new_list):
    # Adds all data to same list
    improved_list = list(old_list)
    improved_list.extend(x for x in new_list if x not in improved_list)
    # Sort list
    improved_list.sort(key=lambda x: x[1], reverse=True)
    # Return top 3 chromosomes only
    return improved_list[:3]

# Tournament selection for parents
def tournament_selection(evaluated_population):
    new_parents = []
    for x in range(6):
        random_sample = random.sample(evaluated_population, 3)    # tournament of size 3: sample three evaluated chromosomes at random
        new_parent = max(random_sample, key=itemgetter(1))[0]     # compare by fitness (index 1), keep the winning chromosome (index 0)
        new_parents.append(new_parent)
    return new_parents

# Single-point crossover
def single_crossover(first_parent, second_parent):
    # Get crossover point
    crossover_point = random.randint(1, len(first_parent) - 1)   # inclusive bounds; adapts to 10 or 15 queries
    # Perform crossover
    first_chromosome = first_parent[:crossover_point] + second_parent[crossover_point:]
    second_chromosome = second_parent[:crossover_point] + first_parent[crossover_point:]
    return [first_chromosome, second_chromosome]


# Chromosome crossover
def crossover(first_parent, second_parent):
    # No crossover occurs
    if random.random() > CROSSOVER_RATE:
        chromosomes = [first_parent.copy(), second_parent.copy()]
    else:
        # Single crossover
        chromosomes = single_crossover(first_parent, second_parent)
    return chromosomes

# Mutate chromosome
def mutate_chromosome(chromosome):
    for x in range(len(chromosome)):   # visit every gene; adapts to 10 or 15 queries
        if random.random() < MUTATION_RATE:
            chromosome[x] = random.randint(1,len(image_descriptions))

# Genetic algorithm
def genetic_algorithm():
    # Generate initial population
    population = generate_initial_population()
    # Top 3 fittest chromosomes
    fittest_chromosomes = []
    # Run for 10 generations
    for generation in range(10):
        # Evaluate population
        evaluated_population = evaluate_population(population)
        # Reevaluate fittest chromosome
        fittest_chromosomes = evaluate_fittest(fittest_chromosomes, evaluated_population)
        print([x[1] for x in fittest_chromosomes])
        # Get new parents
        new_parents = tournament_selection(evaluated_population)
        # New population
        new_population = []
        # Generate new population
        for x in range(0, 6, 2):
            # Attain new parents
            first_parent, second_parent = new_parents[x], new_parents[x + 1]
            # Mutate new chromosomes
            for chromosome in crossover(first_parent, second_parent):
                # Mutate chromosome
                mutate_chromosome(chromosome)
                # Get mutated chromosome
                new_population.append(chromosome)
        # Replace population
        population = new_population
    # Return top 3 fittest chromosomes
    return fittest_chromosomes

CROSSOVER_RATE = 0.80
MUTATION_RATE = 0.067   # for 15 text queries
#MUTATION_RATE = 0.10   # for 10 text queries

genetic_algorithm()



In [None]:
a = class_dictionary.get(1)
b = class_dictionary.get(25)
c = class_dictionary.get(2)
d = class_dictionary.get(53)
e = class_dictionary.get(52)
f = class_dictionary.get(37)
g = class_dictionary.get(15)
h = class_dictionary.get(44)
i = class_dictionary.get(3)
j = class_dictionary.get(7)

k = class_dictionary.get(53)
l = class_dictionary.get(31)
m = class_dictionary.get(32)
n = class_dictionary.get(26)
o = class_dictionary.get(22)

#print(f"{a},{b},{c},{d},{e},{f},{g},{h},{i},{j}")
print(f"{a},{b},{c},{d},{e},{f},{g},{h},{i},{j},{k},{l},{m},{n},{o}")

This section explores the interpretability of the selected captions, tests them on the dataset, and performs threshold optimisation.

In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 

resultList = []
firstCaption = []
secondCaption = []
thirdCaption = []
fourthCaption = []
fifthCaption = []
sixthCaption = []
seventhCaption = []
eighthCaption = []
ninthCaption = []
tenthCaption = []

elevenCaption = []
twelveCaption = []
thirteenCaption = []
fourteenCaption = []
fifteenCaption = []

device = "cuda" if torch.cuda.is_available() else "cpu"

model, preprocess = clip.load("ViT-L/14@336px", device=device)
# input the text queries in caption list
caption = ["scorpion","angel insects","heelwalker","lacewing","booklice","flea"
,"stonefly","thrips","true bug","ice crawler","harvestmen","webspinner","earwig","katydid","cricket"]
caption.sort()  # sort captions in alphabetical order

source_path = '/content/gdrive/MyDrive/Colab Notebooks/data-1/imgs/'
dir_contents = os.listdir(source_path) # returns list
dir_contents.sort()
num_pics = len(dir_contents)   # find number of pictures in directory

for i in range(num_pics):
#Load and prepare images
  print("\nImage Title: ",dir_contents[i]);
  image = preprocess(Image.open(source_path + dir_contents[i])).unsqueeze(0).to(device)
  caption_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in caption]).to(device)
 
  with torch.no_grad():
      image_features = model.encode_image(image)
      text_features = model.encode_text(caption_inputs)

  image_features /= image_features.norm(dim=-1, keepdim=True)
  text_features /= text_features.norm(dim=-1, keepdim=True)
  similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
  #values, indices = similarity[0].topk(10, sorted = False)
  values, indices = similarity[0].topk(15, sorted = False)

  # Pair each caption with its own score; topk returns values and indices in matching order
  captions_and_scores = [(caption[idx], val.item()) for val, idx in zip(values, indices)]
  # Sort the list based on the original order of the captions
  captions_and_scores.sort(key=lambda x: caption.index(x[0]))

  firstCaption.append(captions_and_scores[0][1])
  secondCaption.append(captions_and_scores[1][1])
  thirdCaption.append(captions_and_scores[2][1])
  fourthCaption.append(captions_and_scores[3][1])
  fifthCaption.append(captions_and_scores[4][1])
  sixthCaption.append(captions_and_scores[5][1])
  seventhCaption.append(captions_and_scores[6][1])
  eighthCaption.append(captions_and_scores[7][1])
  ninthCaption.append(captions_and_scores[8][1])
  tenthCaption.append(captions_and_scores[9][1])

  elevenCaption.append(captions_and_scores[10][1])
  twelveCaption.append(captions_and_scores[11][1])
  thirteenCaption.append(captions_and_scores[12][1])
  fourteenCaption.append(captions_and_scores[13][1])
  fifteenCaption.append(captions_and_scores[14][1])

  print("\nPredictions:\n")
  for caption_score in captions_and_scores:
      print(f"{caption_score[0]:>16s}: {caption_score[1]:.4f}")



In [None]:
# Calculate Euclidean Distance Here 
import numpy as np

euclidList = []
x = len(firstCaption)
# Euclidean distance between each image's score vector and the next image's
for i in range(num_pics):
  if i == x - 1:   # the last image has no successor
    dist = 0.00

  else:
    image1 = np.array((firstCaption [i], secondCaption [i], thirdCaption [i], fourthCaption [i],fifthCaption [i],sixthCaption [i], seventhCaption [i], eighthCaption [i], ninthCaption [i],tenthCaption [i],\
                       elevenCaption [i], twelveCaption [i], thirteenCaption [i], fourteenCaption [i],fifteenCaption [i]))
    image2 = np.array((firstCaption [i+1], secondCaption [i+1], thirdCaption [i+1], fourthCaption [i+1],fifthCaption [i+1],sixthCaption [i+1], seventhCaption [i+1], eighthCaption [i+1], ninthCaption [i+1],tenthCaption [i+1],\
                        elevenCaption [i+1], twelveCaption [i+1], thirteenCaption [i+1], fourteenCaption [i+1],fifteenCaption [i+1]))
    dist = np.linalg.norm(image1 - image2)  
    # calculating Euclidean distance using linalg.norm()

  euclidList.append(dist)
 
# Report how many distances were computed (should equal num_pics)
print(f"Length: {len(euclidList)} ")

#********************************************************************************
# process the input image data into a list with 0, 1 act as the boundaries

boundariesList = []

x = len(euclidList)
print(f"\nNumber of Pics: {num_pics}")
print(f"Length of List: {x}" )  
for i in range(num_pics):
  if euclidList[i] == euclidList[x-1]:    # euclidList[x-1] is 0 (last image), so zero distances never mark a boundary
    boundariesList.append(0)
  elif euclidList[i] - euclidList[i+1] > 0.40:   # boundary when the distance drops by more than the threshold
      boundariesList.append(1)
  else:
    boundariesList.append(0)

for i in range(num_pics):
   print(f"{i+1}. {euclidList[i]} {boundariesList[i]}")
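
The per-caption lists can also be held in a single score matrix, which makes the distance and boundary steps independent of the number of queries. The following is a minimal NumPy sketch, not the notebook's code: `adjacent_distances` and `detect_boundaries` are hypothetical names, the small `scores` array is made-up data standing in for the CLIP softmax outputs, and the boundary rule mirrors the one above (a drop in distance larger than the threshold).

```python
import numpy as np


def adjacent_distances(scores):
    """Euclidean distance between each row and the next; the last row gets 0."""
    diffs = np.diff(scores, axis=0)             # shape: (num_pics - 1, num_queries)
    dists = np.linalg.norm(diffs, axis=1)
    return np.append(dists, 0.0)                # pad so the length equals num_pics


def detect_boundaries(dists, threshold=0.40):
    """Flag image i when dists[i] - dists[i + 1] exceeds the threshold."""
    drops = dists[:-1] - dists[1:]
    flags = (drops > threshold).astype(int)
    return np.append(flags, 0)                  # the last image is never flagged


# Made-up scores for 4 images and 2 queries (each row sums to 1)
scores = np.array([[0.9, 0.1],
                   [0.9, 0.1],
                   [0.1, 0.9],
                   [0.1, 0.9]])
dists = adjacent_distances(scores)
flags = detect_boundaries(dists)
print(dists, flags)
```

In this toy example the only flagged position is the second image, the last one before the scores change.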

In [None]:
# Calculate F-1 

# initializations
# Compute TP, FP, TN, FN
tp = 0; fp = 0; tn = 0; fn = 0;
for bi in range(num_pics):
    # If actual==1 and pred==1, increment true positives
  if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
    tp = tp + 1
    # If actual==1 and pred==0, increment false negatives
  if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
    fn = fn + 1
    # If actual==0 and pred==1, increment false positives
  if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
    fp = fp + 1
    # If actual==0 and pred==0, increment true negatives
  if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
    tn = tn + 1

# Display tp, fp, tn, fn
print('True positives: ', tp)
print('False positives: ', fp)
print('True negatives: ', tn)
print('False negatives: ', fn)

# Compute precision and recall
denom = (tp + fp)
if denom > 0:
    precision = tp / denom
else:
    precision = 0
denom = (tp + fn)
if denom > 0:
    recall = tp / denom
else:
    recall = 0
# Compute F1 score

denom = (precision+recall)
if denom > 0:
    f1 = 2 * ((precision*recall)/denom)
else:
    f1 = 0
# Return all metrics
res = {
        'tp': tp,
        'fp': fp,
        'tn': tn,
        'fn': fn,
        'precision': precision,
        'recall': recall,
        'f1': f1
}

print(res)
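
Because the same counting logic recurs in several cells, it could be collected into one function. A minimal sketch, assuming both inputs are equal-length lists of 0/1 flags; `boundary_f1` is a hypothetical name and the two short lists are made-up data.

```python
def boundary_f1(true_flags, pred_flags):
    """Return tp, fp, tn, fn, precision, recall and F1 for two 0/1 lists."""
    pairs = list(zip(true_flags, pred_flags))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard every division so an empty class yields 0 instead of an error
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0
    return {'tp': tp, 'fp': fp, 'tn': tn, 'fn': fn,
            'precision': precision, 'recall': recall, 'f1': f1}


# Made-up flags: one hit, one false alarm, one miss, one correct rejection
example = boundary_f1([1, 0, 1, 0], [1, 1, 0, 0])
print(example)
```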
 

In [None]:
# Sweep the threshold o from 0 to 1 in steps of 0.05 to find the value that gives the best F1 score

o = 0.00
o_list = []
f1_list = []

while o <= 1.00:

  boundariesList = []

  x = len(euclidList)
  #print(f"\nNumber of Pics: {num_pics}")
  #print(f"Length of List: {x}" )  
  for i in range(num_pics):
    if euclidList[i] == euclidList[x-1]:    # euclidList[x-1] is 0 (last image), so zero distances never mark a boundary
      boundariesList.append(0)
    elif euclidList[i] - euclidList[i+1] >= o:   # boundary when the distance drops by at least the threshold
        boundariesList.append(1)
    else:
      boundariesList.append(0)

  #for i in range(num_pics):
    #print(f"{i+1}. {euclidList[i]} {boundariesList[i]}")

  ################################################################
  tp = 0; fp = 0; tn = 0; fn = 0   # reset the counts once per threshold
  for bi in range(num_pics):
      # If actual==1 and pred==1, increment true positives
    if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
      tp = tp + 1
      # If actual==1 and pred==0, increment false negatives
    if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
      fn = fn + 1
      # If actual==0 and pred==1, increment false positives
    if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
      fp = fp + 1
      # If actual==0 and pred==0, increment true negatives
    if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
      tn = tn + 1


  # Compute precision and recall
  denom = (tp + fp)
  if denom > 0:
      precision = tp / denom
  else:
      precision = 0
  denom = (tp + fn)
  if denom > 0:
      recall = tp / denom
  else:
      recall = 0
  # Compute F1 score

  denom = (precision+recall)
  if denom > 0:
      f1 = 2 * ((precision*recall)/denom)
  else:
      f1 = 0
  # Return all metrics
  res = {
          'tp': tp,
          'fp': fp,
          'tn': tn,
          'fn': fn,
          'precision': precision,
          'recall': recall,
          'f1': f1
  }
  o_list.append(o)
  f1_list.append(res['f1'])
  o = round(o + 0.05, 2)   # rounding avoids floating-point drift in the threshold

print(o_list)
print(f1_list)
#*******************************************************************
import matplotlib.pyplot as plt

plt.plot(o_list, f1_list)
plt.xlabel("Diff_value (O)")
plt.ylabel("F1-score")
plt.xlim(0, 1.00)
plt.ylim(0, 1.00)
plt.show()
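
A threshold sweep like the one above can also be written with `np.linspace`, which produces evenly spaced thresholds without accumulating a floating-point step. This is a sketch under assumptions: `sweep_thresholds` is a hypothetical helper, and `toy_score` is a made-up stand-in for the F1 computation.

```python
import numpy as np


def sweep_thresholds(score_fn, start=0.0, stop=1.0, num=21):
    """Evaluate score_fn at evenly spaced thresholds and return the best one."""
    thresholds = np.linspace(start, stop, num)     # 21 points: 0.00, 0.05, ..., 1.00
    scores = [score_fn(t) for t in thresholds]
    best = int(np.argmax(scores))                  # first index of the maximum
    return thresholds, scores, float(thresholds[best])


def toy_score(threshold):
    """Made-up score that peaks at 0.40; replace with the real F1 computation."""
    return 1.0 - abs(threshold - 0.40)


thresholds, scores, best_threshold = sweep_thresholds(toy_score)
print(best_threshold)
```

Returning the thresholds and scores together keeps them available for the same F1-versus-threshold plot.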


# Experiment 3.4 - CLIP's image encoder embedding space

This experiment uses CLIP's image encoder embedding space directly, without text queries, for comparison with the text-query experiments above.

In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 

device = "cuda" if torch.cuda.is_available() else "cpu"

model, preprocess = clip.load("ViT-L/14@336px", device=device)

source_path = '/content/gdrive/MyDrive/Colab Notebooks/data-1/imgs/'
dir_contents = os.listdir(source_path) # returns list
dir_contents.sort()
num_pics = len(dir_contents)   # find number of pictures in directory
print(num_pics)

images = []
image_features = []
for i in range(num_pics):
#Load and prepare images
  image_path = os.path.join(source_path, dir_contents[i])
  image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
  images.append(image)

  with torch.no_grad():
    features = model.encode_image(image)
    features /= features.norm(dim=-1, keepdim=True)
    image_features.append(features)

similarities = []

# Store the similarity between the image features of each pair of adjacent images
for i in range(num_pics-1):   #range start from 0
    similarity = (100.0 * image_features[i] @ image_features[i+1].T).item()
    similarities.append(similarity)

# Adjacent pairs give only num_pics - 1 values, so I repeat the last one to pad the list to num_pics
x = len(similarities)
similarities.append(similarities[x-1])

print(similarities)
print(len(similarities))
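
Since each feature vector is divided by its norm, the dot product of two adjacent vectors is their cosine similarity, scaled here by 100. A small NumPy sketch of that calculation follows; `adjacent_cosine_similarities` is a hypothetical name and the three 2-D vectors are made-up stand-ins for CLIP image embeddings.

```python
import numpy as np


def adjacent_cosine_similarities(features, scale=100.0):
    """Scaled cosine similarity between each row and the next one."""
    features = np.asarray(features, dtype=float)
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    return scale * np.sum(unit[:-1] * unit[1:], axis=1)


# Made-up embeddings: the first two point the same way, the third is orthogonal
features = [[1.0, 0.0],
            [2.0, 0.0],
            [0.0, 3.0]]
sims = adjacent_cosine_similarities(features)
print(sims)
```

A value near 100 means two neighbouring images are almost identical in embedding space, so a boundary is predicted where the similarity falls below the chosen threshold (88 in the cell below).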
                                    

In [None]:
# process the input image data into a list with 0, 1 act as the boundaries

boundariesList = []
x = len(similarities)
print(f"\nNumber of Pics: {num_pics}")
print(f"Length of List: {x}" )  
for i in range(num_pics):
  if i == x - 1:    # the last entry is padding, so the last image never marks a boundary
    boundariesList.append(0)
  elif(similarities[i]  < 88):
      boundariesList.append(1)
  else:
    boundariesList.append(0)

for i in range(num_pics):
   print(f"{i+1}. Image{i+1} {boundariesList[i]}")

In [None]:
# Processing the actual boundaries file
import pandas as pd


boundaries_path_file = "/content/gdrive/MyDrive/Colab Notebooks/data-1/Boundaries.txt"
boundaries_df = pd.read_csv(boundaries_path_file)
bound_strings = boundaries_df.columns.tolist()
num_bound = len(bound_strings)
bound_int = []

#convert string list to int list
for i in range(0,len(bound_strings)):
    bound_strings[i] = int(bound_strings[i])

trueboundariesList = []


for j in range(num_pics):                 # num_pics = 273
  if j == bound_strings[num_bound -1]:    # -1 cause j start from 0, when 34 meet 34 is last
      trueboundariesList.append(0)
  elif any(j +1 == y  for y in bound_strings):
    trueboundariesList.append(1)
  else:
     trueboundariesList.append(0)

for i in range(num_pics):
   print(f"{i+1}. {trueboundariesList[i]} {boundariesList[i]}")

# Count the true boundaries; for data-1 this should be 35, matching Boundaries.txt
count = len([elem for elem in trueboundariesList if elem == 1])
print(f"Number of true boundaries: {count}")

In [None]:
# Calculate F-1 

# initializations
# Compute TP, FP, TN, FN
tp = 0; fp = 0; tn = 0; fn = 0;
for bi in range(num_pics):
    # If actual==1 and pred==1, increment true positives
  if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
    tp = tp + 1
    # If actual==1 and pred==0, increment false negatives
  if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
    fn = fn + 1
    # If actual==0 and pred==1, increment false positives
  if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
    fp = fp + 1
    # If actual==0 and pred==0, increment true negatives
  if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
    tn = tn + 1

# Display tp, fp, tn, fn
print('True positives: ', tp)
print('False positives: ', fp)
print('True negatives: ', tn)
print('False negatives: ', fn)

# Compute precision and recall
denom = (tp + fp)
if denom > 0:
    precision = tp / denom
else:
    precision = 0
denom = (tp + fn)
if denom > 0:
    recall = tp / denom
else:
    recall = 0
# Compute F1 score

denom = (precision+recall)
if denom > 0:
    f1 = 2 * ((precision*recall)/denom)
else:
    f1 = 0
# Return all metrics
res = {
        'tp': tp,
        'fp': fp,
        'tn': tn,
        'fn': fn,
        'precision': precision,
        'recall': recall,
        'f1': f1
}

print(res)
 

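Find the best threshold value by sweeping the similarity threshold from 0 to 100 in steps of 1 and recording the F1 score at each step.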
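
The search can also be written as two small helper functions. This is only a minimal sketch, assuming a boundary is predicted wherever the score falls below the threshold. The names `f1_from_labels` and `sweep_thresholds` and the toy scores are made up for illustration, and the sketch does not treat the last image specially the way the notebook cells do.

```python
import numpy as np

def f1_from_labels(true_labels, pred_labels):
    """Return (precision, recall, f1) for two equal-length 0/1 sequences."""
    t = np.asarray(true_labels, dtype=bool)
    p = np.asarray(pred_labels, dtype=bool)
    tp = int(np.sum(t & p))
    fp = int(np.sum(~t & p))
    fn = int(np.sum(t & ~p))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1

def sweep_thresholds(scores, true_labels, thresholds):
    """Predict a boundary where score < threshold; return the best (threshold, f1)."""
    scores = np.asarray(scores, dtype=float)
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        pred = scores < t
        _, _, f1 = f1_from_labels(true_labels, pred)
        if f1 > best_f1:  # strict comparison keeps the first threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Toy data (hypothetical, not taken from data-1)
toy_scores = [95.0, 80.0, 93.0, 70.0, 96.0]
toy_truth = [0, 1, 0, 1, 0]
best_t, best_f1 = sweep_thresholds(toy_scores, toy_truth, range(0, 101))
print(best_t, best_f1)
```

Keeping the F1 computation in one function means the single-threshold cell and the sweep cannot drift apart.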

In [None]:
o = 0.00
o_list = []
f1_list = []

while o <= 100.00:

  boundariesList = []
  x = len(similarities)
  for i in range(num_pics):
    if i == x - 1:    # last entry is padding (length counts from 1, indices from 0)
      boundariesList.append(0)
    elif(similarities[i]  < o):
        boundariesList.append(1)
    else:
      boundariesList.append(0)

  ################################################################
  tp = 0; fp = 0; tn = 0; fn = 0
  for bi in range(num_pics):
      # If actual==1 and pred==1, increment true positives
    if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
      tp = tp + 1
      # If actual==1 and pred==0, increment false negatives
    if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
      fn = fn + 1
      # If actual==0 and pred==1, increment false positives
    if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
      fp = fp + 1
      # If actual==0 and pred==0, increment true negatives
    if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
      tn = tn + 1


  # Compute precision and recall
  denom = (tp + fp)
  if denom > 0:
      precision = tp / denom
  else:
      precision = 0
  denom = (tp + fn)
  if denom > 0:
      recall = tp / denom
  else:
      recall = 0
  # Compute F1 score

  denom = (precision+recall)
  if denom > 0:
      f1 = 2 * ((precision*recall)/denom)
  else:
      f1 = 0
  # Return all metrics
  res = {
          'tp': tp,
          'fp': fp,
          'tn': tn,
          'fn': fn,
          'precision': precision,
          'recall': recall,
          'f1': f1
  }
  o_list.append(o)
  f1_list.append(res['f1'])
  o += 1.00


print(o_list)
print(f1_list)

a = (max(f1_list))
print(f1_list.index(a))
print(a)



In [None]:
import matplotlib.pyplot as plt

plt.plot(o_list, f1_list)
plt.xlabel("Diff_value (O)")
plt.ylabel("F1-score")
plt.xlim(0, 100.00)
plt.ylim(0, 1.00)
plt.show()

# Experiment 3.5 - Distance Metrics

Explore the impact of using different distance metrics, such as Manhattan distance, Minkowski distance and cosine distance (one minus cosine similarity), when comparing the caption-score vectors of consecutive images.
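
To get a feel for how the metrics behave, here is a small NumPy-only sketch that evaluates them on two pairs of toy score vectors. The function name `pairwise_metrics` and the vectors are hypothetical; the experiment itself uses `manhattan_distance` and the `scipy.spatial.distance` functions shown in the cells below.

```python
import numpy as np

def pairwise_metrics(u, v):
    """Return several distances between two equal-length score vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    diff = u - v
    cosine_similarity = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return {
        "manhattan": float(np.linalg.norm(diff, ord=1)),     # sum of |differences|
        "euclidean": float(np.linalg.norm(diff, ord=2)),
        "minkowski_p3": float(np.linalg.norm(diff, ord=3)),  # Minkowski with p = 3
        "cosine_distance": float(1.0 - cosine_similarity),
    }

# Toy softmax-like vectors (hypothetical): same top class vs. a different top class
same_class = pairwise_metrics([0.9, 0.1, 0.0], [0.8, 0.2, 0.0])
new_class = pairwise_metrics([0.9, 0.1, 0.0], [0.0, 0.1, 0.9])

for name in same_class:
    print(f"{name:>16s}: same={same_class[name]:.4f}  new={new_class[name]:.4f}")
```

All four metrics grow when the top class changes, but on different scales, which is why the threshold has to be re-tuned for each metric.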

In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 
import random
import numpy as np
import pandas as pd

device = "cuda" if torch.cuda.is_available() else "cpu"

# some initial set up
with open('/content/gdrive/MyDrive/Colab Notebooks/insectDomainClass.txt') as f:
    image_descriptions = [line.rstrip() for line in f]

model, preprocess = clip.load("ViT-L/14@336px", device=device)
source_path = '/content/gdrive/MyDrive/Colab Notebooks/data-1/imgs/'
dir_contents = os.listdir(source_path) 
dir_contents.sort()
num_pics = len(dir_contents)   

# Here convert list into dictionary with 'int' as key
list1 = list(range(1,len(image_descriptions)+1))
list2 = []

# here append queries into list
for c in image_descriptions:
  list2.append(c)

class_dictionary = dict(zip(list1, list2))
print(class_dictionary)


In [None]:
# Processing the actual boundaries file
import pandas as pd


boundaries_path_file = "/content/gdrive/MyDrive/Colab Notebooks/data-1/Boundaries.txt"
boundaries_df = pd.read_csv(boundaries_path_file)
bound_strings = boundaries_df.columns.tolist()
num_bound = len(bound_strings)
bound_int = []

#convert string list to int list
for i in range(0,len(bound_strings)):
    bound_strings[i] = int(bound_strings[i])

trueboundariesList = []


for j in range(num_pics):                 # num_pics = 273
  if j == bound_strings[num_bound -1]:    # -1 cause j start from 0, when 34 meet 34 is last
      trueboundariesList.append(0)
  elif any(j +1 == y  for y in bound_strings):
    trueboundariesList.append(1)
  else:
     trueboundariesList.append(0)

for i in range(num_pics):
   print(f"{i+1}. {trueboundariesList[i]}")

# 35 boundaries are marked, which matches Boundaries.txt for data-1
count = len([elem for elem in trueboundariesList if elem == 1])

In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 

resultList = []
firstCaption = []
secondCaption = []
thirdCaption = []
fourthCaption = []
fifthCaption = []
sixthCaption = []
seventhCaption = []
eighthCaption = []
ninthCaption = []
tenthCaption = []

elevenCaption = []
twelveCaption = []
thirteenCaption = []
fourteenCaption = []
fifteenCaption = []

device = "cuda" if torch.cuda.is_available() else "cpu"

model, preprocess = clip.load("ViT-L/14@336px", device=device)
caption = ["scorpion","angel insects","heelwalker","lacewing","booklice","flea"
,"stonefly","thrips","true bug","ice crawler","harvestmen","webspinner","earwig","katydid","cricket"]
caption.sort()  # sort captions in alphabetical order


for i in range(num_pics):
#Load and prepare images
  print("\nImage Title: ",dir_contents[i]);
  image = preprocess(Image.open(source_path + dir_contents[i])).unsqueeze(0).to(device)
  caption_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in caption]).to(device)
 
  with torch.no_grad():
      image_features = model.encode_image(image)
      text_features = model.encode_text(caption_inputs)

  image_features /= image_features.norm(dim=-1, keepdim=True)
  text_features /= text_features.norm(dim=-1, keepdim=True)
  similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
  #values, indices = similarity[0].topk(10, sorted = False)
  values, indices = similarity[0].topk(15, sorted = False)

  # Create a list of tuples containing the caption and its corresponding score
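  # pair each returned value with its own index (topk output order is not guaranteed when sorted=False)
  captions_and_scores = [(caption[idx], val.item()) for val, idx in zip(values, indices)]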
  # Sort the list based on the original order of the captions
  captions_and_scores.sort(key=lambda x: caption.index(x[0]))

  firstCaption.append(captions_and_scores[0][1])
  secondCaption.append(captions_and_scores[1][1])
  thirdCaption.append(captions_and_scores[2][1])
  fourthCaption.append(captions_and_scores[3][1])
  fifthCaption.append(captions_and_scores[4][1])
  sixthCaption.append(captions_and_scores[5][1])
  seventhCaption.append(captions_and_scores[6][1])
  eighthCaption.append(captions_and_scores[7][1])
  ninthCaption.append(captions_and_scores[8][1])
  tenthCaption.append(captions_and_scores[9][1])

  elevenCaption.append(captions_and_scores[10][1])
  twelveCaption.append(captions_and_scores[11][1])
  thirteenCaption.append(captions_and_scores[12][1])
  fourteenCaption.append(captions_and_scores[13][1])
  fifteenCaption.append(captions_and_scores[14][1])

  print("\nPredictions:\n")
  for caption_score in captions_and_scores:
      print(f"{caption_score[0]:>16s}: {caption_score[1]:.8f}")



Import the necessary libraries for the distance metrics and define a function for the Manhattan distance.

In [None]:
import math
from scipy.spatial import distance
from scipy import spatial

def manhattan_distance(point1, point2):
    return sum(abs(value1 - value2) for value1, value2 in zip(point1, point2))

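Compute the distance between the caption-score vectors of each pair of consecutive images, then turn the distances into 0/1 boundary predictions.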

In [None]:
import numpy as np

euclidList = []
x = len(firstCaption)
# initializing points in numpy arrays
for i in range (num_pics):
  if i == x - 1:    # the last image has no successor, so its distance is fixed at 0
    dist = 0.00

  else:
    image1 = np.array((firstCaption [i], secondCaption [i], thirdCaption [i], fourthCaption [i],fifthCaption [i],sixthCaption [i], seventhCaption [i], eighthCaption [i], ninthCaption [i],tenthCaption [i],\
                       elevenCaption [i], twelveCaption [i], thirteenCaption [i], fourteenCaption [i],fifteenCaption [i]))
    image2 = np.array((firstCaption [i+1], secondCaption [i+1], thirdCaption [i+1], fourthCaption [i+1],fifthCaption [i+1],sixthCaption [i+1], seventhCaption [i+1], eighthCaption [i+1], ninthCaption [i+1],tenthCaption [i+1],\
                        elevenCaption [i+1], twelveCaption [i+1], thirteenCaption [i+1], fourteenCaption [i+1],fifteenCaption [i+1]))
    
    #dist = np.linalg.norm(image1 - image2)  # Euclidean distance
    dist = manhattan_distance(image1, image2)  # Manhattan distance
    #dist = distance.minkowski(image1, image2, 3)  # Minkowski distance with p = 3
    #dist = spatial.distance.cosine(image1, image2)  # cosine distance (1 - cosine similarity)
  euclidList.append(dist)
 
# print how many distances were computed
print(f"Length: {len(euclidList)} ")

#********************************************************************************
# convert the distances into a 0/1 list, where 1 marks a predicted boundary

boundariesList = []

x = len(euclidList)
print(f"\nNumber of Pics: {num_pics}")
print(f"Length of List: {x}" )  
for i in range(num_pics):
  if euclidList[i] == euclidList[x-1]:    #length count from 1, so need deduct 1
    boundariesList.append(0)
  elif(abs(euclidList[i]) - (euclidList[i+1]) > 0.70):
      boundariesList.append(1)
  else:
    boundariesList.append(0)

for i in range(num_pics):
   print(f"{i+1}. {euclidList[i]} {boundariesList[i]}")

In [None]:
# Calculate F-1 

# initializations
# Compute TP, FP, TN, FN
tp = 0; fp = 0; tn = 0; fn = 0;
for bi in range(num_pics):
    # If actual==1 and pred==1, increment true positives
  if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
    tp = tp + 1
    # If actual==1 and pred==0, increment false negatives
  if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
    fn = fn + 1
    # If actual==0 and pred==1, increment false positives
  if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
    fp = fp + 1
    # If actual==0 and pred==0, increment true negatives
  if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
    tn = tn + 1

# Display tp, fp, tn, fn
print('True positives: ', tp)
print('False positives: ', fp)
print('True negatives: ', tn)
print('False negatives: ', fn)

# Compute precision and recall
denom = (tp + fp)
if denom > 0:
    precision = tp / denom
else:
    precision = 0
denom = (tp + fn)
if denom > 0:
    recall = tp / denom
else:
    recall = 0
# Compute F1 score

denom = (precision+recall)
if denom > 0:
    f1 = 2 * ((precision*recall)/denom)
else:
    f1 = 0
# Return all metrics
res = {
        'tp': tp,
        'fp': fp,
        'tn': tn,
        'fn': fn,
        'precision': precision,
        'recall': recall,
        'f1': f1
}

print(res)
 

Perform threshold optimisation to find the best threshold value
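
One detail to watch in the sweep: repeatedly adding 0.05 to a float can accumulate rounding error, so the final value may land slightly above 1.00 and be skipped by a `while o <= 1.00` test. A minimal sketch of an alternative grid built with `np.linspace` follows. The helper `best_threshold` and the toy F1 curve are hypothetical and only show the selection step.

```python
import numpy as np

# 21 evenly spaced thresholds from 0.00 to 1.00; both endpoints are included
thresholds = np.linspace(0.0, 1.0, 21)

def best_threshold(thresholds, f1_scores):
    """Return (threshold, f1) at the highest F1; the first maximum wins on ties."""
    best_idx = int(np.argmax(f1_scores))
    return float(thresholds[best_idx]), float(f1_scores[best_idx])

# Toy F1 curve that peaks at 0.70 (hypothetical values, not measured results)
toy_f1 = 1.0 - np.abs(thresholds - 0.70)
best_t, best_f1 = best_threshold(thresholds, toy_f1)
print(f"best threshold = {best_t:.2f}, F1 = {best_f1:.2f}")
```

In the real sweep, `toy_f1` would be replaced by the `f1_list` collected in the loop.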

In [None]:
o = 0.00
o_list = []
f1_list = []

while o <= 1.00:

  boundariesList = []

  x = len(euclidList)
  #print(f"\nNumber of Pics: {num_pics}")
  #print(f"Length of List: {x}" )  
  for i in range(num_pics):
    if euclidList[i] == euclidList[x-1]:    #length count from 1, so need deduct 1
      boundariesList.append(0)
    elif(abs(euclidList[i]) - (euclidList[i+1]) >= o):
        boundariesList.append(1)
    else:
      boundariesList.append(0)

################################################################
  tp = 0; fp = 0; tn = 0; fn = 0
  for bi in range(num_pics):
      # If actual==1 and pred==1, increment true positives
    if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
      tp = tp + 1
      # If actual==1 and pred==0, increment false negatives
    if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
      fn = fn + 1
      # If actual==0 and pred==1, increment false positives
    if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
      fp = fp + 1
      # If actual==0 and pred==0, increment true negatives
    if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
      tn = tn + 1


  # Compute precision and recall
  denom = (tp + fp)
  if denom > 0:
      precision = tp / denom
  else:
      precision = 0
  denom = (tp + fn)
  if denom > 0:
      recall = tp / denom
  else:
      recall = 0
  # Compute F1 score

  denom = (precision+recall)
  if denom > 0:
      f1 = 2 * ((precision*recall)/denom)
  else:
      f1 = 0
  # Return all metrics
  res = {
          'tp': tp,
          'fp': fp,
          'tn': tn,
          'fn': fn,
          'precision': precision,
          'recall': recall,
          'f1': f1
  }
  o_list.append(o)
  f1_list.append(res['f1'])
  o += 0.05

print(o_list)
print(f1_list)
#*******************************************************************
import matplotlib.pyplot as plt

plt.plot(o_list, f1_list)
plt.xlabel("Diff_value (O)")
plt.ylabel("F1-score")
plt.xlim(0, 1.00)
plt.ylim(0, 1.00)
plt.show()


# Experiment 3.6 - Scaling up

Use the best set of queries and the best-performing model to test on the other 10 folders (data-2 to data-11, excluding the first folder).
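
Because the overall score is computed from the summed counts of all folders, it is a pooled (micro-averaged) F1 rather than a mean of per-folder F1 scores. A minimal sketch of that pooling, using a hypothetical helper `micro_f1` and made-up counts:

```python
def micro_f1(folder_counts):
    """Pool tp/fp/fn over all folders, then compute precision, recall and F1 once."""
    tp = sum(c["tp"] for c in folder_counts)
    fp = sum(c["fp"] for c in folder_counts)
    fn = sum(c["fn"] for c in folder_counts)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Toy counts for two folders (hypothetical, not experiment results)
toy_counts = [
    {"tp": 8, "fp": 2, "fn": 2},
    {"tp": 1, "fp": 3, "fn": 1},
]
pooled = micro_f1(toy_counts)
print(pooled)
```

With these toy counts the pooled F1 is about 0.69, while the plain mean of the two per-folder F1 scores is about 0.57, so folders with more boundaries carry more weight in the pooled figure.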

In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 
import random
import numpy as np
import pandas as pd

device = "cuda" if torch.cuda.is_available() else "cpu"

# some initial set up
with open('/content/gdrive/MyDrive/Colab Notebooks/insectDomainClass.txt') as f:
    image_descriptions = [line.rstrip() for line in f]

model, preprocess = clip.load("ViT-L/14@336px", device=device)

# Here convert list into dictionary with 'int' as key
list1 = list(range(1,len(image_descriptions)+1))
list2 = []

# here append queries into list
for c in image_descriptions:
  list2.append(c)

class_dictionary = dict(zip(list1, list2))
print(class_dictionary)


In [None]:
import math

def manhattan_distance(point1, point2):
    return sum(abs(value1 - value2) for value1, value2 in zip(point1, point2))

In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 
import numpy as np
import pandas as pd

counter = 2
totalImageCounter = 0
totaltp = 0
totalfp = 0
totaltn = 0
totalfn = 0

while counter <= 11:
  
  #*****************************************************************************
  base_path = '/content/gdrive/MyDrive/Colab Notebooks/'
  dir_name_prefix = 'data-'
  dir_name_suffix1 = '/imgs/'

  source_path = []
  source_path = base_path + dir_name_prefix + str(counter) + dir_name_suffix1

  dir_contents = os.listdir(source_path) 
  dir_contents.sort()
  num_pics = len(dir_contents) 
  print(f"Folder{counter} got {num_pics} images")

  #********************************************************************************
  # Processing the actual boundaries file

  dir_name_suffix2 = '/Boundaries.txt'

  boundaries_path_file = base_path + dir_name_prefix + str(counter) + dir_name_suffix2

  boundaries_df = pd.read_csv(boundaries_path_file)
  bound_strings = boundaries_df.columns.tolist()
  num_bound = len(bound_strings)
  bound_int = []

  #convert string list to int list
  for i in range(0,len(bound_strings)):
      bound_strings[i] = int(bound_strings[i])

  trueboundariesList = []


  for j in range(num_pics):                 
    if j == bound_strings[num_bound -1]:    
        trueboundariesList.append(0)
    elif any(j +1 == y  for y in bound_strings):
      trueboundariesList.append(1)
    else:
      trueboundariesList.append(0)

  count = len([elem for elem in trueboundariesList if elem == 1])

  #********************************************************************************

  resultList = []
  firstCaption = []
  secondCaption = []
  thirdCaption = []
  fourthCaption = []
  fifthCaption = []
  sixthCaption = []
  seventhCaption = []
  eighthCaption = []
  ninthCaption = []
  tenthCaption = []

  elevenCaption = []
  twelveCaption = []
  thirteenCaption = []
  fourteenCaption = []
  fifteenCaption = []

  device = "cuda" if torch.cuda.is_available() else "cpu"

  model, preprocess = clip.load("ViT-L/14@336px", device=device)
  caption = ["scorpion","angel insects","heelwalker","lacewing","booklice","flea"
  ,"stonefly","thrips","true bug","ice crawler","harvestmen","webspinner","earwig","katydid","cricket"]
  caption.sort()  # sort captions in alphabetical order


  for i in range(num_pics):
  #Load and prepare images
    #print("\nImage Title: ",dir_contents[i]);
    image = preprocess(Image.open(source_path + dir_contents[i])).unsqueeze(0).to(device)
    caption_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in caption]).to(device)
  
    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(caption_inputs)

    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
    values, indices = similarity[0].topk(15, sorted = False)

    # Create a list of tuples containing the caption and its corresponding score
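    # pair each returned value with its own index (topk output order is not guaranteed when sorted=False)
    captions_and_scores = [(caption[idx], val.item()) for val, idx in zip(values, indices)]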
    # Sort the list based on the original order of the captions
    captions_and_scores.sort(key=lambda x: caption.index(x[0]))

    firstCaption.append(captions_and_scores[0][1])
    secondCaption.append(captions_and_scores[1][1])
    thirdCaption.append(captions_and_scores[2][1])
    fourthCaption.append(captions_and_scores[3][1])
    fifthCaption.append(captions_and_scores[4][1])
    sixthCaption.append(captions_and_scores[5][1])
    seventhCaption.append(captions_and_scores[6][1])
    eighthCaption.append(captions_and_scores[7][1])
    ninthCaption.append(captions_and_scores[8][1])
    tenthCaption.append(captions_and_scores[9][1])

    elevenCaption.append(captions_and_scores[10][1])
    twelveCaption.append(captions_and_scores[11][1])
    thirteenCaption.append(captions_and_scores[12][1])
    fourteenCaption.append(captions_and_scores[13][1])
    fifteenCaption.append(captions_and_scores[14][1])

  #*******************************************************************************
  # Calculate Manhattan Distance Here 

  euclidList = []
  x = len(firstCaption)
  # initializing points in numpy arrays
  for i in range (num_pics):
    if i == x - 1:    # the last image has no successor, so its distance is fixed at 0
      dist = 0.00

    else:
      image1 = np.array((firstCaption [i], secondCaption [i], thirdCaption [i], fourthCaption [i],fifthCaption [i],sixthCaption [i], seventhCaption [i], eighthCaption [i], ninthCaption [i],tenthCaption [i],\
                        elevenCaption [i], twelveCaption [i], thirteenCaption [i], fourteenCaption [i],fifteenCaption [i]))
      image2 = np.array((firstCaption [i+1], secondCaption [i+1], thirdCaption [i+1], fourthCaption [i+1],fifthCaption [i+1],sixthCaption [i+1], seventhCaption [i+1], eighthCaption [i+1], ninthCaption [i+1],tenthCaption [i+1],\
                          elevenCaption [i+1], twelveCaption [i+1], thirteenCaption [i+1], fourteenCaption [i+1],fifteenCaption [i+1]))
      
      dist = manhattan_distance(image1, image2) #calculate manhattan distance
    euclidList.append(dist)
  

  #********************************************************************************
  # convert the distances into a 0/1 list, where 1 marks a predicted boundary

  boundariesList = []

  x = len(euclidList)

  for i in range(num_pics):
    if euclidList[i] == euclidList[x-1]:    #length count from 1, so need deduct 1
      boundariesList.append(0)
    elif(abs(euclidList[i]) - (euclidList[i+1]) > 0.70):
        boundariesList.append(1)
    else:
      boundariesList.append(0)
  
  # accumulate the number of images processed (one 0/1 entry per image)
  totalImageCounter += len(boundariesList)

  #******************************************************************************
  # Calculate F-1 

  # initializations
  # Compute TP, FP, TN, FN
  tp = 0; fp = 0; tn = 0; fn = 0;
  for bi in range(num_pics):
      # If actual==1 and pred==1, increment true positives
    if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
      tp = tp + 1
      # If actual==1 and pred==0, increment false negatives
    if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
      fn = fn + 1
      # If actual==0 and pred==1, increment false positives
    if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
      fp = fp + 1
      # If actual==0 and pred==0, increment true negatives
    if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
      tn = tn + 1

  # Compute precision and recall
  denom = (tp + fp)
  if denom > 0:
      precision = tp / denom
  else:
      precision = 0
  denom = (tp + fn)
  if denom > 0:
      recall = tp / denom
  else:
      recall = 0
  # Compute F1 score

  denom = (precision+recall)
  if denom > 0:
      f1 = 2 * ((precision*recall)/denom)
  else:
      f1 = 0
  # Return all metrics
  res = {
          'tp': tp,
          'fp': fp,
          'tn': tn,
          'fn': fn,
          'precision': precision,
          'recall': recall,
          'f1': f1
  }
  totaltp = totaltp +tp
  totalfp = totalfp +fp
  totaltn = totaltn +tn
  totalfn = totalfn +fn
  print(f"Folder {counter}: {res}")
  counter += 1

print(f"Total image: {totalImageCounter}")


In [None]:
# Calculate Total F-1 

# Compute precision and recall
denom = (totaltp + totalfp)
if denom > 0:
    precision = totaltp / denom
else:
    precision = 0
denom = (totaltp + totalfn)
if denom > 0:
    recall = totaltp / denom
else:
    recall = 0
# Compute F1 score

denom = (precision+recall)
if denom > 0:
    f1 = 2 * ((precision*recall)/denom)
else:
    f1 = 0
# Return all metrics
res = { 
        'tp': totaltp,
        'fp': totalfp,
        'tn': totaltn,
        'fn': totalfn,
        'precision': precision,
        'recall': recall,
        'f1': f1
}

print(res)
 

# Experiment 3.4 + - Scaling up (Image Encoder Embedding Space)

In Experiment 3.4, I used only the image encoder and tested it on the data-1 images. Here the same approach is applied to the data-2 to data-11 folders.
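
The core of this approach is the scaled cosine similarity between the embeddings of consecutive images. Below is a minimal NumPy sketch of that step, assuming the embeddings are stacked into one matrix with a row per image. The function `consecutive_cosine` and the toy vectors are hypothetical; the cell below works on the CLIP feature tensors directly.

```python
import numpy as np

def consecutive_cosine(features, scale=100.0):
    """Scaled cosine similarity between each row and the next row."""
    feats = np.asarray(features, dtype=float)
    # L2-normalise each row so the dot product equals the cosine similarity
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return scale * np.sum(feats[:-1] * feats[1:], axis=1)

# Toy 2-D "embeddings" (hypothetical): images 1 and 2 point the same way, image 3 does not
toy_features = [[1.0, 0.0],
                [2.0, 0.0],
                [0.0, 3.0]]
toy_sims = consecutive_cosine(toy_features)
toy_pred = (toy_sims < 88).astype(int)  # 1 marks a predicted boundary
print(toy_sims, toy_pred)
```

The result has one entry per consecutive pair, so it is one shorter than the number of images, which is why the notebook pads the list before thresholding.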

In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 
import random
import numpy as np
import pandas as pd

device = "cuda" if torch.cuda.is_available() else "cpu"


model, preprocess = clip.load("ViT-L/14@336px", device=device)


In [None]:
import torch
import clip
import os
from PIL import Image
import array as arr 
import numpy as np
import pandas as pd

counter = 2
totalImageCounter = 0
totaltp = 0
totalfp = 0
totaltn = 0
totalfn = 0

while counter <= 11:
  
  #*****************************************************************************
  base_path = '/content/gdrive/MyDrive/Colab Notebooks/'
  dir_name_prefix = 'data-'
  dir_name_suffix1 = '/imgs/'

  source_path = []
  source_path = base_path + dir_name_prefix + str(counter) + dir_name_suffix1

  dir_contents = os.listdir(source_path) 
  dir_contents.sort()
  num_pics = len(dir_contents) 
  print(f"Folder{counter} got {num_pics} images")

  #********************************************************************************
  # Processing the actual boundaries file

  dir_name_suffix2 = '/Boundaries.txt'

  boundaries_path_file = base_path + dir_name_prefix + str(counter) + dir_name_suffix2

  boundaries_df = pd.read_csv(boundaries_path_file)
  bound_strings = boundaries_df.columns.tolist()
  num_bound = len(bound_strings)
  bound_int = []

  #convert string list to int list
  for i in range(0,len(bound_strings)):
      bound_strings[i] = int(bound_strings[i])

  trueboundariesList = []


  for j in range(num_pics):                 
    if j == bound_strings[num_bound -1]:    
        trueboundariesList.append(0)
    elif any(j +1 == y  for y in bound_strings):
      trueboundariesList.append(1)
    else:
      trueboundariesList.append(0)

  count = len([elem for elem in trueboundariesList if elem == 1])

  #********************************************************************************
  images = []
  image_features = []
  for i in range(num_pics):
  #Load and prepare images
    image_path = os.path.join(source_path, dir_contents[i])
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    images.append(image)

    with torch.no_grad():
      features = model.encode_image(image)
      features /= features.norm(dim=-1, keepdim=True)
      image_features.append(features)


  similarities = []


  for i in range(num_pics-1):   #range start from 0
      similarity = (100.0 * image_features[i] @ image_features[i+1].T).item()
      similarities.append(similarity)

  # only num_pics - 1 pairwise similarities exist, so repeat the last one to keep one entry per image
  x = len(similarities)
  similarities.append(similarities[x-1])


  #********************************************************************************
  # convert the similarities into a 0/1 list, where 1 marks a predicted boundary

  boundariesList = []
  x = len(similarities)

  for i in range(num_pics):
    if i == x - 1:    # the last entry is padding, never a boundary
      boundariesList.append(0)
    elif(similarities[i]  < 88):
        boundariesList.append(1)
    else:
      boundariesList.append(0)
  
  totalImageCounter += len(boundariesList)
  #for i in range(num_pics):
    #print(f"{i+1}. Image{i+1} {boundariesList[i]}")

  #******************************************************************************
  # Calculate F-1 

  # initializations
  # Compute TP, FP, TN, FN
  tp = 0; fp = 0; tn = 0; fn = 0;
  for bi in range(num_pics):
      # If actual==1 and pred==1, increment true positives
    if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 1):
      tp = tp + 1
      # If actual==1 and pred==0, increment false negatives
    if (trueboundariesList[bi] == 1) and (boundariesList[bi] == 0):
      fn = fn + 1
      # If actual==0 and pred==1, increment false positives
    if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 1):
      fp = fp + 1
      # If actual==0 and pred==0, increment true negatives
    if (trueboundariesList[bi] == 0) and (boundariesList[bi] == 0):
      tn = tn + 1

  # Compute precision and recall
  denom = (tp + fp)
  if denom > 0:
      precision = tp / denom
  else:
      precision = 0
  denom = (tp + fn)
  if denom > 0:
      recall = tp / denom
  else:
      recall = 0
  # Compute F1 score

  denom = (precision+recall)
  if denom > 0:
      f1 = 2 * ((precision*recall)/denom)
  else:
      f1 = 0
  # Return all metrics
  res = {
          'tp': tp,
          'fp': fp,
          'tn': tn,
          'fn': fn,
          'precision': precision,
          'recall': recall,
          'f1': f1
  }
  totaltp = totaltp +tp
  totalfp = totalfp +fp
  totaltn = totaltn +tn
  totalfn = totalfn +fn
  print(f"Folder {counter}: {res}")
  counter += 1

print(f"Total image: {totalImageCounter}")


In [None]:
# Calculate Total F-1 

# Compute precision and recall
denom = (totaltp + totalfp)
if denom > 0:
    precision = totaltp / denom
else:
    precision = 0
denom = (totaltp + totalfn)
if denom > 0:
    recall = totaltp / denom
else:
    recall = 0
# Compute F1 score

denom = (precision+recall)
if denom > 0:
    f1 = 2 * ((precision*recall)/denom)
else:
    f1 = 0
# Return all metrics
res = { 
        'tp': totaltp,
        'fp': totalfp,
        'tn': totaltn,
        'fn': totalfn,
        'precision': precision,
        'recall': recall,
        'f1': f1
}

print(res)
 