# Embedding Visualization

This example shows how to visualize image embedding vectors with the TensorFlow **Embedding Projector**.

Before running the application, the image or text embeddings need to be generated. In this example, I use the image embeddings from my other repository [simclr_pytorch_flowers](https://github.com/mxagar/simclr_pytorch_flowers), which computes embedding vectors for the [Flowers dataset](https://www.kaggle.com/datasets/imsparsh/flowers-dataset) from Kaggle and stores them in [`datasets/vectors_dataset.csv`](./datasets/vectors_dataset.csv).

In [2]:
import os
import sys

import numpy as np
from tqdm import tqdm
from PIL import Image
import pandas as pd
import ast

import warnings
warnings.filterwarnings("ignore", message=".*The 'nopython' keyword.*")

In [3]:
class Config:
    def __init__(self):
        self.vector_dataset_path = "../datasets/vectors_dataset.csv"
        self.output_path = "./output"
        os.makedirs(self.output_path, exist_ok=True)  # Create the output_path directory if it doesn't exist

config = Config()

## Load Dataset (Vectors & Image Paths)

In [4]:
df = pd.read_csv(config.vector_dataset_path)
df['embedding'] = df['embedding'].apply(ast.literal_eval)
df.shape

(2746, 6)
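
The `embedding` column is stored in the CSV as text, so each cell has to be parsed back into a list before it can be used as a vector. The following minimal sketch shows what `ast.literal_eval` does to one such cell; the string value is made up for illustration and is not taken from the dataset.

```python
import ast

# Hypothetical cell value, shaped like the strings in the 'embedding' column
raw_cell = "[-0.53, -0.73, 0.31]"

# literal_eval only accepts Python literals (lists, numbers, strings, ...)
vector = ast.literal_eval(raw_cell)
print(type(vector).__name__, len(vector))

# Anything that is not a plain literal is rejected instead of being executed
try:
    ast.literal_eval("__import__('os')")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

This is why `ast.literal_eval` is a safer choice than `eval` for parsing columns that come from a file.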

In [5]:
df.head()

|  | filename | filepath | label | linear_pred | embedding | cluster |
|---|---|---|---|---|---|---|
| 0 | 10140303196_b88d3d6cec.jpg | ../datasets/flowers/train\daisy\10140303196_b8... | daisy | daisy | [-0.5338386297225952, -0.7343480587005615, 0.3... | 2 |
| 1 | 10172379554_b296050f82_n.jpg | ../datasets/flowers/train\daisy\10172379554_b2... | daisy | daisy | [0.058685123920440674, -1.1110296249389648, 0.... | 1 |
| 2 | 10172567486_2748826a8b.jpg | ../datasets/flowers/train\daisy\10172567486_27... | daisy | daisy | [-0.2931477725505829, -1.0281589031219482, 0.0... | 2 |
| 3 | 10172636503_21bededa75_n.jpg | ../datasets/flowers/train\daisy\10172636503_21... | daisy | daisy | [-0.8797124624252319, -0.700323760509491, -0.0... | 1 |
| 4 | 10391248763_1d16681106_n.jpg | ../datasets/flowers/train\daisy\10391248763_1d... | daisy | daisy | [0.512474000453949, -0.4610719680786133, 0.934... | 1 |


## Create Image Sprite

In [6]:
# Fix small image size
size = 60
image_width, image_height = size, size
# Resize all images
images = [Image.open(filename).resize((image_width,image_height)) for filename in tqdm(df['filepath'])]

100%|██████████| 2746/2746 [00:07<00:00, 370.15it/s]


In [7]:
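# Note: sys.getsizeof() measures only the list object itself (its array of references),
# not the pixel data held by the PIL images it contains
size_in_bytes = sys.getsizeof(images)
print(f"The size of 'images' in memory is {size_in_bytes} bytes.")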

The size of 'images' in memory is 23720 bytes.
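
The figure reported by `sys.getsizeof` covers the list container only, so it says little about the memory taken by the thumbnails. The sketch below illustrates the difference with plain byte buffers standing in for decoded images, and then estimates the raw pixel payload for this notebook. It assumes 3 bytes per pixel (RGB); the helper name `pixel_bytes` is made up for this example.

```python
import sys


def pixel_bytes(num_images, width, height, channels=3):
    """Rough size of the raw pixel data, ignoring per-object overhead."""
    return num_images * width * height * channels


# Stand-in for 100 decoded 60x60 RGB thumbnails (one bytes buffer each)
fake_images = [bytes(60 * 60 * 3) for _ in range(100)]

container_bytes = sys.getsizeof(fake_images)          # list object only
payload_bytes = sum(len(buf) for buf in fake_images)  # actual buffer contents
print(container_bytes < payload_bytes)

# Estimate for 2746 thumbnails of 60x60 pixels, assuming RGB mode
estimate = pixel_bytes(2746, 60, 60)
print(f"{estimate / 1e6:.1f} MB")
```

Under the RGB assumption, the thumbnails occupy roughly 30 MB, far more than the size of the list object.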


In [8]:
# Number of tiles per row and per column of the square sprite grid
one_square_size = int(np.ceil(np.sqrt(len(images))))
master_width = image_width * one_square_size
master_height = image_height * one_square_size

spriteimage = Image.new(
    mode='RGBA',
    size=(master_width, master_height),
    color=(0,0,0,0))  # fully transparent

# Paste the thumbnails row by row: div is the row index, mod the column index
for count, image in enumerate(images):
    div, mod = divmod(count, one_square_size)
    h_loc = image_height * div
    w_loc = image_width * mod
    spriteimage.paste(image, (w_loc, h_loc))

image_sprite_filepath = os.path.join(config.output_path, "sprite.jpg")
# JPEG has no alpha channel, so the sprite is converted to RGB before saving
spriteimage.convert("RGB").save(image_sprite_filepath)
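
The sprite is a square grid filled row by row, so the position of every thumbnail follows from its index alone. The sketch below reproduces that arithmetic with the standard library only; `sprite_layout` and `tile_origin` are illustrative helper names, and the 8192 px limit is the maximum sprite side I recall from the Embedding Projector documentation, so treat it as an assumption to verify.

```python
import math

MAX_SPRITE_SIDE = 8192  # assumed Embedding Projector limit, in pixels


def sprite_layout(num_images, thumb_size):
    """Return (tiles per side, sprite side in pixels) for a square grid."""
    grid = math.ceil(math.sqrt(num_images))
    return grid, grid * thumb_size


def tile_origin(index, grid, thumb_size):
    """Return the (x, y) top-left corner of the tile with the given index."""
    row, col = divmod(index, grid)
    return col * thumb_size, row * thumb_size


grid, side = sprite_layout(2746, 60)
print(grid, side)
print(tile_origin(53, grid, 60))   # first tile of the second row
print(side <= MAX_SPRITE_SIDE)
```

With 2746 images and 60 px thumbnails the grid has 53 tiles per side, which gives a 3180 px sprite.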

## Save Embedding Vectors and Metadata as TSV

In [9]:
embeddings_list = df["embedding"].tolist()
embeddings_df = pd.DataFrame(embeddings_list)
embedding_tsv_filepath = os.path.join(config.output_path, "image_embeddings.tsv")
embeddings_df.to_csv(embedding_tsv_filepath, sep='\t', header=False, index=False)

In [11]:
filenames_df = df[["filename", "filepath", "label", "cluster"]]
filename_tsv_filepath = os.path.join(config.output_path, "image_metadata.tsv")
filenames_df.to_csv(filename_tsv_filepath, sep='\t', header=True, index=False)
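
The vectors file is written without a header, while the multi-column metadata file is written with one, and the Projector matches the two files row by row. A quick check that both contain the same number of data rows can catch mismatches early. The sketch below runs on tiny stand-in files with made-up values; `count_rows` is a hypothetical helper, not part of any library.

```python
import csv
import os
import tempfile


def count_rows(tsv_path, has_header):
    """Count data rows in a TSV file, optionally skipping one header row."""
    with open(tsv_path, newline="") as handle:
        n_rows = sum(1 for _ in csv.reader(handle, delimiter="\t"))
    return n_rows - 1 if has_header else n_rows


# Stand-in files that mimic the layout of the two TSV files written above
with tempfile.TemporaryDirectory() as tmp_dir:
    vectors_path = os.path.join(tmp_dir, "image_embeddings.tsv")
    metadata_path = os.path.join(tmp_dir, "image_metadata.tsv")
    with open(vectors_path, "w") as handle:
        handle.write("0.1\t0.2\n0.3\t0.4\n")
    with open(metadata_path, "w") as handle:
        handle.write("filename\tlabel\na.jpg\tdaisy\nb.jpg\trose\n")

    n_vectors = count_rows(vectors_path, has_header=False)
    n_metadata = count_rows(metadata_path, has_header=True)

print(n_vectors == n_metadata)
```

The same helper can be pointed at the files in `./output` to confirm that both report the same count.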

## Save Config File

In [12]:
config_filepath = os.path.join(config.output_path, "projector_config.pbtxt")

content = '''embeddings {
  tensor_path: "image_embeddings.tsv"
  metadata_path: "image_metadata.tsv"
  sprite {
    image_path: "sprite.jpg"
    single_image_dim: 60
    single_image_dim: 60
  }
}'''

with open(config_filepath, 'w') as file:
    file.write(content)
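
The thumbnail size appears both in the sprite code and in the config text, so the two can drift apart if one is edited by hand. One possible way to keep them in sync is to build the config string from variables, as in this sketch. `build_projector_config` is a hypothetical helper, and the width-then-height order of the two `single_image_dim` entries is how I understand the field, so treat that as an assumption.

```python
def build_projector_config(tensor_path, metadata_path, sprite_path,
                           thumb_width, thumb_height):
    """Return the text of a projector_config.pbtxt with one embedding."""
    lines = [
        "embeddings {",
        f'  tensor_path: "{tensor_path}"',
        f'  metadata_path: "{metadata_path}"',
        "  sprite {",
        f'    image_path: "{sprite_path}"',
        f"    single_image_dim: {thumb_width}",   # assumed: width first
        f"    single_image_dim: {thumb_height}",  # assumed: height second
        "  }",
        "}",
    ]
    return "\n".join(lines)


config_text = build_projector_config(
    "image_embeddings.tsv", "image_metadata.tsv", "sprite.jpg", 60, 60
)
print(config_text)
```

All paths are relative to the directory that holds the config file, which is why only file names are used here.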

## Run TensorBoard

```bash
# Anaconda PowerShell
conda activate ds
pip install tensorflow
pip install tensorboard

# On Windows:
# Add path to tensorboard executable in the environment variables:
# Path += C:\Users\<UserName>\AppData\Roaming\Python\Python39\Scripts

cd /path/to/all/tsv/sprite/and/config/files
# Don't use blank spaces
tensorboard --logdir=./
# Open the browser at http://localhost:6006/
# Refresh the page several times until the projector loads
```
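
If the projector does not show up, a missing file in the log directory is one possible cause. The sketch below lists which of the four files produced in this notebook are absent from a directory. `missing_projector_files` is a hypothetical helper, and the demo runs against a temporary directory rather than the real `./output` folder.

```python
import os
import tempfile

# File names written by the cells above
EXPECTED_FILES = (
    "projector_config.pbtxt",
    "image_embeddings.tsv",
    "image_metadata.tsv",
    "sprite.jpg",
)


def missing_projector_files(log_dir):
    """Return the expected file names that are not present in log_dir."""
    return [
        name for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(log_dir, name))
    ]


# Demo: a directory that contains only the config file
with tempfile.TemporaryDirectory() as tmp_dir:
    open(os.path.join(tmp_dir, "projector_config.pbtxt"), "w").close()
    missing = missing_projector_files(tmp_dir)

print(missing)
```

Calling `missing_projector_files("./output")` before starting TensorBoard should return an empty list when all files are in place.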