## How can images be displayed in a 2D way?
Images can be displayed in 2D by reducing their high-dimensional representations (embeddings) to two dimensions
with a dimensionality reduction technique. This takes two steps:

- Embedding Extraction: Extract feature vectors (embeddings) from images using models like convolutional neural networks (CNNs).
- Dimensionality Reduction: Apply a dimensionality reduction algorithm, such as t-SNE (t-distributed Stochastic Neighbor Embedding), 
to map high-dimensional embeddings to a 2D space.

## How does the code work?
The code builds on the notebook embedding_resnet.ipynb, which creates the pickle files for the image_paths and the embeddings.
This code goes through several steps:
1. Load the files needed (image_paths and embeddings)
2. Extract the parent folder name of each image (function: extract_category_from_path)
3. Store the category name in a list
4. Perform the t-SNE dimensionality reduction on the embeddings (explained further down)
5. Store the data for the visualisation in a dataframe
6. Plot the results (with the plotly library, so one can hover over it)
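
Step 2 can also be sketched without counting separators, assuming the category is always the folder that directly contains the image. The helper name `parent_folder_name` and the example paths below are made up for illustration and are not part of this notebook. `PureWindowsPath` is used here because it accepts both backslashes and forward slashes.

```python
from pathlib import PureWindowsPath

def parent_folder_name(path):
    # Name of the folder that directly contains the file
    return PureWindowsPath(path).parent.name

# Hypothetical example paths (not taken from the real dataset)
example_paths = [
    r"C:\data\images\cats\img001.jpg",
    "data/images/dogs/img002.jpg",
]
example_categories = [parent_folder_name(p) for p in example_paths]
print(example_categories)
```

If the category sits higher up in the folder tree, the depth-based function used in this notebook is the better fit.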


In [None]:
!pip install plotly

In [None]:
import pickle
import os
import numpy as np
from collections import defaultdict
from sklearn.manifold import TSNE
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd


## How does the t-SNE dimensionality reduction work?
1. Pairwise Similarities: Calculate the pairwise similarities between all points in the high-dimensional space.
2. Probability Distributions: Convert these similarities into probability distributions. Similar points have a higher probability of being neighbors.
3. Minimizing Kullback-Leibler Divergence: Place the points in a lower-dimensional space (usually 2D) so that
the Kullback-Leibler divergence between the probability distributions of the high-dimensional space and the
lower-dimensional space is minimized. As a result, points that are similar in high dimensions tend to end up close together in the lower dimensions.

For further explanations you can watch this video: https://www.youtube.com/watch?v=NEaUSP4YerM
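
The three steps above can be sketched in NumPy. This is a simplified illustration, not what `sklearn.manifold.TSNE` does internally: it assumes one fixed Gaussian width `sigma` for all points (real t-SNE tunes a width per point from the perplexity) and it only evaluates the Kullback-Leibler divergence for two given layouts instead of minimizing it. All function names and the toy data are assumptions made for this example.

```python
import numpy as np

def pairwise_sq_dists(X):
    # Step 1: squared Euclidean distance between every pair of rows
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def high_dim_probs(X, sigma=1.0):
    # Step 2 (high-dimensional side): Gaussian similarities, normalised
    # over all pairs. A single fixed sigma is a simplification.
    P = np.exp(-pairwise_sq_dists(X) / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)  # a point is not its own neighbour
    return P / P.sum()

def low_dim_probs(Y):
    # Step 2 (low-dimensional side): heavy-tailed Student-t similarities
    Q = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()

def kl_divergence(P, Q, eps=1e-12):
    # Step 3: the quantity t-SNE minimises (only evaluated here)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))

# Toy data: points 0 and 1 are close together, points 2 and 3 are close together
X = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
              [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]])
P = high_dim_probs(X)

Y_good = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
Y_bad = Y_good[[0, 2, 1, 3]]  # same positions, but neighbours are split up

kl_good = kl_divergence(P, low_dim_probs(Y_good))
kl_bad = kl_divergence(P, low_dim_probs(Y_bad))
print(f"KL for the layout that keeps neighbours together: {kl_good:.3f}")
print(f"KL for the layout that splits neighbours: {kl_bad:.3f}")
```

The layout that keeps the two pairs together yields the lower divergence, which is the kind of layout the optimisation moves towards.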

In [None]:
# Load embeddings and image paths
embeddings_path = 'embeddings_all.pkl'
image_paths_path = 'image_paths_all.pkl'

missing = [p for p in (embeddings_path, image_paths_path) if not os.path.exists(p)]
if missing:
    # Stop here: the later cells cannot run without these files
    raise FileNotFoundError(
        f"Missing file(s): {missing}. Create the pickle files with embedding_resnet.ipynb first."
    )

with open(embeddings_path, 'rb') as f:
    embeddings = pickle.load(f)
with open(image_paths_path, 'rb') as f:
    image_paths = pickle.load(f)

In [None]:
# Adjust these two settings to your own paths and check them on the first path below
sep = "\\"  # separator used in image_paths (here: Windows-style backslash)
depth = 3  # index of the path component to use as category (count the separators to find it)

def extract_category_from_path(path, depth):
    # Split the path into its components (uses the global `sep` defined above)
    path_parts = path.split(sep)
    # Ensure the depth is within the valid range
    if 0 <= depth < len(path_parts):
        return path_parts[depth]
    else:
        raise ValueError(f"Specified depth {depth} is out of range for the given path")

# Check the result on the first path
first_path = image_paths[0]
print(first_path)
print(extract_category_from_path(first_path, depth))





In [None]:
categories = [extract_category_from_path(path, depth) for path in image_paths]
print(f"This is the length of the categories list: {len(categories)}") # should be the number of images you want to plot

# Perform t-SNE dimensionality reduction
tsne = TSNE(n_components=2, random_state=42)
embeddings_2d = tsne.fit_transform(embeddings)
print(f"This is the shape of the embeddings in 2D: {embeddings_2d.shape}")

# Prepare data for plotting
plot_data = {
    'x': embeddings_2d[:, 0],
    'y': embeddings_2d[:, 1],
    'path': image_paths,
    'category': categories
}

In [None]:
# Create a DataFrame for the visualisation
df = pd.DataFrame(plot_data)
print(df.head(5))



## Why are similar images placed close together?
- Preservation of Local Structure: t-SNE aims to preserve the local structure of the high-dimensional data in the 2D map. This means that images that are close
in the high-dimensional space tend to be close in the 2D representation as well.
- High Probability of Being Neighbors: t-SNE assigns higher probabilities to pairs of points that are similar and places them closer together in the 2D space.
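
One way to check this on your own data is to measure how many of each point's k nearest neighbors in the high-dimensional space are still among its k nearest neighbors in 2D. The sketch below is an assumption-laden illustration: the helper names `knn_indices` and `neighbor_overlap` are hypothetical, and the toy data is constructed so that the 2D view keeps all distances exactly.

```python
import numpy as np

def knn_indices(X, k):
    # Squared distances via the Gram matrix, so memory stays at n x n
    sq = np.sum(X ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d, np.inf)  # exclude each point from its own neighbours
    return np.argsort(d, axis=1)[:, :k]

def neighbor_overlap(high, low, k=5):
    # Average fraction of the k nearest neighbours shared by both spaces
    nn_high = knn_indices(np.asarray(high, dtype=float), k)
    nn_low = knn_indices(np.asarray(low, dtype=float), k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return float(np.mean(shared)) / k

# Toy data: 5-D points whose last three coordinates are zero, so dropping
# them changes no distances and every neighbourhood is preserved
rng = np.random.default_rng(0)
pts_2d = rng.normal(size=(30, 2))
pts_5d = np.hstack([pts_2d, np.zeros((30, 3))])

overlap = neighbor_overlap(pts_5d, pts_2d, k=5)
print(f"Neighbour overlap: {overlap:.2f}")
```

Once the t-SNE cell has run, the same check could be applied as `neighbor_overlap(embeddings, embeddings_2d, k=10)`. A value near 1 suggests that most local neighborhoods survived the projection; real t-SNE output will usually score below the toy example.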

In [None]:
# Create the interactive plot
fig = px.scatter(
    df, 
    x='x', 
    y='y', 
    color='category',
    hover_data={'path': True, 'x': False, 'y': False}
)

# Add hover template to show path, x, and y coordinates
fig.update_traces(
    hovertemplate="<br>".join([
        "Path: %{customdata[0]}",
        "x: %{x}",
        "y: %{y}"
    ]),
    opacity=0.2  # alpha value
)

fig.update_layout(
    title='t-SNE Visualization of Image Embeddings',
    xaxis_title='Dimension 1',
    yaxis_title='Dimension 2',
    width=1200,
    height=800
)

fig.show()