# Image Retrieval System - Data Preprocessing

In this section, we set up the tools and transformations needed to process images and extract features from them. First, we import the required libraries: `torch` for working with deep learning models and `torchvision.transforms` for image transformations. `PIL.Image` handles image files, while `faiss` provides the efficient similarity search at the core of our image retrieval system. We also bring in `os` for file handling, `numpy` for numerical operations, and `tqdm` for a progress bar during long loops.

Next, we define a series of transformations that prepare the images for a deep learning model. They resize the image so that its shorter side is 256 pixels, center-crop it to 224×224 pixels (the input size expected by most pre-trained ImageNet models), convert it to a tensor with values in [0, 1], and normalize each channel with the ImageNet mean and standard deviation. This normalization step matters because it aligns our input images with the statistics of the data the model was originally trained on.
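
To make the normalization step concrete, here is a minimal NumPy sketch of the per-channel arithmetic that `transforms.Normalize` applies to a tensor already scaled to [0, 1]. The helper name `normalize_chw` and the toy 2×2 image are illustrative assumptions and not part of the notebook.

```python
import numpy as np

# Per-channel statistics used in the transform (ImageNet mean and std).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_chw(image):
    """Apply (x - mean) / std per channel to a (3, H, W) array in [0, 1]."""
    return (image - MEAN[:, None, None]) / STD[:, None, None]

# Toy image: every pixel equals its channel mean, so the result is all zeros.
toy = np.broadcast_to(MEAN[:, None, None], (3, 2, 2)).copy()
normalized = normalize_chw(toy)
print(normalized.shape)
```

The broadcasting over three channels is also why a single-channel image cannot be normalized with these three-element statistics.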

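Finally, we load a ResNet-50 model pre-trained on ImageNet using `torch.hub`. Its convolutional layers capture high-level visual features, which makes it a reasonable backbone for retrieval. As loaded here, the model still includes its final classification layer, so each image is represented by the 1000-dimensional vector of ImageNet class scores. Calling `model.eval()` switches layers such as batch normalization to inference behavior so that outputs are deterministic; it does not disable gradient tracking, which is handled separately by `torch.no_grad()` in the extraction function. This setup lays the foundation for extracting the image embeddings that we’ll use for retrieval later.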

In [1]:
import torch
import torchvision.transforms as transforms
from PIL import Image
import faiss
import os
import numpy as np
from tqdm import tqdm

# Define image transformations
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load a pre-trained model (e.g., ResNet-50)
model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)
model.eval()

Using cache found in /home/hamza-ubuntu/.cache/torch/hub/pytorch_vision_v0.10.0


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

## Feature Extraction from Images
The `extract_features` function turns an image file into a feature embedding using the pre-trained model. It opens the image from the given path, applies the transformations defined above to resize, crop, and normalize it, and adds a batch dimension. A channel check is meant to skip images that are not three-channel RGB, but it runs after the transform: for single-channel (grayscale) images the `Normalize` step raises an error first, so those images are reported through the `except` branch instead, as the run log later in this notebook shows. For valid images, the model runs under `torch.no_grad()`, and the output is converted to a NumPy array and returned. Any exception is caught and reported with the image path, and the function returns `None`, so one bad file does not stop a long run.

In [2]:
def extract_features(image_path):
    """
    Extracts features (embeddings) from an image.

    Args:
        image_path (str): Path to the image file.

    Returns:
        np.ndarray: Extracted features as a numpy array, 
                    or None if there's an error.
    """
    try:
        img = Image.open(image_path)
        img_tensor = transform(img).unsqueeze(0)
        # Check if the tensor has 3 channels
        if img_tensor.shape[1] != 3:
            print(f"Image {image_path} has {img_tensor.shape[1]} channels, skipping.")
            return None

        with torch.no_grad():
            features = model(img_tensor).squeeze().numpy()
        return features
    except Exception as e:
        print(f"Error processing image {image_path}: {e}")
        return None
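
The run log later in this notebook shows single-channel images failing inside the transform. One common way to embed such images instead of skipping them is to convert every image to RGB before transforming it. The sketch below illustrates that idea with Pillow's `Image.convert("RGB")`; the helper `load_rgb` is hypothetical and is not what the notebook does.

```python
from PIL import Image

def load_rgb(image_or_path):
    """Return a 3-channel RGB copy of an image (hypothetical helper)."""
    if isinstance(image_or_path, Image.Image):
        img = image_or_path
    else:
        img = Image.open(image_or_path)
    return img.convert("RGB")

# An in-memory grayscale image stands in for a problematic file.
gray = Image.new("L", (8, 8), color=128)
rgb = load_rgb(gray)
print(gray.mode, "->", rgb.mode, rgb.getbands())
```

Under this assumption, `transform(load_rgb(image_path))` would always produce a three-channel tensor, at the cost of treating grayscale images as gray-valued color images.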

## Saving and Loading Embeddings  

This section defines two helper functions, `save_embeddings` and `load_embeddings`, for efficiently managing image embeddings.  

- **`save_embeddings`**: This function takes an array of embeddings and writes it to the specified file using Python's `pickle` module. The file is opened in binary write mode (`wb`), which `pickle` requires.  

- **`load_embeddings`**: This function retrieves the embeddings from a previously saved file. By opening the file in binary read mode (`rb`) and using `pickle.load`, it reconstructs the original embeddings for use in the image retrieval pipeline.  

These functions simplify embedding management, allowing for seamless storage and retrieval without the need to recompute features repeatedly.

In [3]:
import pickle

def save_embeddings(embeddings, filename):
  """
  Saves embeddings to a pickle file.

  Args:
      embeddings (np.ndarray): The embeddings to save.
      filename (str): The filename to save the embeddings to.
  """
  with open(filename, 'wb') as f:
    pickle.dump(embeddings, f)

def load_embeddings(filename):
  """
  Loads embeddings from a pickle file.

  Args:
      filename (str): The filename to load the embeddings from.

  Returns:
      np.ndarray: The loaded embeddings.
  """
  with open(filename, 'rb') as f:
    return pickle.load(f)
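
As a possible alternative to `pickle`, a plain NumPy array can also be stored in NumPy's own `.npy` format, which can be loaded with `allow_pickle=False` so that no arbitrary objects are deserialized. The sketch below is a swapped-in technique, not the notebook's method; the function names and the random stand-in embeddings are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

def save_embeddings_npy(embeddings, filename):
    """Save an array in NumPy's .npy format (hypothetical alternative)."""
    np.save(filename, embeddings)

def load_embeddings_npy(filename):
    """Load an array, refusing pickled objects inside the file."""
    return np.load(filename, allow_pickle=False)

# Round trip with random stand-in embeddings (4 vectors of dimension 1000).
rng = np.random.default_rng(0)
original = rng.random((4, 1000), dtype=np.float32)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embeddings.npy")
    save_embeddings_npy(original, path)
    restored = load_embeddings_npy(path)
print(restored.shape, restored.dtype)
```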

## Creating and Populating a Faiss Index  

This code sets up a **Faiss index**, which is a highly efficient tool for performing similarity searches on high-dimensional data, such as image embeddings. The `add_images_to_index` function is responsible for adding image embeddings to the index, handling both new feature extraction and previously saved embeddings.  

### How It Works  

1. **Faiss Index Initialization**:  
   A new Faiss index is created using `faiss.IndexFlatL2`, which performs exact (brute-force) search by L2 (Euclidean) distance. The dimensionality of the index matches the output size of the model’s final layer (`model.fc.out_features`, which is 1000 for this ResNet-50).  

2. **Embedding Extraction**:  
   If the embeddings file doesn’t already exist, the function loops through all images in the specified folder. For each image, it extracts features using the `extract_features` function. If an image cannot be processed, it’s recorded in `problematic_images`, and the count of corrupted images is tracked. The extracted embeddings are then converted into a NumPy array and saved using the `save_embeddings` function.  

3. **Loading Saved Embeddings**:  
   If the embeddings file already exists, the function loads the embeddings directly using `load_embeddings`. This avoids recomputing features and speeds up the process.  

4. **Adding Embeddings to the Index**:  
   Finally, the extracted or loaded embeddings are added to the Faiss index using the `index.add()` method.  

### Output  
The function returns the populated Faiss index, ready for similarity searches, and a list of problematic images that couldn’t be processed. This ensures both efficiency and robustness when working with large datasets.

In [4]:
def add_images_to_index(image_folder, embeddings_filename):
    """
    Adds image embeddings to a new Faiss index.

    Args:
        image_folder (str): Path to the folder containing images.
        embeddings_filename (str): Path to the pickle file with saved embeddings.
            The file is created if it does not exist yet.

    Returns:
        tuple: The populated faiss.Index and a list of image paths that could
               not be processed (empty when embeddings are loaded from file).
    """
    # Create a Faiss index sized to the model's output vector
    index = faiss.IndexFlatL2(model.fc.out_features)
    # Defined up front so the return below also works when embeddings are loaded
    problematic_images = []

    # Extract and save embeddings if the file doesn't exist
    if not os.path.exists(embeddings_filename):
        embeddings = []
        image_paths = [os.path.join(image_folder, f) for f in os.listdir(image_folder)]
        corrupted_count = 0
        for image_path in tqdm(image_paths, desc="Processing images"):
            features = extract_features(image_path)
            if features is not None:
                embeddings.append(features)
            else:
                corrupted_count += 1
                problematic_images.append(image_path)
                print(f"added {image_path} to problematic_images")

        embeddings = np.array(embeddings)
        save_embeddings(embeddings, embeddings_filename)
        print(f"Number of corrupted images: {corrupted_count}")
    else:
        # Load embeddings if the file exists
        embeddings = load_embeddings(embeddings_filename)

    index.add(embeddings)
    return index, problematic_images
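
Because failed images are skipped, row `i` of the embeddings array no longer corresponds to entry `i` of the original folder listing. One way to keep the mapping explicit is to record the paths that were actually embedded alongside the vectors. The sketch below shows this under stated assumptions: `collect_embeddings` and the stand-in `fake_extract` are hypothetical and only mimic the `None`-on-failure contract of `extract_features`.

```python
import numpy as np

def collect_embeddings(image_paths, extract):
    """Return (embeddings, kept_paths, failed_paths); rows align with kept_paths."""
    vectors, kept, failed = [], [], []
    for path in image_paths:
        features = extract(path)
        if features is None:
            failed.append(path)
        else:
            vectors.append(features)
            kept.append(path)
    embeddings = np.asarray(vectors, dtype=np.float32)
    return embeddings, kept, failed

def fake_extract(path):
    """Stand-in extractor: fails for names containing 'bad'."""
    if "bad" in path:
        return None
    return np.full(4, len(path), dtype=np.float32)

# Sorting makes the order reproducible across runs and machines.
paths = sorted(["b.jpg", "bad.jpg", "a.jpg"])
embeddings, kept_paths, failed_paths = collect_embeddings(paths, fake_extract)
print(embeddings.shape, kept_paths, failed_paths)
```

Saving `kept_paths` next to the embeddings would let a later session rebuild the index and the path list without depending on the folder's current contents.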

## Searching for Similar Images  

The `search_similar_images` function allows us to find the most similar images in the dataset based on a query image. It leverages the Faiss index, which we previously populated with image embeddings, to perform efficient similarity searches.  

### How It Works  

1. **Query Image Embedding**:  
   The function starts by extracting the feature embedding of the query image using the `extract_features` function. This transforms the image into a numerical representation that can be compared against other images in the dataset.  

2. **Performing the Search**:  
   The query embedding is reshaped to a single-row 2D array and passed to the Faiss index using the `index.search()` method, which retrieves the `k` nearest neighbours (in this case, 5). The `distances` array holds the squared Euclidean distances reported by `IndexFlatL2`, so smaller values mean more similar images, while `indices` contains the positions of those images in the index.  

3. **Mapping Indices to Image Paths**:  
   The function then maps the indices returned by Faiss back to the corresponding image file paths from the `image_paths` list. This mapping is only correct if `image_paths` lists the images in the same order in which their embeddings were added to the index.  

### Output  
The function returns a list of file paths to the most similar images. This allows us to efficiently retrieve and display the images that are most closely related to the query, enabling an effective image search experience.

In [5]:
def search_similar_images(query_image_path, index, image_paths):
    """
    Searches for similar images based on a query image.

    Args:
        query_image_path (str): Path to the query image.
        index (faiss.Index): The Faiss index containing image embeddings.
        image_paths (list): List of image paths.

    Returns:
        list: A list of paths to the most similar images.
    """
    query_embedding = extract_features(query_image_path)
    if query_embedding is None:
        # extract_features has already printed the reason for the failure
        return []
    distances, indices = index.search(query_embedding.reshape(1, -1), k=5)
    similar_image_paths = [image_paths[i] for i in indices[0]]
    return similar_image_paths
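
To illustrate what an exact L2 search computes, the following NumPy sketch performs the same kind of exhaustive comparison on a tiny database and returns results in the `(1, k)` layout described above. It is a conceptual stand-in, not Faiss itself; `brute_force_search` and the three toy vectors are assumptions for illustration.

```python
import numpy as np

def brute_force_search(database, query, k=5):
    """Exhaustive nearest-neighbour search by squared Euclidean distance."""
    diffs = database - query.reshape(1, -1)
    sq_dists = np.sum(diffs * diffs, axis=1)
    # A stable sort keeps the lower index first when distances tie.
    order = np.argsort(sq_dists, kind="stable")[:k]
    return sq_dists[order].reshape(1, -1), order.reshape(1, -1)

database = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]], dtype=np.float32)
query = np.array([0.0, 0.0], dtype=np.float32)
distances, indices = brute_force_search(database, query, k=2)
print(distances, indices)
```

Note that the vector `[3, 4]` has a squared distance of 25 from the origin, not 5, which matches the squared values that `IndexFlatL2` reports.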

## Extracting and Saving Embeddings  

This code snippet ensures that the embeddings for all images in the specified folder are extracted and saved for future use. The embeddings are computed and stored in a pickle file to avoid repeated calculations.  

### How It Works  

1. **Setting Paths**:  
   The path to the folder containing the images (`image_folder`) is specified, along with the filename (`embeddings_filename`) where the embeddings will be saved. If the embeddings file already exists, it will be loaded directly, skipping the extraction process.  

2. **Calling `add_images_to_index`**:  
   The function `add_images_to_index` is called to process all images in the folder. If the embeddings file doesn’t exist, it extracts features for each image, handles any problematic images, and saves the embeddings to the specified file. If the embeddings file already exists, it simply loads the embeddings.  

3. **Building the Faiss Index**:  
   As part of the function, the Faiss index is populated with the image embeddings, making it ready for fast similarity searches. Any problematic images that couldn’t be processed are also returned for further review.  

### Output  
The code returns the Faiss index, which now contains all the image embeddings, and a list of problematic images that couldn’t be processed. This allows for a smooth image retrieval process and provides useful feedback on any issues with the dataset.

In [None]:
# Extract and save embeddings (if not already done)
image_folder = "SemArt/Images"
embeddings_filename = "artifacts_image_retrieval/mega_embeddings.pkl"
index, problem_images = add_images_to_index(image_folder, embeddings_filename)


Processing images:   1%|          | 140/21384 [00:13<28:08, 12.58it/s]  

Error processing image SemArt/Images/07522-25conta.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/07522-25conta.jpg to problematic_images


Processing images:   2%|▏         | 466/21384 [01:19<45:37,  7.64it/s]  

Error processing image SemArt/Images/23964-2chris01.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/23964-2chris01.jpg to problematic_images


Processing images:   9%|▉         | 1943/21384 [04:29<26:49, 12.08it/s]  

Error processing image SemArt/Images/31303-triumph.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/31303-triumph.jpg to problematic_images


Processing images:  10%|█         | 2160/21384 [04:54<31:19, 10.23it/s]

Error processing image SemArt/Images/23965-2chris02.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/23965-2chris02.jpg to problematic_images


Processing images:  14%|█▎        | 2917/21384 [06:30<29:18, 10.50it/s]  

Error processing image SemArt/Images/23966-2chris03.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/23966-2chris03.jpg to problematic_images


Processing images:  15%|█▌        | 3227/21384 [07:09<26:04, 11.61it/s]  

Error processing image SemArt/Images/23961-1james06.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/23961-1james06.jpg to problematic_images


Processing images:  23%|██▎       | 4849/21384 [10:29<26:30, 10.39it/s]  

Error processing image SemArt/Images/23967-2chris04.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/23967-2chris04.jpg to problematic_images


Processing images:  24%|██▍       | 5142/21384 [11:07<27:52,  9.71it/s]  

Error processing image SemArt/Images/23956-1james01.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/23956-1james01.jpg to problematic_images


Processing images:  25%|██▍       | 5245/21384 [11:20<25:03, 10.73it/s]

Error processing image SemArt/Images/23959-1james04.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/23959-1james04.jpg to problematic_images


Processing images:  25%|██▌       | 5360/21384 [11:35<26:49,  9.96it/s]

Error processing image SemArt/Images/28351-thebapti.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/28351-thebapti.jpg to problematic_images


Processing images:  27%|██▋       | 5728/21384 [12:21<23:38, 11.04it/s]  

Error processing image SemArt/Images/26737-lunettes.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/26737-lunettes.jpg to problematic_images


Processing images:  28%|██▊       | 5955/21384 [12:46<25:21, 10.14it/s]  

Error processing image SemArt/Images/14278-3allegor.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/14278-3allegor.jpg to problematic_images


Processing images:  33%|███▎      | 7035/21384 [15:05<23:44, 10.07it/s]  

Error processing image SemArt/Images/23958-1james03.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/23958-1james03.jpg to problematic_images


Processing images:  34%|███▍      | 7307/21384 [15:40<22:36, 10.38it/s]

Error processing image SemArt/Images/23957-1james02.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/23957-1james02.jpg to problematic_images


Processing images:  36%|███▌      | 7727/21384 [16:34<21:32, 10.56it/s]  

Error processing image SemArt/Images/44237-dutchsqu.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/44237-dutchsqu.jpg to problematic_images


Processing images:  47%|████▋     | 10026/21384 [21:30<20:23,  9.29it/s] 

Error processing image SemArt/Images/23960-1james05.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/23960-1james05.jpg to problematic_images


Processing images:  51%|█████▏    | 10984/21384 [23:31<18:13,  9.51it/s]  

Error processing image SemArt/Images/28344-farnese.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/28344-farnese.jpg to problematic_images


Processing images:  57%|█████▋    | 12292/21384 [26:17<14:44, 10.28it/s]  

Error processing image SemArt/Images/44328-dance_de.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/44328-dance_de.jpg to problematic_images


Processing images:  60%|██████    | 12924/21384 [27:37<15:31,  9.08it/s]

Error processing image SemArt/Images/26670-5outsidf.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/26670-5outsidf.jpg to problematic_images


Processing images:  62%|██████▏   | 13316/21384 [28:27<13:14, 10.16it/s]

Error processing image SemArt/Images/38222-2liszt.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/38222-2liszt.jpg to problematic_images


Processing images:  68%|██████▊   | 14469/21384 [30:54<10:33, 10.92it/s]

Error processing image SemArt/Images/26735-lu15pha.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/26735-lu15pha.jpg to problematic_images


Processing images:  71%|███████▏  | 15262/21384 [32:32<09:36, 10.62it/s]

Error processing image SemArt/Images/26736-lu16abr.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/26736-lu16abr.jpg to problematic_images


Processing images:  72%|███████▏  | 15374/21384 [32:45<09:22, 10.68it/s]

Error processing image SemArt/Images/38547-pius_8.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/38547-pius_8.jpg to problematic_images


Processing images:  78%|███████▊  | 16584/21384 [35:19<06:39, 12.02it/s]

Error processing image SemArt/Images/39599-3martyr1.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/39599-3martyr1.jpg to problematic_images


Processing images:  86%|████████▋ | 18466/21384 [39:16<04:35, 10.60it/s]

Error processing image SemArt/Images/29586-arca2.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/29586-arca2.jpg to problematic_images


Processing images:  88%|████████▊ | 18821/21384 [39:58<04:11, 10.19it/s]

Error processing image SemArt/Images/33171-waterloo.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/33171-waterloo.jpg to problematic_images


Processing images:  90%|████████▉ | 19244/21384 [40:54<03:25, 10.41it/s]

Error processing image SemArt/Images/38221-1erkel.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/38221-1erkel.jpg to problematic_images


Processing images:  93%|█████████▎| 19890/21384 [42:20<03:10,  7.85it/s]

Error processing image SemArt/Images/23968-2chris05.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/23968-2chris05.jpg to problematic_images


Processing images:  95%|█████████▌| 20383/21384 [43:24<01:37, 10.23it/s]

Error processing image SemArt/Images/44327-circe_ul.jpg: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
added SemArt/Images/44327-circe_ul.jpg to problematic_images


Processing images: 100%|██████████| 21384/21384 [45:41<00:00,  7.80it/s]

Number of corrupted images: 29





## Handling and Deleting Problematic Images  

This section of code focuses on managing problematic images that couldn't be processed during the feature extraction phase. The goal is to save these images' names in a CSV file for tracking and remove them from the system to avoid errors during future operations.  

### How It Works  

1. **Saving Problematic Image Names**:  
   The code begins by opening a CSV file (`mega_problematic_images.csv`) in write mode. It writes a header row with the column name "Image Name" to clearly label the data. Then, for each problematic image path in `problem_images`, it extracts the file name using `os.path.basename()` and writes it as a new row in the CSV file. This helps maintain a record of all images that were skipped during processing.  

2. **Deleting Problematic Images**:  
   After saving the names, the code deletes the problematic images from disk using `os.remove()`. This ensures that any images that caused issues during feature extraction are no longer present in the folder, so they cannot interfere with the rest of the workflow. It also means that a later listing of the folder contains only images that have an embedding in the index. The deletion of each image is logged with a print statement for tracking purposes.  

### Output  
The problematic images are removed from the system, and their names are saved in a CSV file for future reference. This improves the overall robustness of the system by ensuring only valid images remain for further processing and retrieval tasks.

In [None]:
import csv
with open('artifacts_image_retrieval/mega_problematic_images.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Image Name'])
    for image_path in problem_images:
        filename = os.path.basename(image_path)
        writer.writerow([filename])

for image_path in problem_images:
    os.remove(image_path)
    print(f"Deleted problematic image: {image_path}")

Deleted problematic image: SemArt/Images/07522-25conta.jpg
Deleted problematic image: SemArt/Images/23964-2chris01.jpg
Deleted problematic image: SemArt/Images/31303-triumph.jpg
Deleted problematic image: SemArt/Images/23965-2chris02.jpg
Deleted problematic image: SemArt/Images/23966-2chris03.jpg
Deleted problematic image: SemArt/Images/23961-1james06.jpg
Deleted problematic image: SemArt/Images/23967-2chris04.jpg
Deleted problematic image: SemArt/Images/23956-1james01.jpg
Deleted problematic image: SemArt/Images/23959-1james04.jpg
Deleted problematic image: SemArt/Images/28351-thebapti.jpg
Deleted problematic image: SemArt/Images/26737-lunettes.jpg
Deleted problematic image: SemArt/Images/14278-3allegor.jpg
Deleted problematic image: SemArt/Images/23958-1james03.jpg
Deleted problematic image: SemArt/Images/23957-1james02.jpg
Deleted problematic image: SemArt/Images/44237-dutchsqu.jpg
Deleted problematic image: SemArt/Images/23960-1james05.jpg
Deleted problematic image: SemArt/Images/2
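
Deleting files is irreversible, so a more cautious variant is to move problematic images into a separate folder where they can still be inspected or repaired. The sketch below shows that variant on throwaway files; `quarantine_images` and the `quarantine` directory name are hypothetical and not part of the notebook.

```python
import os
import shutil
import tempfile

def quarantine_images(image_paths, quarantine_dir):
    """Move files into quarantine_dir and return their new paths."""
    os.makedirs(quarantine_dir, exist_ok=True)
    moved = []
    for path in image_paths:
        if not os.path.exists(path):
            continue  # already moved or deleted; nothing to do
        target = os.path.join(quarantine_dir, os.path.basename(path))
        shutil.move(path, target)
        moved.append(target)
    return moved

# Demonstration on temporary files rather than the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    images_dir = os.path.join(tmp, "Images")
    os.makedirs(images_dir)
    bad = os.path.join(images_dir, "example.jpg")
    with open(bad, "wb") as f:
        f.write(b"not a real image")
    missing = os.path.join(images_dir, "missing.jpg")
    moved = quarantine_images([bad, missing], os.path.join(tmp, "quarantine"))
    still_in_dataset = os.path.exists(bad)
    in_quarantine = os.path.exists(moved[0])
print(len(moved), still_in_dataset, in_quarantine)
```

Skipping paths that no longer exist also makes the cell safe to re-run, whereas a second call to `os.remove()` on the same path raises `FileNotFoundError`.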

In [None]:
# Load embeddings and create index (if embeddings are already saved)
# index, problem_images = add_images_to_index(image_folder, embeddings_filename)

# Search for similar images
query_image_path = "test_set/vase1.jpeg"
image_paths = [os.path.join(image_folder, f) for f in os.listdir(image_folder)]
similar_images = search_similar_images(query_image_path, index, image_paths)
print(similar_images)

['SemArt/Images/30580-stil_lif.jpg', 'SemArt/Images/08686-10vase.jpg', 'SemArt/Images/34434-stillif.jpg', 'SemArt/Images/26087-24nojov4.jpg', 'SemArt/Images/37724-stillife.jpg']


## Creating the Image Paths CSV  

This section of code is used to generate a CSV file containing the file paths of all images in the dataset. This CSV file acts as a reference to map image paths to their corresponding embeddings, which is essential for performing image retrieval and matching operations later on.  

### How It Works  

1. **Opening the CSV File**:  
   The code opens a CSV file (`image_paths.csv`) in write mode. The `csv.writer()` function is used to write the image paths into the file, ensuring each image path is stored in its own row.  

2. **Writing Image Paths**:  
   The loop iterates over the list of image paths (`image_paths`) and writes each one into the CSV file. Each path is written as a single value in a row. This CSV file can later be loaded to easily retrieve the image paths when matching them with the corresponding embeddings in the Faiss index.  

### Output  
The code generates a CSV file (`image_paths.csv`) that contains a list of all image paths. This file is crucial for associating image paths with their corresponding embeddings, enabling efficient image search and retrieval tasks.

In [None]:
# Used to create the image path CSV, which maps each embedding row back to its image path.
# Uncomment to regenerate the file.
# with open('artifacts_image_retrieval/image_paths.csv', 'w', newline='') as csvfile:
#     writer = csv.writer(csvfile)
#     for image_path in image_paths:
#         writer.writerow([image_path])
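
The write and read steps can be checked together with a small round trip. The sketch below writes a list of paths to a temporary CSV and reads it back in the same order; the helper names and the two stand-in paths (one containing a comma, to show that the `csv` module quotes it) are assumptions for illustration.

```python
import csv
import os
import tempfile

def write_image_paths(image_paths, csv_path):
    """Write one image path per row, preserving list order."""
    with open(csv_path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        for image_path in image_paths:
            writer.writerow([image_path])

def read_image_paths(csv_path):
    """Read the paths back in file order, ignoring empty rows."""
    with open(csv_path, "r", newline="") as csvfile:
        return [row[0] for row in csv.reader(csvfile) if row]

# Stand-in paths, not real dataset files.
paths = ["SemArt/Images/b.jpg", "SemArt/Images/a, copy.jpg"]
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "image_paths.csv")
    write_image_paths(paths, csv_path)
    loaded = read_image_paths(csv_path)
print(loaded == paths)
```

Because row order is preserved, row `i` of the CSV can stand for row `i` of the embeddings, provided the list was written in the same order in which the embeddings were computed.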

## Loading Image Paths and Searching for Similar Images  

In this section, the code performs two tasks: loading the list of image paths from a CSV file and using that list to find similar images based on a query image.  

### How It Works  

1. **Loading Image Paths**:  
   The code begins by opening the CSV file (`image_paths.csv`) in read mode. It uses the `csv.reader()` function to read each row and extracts the image paths from the first column. These paths are then stored in the `image_paths_loaded` list. This list serves as a reference for all images in the dataset that will be compared against the query image.  

2. **Searching for Similar Images**:  
   The query image path (`query_image_path`) is specified, pointing to a specific image file (in this case, `"test_set/vase1.jpeg"`). The function `search_similar_images` is called, which takes the query image, the Faiss index, and the loaded image paths as inputs. It then retrieves the top 5 most similar images from the dataset based on the query image’s feature embedding.  

3. **Displaying Results**:  
   The list of similar image paths (`similar_images`) is printed to the console. This provides the user with the file paths of the images that are most similar to the query image.  

### Output  
The output is a list of file paths to the most similar images in the dataset, allowing the user to visually compare the query image with others that are closely related.

In [None]:
with open('artifacts_image_retrieval/image_paths.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    image_paths_loaded = [row[0] for row in reader]
query_image_path = "test_set/vase1.jpeg"
similar_images = search_similar_images(query_image_path, index, image_paths_loaded)
print(similar_images)

['SemArt/Images/30580-stil_lif.jpg', 'SemArt/Images/08686-10vase.jpg', 'SemArt/Images/34434-stillif.jpg', 'SemArt/Images/26087-24nojov4.jpg', 'SemArt/Images/37724-stillife.jpg']
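
When the paths come from a file rather than from the same session that built the index, it can help to verify that both sides still line up before trusting the results. The sketch below checks the path count against the number of indexed vectors (available in Faiss as `index.ntotal`) and drops the `-1` placeholders that Faiss returns when fewer than `k` neighbours exist. The helper `indices_to_paths` is hypothetical.

```python
def indices_to_paths(indices_row, image_paths, n_indexed):
    """Map one row of search indices to paths, with basic sanity checks."""
    if len(image_paths) != n_indexed:
        raise ValueError(
            f"{len(image_paths)} paths but {n_indexed} indexed vectors"
        )
    # A negative index would silently pick a path from the end of the list.
    return [image_paths[i] for i in indices_row if i >= 0]

paths = ["a.jpg", "b.jpg", "c.jpg"]
result = indices_to_paths([2, 0, -1], paths, n_indexed=3)
print(result)

try:
    indices_to_paths([0], paths, n_indexed=4)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)
```

In the notebook's terms, this would be called as `indices_to_paths(indices[0], image_paths_loaded, index.ntotal)` inside or after `search_similar_images`.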
