<img src="https://github.com/retkowsky/visual-search-azureAI/blob/main/logo.jpg?raw=true">

# Fashion Visual Search demo

## 7. Visual search - Index management

This code demonstrates how to use **Azure Cognitive Search** with the **Cognitive Services Florence Vision API** and the Azure Python SDK for visual search.

## Visual search with vector embeddings
Vector embeddings are a way of representing content such as text or images as vectors of real numbers in a high-dimensional space. These embeddings are often learned from large amounts of textual and visual data using machine learning algorithms like neural networks. Each dimension of the vector corresponds to a different feature or attribute of the content, such as its semantic meaning, syntactic role, or context in which it commonly appears. By representing content as vectors, we can perform mathematical operations on them to compare their similarity or use them as inputs to machine learning models.
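
As a minimal sketch of what "comparing similarity" means in practice, the snippet below computes the cosine similarity between toy vectors. The function name `cosine_similarity` and the 3-dimensional vectors are illustrative assumptions; real image embeddings have many more dimensions and come from the vision service.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (made up for illustration, not real embeddings)
shirt_a = [0.9, 0.1, 0.0]
shirt_b = [0.8, 0.2, 0.1]
shoe = [0.0, 0.2, 0.9]

print("shirt_a vs shirt_b:", round(cosine_similarity(shirt_a, shirt_b), 3))
print("shirt_a vs shoe   :", round(cosine_similarity(shirt_a, shoe), 3))
```

Vectors pointing in similar directions score close to 1, while unrelated ones score close to 0, so the two shirts rank as more similar to each other than to the shoe.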

## Process
<img src="https://raw.githubusercontent.com/retkowsky/Azure-Computer-Vision-in-a-day-workshop/72c07afc4fcc04a29ca19b84d3d343a09a22368e//fashionprocess.png" width=512>

## Business applications
- Digital asset management: Image retrieval can be used to manage large collections of digital images, such as in museums, archives, or online galleries. Users can search for images based on visual features and retrieve the images that match their criteria.
- Medical image retrieval: Image retrieval can be used in medical imaging to search for images based on their diagnostic features or disease patterns. This can help doctors or researchers to identify similar cases or track disease progression.
- Security and surveillance: Image retrieval can be used in security and surveillance systems to search for images based on specific features or patterns, such as in people and object tracking or threat detection.
- Forensic image retrieval: Image retrieval can be used in forensic investigations to search for images based on their visual content or metadata, such as in cases of cyber-crime.
- E-commerce: Image retrieval can be used in online shopping applications to search for similar products based on their features or descriptions or provide recommendations based on previous purchases.
- Fashion and design: Image retrieval can be used in fashion and design to search for images based on their visual features, such as color, pattern, or texture. This can help designers or retailers to identify similar products or trends.

## To learn more
- https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/concept-image-retrieval
- https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search

The sample fashion images used in this notebook are taken from this link:<br>
https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset

> Note: Image retrieval is currently in public preview

## 1. Python libraries

In [1]:
import datetime
import io
import json
import glob
import math
import os
import requests
import sys
import time

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    PrioritizedFields,
    SearchableField,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
    SemanticConfiguration,
    SemanticField,
    SemanticSettings,
    SimpleField,
    VectorSearch,
    VectorSearchAlgorithmConfiguration,
)
from azure.storage.blob import BlobServiceClient
from dotenv import load_dotenv
from io import BytesIO
from PIL import Image

In [2]:
sys.version

'3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]'

In [3]:
print("Today is", datetime.datetime.today())

Today is 2023-10-18 13:39:42.340307


## 2. Azure AI Services

In [4]:
load_dotenv("azure.env")

# Azure Computer Vision 4
acv_key = os.getenv("acv_key")
acv_endpoint = os.getenv("acv_endpoint")

# Azure Cognitive Search
acs_endpoint = os.getenv("acs_endpoint")
acs_key = os.getenv("acs_key")

In [5]:
# Ensure that the Azure endpoints do not end with a /
if acv_endpoint.endswith("/"):
    acv_endpoint = acv_endpoint[:-1]

if acs_endpoint.endswith("/"):
    acs_endpoint = acs_endpoint[:-1]
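
The endpoint clean-up can also be written as a small helper that fails early when a setting is missing from `azure.env`. This is a minimal sketch: `normalize_endpoint` is a hypothetical helper, not part of the notebook or of the Azure SDK, and the URL below is a placeholder.

```python
def normalize_endpoint(value, name):
    """Return the endpoint without surrounding spaces or trailing slashes."""
    if not value:
        # os.getenv returns None when the variable is not defined
        raise ValueError(f"Missing setting: {name}")
    return value.strip().rstrip("/")


# Placeholder URL used only for illustration
print(normalize_endpoint("https://example.search.windows.net/", "acs_endpoint"))
```

In the notebook this could wrap each `os.getenv(...)` call, so a missing variable raises a clear error instead of an `AttributeError` on `None`.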

In [6]:
# Azure Cognitive Search index name to create
index_name = "azure-fashion-demo"

# Azure Cognitive Search api version
api_version = "2023-02-01-preview"

## 3. Functions

In [7]:
session = requests.Session()


def image_embedding(imagefile):
    """
    Image embedding using Azure Computer Vision 4.0
    """
    version = "?api-version=" + api_version + "&modelVersion=latest"
    vec_img_url = acv_endpoint + "/computervision/retrieval:vectorizeImage" + version
    headers = {
        "Content-type": "application/octet-stream",
        "Ocp-Apim-Subscription-Key": acv_key,
    }

    try:
        blob_service_client = BlobServiceClient.from_connection_string(
            blob_connection_string
        )
        container_client = blob_service_client.get_container_client(container_name)

        blob_client = container_client.get_blob_client(imagefile)
        stream = BytesIO()
        blob_data = blob_client.download_blob()
        blob_data.readinto(stream)

        stream.seek(0)  # Reset stream position to the beginning

        response = session.post(vec_img_url, data=stream, headers=headers)
        response.raise_for_status()  # Raise an exception if response is not 200

        image_emb = response.json()["vector"]
        return image_emb

    except requests.exceptions.RequestException as e:
        print(f"Request Exception: {e}")
    except Exception as ex:
        print(f"Error: {ex}")

    return None

In [16]:
def index_status(index_name):
    """
    Azure Cognitive Search index status
    """
    print("Azure Cognitive Search Index:", index_name, "\n")

    headers = {"Content-Type": "application/json", "api-key": acs_key}
    params = {"api-version": "2021-04-30-Preview"}
    response = requests.get(
        acs_endpoint + "/indexes/" + index_name, headers=headers, params=params
    )
    try:
        print(json.dumps(response.json(), indent=5))
    except ValueError:
        # The response body was not valid JSON
        print("Request failed")


def index_stats(index_name):
    """
    Get statistics about Azure Cognitive Search index
    """
    url = (
        acs_endpoint
        + "/indexes/"
        + index_name
        + "/stats?api-version=2021-04-30-Preview"
    )
    headers = {
        "Content-Type": "application/json",
        "api-key": acs_key,
    }
    response = requests.get(url, headers=headers)
    print("Azure Cognitive Search index status for:", index_name, "\n")

    # Default values returned when the request fails
    document_count, storage_size = None, None

    if response.status_code == 200:
        res = response.json()
        print(json.dumps(res, indent=2))
        document_count = res["documentCount"]
        storage_size = res["storageSize"]

    else:
        print("Request failed with status code:", response.status_code)

    return document_count, storage_size


def acs_service_statistics(index_name):
    """
    Azure Cognitive Search service statistics
    """
    url = os.getenv("acs_endpoint") + "/servicestats?api-version=2021-04-30-Preview"
    headers = {
        "Content-Type": "application/json",
        "api-key": os.getenv("acs_key"),
    }
    response = requests.get(url, headers=headers)
    print("Azure Cognitive Search index status for:", index_name, "\n")

    if response.status_code == 200:
        res = response.json()
        print(json.dumps(res, indent=2))

    else:
        print("Request failed with status code:", response.status_code)
        
        
def delete_acs_document_by_id(document_id):
    """
    Delete Azure Cognitive Search document
    """
    # Search client
    search_client = SearchClient(acs_endpoint, index_name, AzureKeyCredential(acs_key))

    # Delete document by its id number
    print(f"Document with id {document_id} will be deleted from index {index_name}")
    document_to_delete = {"idfile": document_id}  # Delete the entry based on the idfile
    search_client.delete_documents([document_to_delete])

    print("Done")

## 4. Azure Cognitive Search index information

In [9]:
acs_service_statistics(index_name)

Azure Cognitive Search index status for: azure-fashion-demo 

{
  "@odata.context": "https://azurecogsearcheastussr.search.windows.net/$metadata#Microsoft.Azure.Search.V2021_04_30_Preview.ServiceStatistics",
  "counters": {
    "documentCount": {
      "usage": 33242,
      "quota": null
    },
    "indexesCount": {
      "usage": 21,
      "quota": 50
    },
    "indexersCount": {
      "usage": 1,
      "quota": 50
    },
    "dataSourcesCount": {
      "usage": 3,
      "quota": 50
    },
    "storageSize": {
      "usage": 1026951396,
      "quota": 26843545600
    },
    "synonymMaps": {
      "usage": 0,
      "quota": 5
    },
    "skillsetCount": {
      "usage": 0,
      "quota": 50
    },
    "aliasesCount": {
      "usage": 0,
      "quota": 100
    }
  },
  "limits": {
    "maxFieldsPerIndex": 3000,
    "maxFieldNestingDepthPerIndex": 10,
    "maxComplexCollectionFieldsPerIndex": 40,
    "maxComplexObjectsInCollectionsPerDocument": 3000
  }
}


In [10]:
index_status(index_name)

Azure Cognitive Search Index: azure-fashion-demo 

{
     "@odata.context": "https://azurecogsearcheastussr.search.windows.net/$metadata#indexes/$entity",
     "@odata.etag": "\"0x8DB79721D17ED54\"",
     "name": "azure-fashion-demo",
     "defaultScoringProfile": null,
     "fields": [
          {
               "name": "idfile",
               "type": "Edm.String",
               "searchable": false,
               "filterable": false,
               "retrievable": true,
               "sortable": false,
               "facetable": false,
               "key": true,
               "indexAnalyzer": null,
               "searchAnalyzer": null,
               "analyzer": null,
               "normalizer": null,
               "synonymMaps": []
          },
          {
               "name": "imagefile",
               "type": "Edm.String",
               "searchable": true,
               "filterable": false,
               "retrievable": true,
               "sortable": false,
        

In [17]:
document_count, storage_size = index_stats(index_name)

Azure Cognitive Search index status for: azure-fashion-demo 

{
  "@odata.context": "https://azurecogsearcheastussr.search.windows.net/$metadata#Microsoft.Azure.Search.V2021_04_30_Preview.IndexStatistics",
  "documentCount": 10219,
  "storageSize": 155597306
}


In [18]:
print("Number of documents in the index =", f"{document_count:,}")
print("Size of the index =", round(storage_size / (1024 * 1024), 2), "MB")

Number of documents in the index = 10,219
Size of the index = 148.39 MB
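
The size conversion above divides the byte count by 1024 * 1024. As a sketch, the same arithmetic can be wrapped in a helper that also derives the average storage per document. `describe_index` is a hypothetical name, and the input figures are the ones returned by `index_stats` above.

```python
def describe_index(document_count, storage_size_bytes):
    """Summarize index statistics in MB and KB per document."""
    size_mb = storage_size_bytes / (1024 * 1024)
    if document_count:
        avg_kb = storage_size_bytes / document_count / 1024
    else:
        avg_kb = 0.0  # Avoid dividing by zero on an empty index
    return {
        "documents": document_count,
        "size_mb": round(size_mb, 2),
        "avg_kb_per_document": round(avg_kb, 2),
    }


summary = describe_index(10219, 155597306)
print(summary)
```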


## 5. Quick search on a document

In [19]:
search_client = SearchClient(acs_endpoint, index_name, AzureKeyCredential(acs_key))

In [22]:
text = "0627214001"
response = search_client.search(search_text=text)

for result in response:
    print("Id file:", result["idfile"])
    print("Filename:", result["imagefile"])

Id file: 3062
Filename: fashion/0627214001.jpg


In [23]:
text = "0628469001"
print("Let's query the index with text =", text, "\n")

response = search_client.search(search_text=text)

for result in response:
    print("Id file:", result["idfile"])
    print("Filename:", result["imagefile"])

Let's query the index with text = 0628469001 

Id file: 3267
Filename: fashion/0628469001.jpg


## 6. Let's delete some documents from the index

In [24]:
document_count, storage_size = index_stats(index_name)

Azure Cognitive Search index status for: azure-fashion-demo 

{
  "@odata.context": "https://azurecogsearcheastussr.search.windows.net/$metadata#Microsoft.Azure.Search.V2021_04_30_Preview.IndexStatistics",
  "documentCount": 10221,
  "storageSize": 155633480
}


In [25]:
document_id = "3062"
delete_acs_document_by_id(document_id)

Document with id 3062 will be deleted from index azure-fashion-demo
Done


In [29]:
document_count, storage_size = index_stats(index_name)

Azure Cognitive Search index status for: azure-fashion-demo 

{
  "@odata.context": "https://azurecogsearcheastussr.search.windows.net/$metadata#Microsoft.Azure.Search.V2021_04_30_Preview.IndexStatistics",
  "documentCount": 10220,
  "storageSize": 155618254
}


In [30]:
document_id = "3267"
delete_acs_document_by_id(document_id)

Document with id 3267 will be deleted from index azure-fashion-demo
Done


In [38]:
document_count, storage_size = index_stats(index_name)

Azure Cognitive Search index status for: azure-fashion-demo 

{
  "@odata.context": "https://azurecogsearcheastussr.search.windows.net/$metadata#Microsoft.Azure.Search.V2021_04_30_Preview.IndexStatistics",
  "documentCount": 10219,
  "storageSize": 155603130
}


> The document count dropped from 10,221 to 10,219: the 2 documents have been deleted from the index

## 7. Let's add some new images to the index

Let's use a new blob container that holds the new images to add

In [39]:
# Azure storage account
blob_connection_string = "tobereplaced"
container_name = "tobereplaced"

In [40]:
# Connect to Blob Storage
blob_service_client = BlobServiceClient.from_connection_string(blob_connection_string)
container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs()

first_blob = next(blobs)
blob_url = container_client.get_blob_client(first_blob).url
print(f"URL of the first blob: {blob_url}")

URL of the first blob: https://azurestorageaccountsr.blob.core.windows.net/fashion-images-new/shirt1.jpg


In [41]:
# Create a data source
ds_client = SearchIndexerClient(acs_endpoint, AzureKeyCredential(acs_key))
container = SearchIndexerDataContainer(name=container_name)
data_source_connection = SearchIndexerDataSourceConnection(
    name=f"{index_name}-blob",
    type="azureblob",
    connection_string=blob_connection_string,
    container=container,
)
data_source = ds_client.create_or_update_data_source_connection(data_source_connection)

print(f"Done. Data source '{data_source.name}' has been created or updated.")

Done. Data source 'azure-fashion-demo-blob' has been created or updated.


In [42]:
blob_service_client = BlobServiceClient.from_connection_string(blob_connection_string)
container_client = blob_service_client.get_container_client(container_name)
number_images = len(list(container_client.list_blobs()))

print("Total number of images =", number_images, "in blob:", container_name)

Total number of images = 2 in blob: fashion-images-new


In [43]:
EMBEDDINGS_DIR = "embeddings"

os.makedirs(EMBEDDINGS_DIR, exist_ok=True)

In [44]:
list_of_images = container_client.list_blobs()

new_images_list = []

for image in list_of_images:
    imagefile = image["name"]
    new_images_list.append(imagefile)

In [45]:
new_images_list

['shirt1.jpg', 'shirt2.jpg']

In [61]:
startindex = 100003

data = [
    {"idfile": str(i + startindex + 1), "imagefile": image}
    for i, image in enumerate(new_images_list)
]

with open(os.path.join(EMBEDDINGS_DIR, "list_of_images_new.json"), "w") as f:
    json.dump(data, f)
    
!ls $EMBEDDINGS_DIR/list_of_images_new.json -lh

-rwxrwxrwx 1 root root 98 Oct 18 13:55 embeddings/list_of_images_new.json


In [62]:
data

[{'idfile': '100004', 'imagefile': 'shirt1.jpg'},
 {'idfile': '100005', 'imagefile': 'shirt2.jpg'}]
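
The cell above hardcodes `startindex`. Because `idfile` is the index key and an upload acts as an insert-or-replace, reusing an existing id would overwrite that document. A minimal sketch of deriving the next free ids from a list of known ids is shown below. `assign_ids` is a hypothetical helper, and how the existing ids are collected (for example from a previous JSON file) is left as an assumption.

```python
def assign_ids(filenames, existing_ids):
    """Give each file a new numeric id above the highest existing one."""
    numeric_ids = [int(i) for i in existing_ids if str(i).isdigit()]
    next_id = max(numeric_ids, default=0) + 1
    return [
        {"idfile": str(next_id + offset), "imagefile": name}
        for offset, name in enumerate(filenames)
    ]


# Example ids are assumptions chosen to match the values used above
new_docs = assign_ids(["shirt1.jpg", "shirt2.jpg"], ["3062", "3267", "100003"])
print(new_docs)
```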

In [63]:
batch_size = 1

start = time.time()
print("Running the image files embeddings...")
print("Total number of images to embed =", len(new_images_list), "\n")

with open(
    os.path.join(EMBEDDINGS_DIR, "list_of_images_new.json"), "r", encoding="utf-8"
) as file:
    input_data = json.load(file)

image_count = len(input_data)
processed_count = 0

for batch_start in range(0, image_count, batch_size):
    batch_end = min(batch_start + batch_size, image_count)
    batch_data = input_data[batch_start:batch_end]

    for idx, item in enumerate(batch_data, start=batch_start + 1):
        imgindex = item["idfile"]
        imgfile = item["imagefile"]
        item["imagevector"] = image_embedding(imgfile)

        if idx % batch_size == 1:
            pctdone = round(idx / image_count * 100)
            dt = datetime.datetime.today().strftime("%d-%b-%Y %H:%M:%S")
            print(
                dt,
                f"Number of processed image files = {idx:06} of {image_count:06} | Done: {pctdone}%",
            )

    processed_count += len(batch_data)

elapsed = time.time() - start
print("\nDone")
print(
    "\nElapsed time: "
    + time.strftime(
        "%H:%M:%S.{}".format(str(elapsed % 1)[2:])[:15], time.gmtime(elapsed)
    )
)
print("Time per image =", round(elapsed / processed_count, 5), "seconds")

Running the image files embeddings...
Total number of images to embed = 2 


Done

Elapsed time: 00:00:01.166330
Time per image = 0.58317 seconds
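
With `batch_size = 1`, the condition `idx % batch_size == 1` in the cell above is never true (any integer modulo 1 is 0), which is why the run prints no progress lines. The sketch below shows one way to report progress after every batch instead. The names `iter_batches`, `embed_all` and `fake_embedding` are illustrative assumptions; in the notebook, `image_embedding` would be passed as `embed_fn`.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(items, embed_fn, batch_size=1):
    """Attach an 'imagevector' to each item and print progress per batch."""
    total = len(items)
    done = 0
    for batch in iter_batches(items, batch_size):
        for item in batch:
            item["imagevector"] = embed_fn(item["imagefile"])
        done += len(batch)
        pct = round(done / total * 100)
        print(f"Processed {done:06} of {total:06} | Done: {pct}%")
    return items


def fake_embedding(imagefile):
    """Stand-in for image_embedding so the sketch runs without Azure access."""
    return [float(len(imagefile)), 0.0, 1.0]


sample = [
    {"idfile": "100004", "imagefile": "shirt1.jpg"},
    {"idfile": "100005", "imagefile": "shirt2.jpg"},
]
embedded = embed_all(sample, fake_embedding, batch_size=1)
```

Counting processed items per batch avoids the modulo test entirely, so the report works for any batch size, including 1.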


In [64]:
# Save embeddings to the documents_new.json file
start = time.time()

print("Saving the results into a json file...")
with open(os.path.join(EMBEDDINGS_DIR, "documents_new.json"), "w") as f:
    json.dump(input_data, f)

print("Done. Elapsed time:", round(time.time() - start, 2), "secs")

Saving the results into a json file...
Done. Elapsed time: 0.05 secs


In [65]:
with open(os.path.join(EMBEDDINGS_DIR, "documents_new.json"), "r") as file:
    documents = json.load(file)

print("Size of the documents to load =", len(documents))

Size of the documents to load = 2
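
`image_embedding` returns `None` when a request fails, so the JSON file may contain entries without a usable vector. A minimal sketch of separating those entries before uploading is shown below. `split_by_embedding` is a hypothetical helper and the short vectors are toy values.

```python
def split_by_embedding(documents):
    """Separate documents that have a vector from those whose embedding failed."""
    ready = [doc for doc in documents if doc.get("imagevector")]
    failed = [doc["imagefile"] for doc in documents if not doc.get("imagevector")]
    return ready, failed


# Toy documents: the second one simulates a failed embedding call
docs = [
    {"idfile": "100004", "imagefile": "shirt1.jpg", "imagevector": [0.1, 0.2]},
    {"idfile": "100005", "imagefile": "shirt2.jpg", "imagevector": None},
]
ready, failed = split_by_embedding(docs)
print("Ready to upload:", len(ready), "| failed:", failed)
```

Only the `ready` list would then be passed to the upload step, and the `failed` file names can be retried later.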


In [66]:
def loading_documents(documents):
    """
    Loading documents into the Azure Cognitive Search index
    """
    # Upload some documents to the index
    print("Uploading the documents into the index", index_name, "...")

    # Setting the Azure Cognitive Search client
    search_client = SearchClient(
        endpoint=acs_endpoint,
        index_name=index_name,
        credential=AzureKeyCredential(acs_key),
    )
    response = search_client.upload_documents(documents)
    print(
        f"\nDone. Uploaded {len(documents)} documents into the Azure Cognitive Search index.\n"
    )
    return len(documents)
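
`loading_documents` ignores the value returned by `upload_documents` and reports `len(documents)` regardless of the outcome. To my understanding, the SDK returns one result object per document exposing `key`, `succeeded` and `error_message`; this should be checked against the installed SDK version. The sketch below summarizes such results. `summarize_indexing_results` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs without Azure access.

```python
from types import SimpleNamespace


def summarize_indexing_results(results):
    """Count successes and collect (key, error_message) for failed documents."""
    failures = [(r.key, r.error_message) for r in results if not r.succeeded]
    return {"succeeded": len(results) - len(failures), "failed": failures}


# Stand-ins mimicking the attributes assumed on the SDK result objects
fake_results = [
    SimpleNamespace(key="100004", succeeded=True, error_message=None),
    SimpleNamespace(key="100005", succeeded=False, error_message="example failure"),
]
summary = summarize_indexing_results(fake_results)
print(summary)
```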

In [67]:
loading_documents(documents)

Uploading the documents into the index azure-fashion-demo ...

Done. Uploaded 2 documents into the Azure Cognitive Search index.



2

In [73]:
document_count, storage_size = index_stats(index_name)

Azure Cognitive Search index status for: azure-fashion-demo 

{
  "@odata.context": "https://azurecogsearcheastussr.search.windows.net/$metadata#Microsoft.Azure.Search.V2021_04_30_Preview.IndexStatistics",
  "documentCount": 10221,
  "storageSize": 155639272
}


> The index now contains 10,221 documents: the 2 new images have been added

## 8. Post processing

In [56]:
index_status(index_name)

Azure Cognitive Search Index: azure-fashion-demo 

{
     "@odata.context": "https://azurecogsearcheastussr.search.windows.net/$metadata#indexes/$entity",
     "@odata.etag": "\"0x8DB79721D17ED54\"",
     "name": "azure-fashion-demo",
     "defaultScoringProfile": null,
     "fields": [
          {
               "name": "idfile",
               "type": "Edm.String",
               "searchable": false,
               "filterable": false,
               "retrievable": true,
               "sortable": false,
               "facetable": false,
               "key": true,
               "indexAnalyzer": null,
               "searchAnalyzer": null,
               "analyzer": null,
               "normalizer": null,
               "synonymMaps": []
          },
          {
               "name": "imagefile",
               "type": "Edm.String",
               "searchable": true,
               "filterable": false,
               "retrievable": true,
               "sortable": false,
        

In [None]:
#delete_index(index_name)