# Vector Search using Azure Cognitive Search, Azure OpenAI, and C#

## Vector Databases
A vector database stores data as high-dimensional vectors. These vectors can represent a wide range of information, such as numerical features, embeddings of text or images, and even complex data like molecular structures. Queries are typically answered from k-nearest neighbor (k-NN) indexes built with algorithms such as Hierarchical Navigable Small World (HNSW) and Inverted File Index (IVF). Beyond the index itself, vector databases provide capabilities like data management, fault tolerance, authentication and access control, and a query engine.
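
To make the k-NN idea concrete, here is a minimal sketch of an exact (brute-force) nearest-neighbor lookup using cosine similarity. It is not how Azure Cognitive Search works internally: HNSW and IVF exist precisely to avoid scanning every vector. The helper names `CosineSimilarity` and `NearestNeighbors` and the three-dimensional toy vectors are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two vectors of equal length.
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimension.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Exact k-NN: score every stored vector against the query and keep the k best.
List<(string Id, double Score)> NearestNeighbors(
    IReadOnlyDictionary<string, float[]> vectors, float[] queryVector, int k)
{
    return vectors
        .Select(kv => (Id: kv.Key, Score: CosineSimilarity(kv.Value, queryVector)))
        .OrderByDescending(x => x.Score)
        .Take(k)
        .ToList();
}

// Toy 3-dimensional "embeddings" (hypothetical values, for illustration only).
var toyVectors = new Dictionary<string, float[]>
{
    ["doc-a"] = new[] { 1f, 0f, 0f },
    ["doc-b"] = new[] { 0.9f, 0.1f, 0f },
    ["doc-c"] = new[] { 0f, 1f, 0f },
};

var neighbors = NearestNeighbors(toyVectors, new[] { 1f, 0f, 0f }, 2);
foreach (var (id, score) in neighbors)
{
    Console.WriteLine($"{id}: {score}");
}
```

The scan above costs one similarity computation per stored vector, which is why approximate indexes become attractive as the collection grows.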

### What is Azure Cognitive Search?
Azure Cognitive Search (formerly known as “Azure Search”) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications. It integrates with other Azure services in the form of indexers that automate data ingestion/retrieval from Azure data sources, and skillsets that incorporate consumable AI from Azure AI services.

## Initialise the following
- Service Endpoints and Keys
- NuGet packages and references
- C# Client objects

In [1]:
#!set --name serviceEndpoint --value https://XXXXX.search.windows.net
#!set --name indexName --value azugmnl-vector-azure-XXXXX
#!set --name key --value XXXXX
#!set --name openaiEndpoint --value https://XXXXX.openai.azure.com/
#!set --name openaiApiKey --value XXXXX
#!set --name TextEmbeddingModel --value text-embedding-ada-002
#!set --name SearchConfigName --value my-search-config
#!set --name SemanticSearchConfigName --value my-semantic-search-config
#!set --name VectorSearchConfigName --value my-vector-search-config
#!set --name ResultSize --value 5


In [2]:
#r "nuget:Azure.AI.OpenAI,1.0.0-beta.5"
#r "nuget:Azure.Identity,1.9.0"
#r "nuget:Azure.Search.Documents,11.5.0-beta.4"

using System.Text.Json;
using Azure;
using Azure.AI.OpenAI;
using Azure.Search.Documents;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;
using Azure.Search.Documents.Models;

In [3]:
// Initialize OpenAI client  
var credential = new AzureKeyCredential(openaiApiKey);
var openAIClient = new OpenAIClient(new Uri(openaiEndpoint), credential);

// Initialize Azure Cognitive Search clients  
var searchCredential = new AzureKeyCredential(key);
var indexClient = new SearchIndexClient(new Uri(serviceEndpoint), searchCredential);
var searchClient = indexClient.GetSearchClient(indexName);

## Create the search index

In [4]:
#!value --from-file azure.json --name inputJson

In [5]:
#!share inputJson --from value

SearchIndex searchIndex = new(indexName)
{
    VectorSearch = new()
    {
        AlgorithmConfigurations =
        {
            new HnswVectorSearchAlgorithmConfiguration(VectorSearchConfigName)
        }
    },
    SemanticSettings = new()
    {

        Configurations =
        {
            new SemanticConfiguration(SemanticSearchConfigName, new()
            {
                TitleField = new(){ FieldName = "title" },
                ContentFields =
                {
                    new() { FieldName = "content" }
                },
                KeywordFields =
                {
                    new() { FieldName = "category" }
                }

            })

        },
    },
    Fields =
    {
        new SimpleField("id", SearchFieldDataType.String) { IsKey = true, IsFilterable = true, IsSortable = true, IsFacetable = true },
        new SearchableField("title") { IsFilterable = true, IsSortable = true },
        new SearchableField("content") { IsFilterable = true },
        new SearchField("titleVector", SearchFieldDataType.Collection(SearchFieldDataType.Single))
        {
            IsSearchable = true,
            VectorSearchDimensions = 1536,
            VectorSearchConfiguration = VectorSearchConfigName
        },
        new SearchField("contentVector", SearchFieldDataType.Collection(SearchFieldDataType.Single))
        {
            IsSearchable = true,
            VectorSearchDimensions = 1536,
            VectorSearchConfiguration = VectorSearchConfigName
        },
        new SearchableField("category") { IsFilterable = true, IsSortable = true, IsFacetable = true }
    }
};

indexClient.CreateOrUpdateIndex(searchIndex);

var inputDocuments = JsonSerializer.Deserialize<List<Dictionary<string, object>>>(inputJson) ?? new List<Dictionary<string, object>>();

var sampleDocuments = await GetSampleDocumentsAsync(openAIClient, inputDocuments);
await searchClient.IndexDocumentsAsync(IndexDocumentsBatch.Upload(sampleDocuments));

async Task<List<SearchDocument>> GetSampleDocumentsAsync(OpenAIClient openAIClient, List<Dictionary<string, object>> inputDocuments)
{
    List<SearchDocument> sampleDocuments = new List<SearchDocument>();

    foreach (var document in inputDocuments)
    {
        // Only the content is embedded here; every query below targets contentVector.
        string content = document["content"]?.ToString() ?? string.Empty;

        // One embeddings call per document.
        float[] contentEmbeddings = (await GenerateEmbeddings(content, openAIClient)).ToArray();

        document["contentVector"] = contentEmbeddings;
        sampleDocuments.Add(new SearchDocument(document));
    }

    return sampleDocuments;
}

async Task<IReadOnlyList<float>> GenerateEmbeddings(string text, OpenAIClient openAIClient)
{
    var response = await openAIClient.GetEmbeddingsAsync(TextEmbeddingModel, new EmbeddingsOptions(text));
    return response.Value.Data[0].Embedding;
}

## Single Vector Search

This demo runs a pure vector query against Azure Cognitive Search. The query text is converted to an embedding, and the service returns the documents whose contentVector field is closest to it, so matches are based on semantic similarity rather than on shared keywords.

In [10]:
string query = "Find services similar to Azure Data Factory";

// Generate the embedding for the query
var queryEmbeddings = await GenerateEmbeddings(query, openAIClient);

// Perform the vector similarity search  
var searchOptions = new SearchOptions
{
    Vectors = { new() { Value = queryEmbeddings.ToArray(), KNearestNeighborsCount = Int32.Parse(ResultSize), Fields = { "contentVector" } } },
    Size = Int32.Parse(ResultSize),
    Select = { "title", "content", "category" },
};

SearchResults<SearchDocument> response = await searchClient.SearchAsync<SearchDocument>(null, searchOptions);

int count = 0;
await foreach (SearchResult<SearchDocument> result in response.GetResultsAsync())
{
    count++;
    Console.WriteLine($"Title: {result.Document["title"]}");
    Console.WriteLine($"Score: {result.Score}\n");
    Console.WriteLine($"Content: {result.Document["content"]}");
    Console.WriteLine($"Category: {result.Document["category"]}\n");
}
Console.WriteLine($"Total Results: {count}");

async Task<IReadOnlyList<float>> GenerateEmbeddings(string text, OpenAIClient openAIClient)
{
    var response = await openAIClient.GetEmbeddingsAsync(TextEmbeddingModel, new EmbeddingsOptions(text));
    return response.Value.Data[0].Embedding;
}

Title: Azure Data Factory
Score: 0.8826109

Content: Azure Data Factory is a cloud-based data integration service that allows you to create, schedule, and manage data workflows. It supports a wide range of sources and destinations, such as Azure Blob Storage, Azure SQL Database, and on-premises file systems. Data Factory provides a visual interface for designing and monitoring your workflows, as well as support for popular programming languages like Python and .NET. You can use Data Factory to move and transform data, orchestrate complex data processing tasks, and integrate with other Azure services.
Category: Analytics

Title: Azure Data Factory
Score: 0.8815417

Content: Azure Data Factory is a cloud-based data integration service that enables you to create, schedule, and manage your data workflows. It provides features like data movement, data transformation, and integration with Azure Machine Learning. Data Factory supports various data sources, such as Azure Blob Storage, Azure Da

## Vector Search with Filter

This demo extends the Single Vector Search by adding filters. It shows how you can combine traditional search capabilities with vector search to refine results. For example, you might search for semantically similar items within a certain category or items that meet specific criteria.
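
The filter is an OData expression, and only fields marked as filterable in the index (such as category) can appear in it. When the filter value comes from user input, it helps to build the expression with a small helper rather than by hand. The sketch below is one way to do that in plain C#; `EqualsFilter` and `AnyOf` are hypothetical helper names, not part of the SDK. The Azure.Search.Documents package also ships a `SearchFilter.Create` helper that serves a similar purpose.

```csharp
using System;

// Builds "fieldName eq 'value'" for a string field. OData escapes a single
// quote inside a string literal by doubling it.
string EqualsFilter(string fieldName, string value)
{
    string escaped = value.Replace("'", "''");
    return $"{fieldName} eq '{escaped}'";
}

// Joins several sub-expressions with "or", parenthesizing each one.
string AnyOf(params string[] filters)
{
    return string.Join(" or ", Array.ConvertAll(filters, f => $"({f})"));
}

string databasesOnly = EqualsFilter("category", "Databases");
string databasesOrAnalytics = AnyOf(
    EqualsFilter("category", "Databases"),
    EqualsFilter("category", "Analytics"));

Console.WriteLine(databasesOnly);        // category eq 'Databases'
Console.WriteLine(databasesOrAnalytics); // (category eq 'Databases') or (category eq 'Analytics')
```

Either string can be assigned to the Filter property of SearchOptions in the cell that follows.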

In [11]:
string query = "Find services similar to Azure Data Factory";
string filter = "category eq 'Databases'";

// Generate the embedding for the query  
var queryEmbeddings = await GenerateEmbeddings(query, openAIClient);

// Perform the vector similarity search  
var searchOptions = new SearchOptions
{
    Vectors = { new() { Value = queryEmbeddings.ToArray(), KNearestNeighborsCount = Int32.Parse(ResultSize), Fields = { "contentVector" } } },
    Filter = filter,
    Select = { "title", "content", "category" },
    Size = Int32.Parse(ResultSize)
};

SearchResults<SearchDocument> response = await searchClient.SearchAsync<SearchDocument>(null, searchOptions);

int count = 0;
await foreach (SearchResult<SearchDocument> result in response.GetResultsAsync())
{
    count++;
    Console.WriteLine($"Title: {result.Document["title"]}");
    Console.WriteLine($"Score: {result.Score}\n");
    Console.WriteLine($"Content: {result.Document["content"]}");
    Console.WriteLine($"Category: {result.Document["category"]}\n");

}
Console.WriteLine($"Total Results: {count}");

            
async Task<IReadOnlyList<float>> GenerateEmbeddings(string text, OpenAIClient openAIClient)
{
    var response = await openAIClient.GetEmbeddingsAsync(TextEmbeddingModel, new EmbeddingsOptions(text));
    return response.Value.Data[0].Embedding;
}

Title: Azure SQL Data Warehouse
Score: 0.84135795

Content: Azure SQL Data Warehouse is a fully managed, petabyte-scale cloud data warehouse service that enables you to store and analyze your structured and semi-structured data. It provides features like automatic scaling, data movement, and integration with Azure Machine Learning. SQL Data Warehouse supports various data sources, such as Azure Blob Storage, Azure Data Lake Storage, and Azure SQL Database. You can use Azure SQL Data Warehouse to build data lakes, develop big data analytics solutions, and ensure the performance and security of your data. It also integrates with other Azure services, such as Azure Synapse Analytics and Azure Data Factory.
Category: Databases

Title: Azure Database Migration Service
Score: 0.8318007

Content: Azure Database Migration Service is a fully managed, end-to-end migration service that enables you to migrate your databases to Azure with minimal downtime. It supports various source and target plat

## Simple Hybrid Search

This demo illustrates the combination of traditional keyword-based search with vector search. It demonstrates how Azure Cognitive Search can return results that not only match the keywords but also have high semantic similarity to the query, providing a more nuanced and context-aware search experience.

In [12]:
string query = "Find services related to ‘data warehousing’ that are similar to Azure Synapse Analytics";

// Generate the embedding for the query
var queryEmbeddings = await GenerateEmbeddings(query, openAIClient);

// Configure the vector part of the hybrid query; the keyword part is the query text passed to SearchAsync below
var searchOptions = new SearchOptions
{
    Vectors = { new() { Value = queryEmbeddings.ToArray(), KNearestNeighborsCount = Int32.Parse(ResultSize), Fields = { "contentVector" } } },
    Size = Int32.Parse(ResultSize),
    Select = { "title", "content", "category" },
};


SearchResults<SearchDocument> response = await searchClient.SearchAsync<SearchDocument>(query, searchOptions);

int count = 0;
await foreach (SearchResult<SearchDocument> result in response.GetResultsAsync())
{
    count++;
    Console.WriteLine($"Title: {result.Document["title"]}");
    Console.WriteLine($"Score: {result.Score}\n");
    Console.WriteLine($"Content: {result.Document["content"]}");
    Console.WriteLine($"Category: {result.Document["category"]}\n");
}
Console.WriteLine($"Total Results: {count}");

async Task<IReadOnlyList<float>> GenerateEmbeddings(string text, OpenAIClient openAIClient)
{
    var response = await openAIClient.GetEmbeddingsAsync(TextEmbeddingModel, new EmbeddingsOptions(text));
    return response.Value.Data[0].Embedding;
}

Title: Azure Synapse Analytics
Score: 0.03333333507180214

Content: Azure Synapse Analytics is an integrated analytics service that brings together big data and data warehousing. It enables you to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. Synapse Analytics provides a unified workspace for data engineers, data scientists, and business analysts to collaborate and build solutions. It supports various data sources, including Azure Data Lake Storage, Azure Blob Storage, and Azure Cosmos DB. You can use Synapse Analytics with other Azure services, such as Azure Machine Learning and Power BI.
Category: Analytics

Title: Azure Data Lake Analytics
Score: 0.03226646035909653

Content: Azure Data Lake Analytics is an on-demand, cloud-based analytics service that enables you to process and analyze big data. It provides features like job scheduling, parallel processing, and built-in analytics functions. Data Lake Analytics supports vario
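
The hybrid scores above are much smaller than the cosine-style scores in the pure vector demos because Azure Cognitive Search merges the keyword ranking and the vector ranking with Reciprocal Rank Fusion (RRF), where each ranking contributes 1 / (k + rank) to a document's score. The sketch below illustrates the idea. The constant k = 60, the zero-based ranks, and the two input rankings are assumptions: they were chosen because they reproduce the two printed scores, while the actual per-ranking positions are not visible in the output.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Reciprocal Rank Fusion: every ranked list adds 1 / (k + rank) to a document's score.
// k = 60 and zero-based ranks are assumptions inferred from the printed scores.
Dictionary<string, double> FuseRankings(IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60)
{
    var fused = new Dictionary<string, double>();
    foreach (var ranking in rankedLists)
    {
        for (int rank = 0; rank < ranking.Count; rank++)
        {
            fused.TryGetValue(ranking[rank], out double current); // 0 when not seen yet
            fused[ranking[rank]] = current + 1.0 / (k + rank);
        }
    }
    return fused;
}

// Hypothetical rankings for the same query (illustrative only).
var keywordRanking = new List<string>
{
    "Azure Synapse Analytics", "Azure SQL Data Warehouse", "Azure Databricks", "Azure Data Lake Analytics"
};
var vectorRanking = new List<string>
{
    "Azure Synapse Analytics", "Azure Data Lake Analytics", "Azure HDInsight"
};

IReadOnlyList<string>[] bothRankings = { keywordRanking, vectorRanking };
var fusedScores = FuseRankings(bothRankings);

foreach (var entry in fusedScores.OrderByDescending(e => e.Value).Take(2))
{
    Console.WriteLine($"{entry.Key}: {entry.Value.ToString("F6", CultureInfo.InvariantCulture)}");
}
// Azure Synapse Analytics: 0.033333   (1/60 + 1/60)
// Azure Data Lake Analytics: 0.032266 (1/63 + 1/61)
```

Because the fused score depends only on positions, it rewards documents that rank well in both lists and is not comparable to the similarity scores returned by a vector-only query.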

## Semantic Hybrid Search

This demo takes the hybrid search approach a step further by adding semantic ranking. Azure Cognitive Search re-ranks the hybrid results with language-understanding models and can return extractive captions and answers, so results can be semantically relevant even if they don’t exactly match the query terms.

### What is Semantic Search?

Semantic search is an information retrieval approach that modern search engines use to return the most relevant results. It focuses on the meaning behind a search query rather than on literal keyword matching. The term comes from semantics, the branch of linguistics concerned with the study of meaning. In practice, semantic search relies on modeling the relationships between concepts and entities.

In [13]:
string query = "Find services that are used for ‘big data processing’ and are similar to Azure Databricks.";

try
{
    // Generate the embedding for the query    
    var queryEmbeddings = await GenerateEmbeddings(query, openAIClient);

    // Configure the hybrid query: vector search on contentVector plus semantic ranking, captions, and answers
    var searchOptions = new SearchOptions
    {
        Vectors = { new() { Value = queryEmbeddings.ToArray(), KNearestNeighborsCount = Int32.Parse(ResultSize), Fields = { "contentVector" } } },
        Size = Int32.Parse(ResultSize),
        QueryType = SearchQueryType.Semantic,
        QueryLanguage = QueryLanguage.EnUs,
        SemanticConfigurationName = SemanticSearchConfigName,
        QueryCaption = QueryCaptionType.Extractive,
        QueryAnswer = QueryAnswerType.Extractive,
        QueryCaptionHighlightEnabled = true,
        Select = { "title", "content", "category" },
    };

    SearchResults<SearchDocument> response = await searchClient.SearchAsync<SearchDocument>(query, searchOptions);

    int count = 0;
    Console.WriteLine("Semantic Hybrid Search Results:\n");

    Console.WriteLine("Query Answer:");
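    // Guard against a null Answers collection so a missing answer does not abort the whole cell
    foreach (AnswerResult result in response.Answers ?? new List<AnswerResult>())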
    {
        Console.WriteLine($"Answer Highlights: {result.Highlights}");
        Console.WriteLine($"Answer Text: {result.Text}\n");
    }

    await foreach (SearchResult<SearchDocument> result in response.GetResultsAsync())
    {
        count++;
        Console.WriteLine($"Title: {result.Document["title"]}");
        Console.WriteLine($"Score: {result.Score}\n");
        Console.WriteLine($"Content: {result.Document["content"]}");
        Console.WriteLine($"Category: {result.Document["category"]}\n");

        if (result.Captions != null)
        {
            var caption = result.Captions.FirstOrDefault();
            if (caption != null)
            {
                if (!string.IsNullOrEmpty(caption.Highlights))
                {
                    Console.WriteLine($"Caption Highlights: {caption.Highlights}\n");
                }
                else
                {
                    Console.WriteLine($"Caption Text: {caption.Text}\n");
                }
            }
        }
    }
    Console.WriteLine($"Total Results: {count}");

    async Task<IReadOnlyList<float>> GenerateEmbeddings(string text, OpenAIClient openAIClient)
    {
        var response = await openAIClient.GetEmbeddingsAsync(TextEmbeddingModel, new EmbeddingsOptions(text));
        return response.Value.Data[0].Embedding;
    }
}
catch (NullReferenceException)
{
    Console.WriteLine("Total Results: 0");
}

Semantic Hybrid Search Results:

Query Answer:
Title: Azure HDInsight
Score: 0.031054403632879257

Content: Azure HDInsight is a fully managed, open-source analytics service for processing big data workloads. It provides popular open-source frameworks, such as Apache Hadoop, Apache Spark, Apache Kafka, and Apache HBase. HDInsight supports various data sources, such as Azure Blob Storage, Azure Data Lake Storage, and Azure Cosmos DB. You can use HDInsight to analyze and process large volumes of data, build real-time analytics solutions, and develop machine learning models. It also integrates with other Azure services, such as Azure Synapse Analytics and Azure Machine Learning.
Category: Analytics

Caption Highlights: <em>Azure HDInsight</em> is a fully managed, open-source analytics service for processing big data workloads. It provides popular open-source frameworks, such as Apache Hadoop, Apache Spark, Apache Kafka, and Apache HBase.<em> HDInsight</em> supports various data sources, s

## Other resources

Here are some more places to explore:
- https://learn.microsoft.com/en-us/azure/search/vector-search-overview
- https://learn.microsoft.com/en-us/azure/search/semantic-search-overview
- https://github.com/Azure/cognitive-search-vector-pr