# Azure AI Search

[Azure AI Search](https://learn.microsoft.com/azure/search/search-what-is-azure-search) (formerly known as `Azure Cognitive Search`) is a Microsoft cloud search service that gives developers infrastructure, APIs, and tools for running vector, keyword, and hybrid queries at scale.

`AzureAISearchRetriever` is an integration module that returns documents for an unstructured query. It's based on the `BaseRetriever` class and targets the 2023-11-01 stable REST API version of Azure AI Search, which means it supports vector indexing and queries.

To use this module, you need:

+ An Azure AI Search service. You can [create one](https://learn.microsoft.com/azure/search/search-create-service-portal) for free if you sign up for the Azure trial. A free service has lower quotas, but it's sufficient for running the code in this notebook.

+ An existing index with vector fields. There are several ways to create one, including using the [vector store module](../vectorstores/azuresearch.ipynb). Or, [try the Azure AI Search REST APIs](https://learn.microsoft.com/azure/search/search-get-started-vector).

+ An API key. API keys are generated when you create the search service. If you're only querying an index, you can use the query API key; otherwise, use an admin API key. See [Find your API keys](https://learn.microsoft.com/azure/search/search-security-api-keys?tabs=rest-use%2Cportal-find%2Cportal-query#find-existing-keys) for details.

`AzureAISearchRetriever` replaces `AzureCognitiveSearchRetriever`, which will soon be deprecated. We recommend switching to the newer version that's based on the most recent stable version of the search APIs.

## Install packages

Use the `azure-search-documents` package, version 11.4 or later.

In [None]:
%pip install --upgrade --quiet langchain
%pip install --upgrade --quiet langchain-openai
%pip install --upgrade --quiet azure-search-documents
%pip install --upgrade --quiet azure-identity
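
To confirm that the installed `azure-search-documents` package meets the 11.4 minimum, a quick check along these lines may help. This is a minimal sketch using only the standard library; `installed_version` and `meets_minimum` are hypothetical helper names, and the comparison looks only at the leading numeric components, so it does not distinguish pre-release builds.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="azure-search-documents"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def meets_minimum(version_string, minimum=(11, 4)):
    """Compare the leading numeric parts of a version string with a minimum.

    Simplification (assumption): suffixes such as 'b1' are ignored.
    """
    parts = re.findall(r"\d+", version_string)[: len(minimum)]
    return tuple(int(p) for p in parts) >= minimum


found = installed_version()
if found is None:
    print("azure-search-documents is not installed")
else:
    print(found, "OK" if meets_minimum(found) else "too old, please upgrade")
```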

## Import required libraries

In [None]:
import os

from langchain_community.retrievers import (
    AzureAISearchRetriever,
)

## Configure search settings

To use `AzureAISearchRetriever`, set the search service name and index name as environment variables, or pass them as arguments to `AzureAISearchRetriever`. The search index provides the searchable content. You can set the environment variables as follows.

In [None]:
os.environ["AZURE_AI_SEARCH_SERVICE_NAME"] = "<YOUR_SEARCH_SERVICE_NAME>"
os.environ["AZURE_AI_SEARCH_INDEX_NAME"] = "<YOUR_SEARCH_INDEX_NAME>"

You will also need to either set an API key or, if you use keyless authentication, provide a token provider. Both options are demonstrated below.
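
Before creating the retriever, it can be useful to fail early when a setting is missing. The sketch below checks the two variable names used in this notebook; `missing_settings` is a hypothetical helper, not part of LangChain or the Azure SDK.

```python
import os

# Variable names taken from this notebook; adjust if your setup differs.
REQUIRED_SETTINGS = ("AZURE_AI_SEARCH_SERVICE_NAME", "AZURE_AI_SEARCH_INDEX_NAME")


def missing_settings(env=None, required=REQUIRED_SETTINGS):
    """Return the required names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping, so the real environment is not touched.
example_env = {"AZURE_AI_SEARCH_SERVICE_NAME": "my-service"}
print(missing_settings(example_env))
```

Calling `missing_settings()` with no arguments inspects `os.environ` instead of the example mapping.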

## Keyless Authentication 

For better security, we recommend keyless authentication with Microsoft Entra ID. For the retriever to use a managed identity, do not set or pass an API key. Instead, sign in with the Azure Developer CLI (`azd`) and use your default credentials to create a token provider, which you pass to the retriever through the `azure_ad_token_provider` parameter. You also need the correct permissions assigned. Read more about using managed identities with Azure [here](https://learn.microsoft.com/en-us/azure/search/search-howto-managed-identities-data-sources?tabs=portal-sys%2Cportal-user).
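
Because the API key and the token provider should not be mixed, one option is to decide on the authentication mode in a single place. The following is a minimal sketch: `build_auth_kwargs` is a hypothetical helper, and only the keyword names `api_key` and `azure_ad_token_provider` come from this notebook.

```python
from typing import Any, Callable, Dict, Optional


def build_auth_kwargs(
    api_key: Optional[str] = None,
    token_provider: Optional[Callable[[], str]] = None,
) -> Dict[str, Any]:
    """Pick exactly one authentication mode (hypothetical helper)."""
    if api_key and token_provider is not None:
        raise ValueError("Pass either an API key or a token provider, not both.")
    if token_provider is not None:
        return {"azure_ad_token_provider": token_provider}
    if api_key:
        return {"api_key": api_key}
    raise ValueError("No credentials supplied.")


def fake_provider() -> str:
    # Stand-in for illustration only; a real provider would come from
    # azure.identity.get_bearer_token_provider.
    return "not-a-real-token"


print(sorted(build_auth_kwargs(token_provider=fake_provider)))
```

The returned dictionary could then be unpacked into the retriever constructor with `**`, alongside the other arguments.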

## Create the retriever

For `AzureAISearchRetriever`, provide an `index_name`, a `content_key`, and `top_k` set to the number of results you'd like to retrieve. Setting `top_k` to zero (the default) returns all results.
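
To see how these parameters relate to the underlying service, the sketch below builds a Search Documents GET URL in the shape documented for the Azure AI Search REST API. Whether the retriever constructs exactly this string internally is an assumption; `build_search_url` is a hypothetical helper, and the sketch simply omits `$top` when `top_k` is zero.

```python
from urllib.parse import quote, urlencode

API_VERSION = "2023-11-01"  # the REST API version this retriever targets


def build_search_url(service_name: str, index_name: str, query: str, top_k: int = 0) -> str:
    """Build a Search Documents GET URL (illustrative, hypothetical helper)."""
    params = {"api-version": API_VERSION, "search": query}
    if top_k:
        # Only send $top when an explicit limit was requested.
        params["$top"] = str(top_k)
    # Keep '$' readable and encode spaces as %20 rather than '+'.
    query_string = urlencode(params, safe="$", quote_via=quote)
    return (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs?{query_string}"
    )


print(build_search_url("my-service", "langchain-vector-demo", "hello world", top_k=1))
```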

### With an API Key 

In [23]:
retriever = AzureAISearchRetriever(
    content_key="content",
    top_k=1,
    service_name=os.getenv("AZURE_SEARCH_ENDPOINT"),
    api_key=os.getenv("AZURE_SEARCH_API_KEY"),
    index_name="langchain-vector-demo",
)

### With Keyless Authentication 

In [None]:
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

bearer_token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://search.azure.com/.default"
)

retriever = AzureAISearchRetriever(
    content_key="content",
    top_k=1,
    service_name=os.getenv("AZURE_SEARCH_ENDPOINT"),
    azure_ad_token_provider=bearer_token_provider,
    index_name="langchain-vector-demo",
)

Now you can use the retriever to fetch documents from Azure AI Search. Call `invoke` with your query string; it returns the documents most relevant to the query, limited by `top_k`.

In [None]:
retriever.invoke("here is my unstructured query string")
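
The call returns a list of document objects, each exposing `page_content` and `metadata`. As a rough sketch of how the results might be inspected, the hypothetical helper `preview_results` below prints a one-line preview per hit. It uses `SimpleNamespace` stand-ins so it can run without a search service; with a real retriever you would pass the list returned by `invoke`.

```python
from types import SimpleNamespace


def preview_results(docs, width: int = 60):
    """Return one short preview line per document (hypothetical helper)."""
    lines = []
    for rank, doc in enumerate(docs, start=1):
        text = " ".join(doc.page_content.split())  # collapse whitespace
        if len(text) > width:
            text = text[: width - 3] + "..."
        lines.append(f"{rank}. {text}")
    return lines


# Stand-ins that mimic the attributes of a retrieved document.
sample = [SimpleNamespace(page_content="First   hit\nabout vectors", metadata={})]
for line in preview_results(sample):
    print(line)
```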

## Example 

This section demonstrates using the retriever over built-in sample data. You can skip this step if you already have a vector index on your search service.

Start by providing the endpoints and keys. Since we're creating a vector index in this step, specify a text embedding model to get a vector representation of the text. This example assumes Azure OpenAI with a deployment of text-embedding-ada-002. Because this step creates an index, be sure to use an admin API key for your search service.

In [None]:
import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from dotenv import load_dotenv
from langchain_community.retrievers import AzureAISearchRetriever
from langchain_community.vectorstores import AzureSearch
from langchain_openai import AzureOpenAIEmbeddings

load_dotenv()

### Create Azure Search vector store 

We'll use an embedding model from Azure OpenAI to turn our documents into embeddings stored in the Azure AI Search vector store. We'll also set the index name to `langchain-vector-demo`. This will create a new vector store associated with that index name. We can create our embeddings and vector store either with API keys or with keyless authentication (recommended). Both options are shown below.
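
For intuition about what the vector store does with those embeddings, the sketch below ranks a few toy vectors by cosine similarity to a query vector. This is an illustration only: the vectors are made up, cosine similarity is assumed as the metric, and the actual ranking in Azure AI Search depends on how the index is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document names ordered from most to least similar."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy two-dimensional "embeddings"; real ones have many more dimensions.
toy_docs = {"doc_a": [0.9, 0.1], "doc_b": [0.0, 1.0]}
print(rank_by_similarity([1.0, 0.0], toy_docs))
```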

#### With API Keys 

In [None]:
embeddings = AzureOpenAIEmbeddings(
    model="text-embedding-ada-002",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2023-05-15",
)

vector_store = AzureSearch(
    embedding_function=embeddings.embed_query,
    azure_search_endpoint=os.getenv("AZURE_SEARCH_ENDPOINT"),
    azure_search_key=os.getenv("AZURE_AI_SEARCH_API_KEY"),
    index_name="langchain-vector-demo",
)

#### With Keyless Authentication 

In [None]:
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

embeddings = AzureOpenAIEmbeddings(
    model="text-embedding-ada-002",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version="2023-05-15",
    azure_ad_token_provider=token_provider,
)


vector_store = AzureSearch(
    embedding_function=embeddings.embed_query,
    azure_search_endpoint=os.getenv("AZURE_SEARCH_ENDPOINT"),
    azure_credential=DefaultAzureCredential(),
    index_name="langchain-vector-demo",
)

Next, we'll load data into our newly created vector store. For this example, we load the `state_of_the_union.txt` file. We'll split the text into chunks of up to 400 characters with no overlap. Finally, the documents are added to our vector store as embeddings.

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt", encoding="utf-8")

documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=400, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

vector_store.add_documents(documents=docs)
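
`CharacterTextSplitter` measures `chunk_size` in characters by default and splits on a separator before merging pieces back together. The following is a simplified stand-in for that idea, not the library's implementation: `split_into_chunks` is a hypothetical function that splits on blank lines and greedily merges pieces while they fit within the limit.

```python
def split_into_chunks(text: str, chunk_size: int = 400, separator: str = "\n\n") -> list:
    """Greedy separator-based chunking without overlap (simplified sketch)."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single piece longer than chunk_size is kept whole here.
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample_text = "a" * 300 + "\n\n" + "b" * 300 + "\n\n" + "c" * 50
for chunk in split_into_chunks(sample_text):
    print(len(chunk))
```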

### Create retriever 

Next, we'll create a retriever. The `index_name` is `langchain-vector-demo`, the index created with the vector store earlier. If you skipped vector store creation, provide your own index name in the parameter. Because no service name is passed here, the retriever relies on the environment variables configured earlier. With `top_k=1`, only the top result is returned.

In [21]:
retriever = AzureAISearchRetriever(
    content_key="content", top_k=1, index_name="langchain-vector-demo"
)

Now we can retrieve the data that is relevant to our query from the documents we uploaded. 

In [None]:
answer = retriever.invoke("does the president have a plan for covid-19?")
print(answer[0].page_content)