# Develop a conversational chatbot for your own data platform

In this lab, we will build a conversational AI for your data platform. Users pose questions in natural language and receive succinct answers instead of sorting through data manually. This can make the data more accessible, return specific metrics or analyses faster than a manual lookup, and make data analysis more intuitive and less daunting by letting users 'converse' with the data.


## Create Azure Cognitive Search Vector Store in Azure

First, we need to create an Azure Cognitive Search service in Azure, which will act as a vector store. We'll use the Azure CLI to do this.

**NOTE:** Update **`<INITIALS>`** to make the name unique.

In [None]:
# RESOURCE_GROUP="azure-cognitive-search-rg"
# LOCATION="westeurope"
# NAME="acs-vectorstore-<INITIALS>"
# !az group create --name $RESOURCE_GROUP --location $LOCATION
# !az search service create -g $RESOURCE_GROUP -n $NAME -l $LOCATION --sku Basic --partition-count 1 --replica-count 1

Next, find the Azure Cognitive Search **service name**, **endpoint**, and **admin key** using the Azure Portal or CLI, and update the following values in the `.env` file. These are the variable names read by the setup code in the next section; the index name is set directly in the notebook (`metadata-index`).

```
AZURE_SEARCH_SERVICE_NAME = "<YOUR AZURE COGNITIVE SEARCH SERVICE NAME - e.g. cognitive-search-service>"
AZURE_SEARCH_ENDPOINT = "<YOUR AZURE COGNITIVE SEARCH ENDPOINT NAME - e.g. https://cognitive-search-service.search.windows.net>"
AZURE_SEARCH_KEY = "<YOUR AZURE COGNITIVE SEARCH ADMIN API KEY - e.g. cognitive-search-admin-api-key>"
```

## Setup Azure OpenAI

We'll start as usual by defining our Azure OpenAI service API key and endpoint details, specifying the model deployment we want to use and then we'll initiate a connection to the Azure OpenAI service.

**NOTE**: As with previous labs, we'll use the values from the `.env` file in the root of this repository.

In [19]:
import os
from dotenv import load_dotenv

# Load environment variables
if load_dotenv():
    print("Found OpenAPI Base Endpoint: " + os.getenv("OPENAI_API_BASE"))
else:
    print("No file .env found")
openai_api_type = os.getenv("OPENAI_API_TYPE")
openai_api_key = os.getenv("OPENAI_API_KEY")
openai_api_base = os.getenv("OPENAI_API_BASE")
openai_api_version = os.getenv("OPENAI_API_VERSION")
deployment_name = os.getenv("OPENAI_DEPLOYMENT_NAME")
embedding_name = os.getenv("OPENAI_EMBEDDING_DEPLOYMENTE")
acs_service_name = os.getenv("AZURE_SEARCH_SERVICE_NAME")
acs_endpoint_name = os.getenv("AZURE_SEARCH_ENDPOINT")
acs_index_name = "metadata-index"
acs_api_key = os.getenv("AZURE_SEARCH_KEY")

Found OpenAPI Base Endpoint: https://trefoil.openai.azure.com/
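
Because `os.getenv` returns `None` for a variable that is not set, a missing entry in `.env` only surfaces later as a confusing error. The following is an optional sketch, not part of the original lab, that reports unset variables up front. `missing_env_vars` is a hypothetical helper name, and the list of required names is taken from the setup cell above.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Variable names as read by the setup cell above
required = [
    "OPENAI_API_TYPE",
    "OPENAI_API_KEY",
    "OPENAI_API_BASE",
    "OPENAI_API_VERSION",
    "OPENAI_DEPLOYMENT_NAME",
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_KEY",
]

missing = missing_env_vars(required)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```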


First, we will load the data from the `metadata.csv` file using the LangChain CSV document loader.

In [10]:
from langchain.document_loaders.csv_loader import CSVLoader

# Metadata fields in the CSV are listed in 'fieldnames' below
loader = CSVLoader(file_path='../data/metadata/metadata.csv',
                   source_column='GoldenDataSetName', 
                   encoding='utf-8', 
                   csv_args= {'delimiter':',', 
                              'fieldnames': ['GDSId','SourceSysId','SourceSysName','businessLine','BusinessEntity','Maturity','DataLifecycle','Location',
                                             'dataDomain','DataSubDomain','GoldenDataSetName','DataExpert','DataValidator','DataDescription','data_steward_id',
                                            'DataStewardID','data_owner_id','DataOwnerID','DataOwnerName','DataStewardName','DataClassification','LegalGroundCollection',
                                            'HistoricalData','UnlockedGDP','CIARating','NbDataElements'

                                            ]
                             }
                 )

data = loader.load()

data = data[1:51]  # skip the header row (index 0) and keep 50 rows; adjust the slice to change the dataset size
print('Loaded %s Golden DataSets' % len(data))

Loaded 50 Golden DataSets
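
A likely reason the slice starts at index 1: when explicit `fieldnames` are supplied, Python's `csv.DictReader` treats the first line of the file as data rather than as a header. The sketch below shows this with the standard library only. It assumes that `CSVLoader` forwards `csv_args` to `csv.DictReader`, and the two-column sample is made up for illustration.

```python
import csv
import io

# Made-up sample with a header line followed by one data row
sample_csv = "GDSId,GoldenDataSetName\nGDS001,Example Data Set\n"
fieldnames = ["GDSId", "GoldenDataSetName"]

# With explicit fieldnames, the header line is returned as the first record
rows = list(csv.DictReader(io.StringIO(sample_csv), fieldnames=fieldnames))

print(rows[0])   # the header line, parsed as if it were data
print(rows[1:])  # the actual data rows
```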


Next, we will create Azure OpenAI embedding and completion instances so that we can build the vector representation of the `metadata` and start asking our questions.

In [8]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chat_models import AzureChatOpenAI

# Create an Embeddings Instance of Azure OpenAI

embeddings = OpenAIEmbeddings(
    openai_api_base = openai_api_base,
    openai_api_version = openai_api_version,
    # OpenAIEmbeddings names this parameter `deployment` (AzureChatOpenAI uses `deployment_name`)
    deployment = "text-embedding-ada-002",
    openai_api_key = openai_api_key,
    openai_api_type = openai_api_type,
    embedding_ctx_length=8191,
    chunk_size=1000,
    max_retries=6)

# Create a Completion Instance of Azure OpenAI
llm = AzureChatOpenAI(
    deployment_name = deployment_name,
    openai_api_type = openai_api_type,
    openai_api_version = openai_api_version,
    openai_api_base = openai_api_base,
    openai_api_key = openai_api_key,
    temperature=0.7,
    max_retries=6,
    max_tokens=4000
)

print('Completed creation of embedding and completion instances.')

Completed creation of embedding and completion instances.


## Load the metadata into Azure Cognitive Search

Next, we'll create the Azure Cognitive Search index, embed the golden dataset metadata loaded from the CSV file, and upload the data into the newly created index. Depending on the number of records loaded and on rate limiting, the embeddings might take a while, so be patient.

In [28]:
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    SemanticConfiguration,
    PrioritizedFields,
    SemanticField,
    SemanticSettings,
    VectorSearch,
    HnswVectorSearchAlgorithmConfiguration,
)

# Let's Create the Azure Cognitive Search Index
index_client = SearchIndexClient(
    acs_endpoint_name,
    AzureKeyCredential(acs_api_key)
)
# Golden dataset fields in the CSV:
# GDSId, SourceSysId, SourceSysName, businessLine, BusinessEntity, Maturity, DataLifecycle, Location,
# dataDomain, DataSubDomain, GoldenDataSetName, DataExpert, DataValidator,
# DataDescription, data_steward_id,
# DataStewardID, data_owner_id, DataOwnerID, DataOwnerName,
# DataStewardName, DataClassification, LegalGroundCollection,
# HistoricalData, UnlockedGDP, CIARating, NbDataElements

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String,key=True),
    SimpleField(name="GDSId", type=SearchFieldDataType.String),
    SimpleField(name="SourceSysId", type=SearchFieldDataType.String),
    SimpleField(name="SourceSysName", type=SearchFieldDataType.String),
    SearchableField(name="businessLine", type=SearchFieldDataType.String),
    SearchableField(name="BusinessEntity", type=SearchFieldDataType.String),
    SimpleField(name="Maturity", type=SearchFieldDataType.String),
    SimpleField(name="DataLifecycle", type=SearchFieldDataType.String),
    SimpleField(name="dataDomain", type=SearchFieldDataType.String),
    SimpleField(name="DataSubDomain", type=SearchFieldDataType.String),
    SimpleField(name="DataExpert", type=SearchFieldDataType.String),
    SimpleField(name="DataValidator", type=SearchFieldDataType.String),
    SimpleField(name="data_steward_id", type=SearchFieldDataType.String),
    SimpleField(name="DataStewardID", type=SearchFieldDataType.String),
    SimpleField(name="data_owner_id", type=SearchFieldDataType.String),
    SimpleField(name="DataOwnerID", type=SearchFieldDataType.String),
    SearchableField(name="DataStewardName", type=SearchFieldDataType.String),   
    SearchableField(name="DataOwnerName", type=SearchFieldDataType.String),
    SimpleField(name="DataClassification", type=SearchFieldDataType.String),
    SearchableField(name="LegalGroundCollection", type=SearchFieldDataType.String),
    SimpleField(name="HistoricalData", type=SearchFieldDataType.String),
    SimpleField(name="UnlockedGDP", type=SearchFieldDataType.String),
    SimpleField(name="CIARating", type=SearchFieldDataType.String),
    SearchableField(name="Location", type=SearchFieldDataType.String),
    SearchableField(name="GoldenDataSetName", type=SearchFieldDataType.String),
    SearchableField(name="DataDescription", type=SearchFieldDataType.String),
    SearchableField(name="NbDataElements", type=SearchFieldDataType.Double, sortable=True),
    SearchableField(name="content", type=SearchFieldDataType.String),
    SearchField(name="content_vector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), searchable=True, vector_search_dimensions=1536, vector_search_configuration="my-vector-config"),
]

# Configure Vector Search Configuration
vector_search = VectorSearch(
    algorithm_configurations=[
        HnswVectorSearchAlgorithmConfiguration(
            name="my-vector-config",
            kind="hnsw",
            parameters={
                "m": 4,
                "efConstruction": 400,
                "efSearch": 500,
                "metric": "cosine"
            }
        )
    ]
)

# Configure Semantic Configuration
semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=PrioritizedFields(
        title_field=SemanticField(field_name="DataDescription"),
        prioritized_keywords_fields=[SemanticField(field_name="DataDescription"), SemanticField(field_name="DataOwnerName")],
        prioritized_content_fields=[SemanticField(field_name="content")]
    )
)

# Create the semantic settings with the configuration
semantic_settings = SemanticSettings(configurations=[semantic_config])

# Create the search index with the desired vector search and semantic configurations
index = SearchIndex(
    name=acs_index_name,
    fields=fields,
    vector_search=vector_search,
    semantic_settings=semantic_settings
)
result = index_client.create_or_update_index(index)
print(f'The {result.name} index was created.')

The metadata-index index was created.


Next we will create the document structure needed to upload the data into the Azure Cognitive Search index.

In [30]:
# Now that the index is created, let's load the documents into it.

import uuid

# Let's take a quick look at the data structure of the CSVLoader
print(data[0])
print(data[0].metadata['source'])
print("----------")

# Generate document embeddings for the page_content field of each CSVLoader document using Azure OpenAI
items = []
for gds in data:
    content = gds.page_content
    items.append({
        "id": str(uuid.uuid4()),
        "GoldenDataSetName": gds.metadata['source'],
        "content": content,
        "content_vector": embeddings.embed_query(content),
    })

# Print out a sample item to validate the updated data structure.
# It should have the id, GoldenDataSetName, content, and content_vector values.
print(items[-1])
print(f"Golden Data Set Count: {len(items)}")

page_content='GDSId: GDS98394\nSourceSysId: SYSUID.288941\nSourceSysName: Dataedo CRDM\nbusinessLine: Leasing\nBusinessEntity: Masreph\nMaturity: Prepared for distribution\nDataLifecycle: Active\nLocation: Europe\ndataDomain: Product\nDataSubDomain: Lease\nGoldenDataSetName: Enterprise Equity Segmentation Map\nDataExpert: Braxton, Eddie\nDataValidator: Hussein, Jazmyne\nDataDescription: This dataset provides a comprehensive view of enterprise equity segmentation, enabling financial institutions to identify and target high-value customers.\ndata_steward_id: 463889\nDataStewardID: DOWID384111\ndata_owner_id: 373140\nDataOwnerID: DOWID384111\nDataOwnerName: Fernandez, Chelsea\nDataStewardName: Amos, Katelyn\nDataClassification: Non-personal data\nLegalGroundCollection: Corporate restructuring and bankruptcy\nHistoricalData: No\nUnlockedGDP: Achieved (Production)\nCIARating: 1-1-1\nNbDataElements: 14' metadata={'source': 'Enterprise Equity Segmentation Map', 'row': 1}
Enterprise Equity Seg

Next we will upload the golden dataset metadata documents, in the newly created structure, to the Azure Cognitive Search index.

In [31]:
# Upload golden data sets metadata to Azure Cognitive Search index.
from azure.search.documents.models import Vector
from azure.search.documents import SearchClient

# Insert Text and Embeddings into the Azure Cognitive Search index created.
search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)
result = search_client.upload_documents(items)
print("Successfully added documents to Azure Cognitive Search index.")
print(f"Uploaded {len(items)} documents")

Successfully added documents to Azure Cognitive Search index.
Uploaded 50 documents
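
With 50 documents a single upload call is enough. For larger datasets, the service limits how many documents one indexing request may carry (to our understanding 1,000 per request; treat that figure as an assumption and check the current service limits). A minimal sketch of splitting the upload into batches follows. `batched` is a hypothetical helper, the batch size of 500 is arbitrary, and the upload loop is left commented out because it needs the `search_client` and `items` objects from the cells above.

```python
def batched(sequence, size):
    """Yield consecutive slices of `sequence` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(sequence), size):
        yield sequence[start:start + size]


# Hypothetical usage with the objects defined in the cells above:
# for batch in batched(items, 500):
#     search_client.upload_documents(batch)
```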


## Vector Store Searching using Azure Cognitive Search

Now that we have the golden dataset metadata loaded into Azure Cognitive Search, let's try some different types of searches using the Azure Cognitive Search SDK.

In [32]:
# First, let's do a plain vanilla text search, no vectors or embeddings.
query = "Who is the data owner of the dataset `Finance Claims Analytics'"

search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)

# Execute the search
results = list(search_client.search(
    search_text=query,
    include_total_count=True,
    top=5
))

# Print count of total results.
print(f"Returned {len(results)} results using only text-based search.")
print("----------")
# Iterate over Results
# Index Fields - id, content, content_vector
for result in results:
    print("Golden DataSet: {}".format(result["content"]))
    print("----------")

Returned 5 results using only text-based search.
----------
Golden DataSet: GDSId: GDS86247
SourceSysId: SYSUID.326317
SourceSysName: ICNL
businessLine: Consumer Finance
BusinessEntity: Life Insurance
Maturity: Catalogued for processing
DataLifecycle: Active
Location: NA
dataDomain: Product
DataSubDomain: Insurances
GoldenDataSetName: Finance Claims Analytics
DataExpert: Bustos, Michael
DataValidator: el-Nour, Samraa
DataDescription: This dataset provides insights into the analysis of financial claims, helping businesses make informed decisions and mitigate risks.
data_steward_id: 386854
DataStewardID: DOWID566674
data_owner_id: 223242
DataOwnerID: DOWID566674
DataOwnerName: Halliburton, Shavawn
DataStewardName: Tran, Madina
DataClassification: Natural data
LegalGroundCollection: Provision of financial products and services
HistoricalData: Unsure
UnlockedGDP: Achieved (Production)
CIARating: 1-1-1
NbDataElements: 39
----------
Golden DataSet: GDSId: GDS99585
SourceSysId: SYSUID.351105


In [33]:
# Now let's do a vector search that uses the embeddings we created and inserted into content_vector field in the index.
query = "Who is the data owner of the dataset `Finance Claims Analytics'"

search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)

# You can see here that we are getting the embedding representation of the query.
vector = Vector(
    value=embeddings.embed_query(query),
    k=3,
    fields="content_vector"
)

# Execute the search
results = list(search_client.search(
    search_text="",
    include_total_count=True,
    vectors=[vector],
    select=["id", "content", "GoldenDataSetName"],
))

# Print count of total results.
print(f"Returned {len(results)} results using only vector-based search.")
print("----------")
# Iterate over results and print out the content.
for result in results:
    print(result["GoldenDataSetName"])
    print("----------")

Returned 3 results using only vector-based search.
----------
Finance Claims Analytics
----------
Finance Email Analytics
----------
CreditRisk Analytics Dataset
----------


Did that return what you expected? Let's dig deeper to see how the results were ranked.

Let's do the same search again, but this time let's return the **Search Score** so we can see the value returned by the cosine similarity vector store calculation.

In [34]:
# Try again, but this time let's add the relevance score to maybe see why
query = "Who is the data owner of the dataset `Finance Claims Analytics'"

search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)

# You can see here that we are getting the embedding representation of the query.
vector = Vector(
    value=embeddings.embed_query(query),
    k=3,
    fields="content_vector"
)

# Execute the search
results = list(search_client.search(
    search_text="",
    include_total_count=True,
    vectors=[vector],
    select=["id", "content", "GoldenDataSetName"],
))

# Print count of total results.
print(f"Returned {len(results)} results using vector search.")
print("----------")
# Iterate over results and print out the id, dataset name, and search score.
for result in results:
    print(f"Id: {result['id']}")
    print(f"Name: {result['GoldenDataSetName']}")
    print(f"Score: {result['@search.score']}")
    print("----------")

Returned 3 results using vector search.
----------
Id: 5a3f1b14-ba12-47d8-90bf-9331759f831c
Name: Finance Claims Analytics
Score: 0.84391737
----------
Id: 57675d05-34f7-45d2-b008-40897caec5df
Name: Finance Email Analytics
Score: 0.8410209
----------
Id: c29c2bf0-453a-42ae-8bad-d61e08f6fbc6
Name: CreditRisk Analytics Dataset
Score: 0.8392132
----------


If you look at the Search Score, you will see the relative ranking of the closest vector matches to the query. The lower the score, the farther apart the two vectors are. Let's change the search term and see if we can get a higher Search Score, which means a better match and closer vector proximity.

In [35]:
# Change the query below to a different search term and compare the scores (as written, it repeats the previous query).
query = "Who is the data owner of the dataset `Finance Claims Analytics'"

search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)

# You can see here that we are getting the embedding representation of the query.
vector = Vector(
    value=embeddings.embed_query(query),
    k=3,
    fields="content_vector"
)

# Execute the search
results = list(search_client.search(
    search_text="",
    include_total_count=True,
    vectors=[vector],
    select=["id", "content", "GoldenDataSetName"],
))

# Print count of total results.
print(f"Returned {len(results)} results using vector search.")
print("----------")
# Iterate over results and print out the id, dataset name, and search score.
for result in results:
    print(f"Id: {result['id']}")
    print(f"Name: {result['GoldenDataSetName']}")
    print(f"Score: {result['@search.score']}")
    print("----------")

Returned 3 results using vector search.
----------
Id: 5a3f1b14-ba12-47d8-90bf-9331759f831c
Name: Finance Claims Analytics
Score: 0.84391737
----------
Id: 57675d05-34f7-45d2-b008-40897caec5df
Name: Finance Email Analytics
Score: 0.8410209
----------
Id: c29c2bf0-453a-42ae-8bad-d61e08f6fbc6
Name: CreditRisk Analytics Dataset
Score: 0.8392132
----------


**NOTE:** As the results show, different inputs can return different results; it all depends on what data is in the Vector Store. The higher the score, the higher the likelihood of a match.
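
To make the score more concrete, here is a small sketch of cosine similarity in plain Python. The vectors are toy values, not real embeddings. The second function reflects our understanding of how Azure Cognitive Search maps cosine similarity to `@search.score` for the `cosine` metric (1 / (1 + cosine distance)); treat that mapping as an assumption and verify it against the service documentation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_to_search_score(similarity):
    """Assumed mapping: score = 1 / (1 + distance), where distance = 1 - similarity."""
    return 1.0 / (2.0 - similarity)


# Toy vectors: the same direction, and two orthogonal directions
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

print(same, cosine_to_search_score(same))
print(orthogonal, cosine_to_search_score(orthogonal))
```

Under this assumed mapping, scores fall between roughly 0.33 (opposite vectors) and 1.0 (identical direction), which is why values such as 0.84 indicate fairly close but not identical vectors.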

## Hybrid Searching using Azure Cognitive Search

What is Hybrid Search? Hybrid search combines text and vector queries in a single request. It is implemented at the field level, which means you can build queries that include both vector fields and searchable text fields. The queries execute in parallel and the results are merged into a single response. Optionally, you can add semantic search, currently in preview, for even more accuracy with L2 reranking using the same language models that power Bing.

**NOTE:** Hybrid Search is a key value proposition of Azure Cognitive Search in comparison to vector only data stores. Click [Hybrid Search](https://learn.microsoft.com/en-us/azure/search/hybrid-search-overview) for more details.
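
Hybrid scores are on a different scale from vector-only scores because the text and vector result lists are merged by rank, not by raw score. Azure documents this merge as Reciprocal Rank Fusion (RRF). The sketch below illustrates the idea and is not the service's implementation: the constant 60 and the zero-based positions are assumptions, chosen because they reproduce values such as 0.0333 (2/60) and 0.0159 (1/63) that appear in the hybrid outputs of this lab. The document ids are made up.

```python
def rrf_scores(rankings, k=60):
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + position) for every id it contains,
    where position 0 is the best match (assumed convention).
    """
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    # Highest fused score first
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# Made-up rankings: ids ordered best-first by each query type
text_ranking = ["doc-a", "doc-b", "doc-c", "doc-d"]
vector_ranking = ["doc-a", "doc-b", "doc-c"]

for doc_id, score in rrf_scores([text_ranking, vector_ranking]).items():
    print(f"{doc_id}: {score:.6f}")
```

A document that appears in only one of the two lists receives a single contribution, which explains the visible drop between results found by both queries and results found by only one.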

In [36]:
# Hybrid Search
# Let's try our original query again using Hybrid Search (i.e. a combination of Text & Vector Search)
query = "Who is the data owner of the dataset `Finance Claims Analytics'"

search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)

# You can see here that we are getting the embedding representation of the query.
vector = Vector(
    value=embeddings.embed_query(query),
    k=3,
    fields="content_vector"
)

# Notice we also fill in the search_text parameter with the query.
results = list(search_client.search(
    search_text=query,
    include_total_count=True,
    top=5,
    vectors=[vector],
    select=["id", "content", "GoldenDataSetName"],
))

# Print count of total results.
print(f"Returned {len(results)} results using hybrid search.")
print("----------")
# Iterate over results and print out the id, dataset name, and search score.
for result in results:
    print(f"Id: {result['id']}")
    print(result['GoldenDataSetName'])
    print(f"Hybrid Search Score: {result['@search.score']}")
    print("----------")

Returned 5 results using hybrid search.
----------
Id: 5a3f1b14-ba12-47d8-90bf-9331759f831c
Finance Claims Analytics
Hybrid Search Score: 0.03333333507180214
----------
Id: 57675d05-34f7-45d2-b008-40897caec5df
Finance Email Analytics
Hybrid Search Score: 0.032786883413791656
----------
Id: c29c2bf0-453a-42ae-8bad-d61e08f6fbc6
CreditRisk Analytics Dataset
Hybrid Search Score: 0.032258063554763794
----------
Id: 814581fe-2a17-47f4-ac1f-7a61eff8f561
RiskConnect Finance Dataset
Hybrid Search Score: 0.01587301678955555
----------
Id: 613a77d6-885c-4a21-a474-8fb869c71bf9
Credit Finance Agreement Dataset
Hybrid Search Score: 0.015625
----------


In [37]:
# Hybrid Search
# Let's try our more specific query again to see the difference in the score returned.
query = "What is the Enterprise Equity Segmentation Map dataset?"

search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)

# You can see here that we are getting the embedding representation of the query.
vector = Vector(
    value=embeddings.embed_query(query),
    k=5,
    fields="content_vector"
)

# -----
# Notice we also fill in the search_text parameter with the query along with the vector.
# -----
results = list(search_client.search(
    search_text=query,
    include_total_count=True,
    top=10,
    vectors=[vector],
    select=["id", "content", "GoldenDataSetName"],
))

# Print count of total results.
print(f"Returned {len(results)} results using hybrid search.")
print("----------")
# Iterate over results and print out the id and search score.
for result in results:  
    print(f"Id: {result['id']}")
    print(f"Title: {result['GoldenDataSetName']}")
    print(f"Hybrid Search Score: {result['@search.score']}")
    print("----------")

Returned 10 results using hybrid search.
----------
Id: 5cff76ad-7fea-4fe6-8ae8-89505d2be04e
Title: Enterprise Equity Segmentation Map
Hybrid Search Score: 0.03333333507180214
----------
Id: c29c2bf0-453a-42ae-8bad-d61e08f6fbc6
Title: CreditRisk Analytics Dataset
Hybrid Search Score: 0.03100961446762085
----------
Id: 6530c61a-0c1f-4bd2-99c3-b8a4f93413f0
Title: EcoSector Classification Dataset
Hybrid Search Score: 0.02736498787999153
----------
Id: 0a61077c-616f-4a31-b8f0-f79615ae4a74
Title: Finance Regions Mapping
Hybrid Search Score: 0.025739235803484917
----------
Id: 76c39104-2a47-4fe3-be66-f9b7498f7423
Title: Finance Exposure Data
Hybrid Search Score: 0.025581754744052887
----------
Id: 613a77d6-885c-4a21-a474-8fb869c71bf9
Title: Credit Finance Agreement Dataset
Hybrid Search Score: 0.016393441706895828
----------
Id: 34840fdd-e67b-4a49-8495-2208352e262e
Title: EC Finance Codes Dataset
Hybrid Search Score: 0.016129031777381897
----------
Id: 814581fe-2a17-47f4-ac1f-7a61eff8f561
Ti

## Bringing it All Together with Retrieval Augmented Generation (RAG) + LangChain (LC)

Now that we have our Vector Store set up and the data loaded, we are ready to implement the RAG pattern using AI Orchestration. At a high level, the following steps are required:
1. Ask the question
2. Create a Prompt Template with inputs
3. Get the Embedding representation of the inputted question
4. Use the embedded version of the question to search Azure Cognitive Search (i.e. the Vector Store)
5. Inject the results of the search into the Prompt Template & Execute the Prompt to get the completion
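
Steps 2 and 5 can be sketched without calling any service, which makes it easier to see what the model actually receives. The example below is a simplified illustration using plain string formatting in place of a LangChain `PromptTemplate`. `build_prompt` is a hypothetical helper, and the sample document is made up; in the lab, the documents would be the search results from Azure Cognitive Search.

```python
PROMPT_TEMPLATE = """Question: {original_question}

Do not use any other data.
Only use the metadata data below when responding.
{search_results}
"""


def build_prompt(question, documents):
    """Join the `content` of each retrieved document and fill in the template."""
    context = "\n\n".join(doc["content"] for doc in documents)
    return PROMPT_TEMPLATE.format(original_question=question, search_results=context)


# Made-up stand-in for a search result
documents = [
    {"content": "GoldenDataSetName: Example Data Set\nDataOwnerName: Doe, Jane"},
]

print(build_prompt("Who is the data owner of the Example Data Set?", documents))
```

Passing plain text to the prompt, and not the raw result objects, keeps the context compact and predictable.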

In [None]:
# Implement RAG using Langchain (LC)

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain

# Setup Langchain
# Create an Embeddings Instance of Azure OpenAI
embeddings = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    deployment=embedding_name,
    openai_api_type = openai_api_type,
    openai_api_version = openai_api_version,
    openai_api_base = openai_api_base,
    openai_api_key = openai_api_key,
    embedding_ctx_length=8191,
    chunk_size=1000,
    max_retries=6
)

# Create a Completion Instance of Azure OpenAI
llm = AzureChatOpenAI(
    model="gpt-3.5-turbo",
    deployment_name = deployment_name,
    openai_api_type = openai_api_type,
    openai_api_version = openai_api_version,
    openai_api_base = openai_api_base,
    openai_api_key = openai_api_key,
    temperature=0.7,
    max_retries=6,
    max_tokens=4000
)

# Ask the question
question = "What is the Enterprise Equity Segmentation Map dataset?"

# Create a prompt template with variables, note the curly braces
from langchain.prompts import PromptTemplate
prompt = PromptTemplate(
    input_variables=["original_question","search_results"],
    template="""
    Question: {original_question}

    Do not use any other data.
    Only use the metadata data below when responding.
    {search_results}
    """,
)

# Get Embedding for the original question
question_embedded=embeddings.embed_query(question)

# Search Vector Store
search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)
vector = Vector(
    value=question_embedded,
    k=5,
    fields="content_vector"
)
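results = list(search_client.search(
    search_text="",
    include_total_count=True,
    vectors=[vector],
    # The index has no "title" field; select fields that exist in the index.
    select=["GoldenDataSetName", "content"],
))

# Concatenate the retrieved documents into a single context string for the prompt
search_results = "\n\n".join(result["content"] for result in results)

# Build the Prompt and Execute against the Azure OpenAI to get the completion
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)
response = chain.run({"original_question": question, "search_results": search_results})
print(response)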