## Load our environment variables and connect to our OpenSearch service

In [None]:
import os
from dotenv import load_dotenv

load_dotenv()

from opensearchpy import OpenSearch

connection_string = os.getenv("OPENSEARCH_SERVICE_URI")

# Create the client with SSL/TLS enabled and a 100-second request timeout.
client = OpenSearch(connection_string, use_ssl=True, timeout=100)
client.info()

In [None]:
from pprint import pprint

index_name = "openai_wikipedia_index"

# Reuse the `client` created in the previous cell to run a plain text search.

res = client.search(index=index_name, body={
    "_source": {},
    "query": {
        "match": {
            "text": {
                "query": "Pizza"
            }
        }
    }
})

pprint(res["hits"]["hits"][0]['_source']['title'])

## Run semantic search queries with LangChain and OpenSearch

With the embeddings available, we can now run semantic searches against the OpenSearch index. We use `knn` as the query type and search the `content_vector` field.

After running the block below, we should see content semantically similar to the question. Expect documents about pineapples, pizza, Hawaii, Italy, and similar topics.

In [None]:
from pprint import pprint
from dotenv import load_dotenv
from langchain_community.vectorstores import OpenSearchVectorSearch
from langchain_openai import OpenAIEmbeddings

load_dotenv()

embeddings = OpenAIEmbeddings()

docsearch = OpenSearchVectorSearch(
    index_name=index_name,
    embedding_function=embeddings,
    opensearch_url=connection_string,
)

# Define question
question = 'is Spam and Salami a good ingredient for Pizza?'

# `k` sets the number of results; `vector_field` names the field holding the embeddings.
docs = docsearch.similarity_search(
    question,
    vector_field='content_vector',
    k=5
)

data = ""

for doc in docs:
    data += doc.page_content + "\n\n"

pprint(data)
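
For reference, the `knn` query type mentioned above can also be written as a raw OpenSearch query body. The following is a minimal sketch, not necessarily what LangChain sends under the hood: `build_knn_query` is a hypothetical helper, and the three-element vector is a placeholder for a real embedding.

```python
def build_knn_query(query_vector, k=5, vector_field="content_vector"):
    """Build a k-NN query body for OpenSearch (hypothetical helper).

    query_vector: the embedding of the question, as a list of floats.
    k: number of nearest neighbours to return.
    vector_field: name of the index field that stores the embeddings.
    """
    return {
        "size": k,
        "query": {
            "knn": {
                vector_field: {
                    "vector": list(query_vector),
                    "k": k,
                }
            }
        },
    }


# Placeholder vector for illustration; a real one comes from the embedding model.
knn_body = build_knn_query([0.1, 0.2, 0.3], k=2)
print(knn_body)

# With a live service, this could be used roughly as follows (assumes the
# `client`, `embeddings`, `index_name` and `question` objects defined earlier):
# vector = embeddings.embed_query(question)
# res = client.search(index=index_name, body=build_knn_query(vector))
```

Building the body by hand is mainly useful for debugging or for combining the vector search with other query clauses.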

In [None]:
from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

llm = ChatOpenAI()
question = 'is Spam and Salami a good ingredient for Pizza?'

prompt_no_data = PromptTemplate(
    input_variables=["question"],
    template="""
Answer the following question: {question}
Avoid answering in terms of preference; focus your answer on whether other successful restaurants have used the same ingredients mentioned in the question.
"""
)

prompt = PromptTemplate(
    input_variables=["question", "data"],
    template="""
Answer the following question: {question} using the data below:
{data}

Avoid answering in terms of preference; focus your answer on whether other successful restaurants, including their names if possible, have used the same ingredients mentioned in the question.
""")

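# Use the prompt that includes the retrieved data as context.
chain = LLMChain(llm=llm, prompt=prompt)
pprint(chain.run({"question": question, "data": data}))

# For comparison, run the same question without the retrieved data:
# chain_no_data = LLMChain(llm=llm, prompt=prompt_no_data)
# pprint(chain_no_data.run({"question": question}))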
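
To check what the model receives, it can help to render a template without calling the LLM. The sketch below uses plain `str.format`, which fills `{question}` and `{data}` placeholders in the same style as the templates above. The shortened `TEMPLATE`, the `render_prompt` helper and the sample passage are all made up for illustration.

```python
# Hypothetical, shortened template with the same two placeholders as above.
TEMPLATE = """
Answer the following question: {question} using the data below:
{data}
"""


def render_prompt(question, data, template=TEMPLATE):
    """Fill the template placeholders and return the final prompt text."""
    return template.format(question=question, data=data)


preview = render_prompt(
    "is Spam and Salami a good ingredient for Pizza?",
    "Sample retrieved passage (made up for illustration).",
)
print(preview)
```

Inspecting the rendered text this way makes it easier to spot an empty or oversized `data` value before spending tokens on a model call.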

## Conclusion

OpenSearch is a powerful tool providing both text and vector search capabilities. Used alongside LangChain, it allows you to craft personalized AI applications that augment the context based on semantic search. LangChain's extensive modularity allows you to choose your

You can try Aiven for OpenSearch, or any of the other Open Source tools, in the Aiven platform free trial by [signing up](https://go.aiven.io/openai-opensearch-signup).