# SAP HANA Cloud Vector Engine

>SAP HANA Cloud Vector Engine is a vector store fully integrated into the SAP HANA Cloud database.

Install the SAP HANA database driver, `hdbcli`.

In [None]:
# Install the SAP HANA database driver
%pip install --upgrade --quiet hdbcli

To use `OpenAIEmbeddings`, we need an OpenAI API key.

In [1]:
import os
# The OpenAI API key is read from the OPENAI_API_KEY environment variable.
# Uncomment the next line to set it here instead.
# os.environ["OPENAI_API_KEY"] = "Your OpenAI API key"

Load the sample document "state_of_the_union.txt" and split it into chunks.

In [38]:
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores.hanavector import HanaDB
from langchain_openai import OpenAIEmbeddings

text_documents = TextLoader("../../modules/state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
text_chunks = text_splitter.split_documents(text_documents)
print(f"Number of document chunks: {len(text_chunks)}")

embeddings = OpenAIEmbeddings()

Number of document chunks: 88
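
To make the chunking step more concrete, here is a minimal pure-Python sketch of greedy, separator-based splitting. It assumes the splitter breaks the text on blank lines (`"\n\n"`) and packs consecutive pieces into chunks of at most `chunk_size` characters with no overlap. The function `split_into_chunks` is a hypothetical illustration, not the `CharacterTextSplitter` implementation.

```python
def split_into_chunks(text: str, chunk_size: int = 500, separator: str = "\n\n") -> list:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks = []
    current = []       # pieces collected for the chunk being built
    current_len = 0    # length of the current chunk once joined with the separator
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        # Cost of appending this piece, including a separator if the chunk is non-empty.
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            # The piece does not fit: close the current chunk and start a new one.
            chunks.append(separator.join(current))
            current, current_len = [], 0
            added = len(piece)
        current.append(piece)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "a" * 300 + "\n\n" + "b" * 300 + "\n\n" + "c" * 100
print([len(chunk) for chunk in split_into_chunks(sample)])  # [300, 402]
```

In this sketch a single piece longer than `chunk_size` is emitted whole, so `chunk_size` is a target rather than a hard limit.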


Create a database connection to a HANA Cloud instance. The connection settings are read from environment variables.

In [39]:
from hdbcli import dbapi

# Use connection settings from the environment
connection = dbapi.connect(
    address=os.environ.get("HANA_DB_ADDRESS"),
    port=os.environ.get("HANA_DB_PORT"),
    user=os.environ.get("HANA_DB_USER"),
    password=os.environ.get("HANA_DB_PASSWORD"),
    autocommit=True,
    sslValidateCertificate=False,
)
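
Because `os.environ.get` returns `None` for unset variables, a missing setting only surfaces when the connection attempt fails. The following sketch, under the assumption that all four variables are required, collects the settings up front and converts the port to an integer. The helper `hana_connect_kwargs` is a hypothetical name and not part of `hdbcli`.

```python
import os
from typing import Mapping

REQUIRED_KEYS = ("HANA_DB_ADDRESS", "HANA_DB_PORT", "HANA_DB_USER", "HANA_DB_PASSWORD")


def hana_connect_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the connection settings, failing early if any of them is missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "address": env["HANA_DB_ADDRESS"],
        "port": int(env["HANA_DB_PORT"]),  # environment values are always strings
        "user": env["HANA_DB_USER"],
        "password": env["HANA_DB_PASSWORD"],
    }


# Usage (requires hdbcli and a reachable HANA Cloud instance):
# connection = dbapi.connect(
#     **hana_connect_kwargs(), autocommit=True, sslValidateCertificate=False
# )
```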

Create a LangChain VectorStore interface for the HANA database and specify the table (collection) that stores the vector embeddings.

In [40]:
db = HanaDB(
    embedding=embeddings,
    connection=connection,
    table_name="STATE_OF_THE_UNION",
)

Add the loaded document chunks to the table. For this example, we first delete any content that may remain in the table from previous runs.

In [41]:
# Delete already existing documents from the table
db.delete(filter={})

# add the loaded document chunks
db.add_documents(text_chunks)

[]

Query for the two best-matching document chunks among those added in the previous step.
By default, the search uses "Cosine Similarity".

In [43]:
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query, k=2)

for doc in docs:
    print("-" * 80)
    print(doc.page_content)

--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. 

While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.


Query the same content with "Euclidean Distance". The results should be the same as with "Cosine Similarity".

In [45]:
from langchain_community.vectorstores.utils import DistanceStrategy

db = HanaDB(
    embedding=embeddings,
    connection=connection,
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
    table_name="STATE_OF_THE_UNION",
)

query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query, k=2)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)

--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. 

While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.
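
The two strategies agree here because, for unit-length vectors, the squared Euclidean distance equals 2 - 2 * cos(a, b), so ranking by ascending distance is the same as ranking by descending cosine similarity. OpenAI's documentation describes its embeddings as normalized to length 1. The sketch below checks this relationship on a few made-up three-dimensional vectors; the vectors and names are illustrative assumptions, not real embeddings.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Made-up vectors standing in for a query embedding and three chunk embeddings.
query = normalize([1.0, 2.0, 0.5])
candidates = {
    "a": normalize([1.0, 2.1, 0.4]),
    "b": normalize([-1.0, 0.3, 2.0]),
    "c": normalize([0.2, 1.0, 1.0]),
}

# Higher cosine similarity is better; lower Euclidean distance is better.
by_cosine = sorted(candidates, key=lambda k: cosine_similarity(query, candidates[k]), reverse=True)
by_euclid = sorted(candidates, key=lambda k: euclidean_distance(query, candidates[k]))
print(by_cosine, by_euclid)  # ['a', 'c', 'b'] ['a', 'c', 'b']
```

For vectors that are not normalized, the two rankings can differ.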


## Maximal Marginal Relevance Search (MMR)

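Maximal marginal relevance optimizes for similarity to the query and diversity among the selected documents. The search first fetches the `fetch_k` (here 20) most similar chunks from the database, then selects `k` (here 2) of them with the MMR algorithm.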

In [46]:
docs = db.max_marginal_relevance_search(query, k=2, fetch_k=20)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)


--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. 

In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight. 

Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world.
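
The selection step of MMR can be sketched in a few lines of pure Python. This is a simplified illustration rather than the code LangChain runs: it assumes cosine similarity as the relevance measure and a weight `lambda_mult` that trades relevance against redundancy. The two-dimensional vectors are made up for the example.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return the indices of k vectors chosen by maximal marginal relevance."""
    selected = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalize similarity to anything that has already been selected.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = [1.0, 0.0]
# Vectors 0 and 1 are near-duplicates; vector 2 is less relevant but different.
doc_vecs = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8]]

print(mmr(query_vec, doc_vecs, k=2))                   # [0, 2]
print(mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0))  # [0, 1]
```

With `lambda_mult=1.0` the redundancy term vanishes and the result matches a plain similarity search, which mirrors the difference between the MMR output and the earlier `similarity_search` output.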


## Basic Vectorstore Operations

In [12]:
db = HanaDB(
    connection=connection,
    embedding=embeddings,
    table_name="LANGCHAIN_DEMO_BASIC",
)

# Delete already existing documents from the table
db.delete(filter={})

True

### Add plain documents
We can add documents to the existing table.

In [13]:
docs = [Document(page_content="plain"), Document(page_content="docs")]
db.add_documents(docs)

[]

### Add documents with metadata

In [14]:
docs = [
    Document(page_content="foo", metadata={"start": 100, "end": 150, "doc_name": "foo.txt", "quality": "bad"}),
    Document(page_content="bar", metadata={"start": 200, "end": 250, "doc_name": "bar.txt", "quality": "good"}),
]
db.add_documents(docs)

[]

### Query documents with specific metadata

In [15]:
docs = db.similarity_search("foobar", k=2, filter={"quality": "bad"})
# With filtering on "quality"=="bad", only one document should be returned
for doc in docs:
    print("-" * 80)
    print(doc.page_content)
    print(doc.metadata)

--------------------------------------------------------------------------------
foo
{'start': 100, 'end': 150, 'doc_name': 'foo.txt', 'quality': 'bad'}
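
The filter semantics seen above can be mimicked in plain Python. This sketch assumes that a filter is a set of key/value equality conditions that must all hold, and that an empty filter matches every document (consistent with `db.delete(filter={})` clearing the table). `HanaDB` evaluates the filter inside the database; the function `matches` is a hypothetical stand-in that only mirrors the observable behaviour.

```python
def matches(metadata: dict, criteria: dict) -> bool:
    """True when every key in criteria is present in metadata with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in criteria.items())


# The four documents added to the table in the cells above.
records = [
    {"page_content": "plain", "metadata": {}},
    {"page_content": "docs", "metadata": {}},
    {"page_content": "foo", "metadata": {"start": 100, "end": 150, "doc_name": "foo.txt", "quality": "bad"}},
    {"page_content": "bar", "metadata": {"start": 200, "end": 250, "doc_name": "bar.txt", "quality": "good"}},
]

hits = [r["page_content"] for r in records if matches(r["metadata"], {"quality": "bad"})]
print(hits)  # ['foo']
```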


### Using a VectorStore as a Retriever in Chains for retrieval augmented generation (RAG)


In [32]:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory

# Access the vector DB with a new table
db = HanaDB(
    connection=connection,
    embedding=embeddings,
    table_name="LANGCHAIN_DEMO_RETRIEVAL_CHAIN",
)

# Delete already existing entries from the table
db.delete(filter={})

# add the loaded document chunks from the "State Of The Union" file
db.add_documents(text_chunks)

# Create a retriever instance of the vector store
retriever = db.as_retriever()

#### Define the prompt

In [33]:
from langchain.prompts import PromptTemplate

prompt_template = '''
You are an expert in state of the union topics. You are provided with multiple context items that are related to the prompt you have to answer.
Use the following pieces of context to answer the question at the end.

```
{context}
```

Question: {question}
'''

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
chain_type_kwargs = {"prompt": PROMPT}
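
To illustrate what the two input variables do, the sketch below fills a template by hand. It assumes the retrieved chunk texts are joined with blank lines before being substituted for `{context}`. The shortened template wording and the helper `build_prompt` are hypothetical and stand in for the work the chain does internally.

```python
# Simplified stand-in for the notebook's template (hypothetical wording).
template = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
)


def build_prompt(chunks, question, separator="\n\n"):
    """Join the retrieved chunk texts and substitute them into the template."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)


prompt = build_prompt(
    [
        "We've set up joint patrols with Mexico and Guatemala to catch more human traffickers.",
        "We can do both.",
    ],
    "What about Mexico and Guatemala?",
)
print(prompt)
```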

#### Create the ConversationalRetrievalChain which handles the chat history and the retrieval of similar document chunks to be added to the prompt

In [34]:
from langchain.chains import ConversationalRetrievalChain

llm = ChatOpenAI(model_name='gpt-3.5-turbo')
memory = ConversationBufferMemory(memory_key="chat_history", output_key='answer', return_messages=True)
qa_chain = ConversationalRetrievalChain.from_llm(
    llm,
    db.as_retriever(search_kwargs={'k': 5}),
    return_source_documents=True,
    memory=memory,
    verbose=False,
    combine_docs_chain_kwargs={'prompt': PROMPT})

#### Ask the first question (and verify how many text chunks have been used)

In [35]:
question = "What about Mexico and Guatemala?"

result = qa_chain.invoke({"question": question})
print('Answer from LLM:')
print('================')
print(result["answer"])

source_docs = result["source_documents"]
print('================')
print(f"Number of used source document chunks: {len(source_docs)}")


Answer from LLM:
================
Mexico and Guatemala are mentioned in the context as partners in joint patrols to catch more human traffickers. This implies that both countries are working together with the United States to address the issue of human trafficking.
================
Number of used source document chunks: 5


Examine the chunks used by the chain in detail. Check whether the best-ranked chunk contains information about "Mexico and Guatemala", as mentioned in the question.

In [36]:
for doc in source_docs:
    print("-" * 80)
    print(doc.page_content)
    print(doc.metadata)

--------------------------------------------------------------------------------
We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.  

We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.  

We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. 

We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
{'source': '../../modules/state_of_the_union.txt'}
--------------------------------------------------------------------------------
We can do all this while keeping lit the torch of liberty that has led generations of immigrants to this land—my forefathers and so many of yours. 

Provide a pathway to citizenship for Dreamers, those on temporary status, farm workers, and essential workers. 

Revise our laws so businesses have the worker

Ask another question on the same conversational chain. The answer should relate to the previous answer.

In [37]:
question = "What about other countries?"

result = qa_chain.invoke({"question": question})
print('Answer from LLM:')
print('================')
print(result["answer"])


Answer from LLM:
================
No, there are no other countries mentioned in the context as partners in joint patrols to catch human traffickers.
