Using LangChain's vector store plugin for the HANA Vector Engine
to store embeddings generated by AI Core.

Prerequisites:
- langchain >= 0.1.4
- generative-ai-hub-sdk 1.2.0
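- OpenAI ada embedding deployment (text-embedding-ada-002) on AI Core
- hana-ml and pypdf (used below for the HANA connection and the PDF loader)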

See:<br>
https://pypi.org/project/generative-ai-hub-sdk/<br>
https://github.wdf.sap.corp/AI/generative-ai-hub-sdk/blob/main/docs/gen_ai_hub/examples/gen_ai_hub.ipynb<br>
https://python.langchain.com/docs/integrations/vectorstores/sap_hanavector<br>


In [1]:
import langchain
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter

import langchain_community
from langchain_community.document_loaders import TextLoader, PyPDFLoader, PyPDFDirectoryLoader, SitemapLoader
from langchain_community.vectorstores.hanavector import HanaDB

from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings
import nest_asyncio

nest_asyncio.apply()

print('langchain version:', langchain.__version__)
print('langchain_community version:', langchain_community.__version__)
# Open question: how to print the Gen AI Hub SDK version as well?

langchain version: 0.1.6
langchain_community version: 0.0.19
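
One possible answer to the open question in the cell above is to read the version from the installed package metadata instead of a module attribute. This is a minimal sketch, assuming the SDK is installed under its PyPI distribution name `generative-ai-hub-sdk`; the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Assumption: the SDK is installed under its PyPI distribution name.
print('generative-ai-hub-sdk version:', installed_version('generative-ai-hub-sdk'))
```

The same helper works for any installed distribution, so it could also replace the `__version__` lookups above if a package does not expose that attribute.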


In [4]:
# Use LangChain to read and split the documents

filePath = ".data/..."
# text_documents = TextLoader("data/state_of_the_union.txt").load()

# Load every PDF found in the directory
loader = PyPDFDirectoryLoader(filePath)
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=10,
    separators=["\n\n", "\n", " ", ""],
)
# Pass the loaded documents here, not the loader object itself
text_chunks = text_splitter.split_documents(documents)
print(f"Number of document chunks: {len(text_chunks)}")

# Use AI Core to embed (the embeddings are computed when the chunks are added to the store)
embeddings = OpenAIEmbeddings(proxy_model_name='text-embedding-ada-002')


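
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a deliberately simplified sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (the real splitter first tries to break on the given separators); the function name `chunk_with_overlap` is hypothetical and only illustrates how consecutive chunks share characters.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghij"
for chunk in chunk_with_overlap(sample, chunk_size=4, chunk_overlap=1):
    print(chunk)
# prints: abcd, defg, ghij (one per line)
```

With `chunk_size=500` and `chunk_overlap=10` as in the cell above, neighbouring chunks would share at most 10 characters, which helps keep a sentence that straddles a boundary retrievable from either side.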

In [14]:
# Create a connection using hana-ml
from hana_ml import ConnectionContext
# cc = ConnectionContext(userkey='VDB_BETA', encrypt=True)
cc = ConnectionContext(
    address='[somehost].hanacloud.ondemand.com',
    port=443,
    user='[your user]',
    password='[your password]',
    encrypt=True,
)
# HanaDB expects the underlying database connection, not the ConnectionContext
connection = cc.connection

print(cc.hana_version())
print(cc.get_current_schema())

4.00.000.00.1708429435 (fa/CE2024.2)
RODRIGO
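
Instead of typing credentials into the notebook, the connection arguments could be read from environment variables. This is a sketch under stated assumptions: the variable names `HANA_ADDRESS`, `HANA_PORT`, `HANA_USER` and `HANA_PASSWORD` and the helper `hana_connection_kwargs` are made up for illustration and are not defined by hana-ml.

```python
import os


def hana_connection_kwargs(env=os.environ):
    """Collect ConnectionContext keyword arguments from environment variables."""
    required = ("HANA_ADDRESS", "HANA_USER", "HANA_PASSWORD")  # hypothetical names
    missing = [name for name in required if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "address": env["HANA_ADDRESS"],
        "port": int(env.get("HANA_PORT", "443")),  # default to the port used above
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
        "encrypt": True,
    }


# Usage, assuming the variables are set and ConnectionContext is imported as above:
# cc = ConnectionContext(**hana_connection_kwargs())
```

The commented-out `userkey` variant in the cell above achieves a similar goal through the HANA secure user store.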


In [15]:
# Creates the table if it does not exist yet
db = HanaDB(
    embedding=embeddings, connection=connection, table_name="PDF_SAMPLE"
)

In [16]:
# Delete already existing documents from the table
# db.delete(filter={})

# add the loaded document chunks
db.add_documents(text_chunks)

[]

In [17]:
# take a look at the table
hdf = cc.sql(''' SELECT "VEC_TEXT", "VEC_META", TO_NVARCHAR("VEC_VECTOR") AS "VEC_VECTOR" FROM "PDF_SAMPLE" ''')
df = hdf.head(5).collect()
df


Unnamed: 0,VEC_TEXT,VEC_META,VEC_VECTOR
0,"PublicMay 2023Torsten Ammon, Head of SAP Datas...","{""source"": ""/Users/rodrigofior/Library/CloudSt...","[-0.0010294608,-0.022634504,0.00949013,-0.0238..."
1,2 PublicDisclaimer\nThe information in this pr...,"{""source"": ""/Users/rodrigofior/Library/CloudSt...","[0.0067297798,-0.019288087,-0.0057627438,-0.02..."
2,"3 Publicintelligent, \nsustainable \nenterprise","{""source"": ""/Users/rodrigofior/Library/CloudSt...","[0.004607734,-0.0124952085,0.0055064764,0.0049..."
3,4 Public\nSAP BTP is the \nfoundation \nRun wi...,"{""source"": ""/Users/rodrigofior/Library/CloudSt...","[0.0023692,-0.012698081,-0.0032368677,-0.02945..."
4,5 PublicSAP Business Technology Platform\nBusi...,"{""source"": ""/Users/rodrigofior/Library/CloudSt...","[-0.0010066105,-0.024821356,0.005258138,-0.025..."
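
The `VEC_META` and stringified `VEC_VECTOR` columns come back as text. A small sketch of decoding one row into Python objects follows, assuming the metadata is JSON and that `TO_NVARCHAR` renders the vector as a bracketed, comma-separated list as in the output above. The helper `parse_row` and the sample values are made up for illustration.

```python
import json


def parse_row(vec_meta, vec_vector):
    """Decode the metadata JSON and the stringified vector of one table row."""
    metadata = json.loads(vec_meta)
    # Assumption: the vector text is a JSON-compatible list such as "[0.1,-0.2]"
    vector = [float(x) for x in json.loads(vec_vector)]
    return metadata, vector


# Made-up row values in the same shape as the query result
metadata, vector = parse_row('{"source": "example.pdf", "page": 0}', "[0.25,-0.5,0.125]")
print(metadata["source"], len(vector))
```

Applied to the collected DataFrame, this could be used row by row, for example to check that every stored vector has the dimension expected from the embedding model.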


In [25]:
query = "Capabilities of SAP Logistics Business Network"
docs = db.similarity_search(query, k=1)

# Concatenate the retrieved chunks into a single context string
context = ""
for doc in docs:
    print("-" * 80)
    # print(doc.page_content)
    context = context + doc.page_content + " "

print(context)

--------------------------------------------------------------------------------
5 PUBLIC ©2020 SAP SE or an SAP affiliate company. All rights reserved.  ǀSAP Logistics Business Network
Increased business speed through an always on, secure network of networks
•Onboard once –
collaborate with many, 
anywhere and anytime
•Allow different stake -
holders to consume 
logistics services and 
share insights
•Integrate with SAP 
digital core by design
Robust, scalable 
cloud service with 
global coverageConnect multiple business 
partners for inter -company 
collaboration and 
transparencyStandardized services 
for logistics 
collaboration and 
insights
©2020 SAP SE or an SAP affiliate company. All rights reserved.  |  PUBLIC 5CarriersShippersGlobal Multi -party
Interoperable 
Other 
Networks
CustomersSuppliers
Freight 
Forwarders
…Track and 
Trace
Connectivity & PlatformLogistics
Business Network
Freight 
CollaborationGlobal Track 
and TraceCapability
Multi -modal Intelligent Suite
Material 
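
Conceptually, `similarity_search` embeds the query and returns the `k` stored chunks whose vectors are closest to it. The pure-Python sketch below illustrates that ranking, assuming cosine similarity as the measure (the database computes the actual distance internally, and the measure in use depends on how the store is configured). The function names and the two-dimensional toy vectors are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=1):
    """Return the k (text, score) pairs most similar to query_vector."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy data: real embeddings have far more dimensions
stored = [
    ("chunk about logistics", [1.0, 0.0]),
    ("chunk about finance", [0.0, 1.0]),
]
print(top_k([0.9, 0.1], stored, k=1))
```

Raising `k` returns more chunks, which gives the language model more context at the cost of a longer prompt.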

In [26]:
promptTemplate_fstring = """
You are an assistant that analyzes the given context.
You are provided with multiple context items that are related to the question you have to answer.
Use the following pieces of context to answer the question at the end in no more than 3 sentences. If the question is not covered by the given context, reply 'not under my scope'.

Context:
{context}

Question:
{query}
"""

In [27]:

from langchain.prompts import PromptTemplate
promptTemplate = PromptTemplate.from_template(promptTemplate_fstring)
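
Because the template only uses simple `{context}` and `{query}` placeholders, the filled-in prompt can be previewed with plain `str.format` before any model call. This is a sketch with a shortened template and made-up values; it does not use `PromptTemplate` itself.

```python
# Shortened stand-in for the template defined above
template = """
Context:
{context}

Question:
{query}
"""

preview = template.format(
    context="SAP Logistics Business Network connects business partners.",
    query="Capabilities of SAP Logistics Business Network",
)
print(preview)
```

Printing the preview is a cheap way to check wording and placeholder names before spending tokens on a request.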
 

In [28]:
from gen_ai_hub.proxy.langchain import ChatOpenAI
llm = ChatOpenAI(proxy_model_name='gpt-35-turbo', temperature=0)
prompt = promptTemplate.format(query=query, context=context)
# invoke() returns a message object; its text is in .content
response = llm.invoke(prompt).content

print(response)

The capabilities of SAP Logistics Business Network include the ability to onboard once and collaborate with multiple stakeholders, the integration with SAP digital core, standardized services for logistics collaboration and insights, and the provision of a robust and scalable cloud service with global coverage. Additionally, it allows for the connection of multiple business partners for inter-company collaboration and transparency, and offers global track and trace capabilities.
