Install Python packages
Run the following cell to install the required packages. pip is the package installer for Python; it installs packages from the Python Package Index (PyPI) and other indexes.

In [None]:
!pip install hdbcli --break-system-packages
!pip install "generative-ai-hub-sdk[all]" --break-system-packages
!pip install folium --break-system-packages
!pip install ipywidgets --break-system-packages

# Restart the kernel after installation so the newly installed packages are picked up.
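After the kernel restart, it can be worth confirming that the packages are visible to the interpreter. The following is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical, and the distribution names are assumed to match the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed from the pip commands above.
REQUIRED = ["hdbcli", "generative-ai-hub-sdk", "folium", "ipywidgets"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED suggests the install cell failed or the kernel has not been restarted yet.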

Configure SAP Generative AI Hub credentials
A configuration module has already been executed to enable access to the SAP Generative AI Hub foundation models. The details of this configuration are outside the scope of this workshop.

However, the typical configuration is in the following format:

{
  "AICORE_AUTH_URL": "https://* * * .authentication.sap.hana.ondemand.com",
  "AICORE_CLIENT_ID": "* * * ",
  "AICORE_CLIENT_SECRET": "* * * ",
  "AICORE_RESOURCE_GROUP": "* * * ",
  "AICORE_BASE_URL": "https://api.ai.* * *.cfapps.sap.hana.ondemand.com/v2"
}
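
For readers who want to see how such a file might be consumed, here is a minimal sketch, not the workshop's actual configuration module. It assumes the JSON above is saved to a file (the name aicore_config.json is hypothetical) and that the SDK picks up the AICORE_* values from environment variables; the helper load_aicore_config is also hypothetical.

```python
import json
import os
from pathlib import Path

# Keys taken from the configuration format shown above.
REQUIRED_KEYS = (
    "AICORE_AUTH_URL",
    "AICORE_CLIENT_ID",
    "AICORE_CLIENT_SECRET",
    "AICORE_RESOURCE_GROUP",
    "AICORE_BASE_URL",
)


def load_aicore_config(path):
    """Read the JSON file, check the expected keys, and export them as env vars."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"Missing configuration keys: {', '.join(missing)}")
    for key in REQUIRED_KEYS:
        os.environ[key] = str(config[key])
    return config


# Hypothetical usage; the file name is an assumption:
# load_aicore_config("aicore_config.json")
```

Failing early on a missing key tends to give a clearer error than an authentication failure later in the notebook.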

In [None]:
# Test embeddings

from gen_ai_hub.proxy.native.openai import embeddings

response = embeddings.create(
    input="SAP Generative AI Hub is awesome!",
    model_name="text-embedding-ada-002",
)
print(response.data)

In [None]:
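# Initialize the LangChain embedding model.
# Note: this rebinds the name `embeddings`, replacing the module imported in the previous cell.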

from gen_ai_hub.proxy.langchain.init_models import init_embedding_model
embeddings = init_embedding_model('text-embedding-ada-002')

In [None]:
# Initialize the chat model (LLM) through the Generative AI Hub proxy

from gen_ai_hub.proxy.langchain.openai import ChatOpenAI
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client

proxy_client = get_proxy_client('gen-ai-hub')
llm = ChatOpenAI(proxy_model_name='gpt-4o', proxy_client=proxy_client)

Implementing RAG Embeddings
Now that all SAP Generative AI Hub configuration steps have been completed, let's continue to process the product catalog data file.

Prepare the documentation for the product catalog in CSV format with each row representing a product
This code snippet loads text data from a CSV file using the CSVLoader from the langchain.document_loaders.csv_loader module, which creates one document per row. It then splits the documents into chunks with CharacterTextSplitter.

Splitting keeps large text data manageable for further processing, analysis, or input into machine learning models, especially when a model limits the input size.
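
To make the row-to-document step concrete, the sketch below imitates it with the standard csv module. It is a simplified illustration, not LangChain's implementation: the helper rows_to_documents is hypothetical, the sample rows are made up, and only three of the catalog's columns are used.

```python
import csv
import io


def rows_to_documents(csv_text, fieldnames, delimiter=";"):
    """Turn each CSV row into a 'KEY: value' text block plus row metadata."""
    reader = csv.DictReader(
        io.StringIO(csv_text),
        fieldnames=fieldnames,
        delimiter=delimiter,
        quotechar='"',
    )
    documents = []
    for row_number, row in enumerate(reader):
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append({"page_content": content, "metadata": {"row": row_number}})
    return documents


# Made-up sample rows; the quoted field contains the delimiter on purpose.
sample = 'P-001;"Steel bolt; M8";Fasteners\nP-002;Copper wire;Electrical\n'
docs = rows_to_documents(sample, ["PRODUCT_ID", "PRODUCT_NAME", "CATEGORY"])
print(docs[0]["page_content"])
```

Because the field names are passed explicitly, csv.DictReader treats the first line of the input as data. If the CSV file starts with a header row, that row would be read as a product as well.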

In [None]:
# Process CSV data file

from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(
    file_path="data/new_product.csv",
    csv_args={
        "delimiter": ";",
        "quotechar": '"',
        "fieldnames": ["PRODUCT_ID","PRODUCT_NAME","CATEGORY","DESCRIPTION","UNIT_PRICE","UNIT_MEASURE","SUPPLIER_ID","SUPPLIER_NAME","LEAD_TIME_DAYS","MIN_ORDER","CURRENCY","SUPPLIER_COUNTRY","SUPPLIER_ADDRESS","AVAILABILITY_DAYS","SUPPLIER_CITY","STOCK_QUANTITY","MANUFACTURER","CITY_LAT","CITY_LONG", "RATING"],
    },
)

# Process data

text_documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
text_chunks = text_splitter.split_documents(text_documents)
print(f"Number of document chunks: {len(text_chunks)}")
# print(text_chunks)

# Print the metadata and text of each chunk
for chunk in text_chunks:
    print(chunk.metadata)
    print(chunk.page_content)

SAP HANA Cloud vector engine
Storing vector embeddings in the same database as the business data avoids data silos and keeps data management on a single platform. In SAP HANA Cloud, vector embeddings are stored in a column of an ordinary table, so developers can query them with SQL.

This means you can execute joins, apply filters, and run selects that combine vector embeddings with other data types, including transactional, spatial, graph, and JSON data, all within the same SQL environment. No separate query language is required: working with vector embeddings in SAP HANA Cloud is much like writing queries against a standard SQL database.
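
As an illustration of what such a query might look like, the sketch below only builds a SQL string; it does not connect to a database and has not been run against one here. The table name and the VEC_TEXT and VEC_VECTOR columns are the ones used later in this notebook, the helper names are hypothetical, and the use of COSINE_SIMILARITY with TO_REAL_VECTOR is an assumption about how one would typically rank rows in the vector engine.

```python
def build_similarity_query(table_name, k):
    """Build a SQL statement that ranks rows by similarity to a query vector."""
    if not isinstance(k, int) or k < 1:
        raise ValueError("k must be a positive integer")
    safe_table = table_name.replace('"', '""')  # escape embedded double quotes
    return (
        'SELECT "VEC_TEXT", '
        'COSINE_SIMILARITY("VEC_VECTOR", TO_REAL_VECTOR(?)) AS "SCORE" '
        f'FROM "{safe_table}" '
        'ORDER BY "SCORE" DESC '
        f"LIMIT {k}"
    )


def vector_literal(values):
    """Format a list of numbers as the text form bound to the ? placeholder."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


sql = build_similarity_query("CATALOG_UPDATED_DEV_1_", k=5)
print(sql)
print(vector_literal([0.1, 0.2, 0.3]))
```

With an open connection, the statement could be executed as cursor.execute(sql, (vector_literal(query_vector),)), where query_vector is the embedding of the search text.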

Connect to the HANA vector storage instance and create a table to store the documentation data
The following Python script imports the database connection modules and opens a connection to an SAP HANA Cloud instance using the dbapi module. You are prompted for the host name, username, and password, which are then used to connect to the SAP HANA Cloud database.

The HanaDB class from the langchain_community.vectorstores.hanavector module, part of the LangChain community package, reads and writes vector data stored in SAP HANA Cloud. It lets developers use the vector capabilities of SAP HANA Cloud from AI and machine learning applications.

Note: Use the username and password supplied to you to log on to the SAP HANA Cloud database. Find the host_address in the lesson content.

In [None]:
# HC Vector Engine

from getpass import getpass

from hdbcli import dbapi
from langchain_community.vectorstores.hanavector import HanaDB

host_address = input("Enter HANA Cloud Hostname: ")
hdb_user = input("Enter Username: ")
hdb_password = getpass("Enter Password: ")  # getpass hides the typed password

connection = dbapi.connect(
    address=host_address,
    port=443,
    user=hdb_user,
    password=hdb_password,
    autocommit=True,
    sslValidateCertificate=False,  # skips certificate validation; acceptable only in a workshop setting
)

Populate the table with data and create a REAL_VECTOR column to store embeddings
Create a LangChain VectorStore interface for the HANA database and specify the table (collection) to use for accessing the vector embeddings. Embeddings are vector representations of text data that incorporate the semantic meaning of the text.
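
To give a feel for how semantic meaning shows up in vectors, the sketch below computes cosine similarity in plain Python. The three-dimensional vectors are made up for illustration; real embeddings have many more dimensions, and the function is a textbook formula rather than the database's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up toy vectors standing in for embeddings of three product texts.
drill = [0.9, 0.1, 0.0]
screwdriver = [0.8, 0.2, 0.1]
coffee = [0.0, 0.2, 0.9]

# Related texts should score higher than unrelated ones.
print(cosine_similarity(drill, screwdriver) > cosine_similarity(drill, coffee))
```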

In [None]:
# Create a LangChain VectorStore interface for the HANA database and
# specify the table (collection) that holds the vector embeddings
db = HanaDB(
    embedding=embeddings,
    connection=connection,
    table_name="CATALOG_UPDATED_DEV_1_",
)

In [None]:
# Delete any existing documents from the table (an empty filter matches all rows)
db.delete(filter={})

# Add the loaded document chunks; each chunk is embedded and stored with its text
db.add_documents(text_chunks)

Verify product embeddings in SAP HANA Cloud


In [None]:
# Query the table to verify embeddings
cursor = connection.cursor()
sql = f'SELECT VEC_TEXT, TO_NVARCHAR(VEC_VECTOR) FROM "{db.table_name}"'

cursor.execute(sql)
vectors = cursor.fetchall()

for vector in vectors:
    print(vector)

Enhancing Query Responses
Define a prompt template to provide context to queries
Define a prompt template that supplies context to each query. When the filled-in template is passed to the model, the added context helps it generate more accurate results.

The answer should contain the requested information about products and their descriptions, formatted according to the specified JSON structure for further use in the SAP HANA JSON Document store.

The prompt template contains two variables, context and question. These variables are replaced with the retrieved context and the user's question in the upcoming steps.
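
The substitution itself can be pictured with Python's built-in string formatting, which uses the same curly-brace placeholder style. This is a minimal sketch with a shortened, made-up template and made-up values, not the prompt used in this notebook; the helper fill_prompt is hypothetical.

```python
def fill_prompt(template, context, question):
    """Replace the {context} and {question} placeholders in a template."""
    return template.format(context=context, question=question)


# Shortened stand-in for the real prompt template.
template = "Use the context to answer.\n\n{context}\n\nquestion: {question}\n"

filled = fill_prompt(
    template,
    context="PRODUCT_NAME: Copper wire\nRATING: 5",
    question="Which products have a rating of 5?",
)
print(filled)

# Literal braces must be doubled so they are not read as placeholders.
with_braces = 'Answer like {{"RATING": 5}} for: {question}'.format(question="demo")
print(with_braces)
```

The brace-doubling rule matters when a template needs to show a literal JSON object as an example.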

In [None]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning) 
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer,
    just say you don't know, don't try to make up an answer. Format the results in a list of JSON items with the following keys:

        "PRODUCT_ID", 
        "PRODUCT_NAME",
        "CATEGORY",
        "DESCRIPTION",
        "UNIT_PRICE",
        "UNIT_MEASURE",
        "SUPPLIER_ID",
        "SUPPLIER_NAME",
        "LEAD_TIME_DAYS",
        "MIN_ORDER",
        "CURRENCY",
        "SUPPLIER_COUNTRY",
        "SUPPLIER_ADDRESS",
        "SUPPLIER_CITY",
        "CITY_LAT",
        "CITY_LONG",
        "RATING"
      
    
    The 'RATING' key value is an integer datatype ranging from 0 stars to 5 stars. Where 0 stars is 'bad' and 5 stars is 'excellent'. Do not include json markdown codeblock syntax in the results for example: ```json ```

    {context}

    question: {question}

    """


PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"],
)

chain_type_kwargs = {"prompt": PROMPT}

Create the Conversational Retrieval Chain with SAP HANA Cloud vector engine
This code snippet combines several LangChain components to create a retrieval-based question-answering (QA) system. Here's a breakdown of the key parts:

Retriever Initialization: The db.as_retriever function initializes a retriever object with the search argument 'k': 20, which sets the number of documents the retriever returns for each query.

Prompt Template: The PromptTemplate defined in the previous step instructs the model how to use the context to answer a question. It tells the model not to fabricate answers if the information is unavailable, and it outlines the structure of the expected JSON output with various product and supplier details.
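
Conceptually, the retriever keeps the k best-scoring documents and the "stuff" chain type places all of them into the context variable of the prompt. The sketch below mimics those two steps in plain Python under simplifying assumptions: the scores and product texts are made up, the helper names are hypothetical, and joining with a blank line is assumed to approximate the chain's behavior.

```python
import heapq


def top_k(scored_docs, k):
    """Return the texts of the k documents with the highest similarity score."""
    best = heapq.nlargest(k, scored_docs, key=lambda pair: pair[0])
    return [doc for _score, doc in best]


def stuff_context(documents, separator="\n\n"):
    """Concatenate all retrieved documents into one context string."""
    return separator.join(documents)


# Made-up (score, text) pairs standing in for vector search results.
scored = [
    (0.91, "PRODUCT_NAME: Copper wire\nRATING: 5"),
    (0.15, "PRODUCT_NAME: Paper cups\nRATING: 2"),
    (0.78, "PRODUCT_NAME: Steel bolt\nRATING: 4"),
]

context = stuff_context(top_k(scored, k=2))
print(context)
```

A larger k gives the model more candidate products to reason over, at the cost of a longer prompt.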

In [None]:
question = "Find products with a rating of 4 and more."
retriever = db.as_retriever(search_kwargs={'k':20})

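qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",
    chain_type_kwargs=chain_type_kwargs,
)

# invoke() replaces the deprecated run(); the answer text is returned under "result"
answer = qa.invoke({"query": question})["result"]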
print(answer)
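
Because the prompt asks for a list of JSON items, the answer string can be parsed before further use. The following is a minimal sketch that assumes the model followed the format instructions; the helper parse_products is hypothetical and the sample answer is made up, not actual model output.

```python
import json


def _rating(item):
    """Return the item's RATING as an int, or -1 if it is missing or invalid."""
    try:
        return int(item.get("RATING"))
    except (TypeError, ValueError):
        return -1


def parse_products(answer, min_rating=4):
    """Parse the model's JSON answer and keep products at or above min_rating."""
    try:
        items = json.loads(answer)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model output is not valid JSON: {exc}") from exc
    if not isinstance(items, list):
        raise ValueError("Expected a JSON list of products")
    return [item for item in items if isinstance(item, dict) and _rating(item) >= min_rating]


# Made-up example of a well-formed answer.
sample_answer = '[{"PRODUCT_ID": "P-001", "RATING": 5}, {"PRODUCT_ID": "P-002", "RATING": 3}]'
print(parse_products(sample_answer))
```

Re-checking the rating in code guards against the model returning products that do not meet the condition in the question.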