* see: https://python.langchain.com/docs/use_cases/question_answering/

In [1]:
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# two possible vector stores
from langchain.vectorstores import Chroma
from langchain.vectorstores import FAISS

# replaced OpenAI embeddings with Hugging Face (HF)
from langchain.embeddings import HuggingFaceEmbeddings

from langchain import hub

# replaced the OpenAI LLM with OCI GenAI
import oci

# oci_llm is in a local file
from oci_llm import OCIGenAILLM

from langchain.schema.runnable import RunnablePassthrough

# private configs
from config_private import COMPARTMENT_OCID

In [2]:
# set to True to enable debug output
DEBUG = False

In [3]:
# read OCI config to connect to OCI with API key
CONFIG_PROFILE = "DEFAULT"
config = oci.config.from_file("~/.oci/config", CONFIG_PROFILE)

# OCI GenAI endpoint (for now, the Chicago region)
ENDPOINT = "https://generativeai.aiservice.us-chicago-1.oci.oraclecloud.com"

# print the config to check the API key settings
if DEBUG:
    print(config)
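
`oci.config.from_file` reads an INI-style file and returns the selected profile as a dictionary. As a rough, standard-library-only sketch of that lookup (not the SDK's actual implementation), the snippet below parses a sample profile with `configparser`. Every value is a placeholder, and the helper `read_profile` is hypothetical.

```python
import configparser

# Placeholder profile in the layout used by ~/.oci/config.
# Every value below is made up for illustration.
SAMPLE_CONFIG = """
[DEFAULT]
user=ocid1.user.oc1..exampleuser
fingerprint=aa:bb:cc:dd
tenancy=ocid1.tenancy.oc1..exampletenancy
region=us-chicago-1
key_file=~/.oci/oci_api_key.pem
"""


def read_profile(text, profile="DEFAULT"):
    """Return the key/value pairs of one profile as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if profile == parser.default_section:
        # configparser treats [DEFAULT] specially, so read it via defaults()
        return dict(parser.defaults())
    return dict(parser[profile])


profile = read_profile(SAMPLE_CONFIG)
print(sorted(profile))
```

If a key such as `region` or `key_file` is missing from the real file, the connection to OCI is likely to fail, so printing the loaded config (as the `DEBUG` branch does) is a quick first check.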

#### Loading the document

In [4]:
# BLOG_POST = "https://python.langchain.com/docs/get_started/introduction"
BLOG_POST = "https://blogs.oracle.com/database/post/oracle-database-23c-the-next-long-term-support-release"
loader = WebBaseLoader(BLOG_POST)

data = loader.load()
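
Text scraped from a web page often carries long runs of blank lines left over from the page layout. The following is a minimal sketch of a cleanup step that could run before splitting; the helper `collapse_blank_lines` is hypothetical and not part of LangChain.

```python
def collapse_blank_lines(text, max_blank=1):
    """Strip trailing spaces and keep at most max_blank consecutive blank lines."""
    cleaned = []
    blanks = 0
    for line in text.splitlines():
        line = line.rstrip()
        if line == "":
            blanks += 1
            if blanks > max_blank:
                continue  # drop the surplus blank line
        else:
            blanks = 0
        cleaned.append(line)
    return "\n".join(cleaned).strip()


raw = "Title\n\n\n\n\nFirst paragraph.   \n\n\nSecond paragraph.\n\n\n"
cleaned_text = collapse_blank_lines(raw)
print(cleaned_text)
```

In this notebook it could be applied to the `page_content` of each document in `data`, assuming the loaded documents expose that attribute as they do in the retrieval cells.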

#### Splitting the document into chunks

In [5]:
CHUNK_SIZE = 512

text_splitter = RecursiveCharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=0)

splits = text_splitter.split_documents(data)

In [6]:
print(f"We have {len(splits)} splits...")

We have 72 splits...
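
To give an intuition for what a recursive character splitter does, here is a simplified sketch in plain Python. It is not LangChain's implementation and it ignores overlap (the notebook uses `chunk_overlap=0`); the function name `split_recursive` and the separator list are assumptions chosen for illustration.

```python
def split_recursive(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    The coarsest separator is tried first; finer separators are used only
    for pieces that are still too long. The tuple must end with "".
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(split_recursive(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


sample = "alpha beta gamma\n\ndelta epsilon"
chunks = split_recursive(sample, chunk_size=12)
print(chunks)
```

The sketch shows why chunks tend to end at paragraph or word boundaries: a hard character cut happens only when no coarser separator yields a piece that fits.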


In [7]:
# have a look at a single split
if DEBUG:
    print(splits[1])

#### Embeddings and Vector Store

In [8]:
# we have replaced OpenAI embeddings with a Hugging Face (HF) model
EMBED_MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"

model_kwargs = {"device": "cpu"}
encode_kwargs = {"normalize_embeddings": False}


hf = HuggingFaceEmbeddings(
    model_name=EMBED_MODEL_NAME, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)

# use Chroma or FAISS as the vector store
# vectorstore = Chroma.from_documents(documents=splits,
#                                    embedding=hf)
vectorstore = FAISS.from_documents(documents=splits, embedding=hf)

retriever = vectorstore.as_retriever()
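
The cell above sets `normalize_embeddings` to `False`. Whether that matters depends on the distance metric of the index and on whether the model already emits unit-length vectors. As far as I know, the LangChain FAISS wrapper defaults to Euclidean (L2) distance, but treat that as an assumption. The toy sketch below, in plain Python with made-up 2-D vectors, shows that L2 and cosine rankings can disagree on raw vectors and agree once the vectors are normalized, because for unit vectors the squared L2 distance equals `2 - 2 * cosine`.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = [1.0, 0.0]
doc_a = [10.0, 1.0]  # almost the same direction as the query, but long
doc_b = [0.5, 0.5]   # 45 degrees away from the query, but short

# Raw vectors: L2 prefers doc_b, cosine prefers doc_a.
print(squared_l2(query, doc_a), squared_l2(query, doc_b))
print(cosine(query, doc_a), cosine(query, doc_b))

# Normalized vectors: L2 now agrees with cosine.
norm_a, norm_b = l2_normalize(doc_a), l2_normalize(doc_b)
print(squared_l2(query, norm_a), squared_l2(query, norm_b))
```

If cosine-like ranking is wanted with an L2 index, one option is `encode_kwargs = {"normalize_embeddings": True}`.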

#### Define the prompt structure

In [9]:
rag_prompt = hub.pull("rlm/rag-prompt")
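
The pulled prompt is a template that is filled with the retrieved context and the user question; those are the two keys the chain in this notebook supplies. The sketch below imitates this with plain string formatting. The template wording is an assumption for illustration and is not necessarily what the hub returns, and `build_prompt` is a hypothetical helper.

```python
# Illustrative template: only the placeholder names (context, question)
# come from the notebook; the wording is an assumption.
RAG_TEMPLATE = (
    "Use the following context to answer the question. "
    "If the context does not contain the answer, say that you don't know.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)


def build_prompt(question, context_chunks):
    """Join the retrieved chunks and fill the template."""
    context = "\n\n".join(context_chunks)
    return RAG_TEMPLATE.format(question=question, context=context)


prompt_text = build_prompt("What is new in 23c?", ["chunk one", "chunk two"])
print(prompt_text)
```

Printing the filled prompt like this is a cheap way to see exactly what text the LLM receives for a given question.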

#### Define the LLM

In [10]:
# compartment OCID from config_private.py

llm = OCIGenAILLM(
    temperature=1,
    max_tokens=1000,
    config=config,
    compartment_id=COMPARTMENT_OCID,
    endpoint=ENDPOINT,
    debug=DEBUG,
)

#### Define the (Lang)Chain

In [11]:
rag_chain = {"context": retriever, "question": RunnablePassthrough()} | rag_prompt | llm
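
The `|` syntax composes steps so that the output of one becomes the input of the next, and the leading dictionary runs each of its values on the same input. The following is a small plain-Python imitation of that idea, not LangChain's implementation; the `Step` class, `as_step`, and the stub components are all hypothetical.

```python
class Step:
    """Wraps a callable so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        nxt = as_step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other):
        # Reached for `dict | Step`, since dict cannot combine with a Step.
        return as_step(other) | self


def as_step(obj):
    """Coerce a Step, a dict of steps, or a plain callable into a Step."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        steps = {key: as_step(val) for key, val in obj.items()}
        # Every entry receives the same input; results are gathered by key.
        return Step(lambda value: {k: s.invoke(value) for k, s in steps.items()})
    if callable(obj):
        return Step(obj)
    raise TypeError(f"cannot use {type(obj).__name__} as a step")


# Stub components standing in for the retriever, prompt, and LLM.
stub_retriever = Step(lambda question: ["chunk about " + question])
passthrough = Step(lambda value: value)
stub_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
stub_llm = Step(lambda text: f"[stub answer based on {len(text)} prompt characters]")

chain = {"context": stub_retriever, "question": passthrough} | stub_prompt | stub_llm
result = chain.invoke("vector search")
print(result)
```

Read this way, the real chain sends the question to the retriever and, unchanged, to the prompt, then passes the filled prompt to the LLM.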

#### Process the question

In [12]:
# a list of possible questions
QUESTION1 = "What is the best architecture for an LLM?"
QUESTION2 = "What is LangChain?"
QUESTION3 = "Make a list of database 23c innovations in AI"
QUESTION4 = "Can we use natural language to make queries in Oracle Database 23c?"

In [13]:
%%time

# the question
QUESTION = QUESTION4

response = rag_chain.invoke(QUESTION)

print("The response:")
print(response)
print()

The response:
 Oracle Database 23c does support natural language queries. One can "chat" in natural language with their Oracle Database (or via APIs) to get it to answer complex queries using the context of their data.

CPU times: user 96.2 ms, sys: 41.4 ms, total: 138 ms
Wall time: 2.33 s


#### Explore the vector store

In [22]:
# Retrieve relevant splits for any question using similarity search.

# This is simply "top K" retrieval where we select documents based on embedding similarity to the query.

TOP_K = 5

docs = vectorstore.similarity_search(QUESTION, k=TOP_K)

len(docs)

5
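
Top-K retrieval can be sketched without any vector database: embed the query, score every document against it, and keep the K best. The toy example below uses word counts over a tiny fixed vocabulary as a stand-in for real embeddings. The vocabulary, documents, and helper names are all made up for illustration, and the metric a real vector store uses may differ.

```python
import heapq
import math

VOCAB = ["database", "language", "query", "cloud"]


def embed(text):
    """Toy embedding: count how often each vocabulary word occurs."""
    words = text.lower().split()
    return [words.count(term) for term in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no shared vocabulary at all
    return dot / (norm_a * norm_b)


def top_k(query, documents, k):
    """Return (score, index) pairs for the k most similar documents."""
    query_vec = embed(query)
    scored = ((cosine(query_vec, embed(doc)), idx) for idx, doc in enumerate(documents))
    return heapq.nlargest(k, scored)


documents = [
    "natural language query for the database",
    "cloud pricing update",
    "database query tuning",
]
results = top_k("language query", documents, k=2)
for score, idx in results:
    print(f"{score:.3f}  {documents[idx]}")
```

A real store replaces the linear scan with an index and the word counts with model embeddings, but the ranking step is the same idea.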

In [23]:
for doc in docs:
    print(doc.page_content)
    print()

conjunction with your databases without compromising the security of your data. You can now “chat” in natural language with your Oracle Database (or via APIs) to get it to answer complex queries using the context of your data. Over the coming months, we plan to extend this AI capability to allow you to generate code for all aspects of the database and development tools.

Recently, things have taken an enormous step forward in the world of AI with massive advances in Generative AI and Large Language Models (LLM) such as GPT-3. We are just beginning to feel the impact of these changes as they start to lead to advancements in AI-powered applications across education, healthcare, business, and more. To assist developers and analysts in exploiting these recent changes in the Oracle Database, we are introducing new functionality that allows you to access the power of LLMs in

Oracle Database 23c: The Next Long Term Support Release