### Oracle Vector DB wrapped as a llama-index custom Vector Store

* inspired by: https://docs.llamaindex.ai/en/stable/examples/low_level/vector_store.html
* updated after **OCI GenAI GA**

In this **first demo** we show:
* how to embed text using OCI GenAI Embeddings (Cohere V3)
* how to query the Oracle AI Vector Store
* how to create a simplified QA retriever using LlamaIndex

In [1]:
import logging
import sys
import numpy as np

from typing import List, Any, Optional, Dict, Tuple
from llama_index.vector_stores.types import (
    VectorStore,
    VectorStoreQuery,
    VectorStoreQueryResult,
)
from llama_index import StorageContext, VectorStoreIndex, ServiceContext
from llama_index.schema import TextNode, BaseNode, Document

import oci
import ads

# only
import oracledb
from oci_utils import load_oci_config
from ads.llm import GenerativeAIEmbeddings, GenerativeAI
from oracle_vector_db import OracleVectorStore

from config import EMBED_MODEL
from config_private import COMPARTMENT_OCID

In [2]:
# this is the endpoint after GA; for now, the Chicago region
ENDPOINT = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"

In [3]:
# versions I'm using
print(f"oracledb version: {oracledb.__version__}")
print(f"oci version: {oci.__version__}")

oracledb version: 2.0.0.dev20231121
oci version: 2.119.1


In [4]:
# for debugging
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [5]:
# setup
oci_config = load_oci_config()

# ADS expects the OCI config wrapped by ads.auth.api_keys
api_keys_config = ads.auth.api_keys(oci_config)

# EMBED_MODEL selects the English model; for other languages, use the multilingual one

embed_model = GenerativeAIEmbeddings(
    compartment_id=COMPARTMENT_OCID,
    model=EMBED_MODEL,
    auth=api_keys_config,
    # Optionally you can specify keyword arguments for the OCI client, e.g. service_endpoint.
    client_kwargs={"service_endpoint": ENDPOINT},
)

#### Using the wrapper for the DB Vector Store

In [6]:
v_store = OracleVectorStore(verbose=True)
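
To make the wrapper idea concrete, here is a minimal in-memory sketch of the contract a vector store has to satisfy: take a query embedding and a `similarity_top_k`, return the matching texts, scores and ids. It is not the `OracleVectorStore` implementation; `ToyVectorStore`, `ToyQueryResult` and their fields are hypothetical names, and the sketch scores by plain dot product instead of calling the database.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyQueryResult:
    """Hypothetical stand-in for the fields read from a query result."""
    texts: List[str]
    similarities: List[float]
    ids: List[str]


@dataclass
class ToyVectorStore:
    """In-memory illustration only; the real wrapper runs SQL against the DB."""
    _rows: List[Tuple[str, str, List[float]]] = field(default_factory=list)

    def add(self, node_id: str, text: str, embedding: List[float]) -> None:
        self._rows.append((node_id, text, embedding))

    def query(self, query_embedding: List[float], similarity_top_k: int = 2) -> ToyQueryResult:
        # Score every stored vector by its dot product with the query.
        scored = [
            (sum(q * v for q, v in zip(query_embedding, vec)), node_id, text)
            for node_id, text, vec in self._rows
        ]
        # Highest score first, then keep only the top k.
        scored.sort(key=lambda row: row[0], reverse=True)
        top = scored[:similarity_top_k]
        return ToyQueryResult(
            texts=[text for _, _, text in top],
            similarities=[score for score, _, _ in top],
            ids=[node_id for _, node_id, _ in top],
        )


store = ToyVectorStore()
store.add("a", "Data Guard overview", [1.0, 0.0])
store.add("b", "Flashback best practices", [0.0, 1.0])
store.add("c", "Disaster recovery with standbys", [0.8, 0.6])

toy_result = store.query([1.0, 0.0], similarity_top_k=2)
print(toy_result.ids)
```

The real wrapper keeps the same shape but delegates the scoring and the top-k cut to the database.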

In [7]:
question = (
    # "What is JSON Relational Duality in Oracle Database 23c? Explain with details"
    "What is Oracle Data Guard? Can it be used for Disaster recovery"
)

In [8]:
# embed the query using OCI GenAI

query_embedding = embed_model.embed_documents([question])[0]

# wrap the embedding in a llama-index query object
query_obj = VectorStoreQuery(query_embedding=query_embedding, similarity_top_k=6)

In [9]:
np.array(query_embedding)

array([ 0.01454926,  0.03277588,  0.00219727, ...,  0.0045433 ,
        0.03466797, -0.02716065])

#### Use our Vector Store DB

In [10]:
%%time

q_result = v_store.query(query_obj)

2024-02-18 22:15:57,745 - INFO - ---> Calling query on DB
2024-02-18 22:15:57,978 - INFO - select: select V.id, C.CHUNK, C.PAGE_NUM, 
                            ROUND(VECTOR_DISTANCE(V.VEC, :1, DOT), 3) as d,
                            B.NAME 
                            from VECTORS V, CHUNKS C, BOOKS B
                            where C.ID = V.ID and
                            C.BOOK_ID = B.ID
                            order by d
                            FETCH FIRST 6 ROWS ONLY
2024-02-18 22:15:58,278 - INFO - Query duration: 0.5 sec.


CPU times: user 29.1 ms, sys: 5.47 ms, total: 34.5 ms
Wall time: 537 ms
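
The logged SQL ranks rows by `VECTOR_DISTANCE(..., DOT)` in ascending order and keeps the first rows. The sketch below mirrors that ranking in plain Python, assuming that the `DOT` metric returns the negated dot product (so the smallest distance is the best match). The helper names `dot_distance` and `fetch_first` and the sample vectors are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Negated dot product, rounded to 3 decimals like ROUND(..., 3) in the SQL."""
    return round(-sum(x * y for x, y in zip(a, b)), 3)


def fetch_first(
    query_vec: Sequence[float],
    rows: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[Tuple[str, float]]:
    """Rank (id, vector) rows by ascending distance and keep the first k."""
    scored = [(row_id, dot_distance(vec, query_vec)) for row_id, vec in rows]
    return sorted(scored, key=lambda pair: pair[1])[:k]


rows = [("doc-1", [0.6, 0.8]), ("doc-2", [1.0, 0.0]), ("doc-3", [0.0, 1.0])]
nearest = fetch_first([0.6, 0.8], rows, k=2)

for row_id, d in nearest:
    # Negating the distance gives back a positive similarity for display.
    print(row_id, d, "similarity:", -d)
```

Under this assumption, negating the returned distance yields a positive similarity score.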


#### Display results

In [11]:
# the query ranks by DOT distance, so negate it to show a positive similarity
for node, node_id, sim in zip(q_result.nodes, q_result.ids, q_result.similarities):
    print(f"Doc. id: {node_id}")
    print(f"Similarity: {-sim}")
    print(node.text)
    print(node.metadata)
    print("")

Doc. id: 00b4673ed5c0292f90910328363940db9ae24d2c3ce251327486652b581d5db4
Similarity: 0.693
Part IV Oracle Data Guard Best Practices •Overview of MAA Best Practices for Oracle Data Guard •Plan an Oracle Data Guard Deployment •Configure and Deploy Oracle Data Guard •Tune and Troubleshoot Oracle Data Guard •Monitor an Oracle Data Guard Configuration 
{'file_name': 'high-availability-23c.pdf', 'page_label': '1'}

Doc. id: 9cba3643f9001a783c80e94c93b17ce4dae9bc794d99de7c2e161fea3dd6f151
Similarity: 0.662
Part II Oracle Database High Availability Best Practices •Overview of Oracle Database High Availability Best Practices •Oracle Database Configuration Best Practices •Oracle Flashback Best Practices 
{'file_name': 'high-availability-23c.pdf', 'page_label': '1'}

Doc. id: 654e1cb02d94374751947e659973a253ba94bcc32e7f7c4db6d10e217d86c357
Similarity: 0.651
7 High Availability Data Guard Oracle Data Guard Redo Decryption for Hybrid Disaster Recovery Configurations Oracle Data Guard now provides 

#### Integrate in the RAG picture

In [12]:
# instantiate the client for the LLM
llm_oci = GenerativeAI(
    compartment_id=COMPARTMENT_OCID,
    max_tokens=1024,
    # Optionally you can specify keyword arguments for the OCI client, e.g. service_endpoint.
    client_kwargs={"service_endpoint": ENDPOINT},
)

In [13]:
service_context = ServiceContext.from_defaults(llm=llm_oci, embed_model=embed_model)

In [14]:
index = VectorStoreIndex.from_vector_store(
    vector_store=v_store, service_context=service_context
)

In [17]:
query_engine = index.as_query_engine(similarity_top_k=5)
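
Conceptually, a query engine of this kind retrieves the top-k chunks and passes them to the LLM together with the question. The sketch below shows only the prompt-assembly step. The `build_prompt` helper and its template are hypothetical and are not the prompt that LlamaIndex actually uses.

```python
from typing import List


def build_prompt(question: str, chunks: List[str]) -> str:
    """Hypothetical template: number the retrieved chunks, then append the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "What is Oracle Data Guard?",
    [
        "Part IV Oracle Data Guard Best Practices",
        "Configure and Deploy Oracle Data Guard",
    ],
)
print(prompt)
```

A larger `similarity_top_k` adds more chunks to the context, at the cost of a longer prompt.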

In [18]:
%%time

response = query_engine.query(question)

print(f"Question: {question}")
print("")
print(response.response)
print("")

2024-02-18 22:22:07,069 - INFO - ---> Calling query on DB
2024-02-18 22:22:07,241 - INFO - select: select V.id, C.CHUNK, C.PAGE_NUM, 
                            ROUND(VECTOR_DISTANCE(V.VEC, :1, DOT), 3) as d,
                            B.NAME 
                            from VECTORS V, CHUNKS C, BOOKS B
                            where C.ID = V.ID and
                            C.BOOK_ID = B.ID
                            order by d
                            FETCH FIRST 5 ROWS ONLY
2024-02-18 22:22:07,512 - INFO - Query duration: 0.4 sec.



Question: What is Oracle Data Guard? Can it be used for Disaster recovery

Oracle Data Guard is a tool for database replication, meant for high availability, data protection and disaster recovery. It ensures that there is at least one standby database that can survive any form of outage, including data corruption and natural disasters. This database is an exact replica of the production database and can thus be used to restore data using traditional backup methods. It's composed of one primary database and one or more standby databases, with the primary database being either a single instance or an Oracle RAC database and the standbys being single instances as well. 

Data Guard is included in Oracle Enterprise Edition and a primary database can support up to 30 standbys. There's also Oracle Active Data Guard, an extension of Data Guard providing additional features like offloading processing from the production database and enhancing data protection. 

Yes, Oracle Data Guard can defin