### Oracle Vector DB wrapped as a llama-index custom Vector Store

* inspired by: https://docs.llamaindex.ai/en/stable/examples/low_level/vector_store.html
* Demo on **Medicine Book**
* Demo on Python book

In [1]:
import logging
import sys

from typing import List, Any, Optional, Dict, Tuple
from llama_index.vector_stores.types import (
    VectorStore,
    VectorStoreQuery,
    VectorStoreQueryResult,
)
from llama_index import StorageContext, VectorStoreIndex, ServiceContext
from llama_index.schema import TextNode, BaseNode, Document

import oci
import ads
# only 
import oracledb
from oci_utils import load_oci_config
from ads.llm import GenerativeAIEmbeddings, GenerativeAI
from oracle_vector_db import OracleVectorStore

from config_private import COMPARTMENT_OCID, ENDPOINT

In [2]:
# versions I'm using
print(f"oracledb version: {oracledb.__version__}")
print(f"oci version: {oci.__version__}")

oracledb version: 2.0.0.dev20231121
oci version: 2.112.1+preview.1.1649


In [3]:
# for debugging
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [4]:
# setup
oci_config = load_oci_config()

# the ADS auth object needs to be built this way, from the OCI config
api_keys_config = ads.auth.api_keys(oci_config)

# English model; for other languages use the multilingual model
MODEL_NAME = "cohere.embed-english-v3.0"

embed_model = GenerativeAIEmbeddings(
    compartment_id=COMPARTMENT_OCID,
    model=MODEL_NAME,
    auth=api_keys_config,
    # Optionally you can specify keyword arguments for the OCI client, e.g. service_endpoint.
    client_kwargs={
        "service_endpoint": ENDPOINT
    },
)

In [5]:
llm_oci = GenerativeAI(
    compartment_id=COMPARTMENT_OCID,
    max_tokens=1024,
    # Optionally you can specify keyword arguments for the OCI client, e.g. service_endpoint.
    client_kwargs={
        "service_endpoint": ENDPOINT
    },
)

In [6]:
v_store = OracleVectorStore(verbose=False)
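
The implementation of `OracleVectorStore` is not shown in this notebook. As a rough illustration of the shape such a custom store takes, here is a minimal in-memory sketch with an `add` and a `query` method. Everything in it is an assumption made for illustration: `Node`, `QueryResult` and `InMemoryVectorStore` are hypothetical stand-ins for the llama-index types, and the brute-force cosine ranking in Python replaces whatever similarity search the real store runs inside the database.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a llama-index TextNode."""
    id_: str
    text: str
    embedding: List[float]


@dataclass
class QueryResult:
    """Hypothetical stand-in for VectorStoreQueryResult."""
    nodes: List[Node]
    similarities: List[float]
    ids: List[str]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy store: keeps nodes in a list and ranks them at query time."""

    def __init__(self) -> None:
        self._nodes: List[Node] = []

    def add(self, nodes: List[Node]) -> List[str]:
        # A database-backed store would insert rows here instead.
        self._nodes.extend(nodes)
        return [n.id_ for n in nodes]

    def query(self, query_embedding: List[float], similarity_top_k: int = 2) -> QueryResult:
        # Score every stored node, then keep the k best matches.
        scored = [(cosine_similarity(query_embedding, n.embedding), n) for n in self._nodes]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = scored[:similarity_top_k]
        return QueryResult(
            nodes=[n for _, n in top],
            similarities=[s for s, _ in top],
            ids=[n.id_ for _, n in top],
        )


store = InMemoryVectorStore()
store.add([
    Node("a", "regular expressions", [1.0, 0.0]),
    Node("b", "file handling", [0.0, 1.0]),
    Node("c", "string methods", [0.7, 0.7]),
])
result = store.query([0.9, 0.1], similarity_top_k=2)
print(result.ids)
```

The point of the sketch is the contract: a query goes in as an embedding plus a `top_k`, and three parallel lists (nodes, similarities, ids) come back, which is the structure the notebook iterates over further down.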

In [7]:
service_context = ServiceContext.from_defaults(llm=llm_oci, embed_model=embed_model)

In [8]:
index = VectorStoreIndex.from_vector_store(
    vector_store=v_store,
    service_context=service_context,
)

In [9]:
query_engine = index.as_query_engine(similarity_top_k=5)

#### Using the wrapper for the DB Vector Store

In [25]:
question = "How would you write a Python code snippet that uses regular expressions to find all email addresses in a given string?"

In [26]:
# embed the query using OCI GenAI
query_embedding = embed_model.embed_documents([question])[0]

# wrap the embedding in a llama-index query object
query_obj = VectorStoreQuery(
    query_embedding=query_embedding, similarity_top_k=6
)

#### Use our Vector Store DB

In [27]:
%%time

q_result = v_store.query(query_obj)

CPU times: user 20.8 ms, sys: 4.14 ms, total: 24.9 ms
Wall time: 475 ms


In [28]:
for n, node_id, sim in zip(q_result.nodes, q_result.ids, q_result.similarities):
    print(f"Doc. id: {node_id}")
    print(f"Similarity: {-sim}")
    print(n.text)
    print("")

Doc. id: 51f98b82b12b8929b1c25baf68e163a4419903c6fee13bb96788ba813bad1103
Similarity: 0.54
130 CHAPTER 11. REGULAR EXPRESSIONS From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008 Return-Path: <postmaster@collab.sakaiproject.org> for <source@collab.sakaiproject.org>; Received: (from apache@localhost) Author: stephen.marquard@uct.ac.za We don’t want to write code for each of the types of lines, splitting and slici ng diﬀerently for each line. This following program uses findall() to ﬁnd the lines with email addresses in them and extract one or more addresses from each of those lines. import re s=/quotesingle.ts1A message from csev@umich.edu to cwen@iupui.edu about mee ting @2PM/quotesingle.ts1 lst =re.findall( /quotesingle.ts1\S+@\S+/quotesingle.ts1 , s) print(lst) # Code: http://www.py4e.com/code3/re05.py Thefindall() method searches the string in the second argument and returns a list of all of the strings that look like email addresses. We are using a tw o-character sequence that 

#### Integrate in the bigger RAG picture
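
Conceptually, a query engine chains three steps: embed the question, retrieve the closest chunks from the vector store, and pass question plus chunks to the LLM. The sketch below shows that flow with plain functions. It is not llama-index's actual implementation: `build_prompt`, `answer` and the prompt wording are hypothetical, and the embedder, retriever and generator are stubs standing in for the OCI GenAI embedding model, the vector store and the LLM.

```python
from typing import Callable, List


def build_prompt(question: str, chunks: List[str]) -> str:
    """Place the retrieved chunks ahead of the question (illustrative template)."""
    context = "\n\n".join(chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Given the context, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(
    question: str,
    embed: Callable[[str], List[float]],
    retrieve: Callable[[List[float], int], List[str]],
    generate: Callable[[str], str],
    top_k: int = 5,
) -> str:
    """Embed the question, fetch the top_k chunks, then ask the LLM."""
    query_embedding = embed(question)
    chunks = retrieve(query_embedding, top_k)
    return generate(build_prompt(question, chunks))


# Stubs, only so the sketch runs without any external service.
docs = ["re.findall() returns all matches.", "Lists are mutable."]
seen_prompts: List[str] = []


def fake_generate(prompt: str) -> str:
    seen_prompts.append(prompt)  # record what the "LLM" was given
    return "stub answer"


reply = answer(
    "How do I find emails?",
    embed=lambda q: [0.0],
    retrieve=lambda emb, k: docs[:k],
    generate=fake_generate,
    top_k=1,
)
print(reply)
```

Keeping the three steps as injectable callables makes it easy to swap the stubs for real components and to inspect the exact prompt the LLM receives.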

In [29]:
%%time

response = query_engine.query(question)

print(f"Question: {question}")
print(response.response)
print("")

Question: How would you write a Python code snippet that uses regular expressions to find all email addresses in a given string?
To find all email addresses in a given string using Python regular expressions, you can use the `re.findall()` function. Here's a code snippet to accomplish this:
```python
import re

email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,7}\b'

def find_emails(input_string):
    return re.findall(email_pattern, input_string)

input_string = "Contact us at john.doe@example.com or jane.smith@example.org for more information."

emails = find_emails(input_string)
print(emails)
```

In this code, we define a regular expression pattern `r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,7}\b'` that matches common email address formats. The `\b` boundary anchors ensure that we match whole email addresses.

The `find_emails()` function takes an input string, uses `re.findall()` with the defined pattern to find all email addresses, and returns them as a lis