### Oracle Vector DB wrapped as a llama-index custom Vector Store

* inspired by: https://docs.llamaindex.ai/en/stable/examples/low_level/vector_store.html
* adds a **reranker** after retrieval from the Vector Store

This demo shows how adding a **reranker** after retrieval from the Vector Store improves the list of documents passed to the LLM.

Because the reranker keeps only the most relevant chunks (here, 4 of the 6 retrieved), it also reduces the context size.

In this demo we use **Cohere Reranker**.
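
The retrieve-then-rerank idea can be sketched without any external service. The snippet below is only an illustration: `retrieve_then_rerank`, `word_overlap`, and the sample chunks are hypothetical, and the word-overlap score is a toy stand-in for a real reranker model such as Cohere's, not a description of how that model works.

```python
from typing import Callable, List, Tuple


def retrieve_then_rerank(
    query: str,
    candidates: List[Tuple[str, float]],
    relevance: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Re-score first-stage candidates with `relevance` and keep the best `top_n`."""
    # The first-stage vector score is discarded; only the new score decides the order.
    rescored = [(text, relevance(query, text)) for text, _ in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]


def word_overlap(query: str, text: str) -> float:
    """Toy stand-in for a reranker: fraction of query words found in the text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


# Hypothetical first-stage results: (chunk text, vector similarity)
first_stage = [
    ("reference list of papers on covid in children", 0.59),
    ("long covid symptoms include fatigue and headache", 0.57),
    ("vaccination schedule for adults", 0.55),
    ("symptoms of long covid such as brain fog", 0.54),
]

reranked = retrieve_then_rerank(
    "symptoms of long covid", first_stage, word_overlap, top_n=2
)
for text, score in reranked:
    print(f"{score:.2f}  {text}")
```

In this toy run the chunk ranked first by vector similarity (a reference list) drops out, and fewer chunks reach the LLM, which is the effect the demo is after.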

In [1]:
import logging
import sys

from typing import List, Any, Optional, Dict, Tuple
from llama_index.vector_stores.types import (
    VectorStore,
    VectorStoreQuery,
    VectorStoreQueryResult,
)
from llama_index import StorageContext, VectorStoreIndex, ServiceContext
from llama_index.schema import TextNode, BaseNode, Document
from llama_index.postprocessor.cohere_rerank import CohereRerank

import oci
import ads

# imported only to print its version below
import oracledb
from oci_utils import load_oci_config
from ads.llm import GenerativeAIEmbeddings, GenerativeAI
from oracle_vector_db import OracleVectorStore

from config import EMBED_MODEL
from config_private import COMPARTMENT_OCID, ENDPOINT, COHERE_API_KEY



In [2]:
# versions used in this demo
print(f"oracledb version: {oracledb.__version__}")
print(f"oci version: {oci.__version__}")

oracledb version: 2.0.0.dev20231121
oci version: 2.119.1


In [3]:
# for debugging
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [4]:
# setup
oci_config = load_oci_config()

# build the ADS auth object once from the OCI config and reuse it
api_keys_config = ads.auth.api_keys(oci_config)

# EMBED_MODEL comes from config:
# use the English model, or the multilingual one for other languages

embed_model = GenerativeAIEmbeddings(
    compartment_id=COMPARTMENT_OCID,
    model=EMBED_MODEL,
    auth=api_keys_config,
    # Optionally you can specify keyword arguments for the OCI client, e.g. service_endpoint.
    client_kwargs={"service_endpoint": ENDPOINT},
)

# adding Cohere reranker
cohere_rerank = CohereRerank(api_key=COHERE_API_KEY, top_n=4)

In [5]:
llm_oci = GenerativeAI(
    compartment_id=COMPARTMENT_OCID,
    max_tokens=1024,
    # Optionally you can specify keyword arguments for the OCI client, e.g. service_endpoint.
    client_kwargs={"service_endpoint": ENDPOINT},
)

In [6]:
v_store = OracleVectorStore(verbose=True)

In [7]:
service_context = ServiceContext.from_defaults(llm=llm_oci, embed_model=embed_model)

In [8]:
index = VectorStoreIndex.from_vector_store(
    vector_store=v_store, service_context=service_context
)

In [9]:
# added reranker to the chain
query_engine = index.as_query_engine(
    similarity_top_k=6,
    # after the query on the Vector Store we do reranking
    node_postprocessors=[cohere_rerank],
)

#### Using the wrapper for the DB Vector Store

In [10]:
question = "What are the symptoms of Long Covid? Make a list."

In [11]:
# embed the query using OCI GenAI
query_embedding = embed_model.embed_documents([question])[0]

# wrap the embedding in a llama-index query object
query_obj = VectorStoreQuery(query_embedding=query_embedding, similarity_top_k=6)

#### Use our Vector Store DB

In [12]:
%%time

q_result = v_store.query(query_obj)

2024-02-02 11:13:43,893 - INFO - ---> Calling query on DB
2024-02-02 11:13:44,134 - INFO - select: select V.id, C.CHUNK, C.PAGE_NUM, 
                            ROUND(VECTOR_DISTANCE(V.VEC, :1, DOT), 3) as d,
                            B.NAME 
                            from VECTORS V, CHUNKS C, BOOKS B
                            where C.ID = V.ID and
                            C.BOOK_ID = B.ID
                            order by d
                            FETCH FIRST 6 ROWS ONLY
2024-02-02 11:13:44,478 - INFO - Query duration: 0.6 sec.


CPU times: user 23.8 ms, sys: 9.92 ms, total: 33.8 ms
Wall time: 587 ms


In [13]:
# the store returns distances, so the sign is flipped to print a similarity
for node, doc_id, sim in zip(q_result.nodes, q_result.ids, q_result.similarities):
    print(f"Doc. id: {doc_id}")
    print(f"Similarity: {-sim}")
    print(node.text)
    print("")

Doc. id: 77c85b21c1a77b9d03e2b141ed9b563824981f58d0f88f9c204cc9f12c26e288
Similarity: 0.586
COVID-19 Treatment Guidelines 88nlm.nih.gov/pubmed/32730233 . 69. Zimmermann P, Pittet LF, Curtis N. How common is long COVID in children and adolescents? Pediatr Infect Dis J . 2021;40(12):e482-e487. Available at: https://www.ncbi.nlm.nih.gov/pubmed/34870392 . 70. Zimmermann P, Pittet LF, Curtis N. Long COVID in children and adolescents. BMJ . 2022;376:o143. Available at: https://www.ncbi.nlm.nih.gov/pubmed/35058281 . 71. Molteni E, Sudre CH, Canas LS, et al. Illness duration and symptom profile in symptomatic UK school-aged children tested for SARS-CoV-2. Lancet Child Adolesc Health . 2021;5(10):708-718. Available at: https://www.ncbi.nlm.nih.gov/pubmed/34358472 . 72. Zheng YB, Zeng N, Yuan K, et al. Prevalence and risk factor for long COVID in children and adolescents: a meta-analysis and systematic review. J Infect Public Health . 2023;16(5):660-672. Available at: https://pubmed.ncbi.nlm.nih
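
The query logged above orders rows by `VECTOR_DISTANCE(..., DOT)` in ascending order, and the loop prints `-sim`. The sketch below shows why the two fit together, assuming the `DOT` metric yields the negated dot product (which is consistent with the sign flip in the loop). The helper `dot_distance` and the two-dimensional vectors are made up for illustration.

```python
from typing import Dict, List


def dot_distance(a: List[float], b: List[float]) -> float:
    """Negated dot product: the smaller the value, the more similar the vectors."""
    return -sum(x * y for x, y in zip(a, b))


# Hypothetical query and chunk embeddings
query_vec = [0.6, 0.8]
chunks: Dict[str, List[float]] = {
    "partial": [1.0, 0.0],
    "opposite": [-0.6, -0.8],
    "close": [0.6, 0.8],
}

# Ascending order on the distance, as in "order by d" in the logged SQL
ranked = sorted(chunks, key=lambda name: dot_distance(query_vec, chunks[name]))

for name in ranked:
    d = dot_distance(query_vec, chunks[name])
    # Flipping the sign recovers the plain dot product as a similarity
    print(f"{name}: distance={d:.3f}, similarity={-d:.3f}")
```

Sorting ascending on the negated value puts the most similar chunk first, and negating it again gives a positive similarity for closely aligned vectors.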

#### Integrate in the bigger RAG picture

In [14]:
%%time

response = query_engine.query(question)

print(f"Question: {question}")
print(response.response)
print("")

2024-02-02 11:13:51,367 - INFO - ---> Calling query on DB
2024-02-02 11:13:51,551 - INFO - select: select V.id, C.CHUNK, C.PAGE_NUM, 
                            ROUND(VECTOR_DISTANCE(V.VEC, :1, DOT), 3) as d,
                            B.NAME 
                            from VECTORS V, CHUNKS C, BOOKS B
                            where C.ID = V.ID and
                            C.BOOK_ID = B.ID
                            order by d
                            FETCH FIRST 6 ROWS ONLY
2024-02-02 11:13:51,883 - INFO - Query duration: 0.5 sec.



Question: What are the symptoms of Long Covid? Make a list.
According to the provided context, long COVID is a condition in children and adolescents that lasts longer than four weeks and is often characterized by symptoms such as fatigue, shortness of breath, and chest pain. Below is a list of common symptoms of long COVID based on the information provided: 

- Fatigue
- Shortness of breath
- Chest pain
- Joint pain
- Headache
- Dizziness
- Brain fog
- Insomnia
- Gastrointestinal issues

Would you like to know more about any of these symptoms? 

CPU times: user 278 ms, sys: 97.7 ms, total: 376 ms
Wall time: 7.93 s


#### Source chunks with page metadata (page_label)

In [15]:
for node in response.source_nodes:
    print(f"{node.text}\npage: {node.metadata['page_label']}\n")

COVID-19 Treatment Guidelines 88nlm.nih.gov/pubmed/32730233 . 69. Zimmermann P, Pittet LF, Curtis N. How common is long COVID in children and adolescents? Pediatr Infect Dis J . 2021;40(12):e482-e487. Available at: https://www.ncbi.nlm.nih.gov/pubmed/34870392 . 70. Zimmermann P, Pittet LF, Curtis N. Long COVID in children and adolescents. BMJ . 2022;376:o143. Available at: https://www.ncbi.nlm.nih.gov/pubmed/35058281 . 71. Molteni E, Sudre CH, Canas LS, et al. Illness duration and symptom profile in symptomatic UK school-aged children tested for SARS-CoV-2. Lancet Child Adolesc Health . 2021;5(10):708-718. Available at: https://www.ncbi.nlm.nih.gov/pubmed/34358472 . 72. Zheng YB, Zeng N, Yuan K, et al. Prevalence and risk factor for long COVID in children and adolescents: a meta-analysis and systematic review. J Infect Public Health . 2023;16(5):660-672. Available at: https://pubmed.ncbi.nlm.nih.gov/36931142/ . 73. Pinto Pereira SM, Mensah A, Nugawela MD, et al. Long COVID in children 