### Oracle AI Vector Search: Loading the Vector Store

With this notebook you can load your knowledge base into Oracle Database, then create and store the embedding vectors.

The knowledge base (KB) is a set of PDF files stored in a directory. This notebook:
* Reads all the PDF files and splits them into chunks
* Computes the embeddings for all chunks
* Stores chunks and embeddings in the **ORACLE_KNOWLEDGE** table

* This demo is based on the **LangChain** integration
* Embeddings are computed with **OCI GenAI multi-lingual (Cohere) embeddings**
* All data is stored in a single table

Afterward, you can run a similarity search and build an assistant, based on OCI GenAI, on top of the stored data.

In [1]:
import logging
from glob import glob

import oracledb

# to compute embedding vectors
from oci_cohere_embeddings_utils import OCIGenAIEmbeddingsWithBatch

# the classes to integrate Oracle AI Vector Search with LangChain
from langchain_community.vectorstores import oraclevs
from langchain_community.vectorstores.oraclevs import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy

# local helpers: PDF loading/splitting, tracing, configuration
from chunk_index_utils import load_books_and_split
from utils import enable_tracing
from config import OCI_EMBED_MODEL, ENDPOINT
from config_private import COMPARTMENT_ID, DB_USER, DB_PWD, DB_HOST_IP, DB_SERVICE

#### Setup

In [2]:
#
# Some configurations
#

# directory containing the PDF files of our knowledge base
BOOKS_DIR = "./books"

# to connect to DB
username = DB_USER
password = DB_PWD
dsn = f"{DB_HOST_IP}:1521/{DB_SERVICE}"

# Configure logging
logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
)

embed_model = OCIGenAIEmbeddingsWithBatch(
    # API_KEY authentication is used when running outside OCI Data Science
    # (you must provide your API keys).
    # When running inside OCI Data Science you can use instead:
    # auth_type="RESOURCE_PRINCIPAL"
    auth_type="API_KEY",
    model_id=OCI_EMBED_MODEL,
    service_endpoint=ENDPOINT,
    compartment_id=COMPARTMENT_ID,
)

enable_tracing()

In [3]:
# this is the file list containing the Knowledge base
file_list = sorted(glob(BOOKS_DIR + "/" + "*.pdf"))

print(f"There are {len(file_list)} files to be loaded...")
for f_name in file_list:
    print(f_name)

There are 8 files to be loaded...
./books/CurrentEssentialsofMedicine.pdf
./books/Il conto corrente in parole semplici.pdf
./books/La storia del Gruppo-iccrea.pdf
./books/La_Centrale_dei_Rischi_in_parole_semplici.pdf
./books/covid19_treatment_guidelines.pdf
./books/database-concepts.pdf
./books/high-availability-23c.pdf
./books/the-side-effects-of-metformin-a-review.pdf


#### Load all files and split them into chunks

In [4]:
docs = load_books_and_split(BOOKS_DIR)

2024-05-09 11:39:06,700 - Loading documents from ./books...
2024-05-09 11:39:06,706 - Loading books: 
2024-05-09 11:39:06,706 - * ./books/CurrentEssentialsofMedicine.pdf
2024-05-09 11:39:06,707 - * ./books/Il conto corrente in parole semplici.pdf
2024-05-09 11:39:06,708 - * ./books/La storia del Gruppo-iccrea.pdf
2024-05-09 11:39:06,708 - * ./books/La_Centrale_dei_Rischi_in_parole_semplici.pdf
2024-05-09 11:39:06,709 - * ./books/covid19_treatment_guidelines.pdf
2024-05-09 11:39:06,710 - * ./books/database-concepts.pdf
2024-05-09 11:39:06,711 - * ./books/high-availability-23c.pdf
2024-05-09 11:39:06,712 - * ./books/the-side-effects-of-metformin-a-review.pdf


  0%|          | 0/8 [00:00<?, ?it/s]

2024-05-09 11:39:26,504 - Loaded 4832 chunks of text...
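
The helper `load_books_and_split` is not shown in this notebook. As a rough illustration of the splitting step only, a fixed-size splitter with overlap could look like the sketch below. The names `split_text`, `chunk_size` and `overlap` are hypothetical, and the real helper may use a different strategy (for example, one of the LangChain text splitters).

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # stop as soon as the current chunk reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters
for chunk in split_text(sample, chunk_size=12, overlap=2):
    print(len(chunk), chunk)
```

A larger overlap keeps more context across boundaries at the cost of more chunks, and therefore more embeddings to compute and store.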


#### Create the Vector Store and load chunks + embeddings into the DB

In [5]:
try:
    connection = oracledb.connect(user=username, password=password, dsn=dsn)
    print("Connection successful!")
except Exception as e:
    print("Connection failed!")
    print(e)
    # stop here: the load below needs a valid connection
    raise

v_store = OracleVS.from_documents(
    docs,
    embed_model,
    client=connection,
    table_name="ORACLE_KNOWLEDGE",
    distance_strategy=DistanceStrategy.COSINE,
)

Connection successful!


  0%|          | 0/54 [00:00<?, ?it/s]
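
The progress bar above counts 54 steps for the 4832 chunks loaded earlier, which suggests that the embedding requests are sent in batches (54 steps would be consistent with a batch size of about 90, but that is an inference from the output, not something the notebook states). The internals of `OCIGenAIEmbeddingsWithBatch` are not shown here, so the following is only a minimal sketch of the batching idea. `batched`, `embed_in_batches` and `fake_embed` are hypothetical names, and `fake_embed` stands in for the remote embedding call.

```python
import math
from typing import Callable, Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[Sequence[str]], list[list[float]]],
    batch_size: int = 90,  # assumed value, see the note above
) -> list[list[float]]:
    """Call embed_fn once per batch and concatenate the results in order."""
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors


def fake_embed(batch: Sequence[str]) -> list[list[float]]:
    # Stand-in for the embedding service: one 1-dimensional vector per text.
    return [[float(len(text))] for text in batch]


texts = [f"chunk {i}" for i in range(4832)]
vectors = embed_in_batches(texts, fake_embed, batch_size=90)
print(len(vectors), "vectors in", math.ceil(len(texts) / 90), "batches")
```

Batching keeps each request under the service's per-call input limit and reduces the number of round trips compared with embedding one chunk at a time.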

In [None]:
# create an IVF vector index on the table behind v_store
oraclevs.create_index(
    connection,
    v_store,
    params={
        "idx_name": "ivf_idx1",
        "idx_type": "IVF",
        "accuracy": 90,
        "parallel": 8,
    },
)
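
The `"idx_type": "IVF"` parameter above requests an inverted-file style index. The general idea behind such an index is to group vectors into partitions around centroids and, at query time, to scan only the partitions whose centroids are closest to the query, trading some accuracy for speed. The toy sketch below illustrates that idea only; it is not how Oracle implements the index. It assumes fixed, hand-picked centroids (a real index learns them, typically with k-means) and uses squared Euclidean distance for brevity, whereas this notebook uses cosine distance. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_partitions(vectors, centroids):
    """Assign each vector index to the partition of its nearest centroid."""
    partitions = {c: [] for c in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        partitions[nearest].append(idx)
    return partitions


def ivf_search(query, vectors, centroids, partitions, k, n_probe):
    """Return the indexes of the k nearest vectors, scanning n_probe partitions."""
    # pick the n_probe partitions whose centroids are closest to the query
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))[:n_probe]
    # exact search, but only among the vectors in the probed partitions
    candidates = [idx for c in probe for idx in partitions[c]]
    candidates.sort(key=lambda idx: sq_dist(query, vectors[idx]))
    return candidates[:k]


vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids = [[0.3, 0.3], [10.3, 10.3]]  # hand-picked for the example
partitions = build_partitions(vectors, centroids)
print(partitions)
print(ivf_search([9, 9.5], vectors, centroids, partitions, k=2, n_probe=1))
```

Probing fewer partitions is faster but can miss true neighbours that fall in a partition that was not scanned, which is why an index of this kind is usually created with an accuracy target.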

In [None]:
# reconnect, e.g. to drop the index before creating it again
try:
    connection = oracledb.connect(user=username, password=password, dsn=dsn)
    print("Connection successful!")
except Exception as e:
    print("Connection failed!")
    print(e)
    raise

# uncomment to drop the index
# oraclevs.drop_index_if_exists(connection, index_name="ivf_idx1")

#### Run a test query

In [18]:
try:
    connection = oracledb.connect(user=username, password=password, dsn=dsn)
    print("Connection successful!")

    # wrap the existing table; no documents are loaded here
    v_store = OracleVS(
        client=connection,
        table_name="ORACLE_KNOWLEDGE",
        distance_strategy=DistanceStrategy.COSINE,
        embedding_function=embed_model,
    )

    # k is the number of docs we want to retrieve
    retriever = v_store.as_retriever(search_kwargs={"k": 6})

    print("Retriever created...")

except Exception as e:
    print("Connection or retriever setup failed!")
    print(e)
    # the cells below need the retriever, so do not continue silently
    raise

Connection successful!
Retriever created...


In [19]:
question = "What is Oracle RAC?"

result_docs = retriever.invoke(question)
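
With `DistanceStrategy.COSINE` and `k=6`, the retriever returns the six chunks whose embeddings have the smallest cosine distance from the embedding of the question. In this notebook the search runs inside the database; the brute-force sketch below only illustrates the ranking logic on toy 2-dimensional vectors. `cosine_distance`, `top_k` and the sample rows are hypothetical.

```python
import heapq
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k):
    """rows is a list of (text, vector) pairs; return the k closest to query."""
    return heapq.nsmallest(k, rows, key=lambda row: cosine_distance(query, row[1]))


# toy data: in the notebook these would be chunk texts and their embeddings
rows = [
    ("about RAC", [1.0, 0.1]),
    ("about metformin", [0.0, 1.0]),
    ("about Clusterware", [0.9, 0.3]),
]
query_vector = [1.0, 0.0]

for text, vector in top_k(query_vector, rows, k=2):
    print(text, round(cosine_distance(query_vector, vector), 4))
```

Cosine distance ignores vector length and compares direction only, which is a common choice for text embeddings.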

In [20]:
for doc in result_docs:
    print(doc.page_content)
    print(doc.metadata)
    print("----------------------------")
    print("")

Figure 3-2    Oracle Database with Oracle RAC Architecture
Note:
After Oracle release 11.2, Oracle RAC One Node or Oracle RAC is the
preferred solution over Oracle Clusterware (Cold Cluster Failover) because it
is a more complete and feature-rich solution.
See Also:
Oracle RAC Administration and Deployment Guide
Oracle Clusterware Administration and Deployment GuideChapter 3
Oracle Real Application Clusters and Oracle Clusterware
3-20
{'source': './books/high-availability-23c.pdf', 'page': 54}
----------------------------

Part III
Oracle RAC and Clusterware Best Practices
•Overview of Oracle RAC and Clusterware Best Practices
{'source': './books/high-availability-23c.pdf', 'page': 121}
----------------------------

11
Overview of Oracle RAC and Clusterware
Best Practices
Oracle Clusterware and Oracle Real Application Clusters (RAC) are Oracle's strategic high
availability and resource management database framework in a cluster environment, and an
integral part of the Oracle MAA Silver