# A Simple Way to Get Context from a Vector Database
This notebook demonstrates how to retrieve contextual information from a vector database. Vector databases are used to store and query high-dimensional vectors, which are often derived from text. The notebook covers the following steps:


1. Connecting to the Vector Database: Code to establish a connection to the vector database.
2. Embedding the Query: Code that uses OCIGenAIEmbeddings to generate a vector from a query.
3. Querying the Database: A similarity search that retrieves the stored context closest to the query vector.
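
To make the querying step concrete, here is a minimal sketch of what a similarity search does conceptually: score every stored vector against the query vector and keep the top `k`. It assumes cosine similarity and toy 3-dimensional vectors. The names `cosine_similarity`, `top_k`, and `stored` are illustrative and not part of the notebook, and the metric and index the database actually uses may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings"; real embedding models produce far more dimensions.
stored = [
    ("RAG overview", [0.9, 0.1, 0.0]),
    ("Cooking tips", [0.0, 0.2, 0.9]),
    ("LLM basics", [0.7, 0.3, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]

ranked = top_k(query_vec, stored, k=2)
for text, score in ranked:
    print(f"{text}: {score:.3f}")
```

A vector database performs the same ranking inside the database, typically with an index, so the stored vectors never have to be loaded into Python.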




In [2]:
from langchain_community.embeddings.oci_generative_ai import OCIGenAIEmbeddings
from langchain_community.vectorstores import OracleVS
import oracledb
import os


In [None]:
def get_context_by_query():
    """
    Retrieves context by performing a similarity search using a query embedding.
    This function performs the following steps:
    1. Loads environment variables from a .env file.
    2. Initializes the OCI embedding service using the loaded environment variables.
    3. Connects to an Oracle Autonomous Database using the loaded environment variables.
    4. Creates an embedding for a predefined query.
    5. Initializes a vector store with the database connection, embeddings, and table name.
    6. Performs a similarity search using the query embedding to retrieve the top results.
    Returns:
        list: A list of top results based on the similarity search.
    """

    # Load environment variables from a .env file
    from dotenv import load_dotenv
    load_dotenv()

    #OCI embedding service
    embeddings =  OCIGenAIEmbeddings(
        model_id=os.getenv('CON_GEN_AI_EMB_MODEL_ID'),
        service_endpoint=os.getenv('CON_GEN_AI_SERVICE_ENDPOINT'),
        compartment_id=os.getenv('CON_GEN_AI_COMPARTMENT_ID')
        )

    # Connect to Oracle Autonomous Database
    conn = oracledb.connect(
        user=os.getenv('CON_ADB_DEV_USER_NAME'), 
        password=os.getenv('CON_ADB_DEV_PASSWORD'), 
        dsn=os.getenv('CON_ADB_DEV_SERVICE_NAME')
        )

    #query
    query = ["Retrieval Augmented Generation (RAG), Large Language Models (LLMs)"]
    # create query embedding 
    query_embedding = embeddings.embed_documents(query)

    #table containing embeddings
    table_name = "DOCS"

    #create vector store
    vector_store = OracleVS(conn, embeddings, table_name)

    #get context
    results = vector_store.similarity_search_by_vector(embedding=query_embedding[0], 
        k=10  # Number of top results to retrieve
    )
    return results
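
The function above reads six environment variables with `os.getenv`, which returns `None` when a variable is unset, so a missing entry only surfaces later as a connection or service error. A small pre-flight check along the following lines may make that failure easier to diagnose. The variable names are taken from the function; the helper `missing_env_vars` and the constant `REQUIRED_VARS` are hypothetical names introduced for this sketch.

```python
import os

# Variable names used by get_context_by_query.
REQUIRED_VARS = [
    "CON_GEN_AI_EMB_MODEL_ID",
    "CON_GEN_AI_SERVICE_ENDPOINT",
    "CON_GEN_AI_COMPARTMENT_ID",
    "CON_ADB_DEV_USER_NAME",
    "CON_ADB_DEV_PASSWORD",
    "CON_ADB_DEV_SERVICE_NAME",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Run the check after the .env file has been loaded and before calling `get_context_by_query()`.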

In [7]:


results = get_context_by_query()
# Print each retrieved result
for i, result in enumerate(results):
    print(f"Result {i + 1}:")
    print(f"File ID: {result.metadata['file_id']}")
    print(f"Metadata: {result.metadata}")
    print(f"Content: {result.page_content}")
    print("-" * 60)

Result 1:
File ID: 13
Metadata: {'file_id': 13, 'file_trg_obj_name': 'None', 'file_version': 1, 'file_date': '2025-01-21 18:34:42', 'obj_name': 'None'}
Content: Keywords: Retrieval Augmented Generation (RAG), Large Language Models
(LLMs), Generative AI in Software Development, Transparent AI.
1
Introduction
Large language models (LLMs) excel at generating human like responses, but
base AI models can't keep up with the constantly evolving information within
dynamic sectors. They rely on static training data, leading to outdated or incom-
plete answers. Thus they often lack transparency and accuracy in high stakes
arXiv:2410.15944v1 [cs.SE] 21 Oct 2024
decision making. Retrieval Augmented Generation (RAG) presents a powerful
solution to this problem. RAG systems pull in information from external data
sources, like PDFs, databases, or websites, grounding the generated content in
------------------------------------------------------------
Result 2:
File ID: 13
Metadata: {'file_id': 13, 'file_trg_obj_name': 'None', 