# 🧠 OpenAI Vector Search with LangChain and Oracle Database

## Introduction

This cookbook shows how to build a **semantic vector search pipeline** using **OpenAI**, **LangChain**, and **Oracle Database AI Vector Search** via the official `langchain-oracledb` plugin.

The goal is to demonstrate how unstructured text can be embedded with OpenAI models and stored natively in Oracle Database, enabling **similarity search based on meaning**, not keywords.

---

## What This Cookbook Demonstrates

- Using OpenAI embeddings with LangChain  
- Storing vectors inside Oracle Database  
- Performing semantic similarity search using native database capabilities  

---

## Requirements

- Python 3.10+
- Oracle Database 26ai with vector search enabled
- OpenAI API key
- Required packages:
  - `langchain`
  - `langchain-openai`
  - `langchain-oracledb`
  - `oracledb`

---

⚠️ Note:  
If OpenAI quota is unavailable, embedding generation and vector population steps may fail with a `429` error.  
This behavior is expected and does not invalidate the integration logic demonstrated in this cookbook.


### Installing Required Dependencies

This notebook relies on LangChain, Oracle Database Vector Search, and OpenAI embeddings.
The following packages ensure that all integrations work correctly, including database connectivity,
vector storage, and environment configuration.

If the dependencies are already installed in the current environment, this cell will not make any changes.


In [1]:
%pip install -q langchain langchain-oracledb langchain-openai oracledb python-dotenv

Note: you may need to restart the kernel to use updated packages.


### Verify Installed Dependencies

This step verifies that the required Python dependencies are available in the current environment and displays their installed versions.

If the versions are shown successfully, the environment is correctly configured and ready to proceed.

In [2]:
import langchain
import oracledb

langchain.__version__, oracledb.__version__
#print("Dependencies installed and ready.")

('1.2.6', '3.4.1')
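
The cell above only reports two of the installed packages. As a minimal sketch, assuming the distribution names match the ones passed to `%pip install` earlier, the standard-library `importlib.metadata` module can report every package without importing it. The helper name `installed_versions` is hypothetical and not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as given to pip in the install cell.
PACKAGES = [
    "langchain",
    "langchain-openai",
    "langchain-oracledb",
    "oracledb",
    "python-dotenv",
]


def installed_versions(packages):
    """Map each distribution name to its version, or None if it is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

A `NOT INSTALLED` entry points at the package to reinstall before continuing.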

### Verify OpenAI Configuration

This step checks that the OpenAI API key is available in the environment.

In [3]:
import os

"OPENAI_API_KEY" in os.environ

True

### Importing Libraries and Loading Configuration

This section imports the core libraries used throughout the notebook:

- OpenAI embeddings via LangChain
- Oracle Vector Store integration
- Vector distance strategy for similarity search
- Oracle Database Python driver
- Utilities for loading environment variables

Configuration values such as database credentials and the OpenAI API key
are expected to be provided through environment variables.


In [4]:
from langchain_openai import OpenAIEmbeddings
from langchain_oracledb.vectorstores import OracleVS
from langchain_oracledb.vectorstores.oraclevs import DistanceStrategy

import oracledb
import os
from dotenv import load_dotenv




### Step 1: Load Environment Variables

This step loads configuration values from a `.env` file, including
OpenAI credentials and Oracle Database connection details.

Environment variables are used to keep sensitive data out of source code.


In [5]:
load_dotenv()

required_env_vars = [
    "OPENAI_API_KEY",
    "ORACLE_USER",
    "ORACLE_PASSWORD",
    "ORACLE_DSN",
]

missing = [v for v in required_env_vars if not os.getenv(v)]
if missing:
    raise EnvironmentError(f"Missing required environment variables: {missing}")
print("Environment variables loaded successfully")


Environment variables loaded successfully
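
The same check can be packaged so that later cells read typed fields instead of calling `os.getenv` repeatedly. The following is a sketch under that assumption; the `Settings` class and `load_settings` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Configuration values this notebook expects (hypothetical container)."""
    openai_api_key: str
    oracle_user: str
    oracle_password: str
    oracle_dsn: str


def load_settings(env: Mapping[str, str]) -> Settings:
    """Validate the mapping and return a Settings object, or raise if values are missing."""
    names = ["OPENAI_API_KEY", "ORACLE_USER", "ORACLE_PASSWORD", "ORACLE_DSN"]
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise EnvironmentError(f"Missing required environment variables: {missing}")
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        oracle_user=env["ORACLE_USER"],
        oracle_password=env["ORACLE_PASSWORD"],
        oracle_dsn=env["ORACLE_DSN"],
    )


# In the notebook, after load_dotenv():
#   import os
#   settings = load_settings(os.environ)
```

Taking the mapping as a parameter keeps the function easy to exercise with a plain dictionary.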


### Step 2: Verify OpenAI API Key

This sanity check verifies that the OpenAI API key is available
in the runtime environment.

No OpenAI API calls are made at this step.


In [6]:
assert os.getenv("OPENAI_API_KEY") is not None, "OPENAI_API_KEY is not set"
print("OpenAI API key detected")


OpenAI API key detected


### Step 3: Initialize OpenAI Embeddings

This step initializes an OpenAI embedding model using LangChain.

The embeddings will later be used to generate vector representations
for text documents.


In [7]:
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)

print("OpenAI embeddings object created")


OpenAI embeddings object created


### Step 4: Connect to Oracle Database

This step establishes a connection to Oracle Database using the
Python `oracledb` driver.

The database will be used to store vectors and perform similarity search.


In [8]:
conn = oracledb.connect(
    user=os.getenv("ORACLE_USER"),
    password=os.getenv("ORACLE_PASSWORD"),
    dsn=os.getenv("ORACLE_DSN")
)

print("Connected to Oracle Database")


Connected to Oracle Database
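
The notebook does not show what `ORACLE_DSN` should contain. One common form is an Easy Connect string, `host:port/service_name`. The sketch below builds such a string; the helper name is hypothetical, and `localhost` and `FREEPDB1` are placeholder values that must be replaced with the details of your own database.

```python
def easy_connect_dsn(host: str, service_name: str, port: int = 1521) -> str:
    """Build an Easy Connect string; 1521 is the default Oracle listener port."""
    if not host or not service_name:
        raise ValueError("host and service_name are both required")
    return f"{host}:{port}/{service_name}"


# Placeholder values for illustration only.
print(easy_connect_dsn("localhost", "FREEPDB1"))  # localhost:1521/FREEPDB1
```

The resulting string can be stored in the `.env` file as the value of `ORACLE_DSN`.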


### Step 5: Define Similarity Distance Strategy

Oracle Database Vector Search supports multiple distance metrics.
In this example, cosine similarity is used, which is well-suited
for OpenAI-generated text embeddings.


In [9]:
table_name = "LANGCHAIN_DEMO_VECTORS"
distance_strategy = DistanceStrategy.COSINE
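
To make the metric concrete, here is a small pure-Python sketch of cosine distance, defined as one minus the cosine similarity of two vectors. It is an illustration of the formula only, not the code Oracle Database runs internally, and the function name is chosen here for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b); 0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [2.0, 0.0]))   # 0.0 (same direction, different length)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0 (orthogonal)
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0 (opposite direction)
```

Because the metric ignores vector length, two texts whose embeddings point the same way are treated as identical in meaning regardless of magnitude.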

### Step 6: Initialize Oracle Vector Store (Requires OpenAI Quota)

This step initializes the Oracle vector store backed by a database table.

⚠️ This operation requires an OpenAI API key with active embeddings quota.
If quota is exceeded, OpenAI will return a 429 error.


In [10]:
try:
    oracle_vs = OracleVS(
        client=conn,
        embedding_function=embeddings,
        table_name=table_name,  # defined in Step 5
        distance_strategy=distance_strategy,
    )
    print("Oracle Vector Store initialized")

except Exception as e:
    oracle_vs = None
    print("Oracle Vector Store not initialized.")
    print("Likely reason: OpenAI quota is not available for this API key.")
    print("This is expected if you are using a free, restricted OpenAI account.")
    print(e)


Oracle Vector Store initialized
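
The `except` block above catches every exception, so a database or network problem would be reported the same way as a quota problem. A hedged sketch of how the two cases might be told apart follows. It assumes the raised exception exposes an HTTP `status_code` attribute, as OpenAI's API status errors do; `explain_init_failure` and `FakeRateLimitError` are hypothetical names used only for this illustration.

```python
def explain_init_failure(exc: Exception) -> str:
    """Describe a failure, singling out HTTP 429 responses."""
    # Assumption: quota and rate-limit errors carry status_code == 429.
    status = getattr(exc, "status_code", None)
    if status == 429:
        return "OpenAI returned 429: quota or rate limit reached for this API key."
    return f"Failed for another reason: {type(exc).__name__}: {exc}"


class FakeRateLimitError(Exception):
    """Stand-in for a real 429 error, so the sketch runs without network access."""
    status_code = 429


print(explain_init_failure(FakeRateLimitError("quota exceeded")))
print(explain_init_failure(ConnectionError("database unreachable")))
```

Inside the notebook, the helper would be called as `print(explain_init_failure(e))` in the `except` branch.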


### Step 7: Prepare Sample Documents

This step defines a small collection of example text documents
that will be embedded and stored in Oracle Database.


In [11]:
texts = [
    "Oracle AI Database 26ai provides native vector search.",
    "LangChain integrates Oracle Database using the langchain-oracledb plugin.",
    "Vector search enables semantic similarity over unstructured text.",
    "Relational databases can now store and search embeddings efficiently."
]

if oracle_vs:
    oracle_vs.add_texts(texts)
    print("Sample documents embedded and stored")
else:
    print("Skipping document embedding (vector store not available)")


Sample documents embedded and stored
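
Re-running this cell inserts the same four sentences again, which is one way the repeated lines in the search output later in this notebook can arise. A minimal sketch of one mitigation is to drop duplicates and derive a stable identifier from each text. The helper name is hypothetical.

```python
import hashlib


def unique_texts_with_ids(texts):
    """Return (texts, ids) with duplicates removed; each id is the SHA-256 of its text."""
    seen = set()
    unique, ids = [], []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same content already queued for insertion
        seen.add(digest)
        unique.append(text)
        ids.append(digest)
    return unique, ids


sample = ["alpha", "beta", "alpha"]
unique, ids = unique_texts_with_ids(sample)
print(unique)    # ['alpha', 'beta']
print(len(ids))  # 2
```

The LangChain vector store interface accepts an optional `ids` argument on `add_texts`, so the result could be passed as `oracle_vs.add_texts(unique, ids=ids)`. How the Oracle store reacts when an id already exists in the table is not shown in this notebook and should be verified before relying on it.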


### Step 8: Generate Query Embedding

We generate an embedding for a natural language query.
This vector will be used to perform similarity search in Oracle Database.


In [12]:
query = "How does Oracle support vector search?"

try:
    query_vector = embeddings.embed_query(query)
    print("Query embedding generated")

except Exception as e:
    query_vector = None
    print("Query embedding not generated (OpenAI quota issue)")
    print(e)


Query embedding generated
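
The `text-embedding-3-small` model returns 1536-dimensional vectors by default, which matches the `VECTOR(1536)` column declared in the `CREATE TABLE` statement in Step 9. A short sketch of a guard that catches a mismatch before the vector reaches the database is shown below; `check_embedding` is a hypothetical helper.

```python
EXPECTED_DIM = 1536  # default output size of text-embedding-3-small


def check_embedding(vector, expected_dim=EXPECTED_DIM):
    """Return the vector as a list of floats, or raise if it is missing or mis-sized."""
    if vector is None:
        raise ValueError("no embedding was generated")
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, expected {expected_dim}"
        )
    return [float(x) for x in vector]


# In the notebook: query_vector = check_embedding(query_vector)
print(len(check_embedding([0.0] * EXPECTED_DIM)))  # 1536
```

If a different model or a custom `dimensions` setting is used, `expected_dim` and the column definition need to change together.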


### Step 9: Perform Vector Similarity Search

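The first cell below creates the `LANGCHAIN_DEMO_VECTORS` table if it does not already exist.
Error ORA-00955 (name already used by an existing object) is ignored so the cell can be re-run safely.

The second cell runs the search with `oracle_vs.similarity_search`. Oracle Database ranks the stored
embeddings with the `VECTOR_DISTANCE` function, using the cosine distance between each embedding and the query vector.

The results are ordered from most to least similar.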


In [13]:
cursor = conn.cursor()

cursor.execute("""
BEGIN
  EXECUTE IMMEDIATE '
    CREATE TABLE LANGCHAIN_DEMO_VECTORS (
      id NUMBER GENERATED ALWAYS AS IDENTITY,
      text CLOB,
      embedding VECTOR(1536)
    )
  ';
EXCEPTION
  WHEN OTHERS THEN
    IF SQLCODE != -955 THEN
      RAISE;
    END IF;
END;
""")

conn.commit()
print("Vector table ready")


Vector table ready


In [14]:
if oracle_vs:
    results = oracle_vs.similarity_search(query, k=3)

    for r in results:
        print(r.page_content)
else:
    print("Skipping similarity search (vector store not available)")

Oracle AI Database 26ai provides native vector search.
Oracle AI Database 26ai provides native vector search.
Oracle AI Database 26ai provides native vector search.


### Step 10: Re-initialize OpenAI Embeddings

This cell repeats the initialization from Step 3. It is only needed if the kernel
was restarted or the `embeddings` object was overwritten.

The embedding model converts text into vector representations that are compatible
with Oracle Database vector columns and are used for semantic similarity search.


In [15]:
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)

print("OpenAI embeddings initialized")


OpenAI embeddings initialized


### Step 11: Perform Vector Similarity Search in Oracle Database

This step runs the similarity search through `oracle_vs.similarity_search`, which LangChain translates
into a SQL query executed inside Oracle Database. The `VECTOR_DISTANCE` function computes the cosine
distance between the query vector and each stored document embedding.

The results are ordered by ascending distance (most similar first) and the top `k` matches are returned.


In [16]:
results = oracle_vs.similarity_search("What is Oracle Search?", k=3)

print(f"Number of results: {len(results)}")
for doc in results:
    print(doc.page_content)

Number of results: 3
Oracle AI Database 26ai provides native vector search.
Oracle AI Database 26ai provides native vector search.
Oracle AI Database 26ai provides native vector search.
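
Step 11 relies on LangChain to issue the query. For readers who want to see roughly what a native query looks like, the following is a minimal sketch using the `oracledb` cursor API directly. It assumes a table with `text` and `embedding` columns, as in the `CREATE TABLE` statement from Step 9, and a python-oracledb version that can bind `array.array` values to `VECTOR` columns. The function name `native_vector_search` is hypothetical, and the SQL that LangChain actually generates may differ.

```python
import array
import re


def native_vector_search(conn, table_name, query_vector, k=3):
    """Return (text, distance) rows ordered by ascending cosine distance."""
    # Table names cannot be bound, so validate before interpolating into SQL.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_$#]*", table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")

    sql = f"""
        SELECT text, VECTOR_DISTANCE(embedding, :qv, COSINE) AS distance
        FROM {table_name}
        ORDER BY distance
        FETCH FIRST :k ROWS ONLY
    """
    # A float32 array is bound as a VECTOR value by python-oracledb.
    qv = array.array("f", query_vector)

    with conn.cursor() as cursor:
        cursor.execute(sql, qv=qv, k=k)
        # Note: CLOB columns come back as LOB objects unless
        # oracledb.defaults.fetch_lobs is set to False.
        return cursor.fetchall()


# In the notebook:
#   rows = native_vector_search(conn, "LANGCHAIN_DEMO_VECTORS", query_vector, k=3)
#   for text, distance in rows:
#       print(distance, text)
```

Smaller distances mean closer matches, so the first row returned is the most similar document.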


### Index Documents into Oracle Vector Store

In this step, we add sample text documents to the Oracle Database vector store.
Each document is embedded using OpenAI embeddings and stored as a vector, making it available for semantic similarity search.

In [17]:
oracle_vs.add_texts(texts)

['736d20fe-99cc-4b00-97cb-a59c448fa775',
 '646d6ceb-d526-441b-8441-c71c784e372e',
 'f7f3dc34-700f-4dda-8dc7-eb0bbbecd304',
 '49c62658-ba63-4553-b5ff-12d36b6cd8b2']

### Conclusion

This example shows how OpenAI embeddings can be used with LangChain and Oracle Database native vector search.

Text is embedded using OpenAI, stored as vectors in Oracle Database, and queried using vector similarity search through LangChain.
