# Oracle Vector Search with OpenAI Embeddings - Cookbook

A cloud-first, reproducible cookbook demonstrating semantic vector search using OpenAI embeddings stored and queried natively in Oracle Autonomous AI Database.

This notebook walks through how to build a simple **vector similarity search pipeline**
using **OpenAI embeddings** and **Oracle Database AI Vector Search**.

We will:
- Generate embeddings using OpenAI
- Store them directly inside Oracle Database
- Perform semantic similarity search using SQL

## Environment & Prerequisites

- Python 3.10+
- Oracle Autonomous AI Database (primary target)
- Oracle Database Free / XE (local development)
- Oracle Instant Client (Thick mode)
- `python-oracledb`
- OpenAI API key (used for embeddings)

This notebook is designed as a **cloud-first**, reproducible cookbook for **Oracle Autonomous AI Database** with native VECTOR support.
For local development and testing, the same notebook can also be run using **Oracle Database Free / XE** by changing only the database connection details.
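
To make that switch concrete, here is a minimal sketch of keeping the connection details outside the notebook. The environment variable names (`ORACLE_USER`, `ORACLE_PASSWORD`, `ORACLE_DSN`) and both helper functions are assumptions for illustration; they are not part of the project's `db.py`.

```python
import os


def connection_params(env=os.environ):
    """Collect connection settings from environment variables.

    The variable names are hypothetical; use whatever your deployment provides.
    """
    return {
        "user": env["ORACLE_USER"],
        "password": env["ORACLE_PASSWORD"],
        # For Autonomous AI Database this is typically a TNS alias or connect
        # string; for a local Oracle Database Free install it might look like
        # "localhost:1521/FREEPDB1".
        "dsn": env["ORACLE_DSN"],
    }


def connect(env=os.environ):
    """Open a connection using the settings above."""
    import oracledb  # imported lazily so connection_params() works offline

    return oracledb.connect(**connection_params(env))
```

With this layout, moving between the cloud database and a local instance means changing three environment variables and nothing in the notebook itself.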

In [1]:
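import oracledb

# Enable Thick mode by loading the Oracle Instant Client libraries.
# Adjust lib_dir to the Instant Client location on your machine.
oracledb.init_oracle_client(
    lib_dir=r"C:\oracle\instantclient-basic-windows.x64-23.26.0.0.0\instantclient_23_0"
)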

## Verify Oracle Instant Client Configuration

Before working with Oracle AI Vector Search, we must ensure that the Oracle Instant Client is correctly installed and accessible from Python.

In this step, we check the Oracle client version used by the `python-oracledb` driver.

This verification step applies both to Oracle Autonomous AI Database (cloud deployments) and to local Oracle Database Free / XE environments used for development and testing.

### Expected result
- The function should return a tuple representing the Oracle client version.
- Example output:
  `(23, 26, 0, 0, 0)`

This confirms:
- Oracle Instant Client 23.x is installed
- Thick mode is active
- The environment supports Oracle VECTOR data types and vector SQL functions

If this step fails or returns an error, vector operations will not work in the following steps.
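
If you prefer the notebook to fail early rather than at the first vector query, the version tuple can be checked in code. This is a small sketch: the helper name `client_supports_vector` is hypothetical, and the `(23, 4)` minimum is our assumption about the earliest client release with native VECTOR support, so verify it against the driver documentation for your setup.

```python
# Assumption: 23.4 is the earliest Oracle Client release with native VECTOR support.
MIN_VECTOR_CLIENT = (23, 4)


def client_supports_vector(version, minimum=MIN_VECTOR_CLIENT):
    """Return True if a clientversion()-style tuple meets the assumed minimum."""
    # Compare only the leading components, e.g. (23, 26) against (23, 4).
    return tuple(version[: len(minimum)]) >= tuple(minimum)
```

In the notebook this could be used as `assert client_supports_vector(oracledb.clientversion())` right after the client is initialized.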

In [2]:
import oracledb
oracledb.clientversion()

(23, 26, 0, 0, 0)

The project is structured into small, reusable modules:

- `embeddings.py` - generates embeddings using OpenAI
- `db.py` - handles Oracle DB connections and inserts
- `similarity.py` - performs vector similarity search

### 📦 Import Project Modules

In [3]:
from embeddings import get_embedding
from db import insert_document
from similarity import similarity_search

### Database Schema

We use a single table to store documents and embeddings:

- `content` - document text (CLOB)
- `embedding` - vector embedding (VECTOR(1536, FLOAT32))

This allows Oracle to perform native vector similarity search directly in SQL.
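
The notebook does not show the table definition, so the following is a sketch of what it might look like. The `content` and `embedding` columns and the `documents` table name come from this notebook; the `id` identity column and the `create_schema` helper are assumptions added for illustration.

```python
EMBEDDING_DIM = 1536  # matches VECTOR(1536, FLOAT32) described above

# Hypothetical DDL: the id column is an assumption, the other two columns
# follow the schema described in this notebook.
CREATE_DOCUMENTS_TABLE = f"""
CREATE TABLE documents (
    id        NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    content   CLOB,
    embedding VECTOR({EMBEDDING_DIM}, FLOAT32)
)
"""


def create_schema(connection):
    """Run the DDL once on an open python-oracledb connection."""
    with connection.cursor() as cursor:
        cursor.execute(CREATE_DOCUMENTS_TABLE)
```

Fixing the dimension and storage format in the column definition means Oracle rejects vectors of the wrong size at insert time instead of at query time.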

We now insert a few sample documents into Oracle.
Each document is converted into an embedding before being stored.

In [4]:
docs = [
    "Oracle Vector Search demo",
    "Vector similarity search using Oracle Database",
    "Embeddings stored directly inside Oracle",
    "Semantic search with Oracle AI Vector"
]

for d in docs:
    vec = get_embedding(d)
    insert_document(d, vec)

## Generate Embeddings and Store Them in Oracle Database

In this step, we prepare a small collection of example text documents and store them in Oracle Database together with their vector embeddings.

For each text:
1. We generate a semantic embedding using the OpenAI embedding model.
2. We insert the original text and its corresponding vector into the Oracle database table.

This demonstrates how unstructured text can be transformed into numerical vectors and persisted directly inside Oracle using the native VECTOR data type.
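
The `embeddings.py` module is not listed in this notebook, so here is one possible shape for `get_embedding`, written as a sketch. The model name `text-embedding-3-small` is an assumption, chosen because it returns 1536-dimensional vectors that fit the `VECTOR(1536, FLOAT32)` column; the optional `client` parameter is also our addition so the function can be exercised without network access.

```python
# Assumption: a 1536-dimension OpenAI embedding model.
EMBEDDING_MODEL = "text-embedding-3-small"


def get_embedding(text, client=None, model=EMBEDDING_MODEL):
    """Return the embedding for `text` as a list of floats."""
    if client is None:
        # Imported lazily; the client reads OPENAI_API_KEY from the environment.
        from openai import OpenAI

        client = OpenAI()
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding
```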

### What happens in this cell
- `get_embedding(text)` converts each text into a dense vector representation.
- `insert_document(text, vector)` stores:
  - the original text as a CLOB
  - the embedding as a VECTOR column in Oracle
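
As a rough illustration of the storage side, the sketch below shows how an insert helper could bind the embedding. It assumes the `documents` table described earlier and takes the connection as an explicit third argument, which differs from the two-argument `insert_document` used in this notebook; the real `db.py` may manage its connection differently.

```python
import array

INSERT_SQL = (
    "INSERT INTO documents (content, embedding) "
    "VALUES (:content, :embedding)"
)


def insert_document(content, embedding, connection):
    """Insert one text and its embedding (hypothetical three-argument variant)."""
    # python-oracledb binds array.array("f", ...) as a FLOAT32 vector.
    vector = array.array("f", embedding)
    with connection.cursor() as cursor:
        cursor.execute(INSERT_SQL, {"content": content, "embedding": vector})
    connection.commit()
```

Converting the Python list to a typed array is the key step: it lets the driver send the value as a native vector rather than as a string.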

### Expected result
- No output is printed if the operation is successful.
- Each text entry is inserted as a new row in the database.
- The database is now populated with vectorized documents that can be queried using semantic similarity.

### Why this step is important
This step creates the semantic knowledge base used later for vector similarity search.
Once the embeddings are stored in Oracle, similarity queries can be executed directly in SQL without moving data outside the database.

In [5]:
from embeddings import get_embedding
from db import insert_document

texts = [
    "Oracle Vector Search demo",
    "Vector similarity search in Oracle Database",
    "Using embeddings with Oracle AI Vector Search",
    "Oracle database supports vector indexes"
]

for t in texts:
    vec = get_embedding(t)
    insert_document(t, vec)

### 🗂️ Insert Sample Documents with Vector Embeddings

In this step, we insert a few sample text documents into the Oracle Database together with their vector embeddings.

Each document is processed as follows:
- The text is converted into a numerical vector representation (embedding).
- The embedding captures the semantic meaning of the text.
- Both the original text and its embedding are stored in the `documents` table.

This step is essential because vector similarity search relies on having embeddings stored directly in the database.

After executing this cell:
- The database contains multiple documents.
- Each document is associated with a vector embedding.
- The data is ready to be queried using Oracle’s vector distance functions in the next steps.
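
When there are more than a handful of documents, a single batched insert is usually preferable to one round trip per row. The sketch below is an assumption about how that could look with `cursor.executemany`; the `insert_documents` helper and its `embed` callback parameter are hypothetical and not part of the project's `db.py`.

```python
import array


def insert_documents(texts, embed, connection):
    """Embed each text and insert all rows in one batch.

    `embed` is any callable that maps a string to a list of floats.
    """
    rows = [(text, array.array("f", embed(text))) for text in texts]
    with connection.cursor() as cursor:
        cursor.executemany(
            "INSERT INTO documents (content, embedding) VALUES (:1, :2)",
            rows,
        )
    connection.commit()
    return len(rows)
```

Committing once at the end also means the batch is stored atomically: either every document lands in the table or none does.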

In [6]:
from embeddings import get_embedding
from db import insert_document

insert_document("Oracle vector search demo", get_embedding("Oracle vector search demo"))
insert_document("Semantic search with Oracle DB", get_embedding("Semantic search with Oracle DB"))
insert_document("Vector similarity example", get_embedding("Vector similarity example"))

### ➕ Insert Individual Documents into the Vector Store

In this step, we manually insert individual documents into the Oracle Database together with their vector embeddings.

For each document:
- The text is transformed into a vector embedding using the embedding function.
- The embedding represents the semantic meaning of the text.
- Both the text and its embedding are stored in the `documents` table.

This approach is useful when:
- You want to insert documents one by one.
- You need fine-grained control over how and when documents are stored.
- You want to clearly demonstrate the end-to-end flow from text → embedding → database storage.

After running this cell:
- Each document is persisted in the database.
- All stored vectors have the same dimensionality.
- The data is ready for semantic similarity queries.
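
Because every stored vector must have the same dimensionality, it can help to check an embedding in Python before it reaches the database. The following is a small sketch; `validate_embedding` is a hypothetical helper, and the default of 1536 mirrors the `VECTOR(1536, FLOAT32)` column described in the schema.

```python
import math

EXPECTED_DIM = 1536  # dimension of the embedding column in this notebook


def validate_embedding(embedding, expected_dim=EXPECTED_DIM):
    """Raise ValueError if the embedding cannot be stored; otherwise return it."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(embedding)}"
        )
    if not all(math.isfinite(x) for x in embedding):
        raise ValueError("embedding contains NaN or infinite values")
    return embedding
```

Calling this just before the insert gives a clear Python error message, for example if the embedding model is later swapped for one with a different output size.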

In [7]:
from db import insert_document
from embeddings import get_embedding

insert_document(
    "Oracle vector search demo",
    get_embedding("Oracle vector search demo")
)

insert_document(
    "Semantic search with Oracle DB",
    get_embedding("Semantic search with Oracle DB")
)

insert_document(
    "Vector similarity example",
    get_embedding("Vector similarity example")
)

### 🔍 Perform Semantic Similarity Search Using Oracle Vector Search

In this step, we execute a semantic similarity query against the vectors stored in Oracle Database.

The process is as follows:
- The input query text is converted into a vector embedding.
- Oracle Database computes the distance between the query vector and each stored document vector using `VECTOR_DISTANCE`.
- Results are ordered by semantic similarity (lower distance means higher similarity).
- We retrieve the top 3 most relevant documents.
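
The steps above could be implemented roughly as follows. This is a sketch under several assumptions: the `COSINE` metric is our choice (the notebook does not state which metric `similarity.py` uses), the function takes an already computed query vector and an explicit connection, and the name `search_by_vector` is hypothetical.

```python
import array

# Assumption: cosine distance; the project's similarity.py may use another metric.
SEARCH_SQL = """
SELECT content,
       VECTOR_DISTANCE(embedding, :query_vector, COSINE) AS distance
FROM documents
ORDER BY distance
FETCH FIRST :top_k ROWS ONLY
"""


def search_by_vector(query_vector, top_k, connection):
    """Return the top_k (content, distance) pairs, closest first."""
    binds = {
        "query_vector": array.array("f", query_vector),
        "top_k": top_k,
    }
    with connection.cursor() as cursor:
        cursor.execute(SEARCH_SQL, binds)
        rows = cursor.fetchall()
    # CLOB columns may come back as LOB objects; read them into plain strings.
    return [
        (content.read() if hasattr(content, "read") else content, distance)
        for content, distance in rows
    ]
```

The ordering and the row limit both run inside the database, so only the requested number of rows travels back to Python.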

#### ✅ Interpreting the Results

The output shows:
- The original document text stored in the database.
- A numeric distance score representing semantic similarity.

Example output:

Embeddings stored directly inside Oracle          → distance ≈ 0.228
Vector similarity search using Oracle Database    → distance ≈ 0.233
Oracle vector search demo                         → distance ≈ 0.238

**How to read this:**
- The document *“Embeddings stored directly inside Oracle”* is the most semantically similar to the query.
- All returned results are contextually relevant, even though the exact wording differs.
- This shows that the ranking reflects **semantic similarity** between embeddings rather than exact keyword matching.

This confirms that:
- Vector embeddings are correctly generated.
- Embeddings are successfully stored inside Oracle Database.
- Oracle is performing native vector similarity search using SQL.

In [8]:
from similarity import similarity_search

results = similarity_search("oracle vector similarity", 3)

for content, distance in results:
    print(content, distance)

Embeddings stored directly inside Oracle 0.2324751330014676
Using embeddings with Oracle AI Vector Search 0.23861176131670936
Oracle vector search demo 0.2395735649054488


### 🔍 Semantic Similarity Search Results

In this step, a semantic similarity search is executed against the documents stored in the Oracle Database.

The input query is converted into a vector embedding and compared against all stored document embeddings using Oracle’s native vector distance computation.

The output is a ranked list of documents ordered by semantic similarity.

Each result includes:
- the original document text
- a numeric distance score representing semantic closeness

Lower distance values indicate higher semantic similarity.
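
To build intuition for these scores, the same kind of distance can be computed by hand on toy vectors. The sketch below assumes cosine distance (one minus cosine similarity), which may or may not be the metric used by the project's query; the two-dimensional vectors are made up for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # points the same way as the query
orthogonal = [0.0, 3.0]       # unrelated direction

print(cosine_distance(query, same_direction))  # 0.0
print(cosine_distance(query, orthogonal))      # 1.0
```

Under this metric, vectors pointing in the same direction have distance 0 regardless of their length, which is why the closest documents appear first when results are sorted in ascending order.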

In this example, the top results all concern Oracle vector search and embeddings, showing that the ranking is driven by the semantic closeness of the embeddings rather than by keyword matching.

This confirms that:
- embeddings are generated in Python
- vectors are stored directly inside Oracle Database
- similarity calculations are performed fully inside SQL using vector functions

In [9]:
from similarity import similarity_search
similarity_search("oracle vector similarity", 3)

[('Embeddings stored directly inside Oracle', 0.2324751330014676),
 ('Using embeddings with Oracle AI Vector Search', 0.23861176131670936),
 ('Oracle vector search demo', 0.2395735649054488)]

## ✅ Conclusion & Next Steps

This cookbook serves as a practical, reproducible guide for developers exploring vector search and AI-native workflows with Oracle Database.

What was achieved in this demo:
- Text data was converted into embeddings in Python
- Embeddings were stored directly inside Oracle Database using the VECTOR data type
- Semantic similarity search was executed fully in SQL using `VECTOR_DISTANCE`
- Results were ranked by semantic closeness, not keyword matching

This shows how Oracle Database can act as:
- a persistent vector database
- a semantic search engine
- a foundation for RAG and AI applications

### Why this matters

Unlike external vector stores, Oracle allows embeddings and business data to live together in the same database, enabling:
- simpler architectures
- transactional consistency
- SQL-native AI workflows

### Next steps

Planned follow-ups for this cookbook:
- Add a lightweight Streamlit UI for interactive search
- Extend the example to a RAG-style flow
- Prepare the content for submission as an OpenAI Cookbook contribution (PR)

This notebook serves as a practical, reproducible cookbook example for developers exploring vector search with Oracle Database.