![Banner](images/banner.png)

# Working with the VECTOR data type

This section requires Oracle AI Database 26ai.

Documentation reference link: [Using VECTOR Data](https://python-oracledb.readthedocs.io/en/latest/user_guide/vector_data_type.html).

<hr>

Setup for this notebook:

In [None]:
import array
import os
import oracledb

un = os.environ.get("PYO_SAMPLES_MAIN_USER", "pythondemo")
pw = os.environ.get("PYO_SAMPLES_MAIN_PASSWORD", "welcome")
cs = os.environ.get("PYO_SAMPLES_CONNECT_STRING", "localhost/orclpdb")

connection = oracledb.connect(user=un, password=pw, dsn=cs)

if tuple(int(s) for s in connection.version.split("."))[:2] < (23, 7):
    print("!! This notebook requires Oracle Database 23.7 or later !!")

cursor = connection.cursor()

## The VECTOR data type

Oracle AI Database 26ai introduces a VECTOR data type and Unified Hybrid Vector Search. You can blend vectors with relational, text, JSON, graph, and spatial predicates in a single query to retrieve documents, images, audio, video, and table rows together. 

Each VECTOR column is defined by a number of dimensions and a data format. For example, in this table the first column holds vectors of three 32-bit floating-point numbers, the second column holds vectors of three 64-bit floating-point numbers, and the last column holds vectors of three 8-bit signed integers:

In [None]:
cursor.execute("drop table if exists vtab")

cursor.execute("""create table vtab (
                               v32  vector(3, float32),
                               v64  vector(3, float64),
                               v8   vector(3, int8))"""
)

The Python `array.array()` class is used to represent vectors. The typecode selects the element format: `"f"` for float32, `"d"` for float64, and `"b"` for int8:

In [None]:
vector_data_32 = array.array("f", [2.625, 2.5, 2.0])
vector_data_64 = array.array("d", [22.25, 22.75, 22.5])
vector_data_8 = array.array("b", [4, 5, 6])
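
As a quick sanity check, the following sketch inspects the element sizes behind the typecodes used above. It needs no database connection. The `typecode_to_format` mapping is a helper name introduced here for illustration only, and the sizes shown are those of common platforms.

```python
import array

# Hypothetical helper mapping: array typecode -> VECTOR storage format.
typecode_to_format = {"f": "float32", "d": "float64", "b": "int8"}

# Record the size in bytes of one element for each typecode.
item_sizes = {}
for typecode, vector_format in typecode_to_format.items():
    item_sizes[vector_format] = array.array(typecode).itemsize
    print(vector_format, item_sizes[vector_format], "byte(s) per dimension")

# Values such as 2.625 are exactly representable in 32 bits, so they
# round-trip unchanged; a value such as 0.1 is rounded when stored as float32.
exact_value = array.array("f", [2.625])[0]
rounded_value = array.array("f", [0.1])[0]
print(exact_value == 2.625, rounded_value == 0.1)
```

Choosing a typecode that matches the column format avoids surprises like the float32 rounding shown in the last line.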

Insert the data:

In [None]:
cursor.execute(
    "insert into vtab (v32, v64, v8) values (:1, :2, :3)", 
    [vector_data_32, vector_data_64, vector_data_8]
)

Verify it was inserted correctly:

In [None]:
for row in cursor.execute("select v32, v64, v8 from vtab"):
    print(row)

## Binary Vectors

Documentation reference link: [Using BINARY Vectors](https://python-oracledb.readthedocs.io/en/latest/user_guide/vector_data_type.html#using-binary-vectors).

The BINARY format for VECTOR is an efficient way to store 0 and 1 values.

You must define the number of dimensions as a multiple of 8. Rows in the `vbin` column of this table hold 24 binary values:

In [None]:
cursor.execute("drop table if exists vtab")

cursor.execute("create table vtab (vbin vector(24, binary))")

Binary vectors are represented as 8-bit unsigned integers, so the 24 binary values are inserted as three 8-bit unsigned integers, each packing eight dimensions:

In [None]:
vector_data_bin = array.array("B", [40, 15, 255])

cursor.execute("insert into vtab (vbin) values (:1)", [vector_data_bin])

Verify it was inserted correctly:

In [None]:
for row in cursor.execute("select vbin from vtab"):
    print(row)
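
If your data starts out as individual 0 and 1 values, it has to be packed into bytes before binding. The sketch below shows one way to do that without a database connection. The `pack_bits` function is a hypothetical helper, and the most-significant-bit-first ordering within each byte is an assumption of this sketch that you should confirm against the documentation for your use case.

```python
import array


def pack_bits(bits):
    """Pack 0/1 values into unsigned bytes (assumed order: most significant bit first)."""
    if len(bits) % 8 != 0:
        raise ValueError("number of dimensions must be a multiple of 8")
    packed = array.array("B")
    for start in range(0, len(bits), 8):
        byte = 0
        # Shift each bit in from the right, so the first bit ends up highest.
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | (bit & 1)
        packed.append(byte)
    return packed


# 24 binary dimensions, grouped here as three bytes for readability.
bits = [
    0, 0, 1, 0, 1, 0, 0, 0,
    0, 0, 0, 0, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1,
]

packed = pack_bits(bits)
print(packed)
```

Under that bit-order assumption, `packed` holds the same three bytes that were inserted into `vbin` above.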

## Sparse Vectors

Documentation reference link: [Using SPARSE Vectors](https://python-oracledb.readthedocs.io/en/latest/user_guide/vector_data_type.html#using-sparse-vectors).

Vectors can be declared as SPARSE, which saves space when most values are zero:

In [None]:
cursor.execute("drop table if exists vtab")

cursor.execute("create table vtab (v64sparse vector(30, float64, sparse))")

Sparse vector data is represented by the total number of dimensions of the vector and two arrays. The first array contains the indexes of the non-zero elements, and the second array contains the non-zero values corresponding to those indexes. In python-oracledb, the `SparseVector` class encapsulates this information:

In [None]:
vector_data_sparse64 = oracledb.SparseVector(30, [3, 10, 12], array.array("d", [2.5, 2.5, 1.0]))

Insertion is simply a matter of binding the vector:

In [None]:
cursor.execute("insert into vtab (v64sparse) values (:1)", [vector_data_sparse64])

Verify it was inserted correctly:

In [None]:
for row in cursor.execute("select v64sparse from vtab"):
    print(row)
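
To make the sparse representation concrete, here is a minimal sketch that converts between a dense list and the three pieces of information described above. It runs without a database connection. `dense_to_sparse` and `sparse_to_dense` are hypothetical helpers written for this example, not part of python-oracledb, and zero-based indexes are assumed.

```python
import array


def dense_to_sparse(dense):
    """Return (number of dimensions, non-zero indexes, non-zero values)."""
    indices = [i for i, value in enumerate(dense) if value != 0]
    values = array.array("d", (dense[i] for i in indices))
    return len(dense), indices, values


def sparse_to_dense(num_dimensions, indices, values):
    """Rebuild the full vector, filling unlisted positions with zero."""
    dense = array.array("d", [0.0] * num_dimensions)
    for index, value in zip(indices, values):
        dense[index] = value
    return dense


# A 30-dimension vector with only three non-zero entries.
dense = [0.0] * 30
dense[3] = 2.5
dense[10] = 2.5
dense[12] = 1.0

num_dimensions, indices, values = dense_to_sparse(dense)
print(num_dimensions, indices, values)

restored = sparse_to_dense(num_dimensions, indices, values)
print(list(restored) == dense)
```

The three returned values are in the same order as the arguments passed to `oracledb.SparseVector` in the cell above.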

## VECTOR Query Metadata

Query metadata can be used to describe vector columns:

In [None]:
cursor.execute("select v64sparse from vtab")
desc = cursor.description[0]

print(desc.vector_format, desc.vector_dimensions, desc.vector_is_sparse)