# Vector Search with Cloud SQL

In this notebook, we'll leverage the Vector Search capabilities available in [InterSystems IRIS Cloud SQL](https://developer.intersystems.com/products/iris-cloud-sql-integratedml/). The feature works in exactly the same way as in the InterSystems IRIS 2025.1 release, but Cloud SQL requires secure connections, and this notebook illustrates how to set those up.

First, please adapt the password and hostname entries in the following cell to match your Cloud SQL deployment.

In [None]:
username = 'SQLAdmin'
password = '...'
hostname = '...'
port = 443 
namespace = 'USER'

### Copying the certificate

In order to connect securely, you'll need to point the driver at the `certificateSQLaaS.pem` file for your Cloud SQL deployment. You can download the certificate file from your deployment's detail screen. Look for the button that says "Get X.509 certificate". If you're running this notebook in a container, you can copy the certificate file into the container using the following command:

```Shell
docker cp ~/Downloads/certificateSQLaaS.pem iris-vector-search-jupyter-1:/usr/cert-demo/certificateSQLaaS.pem
```
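
A mistyped certificate path is an easy mistake to make after copying the file around. As a minimal sketch, the helper below checks that the file exists before handing it to Python's standard `ssl.create_default_context`, and raises an error that names the path. The name `build_ssl_context` is hypothetical and not part of the InterSystems driver.

```python
import ssl
from pathlib import Path


def build_ssl_context(certificate_file):
    """Return an SSL context that trusts the given PEM file.

    Hypothetical convenience wrapper: fails early, naming the path,
    if the certificate file is not where we expect it to be.
    """
    path = Path(certificate_file)
    if not path.is_file():
        raise FileNotFoundError(f"Certificate file not found: {path}")
    # cafile tells the context which CA certificate(s) to trust
    return ssl.create_default_context(cafile=str(path))
```

The returned context could then be passed as the `sslcontext` argument when connecting.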

We'll use simple DB-API commands to establish a connection in this example:

In [None]:
import intersystems_iris as iris
import ssl

# change this to wherever you copied your certificate to
certificateFile = "/usr/cert-demo/certificateSQLaaS.pem"
sslcontext = ssl.create_default_context(cafile=certificateFile)

connection = iris.connect(hostname, port, namespace, username, password, sslcontext=sslcontext)
cursor = connection.cursor()

cursor.execute("SELECT 'hello secure world!'")
cursor.fetchone()[0]

## Vector time!

Now that we have established a secure connection, let's get on to the actual vector work!

See the neighbouring `sql_demo.ipynb` for full detail on what we're trying to achieve here.

In [None]:
import pandas as pd

# Load the CSV file
df = pd.read_csv('../data/scotch_review.csv')
df.head()

In [None]:
# Clean data
# Remove the 'currency' column
df.drop(['currency'], axis=1, inplace=True)

# Drop the first column
df.drop(columns=df.columns[0], inplace=True)

# Remove rows without a price
df.dropna(subset=['price'], inplace=True)

# Ensure values in 'price' are numbers: convert the column and drop rows that fail to parse
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df = df[df['price'].notna()].copy()

# Replace NaN values in other columns with an empty string
df.fillna('', inplace=True)

df.head()

In [None]:
from sentence_transformers import SentenceTransformer

# Load a pre-trained sentence transformer model. This model's output vectors are of size 384
model = SentenceTransformer('all-MiniLM-L6-v2') 

# Generate embeddings for all descriptions in one batched call, which is faster than encoding row by row.
# normalize_embeddings=True returns unit-length vectors.
embeddings = model.encode(df['description'].tolist(), normalize_embeddings=True)

# Add the embeddings to the DataFrame
df['description_vector'] = embeddings.tolist()

df.head()

## And now load them into Cloud SQL

We'll first create a table and then ingest all the rows from the dataframe we created earlier.

In [None]:
cursor.execute('DROP TABLE IF EXISTS scotch_reviews')
cursor.execute("""CREATE TABLE scotch_reviews (
                    name VARCHAR(255),
                    category VARCHAR(255),
                    review_point INT,
                    price DOUBLE,
                    description VARCHAR(2000),
                    description_vector VECTOR(FLOAT, 384)
                )""")

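# Build one parameter tuple per row. The vector is passed as a string
# and converted on the server side by TO_VECTOR(?).
seq = []
for index, row in df.iterrows():
    seq.append((row['name'], row['category'], row['review.point'], row['price'], row['description'], str(row['description_vector'])))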

success = cursor.executemany("INSERT INTO scotch_reviews (name, category, review_point, price, description, description_vector) VALUES (?, ?, ?, ?, ?, TO_VECTOR(?))", seq)
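
The `str(...)` call in the cell above works because `description_vector` holds a plain Python list, whose string form is a bracketed, comma-separated list. A NumPy array would behave differently: its string form drops the commas and can abbreviate long arrays with `...`. A minimal sketch of an explicit formatter that produces the same text as `str(list)`, using the hypothetical helper name `to_vector_literal`:

```python
def to_vector_literal(values):
    """Format a sequence of numbers as '[v1, v2, ...]'.

    Hypothetical helper: produces the same text as str(list_of_floats),
    but also accepts other iterables of numbers (for example a NumPy array).
    """
    return "[" + ", ".join(repr(float(v)) for v in values) + "]"


example = [0.25, -0.5, 1.0]
literal = to_vector_literal(example)
print(literal)  # [0.25, -0.5, 1.0]
```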


In [None]:
description_search = "earthy and creamy taste"
search_vector = model.encode(description_search, normalize_embeddings=True).tolist() # Convert search phrase into a vector

cursor.execute("""
            SELECT TOP 3 * FROM scotch_reviews 
            WHERE price < 100 
            ORDER BY VECTOR_DOT_PRODUCT(description_vector, TO_VECTOR(?)) DESC
        """, [str(search_vector)])

print(cursor.fetchall())
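
The query ranks by `VECTOR_DOT_PRODUCT`. Because both the stored embeddings and the search vector were produced with `normalize_embeddings=True`, they have unit length, and for unit-length vectors the dot product equals the cosine similarity. A small pure-Python check of that identity, with made-up three-dimensional vectors standing in for the 384-dimensional embeddings:

```python
import math


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Illustrative vectors only; real embeddings come from the model
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]
a_unit, b_unit = normalize(a), normalize(b)

# The two values agree up to floating-point rounding
print(dot(a_unit, b_unit), cosine_similarity(a, b))
```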