# Using InterSystems Vector Search with LangChain

In this notebook, we'll leverage the Vector Search capabilities available in [InterSystems IRIS 2025.1](https://www.intersystems.com/news/iris-vector-search-support-ai-applications/) and [InterSystems IRIS Cloud SQL](https://developer.intersystems.com/products/iris-cloud-sql-integratedml/), using the well-known [LangChain](https://www.langchain.com/) framework.

## Setting up the connection

First, set up the connection to your InterSystems IRIS instance or Cloud SQL deployment. When targeting a Cloud SQL deployment, set the username to `SQLAdmin`, the password to the one you chose when creating the deployment, and the port to 443.


In [1]:
import os

username = 'demo'
password = 'demo'
hostname = os.getenv('IRIS_HOSTNAME', 'localhost')
port = 1972 
namespace = 'USER'
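
If you'd rather not edit the cell each time you switch targets, the same settings can be read from environment variables. This is a minimal sketch and not part of the original notebook: only `IRIS_HOSTNAME` appears above, while the other variable names (`IRIS_USERNAME`, `IRIS_PASSWORD`, `IRIS_PORT`, `IRIS_NAMESPACE`) and the `load_settings` helper are hypothetical.

```python
import os


def load_settings(env=os.environ):
    """Collect connection settings, falling back to the local demo defaults.

    Only IRIS_HOSTNAME is used by the notebook itself; the other variable
    names are assumptions made for this sketch.
    """
    return {
        "username": env.get("IRIS_USERNAME", "demo"),
        "password": env.get("IRIS_PASSWORD", "demo"),
        "hostname": env.get("IRIS_HOSTNAME", "localhost"),
        # 1972 is the default used above; Cloud SQL deployments use 443.
        "port": int(env.get("IRIS_PORT", "1972")),
        "namespace": env.get("IRIS_NAMESPACE", "USER"),
    }


settings = load_settings()
```

For a Cloud SQL deployment, you would then set `IRIS_USERNAME=SQLAdmin` and `IRIS_PORT=443` in the environment before starting the notebook.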

### Securing the connection

If the target you're connecting to requires secure connections, as is the case for Cloud SQL deployments, we need to supply a certificate and some additional settings to the driver. For Cloud SQL, you can download the certificate file from your deployment's details screen: look for the button that says "Get X.509 certificate" and save the downloaded file to a local folder, such as `/usr/cert-demo/`. If you're running this notebook in a container, you can copy the certificate file into the container using the following command:

```Shell
docker cp ~/Downloads/certificateSQLaaS.pem iris-vector-search-jupyter-1:/usr/cert-demo/certificateSQLaaS.pem
```

Remember to also set the port to 443 in the cell above.

In [2]:
import ssl

certificateFile = "/usr/cert-demo/certificateSQLaaS.pem"

if os.path.exists(certificateFile):
    print(f"Located SSL certificate at '{certificateFile}', initializing SSL configuration")
    sslcontext = ssl.create_default_context(cafile=certificateFile)
else:
    print("No certificate file found, continuing with insecure connection")
    sslcontext = None

No certificate file found, continuing with insecure connection


In [3]:
from sqlalchemy import create_engine, text

url = f"iris://{username}:{password}@{hostname}:{port}/{namespace}"
engine = create_engine(url, connect_args={"sslcontext": sslcontext})
with engine.connect() as conn:
    print(conn.execute(text("SELECT 'hello world!'")).first()[0])

hello world!
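
The connection URL embeds the username and password directly, so characters such as `@`, `/`, or `:` in a password would be misread as URL delimiters. Below is a minimal sketch of percent-encoding the credentials with the standard library before building the URL. The `build_iris_url` helper is hypothetical and the sample password is made up.

```python
from urllib.parse import quote


def build_iris_url(username, password, hostname, port, namespace):
    """Build an iris:// URL with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character, including "/".
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"iris://{user}:{secret}@{hostname}:{port}/{namespace}"


# Made-up credentials, for illustration only.
url = build_iris_url("SQLAdmin", "p@ss/word", "localhost", 443, "USER")
print(url)
```

Passwords made up only of letters and digits, like the `demo` default above, pass through unchanged.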


## Creating vectors using LangChain

The following cell loads the `state_of_the_union.txt` file from the `../data/` directory and splits it into chunks that are ready to be converted into vectors, using standard LangChain components.

In [4]:
from langchain.docstore.document import Document
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter

loader = TextLoader("../data/state_of_the_union.txt", encoding='utf-8')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=400, chunk_overlap=20)
docs = text_splitter.split_documents(documents)
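
To build some intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified sketch in plain Python. It cuts text into fixed-size sliding windows, which is not what `CharacterTextSplitter` does: the LangChain splitter first splits on a separator (by default a blank line) and then merges the pieces up to the size limit, so its chunks follow paragraph boundaries. The `sliding_window_chunks` helper and the sample text are hypothetical.

```python
def sliding_window_chunks(text, chunk_size=400, chunk_overlap=20):
    """Cut text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each window starts (chunk_size - chunk_overlap) characters after the last.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij" * 100  # 1,000 characters of placeholder text
chunks = sliding_window_chunks(sample)
print([len(chunk) for chunk in chunks])

# The tail of one chunk is repeated at the head of the next.
print(chunks[0][-20:] == chunks[1][:20])
```

The overlap means a sentence that straddles a chunk boundary still appears, at least partly, in both neighbouring chunks.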

### Setting up your OpenAI API key

If you have an OpenAI subscription, use the following cell to pick up your OpenAI API key, and use `OpenAIEmbeddings()` in the cells below. 

Alternatively, you can skip this step and use a local embeddings model that's included in the libraries already imported, such as `HuggingFaceEmbeddings()`, `FastEmbedEmbeddings()`, or `FakeEmbeddings()` (for testing purposes only). Just comment or uncomment the corresponding lines in the cells further down the notebook.

In [5]:
import getpass
import os
from dotenv import load_dotenv

# Load variables from a local .env file, overriding any that are already set
load_dotenv(override=True)

# Prompt for the key only if it was not supplied through the environment or .env
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")


OpenAI API Key: ········


In this cell, we'll put all the pieces together: we create embeddings for our document chunks and store them as a collection in our IRIS database.

In [6]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.embeddings import FakeEmbeddings
from langchain.embeddings.fastembed import FastEmbedEmbeddings

from langchain_iris import IRISVector

db = IRISVector.from_documents(
    embedding = OpenAIEmbeddings(), 
    # embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"),
    # embedding = FastEmbedEmbeddings(),
    # embedding = FakeEmbeddings(size=123),
    documents = docs,
    collection_name = "state_of_the_union_test",
    connection_string = f"iris://{username}:{password}@{hostname}:{port}/{namespace}",
    engine_args = { "connect_args": {"sslcontext": sslcontext} }
)

print(f"Number of docs in vector store: {len(db.get()['ids'])}")


Number of docs in vector store: 114


Next, we'll use LangChain's similarity search API to retrieve documents from our collection that match a free text query.

In [7]:
query = "Joint patrols to catch traffickers"
docs_with_score = db.similarity_search_with_score(query)

for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)

--------------------------------------------------------------------------------
Score:  0.180300940505385
And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. 

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.  

We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score:  0.207073243024113
So let’s not abandon our streets. Or choose between safety and equal justice. 

Let’s come together to protect our communities, restore trust, and hold law enforcement accountable. 

That’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers.
--------------------------------------------------------------------------------

In [8]:
db.add_documents([Document(page_content="dog")])
docs_with_score = db.similarity_search_with_score("dog")
docs_with_score[0]

(Document(metadata={}, page_content='dog'), 0.0)
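
An exact match scoring `0.0`, together with the closest passage in the earlier query having the lowest score, suggests that these scores are distances rather than similarities: lower means closer. Assuming the store uses cosine distance (an assumption; the notebook does not state which metric is configured), the computation can be sketched in plain Python with made-up vectors:

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; values near 0 mean the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


v = [0.2, 0.5, 0.1]  # made-up vector standing in for an embedding

print(cosine_distance(v, v))                    # same vector: approximately 0
print(cosine_distance(v, [0.5, 0.2, 0.1]))      # different direction: larger
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Real embeddings have hundreds or thousands of dimensions, but the ranking logic is the same: results are ordered by ascending distance to the query vector.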