# Demo 4 (LangChain, Vectors)

More about vectors in LangChain: relevance scores, maximal marginal relevance (MMR) retrieval, and metadata filtering.

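```python
import os
import cassio

# Connect to Astra DB; the credentials are read from environment variables.
cassio.init(
    token=os.environ['ASTRA_DB_APPLICATION_TOKEN'],
    database_id=os.environ['ASTRA_DB_ID'],
    keyspace=os.environ.get('ASTRA_DB_KEYSPACE'),  # optional, may be None
)
```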

```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Cassandra
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
```

```python
# Both read the OpenAI API key from the OPENAI_API_KEY environment variable.
oai_embeddings = OpenAIEmbeddings()
oai_llm = OpenAI()
```

### We reuse the vector store from "demo2"

Run that notebook first to populate the store.

```python
demo4_store = Cassandra(
    table_name='news_v_store',
    embedding=oai_embeddings,
    session=None,  # None: use the session set up by cassio.init()
    keyspace=None,  # None: use the keyspace set up by cassio.init()
)
```

## Getting a score with the results

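Each result comes with a relevance score: the cosine similarity, rescaled to lie between zero and one (higher means more similar). With `score_threshold=0.8`, results scoring below 0.8 are dropped, so fewer than `k` documents may come back.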

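```python
# Top 3 matches, each paired with its relevance score; those below 0.8 are dropped.
result_pairs = demo4_store.similarity_search_with_relevance_scores(
    "Someone found a way to peek at electrons better",
    k=3,
    score_threshold=0.8,
)
for i, (doc, score) in enumerate(result_pairs):
    # Show the first 64 characters of each passage, followed by its score.
    print(f"[{i}] \"{doc.page_content[:64]}\"... ({score:.4f})")
```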

```
[0] "Electrons move around so fast that they have been out of reach o"... (0.9325)
[1] "But even when they "see" the electron, there's only so much they"... (0.9271)
[2] "The scientists' experiments "have given humanity new tools for e"... (0.9235)
```
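For intuition, here is a minimal pure-Python sketch of how a cosine similarity in [-1, 1] can be mapped onto [0, 1] and then cut by a threshold, as `score_threshold=0.8` does above. The `(1 + cos) / 2` rescaling is an assumption on our part (it is the usual normalization for cosine similarity in Cassandra), and the helpers `relevance_score` and `filter_by_threshold`, like the toy 2-D vectors, are hypothetical.

```python
import math


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def relevance_score(query_vec, doc_vec):
    """Cosine similarity rescaled to [0, 1] (assumed (1 + cos) / 2 mapping)."""
    return (1.0 + cosine(query_vec, doc_vec)) / 2.0


def filter_by_threshold(query_vec, docs, k, score_threshold):
    """Score (text, vector) pairs, keep those at or above the threshold, return the best k."""
    scored = [(text, relevance_score(query_vec, vec)) for text, vec in docs]
    kept = [pair for pair in scored if pair[1] >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Toy 2-D "embeddings" (hypothetical values, not real OpenAI vectors).
query = [1.0, 0.0]
docs = [
    ("same direction", [2.0, 0.0]),  # cos = 1  -> score 1.0
    ("orthogonal", [0.0, 1.0]),      # cos = 0  -> score 0.5
    ("opposite", [-1.0, 0.0]),       # cos = -1 -> score 0.0
]
for text, score in filter_by_threshold(query, docs, k=3, score_threshold=0.8):
    print(f"{text}: {score:.4f}")  # prints "same direction: 1.0000"
```

Only the first toy document clears the 0.8 threshold, even though `k=3` asks for three.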


## MMR retrieval

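Maximal marginal relevance (MMR) search is available out of the box in LangChain. It first fetches the `fetch_k` passages most similar to the query, then selects `k` of them, balancing relevance to the query against redundancy with the passages already selected.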

```python
# Fetch the 8 most similar passages, then keep 3 that are relevant but not redundant.
result_docs = demo4_store.search(
    "Fancy a Nobel laureate?",
    search_type="mmr",
    fetch_k=8,
    k=3,
)
for i, doc in enumerate(result_docs):
    print(f"[{i}] \"{doc.page_content[:64]}\"...")
```

```
[0] "Agostini is affiliated with Ohio State University in the U.S.

T"...
[1] "L'Huillier said she was teaching when she got the call that she "...
[2] "The scientists' experiments "have given humanity new tools for e"...
```
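The selection rule behind MMR can be sketched in a few lines of plain Python. This is a textbook greedy MMR written for illustration, not LangChain's own code; `mmr_select`, `mmr_score`, and the toy vectors are hypothetical, and the `lambda_mult=0.5` default is assumed to mirror LangChain's default weighting.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_score(query_vec, cand_vec, selected_vecs, lambda_mult):
    """lambda * sim(query, c) - (1 - lambda) * max similarity to anything already picked."""
    relevance = cosine(query_vec, cand_vec)
    redundancy = max((cosine(cand_vec, s) for s in selected_vecs), default=0.0)
    return lambda_mult * relevance - (1 - lambda_mult) * redundancy


def mmr_select(query_vec, candidates, k, lambda_mult=0.5):
    """Greedily pick k of the (text, vector) candidates; lambda_mult=1.0 is plain similarity."""
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        selected_vecs = [vec for _, vec in selected]
        best = max(remaining, key=lambda c: mmr_score(query_vec, c[1], selected_vecs, lambda_mult))
        selected.append(best)
        remaining.remove(best)
    return [text for text, _ in selected]


# Toy "fetch_k" candidates (hypothetical 2-D vectors, not real embeddings).
query = [1.0, 0.0]
candidates = [
    ("A: very relevant", [1.0, 0.1]),
    ("A2: near-duplicate of A", [1.0, 0.12]),
    ("B: relevant, different angle", [0.8, -0.6]),
]
print(mmr_select(query, candidates, k=2))
# -> ['A: very relevant', 'B: relevant, different angle']
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))
# -> ['A: very relevant', 'A2: near-duplicate of A']
```

With pure similarity ranking the near-duplicate takes the second slot; the redundancy penalty in MMR swaps it for the less similar but more informative candidate, which is why the MMR results above cover different passages.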


### Add more documents

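Next, add a few deliberately false statements, each tagged with the metadata `{"status": "fake"}` so they can be singled out later.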

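```python
falsehoods = [
    "Electrons are not so tiny after all, more like a tennis ball",
    "The Nobel Prize in 2023 went to an oak tree for having dreamt of an electron",
    "Nobody has ever seen an electron, they might not even exist",
]
# One separate metadata dict per text ([{...}] * 3 would repeat the same dict object).
f_metadatas = [{"status": "fake"} for _ in falsehoods]

# add_texts embeds and stores the texts; it returns the new IDs, which we discard.
_ = demo4_store.add_texts(falsehoods, metadatas=f_metadatas)
```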

```python
# Wrap the store so it can answer questions: retrieve relevant passages, then ask the LLM.
index = VectorStoreIndexWrapper(vectorstore=demo4_store)
```
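As we understand LangChain's implementation, `index.query` builds a retrieval QA chain that, by default, "stuffs" all retrieved passages into a single prompt and calls the LLM once. The sketch below mimics that flow in plain Python; the prompt template, `build_stuff_prompt`, `answer`, the word-overlap `toy_retrieve`, and `echo_llm` are all hypothetical stand-ins so it runs without Astra DB or OpenAI.

```python
import re


def build_stuff_prompt(question, passages):
    """Join all retrieved passages into a single prompt (hypothetical template)."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, retrieve, llm, k=4):
    """Retrieve k passages, stuff them into one prompt, and call the LLM once."""
    passages = retrieve(question, k)
    return llm(build_stuff_prompt(question, passages))


# Hypothetical toy corpus standing in for the vector store contents.
corpus = [
    "The 2023 Nobel Prize in Physics honored studies of electron motion.",
    "Attosecond pulses let scientists glimpse electrons.",
    "Oak trees shed their leaves in autumn.",
]


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_retrieve(question, k):
    # Ranks passages by shared words, NOT by embeddings: a placeholder for the store.
    return sorted(corpus, key=lambda p: len(words(question) & words(p)), reverse=True)[:k]


def echo_llm(prompt):
    # Placeholder "LLM" that returns its prompt so the stuffed context is visible.
    return prompt


print(answer("Tell me about electrons and the nobel prize", toy_retrieve, echo_llm, k=2))
```

Because the LLM only sees what retrieval hands it, whatever restricts retrieval also shapes the answer.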

## Metadata filtering

Ask the same question twice, first over the whole store and then restricting retrieval to the entries whose metadata has `status` equal to `fake`, and compare the answers:

```python
# No filter: retrieval searches the whole store.
print(index.query("Tell me about electrons and the nobel prize", llm=oai_llm))
```

```
The Nobel Prize in Physics in 2023 was awarded to three scientists who have studied how electrons move around atoms in fractions of a second. This science could lead to better electronics, disease diagnoses, and better understanding of chemistry, physics, our bodies, and our gadgets. By studying the tiniest fraction of a second, scientists can gain a “blurry” glimpse of electrons, which could open up new sciences.
```


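```python
# Passed on to the retriever: only entries whose metadata matches {"status": "fake"} are searched.
retr_kwargs = {"search_kwargs": {"filter": {"status": "fake"}}}

print(index.query("Tell me about electrons and the nobel prize",
                  llm=oai_llm, retriever_kwargs=retr_kwargs))
```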

```
Electrons are tiny particles that make up atoms and are believed to be the smallest form of matter. They have never been seen because they are so small. In 2023, the Nobel Prize was given to an oak tree for having allegedly dreamt of an electron, despite the fact that the size of an electron is debated and may be larger than many people think.
```
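Conceptually, the metadata filter narrows the candidate set before similarity ranking: only entries whose metadata contains every key/value pair in the filter are considered, which is why the second answer is built entirely from the falsehoods. The pure-Python sketch below illustrates that behaviour under the assumption of exact-match semantics; the real filtering happens inside the database, and `matches`, `filtered_search`, and the toy entries (including the `source` metadata key) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def matches(metadata, filter_):
    """True if every key/value pair in the filter appears in the metadata (exact match)."""
    return all(metadata.get(key) == value for key, value in filter_.items())


def filtered_search(query_vec, entries, k, filter_=None):
    """Rank (text, vector, metadata) entries by cosine similarity, after applying the filter."""
    pool = [e for e in entries if filter_ is None or matches(e[2], filter_)]
    pool.sort(key=lambda e: cosine(query_vec, e[1]), reverse=True)
    return [text for text, _, _ in pool[:k]]


# Toy entries with hypothetical vectors and metadata.
entries = [
    ("real: Nobel for electron research", [1.0, 0.1], {"source": "news"}),
    ("fake: oak tree wins Nobel", [0.7, 0.7], {"status": "fake"}),
    ("fake: electrons like tennis balls", [0.2, 1.0], {"status": "fake"}),
]
query = [1.0, 0.0]
print(filtered_search(query, entries, k=1))
# -> ['real: Nobel for electron research']
print(filtered_search(query, entries, k=1, filter_={"status": "fake"}))
# -> ['fake: oak tree wins Nobel']
```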
