# Infinispan VectorStore: similarity search demo 1

Infinispan is an open-source key-value data grid. It can run as a single node or distributed across a cluster.

Vector search is supported since release 15.x.
For more information, see [Infinispan Home](https://infinispan.org).

In [2]:
# Ensure that everything we need is installed
# You may want to skip this cell if the packages are already installed
%pip install sentence-transformers
%pip install langchain
%pip install langchain_core
%pip install langchain_community

Note: you may need to restart the kernel to use updated packages.

# Setup

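To run this demo, you need a running Infinispan instance without authentication and a data file (`bbc_news.csv.gz`). The cell below starts the server in Docker and mounts the current directory so that the server can read the `infinispan-noauth.yaml` configuration file.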

In [4]:
!docker rm --force infinispanvs-demo
!docker run -d --name infinispanvs-demo -v $(pwd):/user-config  -p 11222:11222 infinispan/server:15.0.0.Dev09 -c /user-config/infinispan-noauth.yaml 

infinispanvs-demo
25d337666cba9ef664e75959084a915212f5d0b502221b5a1627c8778063f2a4
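The container may need a few seconds before it accepts connections. Below is a minimal sketch, using only the Python standard library, that waits until the port is reachable. `wait_for_port` is a hypothetical helper, not part of Infinispan or LangChain, and it only checks that something accepts TCP connections on the port; it does not verify that the listener is Infinispan or that authentication is disabled.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    Hypothetical helper: it only proves that something is listening on the port.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a server is accepting connections
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Refused or timed out: give up after the deadline, else retry
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)
```

For example, `wait_for_port("localhost", 11222)` should return `True` once the container is ready; 11222 is the port published by the `docker run` command.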


# The Code

## Choose an embedding model

In this demo we're using a HuggingFace embedding model.
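LangChain embedding classes such as `HuggingFaceEmbeddings` expose two methods: `embed_documents`, which embeds a batch of texts, and `embed_query`, which embeds a single query; both return fixed-length lists of floats. To make that interface concrete without downloading a model, here is a hypothetical toy class with the same two methods. It hashes words into buckets instead of running a neural network, so its vectors carry no real semantics; it only illustrates the shape of the data. A real implementation would subclass `langchain_core.embeddings.Embeddings`.

```python
import hashlib
import math
from typing import List


class ToyHashEmbeddings:
    """Hypothetical stand-in with the same two methods as LangChain's Embeddings."""

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def embed_query(self, text: str) -> List[float]:
        # Bag of words: hash each lowercase token into one of `dim` buckets
        vec = [0.0] * self.dim
        for token in text.lower().split():
            h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
            vec[h % self.dim] += 1.0
        # L2-normalise (an all-zero vector is left as-is)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Same procedure, applied to every text in the batch
        return [self.embed_query(t) for t in texts]
```

Because it needs no network access, a class like this can stand in for the real model when testing the rest of a pipeline, but search results built on it would be meaningless.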

In [5]:
from langchain_community.embeddings import HuggingFaceEmbeddings

# Sentence-transformers model used to embed both the news items and the queries
model_name = "sentence-transformers/all-MiniLM-L12-v2"
hf = HuggingFaceEmbeddings(model_name=model_name)



## Prepare the data

In this demo we store the text, the vector and the metadata in the same cache, but other options
are possible: e.g. the content can be stored somewhere else and the vector store can contain only a reference to the actual content.
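The two layouts can be sketched as plain Python records. The class names, the `content_key` field and the external `content_store` are hypothetical illustrations of the idea, not the schema that `infinispan_vector` actually creates.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InlineEntry:
    """Layout used in this demo: content, vector and metadata in one entry."""
    text: str
    vector: List[float]
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class ReferenceEntry:
    """Alternative layout: the vector plus a key pointing at content stored elsewhere."""
    content_key: str
    vector: List[float]


# Hypothetical external store holding the actual content
content_store: Dict[str, str] = {"news/42": "Full article text"}


def resolve(entry: ReferenceEntry, store: Dict[str, str]) -> str:
    # Each search hit needs one extra lookup to fetch its content
    return store[entry.content_key]
```

The inline layout answers a query in a single round trip; the reference layout keeps the vector cache small at the cost of that extra lookup per hit.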

In [6]:
import csv, gzip

# Change this to change the number of news items you want to load
MAX_NEWS = 5000

texts = []
metas = []
# Open the news file and process it as a CSV
with gzip.open('bbc_news.csv.gz', 'rt', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for i, row in enumerate(reader):
        if i >= MAX_NEWS:
            break
        # The first value (the title) and the fifth value are joined
        # to form the content to be embedded
        text = row[0] + "." + row[4]
        texts.append(text)
        # Store the fifth value as "text" and the title as "title" metadata
        metas.append({"text": row[4], "title": row[0]})
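To see what a single row turns into, here is the same transformation applied to a tiny in-memory gzipped CSV. The two rows are made-up values and only columns 0 and 4 matter. The variables are named `sample_texts` and `sample_metas` so that running this in the notebook does not overwrite `texts` and `metas`.

```python
import csv
import gzip
import io

# Two made-up rows; the quoted fifth value contains a comma on purpose
sample = 'Title A,c1,c2,c3,"Description A, with a comma"\nTitle B,c1,c2,c3,Description B\n'
buf = io.BytesIO(gzip.compress(sample.encode("utf-8")))

# gzip.open also accepts an already-open binary file object
with gzip.open(buf, "rt", newline="") as f:
    rows = list(csv.reader(f, delimiter=",", quotechar='"'))

# Same transformation as the loading cell: title + "." + fifth value
sample_texts = [r[0] + "." + r[4] for r in rows]
sample_metas = [{"text": r[4], "title": r[0]} for r in rows]
print(sample_texts[0])  # Title A.Description A, with a comma
```

Note that the title and the description are joined with "." and no space, and that if the file starts with a header row, the loading cell keeps it as a document, because it does not skip the first row.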

# Populate the vector store

This cell creates the vector store. All the configuration needed on the
Infinispan side is performed by the `infinispan_vector` module.
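Conceptually, `from_texts` embeds each text and stores the vector together with its text and metadata, and `similarity_search(query, k)` embeds the query and returns the `k` stored entries whose vectors are closest to it. The sketch below is a hypothetical in-memory stand-in that does a brute-force linear scan with cosine similarity; it is not how Infinispan implements search (the server does that work itself), and unlike LangChain it takes a plain `embed_fn` callable instead of an `Embeddings` object.

```python
import math
from typing import Callable, Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity; defined as 0.0 when either vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TinyVectorStore:
    """Hypothetical stand-in: embed on insert, brute-force k-NN on search."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self.embed_fn = embed_fn
        self.entries: List[Tuple[List[float], str, Dict[str, str]]] = []

    @classmethod
    def from_texts(cls, texts, metadatas, embed_fn) -> "TinyVectorStore":
        store = cls(embed_fn)
        for text, meta in zip(texts, metadatas):
            # Keep vector, text and metadata together, as in this demo's cache
            store.entries.append((embed_fn(text), text, meta))
        return store

    def similarity_search(self, query: str, k: int = 4) -> List[Tuple[str, Dict[str, str]]]:
        q = self.embed_fn(query)
        # Score every stored vector against the query and keep the k best
        scored = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [(text, meta) for _, text, meta in scored[:k]]
```

A linear scan costs one similarity computation per stored entry for every query, which is fine for a few thousand news items in memory but is the reason real vector stores rely on an index.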

In [7]:
# Creating a langchain_core.VectorStore

from infinispan_vector import InfinispanVS
ispnvs = InfinispanVS.from_texts(texts=texts, metadatas=metas, embedding=hf)

# A helper function that prints the result documents

By default, InfinispanVS returns the protobuf `text` field in `Document.page_content`
and all the remaining protobuf fields (except the vector) in `metadata`. This behaviour is
configurable via lambda functions at setup.
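The default mapping, and a custom one, can be sketched as plain functions from a stored entry to a document. Everything here is hypothetical: `Doc` is a minimal stand-in for LangChain's `Document`, the entry is modelled as a dict, the vector field is assumed to be called `vector`, and the name of the InfinispanVS setup parameter that accepts such a function is not shown in this demo.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Doc:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def default_to_doc(entry: Dict[str, Any]) -> Doc:
    # `text` becomes the content; every other field except the vector is metadata
    meta = {k: v for k, v in entry.items() if k not in ("text", "vector")}
    return Doc(page_content=entry["text"], metadata=meta)


def title_first(entry: Dict[str, Any]) -> Doc:
    # A custom mapping: prepend the title to the content, keep only the title
    return Doc(page_content=entry["title"] + ": " + entry["text"],
               metadata={"title": entry["title"]})


def to_docs(entries: List[Dict[str, Any]],
            mapper: Callable[[Dict[str, Any]], Doc] = default_to_doc) -> List[Doc]:
    # Apply the chosen mapping to every search hit
    return [mapper(e) for e in entries]
```

With the default mapping, `print_docs` can read `res.metadata["title"]` because `title` is one of the remaining fields.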

In [8]:
def print_docs(docs):
    # Print each document's position, its title (from metadata) and its content
    for i, res in enumerate(docs, start=1):
        print("----" + str(i) + "----")
        print("TITLE: " + res.metadata["title"])
        print(res.page_content)

# Try it!!!

Below are some sample queries. The second argument of `similarity_search` is the number of results to return.

In [9]:
docs = ispnvs.similarity_search("European nations",5)
print_docs(docs)

----1----
TITLE: EU awards Ukraine and Moldova candidate status
President Zelensky calls it a "unique and historical moment... Ukraine's future is within the EU".
----2----
TITLE: Northern Ireland: UK and EU's row risks Western unity, top US official warns
US state department warns against "a big fight between the UK and the EU" amid the Ukraine war.
----3----
TITLE: Why ex-French colonies are joining the Commonwealth
Behind the Commonwealth's allure as it welcomes Gabon and Togo into its ranks.
----4----
TITLE: European peace seems as fragile as ever
The shifting of European history's tectonic plates is not really that unexpected, explains Kevin Connolly.
----5----
TITLE: A Nato summit in Madrid for hawks
The BBC's Frank Gardner speaks to Nato leaders during the first summit since Russia invaded Ukraine.


In [10]:
print_docs(ispnvs.similarity_search("Milan fashion week begins",2))

----1----
TITLE: In pictures: Head-turning millinery at Royal Ascot Ladies' Day 2022
Racegoers in their fabulous finery for Ladies' Day, on the third day of Royal Ascot.
----2----
TITLE: Fast fashion: European Union reveals fast fashion crackdown
Fast fashion could be a thing of the past under plans to make clothing worn in the EU more sustainable.
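Neither of the two results above is about Milan: a nearest-neighbour search always returns `k` hits, however weakly they match the query. One common mitigation is to drop hits below a similarity threshold. The sketch below assumes you already have `(document, score)` pairs, for example from LangChain's `similarity_search_with_score` if the store implements it, and that a higher score means a closer match; some stores return distances instead, where lower is better and the comparison must be flipped. `above_threshold` is a hypothetical helper and the threshold has to be tuned on your own data.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def above_threshold(scored: Sequence[Tuple[T, float]], min_score: float) -> List[T]:
    """Keep items whose similarity score is at least min_score (higher = closer)."""
    return [item for item, score in scored if score >= min_score]
```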


In [11]:
print_docs(ispnvs.similarity_search("Stock market is rising today",4))

----1----
TITLE: Summer food prices to rise quickly, say grocery analysts
There will be a spike in food prices this summer, the Institute of Grocery Distribution predicts.
----2----
TITLE: Prices rising at fastest rate since 1992
Prices went up by 7% in the 12 months to March, as food, fuel and energy prices continued to climb.
----3----
TITLE: US makes biggest interest rate rise in almost 30 years
The US central bank raises rates by 0.75 percentage points as it scrambles to contain soaring prices.
----4----
TITLE: Why is inflation in US higher than elsewhere?
A surge in government spending drove US inflation to the highest of advanced economies last year.


In [None]:
print_docs(ispnvs.similarity_search("Why cats are so viral?",2))

In [None]:
print_docs(ispnvs.similarity_search("How to stay young",5))

In [None]:
!docker rm --force infinispanvs-demo