# Infinispan VectorStore: similarity search demo 1

Infinispan is an open-source key-value data grid that can run either as a single node or distributed across a cluster.

Vector search has been supported since release 15.x.
For more information, see [Infinispan Home](https://infinispan.org).

```python
# Install the required packages (you may skip this cell if they are already installed)
%pip install sentence-transformers langchain langchain_core langchain_community
```

# Setup

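To run this demo, we need a running Infinispan instance without authentication and a data file (`bbc_news.csv.gz`). The cell below starts the server in Docker and mounts the current directory at `/user-config`, so the `infinispan-noauth.yaml` configuration file must be present in the directory you run it from.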

```python
# Remove any leftover container from a previous run, then start a fresh server
!docker rm --force infinispanvs-demo
!docker run -d --name infinispanvs-demo -v $(pwd):/user-config -p 11222:11222 infinispan/server:15.0.0.Dev09 -c /user-config/infinispan-noauth.yaml
```
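
`docker run -d` returns as soon as the container has started, which can be before the server accepts connections. A minimal sketch of a readiness check using only the Python standard library: it polls a TCP port until a connection succeeds. The helper name `wait_for_port` is hypothetical, and an open port only shows that something is listening, not that Infinispan has finished starting up.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Each connection attempt itself waits at most 2 s, so the total time
    can slightly exceed `timeout`.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP connect means something is listening on the port
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)


# Hypothetical usage: block until the container started above accepts
# connections (11222 is the port published by the `docker run` command)
# wait_for_port("localhost", 11222)
```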

# The Code

## Pick an embedding model

In this demo we're using a HuggingFace embedding model.

```python
from langchain_community.embeddings import HuggingFaceEmbeddings

# Sentence-transformers model used to turn both texts and queries into vectors
model_name = "sentence-transformers/all-MiniLM-L12-v2"
hf = HuggingFaceEmbeddings(model_name=model_name)
```
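
The embedding model maps each text to a fixed-length vector of floats (384 dimensions for `all-MiniLM-L12-v2`), and semantically similar texts end up with nearby vectors. Cosine similarity is a common way to measure that closeness; the sketch below computes it in plain Python to illustrate the idea. It makes no claim about which similarity function the Infinispan index is configured with, and `cosine_similarity` is a hypothetical helper, not part of LangChain or Infinispan.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Usage with the model loaded above (downloads the model on first use):
# v1 = hf.embed_query("Stock market is rising today")
# v2 = hf.embed_query("Shares climb on Wall Street")
# print(cosine_similarity(v1, v2))
```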

## Prepare the data

In this demo we store the text, the vector and the metadata in the same cache, but other options
are possible: for example, the content can be stored somewhere else and the vector store can contain only a reference to the actual content.
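
As a minimal sketch of the reference-only alternative mentioned above, assume the full articles live in an external store (modelled here as a plain dict) and each vector store entry carries only a hypothetical `doc_id` in its metadata. The text would still be embedded as usual; only the place where the content is stored changes. `ARTICLES`, `doc_id` and `resolve` are illustrative names, not part of the InfinispanVS API.

```python
from typing import Dict, List

# Hypothetical external content store: doc_id -> full article text.
# In a real setup this could be another cache, a database, or files on disk.
ARTICLES: Dict[str, str] = {
    "bbc-0001": "Full text of the first article...",
    "bbc-0002": "Full text of the second article...",
}

# What the vector store would hold as metadata: only a reference, no content
metadatas: List[Dict[str, str]] = [{"doc_id": doc_id} for doc_id in ARTICLES]


def resolve(metadata: Dict[str, str]) -> str:
    """Fetch the full content for a search hit using its stored reference."""
    return ARTICLES[metadata["doc_id"]]
```

After a search, each hit's content would be fetched with `resolve(doc.metadata)`. The trade-off is smaller cache entries in exchange for one extra lookup per result.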

```python
import csv
import gzip
from itertools import islice

# Change this to change the number of news items you want to load
MAX_NEWS = 5000

texts = []
metas = []
# Open the gzipped news file and parse it as CSV
with gzip.open('bbc_news.csv.gz', 'rt', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    # If your copy of the file starts with a header row, skip it first: next(reader)
    for row in islice(reader, MAX_NEWS):
        # The first (title) and fifth (text) values are joined to form
        # the content to be embedded
        text = row[0] + "." + row[4]
        texts.append(text)
        # Store the text and the title as metadata
        metas.append({"text": row[4], "title": row[0]})
```
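
The per-row transformation in the loop can be factored into a small function so it can be checked in isolation. This sketch assumes, as the loop does, that column 0 holds the title and column 4 the text; the name `row_to_text_and_meta` is hypothetical.

```python
from typing import Dict, List, Tuple


def row_to_text_and_meta(row: List[str]) -> Tuple[str, Dict[str, str]]:
    """Build the content to embed and its metadata from one CSV row."""
    title, body = row[0], row[4]
    # Same joining rule as the loading loop: title + "." + text
    text = title + "." + body
    meta = {"text": body, "title": title}
    return text, meta
```

For a list of parsed rows, `pairs = [row_to_text_and_meta(r) for r in rows]` followed by `texts = [t for t, _ in pairs]` and `metas = [m for _, m in pairs]` yields the same two lists as the loop.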

## Populate the vector store

The cell below creates the vector store. All the configuration needed on the
Infinispan side is performed by the `infinispan_vector` module.

```python
# Create a LangChain VectorStore backed by Infinispan: from_texts embeds
# each text with `hf` and stores it together with its metadata
from infinispan_vector import InfinispanVS

ispnvs = InfinispanVS.from_texts(texts=texts, metadatas=metas, embedding=hf)
```
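
`from_texts` embeds every text with the given model, stores the texts, vectors and metadata, and returns a store that can be queried. To make that flow concrete, here is a toy in-memory stand-in with brute-force search. It is not InfinispanVS, which keeps the data in Infinispan and relies on its vector search; `ToyVectorStore` and its `embed` callable are hypothetical. Vectors are normalized once at insert time, so ranking by dot product is the same as ranking by cosine similarity.

```python
import heapq
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def _normalize(v: Sequence[float]) -> Vector:
    # Scale to unit length; zero vectors are returned unchanged
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


class ToyVectorStore:
    """In-memory, brute-force stand-in used only to illustrate the flow."""

    def __init__(self, embed: Callable[[str], Sequence[float]]):
        self.embed = embed
        self.entries: List[Tuple[Vector, str, Dict]] = []

    @classmethod
    def from_texts(cls, texts: List[str], embed: Callable[[str], Sequence[float]],
                   metadatas: Optional[List[Dict]] = None) -> "ToyVectorStore":
        store = cls(embed)
        metadatas = metadatas or [{} for _ in texts]
        for text, meta in zip(texts, metadatas):
            # Embed once at insert time and keep text, vector and metadata together
            store.entries.append((_normalize(embed(text)), text, meta))
        return store

    def similarity_search(self, query: str, k: int = 4) -> List[Tuple[str, Dict]]:
        q = _normalize(self.embed(query))
        # With unit-length vectors, the dot product equals cosine similarity
        scored = ((sum(a * b for a, b in zip(q, vec)), text, meta)
                  for vec, text, meta in self.entries)
        top = heapq.nlargest(k, scored, key=lambda item: item[0])
        return [(text, meta) for _, text, meta in top]
```

Brute force scans every entry on each query, which is fine for a toy example; dedicated vector indexes exist to avoid that full scan on large collections.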

## A helper function that prints the result documents

By default, InfinispanVS returns the protobuf `text` field as `Document.page_content`
and all the remaining protobuf fields (except the vector) in `Document.metadata`. This behaviour can be
configured via lambda functions at setup.
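
The default mapping described above can be sketched in plain Python. This assumes a search hit arrives as a dict of protobuf field names to values and that the vector field is named `vector` (the actual field names may differ); `Doc` is a hypothetical stand-in for LangChain's `Document` so the snippet runs without LangChain, and `hit_to_doc` is not the InfinispanVS implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def hit_to_doc(hit: Dict[str, Any], content_field: str = "text",
               vector_field: str = "vector") -> Doc:
    # The content field becomes page_content; every other field except
    # the vector ends up in metadata
    metadata = {k: v for k, v in hit.items() if k not in (content_field, vector_field)}
    return Doc(page_content=hit[content_field], metadata=metadata)
```

This mapping is why `print_docs` can read the title from `res.metadata["title"]`.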

```python
def print_docs(docs):
    # Print a numbered list of documents with their title and content
    for i, res in enumerate(docs, start=1):
        print("----" + str(i) + "----")
        print("TITLE: " + res.metadata["title"])
        print(res.page_content)
```

# Try it!

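Below are some sample queries. The second argument of `similarity_search` is `k`, the number of most similar documents to return.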

```python
docs = ispnvs.similarity_search("European nations", k=5)
print_docs(docs)
```

```python
print_docs(ispnvs.similarity_search("Milan fashion week begins", k=2))
```

```python
print_docs(ispnvs.similarity_search("Stock market is rising today", k=4))
```

```python
print_docs(ispnvs.similarity_search("Why cats are so viral?", k=2))
```

```python
print_docs(ispnvs.similarity_search("How to stay young", k=5))
```

```python
# Stop and remove the demo container
!docker rm --force infinispanvs-demo
```