# Handy Utilities for Vector Search on Collections

## Configuration

In [1]:
from my_config import MY_CONFIG

## Connect to Vector Database

Milvus can run embedded in the notebook process, backed by a local database file, which makes it easy to set up.

<span style="color:blue;">Note: If you get an error saying the database cannot be loaded, try the following: </span>

- <span style="color:blue;">In **VS Code**: **restart the kernel** of the previous notebook. This releases the db.lock.</span>
- <span style="color:blue;">In **Jupyter**: choose `File --> Close and Shutdown Notebook` in the previous notebook. This releases the db.lock.</span>
- <span style="color:blue;">Re-run this cell.</span>


In [2]:
from pymilvus import MilvusClient

milvus_client = MilvusClient(MY_CONFIG.DB_URI)

print ("✅ Connected to Milvus instance:", MY_CONFIG.DB_URI)

✅ Connected to Milvus instance: ./rag_1_dpk.db


## Setup Embeddings

There are two options:

1. Use sentence transformers directly
2. Use the Milvus model wrapper

In [3]:
## Option 1 - use sentence transformers directly

# Model downloads are routed through the hf-mirror.com mirror.
# If https://huggingface.co/ is reachable from your network, comment out the next two lines.
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer(MY_CONFIG.EMBEDDING_MODEL)

def get_embeddings (text):
    # normalize_embeddings=True returns unit-length vectors
    embeddings = embedding_model.encode(text, normalize_embeddings=True)
    return embeddings



In [4]:
## Option 2 - Milvus model
from pymilvus import model

# Model downloads are routed through the hf-mirror.com mirror.
# If https://huggingface.co/ is reachable from your network, comment out the next two lines.
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'


# embedding_fn = model.DefaultEmbeddingFunction()

## initialize the SentenceTransformerEmbeddingFunction
embedding_fn = model.dense.SentenceTransformerEmbeddingFunction(
    model_name = MY_CONFIG.EMBEDDING_MODEL,
    device='cpu' # CPU works on every machine; keeps the setup simple
)

In [5]:
# Test Embeddings
text = 'Paris 2024 Olympics'
embeddings = get_embeddings(text)
print ('sentence transformer : embeddings len =', len(embeddings))
print ('sentence transformer : embeddings[:5] = ', embeddings[:5])

embeddings = embedding_fn([text])
print ('milvus model wrapper : embeddings len =', len(embeddings[0]))
print ('milvus model wrapper  : embeddings[:5] = ', embeddings[0][:5])

sentence transformer : embeddings len = 384
sentence transformer : embeddings[:5] =  [ 0.02468893  0.10352131  0.02752644 -0.08551719 -0.01412828]
milvus model wrapper : embeddings len = 384
milvus model wrapper  : embeddings[:5] =  [ 0.02468893  0.10352128  0.02752643 -0.08551716 -0.01412826]
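
The two printouts above agree to roughly seven decimal places, which suggests that both options wrap the same underlying model. Here is a minimal sketch of checking this programmatically, using only the five values printed above. The helper name `max_abs_diff` and the `1e-6` tolerance are illustrative choices, not part of the notebook.

```python
# First five values of each embedding, copied from the output above.
st_prefix = [0.02468893, 0.10352131, 0.02752644, -0.08551719, -0.01412828]
mv_prefix = [0.02468893, 0.10352128, 0.02752643, -0.08551716, -0.01412826]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))

diff = max_abs_diff(st_prefix, mv_prefix)
same = diff < 1e-6  # illustrative tolerance for float32 rounding noise
print(f"max abs diff = {diff:.2e}, match within tolerance: {same}")
```

In the notebook itself, the same helper could be applied to the full vectors, for example `max_abs_diff(get_embeddings(text), embedding_fn([text])[0])`.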


## Do a Vector Search

We run a few searches to verify that the data in the collection can be retrieved.
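
Conceptually, a vector search embeds the query and ranks the stored vectors by similarity to it. The sketch below is a brute-force, pure-Python illustration of that idea. It is not how Milvus is implemented, and it assumes an inner-product style metric on unit-length vectors (the metric configured for the collection is not shown in this notebook). The names `normalize`, `brute_force_search`, and `toy_store` are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def brute_force_search(query_vec, stored, limit=5):
    """Rank stored (label, vector) pairs by inner product with the query, highest first."""
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), label)
        for label, vec in stored
    ]
    scored.sort(reverse=True)
    return [(label, score) for score, label in scored[:limit]]

# Toy 3-dimensional "embeddings"; the real ones in this notebook have 384 dimensions.
toy_store = [
    ("doc_a", [1.0, 0.0, 0.0]),
    ("doc_b", [0.7, 0.7, 0.0]),
    ("doc_c", [0.0, 0.0, 1.0]),
]

hits = brute_force_search([0.9, 0.1, 0.0], toy_store, limit=2)
for label, score in hits:
    print(label, round(score, 3))
```

A vector database does the same ranking, but typically with an index so that it does not have to compare the query against every stored vector.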

In [6]:
## helper function to perform vector search
def do_vector_search (query):
    query_vectors = [get_embeddings(query)]  # Option 1 - using sentence transformers
    # query_vectors = embedding_fn([query])  # Option 2 - using the Milvus model wrapper

    results = milvus_client.search(
        collection_name=MY_CONFIG.COLLECTION_NAME,  # target collection
        data=query_vectors,  # query vectors
        limit=5,  # number of returned entities
        output_fields=["filename", "page_number", "text"],  # specifies fields to be returned
    )
    return results
## ----

def print_search_results (results):
    # pprint (results)
    print ('num results : ', len(results[0]))

    for i, r in enumerate (results[0]):
        #pprint(r, indent=4)
        print (f'------ result {i+1} --------')
        print ('search score:', r['distance'])
        print ('filename:', r['entity']['filename'])
        print ('page number:', r['entity']['page_number'])
        print ('text:\n', r['entity']['text'])
        print()
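
If you would rather work with the hits as data than print them, the same nested structure can be flattened into plain dictionaries. This is a minimal sketch, assuming the shape that `print_search_results` reads (`results[0]` is a list of hits, each with `distance` and `entity`). The helper name `hits_to_rows`, its `min_score` and `max_chars` parameters, and the sample data are hypothetical.

```python
def hits_to_rows(results, min_score=None, max_chars=80):
    """Flatten the hits for the first query into a list of plain dicts."""
    rows = []
    for hit in results[0]:
        score = hit["distance"]
        # Assumption: a higher score means a closer match (cosine / inner product).
        # For an L2 metric the comparison would need to be reversed.
        if min_score is not None and score < min_score:
            continue
        entity = hit["entity"]
        rows.append({
            "score": round(score, 4),
            "filename": entity["filename"],
            "page_number": entity["page_number"],
            "snippet": entity["text"][:max_chars],
        })
    return rows

# Stand-in data shaped like a search result; the values are placeholders.
sample_results = [[
    {"distance": 0.38, "entity": {"filename": "a.pdf", "page_number": 2, "text": "first chunk"}},
    {"distance": 0.25, "entity": {"filename": "b.pdf", "page_number": 2, "text": "second chunk"}},
]]

rows = hits_to_rows(sample_results, min_score=0.3)
print(rows)
```

In the notebook, `hits_to_rows(do_vector_search(query))` would give a list that is easy to load into a table or filter further.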

## Questions for Vector Search

See the [questions](questions.md) file for sample questions. You can, of course, ask your own.

In [7]:

## papers
# query = "What was the training data used to train Granite model?"

## FOMC (only the uncommented query is run)
# query = "Who is on the board?"
query = "Which members voted?"

results = do_vector_search (query)
print_search_results(results)

num results :  5
------ result 1 --------
search score: 0.3802601099014282
filename: monetary20240731a1.pdf
page number: 2
text:
 FEDERAL RESERVE press release
Voting for the monetary policy action were Jerome H. Powell, Chair; John C. Williams, Vice Chair; Thomas I. Barkin; Michael S. Barr; Raphael W. Bostic; Michelle W. Bowman; Lisa D. Cook; Mary C. Daly; Austan D. Goolsbee; Philip N. Jefferson; Adriana D. Kugler; and Christopher J. Waller. Austan D. Goolsbee voted as an alternate member at this meeting.

------ result 2 --------
search score: 0.25401562452316284
filename: monetary20240918a1.pdf
page number: 2
text:
 FEDERAL RESERVE press release
Voting for the monetary policy action were Jerome H. Powell, Chair; John C. Williams, Vice Chair; Thomas I. Barkin; Michael S. Barr; Raphael W. Bostic; Lisa D. Cook; Mary C. Daly; Beth M. Hammack; Philip N. Jefferson; Adriana D. Kugler; and Christopher J. Waller. Voting against this action was Michelle W. Bowman, who preferred to lower the t

In [8]:
## papers
# query = "What is attention mechanism?"

## FOMC (only the uncommented query is run)
# query = "What is the target inflation rate?"
query = "When would the rate cut take effect?"


results = do_vector_search (query)
print_search_results(results)

num results :  5
------ result 1 --------
search score: 0.33179062604904175
filename: monetary20240918a1.pdf
page number: 1
text:
 FEDERAL RESERVE press release
In light of the progress on inflation and the balance of risks, the Committee decided to lower the target range for the federal funds rate by 1/2 percentage point to 4-3/4 to 5 percent. In considering additional adjustments to the target range for the federal funds rate, the Committee will carefully assess incoming data, the evolving outlook, and the balance of risks. The Committee will continue reducing its holdings of Treasury securities and agency debt and agency mortgage-backed securities. The Committee is strongly committed to supporting maximum employment and returning inflation to its 2 percent objective.

------ result 2 --------
search score: 0.3317785859107971
filename: monetary20240731a1.pdf
page number: 1
text:
 FEDERAL RESERVE press release
In support of its goals, the Committee decided to maintain the target range

In [9]:
# milvus_client.close()
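
To make sure the connection is always released, even when a cell raises an error, the client can be wrapped in `contextlib.closing`. This may also help avoid the db.lock issue described in the note near the top. The sketch below uses a stand-in class so that it runs on its own. It assumes only that the real client exposes a `close()` method, as the commented-out line above suggests. `FakeClient` is hypothetical.

```python
from contextlib import closing

class FakeClient:
    """Stand-in for a database client; it only mimics a close() method."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

client = FakeClient()
with closing(client) as c:
    # Run searches with `c` here; close() is called on exit,
    # even if this block raises an exception.
    pass

print("closed:", client.closed)
```

With the real client this would read `with closing(MilvusClient(MY_CONFIG.DB_URI)) as milvus_client: ...`, at the cost of keeping all searches inside one block.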