## Querying a Milvus index - Nomic AI Embeddings

This notebook shows how to query content from a Milvus vector store. The embeddings are the fully open source ones released by Nomic AI, [nomic-embed-text-v1](https://huggingface.co/nomic-ai/nomic-embed-text-v1).

As described in [this blog post](https://blog.nomic.ai/posts/nomic-embed-text-v1), these embeddings feature an "8192 context-length that outperforms OpenAI Ada-002 and text-embedding-3-small on both short and long context tasks". In addition, they are:

- Open source
- Open data
- Open training code
- Fully reproducible and auditable

Requirements:
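- A Milvus instance, either standalone or cluster.
- A collection in that instance that already contains documents embedded with the same model, since this notebook only queries and does not ingest.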

### Needed packages and imports

In [1]:
!pip install -q einops==0.7.0 langchain==0.1.9 pymilvus==2.3.6 sentence-transformers==2.4.0

In [2]:
import os
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Milvus

### Base parameters, the Milvus connection info

In [3]:
MILVUS_HOST = "vectordb-milvus.milvus.svc.cluster.local"
MILVUS_PORT = 19530
MILVUS_USERNAME = "root"
MILVUS_PASSWORD = "Milvus"
MILVUS_COLLECTION = "collection_nomicai_embeddings"
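The values above are hard-coded, credentials included. As a minimal sketch, assuming you would rather supply them through environment variables, the helper below builds the same connection dictionary and falls back to this notebook's values. The helper name `milvus_connection_args` and the choice of environment variable names are assumptions for illustration; they are not part of LangChain or pymilvus.

```python
import os

# Fallback values, copied from the constants used in this notebook.
DEFAULTS = {
    "MILVUS_HOST": "vectordb-milvus.milvus.svc.cluster.local",
    "MILVUS_PORT": "19530",
    "MILVUS_USERNAME": "root",
    "MILVUS_PASSWORD": "Milvus",
}


def milvus_connection_args(env=None):
    """Build a connection_args dict, preferring environment variables.

    Hypothetical helper: `env` is any mapping (defaults to os.environ).
    """
    env = os.environ if env is None else env

    def get(key):
        return env.get(key, DEFAULTS[key])

    return {
        "host": get("MILVUS_HOST"),
        "port": int(get("MILVUS_PORT")),  # environment values are strings
        "user": get("MILVUS_USERNAME"),
        "password": get("MILVUS_PASSWORD"),
    }
```

The result could then be passed as `connection_args=milvus_connection_args()` when creating the `Milvus` store.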

### Initialize the connection

In [4]:
# If you don't want to use a GPU, you can remove the 'device': 'cuda' argument
model_kwargs = {'trust_remote_code': True, 'device': 'cuda'}
embeddings = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v1",
    model_kwargs=model_kwargs,
    show_progress=True
)

store = Milvus(
    embedding_function=embeddings,
    connection_args={
        "host": MILVUS_HOST,
        "port": MILVUS_PORT,
        "user": MILVUS_USERNAME,
        "password": MILVUS_PASSWORD,
    },
    collection_name=MILVUS_COLLECTION,
    metadata_field="metadata",
    text_field="page_content",
    drop_old=False,  # keep the existing collection and its contents
)

You try to use a model that was created with version 2.4.0.dev0, however, your version is 2.4.0. This might cause unexpected behavior or errors. In that case, try to update to the latest version.



!!!!!!!!!!!!megablocks not available, using torch.matmul instead
<All keys matched successfully>
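The `'device': 'cuda'` setting in the cell above is only appropriate when a GPU is available. As a small sketch, assuming PyTorch is installed (it normally comes in with sentence-transformers), the device can be chosen at runtime instead of editing the dictionary by hand. `pick_device` is a hypothetical helper name, not a LangChain or sentence-transformers function.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # assumed present as a sentence-transformers dependency
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Same keys as in the cell above; only the device is selected dynamically.
model_kwargs = {"trust_remote_code": True, "device": pick_device()}
```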


### Make a query to the index to verify sources

In [5]:
query = "How can I create a Data Science Project?"
# Each result is a (Document, score) tuple
results = store.similarity_search_with_score(query, k=4, return_metadata=True)
for doc, score in results:
    print(doc.metadata['source'])

https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.13/html-single/working_on_data_science_projects/index
https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.13/html-single/openshift_ai_tutorial_-_fraud_detection_example/index
https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.13/html-single/getting_started_with_red_hat_openshift_ai_self-managed/index
https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.13/html-single/working_on_data_science_projects/index
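The same source URL appears twice in the output above, because several chunks of one document can match the query. A minimal sketch, assuming you only want each source listed once in ranked order, is shown below. `unique_sources` is a hypothetical helper, and the demo uses stand-in objects with placeholder scores rather than real Milvus results.

```python
from types import SimpleNamespace


def unique_sources(results):
    """Return source URLs in ranked order, dropping repeats.

    `results` is a list of (document, score) pairs, where each document
    exposes a `metadata` dict, as in the query cell above.
    """
    seen = set()
    sources = []
    for doc, _score in results:
        source = doc.metadata.get("source")
        if source is not None and source not in seen:
            seen.add(source)
            sources.append(source)
    return sources


# Stand-in data for illustration only; scores are arbitrary placeholders.
demo = [
    (SimpleNamespace(metadata={"source": "a"}), 0.1),
    (SimpleNamespace(metadata={"source": "b"}), 0.2),
    (SimpleNamespace(metadata={"source": "a"}), 0.3),
]
print(unique_sources(demo))  # ['a', 'b']
```

With the real query, this would be called as `unique_sources(results)`.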


### Work with a retriever

In [6]:
retriever = store.as_retriever(search_type="similarity", search_kwargs={"k": 4})

In [7]:
# invoke() replaces the deprecated get_relevant_documents()
docs = retriever.invoke(query)
docs

[Document(page_content='CHAPTER 1. USING DATA SCIENCE PROJECTS\n1.1. CREATING A DATA SCIENCE PROJECT\nTo implement a data science workflow, you must create a project. In OpenShift, a project is a Kubernetes\nnamespace with additional annotations, and is the main way that you can manage user access to\nresources. A project organizes your data science work in one place and also allows you to collaborate\nwith other developers and data scientists in your organization.\nWithin a project, you can add the following functionality:\nData connections so that you can access data without having to hardcode information like\nendpoints or credentials.\nWorkbenches for working with and processing data, and for developing models.\nDeployed models so that you can test them and then integrate them into intelligent\napplications. Deploying a model makes it available as a service that you can access by using an\nAPI.\nPipelines for automating your ML workflow.\nPrerequisites\nYou have logged in to Red Ha