## Nemo Retrieval Microservice Tutorial
### Implementing using LCEL

This notebook provides a simple end-to-end example of how to use the Nemo Retriever Microservice APIs.
# Getting Started with the Nemo Retriever "Retriever" microservice
| Author(s) | Contact |
|---|---|
| [Ruchika Kharwar](https://github.com/rasalt) | rkharwar@nvidia.com |

NOTE: This notebook has been tested in the following environment:
Python version = 3.10.8

## Pipelines 

A pipeline is an end-to-end retrieval function provided by the NVIDIA Retriever Microservice.
It is accessed through a set of API calls or the client library.

Each pipeline has a name, a status, and the embedding model it uses. Note that the name of the document store used on the backend is part of the pipeline name.

Pipelines have other properties, such as the chunking strategy, which you can view by printing the entire pipeline object.
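Printing a client object directly can be noisy. The client's response objects appear to be generated models (the search output later in this notebook shows fields such as `actual_instance` and `anyof_schema_1_validator`, which are typical of openapi-generator clients). The sketch below is a hypothetical helper, not part of `nemo_retriever`, that converts such objects into plain dictionaries for `pprint`, assuming they expose `to_dict()`, `model_dump()`, or ordinary attributes.

```python
from typing import Any


def to_plain_dict(obj: Any) -> Any:
    """Best-effort conversion of a client model object into plain Python data."""
    if isinstance(obj, dict):
        return {key: to_plain_dict(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain_dict(item) for item in obj]
    # openapi-generator models typically expose to_dict(); pydantic v2 models expose model_dump()
    for method_name in ("to_dict", "model_dump"):
        method = getattr(obj, method_name, None)
        if callable(method):
            return method()
    # Fall back to the object's public attributes, converted recursively
    if hasattr(obj, "__dict__"):
        return {key: to_plain_dict(value)
                for key, value in vars(obj).items() if not key.startswith("_")}
    return obj  # plain values (str, int, float, None) are returned unchanged
```

Given a pipeline object returned by the client, `pprint(to_plain_dict(pipeline))` would show all of its properties, including the chunking strategy.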

## Overview

<> 

## Objective
This notebook shows how to use a freshly deployed Nemo Retriever microservice.
These examples are meant as building blocks for the larger solution you will likely build for your generative AI use case.

## Before you begin
### Set up your environment.
Refer to page <> for details on how to deploy the service.
You should have the Docker Compose services running in your environment, for example:

```
$ docker compose -f docker-compose-ea.yaml ps
NAME                              IMAGE                                                                              COMMAND                  SERVICE          CREATED        STATUS                    PORTS
docker-compose-elasticsearch-1    docker.elastic.co/elasticsearch/elasticsearch:8.12.0                               "/bin/tini -- /usr/l…"   elasticsearch    21 hours ago   Up 21 hours (healthy)     0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp
docker-compose-embedding-ms-1     nvcr.io/ohlfw0olaadg/ea-participants/nemo-retriever-embedding-microservice:24.02   "/opt/nvidia/nvidia_…"   embedding-ms     21 hours ago   Up 21 hours (healthy)     
docker-compose-etcd-1             quay.io/coreos/etcd:v3.5.11                                                        "etcd -advertise-cli…"   etcd             21 hours ago   Up 21 hours (healthy)     2379-2380/tcp
docker-compose-milvus-1           milvusdb/milvus:v2.3.5                                                             "/tini -- milvus run…"   milvus           21 hours ago   Up 21 hours (healthy)     
docker-compose-minio-1            minio/minio:RELEASE.2023-03-20T20-16-18Z                                           "/usr/bin/docker-ent…"   minio            21 hours ago   Up 21 hours (healthy)     9000/tcp
docker-compose-otel-collector-1   otel/opentelemetry-collector-contrib:0.91.0                                        "/otelcol-contrib --…"   otel-collector   21 hours ago   Up 21 hours               0.0.0.0:4317->4317/tcp, :::4317->4317/tcp, 0.0.0.0:13133->13133/tcp, :::13133->13133/tcp, 0.0.0.0:55679->55679/tcp, :::55679->55679/tcp, 55678/tcp
docker-compose-postgres-1         postgres:16.1                                                                      "docker-entrypoint.s…"   postgres         21 hours ago   Up 21 hours               0.0.0.0:5432->5432/tcp, :::5432->5432/tcp
docker-compose-retrieval-ms-1     nvcr.io/ohlfw0olaadg/ea-participants/nemo-retriever-microservice:24.02             "/usr/bin/shelless_u…"   retrieval-ms     21 hours ago   Up 21 hours (unhealthy)   0.0.0.0:1984->8000/tcp, :::1984->8000/tcp
docker-compose-tika-1             apache/tika:2.9.1.0                                                                "/bin/sh -c 'exec ja…"   tika             21 hours ago   Up 21 hours               0.0.0.0:9998->9998/tcp, :::9998->9998/tcp
docker-compose-zipkin-1           openzipkin/zipkin:3.0.6                                                            "start-zipkin"           zipkin           21 hours ago   Up 21 hours (healthy)     9410/tcp, 0.0.0.0:9411->9411/tcp, :::9411->9411/tcp
```
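In this listing, `docker-compose-retrieval-ms-1` reports `(unhealthy)` even though its port is published, so it is worth scanning the status column before running the cells below. The following hypothetical helper is a minimal sketch that assumes the default `docker compose ps` table layout shown here (header row first, container name in the first column) and returns the names of containers whose status contains `(unhealthy)`.

```python
def find_unhealthy(ps_output: str) -> list[str]:
    """Return the NAME of every container whose STATUS reports '(unhealthy)'.

    Assumes the default table printed by `docker compose ps`: the first row
    is the header and the first column is the container name.
    """
    unhealthy = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # '(healthy)' does not contain '(unhealthy)', so healthy rows are ignored
        if "(unhealthy)" in line:
            unhealthy.append(line.split()[0])
    return unhealthy
```

The input can come from `subprocess.run(["docker", "compose", "-f", "docker-compose-ea.yaml", "ps"], capture_output=True, text=True).stdout`.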

### Set up the Nemo Retriever client
- Initialize the client
- List the existing collections

In [7]:
from nemo_retriever.retriever_client import RetrieverClient
from pprint import pprint

retriever = RetrieverClient(base_url="http://localhost:1984")

# GET collections: list all created collections
collections_response = retriever.get_collections()
collections = collections_response.collections

print("These collections exist: ")

for collection in collections:
    print(f"Collection '{collection.name}'")

These collections exist: 
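The `docker compose ps` listing maps host port 1984 to port 8000 of the `retrieval-ms` container, which is why the client uses `http://localhost:1984`. If `get_collections()` fails, a plain TCP check can tell you whether anything is listening on that port at all. The sketch below uses only the standard library; `is_reachable` is a hypothetical helper, not part of `nemo_retriever`, and it does not verify that the API itself is healthy.

```python
import socket
from urllib.parse import urlparse


def is_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host and port in base_url succeeds.

    This only shows that something is listening; it does not check API health.
    """
    parsed = urlparse(base_url)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's default port when the URL does not name one
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False
```

For the setup above, `is_reachable("http://localhost:1984")` should return `True` while the `retrieval-ms` container is up.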


### Create collection and add files
- Create the collection "testCollection"
- Add a PDF file
- Add a text string

In [8]:
# Create a collection, specifying the pipeline type
response = retriever.create_collection(pipeline="hybrid", name="testCollection")
collection_id = response.collection.id  # store ID of the newly created collection
print("Collection id {} created".format(collection_id))

# Add a pdf with metadata to the collection 
FILE1 = "./files/python-basics-sample-chapters.pdf"
try:
    response = retriever.add_document(collection_id=collection_id,
                                       filepath=FILE1,
                                       format="pdf",
                                       metadata={"title": "A practical introduction to python 3",
                                                 "authors": "David Amos, Dan Bader, Joanna Jablonski, Fletcher Heisler"})
    
    created_document_id_1 = response.documents[0].id
    print("Added file 1: {} to the collection".format(created_document_id_1))

except Exception as e:
    print("An error occurred while adding a document (PDF):", e)


# Add a text string to the collection
STRING1 = "Alice's favorite color is green. Her favorite ice cream is mint chip. She lives in Kansas."
try:

    response = retriever.add_document(collection_id=collection_id,
                                       content=STRING1,
                                       format="txt",
                                       # Metadata describing the text string itself, not the PDF
                                       metadata={"title": "Notes about Alice"})
    
    created_document_id_2 = response.documents[0].id
    print("Added text string: {} to the collection".format(created_document_id_2))

except Exception as e:
    print("An error occurred while adding a document (text):", e)

Collection id 80c9026e-42b6-4910-bc07-2442848b13bc created
Added file 1: 2350732e5d1a15c9df7d70e39c948c312efbd93b858a898dc3d02893221cca4d to the collection
Added text string: c18f09aa335fa6b87885cb04f5aecd5fdd3591c960f85c255b18d76ecae84573 to the collection
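The cell above repeats the same try/except pattern for each document. The following is a sketch of a hypothetical `add_documents` helper that loops over sets of keyword arguments instead. Like the cell, it assumes that `add_document` returns a response whose `documents[0].id` is the new document ID.

```python
from typing import Any, Callable


def add_documents(add_document: Callable[..., Any], collection_id: str,
                  documents: dict[str, dict[str, Any]]) -> dict[str, str]:
    """Add several documents and return {label: new document ID}.

    `documents` maps a label of your choice to the keyword arguments for one
    add_document call. A failure is printed and skipped, as in the cell above.
    """
    added = {}
    for label, kwargs in documents.items():
        try:
            response = add_document(collection_id=collection_id, **kwargs)
            added[label] = response.documents[0].id
        except Exception as e:
            print(f"An error occurred while adding '{label}':", e)
    return added
```

With the client above, you would pass `retriever.add_document` and the same keyword arguments used in the cell, for example `{"pdf": {"filepath": FILE1, "format": "pdf", "metadata": {...}}}`.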


### Query the collection

In [22]:
# Define the query you want to search for
query = "What is Python"

# Define the number of top results you want to retrieve
top_k = 5  # Adjust as needed

# Perform the search
search_response = retriever.search_collection(collection_id, query, top_k)


# Process and print the search results
for chunk in search_response.chunks:
    print("Chunk ID: " + chunk.id)
    print("Chunk Score: ",chunk.score)
    print("Chunk Metadata: ",chunk.metadata)
    print("-"*20)
    print("Chunk Content: ",chunk.content)
    print("-" * 80)
    

Chunk ID: 16570a8fa8eb58b2ca2b0724cefbbe2304125102f08a44437c1f2c93c96fafcc
Chunk Score:  anyof_schema_1_validator=0.5 anyof_schema_2_validator=None actual_instance=0.5 any_of_schemas=typing.Literal['float', 'object']
Chunk Metadata:  metadata={'title': 'A practical introduction to python 3', 'authors': 'David Amos, Dan Bader, Joanna Jablonski, Fletcher Heisler', '_indexed_at': '2024-03-21T16:06:06.926289', 'source_id': '2350732e5d1a15c9df7d70e39c948c312efbd93b858a898dc3d02893221cca4d'}
--------------------
Chunk Content:  youmay encounter problemswhen running some of the code examples.

37

https://realpython.com/quizzes/pybasics-setup/


2.4. Ubuntu Linux
Install Python
There’s a good chance that your Ubuntu distribution already hasPython installed, but it probably won’t be the latest version, and itmay be Python 2 instead of Python 3.
To find out what version(s) you have, open a terminal window and trythe following commands:
$ python --version

$ python3 --version

One or more of the
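The `Chunk Score` line above prints the client's anyOf wrapper rather than a number; the value itself is stored in the wrapper's `actual_instance` attribute. The hypothetical helpers below unwrap the score and produce a compact one-line summary per chunk, collapsing whitespace in the content preview. They assume each chunk has `id`, `score`, and `content` attributes, as in the loop above, and keep the order in which the service returned the chunks.

```python
from typing import Any, Iterable


def score_value(score: Any) -> Any:
    """Unwrap an anyOf wrapper (which stores its value in `actual_instance`)."""
    return getattr(score, "actual_instance", score)


def summarize_chunks(chunks: Iterable[Any], max_chars: int = 80) -> list[str]:
    """Return one 'score | short id | preview' line per chunk, in the given order."""
    lines = []
    for chunk in chunks:
        # Collapse newlines and repeated spaces so the preview fits on one line
        preview = " ".join(chunk.content.split())[:max_chars]
        lines.append(f"{score_value(chunk.score)} | {chunk.id[:12]} | {preview}")
    return lines
```

For example, `for line in summarize_chunks(search_response.chunks): print(line)` would print one line per retrieved chunk.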

### Delete all collections

In [6]:
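# Re-fetch the list so that collections created after the listing cell are included
collections = retriever.get_collections().collections
for collection in collections: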
    try:
        retriever.delete_collection(collection.id)
        print(f"Collection '{collection.name}' deleted successfully.")
    except Exception as e:
        print(f"Failed to delete collection '{collection.name}': {str(e)}")

Collection 'testCollection' deleted successfully.
Collection 'testCollection' deleted successfully.
Collection 'testCollection' deleted successfully.
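To remove only the collections this tutorial created rather than every collection, a hypothetical helper can re-fetch the list and filter by name. This sketch uses the same `get_collections()` and `delete_collection()` calls as the notebook and assumes each collection exposes `id` and `name`.

```python
from typing import Any


def delete_collections_named(client: Any, name: str) -> list[str]:
    """Delete every collection called `name` and return the deleted IDs.

    The list is re-fetched here so that it reflects the server's current state.
    """
    deleted = []
    for collection in client.get_collections().collections:
        if collection.name != name:
            continue  # leave collections with other names untouched
        try:
            client.delete_collection(collection.id)
            deleted.append(collection.id)
        except Exception as e:
            print(f"Failed to delete collection '{collection.name}': {e}")
    return deleted
```

With the client above, `delete_collections_named(retriever, "testCollection")` would clean up after this notebook.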
