# Get Started With NVIDIA Biomedical AI-Q Research Agent Blueprint Using NVIDIA API

This notebook helps you get started with the [Biomedical AI-Q Research Agent](https://build.nvidia.com/nvidia/biomedical-aiq-research-agent).


## Prerequisites 

- This blueprint depends on the [NVIDIA RAG Blueprint](https://github.com/NVIDIA-AI-Blueprints/rag). This deployment guide starts by deploying RAG using docker compose, but you should refer to the [RAG Blueprint documentation](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/docs/quickstart.md) for full details. 

- Docker Compose

- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)

- (Optional) This blueprint supports Tavily web search to supplement data from RAG. A Tavily API key can be supplied to enable this function. 

- [NVIDIA API Key](https://build.nvidia.com): This notebook uses NVIDIA NIM microservices hosted on build.nvidia.com. To deploy the NIM microservices locally instead, follow the [getting started deployment guide](https://github.com/NVIDIA-AI-Blueprints/biomedical-aiq-research-agent/blob/main/docs/get-started/get-started-docker-compose.md).

### Hardware Requirements

This notebook uses NVIDIA NIM microservices hosted on build.nvidia.com for the majority of the services that require GPUs. 

Running this notebook requires:
- 1x L40S GPU or comparable
- 72 GB of disk space if deploying the RAG ingestion NIMs locally as recommended, or 37 GB of disk space if using only public hosted NIM services
- 16 CPUs
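
The CPU and disk requirements above can be checked before deploying. The following is a minimal sketch, not part of the blueprint: `check_host` is a hypothetical helper, the thresholds are copied from the list above, and it assumes that the path you pass sits on the same filesystem Docker uses for images and volumes. It does not check the GPU.

```python
import os
import shutil

# Thresholds copied from the hardware requirements listed above.
REQUIRED_CPUS = 16
REQUIRED_DISK_GB_LOCAL = 72   # RAG ingestion NIMs deployed locally
REQUIRED_DISK_GB_HOSTED = 37  # only public hosted NIM services


def check_host(path=".", deploy_locally=True):
    """Return a list of warnings; an empty list means CPU and disk look adequate.

    `path` is an assumption: point it at the filesystem Docker stores data on.
    """
    warnings = []
    cpus = os.cpu_count() or 0
    if cpus < REQUIRED_CPUS:
        warnings.append(f"Found {cpus} CPUs, {REQUIRED_CPUS} required")
    needed_gb = REQUIRED_DISK_GB_LOCAL if deploy_locally else REQUIRED_DISK_GB_HOSTED
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < needed_gb:
        warnings.append(f"Only {free_gb:.0f} GB free at {path!r}, {needed_gb} GB required")
    return warnings


for warning in check_host():
    print("WARNING:", warning)
```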

### NVIDIA NIM Microservices

Access NVIDIA NIM microservices including:   
- NemoRetriever  
  - Page Elements  
  - Table Structure  
  - Graphic Elements  
  - Paddle OCR   
- Llama Instruct 3.3 70B  
- Llama Nemotron 3.3 Super 49B  
- BioNeMo MolMIM
- BioNeMo DiffDock


## Step 1: Deploy the RAG Blueprint

See the NVIDIA RAG blueprint documentation for full details. This notebook will use docker compose to deploy the RAG blueprint with *hosted NVIDIA NIM microservices*. Start by setting the appropriate environment variables.

In [None]:
#To pull images required by the blueprint from NGC, you must first authenticate Docker with nvcr.io.
import subprocess
import os

# ADD YOUR API KEY
NVIDIA_API_KEY = "nvapi-your-api-key"
os.environ['NVIDIA_API_KEY'] = NVIDIA_API_KEY
os.environ['NGC_API_KEY'] = NVIDIA_API_KEY

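# Pass the key on stdin instead of interpolating it into a shell command line.
result = subprocess.run(
    ["docker", "login", "nvcr.io", "-u", "$oauthtoken", "--password-stdin"],
    input=NVIDIA_API_KEY,
    capture_output=True,
    text=True,
)
# Show the error output if the login produced no standard output.
print(result.stdout or result.stderr)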
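
If you would rather not paste the key into the notebook, one option is to read it from the environment or prompt for it. This is a sketch only: `load_api_key` is a hypothetical helper, and the `nvapi-` prefix check is an assumption based on the placeholder value used above.

```python
import os
from getpass import getpass


def load_api_key(env_var="NVIDIA_API_KEY", prompt=getpass):
    """Read the key from the environment, falling back to a hidden prompt."""
    key = (os.environ.get(env_var) or prompt(f"Enter {env_var}: ")).strip()
    # Assumption: real keys start with "nvapi-"; also reject the placeholder text.
    if not key.startswith("nvapi-") or "your-api-key" in key:
        raise ValueError(f"{env_var} does not look like a valid NVIDIA API key")
    return key


# Example usage (prompts if the variable is not already exported):
# NVIDIA_API_KEY = load_api_key()
# os.environ["NVIDIA_API_KEY"] = NVIDIA_API_KEY
# os.environ["NGC_API_KEY"] = NVIDIA_API_KEY
```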

Next, clone the NVIDIA RAG blueprint.

In [None]:
#Clone the github repository
!git clone https://github.com/NVIDIA-AI-Blueprints/rag.git

#### Locally Hosted vs Public Endpoint for RAG Ingestion NIMs
This notebook recommends deploying the RAG Blueprint ingestion NIMs locally; leaving `deploy_rag_ingestion_locally = True` in the cell below does so. The RAG ingestion NIMs are the only NIMs deployed locally by default. All other NIMs in the RAG Blueprint and the Biomedical AI-Q Research Agent Developer Blueprint default to the public hosted NVIDIA AI Endpoints in this notebook. If you do not want to deploy the RAG ingestion NIMs locally, set `deploy_rag_ingestion_locally = False` to use the public NVIDIA AI Endpoints with your valid NVIDIA_API_KEY instead.

If you're in the Brev launchable, the instance type should be compatible with the default local deployment, which is recommended for the Brev experience. In this case, you do not need to change anything below.

In [None]:
# change this parameter to False if you don't want to locally deploy the ingestion NIMs
deploy_rag_ingestion_locally = True

Add the necessary environment variables so that the RAG deployment uses either the local deployment or the hosted NVIDIA AI Endpoint services.

In [None]:
if deploy_rag_ingestion_locally:
    # Set the endpoint urls of the ingestion NIMs to local
    os.environ["APP_LLM_MODELNAME"] = "nvidia/llama-3.3-nemotron-super-49b-v1"
    os.environ["APP_LLM_SERVERURL"] = ""
    os.environ["APP_RANKING_SERVERURL"] = ""
    os.environ["APP_EMBEDDINGS_MODELNAME"] = "nvidia/llama-3.2-nv-embedqa-1b-v2"
    os.environ["APP_EMBEDDINGS_SERVERURL"] = "nemoretriever-embedding-ms:8000"
    os.environ["EMBEDDING_NIM_ENDPOINT"] = "http://nemoretriever-embedding-ms:8000/v1"
    os.environ["PADDLE_INFER_PROTOCOL"] = "grpc"
    os.environ["PADDLE_GRPC_ENDPOINT"] = "paddle:8001"
    os.environ["YOLOX_INFER_PROTOCOL"] = "grpc"
    os.environ["YOLOX_GRPC_ENDPOINT"] = "page-elements:8001"
    os.environ["YOLOX_GRAPHIC_ELEMENTS_GRPC_ENDPOINT"] = "graphic-elements:8001"
    os.environ["YOLOX_GRAPHIC_ELEMENTS_INFER_PROTOCOL"] = "grpc"
    os.environ["YOLOX_TABLE_STRUCTURE_GRPC_ENDPOINT"] = "table-structure:8001"
    os.environ["YOLOX_TABLE_STRUCTURE_INFER_PROTOCOL"] = "grpc"
    os.environ["ENABLE_RERANKER"] = "false" #Disable re-ranking
    os.environ["ENABLE_NV_INGEST_BATCH_MODE"] = "true"
    os.environ["USERID"] = str(os.getuid())
    # deploy the ingestion NIMs locally
    # PLEASE NOTE this can take up to 15 minutes
    try:
        result = subprocess.run(
                ["docker", "compose", "-f", "rag/deploy/compose/nims.yaml", "--profile", "ingest", "up", "-d"],
                env=os.environ,
                check=True,
                capture_output=True,
                text=True
        )
        print(result.stdout[-1000:], flush=True)
    except subprocess.CalledProcessError as e:
        print(e.stderr)
else:
    # If you do not want to deploy the RAG Ingestion NIMs locally, and want to use the public hosted endpoints instead:
    os.environ["APP_LLM_MODELNAME"] = "nvidia/llama-3.3-nemotron-super-49b-v1"
    os.environ["APP_EMBEDDINGS_MODELNAME"] = "nvidia/llama-3.2-nv-embedqa-1b-v2"
    os.environ["APP_RANKING_MODELNAME"] = "nvidia/llama-3.2-nv-rerankqa-1b-v2"
    os.environ["APP_EMBEDDINGS_SERVERURL"] = ""
    os.environ["APP_LLM_SERVERURL"] = ""
    os.environ["APP_RANKING_SERVERURL"] = ""
    os.environ["EMBEDDING_NIM_ENDPOINT"] = "https://integrate.api.nvidia.com/v1"
    os.environ["PADDLE_HTTP_ENDPOINT"] = "https://ai.api.nvidia.com/v1/cv/baidu/paddleocr"
    os.environ["PADDLE_INFER_PROTOCOL"] = "http"
    os.environ["YOLOX_HTTP_ENDPOINT"] = "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-page-elements-v2"
    os.environ["YOLOX_INFER_PROTOCOL"] = "http"
    os.environ["YOLOX_GRAPHIC_ELEMENTS_HTTP_ENDPOINT"] = "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-graphic-elements-v1"
    os.environ["YOLOX_GRAPHIC_ELEMENTS_INFER_PROTOCOL"] = "http"
    os.environ["YOLOX_TABLE_STRUCTURE_HTTP_ENDPOINT"] = "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-table-structure-v1"
    os.environ["YOLOX_TABLE_STRUCTURE_INFER_PROTOCOL"] = "http"
    os.environ["ENABLE_RERANKER"] = "false"
    os.environ["ENABLE_NV_INGEST_BATCH_MODE"] = "true"
    os.environ["USERID"] = str(os.getuid())
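
Each ingestion service above is configured by a protocol variable plus a matching endpoint variable (for example, `PADDLE_INFER_PROTOCOL = "grpc"` pairs with `PADDLE_GRPC_ENDPOINT`). The sketch below checks that pairing before you deploy. `find_endpoint_mismatches` is a hypothetical helper, and the naming pattern is inferred from the variables set in the cell above rather than taken from the RAG blueprint documentation.

```python
import os

# Variable-name prefixes of the ingestion services configured in the cell above.
SERVICE_PREFIXES = [
    "PADDLE",
    "YOLOX",
    "YOLOX_GRAPHIC_ELEMENTS",
    "YOLOX_TABLE_STRUCTURE",
]


def find_endpoint_mismatches(env):
    """Return messages for services whose protocol has no matching endpoint set."""
    problems = []
    for prefix in SERVICE_PREFIXES:
        protocol = env.get(f"{prefix}_INFER_PROTOCOL")
        if protocol not in ("grpc", "http"):
            problems.append(f"{prefix}_INFER_PROTOCOL is {protocol!r}")
            continue
        # e.g. PADDLE + grpc -> PADDLE_GRPC_ENDPOINT
        endpoint_var = f"{prefix}_{protocol.upper()}_ENDPOINT"
        if not env.get(endpoint_var):
            problems.append(f"{endpoint_var} is not set")
    return problems


for problem in find_endpoint_mismatches(os.environ):
    print("WARNING:", problem)
```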

Deploy the NVIDIA RAG blueprint.

In [None]:
# Start the vector db containers from the repo root. 
try:
    result = subprocess.run(
            ["docker", "compose", "-f", "rag/deploy/compose/vectordb.yaml", "up", "-d"],
            env=os.environ,
            check=True,
            capture_output=True,
            text=True
    )
    print(result.stdout[-1000:], flush=True)
except subprocess.CalledProcessError as e:
    print(e.stderr)

In [None]:
# Start the ingestion containers from the repo root. This pulls the prebuilt containers from NGC and deploys them on your system.
try:
    result = subprocess.run(
            ["docker", "compose", "-f", "rag/deploy/compose/docker-compose-ingestor-server.yaml", "up", "-d"],
            env=os.environ,
            check=True,
            capture_output=True,
            text=True
    )
    print(result.stdout[-1000:], flush=True)
except subprocess.CalledProcessError as e:
    print(e.stderr)

In [None]:
# Start the RAG containers from the repo root. This pulls the prebuilt containers from NGC and deploys them on your system.
try:
    result = subprocess.run(
            ["docker", "compose", "-f", "rag/deploy/compose/docker-compose-rag-server.yaml", "up", "-d"],
            env=os.environ,
            check=True,
            capture_output=True,
            text=True
    )
    print(result.stdout[-1000:], flush=True)
except subprocess.CalledProcessError as e:
    print(e.stderr)
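
The deployment cells above repeat the same `subprocess.run` pattern. If you prefer, that pattern can be wrapped in a small helper. This is a sketch, not part of either blueprint: `compose_command` and `run_compose` are hypothetical names, and the helper prints the tail of both output streams on the assumption that Docker Compose may write its progress messages to stderr.

```python
import os
import subprocess


def compose_command(compose_file, action="up", profile=None):
    """Build the docker compose argument list used throughout this notebook."""
    cmd = ["docker", "compose", "-f", compose_file]
    if profile:
        cmd += ["--profile", profile]
    cmd.append(action)
    if action == "up":
        cmd.append("-d")  # run containers in the background
    return cmd


def run_compose(compose_file, action="up", profile=None):
    """Run the command; return True on success and print the tail of its output."""
    try:
        result = subprocess.run(
            compose_command(compose_file, action, profile),
            env=os.environ,
            check=True,
            capture_output=True,
            text=True,
        )
    except subprocess.CalledProcessError as e:
        print(e.stderr)
        return False
    print((result.stdout + result.stderr)[-1000:], flush=True)
    return True


# Example usage, equivalent to the vector database cell above:
# run_compose("rag/deploy/compose/vectordb.yaml")
```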

Confirm all of the containers are running successfully:

In [None]:
# Confirm that all of the containers listed below are running.
result = subprocess.run(
    ["docker", "ps", "--format", "table {{.ID}}\t{{.Names}}\t{{.Status}}"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)

print(result.stdout)


If you deployed the ingestion NIMs locally, the output should look like this:

| Container ID | Name | Status |
|-------------|------|--------|
| 4fc3494e0646 |  rag-playground                 |  Up 21 seconds |
| 15177b220b11 |  rag-server                     |  Up 22 seconds |
| 0941f2dc6039 |  compose-nv-ingest-ms-runtime-1 |  Up 28 seconds (healthy) |
| a99de643c140 |  ingestor-server                |  Up 28 seconds |
| 4b29311ad214 |  compose-redis-1                |  Up 28 seconds |
| 7651a7c41bd0 |  milvus-standalone              |  Up 37 seconds |
| 32056205632b |  milvus-minio                   |  Up 37 seconds (healthy) |
| b1bc296b158a |  milvus-etcd                    |  Up 38 seconds (healthy) |
| f4d9c05425a3 |  compose-page-elements-1        |  Up 7 minutes|
| f83d24d482e9 |  compose-paddle-1               |  Up 7 minutes|
| 8c911e965fca |  compose-table-structure-1      |  Up 7 minutes|
| c1a5fac065a7 |  compose-graphic-elements-1     |  Up 7 minutes|
| acdeb9c53261 |  nemoretriever-embedding-ms     |  Up 7 minutes (healthy)|

Otherwise, the output should look like this:

| Container ID | Name | Status |
|-------------|------|--------|
| 4fc3494e0646 |  rag-playground                 |  Up 21 seconds |
| 15177b220b11 |  rag-server                     |  Up 22 seconds |
| 0941f2dc6039 |  compose-nv-ingest-ms-runtime-1 |  Up 28 seconds (healthy) |
| a99de643c140 |  ingestor-server                |  Up 28 seconds |
| 4b29311ad214 |  compose-redis-1                |  Up 28 seconds |
| 7651a7c41bd0 |  milvus-standalone              |  Up 37 seconds |
| 32056205632b |  milvus-minio                   |  Up 37 seconds (healthy) |
| b1bc296b158a |  milvus-etcd                    |  Up 38 seconds (healthy) |

At this point, you should be able to access the NVIDIA RAG frontend web application by visiting `http://<your-server-ip>:8090`.

<div class="alert alert-block alert-success">
    <b>Tip:</b> If you are running this notebook as a Brev Launchable or on Brev, you will need to make sure the port for the RAG playground is accessible. On the settings page for the machine from which you launched the notebook, navigate to the "Access" tab (among the three tabs "Container Content Access") and scroll down to "Using Ports". If "8090" is not already listed, enter "8090", click "Expose Port", and then click "I accept". You should now see the link in the format "your-server-ip:8090" under the section "Using Ports".
</div>

To test the RAG deployment:
- Navigate to the RAG frontend web application exposed on port 8090.
- On the left sidebar, click "New Collection".
- Select a PDF to upload. We recommend starting with the file `notebooks/simple.pdf` included in the blueprint repository.
- After the collection is created and the file is uploaded, select the collection by clicking on it in the left sidebar. 
- Ask a question in the chat like "What is the title?". Confirm that a response is given.

*If any of these steps fail, please consult the NVIDIA RAG blueprint [troubleshooting guide](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/docs/troubleshooting.md) and the [Biomedical AI-Q Research Agent troubleshooting guide](https://github.com/NVIDIA-AI-Blueprints/biomedical-aiq-research-agent/blob/main/docs/troubleshooting.md) prior to proceeding further*. For problems creating a collection or uploading a file, you can view the logs of the ingestor-server by running `docker logs ingestor-server`. For problems asking a question, you can view the logs of the rag-server by running `docker logs rag-server`.


## Step 2: Deploy the Biomedical AI-Q Research Agent 

This NVIDIA blueprint allows you to create a Biomedical AI-Q Research Agent using NVIDIA NeMo Agent Toolkit, powered by NVIDIA NIM microservices.

The research agent allows you to:
- Provide a desired report structure and topic
- Provide human-in-the-loop feedback on a research plan
- Perform parallel research of both unstructured on-premises data and web sources
- Perform Virtual Screening when researching a condition or disease with the intent of discovering novel small-molecule therapies
- Update the draft report using Q&A 
- Q&A with the final report for further understanding
- View sources from both RAG and web search

The blueprint consists of a frontend web interface and a backend API service. To deploy Biomedical AI-Q Research Agent, follow the steps below in this section.

1. Clone the Git repository biomedical-aiq-research-agent

In [None]:

!git clone https://github.com/NVIDIA-AI-Blueprints/biomedical-aiq-research-agent.git
%cd biomedical-aiq-research-agent/

2. Configure the Virtual Screening NIMs to be utilized in the Biomedical AI-Q Research Agent Blueprint

You can choose to use the public NVIDIA AI Endpoints (option 1) for the BioNeMo NIMs needed for Virtual Screening, or deploy them locally (option 2). This notebook defaults to the public NVIDIA AI Endpoints (option 1). For steps to deploy the BioNeMo NIMs locally (option 2), see the section `Deploy the BioNeMo NIMs for Virtual Screening in the Biomedical AI-Q Research Agent` in [docs/get-started/get-started-docker-compose.md](https://github.com/NVIDIA-AI-Blueprints/biomedical-aiq-research-agent/blob/main/docs/get-started/get-started-docker-compose.md#deploy-the-bionemo-nims-for-virtual-screening-in-the-biomedical-ai-q-research-agent).

Using the public NVIDIA AI Endpoints for the BioNeMo NIMs requires an NVIDIA_API_KEY that has access to [MolMIM](https://build.nvidia.com/nvidia/molmim-generate) and [DiffDock](https://build.nvidia.com/mit/diffdock).

Also set the MolMIM and DiffDock URLs to the public endpoints.

In [None]:
# enter your own NVIDIA API Key here
NVIDIA_API_KEY = "nvapi-your-api-key-here"
os.environ['NVIDIA_API_KEY'] = NVIDIA_API_KEY

os.environ["MOLMIM_ENDPOINT_URL"] = "https://health.api.nvidia.com/v1/biology/nvidia/molmim/generate" # public NVIDIA AI Endpoint 
os.environ["DIFFDOCK_ENDPOINT_URL"] = "https://health.api.nvidia.com/v1/biology/mit/diffdock" # public NVIDIA AI Endpoint 

3. Set the necessary environment variables for the service to use hosted NVIDIA NIM microservices.

In [None]:
# set to true if you want to use the publicly hosted NIMs on the NVIDIA AI Endpoints for the LLM NIMs
os.environ["AIRA_HOSTED_NIMS"] = "true"

# optional, if you want to use web search. Please visit https://www.tavily.com/ for API key and make sure you have enough credits.
os.environ["TAVILY_API_KEY"] = "tavily-api-key"

4. Deploy the Biomedical AI-Q Research Agent

In [None]:
#To deploy the Biomedical AI-Q Research Agent run:
try:
    result = subprocess.run(
            ["docker", "compose", "-f", "deploy/compose/docker-compose.yaml", 
            "--profile", "aira", "up", "-d", "--build"],
            env=os.environ,
            check=True,
            capture_output=True,
            text=True
    )
    print(result.stdout[-1000:], flush=True)
except subprocess.CalledProcessError as e:
    print("Failed to bring up aira profile: " + e.stderr)

Confirm the services have started successfully: 

In [None]:
result = subprocess.run(
    ["docker", "ps", "--format", "table {{.ID}}\t{{.Names}}\t{{.Status}}"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)

print(result.stdout)

The list should include these three additional containers:
```bash
CONTAINER ID   NAMES                            STATUS
a1fbeb65efad   aira-nginx                       Up 51 seconds
75974fdadc1c   aira-backend                     Up 51 seconds
abdab7c24989   aira-frontend                    Up 51 seconds
...
```

You can access the Biomedical AI-Q Research Agent frontend web application at `http://<your-server-ip>:3001`. The backend API documentation is available at `http://<your-server-ip>:8051/docs`. **If any of the services failed to start, refer to the troubleshooting guide in the docs folder**.
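
The services may take a short while to start answering requests after the containers come up. A minimal polling sketch follows; `wait_for_http` is a hypothetical helper, the timeout values are arbitrary, and the `localhost` URLs in the usage comment assume the notebook kernel runs on the same machine as the containers.

```python
import http.client
import time
import urllib.request


def wait_for_http(url, timeout_s=120, interval_s=5):
    """Poll `url` until it answers without an HTTP error, or until the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # urlopen raises for connection failures and for 4xx/5xx responses.
            with urllib.request.urlopen(url, timeout=5):
                return True
        except (OSError, http.client.HTTPException):
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example usage (replace localhost with your server IP if needed):
# print("frontend ready:", wait_for_http("http://localhost:3001"))
# print("backend docs ready:", wait_for_http("http://localhost:8051/docs"))
```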

<div class="alert alert-block alert-success">
    <b>Tip:</b> If you are running this notebook as a Brev Launchable or on Brev, you will need to make sure the port for the Biomedical AI-Q Research Agent demo web frontend is accessible. On the settings page for the machine from which you launched the notebook, navigate to the "Access" tab (among the three tabs "Container Content Access") and scroll down to "Using Ports". If "3001" is not already listed, enter "3001", click "Expose Port", and then click "I accept". You should now see the link in the format "your-server-ip:3001" under the section "Using Ports". To view the backend REST APIs, repeat these steps for port "8051".
</div>

## Step 3: Upload Default Collections
The demo web application includes two default report prompts. To support these prompts, the blueprint includes two example datasets. In this section we will upload the default datasets using a bulk upload helper. You can also upload your own files through the web interface.

Start by running the Docker upload utility. **Note: this command can take upwards of 30 minutes to execute.**

In [None]:
try:
    result = subprocess.run(
        ["docker", "run", "-e", "RAG_INGEST_URL=http://ingestor-server:8082/v1",
        "-e", "PYTHONUNBUFFERED=1",
        "-v", "/tmp:/tmp-data",
        "--network", "nvidia-rag",
        "nvcr.io/nvidia/blueprint/aira-load-files:v1.0.0"
        ],
        env=os.environ,
        check=True,
        capture_output=True,
        text=True
    )
    print(result.stdout[-1000:], flush=True)
except subprocess.CalledProcessError as e:
    print("Failed to load datasets: " + e.stderr)
    

At the end of the command, you should see a list of documents successfully uploaded for both the Financial_Dataset and the Biomedical_Dataset. You can also confirm the datasets were uploaded by visiting the web frontend and clicking on "Collections" in the left sidebar.

If any of the file upload steps failed, consult the [NVIDIA RAG blueprint troubleshooting guide](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/docs/troubleshooting.md) and the [Biomedical AI-Q Research Agent troubleshooting guide](https://github.com/NVIDIA-AI-Blueprints/biomedical-aiq-research-agent/blob/main/docs/troubleshooting.md) prior to proceeding further. You can check the logs of the ingestor-server by running `docker logs ingestor-server` and the ingestion process by running `docker logs compose-nv-ingest-ms-runtime-1`.

## Step 4: Use the Biomedical AI-Q Research Agent

Follow the instructions in the [demo walkthrough](https://github.com/NVIDIA-AI-Blueprints/biomedical-aiq-research-agent/blob/main/demo/README.md) to explore the Biomedical AI-Q Research Agent.

## Step 5: Stop Services

To stop all services, run the following commands:

1. Stop the Biomedical AI-Q Research Agent services:
```bash
docker compose -f deploy/compose/docker-compose.yaml --profile aira down
```

2. Stop the RAG services:

First navigate to the rag repository's root. Ensure you still have the variable `NGC_API_KEY` exported.

```bash
docker compose -f deploy/compose/docker-compose-rag-server.yaml down
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml down
docker compose -f deploy/compose/vectordb.yaml down
docker compose -f deploy/compose/nims.yaml down
```
3. Remove the cache directories:
```bash
sudo rm -rf (path-to-rag)/deploy/compose/volumes
```

To verify all services have been stopped, run:
```bash
docker ps
```
