# Get Started With AI-Q NVIDIA Research Assistant Blueprint Using NVIDIA API

This notebook helps you get started with the [AI-Q Research Assistant](https://build.nvidia.com/nvidia/aiq).


## Prerequisites 

- This blueprint depends on the [NVIDIA RAG Blueprint](https://github.com/NVIDIA-AI-Blueprints/rag). This deployment guide starts by deploying RAG using docker compose, but you should refer to the [RAG Blueprint documentation](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/docs/quickstart.md) for full details. 

- Docker Compose

- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)

- (Optional) This blueprint supports Tavily web search to supplement data from RAG. A Tavily API key can be supplied to enable this function. 

- [NVIDIA API Key](https://build.nvidia.com). This notebook uses NVIDIA NIM microservices hosted on build.nvidia.com. To deploy the NIM microservices locally instead, follow the [getting started deployment guide](../docs/get-started/get-started-docker-compose.md).
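
Rather than pasting the API key into a notebook cell, the key can be read from the environment and prompted for only when it is missing. The following is a minimal sketch under stated assumptions: `resolve_api_key` is a hypothetical helper that is not part of the blueprint, and the `nvapi-` prefix check is inferred from the placeholder key used later in this notebook.

```python
import os
from getpass import getpass


def resolve_api_key(env=None, prompt=getpass):
    """Return the NVIDIA API key from the environment, prompting only if it is unset."""
    env = os.environ if env is None else env
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        # getpass hides the typed value so the key is not echoed into the notebook output.
        key = prompt("NVIDIA API key: ").strip()
    # ASSUMPTION: keys start with "nvapi-", as the notebook's placeholder suggests.
    if not key.startswith("nvapi-"):
        raise ValueError("expected a key beginning with 'nvapi-'")
    return key
```

Calling `NVIDIA_API_KEY = resolve_api_key()` in place of a hard-coded string keeps the key out of the saved notebook.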

### Hardware Requirements

This notebook uses NVIDIA NIM microservices hosted on build.nvidia.com for the majority of the services that require GPUs. 

Running this notebook requires:
- 1x L40S GPU or comparable
- 50 GB of disk space
- 16 CPUs
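
The CPU and disk requirements above can be checked from Python before deploying. This is a rough sketch, not an official validation tool: `check_host` is a hypothetical helper, the thresholds are copied from the list above, and checking the current directory is an assumption because Docker may store images on a different volume. It does not check the GPU.

```python
import os
import shutil

MIN_CPUS = 16      # from the requirements list
MIN_FREE_GB = 50   # from the requirements list


def check_host(path=".", cpus=None, free_bytes=None):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    cpus = os.cpu_count() if cpus is None else cpus
    free_bytes = shutil.disk_usage(path).free if free_bytes is None else free_bytes
    problems = []
    if (cpus or 0) < MIN_CPUS:
        problems.append(f"{cpus} CPUs found, {MIN_CPUS} required")
    free_gb = free_bytes / 1e9
    if free_gb < MIN_FREE_GB:
        problems.append(f"{free_gb:.0f} GB free, {MIN_FREE_GB} GB required")
    return problems


print(check_host() or "CPU and disk checks passed")
```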

### NVIDIA NIM Microservices

This notebook uses the following hosted NVIDIA NIM microservices:
- NeMo Retriever
  - Page Elements
  - Table Structure
  - Graphic Elements
  - Paddle OCR
- Llama 3.3 70B Instruct
- Llama 3.3 Nemotron Super 49B


## Step 1: Deploy the RAG Blueprint

See the NVIDIA RAG blueprint documentation for full details. This notebook uses Docker Compose to deploy the RAG blueprint with *hosted NVIDIA NIM microservices*. Start by setting the appropriate environment variables.

In [65]:
# To pull the images required by the blueprint from NGC, first authenticate Docker with nvcr.io.
import subprocess
import os

NVIDIA_API_KEY = "nvapi-your-api-key"  # replace with your own key
os.environ['NVIDIA_API_KEY'] = NVIDIA_API_KEY
os.environ['NGC_API_KEY'] = NVIDIA_API_KEY

# Pass the key on stdin instead of interpolating it into a shell command.
result = subprocess.run(
    ["docker", "login", "nvcr.io", "-u", "$oauthtoken", "--password-stdin"],
    input=NVIDIA_API_KEY, capture_output=True, text=True,
)
print(result.stdout or result.stderr)  # surface a failed login instead of ignoring it

Next, clone the NVIDIA RAG blueprint.

In [None]:
#Clone the github repository
!git clone https://github.com/NVIDIA-AI-Blueprints/rag.git

Add the necessary environment variables so that the RAG deployment will use hosted NVIDIA NIM microservices.

In [5]:
#Set the endpoint urls of the NIMs
os.environ["APP_LLM_MODELNAME"] = "nvidia/llama-3.3-nemotron-super-49b-v1"
os.environ["APP_EMBEDDINGS_MODELNAME"] = "nvidia/llama-3.2-nv-embedqa-1b-v2"
os.environ["APP_RANKING_MODELNAME"] = "nvidia/llama-3.2-nv-rerankqa-1b-v2"
os.environ["APP_EMBEDDINGS_SERVERURL"] = ""
os.environ["APP_LLM_SERVERURL"] = ""
os.environ["APP_RANKING_SERVERURL"] = ""
os.environ["EMBEDDING_NIM_ENDPOINT"] = "https://integrate.api.nvidia.com/v1"
os.environ["PADDLE_HTTP_ENDPOINT"] = "https://ai.api.nvidia.com/v1/cv/baidu/paddleocr"
os.environ["PADDLE_INFER_PROTOCOL"] = "http"
os.environ["YOLOX_HTTP_ENDPOINT"] = "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-page-elements-v2"
os.environ["YOLOX_INFER_PROTOCOL"] = "http"
os.environ["YOLOX_GRAPHIC_ELEMENTS_HTTP_ENDPOINT"] = "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-graphic-elements-v1"
os.environ["YOLOX_GRAPHIC_ELEMENTS_INFER_PROTOCOL"] = "http"
os.environ["YOLOX_TABLE_STRUCTURE_HTTP_ENDPOINT"] = "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-table-structure-v1"
os.environ["YOLOX_TABLE_STRUCTURE_INFER_PROTOCOL"] = "http"

#Disable re-ranking
os.environ["ENABLE_RERANKER"] = "false"

Deploy the NVIDIA RAG blueprint.

In [None]:
# Start the vector database containers. Run this from the directory that contains the cloned `rag` repository.
!docker compose -f rag/deploy/compose/vectordb.yaml up -d

In [None]:
# Start the ingestion containers. This pulls the prebuilt containers from NGC and deploys them on your system.
!docker compose -f rag/deploy/compose/docker-compose-ingestor-server.yaml up -d

In [None]:
# Start the RAG containers. This pulls the prebuilt containers from NGC and deploys them on your system.
!docker compose -f rag/deploy/compose/docker-compose-rag-server.yaml up -d

Confirm all of the containers are running successfully:

In [None]:
#Confirm all the below mentioned containers are running.
import subprocess

result = subprocess.run(
    ["docker", "ps", "--format", "table {{.ID}}\t{{.Names}}\t{{.Status}}"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)

print(result.stdout)


The output should look similar to this:

| Container ID | Name | Status |
|-------------|------|--------|
| bb4f15c42376 | rag-server | Up 2 hours |
| 6eb5373d0318 | compose-nv-ingest-ms-runtime-1 | Up 2 hours (healthy) |
| 8e53f676486e | ingestor-server | Up 2 hours |
| 355f3317a73a | milvus-standalone | Up 2 hours |
| b6620d59d4d3 | milvus-minio | Up 2 hours (healthy) |
| 0c266aaa1fb1 | milvus-etcd | Up 2 hours (healthy) |
| af09adfad86b | rag-playground | Up 2 hours |
| d4b7399ab07e | compose-redis-1 | Up 2 hours |

At this point, you should be able to access the NVIDIA RAG frontend web application by visiting `http://localhost:8090`.
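
Before opening a browser, a quick TCP probe can confirm that something is listening on the frontend port. This is a minimal sketch: `port_is_open` is a hypothetical helper rather than part of the blueprint, and an open port only shows that a process accepts connections, not that the application is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


print(port_is_open("localhost", 8090))
```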

<div class="alert alert-block alert-success">
    <b>Tip:</b> If you are running this notebook on Brev, you will need to make the port for the RAG playground accessible. On the settings page for your machine, navigate to "Using Ports", enter "8090", click "Expose Port", and then click "I accept".
</div>

To test the RAG deployment:
- Navigate to the RAG frontend web application exposed on port 8090.
- On the left sidebar, click "New Collection".
- Select a PDF to upload. We recommend starting with the file `notebooks/simple.pdf` included in the blueprint repository.
- After the collection is created and the file is uploaded, select the collection by clicking on it in the left sidebar. 
- Ask a question in the chat like "What is the title?". Confirm that a response is given.

*If any of these steps fail, please consult the NVIDIA RAG blueprint [troubleshooting guide](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/docs/troubleshooting.md) and the [AI-Q Research Assistant troubleshooting guide](../docs/troubleshooting.md) prior to proceeding further*. For problems creating a collection or uploading a file, you can view the logs of the ingestor-server by running `docker logs ingestor-server`. For problems asking a question, you can view the logs of the rag-server by running `docker logs rag-server`.
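
Container logs can be long, so it may help to view only the most recent lines and pick out likely failures. The sketch below assumes nothing about the log format: `log_tail_command`, `show_recent_logs`, and `error_lines` are hypothetical helpers, and the keyword list is a guess at what a failure line might contain.

```python
import subprocess


def log_tail_command(container, lines=50):
    """Build the `docker logs` command that shows the last `lines` lines of a container's log."""
    return ["docker", "logs", "--tail", str(lines), container]


def error_lines(log_text, keywords=("error", "exception", "traceback")):
    """Return the log lines that contain any keyword, compared case-insensitively."""
    return [
        line for line in log_text.splitlines()
        if any(word in line.lower() for word in keywords)
    ]


def show_recent_logs(container, lines=50):
    """Print suspicious lines from the tail of a container's log."""
    result = subprocess.run(log_tail_command(container, lines), capture_output=True, text=True)
    # Docker forwards the container's stdout and stderr separately, so inspect both.
    for line in error_lines(result.stdout + result.stderr):
        print(line)
```

For example, `show_recent_logs("ingestor-server", lines=200)` prints only the matching lines from the last 200 lines of the ingestor log.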


## Step 2: Deploy the AI-Q NVIDIA Research Assistant

This NVIDIA blueprint allows you to create an AI-Q Research Assistant using NVIDIA AI-Q Toolkit, powered by NVIDIA NIM microservices.

The research assistant allows you to:
- Provide a desired report structure and topic
- Provide human-in-the-loop feedback on a research plan
- Perform parallel research of both unstructured on-premises data and web sources
- Update the draft report using Q&A 
- Q&A with the final report for further understanding
- View sources from both RAG and web search

The blueprint consists of a frontend web interface and a backend API service. To deploy AI-Q Research Assistant, follow the steps below in this section.

1. Clone the `aiq-research-assistant` Git repository and change into its directory.

In [None]:

!git clone https://github.com/NVIDIA-AI-Blueprints/aiq-research-assistant.git
%cd aiq-research-assistant/

2. Set the necessary environment variables for the service to use hosted NVIDIA NIM microservices.

In [None]:
os.environ["AIRA_HOSTED_NIMS"] = "true"

# optional, if you want to use web search
os.environ["TAVILY_API_KEY"] = "tavily-api-key"

3. Deploy the AI-Q Research Assistant

In [None]:
#To deploy the AI-Q Research Assistant run:
!docker compose -f deploy/compose/docker-compose.yaml --profile aira up -d

Confirm the services have started successfully: 

In [None]:
!docker ps 

In addition to the RAG services from step 1, you should now also see:  
- `aira-backend`  
- `aira-frontend`  
- `aira-nginx`  

You can access the AI-Q Research Assistant frontend web application at `http://<your-server-ip>:3001`. The backend API documentation is available at `http://<your-server-ip>:8051/docs`. **If any of the services failed to start, refer to the troubleshooting guide in the docs folder**.

<div class="alert alert-block alert-success">
    <b>Tip:</b> If you are running this notebook on Brev, you will need to make the ports for the AI-Q Research Assistant demo web frontend accessible. On the settings page for your machine, navigate to "Using Ports", enter "3001", click "Expose Port", and then click "I accept". To view the backend REST APIs, repeat these steps for port "8051".
</div>

## Step 3: Upload Default Collections
The demo web application includes two default report prompts. To support these prompts, the blueprint includes two example datasets. In this section we will upload the default datasets using a bulk upload helper. You can also upload your own files through the web interface.

Start by running the Docker upload utility. **Note: this command can take upwards of 30 minutes to execute.**

In [None]:
!docker run \
  -e RAG_INGEST_URL=http://ingestor-server:8082/v1 \
  -e PYTHONUNBUFFERED=1 \
  -v /tmp:/tmp-data \
  --network nvidia-rag \
  nvcr.io/nvidia/blueprint/aira-load-files:v1.0.0

At the end of the command, you should see a list of documents successfully uploaded for both the Financial_Dataset and the Biomedical_Dataset. You can also confirm the datasets were uploaded by visiting the web frontend and clicking on "Collections" in the left sidebar.

If any of the file upload steps failed, consult the [NVIDIA RAG blueprint troubleshooting guide](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/docs/troubleshooting.md) and the [AI-Q Research Assistant troubleshooting guide](../docs/troubleshooting.md) prior to proceeding further. You can check the logs of the ingestor-server by running `docker logs ingestor-server` and the ingestion process by running `docker logs compose-nv-ingest-ms-runtime-1`.

**Note: 429 (Too Many Requests) errors in the `compose-nv-ingest-ms-runtime-1` logs indicate a temporary failure. You can re-run the file upload command as many times as needed. Each run picks up where the previous one left off and uploads any documents that failed because of this error.**
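
Re-running the upload by hand can be replaced with a small bounded retry loop. This is a sketch under a loud assumption: it treats a zero exit code from the upload container as "every file uploaded", which this notebook does not confirm, so check the command output as well. `run_with_retries` and `upload_once` are hypothetical helpers, and the retry count and wait time are arbitrary.

```python
import subprocess
import time

# The same upload command shown in the cell above, as an argument list.
UPLOAD_CMD = [
    "docker", "run",
    "-e", "RAG_INGEST_URL=http://ingestor-server:8082/v1",
    "-e", "PYTHONUNBUFFERED=1",
    "-v", "/tmp:/tmp-data",
    "--network", "nvidia-rag",
    "nvcr.io/nvidia/blueprint/aira-load-files:v1.0.0",
]


def run_with_retries(run_once, attempts=3, wait_seconds=60.0, sleep=time.sleep):
    """Call run_once() up to `attempts` times, pausing between tries; return True on first success."""
    for attempt in range(1, attempts + 1):
        if run_once():
            return True
        if attempt < attempts:
            sleep(wait_seconds)  # give temporary 429 conditions time to clear
    return False


def upload_once():
    """Run the upload container once. ASSUMPTION: exit code 0 means all files were uploaded."""
    return subprocess.run(UPLOAD_CMD).returncode == 0
```

Calling `run_with_retries(upload_once)` on the deployment host runs the upload up to three times, waiting a minute between attempts.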

## Step 4: Use the AI-Q Research Assistant

Follow the instructions in the [demo walkthrough](../demo/README.md) to explore the AI-Q Research Assistant.

## Step 5: Stop Services

To stop all services, run the following commands:

1. Stop the AI-Q Research Assistant services. Run this from the `aiq-research-assistant` directory:
```bash
docker compose -f deploy/compose/docker-compose.yaml --profile aira down
```

2. Stop the RAG services. Run these from the directory that contains the cloned `rag` repository:
```bash
docker compose -f rag/deploy/compose/docker-compose-rag-server.yaml down
docker compose -f rag/deploy/compose/docker-compose-ingestor-server.yaml down
docker compose -f rag/deploy/compose/vectordb.yaml down
```

3. Remove the cache directories:
```bash
rm -rf rag/deploy/compose/volumes
```

To verify all services have been stopped, run:
```bash
docker ps
```
