Release v2.3.0 · NVIDIA-AI-Blueprints/rag

Version 2.3.0 (2025-10-14)

This release adds RTX6000 platform support, adds deployment through the NVIDIA NIM operator, improves vector database pluggability in the blueprint, and includes other changes.

Added

Added support for deploying the blueprint on the RTX6000 platform.
Migrated to llama-3.3-nemotron-super-49b-v1.5 as the default LLM model.
Added support for deploying the Helm chart by using the NVIDIA NIM operator. For details, refer to Deploy NVIDIA RAG Blueprint with NIM Operator.
Updated all NIMs, NVIDIA Ingest, and third-party dependencies to their latest versions.
Refactored the code to support custom third-party vector DB integration in a streamlined manner.
- Added an interactive notebook showcasing integration with library mode.
Added support for Elasticsearch vector DB as an alternative to Milvus.
Added opt-in query decomposition support.
Added opt-in nemoretriever-ocr support.
Added opt-in VLM embedding support.
Custom metadata enhancements. Refer to the custom metadata documentation for details.
- Added support for more datatypes.
- Added opt-in support to generate filters using LLM yielding better accuracy.
- Added an interactive notebook showcasing new features.
Added dependency check support for ingestor server /health API.
Added support for a configurable confidence threshold for retrieval from the API layer.
Added support to store NV-Ingest extraction results directly from the filesystem.
Logging enhancements
Added better latency data reporting for RAG server
- API level enhancements for component level latency
- Added dedicated Prometheus metric endpoint
Added independent script to showcase batch ingestion
Enabled support for GPU indexing with CPU search
- Exposed APP_VECTORSTORE_EF as a configurable parameter
Added environment variables to control LLM parameters: LLM_MAX_TOKENS, LLM_TEMPERATURE, and LLM_TOP_P.
Added notebooks for showcasing RAG evaluation using common metrics
- Notebook 1 - evaluation using RAGAS
- Notebook 2 - Recall calculation
Added unit tests and pre-commit hooks for maintaining code quality.
Optimized container sizes by removing unnecessary packages and improving security.
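
The LLM_MAX_TOKENS, LLM_TEMPERATURE, and LLM_TOP_P environment variables listed above could be consumed along the following lines. This is a minimal sketch, not the blueprint's actual implementation: the function name `read_llm_params` and the output keys `max_tokens`, `temperature`, and `top_p` are hypothetical, and no default values are assumed because the release notes do not state any.

```python
import os
from typing import Dict, Mapping, Optional, Union


def read_llm_params(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Union[int, float]]:
    """Collect LLM generation parameters from environment variables.

    Only variables that are actually set are returned, so the caller's
    own defaults stay in effect for anything left unset.
    The output key names are illustrative assumptions.
    """
    env = os.environ if env is None else env
    params: Dict[str, Union[int, float]] = {}

    # LLM_MAX_TOKENS is an integer token budget.
    if "LLM_MAX_TOKENS" in env:
        params["max_tokens"] = int(env["LLM_MAX_TOKENS"])

    # LLM_TEMPERATURE and LLM_TOP_P are floating-point sampling controls.
    if "LLM_TEMPERATURE" in env:
        params["temperature"] = float(env["LLM_TEMPERATURE"])
    if "LLM_TOP_P" in env:
        params["top_p"] = float(env["LLM_TOP_P"])

    return params
```

Passing an explicit mapping instead of relying on `os.environ` keeps the helper easy to test, for example `read_llm_params({"LLM_MAX_TOKENS": "512"})`.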

Changed

Migrated the default LLM model for reflection from mixtral-8x22b-instruct-v01 to llama-3.3-nemotron-super-49b.
Refactored rag-playground code
- Use React end to end. Next.js dependencies were deprecated.
- More developer friendly and intuitive look and feel.
- rag-playground service is renamed to rag-frontend
Refactored helm chart support
- Expanded and reorganized Helm chart configuration, enabling granular control over service components, resource settings, and observability (tracing, metrics).
- Introduced ConfigMap and service definitions to facilitate improved application deployment flexibility.
- Implemented refined service account and secret management in Helm templates.
- Added a new Helm values file for nim-operator to configure LLM model environment and component toggles.

Fixed

Fixed support for long audio file ingestion.
Fixed support to ingest images without charts/tables.
Fixed the requirement to rebuild the rag-frontend container when the LLM model name is changed.

Removed

Removed consistency level configuration support for Milvus.
Removed EMBEDDING_NIM_ENDPOINT and EMBEDDING_NIM_MODEL_NAME environment variables for nvingest.
Removed unused ENABLE_MULTITURN environment variable from rag-server.
Removed ENABLE_NEMOTRON_THINKING environment variable from rag-server.
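
When upgrading, it may help to check whether a deployment still sets any of the environment variables removed above. The following is a rough sketch of such a check: the variable names and their components come from this list, while the helper `find_removed_variables` and the `REMOVED_IN_2_3_0` table are hypothetical and not part of the blueprint.

```python
import os
from typing import Dict, List, Mapping, Optional

# Variables removed in 2.3.0, mapped to the component they applied to.
REMOVED_IN_2_3_0: Dict[str, str] = {
    "EMBEDDING_NIM_ENDPOINT": "nvingest",
    "EMBEDDING_NIM_MODEL_NAME": "nvingest",
    "ENABLE_MULTITURN": "rag-server",
    "ENABLE_NEMOTRON_THINKING": "rag-server",
}


def find_removed_variables(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the removed variable names that are still set, sorted by name."""
    env = os.environ if env is None else env
    return sorted(name for name in REMOVED_IN_2_3_0 if name in env)


if __name__ == "__main__":
    # Report stale settings found in the current process environment.
    for name in find_removed_variables():
        print(f"{name} was removed in 2.3.0 ({REMOVED_IN_2_3_0[name]})")
```

The check only inspects the mapping it is given, so it would need to run in the environment (or against the env file contents) of each affected service.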
