Release v2.4.0 · NVIDIA-AI-Blueprints/rag

Release 2.4.0 (2026-02-20)

This release adds features to the RAG pipeline that support agent workflows, and improves generation by using VLMs to handle multimodal input.

Highlights

This release contains the following key changes:

Updated NIMs and code to support NVIDIA Ingest 26.01 release.
Added support for non-NIM models including OpenAI, models hosted on AWS and Azure, OSS models, and others. These models are supported through service-specific API keys. For details, refer to Get an API Key.
The RAG Blueprint now uses nemoretriever-ocr-v1 as the default OCR model. For details, refer to NeMo Retriever OCR Configuration Guide.
Improved VLM-based generation support. The Vision-Language Model (VLM) inference feature now uses the model nemotron-nano-12b-v2-vl. For details, refer to VLM for Generation.
Improved the user interface, including catalog display, image and text queries, and others. For details, refer to User Interface.
Added ingestion metrics endpoint support with OpenTelemetry (OTEL) for monitoring document uploads, elements ingested, and pages processed. For details, refer to Observability.
Added support for image and text as the input query. For details, refer to Multimodal Query Support.
Added Nemotron-3-Nano model support with a reasoning budget. For details, refer to Enable Reasoning.
Enhanced vector database support, including secure database access. For details, refer to Milvus Configuration and Elasticsearch Configuration.
You can now access RAG functionality from a Model Context Protocol (MCP) server for tool integration. For details, refer to MCP Server and Client Usage.
Added an OpenAI-compatible search endpoint for integration with OpenAI tools. For details, refer to API - RAG Server Schema.
Added support for collection-level data catalog, descriptions, and metadata. For details, refer to Data Catalog.
Enhanced the /status endpoint to publish ingestion metrics and status information. For details, refer to the ingestion notebook.
Multi-turn conversation support is no longer the default for either the retrieval or the generation stage of the pipeline. For details, refer to Multi-Turn Conversation Support.
Improved document processing and element extraction.
Enhancements to RAG library mode including the following. For details, refer to Use the NVIDIA RAG Blueprint Python Package.
- Independent multi-instance support for the RAG Server and the ingestion server
- Configuration support through function arguments
- Async interface for RAG methods
- Compatibility with the NVIDIA NeMo Agent Toolkit (NAT)
Summarization enhancements including the following. For details, refer to Document Summarization Customization Guide.
- Shallow summarization support
- Easy model switches and dedicated configurations
- Ease of prompt changes
The field names type, subtype, and location are now reserved for exclusive use by NV-Ingest in metadata schemas.
Added the rag_library_lite_usage.ipynb notebook, which demonstrates containerless deployment of the NVIDIA RAG Python package in lite mode.
Added an example showcasing NeMo Agent Toolkit integration with NVIDIA RAG.
Added weighted hybrid search support with configurable weights.
Improved RAG server logging.
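
To illustrate what "weighted hybrid search with configurable weights" generally means, here is a minimal sketch of weighted score fusion. It is not the blueprint's implementation: the function names, the min-max normalization step, and the default weights are all assumptions made for illustration. The idea is to normalize dense (vector) and sparse (keyword) scores to a common range and then combine them with the configured weights.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1]; if all scores are equal, map each to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_hybrid_scores(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    dense_weight: float = 0.5,   # hypothetical default, not from the release
    sparse_weight: float = 0.5,  # hypothetical default, not from the release
) -> List[Tuple[str, float]]:
    """Fuse dense and sparse scores per document and rank by the result.

    A document missing from one result set contributes 0.0 for that set.
    """
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    fused = {
        doc_id: dense_weight * dense_n.get(doc_id, 0.0)
        + sparse_weight * sparse_n.get(doc_id, 0.0)
        for doc_id in set(dense_n) | set(sparse_n)
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Example: favor dense similarity over keyword match.
ranked = weighted_hybrid_scores(
    dense={"a": 0.9, "b": 0.1},
    sparse={"b": 10.0, "c": 2.0},
    dense_weight=0.7,
    sparse_weight=0.3,
)
```

Raising `dense_weight` favors semantic similarity, while raising `sparse_weight` favors exact keyword matches. For the actual configuration options, refer to the blueprint documentation.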

Fixed Known Issues

The following are the known issues that are fixed in this version:

Fixed an issue with automatic profile selection in NIM LLM. For details, refer to Model Profiles.

Known limitations

The following are the known limitations in this version:

DRA support using the NIM Operator-based Helm chart is not available in this release.
