v2.4.0
Release 2.4.0 (2026-02-20)
This release adds new features to the RAG pipeline to support agent workflows and improves generation by using VLMs to process multimodal input.
Highlights
This release contains the following key changes:
- Updated NIMs and code to support the NVIDIA Ingest 26.01 release.
- Added support for non-NIM models including OpenAI, models hosted on AWS and Azure, OSS models, and others. These models are supported through service-specific API keys. For details, refer to Get an API Key.
- The RAG Blueprint now uses nemoretriever-ocr-v1 as the default OCR model. For details, refer to NeMo Retriever OCR Configuration Guide.
- Improved VLM-based generation support. The Vision-Language Model (VLM) inference feature now uses the model nemotron-nano-12b-v2-vl. For details, refer to VLM for Generation.
- User interface improvements including catalog display, image and text query, and others. For details, refer to User Interface.
- Added ingestion metrics endpoint support with OpenTelemetry (OTEL) for monitoring document uploads, elements ingested, and pages processed. For details, refer to Observability.
- Added support for image and text as an input query. For details, refer to Multimodal Query Support.
- Added Nemotron-3-Nano model support with a reasoning budget. For details, refer to Enable Reasoning.
- Vector Database enhancements including secure database access. For details, refer to Milvus Configuration and Elasticsearch Configuration.
- You can now access RAG functionality from a Model Context Protocol (MCP) server for tool integration. For details, refer to MCP Server and Client Usage.
- Added OpenAI-compatible search endpoint for integration with OpenAI tools. For details, refer to API - RAG Server Schema.
- Added support for collection-level data catalog, descriptions, and metadata. For details, refer to Data Catalog.
- Enhanced `/status` endpoint publishing ingestion metrics and status information. For details, refer to the ingestion notebook.
- Multi-turn conversation support is no longer the default for either retrieval or generation stage in the pipeline. Refer to Multi-Turn Conversation Support for details.
- Improved document processing and element extraction.
- Enhancements to RAG library mode including the following. For details, refer to Use the NVIDIA RAG Blueprint Python Package.
  - Independent multi-instance support for the RAG Server and the ingestion server
  - Configuration support through function arguments
  - Async interface for RAG methods
  - Compatibility with the NVIDIA NeMo Agent Toolkit (NAT)
- Summarization enhancements including the following. For details, refer to Document Summarization Customization Guide.
  - Shallow summarization support
  - Easy model switches and dedicated configurations
  - Ease of prompt changes
- Reserved field names `type`, `subtype`, and `location` for NV-Ingest exclusive use in metadata schemas.
- Added support for rag_library_lite_usage.ipynb which demonstrates containerless deployment of the NVIDIA RAG Python package in lite mode.
- Added example showcasing NeMo Agent Toolkit integration with NVIDIA RAG.
- Added weighted hybrid search support with configurable weights.
- Improved RAG server logging.
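
Because the field names type, subtype, and location are now reserved for NV-Ingest in metadata schemas, it may help to check custom schemas for collisions before ingestion. The following is a minimal sketch only: the helper `find_reserved_fields` and the sample field list are hypothetical and not part of the RAG Blueprint API, and the check assumes an exact, case-sensitive name match.

```python
# Reserved metadata field names, taken from the release notes above.
RESERVED_FIELD_NAMES = frozenset({"type", "subtype", "location"})


def find_reserved_fields(schema_fields):
    """Return the names in schema_fields that collide with reserved names.

    Hypothetical helper for illustration; matching is exact and
    case-sensitive, which is an assumption rather than documented behavior.
    """
    return [name for name in schema_fields if name in RESERVED_FIELD_NAMES]


# Example custom schema field names (illustrative only).
custom_schema = ["title", "author", "type", "page_count", "location"]

conflicts = find_reserved_fields(custom_schema)
if conflicts:
    print("Rename these metadata fields before ingestion:", ", ".join(conflicts))
```

Running a check like this before uploading documents surfaces naming conflicts early, instead of discovering them during ingestion.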
Fixed Known Issues
The following are the known issues that are fixed in this version:
- Fixed an issue with automatic profile selection in NIM LLM. For details, refer to Model Profiles.
Known Limitations
The following are the known limitations in this version:
- DRA support using the NIM Operator-based Helm chart is not available in this release.