v2.4.0

@shubhadeepd shubhadeepd released this 20 Feb 19:14
19bb443

Release 2.4.0 (2026-02-20)

This release adds features to the RAG pipeline that support agent workflows, and enhances generation by using VLMs for multimodal input.

Highlights

This release contains the following key changes:

  • Updated NIMs and code to support the NVIDIA Ingest 26.01 release.
  • Added support for non-NIM models including OpenAI, models hosted on AWS and Azure, OSS models, and others. These models are supported through service-specific API keys. For details, refer to Get an API Key.
  • The RAG Blueprint now uses nemoretriever-ocr-v1 as the default OCR model. For details, refer to NeMo Retriever OCR Configuration Guide.
  • Improved VLM-based generation support. The Vision-Language Model (VLM) inference feature now uses the model nemotron-nano-12b-v2-vl. For details, refer to VLM for Generation.
  • Improved the user interface, including catalog display, image and text queries, and others. For details, refer to User Interface.
  • Added ingestion metrics endpoint support with OpenTelemetry (OTEL) for monitoring document uploads, elements ingested, and pages processed. For details, refer to Observability.
  • Added support for image and text as the input query. For details, refer to Multimodal Query Support.
  • Added Nemotron-3-Nano model support with a reasoning budget. For details, refer to Enable Reasoning.
  • Enhanced vector database support, including secure database access. For details, refer to Milvus Configuration and Elasticsearch Configuration.
  • You can now access RAG functionality from a Model Context Protocol (MCP) server for tool integration. For details, refer to MCP Server and Client Usage.
  • Added OpenAI-compatible search endpoint for integration with OpenAI tools. For details, refer to API - RAG Server Schema.
  • Added support for collection-level data catalog, descriptions, and metadata. For details, refer to Data Catalog.
  • Enhanced the /status endpoint to publish ingestion metrics and status information. For details, refer to the ingestion notebook.
  • Multi-turn conversation support is no longer the default for either the retrieval or the generation stage of the pipeline. For details, refer to Multi-Turn Conversation Support.
  • Improved document processing and element extraction.
  • Enhancements to RAG library mode including the following. For details, refer to Use the NVIDIA RAG Blueprint Python Package.
    • Independent multi-instance support for the RAG Server and the ingestion server
    • Configuration support through function arguments
    • Async interface for RAG methods
    • Compatibility with the NVIDIA NeMo Agent Toolkit (NAT)
  • Summarization enhancements including the following. For details, refer to Document Summarization Customization Guide.
    • Shallow summarization support
    • Easy model switches and dedicated configurations
    • Ease of prompt changes
  • Reserved the field names type, subtype, and location in metadata schemas for exclusive use by NV-Ingest.
  • Added rag_library_lite_usage.ipynb, which demonstrates containerless deployment of the NVIDIA RAG Python package in lite mode.
  • Added an example showcasing NeMo Agent Toolkit integration with NVIDIA RAG.
  • Added weighted hybrid search support with configurable weights.
  • Improved RAG server logging.
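
The list above notes that the field names type, subtype, and location are now reserved for NV-Ingest in metadata schemas. As a rough illustration only, the following sketch shows how a user might check a custom schema for collisions before ingestion. The function name find_reserved_field_names and the schema shape (a list of dicts, each with a "name" key) are assumptions made for this example, not the blueprint's actual API, and the match is assumed to be exact and case-sensitive.

```python
# Field names reserved for NV-Ingest, as stated in the release note above.
RESERVED_FIELD_NAMES = frozenset({"type", "subtype", "location"})


def find_reserved_field_names(metadata_schema):
    """Return the sorted field names in a schema that collide with reserved names.

    Hypothetical helper: assumes the schema is a list of dicts, each with a
    "name" key holding the field name. Matching is exact and case-sensitive.
    """
    return sorted(
        field["name"]
        for field in metadata_schema
        if field.get("name") in RESERVED_FIELD_NAMES
    )


# Hypothetical user-defined schema with one colliding field name.
example_schema = [
    {"name": "author", "type": "string"},
    {"name": "location", "type": "string"},
]
conflicts = find_reserved_field_names(example_schema)
print(conflicts)
```

A non-empty result suggests renaming the listed fields before creating the collection. Note that the check looks only at each field's name, not at other keys in the field definition.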

Fixed Known Issues

The following are the known issues that are fixed in this version:

  • Fixed an issue with automatic profile selection in NIM LLM. For details, refer to Model Profiles.

Known Limitations

The following are the known limitations in this version:

  • DRA support using the NIM Operator-based Helm chart is not available in this release.