v2.3.0
Version 2.3.0 (2025-10-14)
This release adds RTX6000 platform support, adds deployment through the NVIDIA NIM operator, improves vector database pluggability in the blueprint, and includes other changes.
Added
- Added support for deploying the blueprint on the RTX6000 platform.
- Migrated to llama-3.3-nemotron-super-49b-v1.5 as the default LLM model.
- Added support to deploy the Helm chart by using the NVIDIA NIM operator. For details, refer to Deploy NVIDIA RAG Blueprint with NIM Operator.
- Updated all NIMs, NVIDIA Ingest, and third-party dependencies to the latest versions.
- Refactored to support custom third-party vector DB integration in a streamlined manner.
- Interactive notebook showcasing integration with library mode here.
- Added support for Elasticsearch vector DB as an alternative to Milvus.
- Added opt-in query decomposition support.
- Added opt-in nemoretriever-ocr support.
- Added opt-in VLM embedding support.
- Custom metadata enhancements. Detailed doc here.
- Added support for more datatypes.
- Added opt-in support for generating filters with an LLM, which yields better accuracy.
- Added an interactive notebook showcasing new features.
- Added dependency check support for ingestor server /health API.
- Added support for a configurable confidence threshold for retrieval from the API layer.
- Added support to store NV-Ingest extraction results directly from the filesystem.
- Logging enhancements
- Added better latency data reporting for RAG server
- API level enhancements for component level latency
- Added dedicated Prometheus metric endpoint
- Added independent script to showcase batch ingestion
- Enabled support for GPU indexing with CPU search
- Exposed APP_VECTORSTORE_EF as a configurable parameter
- Added environment variables to control LLM parameters: LLM_MAX_TOKENS, LLM_TEMPERATURE, and LLM_TOP_P
- Added notebooks for showcasing RAG evaluation using common metrics
- Added unit tests and pre-commit hooks for maintaining code quality.
- Optimized container sizes by removing unnecessary packages and improving security.
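To illustrate the LLM parameter environment variables listed above (LLM_MAX_TOKENS, LLM_TEMPERATURE, and LLM_TOP_P), here is a minimal sketch of how a script might read them. The helper name `read_llm_params` and the fallback values are hypothetical, chosen only for this example; they are not the blueprint's code or its actual defaults.

```python
import os


def read_llm_params(env=None):
    """Read LLM generation settings from environment variables.

    The variable names come from the release notes. The fallback
    values are placeholders for this sketch, not the real defaults.
    """
    env = os.environ if env is None else env
    return {
        "max_tokens": int(env.get("LLM_MAX_TOKENS", "1024")),
        "temperature": float(env.get("LLM_TEMPERATURE", "0.2")),
        "top_p": float(env.get("LLM_TOP_P", "0.9")),
    }


if __name__ == "__main__":
    # Prints whatever is set in the current shell, with placeholder fallbacks.
    print(read_llm_params())
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to exercise without changing the shell environment.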
Changed
- Migrated default LLM model for reflection to llama-3.3-nemotron-super-49b instead of mixtral-8x22b-instruct-v01.
- Refactored rag-playground code
- Use React end to end. Next.js dependencies were deprecated.
- More developer friendly and intuitive look and feel.
- The rag-playground service is renamed to rag-frontend.
- Refactored helm chart support
- Expanded and reorganized Helm chart configuration, enabling granular control over service components, resource settings, and observability (tracing, metrics).
- Introduced ConfigMap and service definitions to facilitate improved application deployment flexibility.
- Implemented refined service account and secret management in Helm templates.
- Added a new Helm values file for nim-operator to configure LLM model environment and component toggles.
Fixed
- Fixed support for long audio file ingestion.
- Fixed support to ingest images without charts/tables.
- Fixed the requirement to rebuild the RAG frontend container when the LLM model name changes.
Removed
- Removed consistency level configuration support for Milvus.
- Removed EMBEDDING_NIM_ENDPOINT and EMBEDDING_NIM_MODEL_NAME environment variables for nvingest.
- Removed unused ENABLE_MULTITURN environment variable from rag-server.
- Removed ENABLE_NEMOTRON_THINKING environment variable from rag-server.
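When upgrading, it can help to check whether an existing deployment still sets any of the environment variables removed above. The following is a minimal sketch under that assumption; the helper name `find_stale_vars` and the mapping are hypothetical and only mirror the variable names listed in this section.

```python
import os

# Variables removed in this release, mapped to the component named
# in the release notes. This table is illustrative, not shipped code.
REMOVED_VARS = {
    "EMBEDDING_NIM_ENDPOINT": "nvingest",
    "EMBEDDING_NIM_MODEL_NAME": "nvingest",
    "ENABLE_MULTITURN": "rag-server",
    "ENABLE_NEMOTRON_THINKING": "rag-server",
}


def find_stale_vars(env=None):
    """Return the removed variable names still present in env, sorted."""
    env = os.environ if env is None else env
    return sorted(name for name in REMOVED_VARS if name in env)


if __name__ == "__main__":
    for name in find_stale_vars():
        print(f"{name} is no longer used by {REMOVED_VARS[name]}")
```

Running the script in the shell that launches the services lists any leftover settings that can be dropped from the deployment configuration.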