
Feat/deterministic aggregation #28

Merged: nadeem4 merged 10 commits into main from feat/deterministic-aggregation on Jan 30, 2026

@nadeem4 nadeem4 commented Jan 30, 2026

This pull request contains a mix of documentation updates, configuration changes, and terminology clarifications. The most significant changes improve the clarity and accuracy of adapter development documentation, update the LLM agent configuration, and introduce new documentation for executor artifacts and schema store configuration. Below are the most important changes grouped by theme.

Adapter Protocol & Documentation Updates:

  • Standardized terminology across documentation to reference the "core adapter protocol" and DatasourceAdapterProtocol instead of the previous "SDK" or DatasourceAdapter, with updated examples and compliance instructions in docs/adapters/development.md and docs/adapters/sdk.md [1] [2] [3] [4].
  • Updated references from "Adapter SDK Reference" to "Adapter Interface Reference" and improved example code to use DatasourceAdapterProtocol and related contracts [1] [2].
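
As an illustration of what a protocol-based adapter contract looks like, here is a minimal sketch using `typing.Protocol`. Only the name DatasourceAdapterProtocol comes from this PR; the methods `capabilities` and `execute`, and the `InMemoryAdapter` class, are hypothetical stand-ins and may not match the real interface.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DatasourceAdapterProtocol(Protocol):
    """Hypothetical shape; the real protocol's methods are not shown in this PR."""

    def capabilities(self) -> set[str]: ...

    def execute(self, query: str) -> list[dict[str, Any]]: ...


class InMemoryAdapter:
    """Toy adapter that satisfies the protocol structurally, without inheriting from it."""

    def __init__(self, rows: list[dict[str, Any]]) -> None:
        self._rows = list(rows)

    def capabilities(self) -> set[str]:
        return {"select"}

    def execute(self, query: str) -> list[dict[str, Any]]:
        # Ignores the query text and returns a copy of the canned rows.
        return list(self._rows)


adapter = InMemoryAdapter([{"id": 1}])
# runtime_checkable only verifies that the methods exist, not their signatures.
assert isinstance(adapter, DatasourceAdapterProtocol)
```

The point of a protocol over a base class is that third-party adapters can comply without importing or subclassing anything from the core package.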

LLM Agent Configuration:

  • Changed the agent name from intent_validator to indexing_enrichment and updated its model to gpt-5.2 in both configs/llm.yaml and configs/llm.demo.yaml [1] [2].

Executor Artifacts & Storage Configuration:

  • Added new documentation describing how executors write scan results as Parquet artifacts, including artifact path templates, supported backends (local, S3, ADLS), and relevant environment variables for configuration in docs/core/artifacts.md.
  • Documented new schema store configuration options (SCHEMA_STORE_BACKEND, SCHEMA_STORE_PATH) in docs/ops/configuration.md.
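
A rough sketch of how the two schema store variables might be read and validated at startup. The variable names SCHEMA_STORE_BACKEND and SCHEMA_STORE_PATH come from this PR; the `SchemaStoreSettings` class, the `load_schema_store_settings` function, the `sqlite` backend value, and the rule that only that backend needs a path are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SchemaStoreSettings:
    backend: str
    path: Optional[str]


def load_schema_store_settings(env: Mapping[str, str] = os.environ) -> SchemaStoreSettings:
    """Read schema store settings, failing fast when a required value is missing."""
    backend = env.get("SCHEMA_STORE_BACKEND")
    if not backend:
        raise ValueError("SCHEMA_STORE_BACKEND must be set")
    backend = backend.strip().lower()
    path = env.get("SCHEMA_STORE_PATH")
    # Assumption: a file-backed store needs a path, other backends do not.
    if backend == "sqlite" and not path:
        raise ValueError("SCHEMA_STORE_PATH is required for the sqlite backend")
    return SchemaStoreSettings(backend=backend, path=path)
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.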

Terminology & Service Renaming:

  • Renamed OrchestratorVectorStore to VectorStore in documentation and code references for consistency [1] [2] [3].
  • Clarified terminology in security documentation, referring to the intent validator generically rather than as a node.

Cleanup & Audit Documentation:

  • Removed obsolete audit remediation and observability plan markdown files, consolidating their content elsewhere [1] [2].

Let me know if you want to discuss any of these changes in more detail!

…ate management, various processing nodes, RBAC, and CLI execution.
…node responses

- Deleted outdated audit documentation files: architecture_remediation.md, remediation_plan_observability.md, and remediation_plan.md.
- Updated llm.yaml configuration to remove unused model settings.
- Enhanced response schemas for various pipeline nodes (e.g., AnswerSynthesizerNode, ExecutorNode, and others) to include error handling and reasoning fields.
- Introduced new nodes for answer synthesis and schema retrieval in the pipeline.
- Improved overall structure and clarity of the codebase by consolidating and organizing node responses.
- Introduced a new pytest.ini file to specify test paths for core, adapters, and SQLAlchemy tests.
- Added documentation for executor artifacts, detailing configuration options and usage.
- Created test configuration files for SQLAlchemy integration and unit tests, improving test organization.
- Removed outdated compliance tests for MSSQL, MySQL, Postgres, and SQLite adapters to streamline the test suite.
- Implemented new test cases for artifact handling and execution layers, ensuring robust coverage of the new features.
- Added new configuration for indexing enrichment in llm.demo.yaml and llm.yaml.
- Implemented SchemaEnrichment and related classes in enrichment_service.py to facilitate schema metadata enrichment.
- Updated SchemaChunkBuilder to include column names in the generated schema chunks.
- Enhanced the IndexingOrchestrator to utilize the new enrichment functionality during schema snapshot registration.
- Introduced new methods for retrieving column candidates in vector_store.py to improve schema retrieval.
- Added unit tests for the enrichment service and models to ensure functionality and correctness.
- Added new configuration options for schema store backend and path in configuration.md.
- Updated VectorStore initialization to include collection name and improved error handling for missing settings.
- Refactored NL2SQLContext to enforce required settings for vector store and schema store.
- Introduced SQLite schema store implementation and in-memory store for better schema management.
- Enhanced error handling in the AggregationService and other pipeline nodes to improve robustness.
- Added integration tests for schema retrieval and validation processes.
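
For readers unfamiliar with the schema store idea in the commit above, a SQLite-backed store can be sketched with the standard library alone. The class name `SQLiteSchemaStore`, its `register` and `get` methods, and the table layout are hypothetical; the implementation in this PR may differ. Using the default `":memory:"` path gives an in-memory store from the same code.

```python
import json
import sqlite3
from typing import Optional


class SQLiteSchemaStore:
    """Stores one JSON-encoded schema snapshot per datasource (illustrative only)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS snapshots ("
            "datasource_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def register(self, datasource_id: str, snapshot: dict) -> None:
        # The connection context manager commits on success and rolls back on error.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO snapshots (datasource_id, payload) VALUES (?, ?)",
                (datasource_id, json.dumps(snapshot, sort_keys=True)),
            )

    def get(self, datasource_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT payload FROM snapshots WHERE datasource_id = ?",
            (datasource_id,),
        ).fetchone()
        return json.loads(row[0]) if row else None
```
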
…dling

- Introduced cancellation functionality using threading events to allow users to cancel long-running operations.
- Updated various pipeline nodes (e.g., ExecutorNode, SQLExecutorService) to handle cancellation gracefully and return appropriate error messages.
- Enhanced error handling in the pipeline to include cancellation errors, improving user experience during execution.
- Added new configuration options for SQL agent retries and delays to manage execution flow more effectively.
- Refactored graph building and routing logic to integrate cancellation checks, ensuring responsiveness during execution.
- Added pytest markers for end-to-end tests to facilitate better test organization and execution.
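
The cancellation approach described in this commit (a shared threading event checked between steps) can be sketched as follows. `threading.Event` is the standard library primitive; the `run_steps` function, the `CancelledError` class, and the shape of the returned dictionary are hypothetical and only illustrate the pattern.

```python
import threading
from typing import Callable, Iterable, List


class CancelledError(Exception):
    """Raised when a run is cancelled between steps."""


def run_steps(steps: Iterable[Callable[[], str]], cancel: threading.Event) -> dict:
    """Run steps in order, stopping with an error message once cancel is set."""
    results: List[str] = []
    try:
        for step in steps:
            # Check before each step so a cancel request takes effect promptly.
            if cancel.is_set():
                raise CancelledError("execution cancelled by user")
            results.append(step())
    except CancelledError as exc:
        return {"results": results, "error": str(exc)}
    return {"results": results, "error": None}


cancel = threading.Event()


def first_step() -> str:
    cancel.set()  # simulate a user cancelling while the first step runs
    return "a"


outcome = run_steps([first_step, lambda: "b"], cancel)
# The first step completes; the second is skipped and an error is reported.
assert outcome == {"results": ["a"], "error": "execution cancelled by user"}
```

A step that is already running is not interrupted in this sketch; cancellation only takes effect at the next check, which is why long-running steps would need their own checks.
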
…eline

- Enhanced artifact handling by introducing a unified method for creating artifact references across different storage backends (S3, ADLS, Local).
- Refactored the execution contracts to replace the deprecated ExecutorBaseModel with ExecutorResponse, improving clarity and consistency.
- Updated the ExecutorNode to include tenant_id in requests, facilitating multi-tenant support.
- Improved error handling and logging across various pipeline nodes, ensuring better traceability and debugging capabilities.
- Removed unused schema management methods and streamlined the datasource resolution process for improved performance.
- Removed deprecated Column and Table classes from the schema module, introducing ColumnRef for better reference management.
- Updated the SchemaRetrieverNode to build tables from schema snapshots, incorporating relationships and metadata.
- Enhanced the LogicalValidatorNode to enforce join relationships and validate against column statistics.
- Improved error handling and logging in various pipeline nodes for better traceability.
- Refactored imports across the codebase to utilize the new schema structure, ensuring consistency and clarity.
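
To make the "unified method for creating artifact references" in this commit more concrete, here is one possible shape for such a helper. The three backends (S3, ADLS, Local) and the use of tenant_id come from the commit message; the `ArtifactRef` class, the `make_artifact_ref` function, the URI prefixes, and the `tenant/request/result.parquet` layout are all assumptions, not the project's actual path template.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ArtifactRef:
    backend: str
    uri: str


# Hypothetical URI prefixes per backend.
_PREFIXES = {"local": "file:///", "s3": "s3://", "adls": "abfss://"}


def make_artifact_ref(backend: str, root: str, tenant_id: str, request_id: str) -> ArtifactRef:
    """Build a reference to a Parquet artifact using one code path for all backends."""
    if backend not in _PREFIXES:
        raise ValueError(f"unsupported artifact backend: {backend!r}")
    # Assumed layout: <root>/<tenant_id>/<request_id>/result.parquet
    relative = PurePosixPath(tenant_id) / request_id / "result.parquet"
    uri = f"{_PREFIXES[backend]}{root.strip('/')}/{relative}"
    return ArtifactRef(backend=backend, uri=uri)


ref = make_artifact_ref("s3", "my-bucket", "t1", "r42")
assert ref.uri == "s3://my-bucket/t1/r42/result.parquet"
```

Keeping the tenant identifier in the path is one simple way to separate artifacts per tenant, whichever backend is configured.
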
@nadeem4 nadeem4 merged commit 4238067 into main Jan 30, 2026
1 check passed