… of NamedTuple types (being accessed as a BaseEnum member), updated string conversion patterns for consistency with BaseEnum classes across the codebase.
…latten imports - Removed obsolete license file for nodes_to_langs.json. - Updated imports in progress.py, state.py, tools.py, user_agent.py, and other files to use the correct modules for DataclassSerializationMixin and DATACLASS_CONFIG. - Removed exclude_args from ToolRegistrationDict and adjusted related code in tools.py. - Modified find_code_tool function to receive fastmcp context. - Enhanced embedding types with field title generation for better schema documentation. - Updated AgentTask classifications to include additional synonyms for tasks. - Improved code readability and maintainability by applying consistent formatting and annotations across various modules.
…mendations - Introduced `test-infrastructure-summary.md` detailing current test coverage, risks, and a 4-tier CI strategy to enhance testing efficiency. - Added `test_skip_xfail_analysis.md` to analyze skipped and xfail tests, identifying gaps in coverage and proposing immediate actions for improvement. - Updated `dataclasses.py` to improve serialization logic by using fields directly instead of pydantic fields. - Enhanced `tools.py` documentation to reflect the increase in supported programming languages from 160 to 166. - Modified `user_agent.py` to allow optional context parameter for the `find_code_tool` function. - Fixed health monitoring tests to align with the BaseEnum interface by replacing `.value` with `.variable`. - Added tests in `test_selector.py` to verify fallback configurations for chunkers. - Implemented idempotency test in `test_failover_tracker.py` to ensure unchanged files do not affect pending changes during re-indexing.
…s and initialization; add mock_only marker to config tests
…roved test execution and categorization
refactor(logging): Rename logging modules to _logging to fix namespace conflicts

Renamed logging.py → _logging.py across multiple modules to resolve
`import logging` namespace conflicts that were causing issues.

Changes:
- Renamed logging modules in common, config, chunker, watcher, server
- Updated all imports to use _logging naming convention
- Added test infrastructure for nightly and weekly test runs
- Enhanced test documentation and coverage analysis
- Updated lazy import validation
- Cleaned up unused variables in test_publish_validation

This refactor maintains all existing functionality while fixing the
namespace collision issue that prevented proper access to Python's
standard logging module.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
… enhanced -- identified unused/implemented structured logging system and implemented it
- Updated package name references from `codeweaver` to `code-weaver` in integration and smoke tests. - Relaxed performance test thresholds for memory persistence to accommodate WSL I/O overhead. - Changed Qdrant Docker image version from `latest` to `v1.16.1` for consistency and reliability. - Enhanced documentation in unit tests for embedding reconciliation, clarifying test organization and rationale. - Removed outdated integration tests for reconciliation exception handling, consolidating testing strategy.
…aths for SearchStrategy and StrategizedQuery - Updated import statements in multiple test files to reflect the new module structure for SearchStrategy and StrategizedQuery, moving from `codeweaver.agent_api.find_code.types` to `codeweaver.core.search_types`. - Ensured consistency across integration and unit tests by modifying the relevant import paths. - Added new unit tests for the DI container, including basic resolution, singleton behavior, nested resolution, overrides, lifespan management, and type hint resolution. - Updated the `uv.lock` file to include new packages: `code-weaver-daemon` and `code-weaver-tokenizers`, along with their dependencies and test configurations.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
…d independence in prep for phase 3 monorepo plan
- Removed old telemetry test files and added new structured tests for privacy serialization. - Implemented a mock model for testing privacy filters and ensured critical fields are handled correctly. - Updated the test container setup to use a context manager for dependency injection. - Increased the maximum chunks per file in performance settings for chunking tests. - Enhanced semantic deduplication tests to verify unique AST nodes and identifiers. - Fixed issues in indexer reconciliation tests by ensuring proper dependency injection. - Added a failures log for integration tests to track ongoing issues.
- Removed redundant network markers from integration tests. - Updated test functions to include `di_overrides` for better dependency management. - Simplified mock indexer setup in unit tests by leveraging existing fixtures. - Enhanced performance test threshold for CI stability. - Streamlined mock handling in `TestAddMissingEmbeddings`, `TestRemovePathWithDeletedFiles`, and `TestStalePointRemovalInIndexFile` classes.
…top-level, since all __init__ imports are lazy this doesn't add overhead, but greatly simplifies the monorepo moves and future refactoring
… pseudo-workspace package to prepare for phase 3; feat: added new client-oriented types and helpers for simpler, less complicated client resolution
- Removed the capabilities module and its associated CLIENT_MAP. - Cleaned up imports in the __init__.py files across providers and embedding modules. - Eliminated unused types and classes from the embedding types module. - Updated dynamic imports to reflect the removal of obsolete components. - Streamlined the embedding registry to ensure proper initialization without circular dependencies.
feat(di): Add pydantic type resolution and circular dependency detection

**Phase 1 DI Improvements - Part 1:**

## Type Resolution Enhancement
- Replace manual type resolution with pydantic's battle-tested utilities
- Use `annotated_type()` for robust Annotated unwrapping
- Use `get_function_type_hints()` for PEP 563, forward refs, Python 3.13+ support
- Add `get_type_ref()` for stable cache keys (handles generics, type aliases)

## Circular Dependency Detection
- Add `_resolution_stack` parameter throughout dependency resolution chain
- Create stable cache keys using pydantic's `get_type_ref()`
- Detect and raise `CircularDependencyError` with full cycle path
- Thread resolution stack through all injection methods

## New DI Exception Hierarchy
- `DependencyInjectionError` - Base DI exception
- `CircularDependencyError` - Circular dependency detection with helpful suggestions
- `UnresolvableDependencyError` - Missing/invalid dependency with registration guidance
- `ScopeViolationError` - Scope hierarchy violations with explanation
- `DependencyResolutionError` - Aggregate multiple errors for better debugging

**Files Changed:**
- src/codeweaver/di/container.py: Pydantic integration + circular detection
- src/codeweaver/core/exceptions.py: New DI exception hierarchy

**Completed Tasks:**
- ✅ Phase 1.1: Replace type resolution with pydantic utilities
- ✅ Phase 1.2: Add circular dependency detection with stable cache keys
- ✅ Phase 1.8: Create new DI exception hierarchy

**Next Steps:**
- Phase 1.3: Generator/async generator context manager support
- Phase 1.4: Scope lifecycle management
- Phase 1.5-1.7: use_cache, error aggregation, union type resolution

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
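The resolution-stack idea in the commit message above can be illustrated with a small, self-contained sketch. This is not the code in `src/codeweaver/di/container.py`: the string-keyed `registry`, the `resolve` function, and this toy `CircularDependencyError` class are assumptions made for illustration, and plain strings stand in for the stable cache keys the commit derives from pydantic.

```python
from collections.abc import Callable


class CircularDependencyError(Exception):
    """Toy stand-in: raised when a key is requested while still being resolved."""

    def __init__(self, cycle: list[str]) -> None:
        self.cycle = cycle
        super().__init__("Circular dependency: " + " -> ".join(cycle))


# Hypothetical registry: each key maps to (factory, names of its dependencies).
Registry = dict[str, tuple[Callable[..., object], list[str]]]


def resolve(key: str, registry: Registry, _resolution_stack: tuple[str, ...] = ()) -> object:
    """Resolve `key`, threading the stack of in-progress keys through each call."""
    if key in _resolution_stack:
        # Report only the part of the stack that forms the cycle.
        start = _resolution_stack.index(key)
        raise CircularDependencyError([*_resolution_stack[start:], key])
    factory, deps = registry[key]
    stack = (*_resolution_stack, key)
    return factory(*(resolve(dep, registry, stack) for dep in deps))


registry: Registry = {
    "config": (lambda: {"name": "demo"}, []),
    "store": (lambda config: ("store", config["name"]), ["config"]),
    "a": (lambda b: b, ["b"]),  # "a" and "b" depend on each other
    "b": (lambda a: a, ["a"]),
}
```

Because the stack is passed as an immutable tuple, sibling branches of the dependency graph never see each other's in-progress keys, so a dependency shared by two branches is not mistaken for a cycle.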
Copilot reviewed 132 out of 804 changed files in this pull request and generated no comments.
…g sanitization Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Copilot reviewed 132 out of 805 changed files in this pull request and generated no comments.
Alpha 6 — "Pull the Band-Aid"
Overview
This is the combined Alpha 6 PR merging into `main`. It's not a reviewable unit: it represents months of incremental work that's already been reviewed piecemeal. I'm not expecting a full review here, only a sanity check for integration issues or glaring concerns.
This is a near-total refactor of CodeWeaver's core internals. With only a handful of users and real architectural debt accumulating, now was the right time to fix foundations before they became load-bearing problems.
Summary of Changes
Structure
Monorepo Readiness
In preparation for the Alpha 7 monorepo split, I realigned CodeWeaver's internal organization and significantly reduced cross-dependencies. The structure now maps cleanly to the planned package dependency ladder:
`core`/`tokenizers`/`daemon` → `semantic` → `providers` → `engine` → `server`
(`cli` remains unique: it can attach to any package with self-limiting functionality.)
Major moves:
- … `common`, consolidating utilities and types into `core` and machinery into `engine`
- … `di` into `core`
- … `config` package; each package now owns its own config for types relevant to it, with core types defined in `core/config/core/`
- … `agent`, `data`, `embedding`, `reranking`, and `vector_stores` into `providers`
- … `mcp`, `agent_api`, and `middleware` into `server`
Federated Config
The config system needed to adapt based on which packages a user has installed. To support this:
- … `pydantic_settings.BaseSettings` subclass in `core.types.settings_model`, following the dependency ladder: `CodeWeaverCoreSettings` → `CodeWeaverProviderSettings` → `CodeWeaverEngineSettings` → `CodeWeaverSettings`
- `core.config.loader` resolves the most complete available settings class at runtime and assigns it the role of `CodeWeaverSettings`. In code, these are referenced via the type alias union `CodeWeaverSettingsType`.
DI: Fully Implemented
The DI system was the primary motivation for this refactor. Alpha 1-5's registry-based approach made it impossible to control state across the full pipeline — unit tests were fine, but meaningful integration testing wasn't. DI was the right fix.
The system is now fully implemented and modeled on patterns from libraries like FastAPI, with some improvements for usability. It handles:
- … `Annotated` types, string forms, instances, data, sync and async context managers
Call site pattern:
Registering a factory:
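As a rough illustration of both patterns, here is a minimal, self-contained sketch of type-hint-driven injection. It is not CodeWeaver's API: `ToyContainer`, `toy_provider`, `Settings`, `Embedder`, and `describe` are hypothetical names, and the real system handles far more cases than this toy does.

```python
import inspect
from collections.abc import Callable
from typing import Any, get_type_hints


class ToyContainer:
    """Hypothetical stand-in for a DI container."""

    def __init__(self) -> None:
        self._factories: dict[type, Callable[..., Any]] = {}

    def register(self, provided: type, factory: Callable[..., Any]) -> None:
        self._factories[provided] = factory

    def resolve(self, wanted: type) -> Any:
        # Factories are called through `call`, so they can be injected too.
        return self.call(self._factories[wanted])

    def call(self, func: Callable[..., Any], **explicit: Any) -> Any:
        """Call `func`, filling omitted parameters whose type has a factory."""
        hints = get_type_hints(func)
        kwargs = dict(explicit)
        for name in inspect.signature(func).parameters:
            if name not in kwargs and hints.get(name) in self._factories:
                kwargs[name] = self.resolve(hints[name])
        return func(**kwargs)


container = ToyContainer()


def toy_provider(provided: type) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator sugar over `container.register` (hypothetical name)."""
    def decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
        container.register(provided, factory)
        return factory
    return decorator


class Settings:
    def __init__(self, model: str) -> None:
        self.model = model


class Embedder:
    def __init__(self, settings: Settings) -> None:
        self.settings = settings


@toy_provider(Settings)
def settings_factory() -> Settings:
    return Settings(model="demo-model")


@toy_provider(Embedder)
def embedder_factory(settings: Settings) -> Embedder:
    # The factory itself receives an injected dependency.
    return Embedder(settings)


def describe(query: str, embedder: Embedder) -> str:
    """Call site: `embedder` is injected unless the caller passes one."""
    return f"{query} via {embedder.settings.model}"
```

Calling `container.call(describe, query="hello")` builds `Settings`, then `Embedder`, then invokes `describe`; passing `embedder=` explicitly bypasses the container, which is what makes this style convenient in tests.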
`dependency_provider` is syntactic sugar for importing the `Container` directly. Factory functions can themselves take injected dependencies as arguments.
Organizationally: core DI machinery lives in `codeweaver.core.di`; dependency factories and type aliases live in each package's `dependencies` module or package. This lets each package define its own internal dependencies, resolved automatically at runtime.
Test Fixtures: Complete Rewrite
I rewrote the majority of test fixtures from scratch to use the DI system natively — a big job, but necessary to get the state isolation the original motivation demanded. Most of the test suite was also updated to reflect new APIs and structure.
CodeWeaver now has ~1,000 unit tests and ~300 integration tests, all passing. Coverage is still below where I want it; that's a priority going forward. (Full coverage has never been a goal and isn't now.)
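To make the state-isolation goal concrete, here is a small sketch of a context-manager override of the kind such fixtures can build on. The `factories` table, `overridden`, and `build_indexer` are hypothetical and use only the standard library; they are not CodeWeaver's fixtures.

```python
from collections.abc import Callable, Iterator
from contextlib import contextmanager
from typing import Any

# Hypothetical module-level factory table standing in for a DI container.
factories: dict[str, Callable[[], Any]] = {"vector_store": lambda: "real-qdrant-client"}


@contextmanager
def overridden(**replacements: Callable[[], Any]) -> Iterator[None]:
    """Temporarily swap factories, restoring the originals even if the test fails."""
    saved = dict(factories)
    factories.update(replacements)
    try:
        yield
    finally:
        factories.clear()
        factories.update(saved)


def build_indexer() -> str:
    # Code under test asks the table for its dependency instead of constructing it.
    return f"indexer({factories['vector_store']()})"


def test_indexer_uses_fake_store() -> None:
    with overridden(vector_store=lambda: "in-memory-fake"):
        assert build_indexer() == "indexer(in-memory-fake)"
    # Outside the block the original factory is back, so no state leaks between tests.
    assert build_indexer() == "indexer(real-qdrant-client)"
```

The `finally` clause is the important part: restoration happens even when an assertion inside the `with` block fails, so one failing test cannot poison the next.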
Provider Config — Ground-Up Rewrite
The Alpha 1-5 config system required translating between CodeWeaver's config objects and every SDK client, provider, and cross-provider scenario. That translation was nearly impossible to keep clean. The new system aligns the types directly:
- `ClientOptions`: SDK client constructor configuration. If a setting can be passed to the client's constructor (e.g. `AsyncOpenAI`), it's available in the corresponding `ClientOptions` subclass.
- `ProviderConfig`: Category-and-provider aligned options for a provider's core methods (e.g. an `embed` call for an embedding provider).
Model capabilities are no longer hardcoded. Capabilities can now be dynamically registered and are optional; CodeWeaver trusts the user to supply correct model information. Given the pace of model releases, this prevents CodeWeaver from going stale between releases. You can use a new model on the day it ships.
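The split can be pictured with a short sketch in which one object mirrors a client constructor and another mirrors the core call, so nothing needs translating in between. Everything here is assumed for illustration: `ToyClientOptions`, `ToyEmbeddingConfig`, `FakeClient`, and the `capabilities` registry are hypothetical and do not reflect CodeWeaver's actual classes or any real SDK.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyClientOptions:
    """Mirrors what the (fake) SDK client constructor accepts."""
    api_key: str
    timeout: float = 30.0


@dataclass(frozen=True)
class ToyEmbeddingConfig:
    """Mirrors the keyword arguments of the provider's core call, here `embed`."""
    model: str
    dimensions: Optional[int] = None


class FakeClient:
    """Stand-in for an SDK client; performs no network access."""

    def __init__(self, api_key: str, timeout: float = 30.0) -> None:
        self.api_key, self.timeout = api_key, timeout

    def embed(self, texts: list[str], *, model: str, dimensions: Optional[int] = None) -> dict:
        return {"model": model, "dimensions": dimensions, "count": len(texts)}


@dataclass(frozen=True)
class ModelCapabilities:
    context_window: int


# Capabilities are optional and can be registered at runtime.
capabilities: dict[str, ModelCapabilities] = {}


def register_capabilities(model: str, caps: ModelCapabilities) -> None:
    capabilities[model] = caps


def embed(texts: list[str], options: ToyClientOptions, config: ToyEmbeddingConfig) -> dict:
    client = FakeClient(**asdict(options))  # options pass straight to the constructor
    call_kwargs = {k: v for k, v in asdict(config).items() if v is not None}
    return client.embed(texts, **call_kwargs)  # config passes straight to the call
```

Note that `embed` works for a model with no registered capabilities at all, which is the behavior that lets a newly released model be used without waiting for a library update.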
Asymmetric Embedding Support
While this refactor was underway, Voyage AI released the `voyage-4` series, which has a few notable properties:
- … `voyage-4-nano`, a 300M parameter, Apache 2.0 licensed model (built on Qwen3) available on Hugging Face
I built native support for asymmetric model families via `AsymmetricEmbeddingProviderSettings`, which takes a full `EmbeddingProviderSettings` object for both `query_config` and `embed_config`. This means you can, for example, use FastEmbed or SentenceTransformers for `voyage-4-nano` locally while hitting Voyage's API for `voyage-4-large` in the cloud, and they'll work against the same embeddings.
`voyage-4-nano` also punches well above its weight on its own embeddings. For a model that runs on a consumer laptop, it outperforms most frontier models. The `voyage-4` family is now CodeWeaver's default for both cloud and local inference.
Service Cards
The new `ServiceCard` class and registry in `codeweaver.core.types.service_cards` simplify SDK client and provider instantiation and form the backbone of `providers`' dependency factories. They also support dynamic provider and client registration at runtime.
Combined with DI, provider implementations are now dramatically simpler. Previously, providers had complex `__init__` logic for resolving config and setup. Now, providers receive their complete set of attributes upfront and store them as attributes. No factory logic in `__init__`.
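A minimal sketch of the idea, assuming a card simply bundles a client factory with a provider class. `ToyServiceCard`, `cards`, `register_card`, `build`, and `ToyEmbeddingProvider` are hypothetical and much simpler than the real `ServiceCard`.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToyServiceCard:
    """Hypothetical card: everything needed to build one provider."""
    provider: str
    category: str
    client_factory: Callable[..., Any]
    provider_class: type


class ToyEmbeddingProvider:
    def __init__(self, client: Any, model: str) -> None:
        # Attributes arrive fully resolved; __init__ only stores them.
        self.client = client
        self.model = model


# Registry keyed by (provider, category); entries can be added at runtime.
cards: dict[tuple[str, str], ToyServiceCard] = {}


def register_card(card: ToyServiceCard) -> None:
    cards[(card.provider, card.category)] = card


def build(provider: str, category: str, *, client_options: dict, **attrs: Any) -> Any:
    """Look up the card, build the client, then hand everything to the provider."""
    card = cards[(provider, category)]
    client = card.client_factory(**client_options)
    return card.provider_class(client=client, **attrs)


# `dict` stands in for a real SDK client constructor in this sketch.
register_card(ToyServiceCard("acme", "embedding", dict, ToyEmbeddingProvider))
provider = build("acme", "embedding", client_options={"api_key": "k"}, model="demo")
```

All construction logic lives in `build`, so the provider class stays a plain holder of already-resolved attributes.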
Agent and Data Providers
Agent and data providers are now implemented as part of the providers refactor. Agents (internally called "context agents") will serve as resolution helpers; data providers are essentially their tools. Neither is wired up yet — that's Alpha 7.
Engine — Services and Management Separated
The previous engine had mixed concerns, complex stateful handling, and was hard to test and reason about. It's responsible for a lot: file discovery, AST and delimiter-based chunking, deduplication and invalidation, coordinating dense and sparse embedding, managing vector database state, and handling fallbacks for all of the above.
The new engine separates stateless services from stateful managers:
Services: chunking, config analysis (new — safely resolves configuration changes), failover, indexing, migration, reconciliation, snapshot, file watching
Managers: CheckpointManager, ManifestManager, ProgressTracker
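The distinction can be sketched in a few lines: a service is a function of its inputs, while a manager is the only object that remembers anything. The names `chunk_text`, `ToyCheckpointManager`, and `index` are illustrative assumptions, not the engine's real classes.

```python
from dataclasses import dataclass, field


def chunk_text(text: str, max_chars: int) -> list[str]:
    """Stateless service: output depends only on the arguments."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


@dataclass
class ToyCheckpointManager:
    """Stateful manager: the only place that remembers what was processed."""
    done: set[str] = field(default_factory=set)

    def pending(self, paths: list[str]) -> list[str]:
        return [p for p in paths if p not in self.done]

    def mark_done(self, path: str) -> None:
        self.done.add(path)


def index(files: dict[str, str], checkpoints: ToyCheckpointManager) -> dict[str, list[str]]:
    """Orchestration: ask the manager what is pending, run the stateless service."""
    chunks: dict[str, list[str]] = {}
    for path in checkpoints.pending(list(files)):
        chunks[path] = chunk_text(files[path], max_chars=4)
        checkpoints.mark_done(path)
    return chunks
```

Testing `chunk_text` needs no setup at all, and testing `index` only requires handing it a fresh manager, which is the practical payoff of the separation.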
Almost all core engine capabilities are now stateless. This also sets up a clean integration point for a new engine, likely from `knitli/recoco` or the `flow` crate in `knitli/thread`.
Failover System: Complete Redesign
The Alpha 1-5 failover system had several problems:
Four Problems, Three Solutions
Breaking the problem down:
Problems:
Solutions:
Failover embedding and reranking models are no longer configurable. By preselecting models with sufficient context windows, the system always handles any chunk without deciding whether to generate one or two sets of chunks — eliminating the corresponding deduplication and collection resolution complexity.
Backup embeddings are generated with a lightweight local model only if the configured dense model is not local, does not support asymmetric embeddings (which would allow a local model to substitute), and backup is not disabled. Enabled by default when those conditions are met. Backup embeddings now live on the same point as primary embeddings; Qdrant supports arbitrary numbers of vectors per point, and with problem #1 resolved, there's no risk of inconsistent data representation.
Cloud vector store fallback uses Qdrant's write-ahead logging and snapshots to maintain a local collection that's always available. If the cloud connection fails, the system picks up the local version and reconciles when the primary returns.
These two solutions activate independently. If your cloud embedding provider goes down but your vector store doesn't, only embedding failover activates.
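The conditions above reduce to two small predicates, sketched here under assumed names (`DenseModelInfo`, `should_generate_backup`, `active_failovers`); the real system's types and checks may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DenseModelInfo:
    is_local: bool
    supports_asymmetric: bool


def should_generate_backup(model: DenseModelInfo, backup_disabled: bool = False) -> bool:
    """Backup embeddings are needed only when no local path to the model exists."""
    return not model.is_local and not model.supports_asymmetric and not backup_disabled


def active_failovers(embedding_ok: bool, vector_store_ok: bool) -> set[str]:
    """Each failover activates on its own outage, independent of the other."""
    active: set[str] = set()
    if not embedding_ok:
        active.add("embedding")
    if not vector_store_ok:
        active.add("vector_store")
    return active
```

For example, `active_failovers(embedding_ok=False, vector_store_ok=True)` yields only `{"embedding"}`, matching the case where the cloud embedding provider is down but the vector store is reachable.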
Vector Store Simplification
With failover concerns removed from the vector store providers themselves, the primary/backup config and provider management complexity is gone. Resolving which embeddings are present is now a simple iteration over points: find missing vectors, ask the corresponding provider to fill the gap. This handles both the transition to backup and the return to primary cleanly.
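That iteration might look roughly like the following sketch. The point layout (a dict with `text` and a `vectors` mapping) and the `fill_missing_vectors` helper are assumptions for illustration; this is not Qdrant's client API or CodeWeaver's reconciliation service.

```python
from typing import Callable

Vector = list[float]
Embedder = Callable[[str], Vector]


def fill_missing_vectors(points: list[dict], embedders: dict[str, Embedder]) -> int:
    """Embed any named vector a point lacks; return how many were added."""
    added = 0
    for point in points:
        for name, embed in embedders.items():
            if name not in point["vectors"]:
                # Ask the provider responsible for this vector name to fill the gap.
                point["vectors"][name] = embed(point["text"])
                added += 1
    return added


points = [
    {"text": "ab", "vectors": {"primary": [2.0]}},  # written while backup was unavailable
    {"text": "abc", "vectors": {}},                  # has no vectors yet
]
embedders: dict[str, Embedder] = {
    "primary": lambda text: [float(len(text))],
    "backup": lambda text: [0.0],
}
```

Running the function a second time adds nothing, so the same pass serves both the transition to backup and the return to primary without special cases.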
Docs
The docs site was previously scaffolded but unreleased. It's ready to go live with Alpha 6 or shortly after. This included adding Griffe-based scripting for API doc generation.
Related Issues:
This PR closes or supersedes the following issues: