
chore(deps): Bump colored from 2.2.0 to 3.0.0#147

Closed
dependabot[bot] wants to merge 625 commits into main from dependabot/cargo/colored-3.0.0

Conversation

dependabot bot (Contributor) commented on behalf of github on Jan 5, 2026

Bumps colored from 2.2.0 to 3.0.0.

Release notes

Sourced from colored's releases.

v3.0.0

  • [BREAKING CHANGE]: Upgrade MSRV to 1.80 and remove the then unnecessary lazy_static dependency.
Changelog

Sourced from colored's changelog.

3.0.0

  • [BREAKING CHANGE]: Upgrade MSRV to 1.80 and remove the then unnecessary lazy_static dependency.
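
For downstream users the break is the MSRV bump only; the coloring API itself is unchanged. A minimal smoke test, assuming a toolchain at 1.80 or newer and colored as a dependency, that should compile identically against 2.2.0 and 3.0.0:

    use colored::Colorize;

    fn main() {
        // Same trait-based API across 2.x and 3.x; only the MSRV changed.
        println!("{}", "ok".green().bold());
        println!("{}", "warning".yellow());
    }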

You can trigger a rebase of this PR by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Note
Automatic rebases have been disabled on this pull request as it has been open for over 30 days.

noahgift and others added 30 commits November 26, 2025 18:56
…74)

Documents the new synthetic data generation module with:
- SyntheticConfig parameters and builder pattern
- Generation strategies (EDA, BackTranslation, MixUp, etc.)
- Real aprender-shell augment output demonstration
- Before/After comparison showing +98% commands, +61% n-grams
- DiversityMonitor and QualityDegradationDetector usage
- Type-safe SyntheticParam integration with SearchSpace
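
The builder shape these bullets describe, reduced to a self-contained sketch; the field and method names below are illustrative assumptions, not the crate's verified API:

    // Illustrative builder sketch; names are assumptions, not the crate's API.
    #[derive(Debug)]
    struct SyntheticConfig {
        multiplier: f64,        // synthetic samples generated per real sample
        quality_threshold: f64, // reject candidates scoring below this
    }

    impl SyntheticConfig {
        fn new() -> Self {
            Self { multiplier: 1.0, quality_threshold: 0.8 }
        }
        fn with_multiplier(mut self, m: f64) -> Self { self.multiplier = m; self }
        fn with_quality_threshold(mut self, q: f64) -> Self { self.quality_threshold = q; self }
    }

    fn main() {
        let cfg = SyntheticConfig::new().with_multiplier(2.0).with_quality_threshold(0.9);
        println!("{cfg:?}");
    }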

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
#74)

EXTREME TDD fixes for usability issues:

1. Filter corrupted commands (e.g., "git commit-m") at both:
   - History parsing (is_valid_command, has_corrupted_tokens)
   - Suggestion generation (is_corrupted_token filter)

2. Better partial token handling:
   - "git c" now suggests "git commit", "git checkout"
   - N-gram prediction with partial token filtering

3. Filter malformed multiline artifacts from history

4. Add assert_cmd CLI integration tests (18 tests):
   - Help/version, train, suggest, stats, validate
   - Augment with synthetic data
   - Export/import roundtrip
   - Error handling
   - Latency test (<500ms)

Results:
- Latency: 215ms → 144ms (33% improvement)
- No more corrupted suggestions
- 41 total tests (23 unit + 18 CLI integration)
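
A minimal heuristic in the spirit of the item-1 corrupted-token filter, catching fused flag artifacts like "commit-m"; the crate's actual predicate is not shown in this PR:

    // Illustrative only: flags a token whose tail looks like a flag fused onto
    // a word (e.g. "commit-m" from a mangled "commit -m").
    fn is_corrupted_token(tok: &str) -> bool {
        tok.len() > 2
            && tok.ends_with(|c: char| c.is_ascii_alphabetic())
            && tok[..tok.len() - 1].ends_with('-')
            && !tok.starts_with('-') // real flags like "--amend" are fine
    }

    fn main() {
        assert!(is_corrupted_token("commit-m"));
        assert!(!is_corrupted_token("--amend"));
        assert!(!is_corrupted_token("checkout"));
    }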

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Port optimizations from bashrs Makefile:

1. test-fast: Uses cargo-nextest for parallel test execution
   - Falls back to cargo test if nextest not installed
   - Time reduced from ~30s to ~5s

2. coverage: Two-phase pattern with mold linker workaround
   - Phase 1: Run tests with instrumentation (no report)
   - Phase 2: Generate HTML/LCOV reports
   - CRITICAL: Temporarily moves ~/.cargo/config.toml
     (mold linker breaks LLVM coverage instrumentation)
   - Auto-installs cargo-llvm-cov and cargo-nextest if missing

3. Added coverage-open target for convenience

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Comprehensive spec for embedding ML models in binaries with smart memory paging:

- APR binary format: Page-aligned tensors with lazy loading
- Three embedding strategies: include_bytes!, linker sections, external file
- Memory paging: mmap with OnceCell lazy initialization
- Predictive prefetching: Background thread for anticipated weights
- ALM integration: Bundle datasets alongside models
- 10 annotated peer-reviewed papers (ACL 2024, SOSP 2023, MLSys 2021/2023)

Implementation roadmap: Binary embedding → Lazy loading → Prefetching → ALM
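
A minimal sketch of the include_bytes! strategy with lazy initialization, using std's OnceLock in place of the OnceCell named above; the embedded path and the one-time init step are assumptions:

    use std::sync::OnceLock;

    // Embed the model at compile time; "model.apr" is a hypothetical path
    // that must exist next to this source file for the sketch to build.
    static MODEL_BYTES: &[u8] = include_bytes!("model.apr");
    static MODEL: OnceLock<Vec<u8>> = OnceLock::new();

    fn model() -> &'static [u8] {
        // First call pays the one-time init cost; later calls return the ref.
        MODEL.get_or_init(|| MODEL_BYTES.to_vec())
    }

    fn main() {
        println!("embedded model: {} bytes", model().len());
    }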

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…#74)

Resolves all 3 action items from Gemini review (Toyota/NASA/Startup personas):

[NASA] Sandbox V&V for Code Translation:
- Added SandboxExecutor to CodeTranslationGenerator
- quality_score() now tests functional correctness (40% weight)
- Addresses Codex hallucination issue (compiles != correct)

[Toyota] Andon Mechanism (Jidoka):
- Added AndonHandler trait with DefaultAndon implementation
- Halts pipeline if rejection rate >90%
- Alerts on quality drift below baseline

[Startup] Decoupled Roadmap:
- Shell SLM: v0.14.0 (MVP - tractable structured prediction)
- Code Oracle: v0.15.0 (experimental - AI-Complete)
- Added EXPERIMENTAL warning to CodeTranslationGenerator

Updated risk matrix with 3 new mitigations.
Spec version bumped to 1.1.0.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
#74)

Adds Toyota Jidoka-inspired Andon system for synthetic data generation:

EXTREME TDD Implementation:
- 37 new tests (30 andon + 7 config integration)
- RED phase: Write failing tests first
- GREEN phase: Implement to pass tests
- All 1859 tests passing

New Components:
- AndonHandler trait: Customizable event handling
- AndonEvent enum: HighRejectionRate, QualityDrift, DiversityCollapse
- AndonSeverity: Info/Warning/Critical levels
- DefaultAndon: Production handler (logs + halts on critical)
- TestAndon: Silent collector for unit tests
- AndonConfig: Configuration with thresholds

SyntheticConfig Integration:
- Added andon field with AndonConfig
- Builder methods: with_andon(), with_andon_enabled(), with_andon_rejection_threshold()
- Default: enabled=true, rejection_threshold=0.90 (Toyota standard)

Pipeline Integration:
- check_andon() function validates generation quality
- Halts on >92% rejection rate (threshold + 2% tolerance)
- Warns on diversity collapse (< minimum threshold)

Addresses review feedback from automl-with-synthetic-data-review.md:
- [Toyota] Andon alert for high rejection rates ✓
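
Reconstructed from the component list above, a runnable sketch of the Andon surface; the exact signatures are assumptions, not the crate's verified API:

    #[derive(Debug, Clone, Copy, PartialEq)]
    enum AndonSeverity { Info, Warning, Critical }

    #[derive(Debug)]
    enum AndonEvent {
        HighRejectionRate(f64),
        QualityDrift(f64),
        DiversityCollapse(f64),
    }

    trait AndonHandler {
        // Return true to halt the pipeline (Jidoka: stop the line on defects).
        fn handle(&mut self, event: AndonEvent, severity: AndonSeverity) -> bool;
    }

    struct DefaultAndon;

    impl AndonHandler for DefaultAndon {
        fn handle(&mut self, event: AndonEvent, severity: AndonSeverity) -> bool {
            eprintln!("[andon] {severity:?}: {event:?}"); // log every event
            matches!(severity, AndonSeverity::Critical)   // halt only on critical
        }
    }

    fn main() {
        let mut andon = DefaultAndon;
        // >90% rejection is critical under the default threshold above.
        let halt = andon.handle(AndonEvent::HighRejectionRate(0.93), AndonSeverity::Critical);
        assert!(halt);
    }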

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Phase 2 of AutoML with Synthetic Data specification:

EDA Generator (Wei & Zou, 2019):
- Synonym replacement with shell command vocabulary
- Random insertion, swap, and deletion operations
- Deterministic LCG-based randomness for reproducibility
- Jaccard similarity for quality scoring
- 34 unit tests with EXTREME TDD

Template Generator:
- Slot-based pattern filling with weighted templates
- shell_commands() preset for CLI training data
- Diversity scoring via unique token ratio
- 24 unit tests with EXTREME TDD

Both implement SyntheticGenerator trait for pipeline integration.
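
The Jaccard quality metric named above is small enough to show whole: token-set intersection over union.

    use std::collections::HashSet;

    // Token-level Jaccard similarity: |A ∩ B| / |A ∪ B| over whitespace tokens.
    fn jaccard(a: &str, b: &str) -> f64 {
        let sa: HashSet<&str> = a.split_whitespace().collect();
        let sb: HashSet<&str> = b.split_whitespace().collect();
        if sa.is_empty() && sb.is_empty() { return 1.0; }
        let inter = sa.intersection(&sb).count() as f64;
        let union = sa.union(&sb).count() as f64;
        inter / union
    }

    fn main() {
        // Original vs an EDA variant with one replaced token: 3 shared of 5 total.
        let score = jaccard("git commit -m msg", "git commit -m message");
        assert!((score - 3.0 / 5.0).abs() < 1e-9);
    }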

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Phase 3 of AutoML with Synthetic Data specification:

ShellSample struct:
- Command with context (history, cwd, prefix, completion)
- Extraction helpers (command_name, arguments)
- Completion validity checking

ShellGrammar:
- Command/subcommand validation (git, cargo, npm, docker, Unix)
- Common options recognition
- Extensible via add_command/add_subcommands

ShellSyntheticGenerator implementing SyntheticGenerator:
- Template substitution (argument variants)
- Argument permutation (reorder/add options)
- Context variation (cwd, history)
- Quality scoring: 0.4*semantic + 0.4*grammar + 0.2*coherence
- Diversity scoring via unique command patterns

42 tests with Extreme TDD methodology.
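
The composite quality score above is a plain weighted sum; the three component scores are assumed normalized to [0, 1]:

    // 0.4*semantic + 0.4*grammar + 0.2*coherence, per the bullet above.
    fn quality_score(semantic: f64, grammar: f64, coherence: f64) -> f64 {
        0.4 * semantic + 0.4 * grammar + 0.2 * coherence
    }

    fn main() {
        // A grammatical, on-topic but low-coherence sample.
        let q = quality_score(0.9, 1.0, 0.5);
        assert!((q - 0.86).abs() < 1e-9);
    }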

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…efs #74)

Implement three advanced synthetic data generation components:

- MixUp generator: Zhang et al. 2018 embedding interpolation with Beta
  distribution sampling and configurable alpha parameter (24 tests)
- WeakSupervision generator: Snorkel-style programmatic labeling with
  LabelingFunction trait, multiple aggregation strategies (MajorityVote,
  WeightedVote, Unanimous, Any), and built-in LFs (29 tests)
- SyntheticCache: LRU eviction memoization for avoiding redundant
  generation during AutoML hyperparameter search (18 tests)

Total: 71 new tests, 2030 tests passing
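
The core of the MixUp generator is the Zhang et al. (2018) interpolation; shown here with lambda passed in directly to keep the sketch dependency-free (the generator draws it from Beta(alpha, alpha)):

    // x' = lambda * x1 + (1 - lambda) * x2, elementwise over embeddings.
    fn mixup(x1: &[f32], x2: &[f32], lambda: f32) -> Vec<f32> {
        assert_eq!(x1.len(), x2.len());
        x1.iter()
            .zip(x2)
            .map(|(a, b)| lambda * a + (1.0 - lambda) * b)
            .collect()
    }

    fn main() {
        let mixed = mixup(&[1.0, 0.0], &[0.0, 1.0], 0.7);
        assert_eq!(mixed, vec![0.7, 0.3]);
    }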

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add comprehensive model bundling and memory paging support:

## Model Bundling (.apbundle format)
- Binary format with magic bytes, version, and manifest
- BundleReader/BundleWriter for efficient file I/O
- ModelBundle API for creating, saving, and loading bundles
- Builder pattern for flexible bundle construction
- Support for multiple models with metadata

## Memory-Mapped File Support
- MappedRegion for efficient memory access
- MemoryMappedFile with region caching
- PageTable for LRU/LFU tracking

## LRU Paging
- PagedBundle for memory-constrained environments
- Configurable max_memory and eviction strategies
- LRU (Least Recently Used) and LFU (Least Frequently Used) eviction
- Automatic page eviction when memory limit exceeded

## Pre-fetching
- Access pattern tracking for predictive loading
- Configurable prefetch_count
- Hint API for explicit prefetch requests

## Also included:
- Synthetic data integration tests (15 tests)
- Synthetic data generation example
- Updated spec status to "Implemented (Phases 1-4)"
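
A minimal model of the LRU eviction described under PagedBundle: evict least-recently-used pages until usage fits the limit. Bookkeeping is simplified; this is not the crate's internal structure:

    use std::collections::VecDeque;

    struct Lru {
        max_bytes: usize,
        used_bytes: usize,
        order: VecDeque<(u32, usize)>, // (page_id, size); front = least recent
    }

    impl Lru {
        fn touch(&mut self, page: u32, size: usize) {
            // Re-touch: drop the old entry so the page moves to most-recent.
            if let Some(pos) = self.order.iter().position(|&(p, _)| p == page) {
                let (_, s) = self.order.remove(pos).unwrap();
                self.used_bytes -= s;
            }
            self.order.push_back((page, size));
            self.used_bytes += size;
            // Evict least-recently-used pages until under the memory limit.
            while self.used_bytes > self.max_bytes {
                match self.order.pop_front() {
                    Some((_, s)) => self.used_bytes -= s,
                    None => break,
                }
            }
        }
    }

    fn main() {
        let mut lru = Lru { max_bytes: 100, used_bytes: 0, order: VecDeque::new() };
        for p in 0..5 { lru.touch(p, 30); } // 150 bytes requested, 100-byte limit
        assert!(lru.used_bytes <= 100);
        assert_eq!(lru.order.len(), 3); // only the 3 most recent pages remain
    }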

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…74)

Update spec status to reflect complete implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add PagedMarkovModel using aprender's bundle module for memory-efficient storage
- Implement LRU-based on-demand segment loading
- Add --memory-limit CLI flag to train, suggest, and stats commands
- Add 13 comprehensive tests for paged model functionality
- Fix doctest in synthetic/mixup.rs (missing Clone derive)

The paged model stores n-gram segments separately and loads them
on-demand, enabling handling of shell histories that exceed RAM.

Refs #74

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add comprehensive case study for bundle module
- Update shell-completion chapter with paging documentation
- Add bundle_trace_demo example for renacer tracing
- Update SUMMARY.md with new chapter

Refs #74

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add comprehensive guide for using renacer syscall tracer to profile
and optimize memory paging behavior in ML model loading.

Content includes:
- Renacer usage patterns (-e trace=file, -T, -c, -s flags)
- Syscall analysis for detecting evictions and cache misses
- Pre-fetch effectiveness measurement
- JSON output for programmatic analysis
- Optimization patterns (reduce seeks, right-size memory, pre-fetching)
- Troubleshooting guide with symptom/fix table

Also adds book chapters for bundle_trace_demo and synthetic_data_generation
examples to satisfy EXTREME TDD requirements.

Allows clippy::large_stack_arrays lint for ML test data arrays.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…77)

Implements two new synthetic data components for code analysis:

CodeEDA (GH-76):
- Code-specific EDA (Easy Data Augmentation) implementing SyntheticGenerator
- Variable renaming with synonym dictionary
- Comment insertion (Rust/Python/Generic modes)
- Statement reordering for independent statements
- Dead code removal (comments and whitespace)
- Quality scoring via token overlap
- 23 unit tests

CodeFeatureExtractor (GH-77):
- 8-dimensional commit feature extraction for defect prediction
- CommitFeatures: defect_category, files_changed, lines_added/deleted,
  complexity_delta, timestamp, hour_of_day, day_of_week
- Keyword-based commit classification (bug/security/perf/refactor)
- Batch extraction and normalization support
- 22 unit tests

References:
- Wei & Zou (2019) EDA paper
- D'Ambros et al. (2012) defect prediction benchmark
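
The 8-dimensional vector above, reconstructed as a plain struct; field types and the category encoding are assumptions:

    #[derive(Debug, Clone)]
    struct CommitFeatures {
        defect_category: u8,  // assumed encoding: 0=general, 1=bug, 2=security, 3=perf, 4=refactor
        files_changed: u32,
        lines_added: u32,
        lines_deleted: u32,
        complexity_delta: f32,
        timestamp: i64,       // unix seconds
        hour_of_day: u8,      // 0-23
        day_of_week: u8,      // 0-6
    }

    impl CommitFeatures {
        // Flatten to the 8-dimensional vector used for defect prediction.
        fn to_vec(&self) -> [f32; 8] {
            [
                self.defect_category as f32,
                self.files_changed as f32,
                self.lines_added as f32,
                self.lines_deleted as f32,
                self.complexity_delta,
                self.timestamp as f32,
                self.hour_of_day as f32,
                self.day_of_week as f32,
            ]
        }
    }

    fn main() {
        let f = CommitFeatures {
            defect_category: 1, files_changed: 3, lines_added: 42, lines_deleted: 7,
            complexity_delta: 1.5, timestamp: 1_735_689_600, hour_of_day: 14, day_of_week: 2,
        };
        assert_eq!(f.to_vec().len(), 8);
    }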

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…76, Refs #77)

- Add --use-code-eda flag to Augment command for code-aware augmentation
- Add new Analyze command using CodeFeatureExtractor
  - Shows command categories (bug/security/performance/refactor/general)
  - Displays top base commands with visual bar charts
  - Shows sample commands by category
  - Reports complexity metrics (avg tokens, max tokens, unique bases)
  - Identifies developer workflow (git, cargo, npm, docker usage)
- Add 3 integration tests for new features

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…Refs #74)

Benchmarks (modeled after bashrs patterns):
- parse_history: History file parsing throughput
- train_model: N-gram model training (small/medium/large fixtures)
- suggest_latency: Suggestion performance for common prefixes
- partial_completion: Partial token completion benchmarks
- serialization: JSON and file save/load benchmarks
- end_to_end: Complete workflow benchmarks
- synthetic_generation: CodeEDA augmentation benchmarks

Fixtures (aligned with bashrs):
- small_history.txt: ~50 commands (basic developer workflow)
- medium_history.txt: ~265 commands (full developer workflow)
- large_history.txt: ~3800 commands (production scale)

Real-world tests (19 new tests):
- REAL_001-003: Small/Medium/Large history training and suggestions
- REAL_004: Cross-validation testing
- REAL_005: Data augmentation with CodeEDA
- REAL_006: Analysis command testing
- REAL_007: Export/import roundtrip
- REAL_008: Paged model for large histories
- REAL_009: Incremental updates
- REAL_010: End-to-end user workflow

Architecture changes:
- Added lib.rs to expose modules for benchmarks
- Refactored main.rs to use library imports
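
The shape of one group from the suite (suggest_latency), written as a generic criterion sketch; the "model" here is a placeholder list, not the crate's n-gram model:

    use criterion::{criterion_group, criterion_main, Criterion};

    fn suggest_latency(c: &mut Criterion) {
        // Placeholder stand-in for a trained model: a list searched by prefix.
        let commands = vec!["git checkout", "git commit", "git push"];
        c.bench_function("suggest_git_prefix", |b| {
            b.iter(|| {
                commands.iter().filter(|cmd| cmd.starts_with("git c")).count()
            })
        });
    }

    criterion_group!(benches, suggest_latency);
    criterion_main!(benches);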

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…rks (Refs #74)

Sub-10ms Verification Benchmark Suite:

Performance Results (vs 10ms target):
- Small model (50 cmds):  437ns - 1.5µs (6,500-22,000x faster)
- Medium model (500 cmds): 530ns - 10.6µs (940-18,800x faster)
- Large model (5000 cmds): 670ns - 15µs (660-14,900x faster)

Benchmark Groups:
- suggestion_latency: Core latency verification by model size
- partial_completion: Mid-word completion (git co → git commit)
- training_throughput: Commands/second during training
- cold_start: Model load + first suggestion latency
- serialization: JSON serialize/deserialize performance
- scalability: Latency growth with model size (O(1) verified)
- paged_model: Memory-constrained model performance

Industry Comparison:
- GitHub Copilot: 100-500ms → aprender 10,000-50,000x faster
- Fish completion: 5-20ms → aprender 500-2,000x faster
- Zsh compinit: 10-50ms → aprender 1,000-5,000x faster

Run: cargo bench --package aprender-shell --bench recommendation_latency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
#74)

Updated shell-completion.md:
- Added "Performance: Sub-10ms Verification" section
- Detailed benchmark results table (437ns - 14.6µs latency)
- Industry comparison (600-22,000x faster than alternatives)
- "Why So Fast?" explanation (O(1) trie, no neural overhead)
- Benchmark suite overview

New chapter: shell-completion-benchmarks.md
- Comprehensive benchmark analysis
- trueno-style criterion patterns
- Scalability analysis (sub-linear O(log n))
- Training throughput metrics
- Cold start verification (<3ms)
- Fixture design documentation
- Custom benchmark extension guide
- CI integration example

Key results documented:
- Worst case: 14.6 µs (685x under 10ms target)
- Best case: 437 ns (22,883x under 10ms target)
- Scales sub-linearly with model size

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add dedicated book chapters for the new code-aware synthetic data modules:

- CodeEDA: Syntax-aware data augmentation for source code
  - Variable renaming, comment insertion, statement reorder
  - Language-specific reserved keyword handling (Rust, Python)
  - Quality and diversity metrics

- CodeFeatureExtractor: 8-dimensional commit feature extraction
  - Defect category classification (bug, security, perf, refactor)
  - Complexity estimation, time-based features
  - Normalization for ML pipelines

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Change alimentar from local path dependency to crates.io v0.1.0
for publishing compatibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Change aprender dependency from path to crates.io v0.10.0
- Add README.md for crate documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
## Metaheuristics (Refs #80)
- Add src/metaheuristics/ module with Differential Evolution (DE)
- SearchSpace enum for continuous/discrete/mixed optimization
- ComputeBudget for resource-aware optimization
- PerturbativeMetaheuristic trait following Toyota Way principles
- Book documentation for DE and metaheuristics fundamentals
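
For reference, the textbook DE/rand/1 mutation step at the heart of Differential Evolution; the module's actual implementation is not shown in this PR:

    // Trial vector from three distinct population members and scale factor F:
    // v = a + F * (b - c)
    fn de_rand_1(a: &[f64], b: &[f64], c: &[f64], f: f64) -> Vec<f64> {
        a.iter().zip(b).zip(c).map(|((a, b), c)| a + f * (b - c)).collect()
    }

    fn main() {
        let trial = de_rand_1(&[1.0, 2.0], &[3.0, 4.0], &[2.0, 1.0], 0.5);
        assert_eq!(trial, vec![1.5, 3.5]);
    }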

## aprender-shell Enhancements (Refs #87, #88, #96)
- Fish shell widget support (fish-widget command)
- Uninstall command for clean widget removal
- ZSH widget v2 with toggle, timeout, ShellCheck fixes
- New CLI integration tests

## AutoML Enhancements
- Expanded search.rs with advanced hyperparameter optimization
- Grid search, random search, and TPE improvements
- Fixed clippy warnings (range contains, format strings)

## Documentation
- aprender-shell-harden-plan.md spec (16 issues, Toyota Way, 10 refs)
- metaheuristics-spec.md with CEC benchmarks
- Updated roadmap.yaml

## Quality
- 382 tests passing
- 92.66% coverage
- Clippy clean (-D warnings)
- PMAT: A+ (151/134), TDG: A+ (99/100)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
… unsafe)

POLICY: We will NEVER use unsafe code. If HE crypto primitives are needed,
we will implement them from scratch in safe Rust.

Additions:
- docs/specifications/homomorphic-encryption-spec.md (10 peer-reviewed citations)
- book/src/examples/shell-encryption-tiers.md (4-tier protection guide)
- src/format/homomorphic.rs (28 tests: types, traits, API design)
- Shell Tier 2 compression: save_compressed() (5 tests)
- Shell Tier 2+3 combo: save_compressed_encrypted()

4-Tier Model Protection:
- Tier 1: Plain (.apr)
- Tier 2: Compressed (zstd, 14x smaller)
- Tier 3: At-rest encrypted (AES-256-GCM)
- Tier 4: Homomorphic (API ready, crypto deferred)

Test counts:
- Core aprender: 2,292 tests (with format-homomorphic)
- aprender-shell: 127 tests (+5 compression)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add src/ensemble/ module with MoE, SoftmaxGating, MoeConfig
- Add ModelType::MixtureOfExperts (0x0040) to format
- Add examples/mixture_of_experts.rs runnable example
- Add book/src/examples/mixture-of-experts.md documentation
- Update model-format.md with MoE section and model type
- Fix Makefile coverage (move config before clean for sccache)
- Add docs/specifications/more-learning-specs.md (34 sections)
  - GAN, VAE, Diffusion, Contrastive, GNN, Meta-learning
  - Transfer learning for transpiler ecosystem
  - Distillation ingestion from entrenar
  - Code-specific ML for depyler oracle

Refs #101
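
SoftmaxGating in its textbook form, numerically stabilized by subtracting the max score; the module's real signature may differ:

    // Turn per-expert scores into mixture weights that sum to 1.
    fn softmax_gate(scores: &[f32]) -> Vec<f32> {
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        exps.iter().map(|e| e / sum).collect()
    }

    fn main() {
        let w = softmax_gate(&[1.0, 2.0, 3.0]);
        assert!((w.iter().sum::<f32>() - 1.0).abs() < 1e-6);
        assert!(w[2] > w[1] && w[1] > w[0]); // higher score, higher weight
    }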

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
100 test cases covering:
- Installation (5)
- train command (17)
- update command (8)
- suggest command (14)
- stats command (6)
- export/import (10)
- validate command (10)
- augment command (8)
- analyze command (6)
- tune command (6)
- zsh-widget (4)
- Edge cases (6)
- Performance benchmarks (5)
- Platform compatibility (5)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
New features:
- Mixture of Experts (MoE) ensemble module
- ModelType::MixtureOfExperts (0x0040)
- Future ML specs (34 sections)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Update aprender dependency from path to crates.io v0.11
- Ready for v0.2.0 release

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix trivial cast lint error in mmap.rs:611 that broke CI
- Update hero image: 17 → 18 model types (MoE added)
- Update hero image version: v0.9 → v0.11

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
noahgift and others added 13 commits January 1, 2026 15:37
…ers (Refs PAR-001)

- Create examples-reference.md with complete cargo run --example reference
- Organize 87 examples by category: Supervised, Unsupervised, Deep Learning,
  Time Series, NLP, Graph, Optimization, APR Format, AutoML, and more
- Add case study chapters for: logic-family-tree, mem-test, mem-test-full,
  phi-hf-import, qwen-apr-native, qwen-chat, qwen-inference, whisper-transcribe
- Update SUMMARY.md to include new examples reference page

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… PAR-001)

- Migrate APR format from v1 (APRN) to v2 (APR2) magic
- Update trueno 0.9.0 → 0.10.1 (thiserror 2.x compatibility)
- Update renacer 0.8 → 0.9.1
- Fix integration tests for v2 format (INT-01b, CC1)
- Bump version to 0.20.2

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…-001)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…efs PAR-001)

- Add conditional cfg for LlamaTokenizer import in chat.rs
- Add allow attributes for format_push_string and unnecessary_wraps
- Configure apr-cli specific clippy allows in Cargo.toml
- Fix formatting in create_test_apr.rs

All 5885 unit tests and 11 integration tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add PAR-011: Add --gpu flag to run/serve commands - ✅ DONE
- Document --gpu flag implementation details
- Mark PAR-011 as complete in next priority section

The --gpu flag enables forced CUDA acceleration for:
- `realizar run model.gguf --gpu "prompt"`
- `realizar serve --model model.gguf --gpu`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- object 0.38.0 -> 0.38.1
- zmij 1.0.6 -> 1.0.7

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…PAR-023)

- Use realizar's full inference API for GGUF serving
- Endpoints: /generate, /stream/generate, /v1/completions
- Performance targets: 100+ tok/s CPU, 500+ tok/s GPU
- Add Ollama-parity benchmark suite
- Fix clippy warnings in federation module
- Update autograd backward pass to use trueno SIMD

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…-023)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Updated trueno dependency to 0.11.0
- Benefiting from improved AVX-512 coverage and TUI monitoring

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add transparent compression/decompression for .apr model files:

- APR2 format: compressed payload with auto-detection
- LZ4: fast compression for real-time use cases
- ZSTD: higher ratio for cold storage
- Backward compatible: APR1 files still work
- Feature-gated: requires `format-compression` feature

API:
- AprWriter::with_compression(Compression::Lz4)
- AprReader::from_bytes() auto-detects format
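
A sketch of the auto-detection idea behind AprReader::from_bytes(), dispatching on the magic bytes named elsewhere in this PR (APRN for v1, APR2 for v2); the enum and function are illustrative:

    #[derive(Debug, PartialEq)]
    enum AprFormat { V1Plain, V2MaybeCompressed, Unknown }

    // Inspect the leading magic to pick a decode path; v2 payloads may then
    // carry LZ4- or ZSTD-compressed tensors per the bullets above.
    fn detect(bytes: &[u8]) -> AprFormat {
        if bytes.starts_with(b"APRN") {
            AprFormat::V1Plain
        } else if bytes.starts_with(b"APR2") {
            AprFormat::V2MaybeCompressed
        } else {
            AprFormat::Unknown
        }
    }

    fn main() {
        assert_eq!(detect(b"APR2...."), AprFormat::V2MaybeCompressed);
        assert_eq!(detect(b"APRN...."), AprFormat::V1Plain);
    }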

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Bumps [colored](https://github.com/mackwic/colored) from 2.2.0 to 3.0.0.
- [Release notes](https://github.com/mackwic/colored/releases)
- [Changelog](https://github.com/colored-rs/colored/blob/master/CHANGELOG.md)
- [Commits](colored-rs/colored@v2.2.0...v3.0.0)

---
updated-dependencies:
- dependency-name: colored
  dependency-version: 3.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot bot (Contributor, Author) commented on behalf of github on Jan 5, 2026

Labels

The following labels could not be found: dependencies, rust. Please create them before Dependabot can add them to a pull request.

Please fix the above issues or remove invalid values from dependabot.yml.

dependabot bot commented on behalf of github on Jan 19, 2026

Dependabot can't resolve your Rust dependency files. Because of this, Dependabot cannot update this pull request.

6 similar comments followed between Jan 30 and Feb 2, 2026, each repeating: Dependabot can't resolve your Rust dependency files. Because of this, Dependabot cannot update this pull request.

dependabot bot commented on behalf of github on Mar 20, 2026

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version. You can also ignore all major, minor, or patch releases for a dependency by adding an ignore condition with the desired update_types to your config file.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.

dependabot bot deleted the dependabot/cargo/colored-3.0.0 branch on March 20, 2026 at 16:52
noahgift added a commit that referenced this pull request Apr 25, 2026
… + fix flash_decode_seq_lens length-mismatch (Bug 2) (#1053)

* fix(ship-007): stabilize parity-gate reproducer + fix flash_decode_seq_lens length-mismatch (Bug 2)

Two paired fixes that unblock the SHIP-007 (GPU GQA-7:1 attention parity)
debugging route. The actual GQA-7:1 cosine=-0.005 divergence on the 7B
Q4_K teacher remains a separate multi-session fix; this PR delivers
the clean reproducer + clean eager debugging path.

## Fix 1 — APR_SKIP_FP8_WARMUP env var (cublas_prefill/attention.rs)

The cuBLASLt FP8 JIT warmup at PMAT-082 (attention.rs:1402) intermittently
produces `CUDA_ERROR_ILLEGAL_ADDRESS (code: 700)` on the 7B Q4_K teacher
(3584-hidden) which poisons the CUDA context and makes downstream
debugging instrumentation fire unreliably. Adding an opt-in
`APR_SKIP_FP8_WARMUP=1` env var skips the warmup so `apr parity` runs
deterministically. Default (env unset) preserves production warmup.

A clear `[PMAT-082] FP8 JIT warmup SKIPPED ...` log fires when the env
var is set, naming the trade-off (~3ms first-FP8-inference latency).

## Fix 2 — flash_decode_seq_lens_buf length-mismatch (Bug 2)

`flash_decoding_graphed.rs:102` had:

    buf.copy_from_host(&[seq_len])?;

where `buf` is `flash_decode_seq_lens_buf` — sized for `max_batch=32`
(per `batch.rs:444-445`) so it can serve M>1 batched graph replay. The
`copy_from_host` API requires exact length match (`data.len() == self.len`),
so a 1-element host slice copying into a 32-element device buffer fails
with `Length mismatch: host 1 vs device 32`. This blocked the whole
`SKIP_CUDA_GRAPH=1` debugging path — the parity gate aborted at layer 0
without ever reaching the per-layer GH-559 / GQA-DEBUG instrumentation.

Fix: pad the host slice to `buf.len()`, write `seq_len` into slot 0;
remaining slots are zero-initialised at allocation time and the kernel
ignores them in single-batch mode. Behaviour preserved for the existing
M>1 batched case (since this is the M=1 single-token decode path).
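
The invariant and the fix, mocked end to end; DeviceBuf is a stand-in type, and only the exact-length rule and the padded-slot-0 shape are taken from the description above:

    struct DeviceBuf { data: Vec<u32> }

    impl DeviceBuf {
        // Mirrors the exact-length contract: data.len() == self.len or error.
        fn copy_from_host(&mut self, host: &[u32]) -> Result<(), String> {
            if host.len() != self.data.len() {
                return Err(format!(
                    "Length mismatch: host {} vs device {}", host.len(), self.data.len()
                ));
            }
            self.data.copy_from_slice(host);
            Ok(())
        }
    }

    fn main() {
        let mut buf = DeviceBuf { data: vec![0; 32] }; // sized for max_batch=32
        let seq_len = 17u32;

        // Before the fix: 1-element host slice into a 32-slot buffer fails.
        assert!(buf.copy_from_host(&[seq_len]).is_err());

        // After the fix: pad to buf length, seq_len in slot 0, zeroed tail.
        let mut host = vec![0u32; buf.data.len()];
        host[0] = seq_len;
        assert!(buf.copy_from_host(&host).is_ok());
    }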

## Verified live (RTX 4090, noah-Lambda-Vector)

Before:
    SKIP_CUDA_GRAPH=1 apr parity ...qwen2.5-coder-7b-instruct-q4k.gguf
    → flaky: either "Length mismatch: host 1 vs device 32" OR
      "CUDA_ERROR_ILLEGAL_ADDRESS code: 700" depending on context state

After:
    APR_SKIP_FP8_WARMUP=1 SKIP_CUDA_GRAPH=1 apr parity ...
    → DETERMINISTIC: reaches the actual cosine = -0.005 PARITY-GATE
      FAILED diagnostic with full per-layer GH-559 + GQA-DEBUG dumps
      working in BOTH CPU (CPU_DEBUG=1, APR_TRACE_LAYERS=1) and GPU
      (GPU_DEBUG=1, GPU_DEBUG_ALL_LAYERS=1) directions.

## Side-by-side CPU vs GPU layer-0 OUTPUT (now visible)

    CPU first 5: [-0.49966, -0.23934, 0.11330, 0.04988, -0.01252]
    GPU first 5: [-0.49252, -0.26392, 0.12710, 0.10487, -0.02172]
    CPU sum=-1.968, GPU sum=-3.215  (~63% divergence)

This pinpoints layer 0 as the divergence start — INPUTS match exactly
(both [-0.0142, -0.0084, 0.0199, ...]) but OUTPUTS diverge after the
attention + FFN block. Next-session work: identify whether RMSNorm,
Q4K GEMV, RoPE, attention, output projection, or FFN is the culprit.

## What this PR does NOT include

- The actual GQA-7:1 cosine=-0.005 fix. That is a separate
  multi-session effort tracked in
  `memory/project_ship_007_attention_parity_investigation.md`.
- A regression-guard test for the GQA-7:1 case. Per "no half fixes":
  authoring an arithmetic-only test that wouldn't catch the actual
  layout/precision bug class would be a half-fix. The next session
  should author the test ALONGSIDE the actual fix.

## Closes

Task #147 (reproducer stabilization). Task #146 (full SHIP-007 fix)
remains in progress with substantially advanced investigation memo.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(ship-007): extend CPU+GPU debug prints from first 5 → first 16

Continuation of PR #1053 SHIP-007 investigation work. The 7th-pass
analysis showed element 3 of layer 0 OUTPUT diverges by 110% (CPU=0.05
vs GPU=0.10), but the existing GPU `[PAR-058-L0]` and CPU `[PMAT-114-GGUF]`
debug prints only emitted "first 3" or "first 5" elements — exactly at
the boundary of where the divergence is strongest, hiding it.

Extended GPU-side prints in:
  - `cuda/executor/layers/gemv_dispatch.rs:19` (PAR-058-L*)
  - `cuda/executor/layers/apply.rs:52, 76` (RMSNorm OK / Q4K Input)
  - `cuda/executor/kv_scatter.rs:619, 627, 639, 641, 653`
    (PAR-058-ATTN K/Q/V input + cache values)

Extended CPU-side prints in:
  - `gguf/inference/forward/debug.rs` (CPU_DEBUG / APR_TRACE_LAYERS paths)
  - `gguf/inference/forward/ffn_block.rs` (mirror of debug.rs for the
    other forward variant)

All "first 5" / "first5" / "[..5.min(...)]" patterns updated to 16.
No semantic change — just longer debug output windows. Strictly opt-in
via the existing env var gates (`GPU_DEBUG=1`, `GPU_DEBUG_ALL_LAYERS=1`,
`CPU_DEBUG=1`, `APR_TRACE_LAYERS=1`).
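
The pattern change in isolation: widen a clamped debug window from the first 5 to the first 16 elements.

    // "[..5.min(len)]" became "[..16.min(len)]"; clamping keeps short slices safe.
    fn first_n(xs: &[f32], n: usize) -> &[f32] {
        &xs[..n.min(xs.len())]
    }

    fn main() {
        let xs: Vec<f32> = (0..8).map(|i| i as f32).collect();
        println!("first 16 (clamped): {:?}", first_n(&xs, 16)); // prints all 8
    }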

## Findings unlocked by this extension

Side-by-side CPU vs GPU layer 0 OUTPUT first 16 (RTX 4090, 7B Q4_K):

  idx    CPU         GPU         |Δ|         %
  ---    ---------   ---------   --------    ----
  0      -0.49966    -0.49252    0.00714     1.4%
  1      -0.23934    -0.26392    0.02458    10.3%
  2       0.11330     0.12710    0.01380    12.2%
  3       0.04988     0.10487    0.05498   110.0%
  4      -0.01252    -0.02172    0.00920    73.5%
  5      -0.51461    -0.49799    0.01662     3.2%
  6       0.42996     0.44902    0.01906     4.4%
  7      -0.41983    -0.42396    0.00413     1.0%
  8      -0.09195    -0.05816    0.03379    36.7%
  ...    ...         ...         ...        ...
  15      0.62939     0.64686    0.01747     2.8%

Mean |Δ| across 16 elements ≈ 0.020 (2.0%). Pattern is NOT
consistent with a head-permutation bug (would be uniform within
head_dim=128 chunks). It IS consistent with **accumulated FP32 noise
from multiple Q4K dequant + matmul operations** through layer 0
(Q-proj, K-proj, V-proj, attention, output-proj, SwiGLU, FFN gate/up/down,
then through residuals).

Compounded over 28 layers, a 2% per-layer divergence grows as
1.02^28 ≈ 1.74, which would explain a large divergence at the logits.
The 110% outlier at element 3 specifically points to one of:
  - SwiGLU non-linearity amplifying near-zero values
  - FFN_down accumulator order on element 3 specifically
  - Q4K dequant disagreement between CPU SIMD and GPU PTX

## What this PR still does NOT include

The actual GQA-7:1 attention parity fix. Memory:
`memory/project_ship_007_attention_parity_investigation.md` 8-pass
investigation thread documents the next-session entry points:
1. Add CPU intermediate prints (SwiGLU output, FFN_down output, Residual1)
   to mirror the new GPU `[PAR-058-L0]` first-16 prints.
2. Once CPU intermediates are visible, identify the specific stage where
   CPU and GPU first diverge by >50% on element 3.
3. Inspect that specific CPU vs GPU implementation (Q4K dequant vs
   SwiGLU vs FFN_down) for the bug.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>