feat: Init v0.1.30 by ncrispino · Pull Request #701 · massgen/MassGen

ncrispino · 2025-12-26T06:02:15Z

PR Title Format

Your PR title must follow the format: <type>: <brief description>

Valid types:

fix: - Bug fixes
feat: - New features
breaking: - Breaking changes
docs: - Documentation updates
refactor: - Code refactoring
test: - Test additions/modifications
chore: - Maintenance tasks
perf: - Performance improvements
style: - Code style changes
ci: - CI/CD configuration changes

Examples:

fix: resolve memory leak in data processing
feat: add export to CSV functionality
breaking: change API response format
docs: update installation guide

Description

Brief description of the changes in this PR

Type of change

Checklist

I have run pre-commit on my changed files and all checks pass
My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Pre-commit status

# Paste the output of running pre-commit on your changed files:
# uv run pre-commit install
# git diff --name-only HEAD~1 | xargs uv run pre-commit run --files # for last commit
# git diff --name-only origin/<base branch>...HEAD | xargs uv run pre-commit run --files # for all commits in PR
# git add <your file> # if any fixes were applied
# git commit -m "chore: apply pre-commit fixes"
# git push origin <branch-name>

How to Test

Add test method for this PR.

Test CLI Command

Write down the bash command used for testing. If there are prerequisites, please highlight them.

Expected Results

Description/screenshots of expected results.

Additional context

Add any other context about the PR here.

Reverts the default behavior of _retrieval_exclude_recent from True to False in SingleAgent and CLI configuration. The previous default, introduced in commit 81c0205 (Oct 28), was an optimization intended to prevent context duplication by waiting for context compression before querying memory. However, this caused a regression where agents in new sessions (or short conversations) would completely ignore persistent memory, breaking the expected behavior defined in tests from Oct 18 (c0ea2e9). This change restores the original contract where agents consult persistent memory immediately upon initialization, ensuring long-term memory is accessible from the first turn. The optimization remains available as a configurable option.
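To make the restored default concrete, here is a minimal sketch of how such a flag could gate memory retrieval. Only the flag name `_retrieval_exclude_recent` comes from the commit message; the `SketchAgent` class, the `_context_compressed` flag, and both methods are hypothetical stand-ins, not MassGen's actual `SingleAgent`.

```python
from dataclasses import dataclass, field


@dataclass
class SketchAgent:
    """Hypothetical stand-in for an agent with persistent memory."""

    persistent_memory: list[str] = field(default_factory=list)
    # Restored default: consult persistent memory from the first turn.
    _retrieval_exclude_recent: bool = False
    # Assumed flag: set once the conversation context has been compressed.
    _context_compressed: bool = False

    def should_retrieve(self) -> bool:
        """Decide whether persistent memory is queried on this turn."""
        if self._retrieval_exclude_recent:
            # Optimization path: wait for compression so facts still present
            # in the recent context are not duplicated.
            return self._context_compressed
        return True

    def retrieve(self) -> list[str]:
        """Return remembered facts, or nothing when retrieval is deferred."""
        return list(self.persistent_memory) if self.should_retrieve() else []
```

With the flag left at `False`, a brand-new session sees its stored facts immediately; with `True`, retrieval stays empty until compression has happened, which is the situation the regression describes.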

…olve test failures

Backend Fixes:
- Updated _register_custom_tools in base_with_custom_tool_and_mcp.py to correctly handle callable objects for custom tools, ensuring proper name extraction and registration.

File System Security:
- Expanded BINARY_FILE_EXTENSIONS in _constants.py to include additional video, audio, and executable formats (.m4v, .mpg, .mpeg, .o, .a, .class, .jar, .wma) to prevent LLM ingestion.

Test Suite Improvements:
- Fixed monkeypatching logic in test_backend_event_loop_all.py to patch modules directly.
- Updated test_custom_tools.py to match expected tool naming conventions.
- Added @pytest.mark.integration and temporary workspace handling to test_claude_code.py and test_claude_code_orchestrator.py.
- Fixed import error in test_gemini_planning_mode.py (renamed base_with_mcp to base_with_custom_tool_and_mcp).
- Updated expected Gemini model version in test_config_builder.py.
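The name-extraction problem for callable objects can be illustrated with a short sketch. `extract_tool_name` is a hypothetical helper written for this example; it is not the body of `_register_custom_tools`, whose actual logic is not shown in this PR page.

```python
import functools
from typing import Any, Callable


def extract_tool_name(tool: Callable[..., Any]) -> str:
    """Best-effort name for a function, a partial, or a callable instance."""
    # Unwrap functools.partial objects to reach the underlying callable.
    while isinstance(tool, functools.partial):
        tool = tool.func
    name = getattr(tool, "__name__", None)
    if name is None:
        # Instances that define __call__ have no __name__ of their own,
        # so fall back to the name of their class.
        name = type(tool).__name__
    return name
```

Plain functions resolve through `__name__`, while an instance of a class defining `__call__` resolves to its class name, which is the case that a naive `tool.__name__` lookup would fail on.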

Issue: Tests in test_final_presentation_fallback.py were failing with "object Mock can't be used in 'await' expression" because mock_agent.backend.filesystem_manager was an auto-generated Mock that the orchestrator tried to await.

Fix: Added explicit mock setup mock_agent.backend.filesystem_manager = None to all 3 test functions, which skips the snapshot copying logic in get_final_presentation().

Result: 3 Fail → 3 Pass
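The failure mode is easy to reproduce in isolation. In the sketch below, `copy_snapshot` and `save_snapshot` are hypothetical stand-ins for the snapshot step inside `get_final_presentation()`; only the `backend.filesystem_manager = None` setup mirrors the fix described above.

```python
import asyncio
from unittest.mock import Mock


async def copy_snapshot(agent) -> str:
    """Hypothetical stand-in for the orchestrator's snapshot copying step."""
    manager = agent.backend.filesystem_manager
    if manager is None:
        return "skipped"
    # Calling an attribute of a plain Mock returns another Mock, and
    # awaiting that object raises TypeError.
    await manager.save_snapshot()
    return "copied"


def run_with(agent) -> str:
    """Run the coroutine and report a TypeError instead of propagating it."""
    try:
        return asyncio.run(copy_snapshot(agent))
    except TypeError as exc:
        return f"TypeError: {exc}"


broken_agent = Mock()  # filesystem_manager is an auto-generated Mock
fixed_agent = Mock()
fixed_agent.backend.filesystem_manager = None  # explicit setup, as in the fix
```

`run_with(broken_agent)` reports the TypeError, while `run_with(fixed_agent)` takes the early-return branch. An `AsyncMock` would be another way to make the attribute awaitable, but setting it to `None` also avoids exercising the snapshot logic at all.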

Update test expectations to align with current implementation behavior:

Custom Tool Prefix Tests (6 clusters):
- Update tests to expect 'custom_tool__' prefix for registered tools
- Affected: test_add_tool_function_direct, test_add_tool_with_path, test_backend_registration (AG2), test_custom_tool_execution_flow, test_custom_tool_error_handling, and related schema/categorization tests

VHS Terminal Evaluation Tests (2 clusters):
- Add monkeypatch to mock VHS as installed in test_invalid_output_format
- Update expected Sleep duration from 10s to 2s in test_vhs_tape_creation

Error Message Validation Tests (2 clusters):
- Update Azure OpenAI regex to match new API key validation order
- Update PersistentMemory regex to match new llm_config validation message

Filesystem MCP Exclusion Tests (1 cluster + bonus):
- Update tests to expect 'filesystem' server IS present with limited tools (write_file, edit_file) instead of being completely excluded

Test suite results: 549 passed (+15), 15 failed (-17), 56 skipped

… with individual cluster processing and verifying their output, work on the next 10 clusters, report back when done

Triage and fix all 56 failing tests, bringing the suite from 518 to 562 passing.

Test Fixes:
- Update tests to expect `custom_tool__` prefix on registered tool names
- Fix async generator detection: use `inspect.isasyncgenfunction()` instead of `iscoroutinefunction()`
- Add missing mock attributes (filesystem_manager=None) to agent mocks
- Update regex patterns to match new validation message order
- Fix parameter name changes (context_paths, model defaults)
- Add missing binary file extensions (.m4v, .mpg, .mpeg, .o, .a, .class, .jar, .wma)
- Make directory listing tests resilient to pip metadata files (*.pyc, *.pyo)

Test Exclusions:
- Rename manual integration scripts to exclude from pytest discovery:
  - test_context_window_management.py → manual_context_window_management.py
  - test_filesystem_tool_integration.py → manual_filesystem_tool_integration.py

Deferred (xfail):
- 4 tests for MockClaudeCodeAgent missing orchestrator features
- 2 tests for removed backend methods (get_tool_definitions, get_tools_for_request)

Markers Added:
- @pytest.mark.integration for ClaudeCode tests requiring workspace cwd

Final: 562 passed, 58 skipped/xfailed, 0 failed
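The async generator detection fix rests on a distinction in the standard `inspect` module: a function declared with `async def` that contains `yield` is an async generator function, and `inspect.iscoroutinefunction()` returns `False` for it. A minimal sketch, with `fetch_once`, `stream_chunks`, and `classify` as illustrative names only:

```python
import inspect


async def fetch_once():
    """A coroutine function: awaited once, returns a single value."""
    return "done"


async def stream_chunks():
    """An async generator function: iterated with `async for`."""
    yield "chunk"


def classify(func) -> str:
    """Name the calling convention a function expects."""
    # Check for async generators explicitly, since the coroutine check
    # does not cover them.
    if inspect.isasyncgenfunction(func):
        return "async generator"
    if inspect.iscoroutinefunction(func):
        return "coroutine"
    return "sync"
```

Code that relies only on `iscoroutinefunction()` would classify `stream_chunks` as synchronous, which matches the kind of misdetection the commit message describes.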

- Support both Azure-specific and OpenAI-compatible endpoints in AzureOpenAIBackend
- Detect endpoint format and use appropriate client (AsyncAzureOpenAI vs AsyncOpenAI)
- Add environment variable expansion (${VAR}) in YAML/JSON config files
- Conditionally disable stream_options for Ministral/Mistral models
- Enhance error logging with detailed tracebacks
- Update azure_openai_multi.yaml config to use env vars for flexibility

This enables using Azure's services.ai.azure.com OpenAI-compatible endpoints alongside traditional cognitiveservices.azure.com endpoints.
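The `${VAR}` expansion can be sketched as a recursive pass over an already-parsed config. This is an assumption about the general approach, not the code from this PR: `expand_env` is a hypothetical helper, and leaving unset variables untouched is a choice made here for illustration (the real implementation may raise or substitute an empty string instead).

```python
import os
import re
from typing import Any

# Matches ${NAME} where NAME is a conventional environment variable name.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: Any) -> Any:
    """Recursively replace ${VAR} inside strings of a parsed YAML/JSON config."""
    if isinstance(value, str):
        # Assumption: unset variables are left as-is rather than raising.
        return _ENV_PATTERN.sub(
            lambda m: os.environ.get(m.group(1), m.group(0)), value
        )
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    # Numbers, booleans, and None pass through unchanged.
    return value
```

Running the pass after parsing, rather than on the raw file text, keeps values containing quotes or colons from corrupting the YAML or JSON structure.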

Comprehensive fix for 'Agent failed to use workflow tools' error:

1. Improved JSON extraction - Replace regex with brace-balanced parsing that handles nested objects. Supports complex nested structures.
2. Enhanced fallback pattern - Allow whitespace, escaped characters, and multi-line content in {"content": "..."} format.
3. Comprehensive error logging - Detailed logging at each stage with content samples and error details for debugging.
4. Three-tier extraction strategy:
   - Markdown code blocks (highest priority)
   - Balanced brace extraction (handles nested JSON)
   - Simple fallback pattern (for content-only responses)

Tested with various formats including nested tool calls, content with quotes, markdown blocks, and multiple JSON blocks.

Fixes false 'Agent failed to use workflow tools' errors where agents respond correctly but extraction regex fails.
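The three tiers listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `extract_json`, `balanced_objects`, and the regular expressions are hypothetical, and the logging described in item 3 is omitted.

```python
import json
import re
from typing import Iterator, Optional

FENCE_MARK = "`" * 3  # built at runtime so this example can sit inside a fence
_FENCE = re.compile(FENCE_MARK + r"(?:json)?\s*(.*?)" + FENCE_MARK, re.DOTALL)
_CONTENT_ONLY = re.compile(
    r'\{\s*"content"\s*:\s*"((?:[^"\\]|\\.)*)"\s*\}', re.DOTALL
)


def balanced_objects(text: str) -> Iterator[str]:
    """Yield top-level {...} spans, ignoring braces inside JSON strings."""
    depth, start, in_string, escaped = 0, -1, False, False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"' and depth > 0:
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                yield text[start : i + 1]


def _try_load(candidate: str) -> Optional[dict]:
    """Parse a candidate, accepting only JSON objects."""
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def extract_json(text: str) -> Optional[dict]:
    """Apply the three tiers in priority order; return the first object found."""
    # Tier 1: Markdown code blocks.
    for block in _FENCE.findall(text):
        parsed = _try_load(block.strip())
        if parsed is not None:
            return parsed
    # Tier 2: brace-balanced scan, which handles nested objects.
    for candidate in balanced_objects(text):
        parsed = _try_load(candidate)
        if parsed is not None:
            return parsed
    # Tier 3: content-only fallback for text that is not strictly valid JSON
    # (for example raw newlines). The captured string is returned undecoded.
    match = _CONTENT_ONLY.search(text)
    if match:
        return {"content": match.group(1)}
    return None
```

A single non-greedy regex cannot match nested braces reliably, which is why the second tier counts depth and tracks string state instead. The third tier only fires when strict parsing has already failed on every candidate.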

…ing and fallback patterns

Major improvements to workflow tool extraction reliability:

1. Enhanced logging throughout tool detection and extraction pipeline:
   - Log tool detection with counts and names at agent level
   - Debug logging for content being parsed
   - Success/failure logging at each extraction strategy
   - Warning logs with content samples when extraction fails
2. Improved system instructions for workflow tools:
   -

fix: Add OpenAI-compatible Azure endpoint support and env var expansion

test: making all tests green

feat: Web Search Plugin added to OpenRouter (MAS - 165)

feat: Improve diversity

docs: docs for v0.1.30

maxim-saplin and others added 30 commits December 22, 2025 09:53

Triage, WF

ec8d546

Triage on main

39de700

WF v2

6dd7b84

Misc

7ffb8f7

Update TRIAGE_WORKFLOW_v2.md

d7137f1

Cluster 1

9c6c2d4

Cluster 2

ff2ffc7

Faster rate limiter tests

435edc9

Delete testing_triage.md

e3a82f7

Cleanup

5e29725

Merge remote-tracking branch 'upstream/main' into mass-test-fix

4484062

pre-commit fixes

f46195d

Merge remote-tracking branch 'origin/subagents' into improve_diversity

7fce280

Web Search Plugin added to OpenRouter

6a28e6c

Resolving CI checks

385a56f

Fix precommits

084b455

Example Web Search OpenRouter Configs

aaa07cd

web search configs

5df0992

precommits fix

8947366

Fix persona generation json

042febf

Merge branch 'main' into improve_diversity

37fa928

Fix docker background shell; increase persona subagent timeout

c7c63ae

Add black formatting for checks

8fd8081

OpenRouter: Web Search plugin enabled for 'No MCP mode'

f76f91c

shubham2345 and others added 23 commits December 25, 2025 17:20

Black formatter

58561cc

Merge remote-tracking branch 'origin/main' into pr/maxim-saplin/688

71bb3fd

Pre-Commit Checks fix

e8896db

Add fallback for models that output tool arguments without tool_name …

50b250b

…wrapper

Init v0.1.30

213a3b7

Add black formatting for checks

6ea1a89

Merge pull request #698 from massgen/feature/azure-openai-multi-endpoint

115afd3

fix: Add OpenAI-compatible Azure endpoint support and env var expansion

Clean triage workflow; adjust xfail

c68a64c

Merge branch 'dev/v0.1.30' into mass-test-fix

7a1bd41

Fix remaining test categorizations, add description for contributors

2f74539

Merge pull request #688 from maxim-saplin/mass-test-fix

5928bc5

test: making all tests green

Fix timing, make concurrency default

3c1dbcd

Dedup code and add params

c0e33ef

Merge pull request #693 from massgen/restrict_openrouter_list

4b7b409

feat: Web Search Plugin added to OpenRouter (MAS - 165)

Remove old parameters; update model in rst docs

f6312f0

Merge in dev/v0.1.30

90f1526

Merge pull request #699 from massgen/improve_diversity

0cfa065

feat: Improve diversity

docs

1a32790

update command in README

71eb5d8

Fix config for persona

6a6dd5d

Merge pull request #703 from massgen/docs_for_v0.1.30

754f855

docs: docs for v0.1.30

a5507203 approved these changes Dec 26, 2025

Henry-811 approved these changes Dec 26, 2025

Henry-811 merged commit 8fc1667 into main Dec 26, 2025
21 checks passed

Henry-811 merged 53 commits into main from dev/v0.1.30


6 participants
