feat: Improve skill use and system prompt organization #515

Merged
Henry-811 merged 8 commits into dev/v0.1.12 from improve_skills_memory on Nov 14, 2025

ncrispino (Collaborator) commented Nov 14, 2025

Description

Major refactoring of system prompt architecture using priority-based XML sections, addition of semantic search capabilities via semtools and serena skills, enhanced browser automation with image persistence, and support for local skill execution outside Docker.

Key improvements:

  • System prompt architecture: New class-based, priority-driven design with XML structure (Closes MAS-76)
  • Semantic search: Added semtools and serena skills for meaning-based code/document search (Closes MAS-72)
  • Browser automation: Screenshots now persist as files in workspace
  • Local skills: Skills can run locally without Docker container requirement
  • Code quality: Reduced complexity in orchestrator (-428 lines) and message templates (-682 lines)

Type of change

  • Code refactoring (refactor:) - Code changes that neither fix a bug nor add a feature
  • New feature (feat:) - Non-breaking change which adds functionality
  • Bug fix (fix:) - Non-breaking change which fixes an issue
  • Breaking change (breaking:) - Fix or feature that would cause existing functionality to not work as expected
  • Documentation (docs:) - Documentation updates
  • Tests (test:) - Adding missing tests or correcting existing tests
  • Chore (chore:) - Maintenance tasks, dependency updates, etc.
  • Performance improvement (perf:) - Code changes that improve performance
  • Code style (style:) - Changes that do not affect the meaning of the code (formatting, missing semi-colons, etc.)
  • CI/CD (ci:) - Changes to CI/CD configuration files and scripts

Detailed Changes

1. System Prompt Architecture Refactor (Closes MAS-76)

New files:

  • massgen/system_prompt_sections.py (+1,284 lines): Class-based section architecture
    • Priority enum: CRITICAL → AUXILIARY ordering
    • SystemPromptSection base class with XML support
    • 20+ specialized section classes (AgentIdentity, MassGenPrimitives, Skills, Memory, etc.)
    • Based on Lakera AI Prompt Engineering Guide 2025 & Anthropic best practices
  • massgen/system_message_builder.py (+488 lines): Declarative prompt builder
    • Automatic section ordering by priority
    • XML wrapping with priority attributes
    • Subsection support for hierarchical structure
  • docs/dev_notes/system_prompt_architecture_redesign.md (+593 lines): Design rationale
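
The priority-ordered, XML-wrapped design described above can be sketched as follows. This is a minimal illustration, not the code in massgen/system_prompt_sections.py: the `Section` class, the `build_system_message` function, the tag names, and the intermediate `HIGH` and `MEDIUM` levels are all assumptions (the PR only names `CRITICAL` and `AUXILIARY`).

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List
from xml.sax.saxutils import escape, quoteattr


class Priority(IntEnum):
    """Lower values render earlier in the prompt."""
    CRITICAL = 1
    HIGH = 5        # assumed level, not named in the PR
    MEDIUM = 10     # assumed level, not named in the PR
    AUXILIARY = 20


@dataclass
class Section:
    """Hypothetical stand-in for a SystemPromptSection subclass."""
    tag: str
    priority: Priority
    content: str = ""
    subsections: List["Section"] = field(default_factory=list)

    def render(self, indent: int = 0) -> str:
        """Wrap the content in an XML element carrying a priority attribute."""
        pad = "  " * indent
        name = quoteattr(self.priority.name.lower())
        lines = [f"{pad}<{self.tag} priority={name}>"]
        if self.content:
            lines.append(f"{pad}  {escape(self.content)}")
        # Subsections are ordered by priority too, giving a hierarchy.
        for sub in sorted(self.subsections, key=lambda s: s.priority):
            lines.append(sub.render(indent + 1))
        lines.append(f"{pad}</{self.tag}>")
        return "\n".join(lines)


def build_system_message(sections: List[Section]) -> str:
    """Order top-level sections by priority and join their XML."""
    ordered = sorted(sections, key=lambda s: s.priority)  # stable sort
    return "\n".join(section.render() for section in ordered)


if __name__ == "__main__":
    print(build_system_message([
        Section("memory", Priority.AUXILIARY, "Notes persist across turns."),
        Section("agent_identity", Priority.CRITICAL, "You are a coding agent."),
    ]))
```

Because callers only declare sections and their priorities, the order in which sections are registered does not affect the final prompt, which is what makes a declarative builder easier to debug than hand-concatenated templates.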

Modified files:

  • massgen/orchestrator.py (-428 lines): Cleaner, delegates to SystemMessageBuilder
  • massgen/message_templates.py (-682 lines): Removed redundant prompt logic
  • massgen/backend/claude_code.py: Integration with new system message builder

2. Semantic Search Skills (Closes MAS-72)

New skills:

  • massgen/skills/semtools/SKILL.md (+635 lines)
    • Rust-based CLI for embedding-based semantic search
    • Workspace management for large codebases
    • Document parsing (PDF, DOCX, PPTX) with API key support
    • Find code by meaning, not just keywords
  • massgen/skills/serena/SKILL.md (+522 lines)
    • Complementary semantic search capabilities
    • Alternative approach for different use cases
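
To illustrate what "find code by meaning, not just keywords" refers to, here is a toy sketch of embedding-based ranking next to plain substring matching. It does not use semtools or serena and does not reflect their interfaces: the function names are hypothetical and the two-dimensional vectors are hand-made placeholders, whereas a real tool would compute them with an embedding model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_search(query: str, docs: Dict[str, str]) -> List[str]:
    """Return the names of documents containing the query as a substring."""
    needle = query.lower()
    return [name for name, text in docs.items() if needle in text.lower()]


def semantic_search(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank documents by cosine similarity to the query embedding."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy corpus: the word "login" appears in neither file.
docs = {
    "auth.py": "def verify_credentials(user, password): ...",
    "plot.py": "def draw_chart(data): ...",
}
# Hand-made placeholder embeddings (assumption, not model output).
doc_vecs = {"auth.py": [0.9, 0.1], "plot.py": [0.1, 0.9]}
login_query_vec = [0.8, 0.2]

keyword_hits = keyword_search("login", docs)            # no substring match
semantic_hits = semantic_search(login_query_vec, doc_vecs)
```

Here the keyword search returns nothing, while the similarity ranking places auth.py first because its placeholder vector points in nearly the same direction as the query vector.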

Documentation:

  • docs/source/user_guide/skills.rst (+222 lines): Comprehensive semantic vs. keyword search guide
  • docs/source/reference/yaml_schema.rst (+36 lines): Updated schema docs

3. Local Skill Execution

  • massgen/filesystem_manager/skills_manager.py: Refactored for local mode
  • massgen/skills/file-search/SKILL.md: Renamed from always/file_search
  • massgen/backend/claude_code.py: Ensures Claude Code (CC) uses execute_command instead of raw bash
  • Skills now run directly on host without Docker requirement (Docker still supported)
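
The choice between host and Docker execution can be pictured with the sketch below. It is an assumption-laden illustration, not the logic in skills_manager.py or the `execute_command` tool: `build_command` and `run_skill_command` are hypothetical helpers, and the only external call is the standard `docker exec` CLI form.

```python
import subprocess
from typing import List, Optional


def build_command(command: str, container: Optional[str] = None) -> List[str]:
    """Build argv for running `command` on the host, or inside `container`."""
    if container is None:
        # Local mode: run through the host shell, no Docker involved.
        return ["sh", "-c", command]
    # Docker mode: run the same shell command inside a running container.
    return ["docker", "exec", container, "sh", "-c", command]


def run_skill_command(
    command: str,
    container: Optional[str] = None,
    timeout: float = 60.0,
) -> str:
    """Run the command and return its stdout; raises on a non-zero exit."""
    result = subprocess.run(
        build_command(command, container),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Keeping the command string identical in both modes is one way to get the "same functionality as Docker mode" behaviour that the local-mode test expects; only the argv prefix changes.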

4. Browser Automation Enhancement

  • massgen/tools/custom_tools/_browser_automation/browser_automation_tool.py (+39 lines)
    • Screenshots saved as files within workspace
    • Similar functionality to crawl4ai
    • Enables persistent visual context for agents
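
Persisting a screenshot as a workspace file might look like the sketch below. The helper name `save_screenshot`, the `screenshots` subdirectory, and the timestamped file name are assumptions for illustration; the actual layout used by browser_automation_tool.py is not shown in this PR description.

```python
from datetime import datetime, timezone
from pathlib import Path


def save_screenshot(png_bytes: bytes, workspace: Path, label: str = "screenshot") -> Path:
    """Write raw PNG bytes under the workspace and return the file path."""
    out_dir = workspace / "screenshots"  # assumed subdirectory name
    out_dir.mkdir(parents=True, exist_ok=True)
    # A UTC timestamp with microseconds keeps successive captures distinct.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = out_dir / f"{label}_{stamp}.png"
    path.write_bytes(png_bytes)
    return path
```

With Playwright's Python API, the bytes returned by `page.screenshot()` could be passed straight to this helper. Returning a path rather than inline image data is what lets an agent refer back to the same image on a later turn.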

5. Other Improvements

  • Updated Docker README and Dockerfiles
  • Memory filesystem mode refactored to use direct file operations
  • Multiple example configs for new features
  • README updates for PyPI and main repo

Checklist

  • I have run pre-commit on my changed files and all checks pass
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Pre-commit status

# Run pre-commit on all changed files:
git diff --name-only main...HEAD | xargs uv run pre-commit run --files

How to Test

Test 1: System Prompt Architecture

CLI Command:

# Run existing orchestration tests to verify system prompt generation
uv run pytest massgen/tests/test_orchestration_restart.py -v

Expected Results:

  • All tests pass
  • System prompts use new XML structure internally
  • No behavior changes from user perspective

Test 2: Semantic Search Skills

CLI Command:

# Test semtools skill (requires semtools installed: cargo install semtools)
uv run massgen massgen/configs/skills/test_semantic_skills.yaml

# Test with semantic search example
uv run massgen massgen/configs/skills/semantic_search_comprehensive.yaml

Expected Results:

  • Skills load successfully
  • Semantic search finds code by meaning, not just keywords
  • Document parsing works with API key configured

Test 3: Local Skill Execution

CLI Command:

# Test skills running locally without Docker
uv run massgen massgen/configs/skills/skills_local_mode.yaml

Expected Results:

  • Skills execute on host machine
  • No Docker container required
  • Same functionality as Docker mode

Test 4: Browser Automation with Image Persistence

CLI Command:

# Test browser automation with screenshot saving
uv run massgen massgen/configs/tools/custom_tools/multimodal_tools/playwright_with_img_understanding.yaml

Expected Results:

  • Screenshots saved as files in workspace
  • Agents can reference saved images
  • Visual context persists across turns

Additional context

Impact: This is a significant architectural improvement that:

  1. Makes system prompts more maintainable and debuggable
  2. Adds powerful semantic search capabilities beyond keyword matching
  3. Simplifies skill deployment (no Docker required for basic use)
  4. Improves visual context handling in browser automation

Stats: 26 files changed: +4,234 insertions, -1,122 deletions (net +3,112 lines)

Future work (separate PRs):

  • Allow custom tools to run within Docker (MAS-79)
  • MCPs in Docker for better isolation
  • Evaluate browser automation approach (custom tool vs. direct code execution)

@Henry-811 Henry-811 changed the base branch from main to dev/v0.1.12 November 14, 2025 15:27
@Henry-811 Henry-811 merged commit 5a08d6f into dev/v0.1.12 Nov 14, 2025
22 checks passed
@Henry-811 Henry-811 deleted the improve_skills_memory branch November 14, 2025 15:29