improvement: topic-aware section classification in AGENTS.md extraction #47

@tbitcs

Summary

During AGENTS.md extraction, specsmith classifies sections by heading keywords only.
As a result, some content blocks land in semantically wrong files when the heading
doesn't reflect the content type.

Examples

  • Register maps (hardware reference) -> verification.md (should be architecture.md)
  • Phase 2 roadmap (project status) -> rules.md (should be architecture.md or excluded)
  • Repository structure (reference layout) -> rules.md (should be architecture.md)
  • Windows path-length mitigation (env setup) -> rules.md (should be drift-metrics.md)

Proposed Fix

Use content-level keyword scanning (not just heading names) to route blocks:

  • Lines containing register offsets, address maps -> architecture.md
  • Lines containing milestone, completion, roadmap -> architecture.md
  • Lines containing one-time setup, machine-specific -> drift-metrics.md
  • Project-type hints: route hardware descriptions differently for fpga-rtl projects than for web apps
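
A minimal sketch of what content-level routing could look like, assuming a Python implementation. Everything here is hypothetical and not taken from the specsmith codebase: the names `CONTENT_RULES`, `PROJECT_RULES`, and `route_block`, the exact keyword patterns, the extra fpga-rtl keywords, and the `min_hits` threshold. The heading-based result is passed in as `heading_target` and is used as the fallback.

```python
import re

# Hypothetical content rules derived from the bullets above: (pattern, target file).
CONTENT_RULES = [
    (re.compile(r"register offset|address map", re.I), "architecture.md"),
    (re.compile(r"\b(milestone|completion|roadmap)\b", re.I), "architecture.md"),
    (re.compile(r"\b(one-time setup|machine-specific)\b", re.I), "drift-metrics.md"),
]

# Hypothetical project-type hints; the fpga-rtl keywords are illustrative assumptions.
PROJECT_RULES = {
    "fpga-rtl": [
        (re.compile(r"\b(clock domain|bitstream)\b", re.I), "architecture.md"),
    ],
}


def route_block(lines, heading_target, project_type=None, min_hits=2):
    """Pick a target file for a content block.

    Content rules override the heading-based target only when at least
    `min_hits` lines vote for the same file; otherwise the heading wins.
    """
    rules = CONTENT_RULES + PROJECT_RULES.get(project_type, [])
    hits = {}
    for line in lines:
        for pattern, target in rules:
            if pattern.search(line):
                hits[target] = hits.get(target, 0) + 1
                break  # count each line at most once
    if hits:
        target, count = max(hits.items(), key=lambda kv: kv[1])
        if count >= min_hits:
            return target
    return heading_target


register_block = [
    "Register offset 0x00: CTRL",
    "Register offset 0x04: STATUS",
    "Address map for the DMA core",
]
print(route_block(register_block, "verification.md"))
```

The `min_hits` threshold is a design choice in this sketch: a single stray mention of "roadmap" inside a rules section should not be enough to re-route the whole block.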

Related

Part of import quality improvements. P0 items (diff markers, dedup) already fixed.

Metadata

    Labels

    enhancement: New feature or request
