fix(core): improve SKILL.md frontmatter parsing robustness#25728
fix(core): improve SKILL.md frontmatter parsing robustness#25728NTaylorMullen wants to merge 2 commits intomainfrom
Conversation
- Support UTF-8 BOM at the start of SKILL.md files. - Support optional trailing spaces after frontmatter markers. - Improve simple parser fallback to be case-insensitive and support spaces before colons. - Prevent simple parser from swallowing subsequent keys into descriptions. - Added unit tests for identified edge cases. Fixes #25693
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request improves the reliability of SKILL.md frontmatter parsing. By addressing common formatting inconsistencies and edge cases, it ensures that skill discovery remains robust even when files contain invisible characters or non-standard YAML formatting. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request enhances the robustness of the skill loader by improving frontmatter parsing. Key changes include support for UTF-8 BOM, handling trailing whitespace in frontmatter markers, and making the simple parser case-insensitive with support for spaces before colons. It also adds a fallback to the simple parser when YAML values are not strings. A review comment suggested refining the logic for multi-line descriptions to prevent common patterns like "Note:" from being incorrectly identified as new keys, recommending a restriction to known keys like "name" and "description".
| if (nextLine.match(/^[ \t]+\S/)) { | ||
| // If next line is indented, it's a continuation of the description, | ||
| // UNLESS it looks like another key (e.g. "name:") | ||
| if (nextLine.match(/^[ \t]+\S/) && !nextLine.match(/^\s*\w+\s*:/)) { |
There was a problem hiding this comment.
The logic to prevent "swallowing" other keys into the description is too broad. The regex ^\s*\w+\s*: matches any indented line starting with a single word followed by a colon, such as "Note:" or "Example:". These are common patterns in multi-line descriptions and should not cause the parser to stop. Since this parser is specifically designed to extract "name" and "description", the check should be restricted to those known keys. Additionally, ensure the extracted string is trimmed and a fallback is used if the result is empty, consistent with repository standards for optional strings.
| if (nextLine.match(/^[ \t]+\S/) && !nextLine.match(/^\s*\w+\s*:/)) { | |
| if (nextLine.match(/^[ \t]+\S/) && !nextLine.match(/^\s*(name|description)\s*:/i)) { |
References
- When using an optional string with a fallback value, trim the optional string and use the fallback if the result is empty to avoid uninformative messages from whitespace-only strings.
|
|
||
| // Match "name:" at the start of the line (optional whitespace) | ||
| const nameMatch = line.match(/^\s*name:\s*(.*)$/); | ||
| const nameMatch = line.match(/^\s*name\s*:\s*(.*)$/i); |
|
|
||
| // Match "description:" at the start of the line (optional whitespace) | ||
| const descMatch = line.match(/^\s*description:\s*(.*)$/); | ||
| const descMatch = line.match(/^\s*description\s*:\s*(.*)$/i); |
|
Size Change: +73 B (0%) Total Size: 33.7 MB
ℹ️ View Unchanged
|
- Restrict multi-line description continuation check to known keys (name, description) - Avoid incorrectly identifying lines starting with "Note:" or similar as new keys - Add test case to verify the fix Fixes #25693
nidhishgajjar
left a comment
There was a problem hiding this comment.
This is a well-implemented robustness improvement for SKILL.md frontmatter parsing that addresses real-world file handling issues.
What this PR fixes:
- Skill discovery was failing due to common but "invisible" file characteristics like UTF-8 BOM and trailing spaces
- The simple parser was too strict and would fail on valid but slightly malformed frontmatter
- Multi-line descriptions could incorrectly swallow other keys
Code quality:
- Targeted improvements to regex and parsing logic (107 additions, 5 deletions, 2 files)
- Updated FRONTMATTER_REGEX to handle UTF-8 BOM and trailing whitespace
- Enhanced parseSimpleFrontmatter with case-insensitivity and better colon handling
- Fixed multi-line description parsing edge cases
Testing:
- Excellent test coverage with 5 new test cases covering:
- UTF-8 BOM handling
- Trailing spaces after markers
- Case-insensitivity and space before colon
- Prevention of key swallowing into descriptions
- Proper handling of "Note:" in multi-line descriptions
- All tests clearly demonstrate the issues being fixed
Documentation:
- Clear PR description explaining the three main improvements
- Properly references the related issue (#25693)
- Validation steps are well-documented
- Pre-merge checklist mostly complete (missing documentation update noted)
Robustness considerations:
- The changes make the parser more forgiving without sacrificing correctness
- Fallback mechanisms are properly implemented
- Edge cases are well-covered by tests
Approval: This looks good to merge. The improvements address real parsing issues, are well-tested, and make the skill discovery more robust without introducing breaking changes.
Summary
This PR improves the robustness of `SKILL.md` frontmatter parsing to resolve issues where skill discovery would fail due to common but "invisible" file characteristics, such as UTF-8 Byte Order Marks (BOM) or trailing spaces after markers.
Details
Related Issues
Fixes #25693
How to Validate
Create `SKILL.md` with a trailing space after the first `---`:
```
name: test-spaces
```description: Testing trailing spaces.
Repeat with a file containing a UTF-8 BOM at the start.
Pre-Merge Checklist