Conversation
There was a problem hiding this comment.
Pull request overview
Fixes incorrect Content-ID extraction in mailparse_msg_get_part_data() when the header value contains parentheses by avoiding RFC 822 address parsing (which treats parenthesized text as comments).
Changes:
- Replace RFC 822 address-token parsing for
Content-IDwith direct trimming + bracket stripping inmailparse.c. - Add a new PHPT regression test covering parenthesized
Content-IDvalues and bracketed/unbracketed forms.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
mailparse.c |
Updates Content-ID extraction logic to preserve parenthesized text by no longer using the RFC 822 address parser. |
tests/gh20.phpt |
Adds regression coverage for GH-20 to ensure parentheses in Content-ID are preserved. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Fix Content-ID parsing stripping parenthesized text (GH-20)
Fixes #20
Problem
mailparse_msg_get_part_data()returns the wrongcontent-idvalue when the Content-ID header contains parentheses. For example:Returns
Facebook_32x32 _aa284ba9-f148-4698-9c1f-c8e92bdb842e.pnginstead of the correctFacebook_32x32(1)_aa284ba9-f148-4698-9c1f-c8e92bdb842e.png.The code was passing the Content-ID value through the RFC 822 address tokenizer (
php_mailparse_rfc822_tokenize+php_rfc822_parse_address_tokens), which interprets parenthesized text as RFC 822 comments and strips them. Content-ID is a message identifier, not an email address, so full address parsing is inappropriate here.Fix
Replace the RFC 822 address parser with simple whitespace trimming and angle bracket stripping. This preserves the existing behavior for standard RFC-compliant Content-IDs like
<part1@example.com>(angle brackets removed, inner value returned) while fixing the bug for values containing parentheses.