Conversation
Pull request overview
Adds support for extracting additional PNG textual metadata by recognizing the iTXt chunk type in the existing PNG metadata extraction pipeline.
Changes:
- Treat `iTXt` chunks as textual metadata chunks alongside `tEXt`.
- Adjust null-terminator trimming when parsing chunk payloads into `(Key, Value)` pairs.
```csharp
case "tEXt":
case "iTXt":
    var keywordValuePair = await ReadTextualData(fs, chunkLength);
    SkipBytes(fs, 4); // Move past the current chunk CRC
    yield return keywordValuePair;
```
iTXt chunks do not have the same payload layout as tEXt (they include compression flag/method + language tag + translated keyword + possibly zlib-compressed UTF-8 text). Routing iTXt through ReadTextualData will yield incorrect values (and can include binary bytes/garbage). Add a dedicated ReadInternationalTextualData parser for iTXt (including optional zlib decompression) and call that from this switch case instead of reusing the tEXt logic.
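As a sketch of what such a dedicated parser needs to do, here is the iTXt payload layout from the PNG specification worked through at the byte level in Python (the function name and return shape are illustrative, not the repository's actual `ReadInternationalTextualData`):

```python
import zlib

def parse_itxt(payload: bytes) -> tuple[str, str]:
    """Parse an iTXt chunk payload into a (keyword, text) pair."""
    # Layout per the PNG spec:
    #   keyword NUL compression-flag compression-method
    #   language-tag NUL translated-keyword NUL text
    keyword, rest = payload.split(b"\x00", 1)
    compression_flag, compression_method = rest[0], rest[1]
    rest = rest[2:]
    language_tag, rest = rest.split(b"\x00", 1)
    translated_keyword, text = rest.split(b"\x00", 1)
    if compression_flag == 1 and compression_method == 0:
        text = zlib.decompress(text)  # method 0 = zlib/deflate
    return keyword.decode("latin-1"), text.decode("utf-8")
```

Note that the keyword is Latin-1 while the text is UTF-8, and only the text (never the keyword, language tag, or translated keyword) may be compressed.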
```diff
  var dataString = buffer.BytesToString().TrimEnd(NullTerminator).Trim();
  var nullIndex = dataString.IndexOf(NullTerminator);
  return new (
      nullIndex > -1 ? dataString.Substring(0, nullIndex) : string.Empty,
-     nullIndex > -1 && nullIndex + 1 < length ? dataString.Substring(nullIndex + 1).TrimEnd(NullTerminator) : string.Empty
+     nullIndex > -1 && nullIndex + 1 < length ? dataString.Substring(nullIndex + 1).Trim(NullTerminator) : string.Empty
  );
```
ReadTextualData converts the entire chunk to a string and then trims null terminators. This breaks iTXt parsing because the bytes immediately after the keyword are binary (compression flag/method) and may legitimately be 0x00; Trim(NullTerminator) will drop those leading bytes and corrupt the payload. For iTXt, parse at the byte level (split on nulls for the keyword/language/translated keyword, then treat the remaining bytes as (optionally decompressed) UTF-8 text) rather than trimming nulls from a decoded string.
```csharp
case "tEXt":
case "iTXt":
    var keywordValuePair = await ReadTextualData(fs, chunkLength);
    SkipBytes(fs, 4); // Move past the current chunk CRC
    yield return keywordValuePair;
```
This change adds iTXt handling, but the test suite only covers tEXt via latin1-pngtext.png. Add at least one PNG fixture containing an iTXt chunk (ideally both compressed and uncompressed variants) and a corresponding test assertion to prevent regressions in iTXt parsing/decompression.
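Such fixtures can be generated programmatically rather than sourced by hand. This Python sketch (helper names are illustrative) writes a minimal 1x1 grayscale PNG containing a single iTXt chunk, in either compressed or uncompressed form:

```python
import struct
import zlib

def chunk(ctype: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def make_itxt_png(keyword: str, text: str, compressed: bool) -> bytes:
    body = text.encode("utf-8")
    if compressed:
        body = zlib.compress(body)
    itxt = (keyword.encode("latin-1") + b"\x00"
            + bytes([1 if compressed else 0, 0])  # compression flag, method
            + b"\x00" + b"\x00"  # empty language tag / translated keyword
            + body)
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # 1x1, 8-bit gray
    idat = zlib.compress(b"\x00\x00")  # one scanline: filter byte + pixel
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"iTXt", itxt) + chunk(b"IDAT", idat)
            + chunk(b"IEND", b""))
```

Checking the fixtures into the test project once and asserting on the extracted `(Key, Value)` pairs keeps the tests deterministic without depending on a generator at test time.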
@copilot open a new pull request to apply changes based on the comments in this thread
@jamesmoore I've opened a new pull request, #492, to work on those changes. Once the pull request is ready, I'll request review from you. |