Strip Unicode Cf characters in PrintableString#4593
Strip Unicode Cf characters in PrintableString#4593tnull merged 1 commit intolightningdevkit:mainfrom
Cf characters in PrintableString#4593Conversation
|
👋 Thanks for assigning @valentinewallace as a reviewer! |
|
No issues found. The previously flagged off-by-one in the Egyptian Hieroglyph Cf range ( |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4593 +/- ##
==========================================
- Coverage 86.84% 86.09% -0.75%
==========================================
Files 161 157 -4
Lines 109260 108828 -432
Branches 109260 108828 -432
==========================================
- Hits 94882 93694 -1188
- Misses 11797 12519 +722
- Partials 2581 2615 +34
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
valentinewallace
left a comment
There was a problem hiding this comment.
LGTM after squash and CI fix
907dd41 to
424a05f
Compare
|
Squashed with the following changes: diff --git a/lightning-types/src/string.rs b/lightning-types/src/string.rs
index 5f131f542..98055cf64 100644
--- a/lightning-types/src/string.rs
+++ b/lightning-types/src/string.rs
@@ -44,9 +44,10 @@ impl<'a> fmt::Display for PrintableString<'a> {
}
-// Codepoints in Unicode general category `Cf` (Format), per Unicode 16.0. These are not matched
-// by `char::is_control` (which only covers `Cc`), but include the bidirectional override / isolate
-// controls (e.g. U+202E RLO) and zero-width characters behind the "Trojan Source" attack family
-// (CVE-2021-42574), where an attacker-supplied string renders to a human reader as something other
-// than its byte content. Strip them alongside `Cc` characters when sanitising untrusted input.
+// Codepoints in Unicode general category `Cf` (Format), per Unicode standard. These are not
+// matched by `char::is_control` (which only covers `Cc`), but include the bidirectional override /
+// isolate controls (e.g. U+202E RLO) and zero-width characters behind the "Trojan Source" attack
+// family (CVE-2021-42574), where an attacker-supplied string renders to a human reader as
+// something other than its byte content. Strip them alongside `Cc` characters when sanitising
+// untrusted input.
fn is_format_char(c: char) -> bool {
matches!( |
`PrintableString` is the sanitiser LDK uses to render untrusted strings
(node aliases, BOLT-12 invoice / offer text, `UntrustedString`, LSPS
messages, `lightning-invoice` descriptions) to logs and UI. It only
replaced `char::is_control` matches (Unicode general category `Cc`)
with U+FFFD, leaving the entire `Cf` (Format) category untouched.
That is the exact category covering the bidirectional override /
isolate codepoints (U+202A..U+202E, U+2066..U+2069) and zero-width
characters (U+200B..U+200D, U+FEFF) behind the "Trojan Source" attack
family (CVE-2021-42574): a peer can set its alias / invoice description
/ offer fields to e.g. `safe\u{202E}cipsxe.exe`, which previously
passed through verbatim while a human reader sees `safeexe.cips` —
defeating the threat model `PrintableString` exists to defend against.
Replace `Cf` codepoints alongside `Cc` ones. The `Cf` ranges are
inlined as a `matches!` table sourced from Unicode 16.0 to keep the
change `no_std`-friendly with no new dependencies.
Co-Authored-By: HAL 9000
Signed-off-by: Elias Rohrer <dev@tnull.de>
424a05f to
1a01b5a
Compare
|
Ah, whoops, forgot to run diff --git a/lightning-types/src/string.rs b/lightning-types/src/string.rs
index 98055cf64..e45c17d85 100644
--- a/lightning-types/src/string.rs
+++ b/lightning-types/src/string.rs
@@ -106,8 +106,5 @@ mod tests {
// U+13440 is in the Egyptian Hieroglyph Format Controls block, but its
// general category is `Mn`, not `Cf`, so the `Cf` range ends at U+1343F.
- assert_eq!(
- format!("{}", PrintableString("x\u{1343F}y\u{13440}z")),
- "x\u{FFFD}y\u{13440}z"
- );
+ assert_eq!(format!("{}", PrintableString("x\u{1343F}y\u{13440}z")), "x\u{FFFD}y\u{13440}z");
}
} |
PrintableStringis the sanitiser LDK uses to render untrusted strings (node aliases, BOLT-12 invoice / offer text,UntrustedString, LSPS messages,lightning-invoicedescriptions) to logs and UI. It only replacedchar::is_controlmatches (Unicode general categoryCc) with U+FFFD, leaving the entireCf(Format) category untouched.That is the exact category covering the bidirectional override / isolate codepoints (U+202A..U+202E, U+2066..U+2069) and zero-width characters (U+200B..U+200D, U+FEFF) behind the "Trojan Source" attack family (CVE-2021-42574): a peer can set its alias / invoice description / offer fields to e.g.
safe\u{202E}cipsxe.exe, which previously passed through verbatim while a human reader seessafeexe.cips— defeating the threat modelPrintableStringexists to defend against.Replace
Cfcodepoints alongsideCcones. TheCfranges are inlined as amatches!table sourced from Unicode 16.0 to keep the changeno_std-friendly with no new dependencies.Co-Authored-By: HAL 9000