Strip Unicode Cf characters in PrintableString #4593

Merged
tnull merged 1 commit into lightningdevkit:main from tnull:2026-05-printable-string-bidi
May 7, 2026

Conversation

@tnull
Contributor

@tnull tnull commented May 5, 2026

PrintableString is the sanitiser LDK uses to render untrusted strings (node aliases, BOLT-12 invoice / offer text, UntrustedString, LSPS messages, lightning-invoice descriptions) to logs and UI. It only replaced char::is_control matches (Unicode general category Cc) with U+FFFD, leaving the entire Cf (Format) category untouched.

That is exactly the category containing the bidirectional embedding / override / isolate codepoints (U+202A..U+202E, U+2066..U+2069) and zero-width characters (U+200B..U+200D, U+FEFF) behind the "Trojan Source" attack family (CVE-2021-42574). A peer can set its alias / invoice description / offer fields to e.g. safe\u{202E}cipsxe.exe, which previously passed through verbatim. A bidi-aware renderer displays the characters after U+202E right-to-left, so a human reader sees safeexe.exspic rather than the byte content, defeating the protection PrintableString exists to provide.

Replace Cf codepoints alongside Cc ones. The Cf ranges are inlined as a matches! table sourced from Unicode 16.0 to keep the change no_std-friendly with no new dependencies.

Co-Authored-By: HAL 9000
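
The approach described above can be sketched as follows. This is a minimal illustration, not the merged code: `Sanitised` and `is_named_format_char` are hypothetical names, and the `matches!` table covers only the codepoints named in the description, whereas the full `Cf` category contains many more ranges.

```rust
use core::fmt;

/// Hypothetical stand-in for a sanitising wrapper; not LDK's `PrintableString`.
pub struct Sanitised<'a>(pub &'a str);

/// Partial check (an assumption for illustration): only the `Cf` codepoints
/// named in the PR description, not the complete Unicode `Cf` table.
pub fn is_named_format_char(c: char) -> bool {
    matches!(
        c as u32,
        0x200B..=0x200D       // zero-width space / non-joiner / joiner
            | 0x202A..=0x202E // bidi embeddings and overrides
            | 0x2066..=0x2069 // bidi isolates
            | 0xFEFF          // zero-width no-break space (BOM)
    )
}

impl<'a> fmt::Display for Sanitised<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `write_char` on `Formatter` comes from the `fmt::Write` trait.
        use core::fmt::Write;
        for c in self.0.chars() {
            // Replace both `Cc` (via `is_control`) and the listed `Cf`
            // codepoints with U+FFFD REPLACEMENT CHARACTER.
            let c = if c.is_control() || is_named_format_char(c) { '\u{FFFD}' } else { c };
            f.write_char(c)?;
        }
        Ok(())
    }
}
```

With this sketch, formatting `Sanitised("safe\u{202E}cipsxe.exe")` yields `safe\u{FFFD}cipsxe.exe`, so the override is visibly replaced instead of silently reordering the rendered text. Only `core` is used, in keeping with the `no_std` constraint mentioned above.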

@ldk-reviews-bot

ldk-reviews-bot commented May 5, 2026

👋 Thanks for assigning @valentinewallace as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

Comment thread lightning-types/src/string.rs
@ldk-claude-review-bot
Collaborator

ldk-claude-review-bot commented May 5, 2026

No issues found.

The previously flagged off-by-one in the Egyptian Hieroglyph Cf range (0x13430..=0x13440 → 0x13430..=0x1343F) has been correctly fixed. The Unicode 16.0 Cf table is complete, the security-critical bidi override / zero-width characters are all covered, and the boundary test at line 108 validates the fix. No new issues identified on re-review.
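
The off-by-one mentioned above comes down to the upper bound of an inclusive range. A minimal sketch, assuming a hypothetical helper `in_hieroglyph_format_range` that isolates just this one range (the merged code keeps it inside a larger `matches!` table):

```rust
/// Hypothetical helper covering only the range discussed in the review.
/// `..=` is inclusive, so the upper bound must be the last `Cf` codepoint
/// (U+1343F), not the first codepoint after it (U+13440).
pub fn in_hieroglyph_format_range(c: char) -> bool {
    matches!(c as u32, 0x13430..=0x1343F)
}
```

A boundary test then checks both sides of the upper edge: U+1343F must match and U+13440 must not.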

Comment thread lightning-types/src/string.rs
@codecov

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 95.83333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 86.09%. Comparing base (1a26867) to head (1a01b5a).
⚠️ Report is 31 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lightning-types/src/string.rs | 95.83% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4593      +/-   ##
==========================================
- Coverage   86.84%   86.09%   -0.75%     
==========================================
  Files         161      157       -4     
  Lines      109260   108828     -432     
  Branches   109260   108828     -432     
==========================================
- Hits        94882    93694    -1188     
- Misses      11797    12519     +722     
- Partials     2581     2615      +34     

| Flag | Coverage Δ |
|---|---|
| fuzzing-fake-hashes | ? |
| fuzzing-real-hashes | ? |
| tests | 86.09% <95.83%> (-0.13%) ⬇️ |

Contributor

@valentinewallace valentinewallace left a comment

LGTM after squash and CI fix

Comment thread lightning-types/src/string.rs Outdated
@tnull tnull force-pushed the 2026-05-printable-string-bidi branch from 907dd41 to 424a05f Compare May 7, 2026 08:46
@tnull
Contributor Author

tnull commented May 7, 2026

Squashed with the following changes:

diff --git a/lightning-types/src/string.rs b/lightning-types/src/string.rs
index 5f131f542..98055cf64 100644
--- a/lightning-types/src/string.rs
+++ b/lightning-types/src/string.rs
@@ -44,9 +44,10 @@ impl<'a> fmt::Display for PrintableString<'a> {
 }

-// Codepoints in Unicode general category `Cf` (Format), per Unicode 16.0. These are not matched
-// by `char::is_control` (which only covers `Cc`), but include the bidirectional override / isolate
-// controls (e.g. U+202E RLO) and zero-width characters behind the "Trojan Source" attack family
-// (CVE-2021-42574), where an attacker-supplied string renders to a human reader as something other
-// than its byte content. Strip them alongside `Cc` characters when sanitising untrusted input.
+// Codepoints in Unicode general category `Cf` (Format), per Unicode standard. These are not
+// matched by `char::is_control` (which only covers `Cc`), but include the bidirectional override /
+// isolate controls (e.g. U+202E RLO) and zero-width characters behind the "Trojan Source" attack
+// family (CVE-2021-42574), where an attacker-supplied string renders to a human reader as
+// something other than its byte content. Strip them alongside `Cc` characters when sanitising
+// untrusted input.
 fn is_format_char(c: char) -> bool {
        matches!(

@tnull tnull requested a review from valentinewallace May 7, 2026 08:47
`PrintableString` is the sanitiser LDK uses to render untrusted strings
(node aliases, BOLT-12 invoice / offer text, `UntrustedString`, LSPS
messages, `lightning-invoice` descriptions) to logs and UI. It only
replaced `char::is_control` matches (Unicode general category `Cc`)
with U+FFFD, leaving the entire `Cf` (Format) category untouched.

That is the exact category covering the bidirectional override /
isolate codepoints (U+202A..U+202E, U+2066..U+2069) and zero-width
characters (U+200B..U+200D, U+FEFF) behind the "Trojan Source" attack
family (CVE-2021-42574): a peer can set its alias / invoice description
/ offer fields to e.g. `safe\u{202E}cipsxe.exe`, which previously
passed through verbatim while a bidi-aware renderer shows a human
reader `safeexe.exspic`, defeating the protection `PrintableString`
exists to provide.

Replace `Cf` codepoints alongside `Cc` ones. The `Cf` ranges are
inlined as a `matches!` table sourced from Unicode 16.0 to keep the
change `no_std`-friendly with no new dependencies.

Co-Authored-By: HAL 9000
Signed-off-by: Elias Rohrer <dev@tnull.de>
@tnull tnull force-pushed the 2026-05-printable-string-bidi branch from 424a05f to 1a01b5a Compare May 7, 2026 15:00
@tnull
Contributor Author

tnull commented May 7, 2026

Ah, whoops, forgot to run rustfmt:

diff --git a/lightning-types/src/string.rs b/lightning-types/src/string.rs
index 98055cf64..e45c17d85 100644
--- a/lightning-types/src/string.rs
+++ b/lightning-types/src/string.rs
@@ -106,8 +106,5 @@ mod tests {
                // U+13440 is in the Egyptian Hieroglyph Format Controls block, but its
                // general category is `Mn`, not `Cf`, so the `Cf` range ends at U+1343F.
-               assert_eq!(
-                       format!("{}", PrintableString("x\u{1343F}y\u{13440}z")),
-                       "x\u{FFFD}y\u{13440}z"
-               );
+               assert_eq!(format!("{}", PrintableString("x\u{1343F}y\u{13440}z")), "x\u{FFFD}y\u{13440}z");
        }
 }

@tnull tnull merged commit d12f9ea into lightningdevkit:main May 7, 2026
20 checks passed