fix: unicode truncation bug #1089

Merged: 35 commits, May 12, 2024

Conversation

joshka
Member

@joshka joshka commented May 6, 2024

  • Rewrote the line / span rendering code to take into account how multi-byte / wide emoji characters are truncated when rendering into areas that cannot accommodate them in the available space
  • Added comprehensive coverage over the edge cases
  • Adds a benchmark to ensure perf

Fixes: #1032
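
For context, the panic reported in #1032 is the kind that occurs when a `&str` is sliced at a byte index that falls inside a multi-byte character. The following is a minimal std-only sketch of a boundary-safe truncation; `truncate_to_char_boundary` is a hypothetical helper for illustration, not the code in this PR, and it only handles byte boundaries, not display width.

```rust
/// Return the longest prefix of `s` that is at most `max_bytes` long and
/// ends on a char boundary. Hypothetical helper, for illustration only.
fn truncate_to_char_boundary(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    // Walk backwards until we land on a char boundary.
    // Index 0 is always a boundary, so this loop terminates.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

With `"a🦀b"` (1 + 4 + 1 bytes), `&s[..3]` would panic, while `s.get(..3)` returns `None` and the helper above returns `"a"`.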

EdJoPaTo and others added 2 commits May 2, 2024 19:24
Due to the truncation the non truncation path used to_owned which cloned the line.

Calling an inner function on both cases resolves this as the temporary truncated line lives long enough for that function to end.
This has 3 bugs still I think in the rendering, but these are not panics.
Fixes ratatui-org#1032
This is introduced as a helper for a bug fix and I'm not 100% sure it is
correct (and it should have tests to verify it). So for now, I'm making
it `pub(crate)` to avoid exposing it to the public API.
@joshka
Member Author

joshka commented May 6, 2024

I'd suggest that for this PR it's probably worth:

  • fixing up any obvious simplifications / problems I've missed
  • disabling the broken tests (unless there's a really obvious test fix for the right alignment one)
  • merging this and making a 0.26.3 release pretty soon.
  • If there's an easy and obvious way that unicode-truncate helps simplify this, then we should consider that

Note this stems from #1082 credit goes to @EdJoPaTo for making it easier to see a way to make a fix for this.

Side note: there's also a truncate crate, which does various unicode split_at things, but it uses unicode-segmentation instead of unicode-width for calculating positions - there's some subtle incompatibilities with the approaches. The ideas / methods might be useful to consider for unicode-truncate however.

@EdJoPaTo
Member

EdJoPaTo commented May 6, 2024

Side note: there's also a truncate crate, which does various unicode split_at things, but it uses unicode-segmentation instead of unicode-width for calculating positions - there's some subtle incompatibilities with the approaches. The ideas / methods might be useful to consider for unicode-truncate however.

I also suggested unicode-segmentation for unicode-truncation as it currently rips apart stuff that belongs together like Emoji combinations (Flags for example): Aetf/unicode-truncate#11
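
To illustrate the "ripped apart" case: a flag emoji is a pair of regional indicator code points, so truncating per `char` can keep only one half. This is a std-only sketch; `take_chars` is a hypothetical helper, not part of unicode-truncate or ratatui.

```rust
/// Count Unicode scalar values (`char`s) in `s`.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

/// Naively keep the first `n` chars of `s`, ignoring grapheme clusters.
/// Hypothetical helper showing how a per-char cut can split a flag.
fn take_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // `idx` is the byte offset of the (n + 1)-th char, a valid boundary.
        Some((idx, _)) => &s[..idx],
        None => s,
    }
}
```

For the German flag, `"\u{1F1E9}\u{1F1EA}"`, `scalar_count` reports 2, and `take_chars(flag, 1)` leaves a lone regional indicator rather than a flag.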

@joshka
Member Author

joshka commented May 6, 2024

Side note: there's also a truncate crate, which does various unicode split_at things, but it uses unicode-segmentation instead of unicode-width for calculating positions - there's some subtle incompatibilities with the approaches. The ideas / methods might be useful to consider for unicode-truncate however.

I also suggested unicode-segmentation for unicode-truncation as it currently rips apart stuff that belongs together like Emoji combinations (Flags for example): Aetf/unicode-truncate#11

There's an EXTREMELY lengthy debate on whether or not to use segmentation in #75 (ratatui's longest open issue). I don't know which side of that debate has more merit, and it's likely that using unicode-segmentation within the truncation crate would break things in other ways for many terminals / use cases.

@joshka
Member Author

joshka commented May 6, 2024

There's an EXTREMELY lengthy debate on whether or not to use segmentation in #75 (ratatui's longest open issue). I don't know which side of that debate has more merit, and it's likely that using unicode-segmentation within the truncation crate would break things in other ways for many terminals / use cases.

Also, I forgot that we do use segmentation when doing grapheme splitting, so maybe that is the right abstraction. I'm unsure how exactly to test an edge case on this one without delving a vast amount more into unicode than I want to right now.


codecov bot commented May 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.1%. Comparing base (aa4260f) to head (d415fe9).
Report is 1 commits behind head on main.

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #1089   +/-   ##
=====================================
  Coverage   94.1%   94.1%           
=====================================
  Files         61      61           
  Lines      14619   14664   +45     
=====================================
+ Hits       13764   13809   +45     
  Misses       855     855           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kdheepak
Collaborator

kdheepak commented May 7, 2024

The code is very readable and merging this even with bugs is good to me! I like that there are tests that capture the behavior.

I don't quite understand why this works (haven't had a lot of time to go through it in detail). Maybe the PR description can have a high level description of why this works?

Unrelated to this PR, but it would be nice to collect links to Unicode resources in the developer guide section of the website for uninitiated potential contributors to read.

@joshka joshka force-pushed the jm/line-unicode-truncation branch from 49e7e0c to dda2106 Compare May 7, 2024 05:27
Co-authored-by: EdJoPaTo <github@edjopato.de>
joshka and others added 2 commits May 7, 2024 02:19
@EdJoPaTo
Member

EdJoPaTo commented May 7, 2024

Added code, heavily inspired by unicode-truncate, that finds the starting index. This allows taking a reference rather than cloning into an ever-more-allocating String.
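
A rough sketch of the idea: walk the string from the end, accumulate display width, and return a borrowed slice from the start index that was found. The names here are hypothetical, and `approx_width` is a deliberately crude stand-in for the unicode-width crate (it treats a few CJK and emoji ranges as width 2 and everything else as width 1, ignoring zero-width characters).

```rust
/// Crude stand-in for a real width table (assumption, not unicode-width).
fn approx_width(c: char) -> usize {
    match c as u32 {
        0x1100..=0x115F
        | 0x2E80..=0xA4CF
        | 0xAC00..=0xD7A3
        | 0xFF00..=0xFF60
        | 0x1F300..=0x1F64F
        | 0x1F900..=0x1F9FF => 2,
        _ => 1,
    }
}

/// Borrow the longest suffix of `s` that fits in `max_width` columns.
/// No allocation: the result is a slice of the input.
fn tail_within_width(s: &str, max_width: usize) -> &str {
    let mut used = 0;
    let mut start = s.len();
    for (idx, c) in s.char_indices().rev() {
        let w = approx_width(c);
        if used + w > max_width {
            break; // this char would not fit, so stop before it
        }
        used += w;
        start = idx; // `idx` is always a char boundary
    }
    &s[start..]
}
```

Because `char_indices` only yields char boundaries, the slice cannot panic the way a raw byte offset can.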

I think this would be a great addition for unicode-truncate especially with all the related test cases specific for this.

Some test cases expect " c" with a space in the beginning where the multi width Unicode was.
This requires padding → Owned data → has a bigger performance impact. But it's something for step 2, ideally even solved within unicode-truncate too.

(Mentioning @Aetf here as they might be interested in this discussion)

Did not remove the split_at method which obviously annoys the CI.

Benchmark after the change. (Results on a Raspberry Pi 3 are similar while taking ~10 times the time)

line_render/0           time:   [82.312 ns 82.321 ns 82.329 ns]
                        change: [-13.630% -13.444% -13.279%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 148 outliers among 1000 measurements (14.80%)
  100 (10.00%) low severe
  3 (0.30%) low mild
  16 (1.60%) high mild
  29 (2.90%) high severe
line_render/3           time:   [179.75 ns 179.88 ns 180.00 ns]
                        change: [-26.027% -25.958% -25.891%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 1000 measurements (0.40%)
  3 (0.30%) high mild
  1 (0.10%) high severe
line_render/4           time:   [202.37 ns 202.48 ns 202.60 ns]
                        change: [-24.577% -24.503% -24.432%] (p = 0.00 < 0.05)
                        Performance has improved.
line_render/6           time:   [262.62 ns 262.75 ns 262.88 ns]
                        change: [-20.892% -20.719% -20.522%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 1000 measurements (0.50%)
  3 (0.30%) high mild
  2 (0.20%) high severe
line_render/7           time:   [281.92 ns 282.05 ns 282.20 ns]
                        change: [-20.378% -20.302% -20.212%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 25 outliers among 1000 measurements (2.50%)
  19 (1.90%) high mild
  6 (0.60%) high severe
line_render/10          time:   [359.13 ns 359.72 ns 360.49 ns]
                        change: [-13.931% -13.808% -13.652%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 32 outliers among 1000 measurements (3.20%)
  20 (2.00%) low mild
  5 (0.50%) high mild
  7 (0.70%) high severe
line_render/42          time:   [518.82 ns 518.92 ns 519.03 ns]
                        change: [-1.9234% -1.8597% -1.7990%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 1000 measurements (0.60%)
  1 (0.10%) low mild
  1 (0.10%) high mild
  4 (0.40%) high severe

@EdJoPaTo
Member

EdJoPaTo commented May 7, 2024

Some test cases expect " c" with a space in the beginning where the multi width Unicode was.
This requires padding → Owned data → has a bigger performance impact. But it's something for step 2, ideally even solved within unicode-truncate too.

Oh… it actually doesn't in our use case as we can just increase the x += n and continue render further right (Buffer::set_stringn does something similar).
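
As a sketch of that observation: instead of building an owned, space-padded string, a helper can return how many columns to advance plus a borrowed remainder. `skip_columns` and `approx_width` are hypothetical names; the width function is a crude stand-in for unicode-width and ignores zero-width characters.

```rust
/// Crude stand-in for a real width table (assumption, not unicode-width).
fn approx_width(c: char) -> usize {
    match c as u32 {
        0x1100..=0x115F
        | 0x2E80..=0xA4CF
        | 0xAC00..=0xD7A3
        | 0xFF00..=0xFF60
        | 0x1F300..=0x1F64F
        | 0x1F900..=0x1F9FF => 2,
        _ => 1,
    }
}

/// Skip `skip` columns from the left of `s`.
/// Returns `(gap, rest)`: `gap` is the number of blank columns left over
/// when a wide char straddles the cut, `rest` is the borrowed remainder.
fn skip_columns(s: &str, skip: usize) -> (usize, &str) {
    let mut col = 0;
    for (idx, c) in s.char_indices() {
        if col >= skip {
            return (col - skip, &s[idx..]);
        }
        col += approx_width(c);
    }
    // Everything was skipped; a trailing wide char may still leave a gap.
    (col.saturating_sub(skip), "")
}
```

A caller would then render `rest` starting at `x + gap`, which gives the same visual result as the `" c"` expectation without allocating.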

@joshka
Member Author

joshka commented May 10, 2024

I'll wait on @EdJoPaTo's approval to merge this

@EdJoPaTo EdJoPaTo force-pushed the jm/line-unicode-truncation branch from 142295e to 9de47c5 Compare May 10, 2024 10:41
@EdJoPaTo
Member

The logic is somewhat horribly complex as a lot of stuff isn't intuitive… The widths are usize and not u16 as the truncation of the end is implicitly done. That resulted in misunderstandings in what should happen. I don't think we can fully explain the code by comments. We need many tests to ensure it actually does what it is expected to do. Lines/Spans longer than u16 for example is a whole new mess which I haven't thought about before.

Also, I refactored the code to iterator logic. I'm fine with reverting it. Benchmarks suggest it's similar in performance. But it might be even harder to follow what is going on?

Member

@EdJoPaTo EdJoPaTo left a comment

It somewhat worries me that new test cases pop up and break stuff even when we thought we are done and only need documentation stuff. But it's way better than the current state of random panics.

I like the debug_asserts approach here, as it ensures that assumptions are actually true and documents them in the process. They only panic in debug builds, so production code will at worst display things incorrectly rather than panic. Also, we have many test cases, and users of ratatui will provide more when they run their apps as dev builds.

So… documentation won't be ever perfect, and I kinda expect new cases that break the current code as Unicode is Unicode. So whoever wants to improve this should have many test cases available. Which is what we achieved here for sure.

So I'm in for merging this in order to have a better state than before. We won't reach a perfect state anyway.

EdJoPaTo and others added 4 commits May 10, 2024 17:03
its 8% to 30% faster in the benchmark
- Remove perf fast path for left aligned lines as the absolute perf diff
  is negligible (30ns on an M2 Mac)
- move the visible spans calculation to a method that returns an
  iterator over the spans that are visible
- replace truncation clippy lints with conversions that clamp to u16::MAX
- replace arithmetic_side_effects clippy lint with saturating_sub
- remove debug_asserts
- rewrite comments in the imperative mood
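
As an illustration of the "iterator over the visible spans" shape mentioned in this commit message, here is a simplified sketch. It is not the PR's implementation: spans are plain `&str`, and it assumes ASCII content so that one byte equals one column (non-ASCII input would need width-aware slicing and could panic here). `visible_spans` is a hypothetical name.

```rust
/// Yield `(column_offset, visible_text)` for each span that overlaps the
/// window `[skip, skip + width)`. ASSUMPTION: all spans are ASCII, so byte
/// indices and display columns coincide.
fn visible_spans<'a>(
    spans: &'a [&'a str],
    skip: usize,
    width: usize,
) -> impl Iterator<Item = (usize, &'a str)> + 'a {
    let end = skip.saturating_add(width);
    let mut col = 0usize; // start column of the current span in the full line
    spans.iter().filter_map(move |&span| {
        let span_start = col;
        let span_end = span_start + span.len();
        col = span_end;
        // Intersect the span's columns with the visible window.
        let from = span_start.max(skip);
        let to = span_end.min(end);
        if from >= to {
            return None; // entirely outside the window
        }
        Some((from - skip, &span[from - span_start..to - span_start]))
    })
}
```

Returning an iterator keeps the caller's render loop simple: each item says where to draw and what to draw, and spans outside the window are never produced.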
@joshka
Member Author

joshka commented May 11, 2024

Some final tweaks:

It's important to look at the absolute magnitude of a perf gain, and not just the relative amount. On my M2 Mac, removing the fast path costs ~30ns. For an app running at an absolutely absurd 1000 FPS, the app would now run at 999.97 FPS. That isn't worth keeping the extra code for.

  • Remove perf fast path for left aligned lines as the absolute perf diff
    is negligible (30ns on an M2 Mac)
  • move the visible spans calculation to a method that returns an
    iterator over the spans that are visible
  • replace truncation clippy lints with conversions that clamp to u16::MAX
  • replace arithmetic_side_effects clippy lint with saturating_sub
  • remove debug_asserts
  • rewrite comments in the precise imperative mood ("when this, then that", rather than "if this then it's necessary / we only need to")
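
The clamping conversion and `saturating_sub` items above can be sketched with std alone. The function names are hypothetical and only illustrate the pattern; they are not taken from the PR.

```rust
/// Convert a `usize` width to `u16`, clamping instead of truncating.
/// A plain `width as u16` would wrap (e.g. 65_536 becomes 0).
fn clamp_to_u16(width: usize) -> u16 {
    u16::try_from(width).unwrap_or(u16::MAX)
}

/// Columns left in an area after `used` columns, never underflowing.
fn remaining_width(area_width: u16, used: u16) -> u16 {
    area_width.saturating_sub(used)
}
```

Both functions are infallible: oversized input produces a clamped value rather than a panic or a wrapped number.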

@joshka
Member Author

joshka commented May 11, 2024

I'd like to get this released (perhaps Monday if @orhun is up to it).

Direct link to the current file as of the last commit:

impl WidgetRef for Line<'_> {

It somewhat worries me that new test cases pop up and break stuff even when we thought we are done and only need documentation stuff. But it's way better than the current state of random panics.

I'm not too worried - most text that's rendered in the context of a tui app isn't > u16::MAX, and when it is the resultant bug is easy to fix by pre-trimming the text instead of doing this in the rendering. (There's an actual bug in turbo that is similar to this problem, but with Rect::area).

The logic is somewhat horribly complex as a lot of stuff isn't intuitive… The widths are usize and not u16 as the truncation of the end is implicitly done. That resulted in misunderstandings in what should happen. I don't think we can fully explain the code by comments. We need many tests to ensure it actually does what it is expected to do. Lines/Spans longer than u16 for example is a whole new mess which I haven't thought about before.

The complexity of the implementation is now pretty much aligned with the inherent complexity of the problem, and it's documented well enough.

I like the debug_asserts approach here, as it ensures that assumptions are actually true and documents them in the process. They only panic in debug builds, so production code will at worst display things incorrectly rather than panic. Also, we have many test cases, and users of ratatui will provide more when they run their apps as dev builds.

I replaced these with infallible code. I don't like debug_asserts at all, especially in a library. When you're wrong and they fail, you're slapping your user's users with a pain point which is hard to report, and which takes a release of both projects in order to fix.

@EdJoPaTo
Member

EdJoPaTo commented May 11, 2024

I'm not too worried - most text that's rendered in the context of a tui app isn't > u16::MAX, and when it is the resultant bug is easy to fix by pre-trimming the text instead of doing this in the rendering.

I am not worried about the actual bug that came up. I am more worried that 3 people looked over it for days and missed yet another bug.

I replaced these with infallible code.

What I liked about it were the assumptions actually tested. We wrote code with assumptions which are now hidden behind infallible code. But with all the tests, fine by me.

Remove perf fast path for left aligned lines as the absolute perf diff is negligible (30ns on an M2 Mac)

I am not at all worried about way overpowered hardware. I know that mqttui is used on Raspberry Pi, personally I regularly use mqttui on Raspberry Pi 1.

Line is one of the core constructs used by a lot of things. Relatively small improvements will impact a lot of code depending on it. So it's way more significant to improve the performance here. And also the Alignment::Left is the default for most things so it has the most impact.

Even when it's not about low-power devices or years-old hardware (my main device is ~9 years old now), it saves CPU cycles and therefore battery life.

@EdJoPaTo
Member

To pick even more on the performance fast path of Alignment::Left on slower devices…

Let's imagine a table rendered on a 100x100 terminal. Assume 5 columns and 90 rows. The table cells use Text, so many Lines. Whether there are more cells or more lines per cell is probably irrelevant.

90 * 5 means 450 visible cells to be rendered. The worst-case time in the benchmark below, where the Buffer is wider than the Line, is 9.7 µs. Let's assume 10 µs for ease of calculation.
10 µs * 450 = 4.5 ms. 60 FPS are 16.66 ms. So we took 1/4 of that for the rendering of lines alone.
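
The estimate above can be written out as a small calculation. `frame_budget_share` is a hypothetical helper; the inputs (450 cells, 10 µs per cell, 60 FPS) are the ones from this comment.

```rust
/// Fraction of one frame's time budget spent rendering `cells` cells
/// at `micros_per_cell` each, for a target of `fps` frames per second.
fn frame_budget_share(cells: u32, micros_per_cell: f64, fps: f64) -> f64 {
    let render_ms = f64::from(cells) * micros_per_cell / 1000.0;
    let frame_ms = 1000.0 / fps;
    render_ms / frame_ms
}
```

`frame_budget_share(450, 10.0, 60.0)` comes out at about 0.27, i.e. roughly a quarter of the 16.67 ms frame.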

And this is a Raspberry Pi 2 benchmark, I use mqttui on Raspberry Pi 1 regularly.

Even an improvement of 12% is really good here already. The benchmark shows a regression of up to ~36%, so this is definitely significant, and it affects left alignment, which is the default and the most commonly used one.

Benchmark on Raspberry Pi 2:

line_render/Left/3      time:   [2.7422 µs 2.7427 µs 2.7433 µs]
                        change: [+34.945% +36.396% +37.329%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 68 outliers among 1000 measurements (6.80%)
  48 (4.80%) high mild
  20 (2.00%) high severe
line_render/Left/4      time:   [2.9284 µs 2.9290 µs 2.9297 µs]
                        change: [+33.260% +33.549% +33.885%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 23 outliers among 1000 measurements (2.30%)
  13 (1.30%) high mild
  10 (1.00%) high severe
line_render/Left/6      time:   [4.7153 µs 4.7158 µs 4.7164 µs]
                        change: [+23.409% +24.391% +24.993%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 63 outliers among 1000 measurements (6.30%)
  30 (3.00%) high mild
  33 (3.30%) high severe
line_render/Left/7      time:   [4.9508 µs 4.9513 µs 4.9518 µs]
                        change: [+23.962% +24.193% +24.427%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 44 outliers among 1000 measurements (4.40%)
  24 (2.40%) high mild
  20 (2.00%) high severe
line_render/Left/10     time:   [7.0004 µs 7.0016 µs 7.0030 µs]
                        change: [+16.254% +16.648% +16.980%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 46 outliers among 1000 measurements (4.60%)
  21 (2.10%) high mild
  25 (2.50%) high severe
line_render/Left/42     time:   [9.7193 µs 9.7207 µs 9.7224 µs]
                        change: [+12.279% +12.559% +12.887%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 131 outliers among 1000 measurements (13.10%)
  57 (5.70%) low mild
  35 (3.50%) high mild
  39 (3.90%) high severe

Benchmark of Raspberry Pi 4: regression of 14.3 to 38.7%. Worst case takes ~2.03 µs, so ~5 times faster than Pi 2.

I will not benchmark this on Pi 1, as running the benchmarks takes ~10 min on Pi 4 and ~40 min on Pi 2. Pi 1 is single core in comparison to Pi 2 with 4 cores.

@joshka
Member Author

joshka commented May 12, 2024

10 µs * 450 = 4.5 ms. 60 FPS are 16.66 ms

Let's look at the absolute magnitude of the change though: (10 * .125) = 1.25us per call

1.25*450 = 0.563ms, so each frame now takes (16.667+0.563) = 17.23ms = 58 FPS. That's measurable sure, but rarely directly noticeable. Assuming a factor of 4 on the Pi 1, 18.92ms = 53 FPS.

One thing I did note when removing the fast path was that the slow paths got slightly faster (1-4%), so the tradeoff is one that doesn't necessarily have a definitively good answer.

For now, I'd like to keep it simplified, and keep this PR about fixing the bug rather than trying to eke out every bit of performance. You've convinced me that there are some parts of this that matter to you, so let's continue the perf discussion in an issue or forum topic.

Pi 1 is single core in comparison to Pi 2 with 4 cores.

This raises an interesting point. I wonder (not for this PR), what parts of our rendering pipeline might be able to be done more in parallel.

@joshka joshka dismissed EdJoPaTo’s stale review May 12, 2024 02:28

Let's move the perf discussion to another Issue for discussion later

@joshka joshka merged commit 699c2d7 into ratatui-org:main May 12, 2024
33 checks passed
@joshka joshka deleted the jm/line-unicode-truncation branch May 12, 2024 02:28
joshka added a commit to nowNick/ratatui that referenced this pull request May 24, 2024
- Rewrote the line / span rendering code to take into account how
multi-byte / wide emoji characters are truncated when rendering into
areas that cannot accommodate them in the available space
- Added comprehensive coverage over the edge cases
- Adds a benchmark to ensure perf

Fixes: ratatui-org#1032
Co-authored-by: EdJoPaTo <rfc-conform-git-commit-email@funny-long-domain-label-everyone-hates-as-it-is-too-long.edjopato.de>
Co-authored-by: EdJoPaTo <github@edjopato.de>
ymgyt added a commit to ymgyt/syndicationd that referenced this pull request May 24, 2024

Successfully merging this pull request may close these issues.

panicked in Line::truncated method due to byte index not being a char boundary
3 participants