-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[unic-bidi] API concerns #273
Comments
This comment has been minimized.
This comment has been minimized.
Agreed, this would be worth discussing. See also: kas-gui/kas-text#20 Two tools are needed here in my opinion:
|
I'm curious why you need the mirroring. Shouldn't HarfBuzz take care of that for you as part of its RTL shaping? It's entirely possible there's something I'm missing though. |
HarfBuzz does take care of it, but I also maintain a simple (Kerning only) alternative, and that uses the mirroring. |
Thanks for the detailed report, @raphlinus; and others for valuable comments. The state of Bidi Algorithm impl in Sharing some context: I put together the To give you a super-quick update on the my current focus on UNIC (at least for the next few weeks) is to get it back on track for forward-development work, with the goal to get a proper Text API in place, supporting segmentation, bidi and joining control. (From there, would be possible to get to line-break, as well. I haven't plans for myself for that, yet.) One of the changes needed on this road for me, is to split the "explicit bidi run resolution" part of the Bidi Algorithm working under And, of course, there's a lot more work to be done for Bidi impl here, in terms of conformance, and performance (specifically, getting to linear or near-linear impl, possibly under Now, whether this project is a good place for such work? Well, I have been open about my personal agenda (which is focus on Multilingual Text API), and others have been contributing per their own goals and agenda, sharing what they're comfortable with. I understand well that there's been some confusions because of my lack of resources to get to work areas that others are interested in, and there's not really much I can do about that. My attempt is to push for an initiative where everyone can participate, and every language gets a fair share of support across all the layers of the technology built. I would be more than happy to hear your thoughts on this. :) |
From my point of view, |
In order to support RTL/BIDI, wezterm needs a bidi implementation. I don't think a well-conforming rust implementation exists today; what I found were implementations that didn't pass 100% of the conformance tests. So I decided to port "bidiref", the reference implementation of the UBA described in http://unicode.org/reports/tr9/ to Rust. This implementation focuses on conformance: no special measures have been taken to optimize it so far, with my focus having been to ensure that all of the approx 780,000 test cases in the unicode data for unicode 14 pass. Having the tests passing 100% allows for making performance improvements with confidence in the future. The API isn't completely designed/fully baked. Until I get to hooking it up to wezterm's shaper, I'm not 100% sure exactly what I'll need. There's a good discussion on API in open-i18n/rust-unic#273 that suggests omitting "legacy" operations such as reordering. I suspect that wezterm may actually need that function to support monospace text layout in some terminal scenarios, but regardless: reordering is part of the conformance test suite so it remains a part of the API. That said: the API does model the major operations as separate phases, so you should be able to pay for just what you use: * Resolving the embedding levels from a paragraph * Returning paragraph runs of those levels (and their directions) * Returning the whitespace-level-reset runs for a line-slice within the paragraph * Returning the reordered indices + levels for a line-slice within the paragraph. refs: #784 refs: kas-gui/kas-text#20
I'm looking at using unic-bidi for a contemplated text layout project. For this discussion, consider the primary goal to be adding BiDi support to a (hypothetical) in-Rust implementation of text layout conforming to Piet's TextLayout trait. I see a number of concerns.
As a higher level process issue, one way to address those concerns is to fork unic-bidi. But I see value in trying to maintain a single source of truth for BiDi in the Rust ecosystem, so it feels like working with upstream is a good way to do that. That said, #272 is pretty good evidence to me, in addition to reverse dependencies (note that the RustPython reverse dep has actually recently been removed) that nobody is actually using unic-bidi, especially within the context of paragraph level text formatting.
Now for my concerns.
A big one is the
&'text
lifetime requirement onBidiInfo
. That precludes retaining BidiInfo as part of our text layout object. One pattern for fixing this is to change the type parameter from<'text>
toAsRef<str>
, which would let the text be borrowed or owned, at the caller's request. Another way to fix it is to simply clone the string (it's not as if the implementation is particularly clone-averse). But I'm not convinced we actually need to retain the string. If you look at the methods on BidiInfo, I'm not surereorder_line()
is actually still valid in a modern environment; if my understanding is correct, HarfBuzz always wants its input in logical order. The string is also used inreorder_levels_per_char()
to find the codepoint boundaries (I'm not sure this function is useful), and invisual_runs()
, again mostly to find codepoint boundaries, but I think the "reset" logic can be run entirely fromoriginal_classes
without making reference to the text.Other concerns center around performance. I'm not advocating doing a lot of performance work now, but I also believe that the current API locks in certain decisions that make such performance work difficult in the future. That's mostly around the per-byte (technically, per UTF-8 code unit) arrays to represent things like levels. I think these don't represent the way people typically work with text in Rust, and in any case there are about twice as many UTF-8 code units as UTF-16 for most BiDi work.
So what I'd like better is for
visual_runs
to return an array of runs, where each run is a range in the text and a level (the latter would be after the reset logic is applied). I think the various "reorder*" functions should go, as I don't see them as being particularly useful, and if they are needed, I think their function can be written quite reasonably in terms of this proposedvisual_runs
.Lastly, the stuff around
ParagraphInfo
seems crufty to me. Why is this passed in? It's possible to infer from the text position. Taking a step back, it's not clear to me that segmenting into paragraphs is a useful function for unic-bidi in the first place. It seems extremely likely that its client will have already done segmentation into paragraphs by the time it gets to BiDi analysis. If the goal of the crate is to cover all of UAX#9, then perhaps paragraph segmentation can be provided through a separate API. Then the scope ofBidiInfo
could be narrowed to a single paragraph, which I think would be a desirable simplification.I post this in part to provoke discussion from potential other clients of the crate, and also to test whether unic is a good container for the work.
The text was updated successfully, but these errors were encountered: