Add interface for BiDi levels #89

Open
alcinnz opened this issue May 30, 2023 · 0 comments

alcinnz commented May 30, 2023

We are using text-icu in Balkón, an inline layout engine intended for a web browser.

To handle bidirectional text, Balkón needs to run the BiDi algorithm on a given input text and retrieve the calculated levels, so that it can break the text into directional runs and pass each run to HarfBuzz for shaping. For this, Balkón needs the output of the ubidi_getLevels() function in the ICU C API. We cannot use the high-level reorderParagraph function for two reasons: Balkón allows associating formatting options and metadata with portions of the input text, and the output of HarfBuzz takes the form of glyphs. reorderParagraph only works on plain text, and it reorders that text in a way that would make preserving metadata very difficult. Fortunately, the reordering step is very simple to implement, so Balkón can take responsibility for it. We just need the output of the BiDi algorithm after rule I2, before reordering (https://www.unicode.org/reports/tr9/#Reordering_Resolved_Levels).
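To illustrate why the levels alone are enough, here is a minimal sketch of the two steps Balkón would perform on them: splitting into directional runs, and reordering by rule L2 of UAX #9. The names levelRuns and reorderL2 are hypothetical and not part of text-icu or Balkón. Levels are represented as Int for simplicity, and each item stands for whatever unit (character, glyph cluster) carries the level.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Resolved embedding level; Int here for simplicity (ICU uses an 8-bit type).
type Level = Int

-- | Split a levelled sequence into maximal runs of equal level.
-- Each run could then be shaped separately (hypothetical helper).
levelRuns :: [(Level, a)] -> [(Level, [a])]
levelRuns xs =
  [ (l, map snd g) | g@((l, _) : _) <- groupBy ((==) `on` fst) xs ]

-- | Reverse every maximal contiguous stretch whose level is at least k.
reverseRunsAtLeast :: Level -> [(Level, a)] -> [(Level, a)]
reverseRunsAtLeast k ys =
  case break ((>= k) . fst) ys of
    (low, [])   -> low
    (low, rest) ->
      let (high, rest') = span ((>= k) . fst) rest
      in low ++ reverse high ++ reverseRunsAtLeast k rest'

-- | Rule L2: from the highest level down to the lowest odd level,
-- reverse each contiguous stretch at that level or higher.
reorderL2 :: [(Level, a)] -> [(Level, a)]
reorderL2 xs =
  case filter odd (map fst xs) of
    []        -> xs  -- no right-to-left content, nothing to reorder
    oddLevels ->
      let maxLevel = maximum (map fst xs)
          minOdd   = minimum oddLevels
      in foldl (flip reverseRunsAtLeast) xs [maxLevel, maxLevel - 1 .. minOdd]
```

Because the items are carried along with their levels, any metadata attached to them survives the reordering, which is the property that the plain-text reordering function cannot offer.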

It would also be useful if Balkón could supply the initial embedding levels to the algorithm, so that direction changes can be dictated by higher level protocols (typically HTML, since this is intended for a web browser) without having to insert explicit formatting characters into the input string (which would complicate working with text offsets for selections). This is permitted by BiDi rule HL3 (https://www.unicode.org/reports/tr9/#HL3). Initial embedding levels can be passed as the embeddingLevels parameter of the ubidi_setPara() function in the ICU C API, which is currently hardcoded as NULL in the Haskell bindings (https://hackage.haskell.org/package/text-icu-0.8.0.2/src/cbits/text_icu.c).
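As a rough sketch of how a caller might prepare such an array, the following expands a list of styled spans into one level per code unit. The Span type and spanLevels function are hypothetical. The sketch assumes span lengths are measured in the same units ICU indexes the text by (UTF-16 code units), and that the override flag is the 0x80 bit, as defined by UBIDI_LEVEL_OVERRIDE in ICU's ubidi.h.

```haskell
import Data.Bits ((.|.))
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- | Hypothetical description of a span of text whose direction is
-- dictated by a higher-level protocol such as HTML.
data Span = Span
  { spanLength   :: Int    -- ^ length in code units (assumed UTF-16)
  , spanLevel    :: Word8  -- ^ embedding level for the span
  , spanOverride :: Bool   -- ^ True for an override rather than an embedding
  }

-- | Bit that marks a level as a directional override
-- (UBIDI_LEVEL_OVERRIDE in ICU's ubidi.h).
levelOverride :: Word8
levelOverride = 0x80

-- | Expand spans into one level byte per code unit, in text order.
spanLevels :: [Span] -> BS.ByteString
spanLevels = BS.concat . map expand
  where
    expand (Span n l o) =
      BS.replicate n (if o then l .|. levelOverride else l)
```

Since the levels live in a separate array, the input string keeps its original offsets, so selections and other offset-based features are unaffected.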

It should also be possible to control the paraLevel parameter of the ubidi_setPara() function, which would typically reflect the direction set on HTML block elements. This is permitted by BiDi rule HL1 (https://www.unicode.org/reports/tr9/#HL1).
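For example, a caller could derive paraLevel from the direction of the enclosing block roughly as follows. BlockDir and paraLevelFor are hypothetical names. The numeric values are those of UBIDI_LTR, UBIDI_RTL and UBIDI_DEFAULT_LTR in ICU's ubidi.h, and mapping an automatic direction to UBIDI_DEFAULT_LTR is an assumption that only approximates HTML's dir="auto".

```haskell
import Data.Word (Word8)

-- | Hypothetical block direction, as set by a higher-level protocol.
data BlockDir = DirLtr | DirRtl | DirAuto
  deriving (Eq, Show)

-- | Paragraph level to pass as paraLevel to ubidi_setPara().
paraLevelFor :: BlockDir -> Word8
paraLevelFor DirLtr  = 0     -- UBIDI_LTR
paraLevelFor DirRtl  = 1     -- UBIDI_RTL
paraLevelFor DirAuto = 0xFE  -- UBIDI_DEFAULT_LTR: first strong character decides, else LTR
```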

A high-level, pure Haskell function providing the required functionality might look something like this:

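textLevels :: Word8 -> ByteString -> Text -> ByteString
textLevels paraLevel inputLevels inputText =
  unsafePerformIO $ do
    bidi <- open
    -- Proposed binding: ubidi_setPara() with embeddingLevels exposed.
    setParaWithLevels bidi inputText paraLevel inputLevels
    -- Proposed binding: ubidi_getLevels().
    getLevels bidi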

where setParaWithLevels is a foreign call to ubidi_setPara() including the embeddingLevels parameter, and getLevels is a foreign call to ubidi_getLevels(). Additional code may be necessary to handle memory allocation and deallocation. The levels may be stored in a data type other than ByteString if necessary.
