Change wrapping lines algorithm #783

diegomura · 2018-02-18T22:07:47Z

I'm building react-pdf, which uses pdfkit under the hood. The library has become quite popular and has plenty of users, so I would like to thank you for helping make this possible.
However, we now need to support justified paragraphs and word wrapping (soft hyphens, non-breaking spaces, etc.). I know you already support the first, but not the second, and other people are asking for it too. Also, the current paragraph justification is not optimal.

I started working on a fix using the algorithm from "Breaking Paragraphs into Lines" by Donald E. Knuth and Michael F. Plass, which I think is an excellent solution for this. There is already a JS library that implements it here. However, it's crucial for us to know whether this library is still maintained, and whether someone will be able to test this implementation and eventually merge it into master. If not, we would be forced to come up with another solution 😄

Thanks for your time and work!

devongovett · 2018-02-20T07:51:28Z

I would definitely recommend doing this somewhere higher in the stack than PDFKit. PDFKit basically immediately writes content to the file as you are adding it, but for justification, especially when multiple styles or fonts can be inlined, you need a multi-pass layout algorithm. You'll need to do the layout first, and then once the final glyphs and their positions are known, send them to PDFKit for rendering.
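To illustrate the two-pass idea, here is a minimal sketch, not PDFKit code. It assumes a hypothetical `measure(text, font)` callback that returns an advance width and a hypothetical `draw(text, font, x, y)` callback, which in practice might be a thin wrapper around PDFKit's `doc.font(...)` and `doc.text(...)`. The first pass only computes positions for a justified line with mixed fonts; the second pass draws once every position is final.

```javascript
// Pass 1: compute positions only, nothing is drawn here.
// `measure(text, font)` is a hypothetical callback returning an advance width.
function layoutLine(words, lineWidth, measure) {
  const widths = words.map((w) => measure(w.text, w.font));
  const natural = widths.reduce((sum, w) => sum + w, 0);
  const gaps = words.length - 1;
  // Justification needs the whole line: the gap depends on every word's width.
  const gap = gaps > 0 ? (lineWidth - natural) / gaps : 0;
  let x = 0;
  return words.map((w, i) => {
    const placed = { text: w.text, font: w.font, x };
    x += widths[i] + gap;
    return placed;
  });
}

// Pass 2: emit draw calls once all positions are known.
// `draw(text, font, x, y)` is a hypothetical rendering callback.
function renderLine(placed, y, draw) {
  for (const p of placed) draw(p.text, p.font, p.x, y);
}

// Example with a toy metric: 6 units per character, 7 for the "bold" font.
const toyMeasure = (text, font) => text.length * (font === "bold" ? 7 : 6);
const exampleWords = [
  { text: "multi", font: "regular" },
  { text: "pass", font: "bold" },
  { text: "layout", font: "regular" },
];
const examplePlaced = layoutLine(exampleWords, 100, toyMeasure);
renderLine(examplePlaced, 20, (text, font, x, y) => console.log(text, font, x, y));
```

Because the gap is derived from the widths of all words on the line, nothing can be written to the file until the whole line has been measured, which is why an immediate-mode writer is an awkward place to do this.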

Text layout is actually a really hard problem - way harder than it seems at first glance. Getting the details right in a way that works for all languages is crazy challenging. Here's the basic text layout algorithm most text systems (like word processors, operating systems, web browsers, etc.) follow:

1. Split text into paragraphs - the following steps are applied to each paragraph
2. Get bidi runs and compute paragraph direction - This is the Unicode bidirectional algorithm. See http://www.unicode.org/reports/tr9/
3. Font substitution - check whether the user-defined font actually supports each character the user wants to render. If not, replace with a font that does. This produces "runs" of text in the same font. See https://github.com/devongovett/font-manager for a way to do font substitution using the native OS.
4. Script itemization - in Unicode, each character is part of a script. Break the text into runs of similar scripts. This data is exposed by https://github.com/devongovett/unicode-properties.
5. Font shaping - for each run of text, convert characters to glyphs from that font. This can be done using http://github.com/devongovett/fontkit - the library PDFKit already uses.
6. Line breaking - Using the generated glyph runs for the paragraph, break into lines using the Unicode line breaking algorithm. This can be done using https://github.com/devongovett/linebreak.
7. Bidi reordering - Using the bidi information computed earlier, reorder the generated glyph runs on each line according to the bidi algorithm.
8. Apply tab stops - make sure the tab characters on each line are the correct width so that they align with tab stops.
9. Justification - If justification is enabled, adjust the spacing between each glyph on each line to justify it.
10. Finalize lines - Apply text-decoration, hanging punctuation, etc.
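As a rough illustration of one of these steps, the script itemization step could be sketched as follows. This is an assumption-laden toy, not what unicode-properties or textkit does: it relies on the JavaScript regex `\p{Script=...}` property escapes, knows only a small hand-picked list of scripts, and simply attaches script-neutral characters (spaces, digits, punctuation, combining marks) to the neighbouring run.

```javascript
// Hypothetical, deliberately small script list; real data covers every script.
const SCRIPTS = ["Latin", "Greek", "Cyrillic", "Hebrew", "Arabic", "Han", "Hiragana", "Katakana"];
const SCRIPT_TESTS = SCRIPTS.map((name) => [name, new RegExp(`\\p{Script=${name}}`, "u")]);
const NEUTRAL = /[\p{Script=Common}\p{Script=Inherited}]/u;

// Returns a script name, "Unknown" for scripts outside the list,
// or null for script-neutral characters.
function scriptOf(char) {
  if (NEUTRAL.test(char)) return null;
  for (const [name, re] of SCRIPT_TESTS) {
    if (re.test(char)) return name;
  }
  return "Unknown";
}

// Splits text into runs of the same script, recording each run's start offset.
function itemizeScripts(text) {
  const runs = [];
  let offset = 0;
  for (const char of text) { // iterates by code point, not UTF-16 unit
    const script = scriptOf(char);
    const last = runs[runs.length - 1];
    if (last && (script === null || last.script === null || script === last.script)) {
      last.text += char;
      // A run that began with neutral characters adopts the first real script.
      if (last.script === null) last.script = script;
    } else {
      runs.push({ script, start: offset, text: char });
    }
    offset += char.length;
  }
  return runs;
}

console.log(itemizeScripts("Hello, мир! 你好"));
```

Each resulting run can then be shaped independently, since shaping rules differ per script.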

As you can see, there are a lot of steps here to do text layout correctly in a Unicode friendly way. PDFKit currently implements an extremely basic version of this without a lot of the steps. It basically only works well for unidirectional text in a single font, and you'll hit that limitation very quickly for anything complicated.

I worked on this problem a bit in https://github.com/devongovett/textkit a while ago. I've been meaning to clean that up and release it, but I don't really have time. It's not really finished or well tested at the moment, but if you feel like taking a look at it feel free! Seems like it might be useful for react-pdf and other similar libraries that want to do text layout. Happy to help out - let me know if you have questions or if you want to help take over that code!

diegomura · 2018-02-24T20:22:20Z

Thanks for your answer @devongovett. It was very informative and helpful.
I know text layout is a very complex subject.
I will definitely check out textkit and see how I can fit it into my solution.

As I explained, I really need to implement the Knuth and Plass line breaking algorithm for my solution, and based on what you said, the linebreak lib implements the Unicode line breaking algorithm. I'm not an expert on the subject, but I think they do things a bit differently. Do you think this is something we could parametrize in linebreak to support both ways of splitting lines?

devongovett · 2018-02-27T03:55:18Z

The linebreak library only tells you where in a string of characters it is valid to break a line according to Unicode (e.g. on spaces for Latin text). Knuth and Plass is a line layout algorithm: it chooses which of those valid breakpoints to actually use. It would use something like linebreak to determine the valid breakpoints.
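A small sketch of that division of labour, under stated assumptions: `spaceBreakpoints` is a stand-in for a real Unicode line breaker and only allows breaks after spaces, `measure` is a hypothetical width callback, and `breakParagraph` is a simplified total-fit search in the spirit of Knuth and Plass (it minimizes squared leftover space per line, without their box/glue/penalty model, stretch and shrink, or hyphenation).

```javascript
// Stand-in for a Unicode line breaker: a break is allowed after every space.
function spaceBreakpoints(text) {
  const points = [];
  for (let i = 0; i < text.length; i++) {
    if (text[i] === " ") points.push(i + 1);
  }
  // The end of the paragraph is always a breakpoint.
  if (points[points.length - 1] !== text.length) points.push(text.length);
  return points;
}

// Simplified total-fit layout: among all ways of breaking at the given
// offsets, pick the one minimizing the sum of squared slack over every
// line except the last. `measure(line)` is a hypothetical width callback.
function breakParagraph(text, breakpoints, lineWidth, measure) {
  const stops = [0, ...breakpoints];
  const best = stops.map(() => ({ cost: Infinity, prev: -1 }));
  best[0].cost = 0;
  for (let j = 1; j < stops.length; j++) {
    const isLast = j === stops.length - 1;
    for (let i = 0; i < j; i++) {
      if (best[i].cost === Infinity) continue; // stop i is unreachable
      const line = text.slice(stops[i], stops[j]).trimEnd();
      const slack = lineWidth - measure(line);
      if (slack < 0) continue; // line would be overfull
      const cost = best[i].cost + (isLast ? 0 : slack * slack);
      if (cost < best[j].cost) best[j] = { cost, prev: i };
    }
  }
  // Walk the chosen breaks backwards from the end of the paragraph.
  const lines = [];
  for (let j = stops.length - 1; j > 0; j = best[j].prev) {
    if (best[j].prev < 0) return null; // no feasible layout at this width
    lines.unshift(text.slice(stops[best[j].prev], stops[j]).trimEnd());
  }
  return lines;
}

const sample = "aaa bb cc ddddd";
console.log(breakParagraph(sample, spaceBreakpoints(sample), 6, (s) => s.length));
// → [ 'aaa', 'bb cc', 'ddddd' ]
```

A greedy first-fit breaker would produce "aaa bb" / "cc" / "ddddd" for the same input, leaving a very loose second line. The optimizer only decides which candidates to use; where breaks are legal remains the line breaker's job.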

diegomura closed this as completed Feb 24, 2018

devongovett mentioned this issue Oct 11, 2022

Switch to Harfbuzz vercel/satori#83

Open

jlarmstrongiv mentioned this issue Oct 28, 2023

How do you wrap the Text in Skia? Shopify/react-native-skia#968

Closed
