Improve word wrap #37

eprovst · 2020-01-12T09:25:00Z

The current word wrap algorithm works for most cases, but the code is fairly clunky, and in some very rare cases it inserts a hyphen where it probably shouldn't or does something else weird.

rnkn · 2020-01-13T08:12:09Z

Are you doing hyphenation? Might be better to save your time if you are. As far as I'm aware hyphenation is avoided in scripts (I think because it'd make for stilted line readings).

eprovst · 2020-01-13T09:01:41Z

Yes and no.

Currently Wrap breaks lines by a greedy algorithm (at some point I might look at something similar to TeX's algorithm but screenplays are very constrained so I'm guessing that wouldn't give much better results).

First we collect cells (which are parts of the source text with certain markup, and are already grouped into lines as in the source file) until the line is full.

If there are too many characters on the line we search for a breakpoint in the last cell. A breakpoint is a Unicode space or a hyphen.

If we don't find a breakpoint we break at the upper limit the line length allows. If there are two letters here we insert a hyphen (most likely in a place where we shouldn't), otherwise we don't (but we probably broke a number in two).
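The steps above can be sketched roughly as follows. This is a simplified illustration and not Wrap's actual code: cells are modelled as plain strings, and the names `breakCells` and `splitCell` are hypothetical. It assumes the hyphen inserted at a forced break takes up one column of the line.

```go
package main

import (
	"fmt"
	"unicode"
)

// breakCells fills one line of at most width runes from cells and returns
// that line plus everything left over for the following lines.
func breakCells(cells []string, width int) (line, rest string) {
	var out []rune
	for i, c := range cells {
		cell := []rune(c)
		if len(out)+len(cell) <= width {
			out = append(out, cell...) // the whole cell still fits
			continue
		}
		room := width - len(out) // runes of this cell that still fit
		head, tail := splitCell(cell, room)
		out = append(out, head...)
		rest = string(tail)
		for _, later := range cells[i+1:] {
			rest += later
		}
		break
	}
	return string(out), rest
}

// splitCell splits an overflowing cell (len(cell) > room). Only this last
// cell is searched, which is the limitation discussed in the thread.
func splitCell(cell []rune, room int) (head, tail []rune) {
	// Search from the right for a Unicode space or a hyphen.
	for j := room; j >= 0; j-- {
		switch {
		case unicode.IsSpace(cell[j]):
			return cell[:j], cell[j+1:] // the space itself is dropped
		case cell[j] == '-' && j < room:
			return cell[:j+1], cell[j+1:] // the hyphen stays on this line
		}
	}
	// No breakpoint found: cut at the limit, hyphenating between two letters.
	if room >= 2 && unicode.IsLetter(cell[room-2]) && unicode.IsLetter(cell[room-1]) {
		head = append(append([]rune{}, cell[:room-1]...), '-')
		return head, cell[room-1:]
	}
	return cell[:room], cell[room:]
}

func main() {
	line, rest := breakCells([]string{"The quick ", "brown fox"}, 14)
	fmt.Printf("%q | %q\n", line, rest) // "The quick bro-" | "wn fox"
}
```

In the usage example the better break would be at the end of the first cell, but because only the last cell is searched, the sketch hyphenates "brown" instead.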

There's a number of issues here:

1. The best point to break is often in the cell before the current cell, in cases like:
   ... end of long line compound_word._
   where the ideal breakpoint is between line and compoundword, but Wrap breaks compound and word in two, which might even be a mistake in some cases.
   "panic: index out of range error with underscore, full stop, eol" (#36) was a special case of this problem, where the last cell had length one. Wrap now backtracks one cell in that case, but ideally we would check whether the start of the cell is a valid breakpoint before we break there, and backtrack as needed.
2. The current break possibilities are incomplete and even plain wrong (a non-breaking space is also a Unicode space).
3. Currently we don't support different line lengths throughout a set of lines. Not a big issue, but it does mean that the first line of a parenthetical currently gets one character less than it should because of its indentation.
4. When we're forced to break the line at the limit, we could first check whether there is another 'less ideal' breakpoint, e.g. the first non-letter from the right.
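The last three issues (break classes, per-line lengths, and a fallback breakpoint) could be approached along these lines. This is only a sketch under assumptions: `isBreakableSpace`, `fallbackCut` and `lineWidth` are hypothetical helpers, not functions from Wrap, and the widths used are illustrative. It relies on the fact that Go's `unicode.IsSpace` also reports true for the no-break spaces.

```go
package main

import (
	"fmt"
	"unicode"
)

// isBreakableSpace reports whether r is white space that permits a line
// break. unicode.IsSpace alone is too broad for this purpose.
func isBreakableSpace(r rune) bool {
	switch r {
	case '\u00A0', '\u2007', '\u202F': // no-break, figure, narrow no-break space
		return false
	}
	return unicode.IsSpace(r)
}

// fallbackCut returns how many runes of rs to keep on a line of at most
// limit runes when no regular breakpoint exists. It prefers to cut right
// after the rightmost non-letter and cuts at the limit only as a last resort.
func fallbackCut(rs []rune, limit int) int {
	if len(rs) <= limit {
		return len(rs)
	}
	for i := limit; i > 0; i-- {
		if !unicode.IsLetter(rs[i-1]) {
			return i
		}
	}
	return limit
}

// lineWidth returns the width of the n-th line (0-based). Lines past the
// end of widths reuse the last entry; widths is assumed to be non-empty.
func lineWidth(widths []int, n int) int {
	if n < len(widths) {
		return widths[n]
	}
	return widths[len(widths)-1]
}

func main() {
	fmt.Println(unicode.IsSpace('\u00A0'), isBreakableSpace('\u00A0')) // true false

	rs := []rune("foo/barbaz")
	cut := fallbackCut(rs, 6)
	fmt.Println(string(rs[:cut]), string(rs[cut:])) // foo/ barbaz

	fmt.Println(lineWidth([]int{10, 12}, 0), lineWidth([]int{10, 12}, 3)) // 10 12
}
```

Passing a slice of widths instead of a single limit is one way to let the first line of an element differ from the rest without special-casing it inside the wrapping loop.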

A similar set of issues exists for page breaks. We don't do anything wrong there, but better selection criteria would try harder to avoid page breaks that fall within an element, etc. (see #38)

rnkn · 2020-01-13T10:19:09Z

Thanks, that’s an impressively comprehensive overview!

eprovst · 2021-04-17T20:17:51Z

Started working on this on the linewrap branch. As of now only 3. is fixed.

eprovst self-assigned this Jan 12, 2020

eprovst added module/pdf priority/high type/bug type/enhancement priority/medium and removed priority/high type/bug labels Jan 12, 2020

eprovst mentioned this issue Jan 13, 2020

Roadmap #33

Open

16 tasks

eprovst mentioned this issue Apr 16, 2021

Avoid breaking "--" across lines #49

Closed

eprovst mentioned this issue Jun 29, 2021

Linewrap #51

Merged

eprovst closed this as completed in #51 Jun 29, 2021

eprovst mentioned this issue Aug 20, 2021

Request option: turn off PDF hyphenation #52

Closed
