Improve word wrap #37

Closed
eprovst opened this issue Jan 12, 2020 · 4 comments · Fixed by #51
Comments

@eprovst
Owner

eprovst commented Jan 12, 2020

The current word wrap algorithm works for most cases. However, the code is fairly clunky, and in some very rare cases it inserts a hyphen where it probably shouldn't or does something else weird.

@rnkn

rnkn commented Jan 13, 2020

Are you doing hyphenation? If so, it might be better to save your time. As far as I'm aware, hyphenation is avoided in scripts (I think because it would make for stilted line readings).

@eprovst
Owner Author

eprovst commented Jan 13, 2020

Yes and no.

Currently Wrap breaks lines with a greedy algorithm (at some point I might look at something similar to TeX's algorithm, but screenplays are very constrained, so I'm guessing that wouldn't give much better results).

First we collect cells (which are parts of the source text with certain markup, and are already grouped into lines as in the source file) until the line is full.

If there are too many characters on the line we search for a breakpoint in the last cell. A breakpoint is a Unicode space or a hyphen.

If we don't find a breakpoint, we break at the upper limit the line length allows. If there are two letters here, we insert a hyphen (most likely in a place where we shouldn't); otherwise we don't (but we probably broke a number in two).
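
The steps described above can be sketched roughly as follows. This is a simplified illustration, not Wrap's actual code: `breakLine` is a hypothetical helper that works on a plain string rather than on cells, and it assumes that a space is dropped at the break while a hyphen stays on the first line. It deliberately mirrors the behaviour described here, including treating every Unicode space as a breakpoint.

```go
package main

import "unicode"

// breakLine splits text so that the first part holds at most limit runes.
// Hypothetical helper: it works on a plain string instead of Wrap's cells.
func breakLine(text string, limit int) (line, rest string) {
	runes := []rune(text)
	// Text that fits is returned whole. A limit below 2 leaves no room
	// for a letter plus an inserted hyphen, so it is not handled here.
	if limit < 2 || len(runes) <= limit {
		return text, ""
	}

	// Search right to left for a breakpoint: a Unicode space or a hyphen.
	// Note that unicode.IsSpace also accepts non-breaking spaces.
	for i := limit; i > 0; i-- {
		switch {
		case unicode.IsSpace(runes[i]):
			// Assumption: the space itself is dropped at the break.
			return string(runes[:i]), string(runes[i+1:])
		case runes[i] == '-' && i < limit:
			// Assumption: the hyphen stays at the end of the first line.
			return string(runes[:i+1]), string(runes[i+1:])
		}
	}

	// No breakpoint found: force a break at the limit.
	if unicode.IsLetter(runes[limit-1]) && unicode.IsLetter(runes[limit]) {
		// Letters on both sides: step back one rune and insert a hyphen.
		return string(runes[:limit-1]) + "-", string(runes[limit-1:])
	}
	return string(runes[:limit]), string(runes[limit:])
}
```

With this sketch, `breakLine("hello world", 8)` yields `"hello"` and `"world"`, while `breakLine("abcdefghij", 5)` yields `"abcd-"` and `"efghij"`, which is the forced hyphen that the issues below are about.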

There are a number of issues here:

  1. The best point to break is often in the cell before the current cell, in cases like:
    ... end of long line compound_word._ where the ideal breakpoint is between "line" and "compoundword", but Wrap splits "compound" and "word" in two, which might even be a mistake in some cases.
    Issue #36 ("panic: index out of range error with underscore, full stop, eol") was a special case of this problem, where the last cell had length one. Wrap now backtracks one cell in that case, but ideally we would check whether the start of the cell is a valid breakpoint before breaking there, and backtrack as needed.
  2. The current break possibilities are incomplete and even plain wrong (a non-breaking space is also a Unicode space).
  3. Currently we don't support different line lengths throughout a set of lines. Not a big issue, but it does mean that the first line of a parenthetical currently gets one character fewer than it should because of its indentation.
  4. When we're forced to break the line at the limit, we could first check whether there is another 'less ideal' breakpoint, e.g. the first non-letter from the right.
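
As an illustration of items 2 and 4, here is a minimal sketch of a stricter breakpoint test and a 'less ideal' fallback. The names `isNoBreak`, `isBreakingSpace` and `fallbackBreak` are hypothetical and not part of Wrap, and the list of non-breaking characters is an assumption that is probably incomplete. `limit` is assumed to be positive.

```go
package main

import "unicode"

// isNoBreak reports whether r is a space that must not be broken at.
// Assumed list: no-break space, figure space, narrow no-break space.
func isNoBreak(r rune) bool {
	switch r {
	case '\u00A0', '\u2007', '\u202F':
		return true
	}
	return false
}

// isBreakingSpace accepts Unicode spaces except the non-breaking ones,
// which unicode.IsSpace on its own would also accept.
func isBreakingSpace(r rune) bool {
	return unicode.IsSpace(r) && !isNoBreak(r)
}

// fallbackBreak returns how many runes to keep on the line when no regular
// breakpoint exists: it cuts after the rightmost non-letter within the
// limit, or returns the limit itself if every rune there is a letter.
func fallbackBreak(runes []rune, limit int) int {
	if limit > len(runes) {
		limit = len(runes)
	}
	for i := limit - 1; i >= 0; i-- {
		if !unicode.IsLetter(runes[i]) && !isNoBreak(runes[i]) {
			return i + 1
		}
	}
	return limit
}
```

For example, `fallbackBreak([]rune("abc/defghij"), 6)` returns 4, so the line would end after the slash instead of splitting `defghij` at the limit. Digits count as non-letters here, so a number can still be cut in two; a real implementation would likely want a finer rule.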

A similar set of issues exists for page breaks. We don't do anything wrong there, but better selection criteria could try harder to avoid page breaks that fall within an element, etc. (see #38).

@rnkn

rnkn commented Jan 13, 2020

Thanks, that’s an impressively comprehensive overview!

@eprovst
Owner Author

eprovst commented Apr 17, 2021

Started working on this on the linewrap branch. As of now, only item 3 is fixed.
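
Supporting different line lengths (item 3) boils down to letting the limit depend on the line index. The following is a minimal sketch of that idea, not the code on the linewrap branch: `wrapWords` is a hypothetical function that wraps whitespace-separated words greedily and asks a caller-supplied `width` function for the limit of each line.

```go
package main

import (
	"strings"
	"unicode/utf8"
)

// wrapWords greedily fills lines with words. width(i) gives the limit in
// runes for line i, so each line may have its own length. A word longer
// than the limit is not split; it ends up alone on an overlong line.
func wrapWords(text string, width func(line int) int) []string {
	var lines, current []string
	used := 0 // runes on the current line, including separating spaces
	for _, word := range strings.Fields(text) {
		n := utf8.RuneCountInString(word)
		// Start a new line if the word plus one space no longer fits.
		if len(current) > 0 && used+1+n > width(len(lines)) {
			lines = append(lines, strings.Join(current, " "))
			current, used = nil, 0
		}
		if len(current) > 0 {
			used++ // the separating space
		}
		current = append(current, word)
		used += n
	}
	if len(current) > 0 {
		lines = append(lines, strings.Join(current, " "))
	}
	return lines
}
```

A caller could then give the first line of an element a different limit, for example `func(i int) int { if i == 0 { return 12 }; return 7 }`, while all other lines keep the base width.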
