Description
The current documentation clearly states what UNSAFE_TO_BREAK (or its absence) can be interpreted to mean. However, it should also document what the flag does not mean, since it is very easy to be lured into a false sense of security by it.
The current behavior (as I understand it) is that one can shape a sequence of code points into glyphs, pick any two safe-to-break locations in the resulting buffer, reshape the sub-sequence of code points between them in isolation, and get exactly the glyphs that appear between those two locations in the original buffer. In other words, any pair of safe-to-break locations states that the larger context of the original sequence of code points did not affect the shaping of the run between them. (This is a somewhat stronger statement than the current documentation implies; please correct me if it is not true.)
However, the converse does not hold: such a run of code points will not necessarily shape this way in any other context, and any modification to the original sequence of code points can change the overall shaping arbitrarily. For instance, I could make a font that forms ligatures from every sentence in the works of Shakespeare, such that "There is a written scroll" comes back with many safe-to-break locations, but "There is a written scroll!" shapes into a single glyph depicting the Prince of Morocco. In other words, adding to or modifying any part of the original sequence of code points invalidates everything known about the previous shaping, including every safe-to-break location.
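To make the two claims above concrete, here is a minimal brute-force sketch against a hypothetical toy shaper (not HarfBuzz and not OpenType): greedy longest-match ligatures plus a one-character lookahead substitution. It treats a boundary as safe to break when shaping both sides separately and concatenating reproduces the full result, checks the stronger pairwise claim, and shows the Shakespeare-style effect of appending one character. The names `make_toy_shaper`, `safe_boundaries` and `substring_reshape_matches` are illustrative only, and the pairwise check passing for this toy says nothing about real fonts.

```python
from typing import Callable, Dict, List, Tuple

# A glyph is (glyph name, cluster); the cluster is the index of the first
# code point the glyph came from (left-to-right, monotone clusters assumed).
Glyph = Tuple[str, int]
Shaper = Callable[[str], List[Glyph]]


def make_toy_shaper(ligatures: Dict[str, str],
                    contextual: Dict[Tuple[str, str], str]) -> Shaper:
    """Hypothetical toy shaper (not HarfBuzz, not OpenType): greedy
    longest-match ligatures of 2+ characters, then a one-character
    lookahead substitution for characters that did not ligate."""
    longest = max((len(k) for k in ligatures), default=1)

    def shape(text: str) -> List[Glyph]:
        out: List[Glyph] = []
        i = 0
        while i < len(text):
            # Try the longest ligature starting at i, down to length 2.
            for n in range(min(longest, len(text) - i), 1, -1):
                if text[i:i + n] in ligatures:
                    out.append((ligatures[text[i:i + n]], i))
                    i += n
                    break
            else:
                # No ligature: substitute based on the next character, if any.
                nxt = text[i + 1] if i + 1 < len(text) else ""
                out.append((contextual.get((text[i], nxt), text[i]), i))
                i += 1
        return out

    return shape


def safe_boundaries(shape: Shaper, text: str) -> List[int]:
    """Brute force: a cluster boundary b is safe to break if shaping text[:b]
    and text[b:] separately and concatenating reproduces the full result."""
    full = shape(text)
    candidates = sorted({c for _, c in full} | {0, len(text)})
    safe = []
    for b in candidates:
        left = shape(text[:b])
        right = [(g, c + b) for g, c in shape(text[b:])]
        if left + right == full:
            safe.append(b)
    return safe


def substring_reshape_matches(shape: Shaper, text: str) -> bool:
    """The stronger claim: for every pair of safe boundaries a < b, shaping
    text[a:b] alone yields exactly the glyphs whose clusters lie in [a, b)."""
    full = shape(text)
    safe = safe_boundaries(shape, text)
    for i, a in enumerate(safe):
        for b in safe[i + 1:]:
            expected = [(g, c) for g, c in full if a <= c < b]
            actual = [(g, c + a) for g, c in shape(text[a:b])]
            if actual != expected:
                return False
    return True


toy_shape = make_toy_shaper({"fi": "f_i", "abc!": "MOROCCO"},
                            {("t", "h"): "t.alt"})
print(safe_boundaries(toy_shape, "the fish"))            # [0, 2, 3, 4, 6, 7, 8]
print(substring_reshape_matches(toy_shape, "the fish"))  # True
print(safe_boundaries(toy_shape, "abc"))                 # [0, 1, 2, 3]
print(toy_shape("abc!"))                                 # [('MOROCCO', 0)]
```

Boundary 1 is missing for "the fish" because `t.alt` depends on the following `h`. "abc" is safe to break everywhere, yet appending `!` turns the whole run into one glyph, which is the toy version of the Shakespeare font.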
To avoid a complete re-shape on every edit (especially when simply appending to the end of a paragraph), the user needs to know the maximum area an edit can affect. In OpenType this is the purpose of OS2V2Tail::usMaxContext; in theory even TrueType knows the answer to this question, at least for appending (by keeping track of the last ground state). With this information, a sophisticated user should be able to re-shape only part of the overall sequence of code points: dirty the maximum-context area around the edit, then find the safe-to-break locations surrounding that area. The pathological Shakespeare font proposed above would obviously have a very large maximum context, but for most fonts this value is three or less.
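The partial re-shaping strategy described above can be sketched as follows. This is a hypothetical sketch, not HarfBuzz API: it assumes `max_context` bounds, in code points, how far any shaping decision can look in either direction (usMaxContext is defined per lookup context, so this mapping is itself an assumption), that boundaries which were safe to break stay safe when the edit lies outside that window, and that text is left-to-right with monotone clusters. `reshape_window` and `apply_edit` are illustrative names, and `shape` stands in for whatever shaping call the caller uses.

```python
from bisect import bisect_left, bisect_right
from typing import Callable, List, Tuple

Glyph = Tuple[str, int]  # (glyph name, cluster = first code point index)


def reshape_window(safe: List[int], text_len: int, start: int, end: int,
                   max_context: int) -> Tuple[int, int]:
    """Return (lo, hi), in ORIGINAL code point offsets, of the span to
    re-shape after text[start:end] is replaced. `safe` is the sorted list of
    safe-to-break boundaries from the old shaping, including 0 and text_len."""
    # Dirty the assumed maximum-context area around the edit.
    dirty_lo = max(0, start - max_context)
    dirty_hi = min(text_len, end + max_context)
    # Widen to the nearest safe boundary at or before dirty_lo ...
    lo = safe[bisect_right(safe, dirty_lo) - 1]
    # ... and the nearest one at or after dirty_hi.
    hi = safe[bisect_left(safe, dirty_hi)]
    return lo, hi


def apply_edit(shape: Callable[[str], List[Glyph]], old_text: str,
               old_glyphs: List[Glyph], safe: List[int], start: int, end: int,
               replacement: str, max_context: int) -> Tuple[str, List[Glyph]]:
    """Replace old_text[start:end] and re-shape only the window around it."""
    lo, hi = reshape_window(safe, len(old_text), start, end, max_context)
    new_text = old_text[:start] + replacement + old_text[end:]
    delta = len(replacement) - (end - start)
    # Glyphs before lo are reused unchanged.
    head = [g for g in old_glyphs if g[1] < lo]
    # Old offset hi maps to hi + delta in the new text.
    middle = [(name, c + lo) for name, c in shape(new_text[lo:hi + delta])]
    # Glyphs at or after hi are reused with their clusters shifted.
    tail = [(name, c + delta) for name, c in old_glyphs if c >= hi]
    return new_text, head + middle + tail
```

Widening to safe boundaries outside the dirty window, rather than stopping at the edit itself, is what allows the head and tail glyphs to be reused as-is; only the tail clusters need shifting by the change in length.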
Between these two cases (breaking at safe-to-break locations and mutating the sequence of code points) lies a third: determining how much must be re-shaped when truncating the original sequence of code points. Is it enough to find some safe-to-break location before the truncation point and shape only the remaining portion? I cannot construct an example that answers this in the negative, but it is not obvious that it is always allowed either. I think this is equivalent to asking whether any shaping operation can use negative lookahead in a way that does not affect safe-to-break. With the maximum context information this could at least be bounded, but it would be nice to know whether this common case (for example during line breaking) can be handled even more efficiently.
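Since the truncation question is open, one way to probe it is a brute-force harness that, for every truncation point, compares the shortcut (keep the glyphs before the last safe boundary, re-shape only the rest) against shaping the truncated text from scratch. This is a hypothetical sketch: `truncation_counterexamples` is an illustrative name, and `shape` is any callable returning (glyph, cluster) pairs with left-to-right, monotone clusters.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple

Glyph = Tuple[str, int]  # (glyph name, cluster = first code point index)


def truncation_counterexamples(shape: Callable[[str], List[Glyph]], text: str,
                               safe: List[int]) -> List[int]:
    """Return every truncation point t where the shortcut disagrees with
    shaping text[:t] from scratch. `safe` holds the safe-to-break boundaries
    of the full text, sorted and including 0."""
    full = shape(text)
    bad = []
    for t in range(len(text) + 1):
        # Last safe boundary at or before the truncation point.
        b = safe[bisect_right(safe, t) - 1]
        # Reuse glyphs before b, re-shape only text[b:t].
        shortcut = [g for g in full if g[1] < b]
        shortcut += [(name, c + b) for name, c in shape(text[b:t])]
        if shortcut != shape(text[:t]):
            bad.append(t)
    return bad
```

Following the negative-lookahead intuition above, a toy rule that turns `a` into `a.alt` before `b` but not before `bc` is the kind of input worth trying. For the text "abcd", with safe boundaries [0, 1, 3, 4] found by split-and-compare, the harness reports t = 2, because boundary 1 is safe in "abcd" but not in "ab". Whether real OpenType lookups and the flags HarfBuzz sets admit such a case is exactly what the question asks.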
With all of that as background, the two actual requests here are:
Can maximum context information, in general and/or for the end of the text, be exposed to reduce re-shaping when editing?
Can anything be assumed about safe-to-break information when truncating a sequence of code points?