Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FLUID-6554: Handle soft hyphens in content #1010

Merged
merged 8 commits into from
Oct 7, 2020

Conversation

jobara
Copy link
Member

@jobara jobara commented Sep 30, 2020

@jobara
Copy link
Member Author

jobara commented Sep 30, 2020

@cindyli and @amb26 further updates to better handle syllabification with soft hyphens in the original content. Could you please review this PR.

* markup as the separators and split the original textnode at the hpyhenation points. If there are no places to
* inject a hyphenation point, the node will be left unchanged. If the original textnode inclues soft hyphens (i.e.
* ­) characters, these will be used as a hyphenation point with the original location in the text node
* replacded with the `softHyphenPlaceholderMarkup` to allow it to be replaced when syllabification is removed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typos hpyhenation and replacded. You probably want the last "replaced" to be "restored" for a bit of extra clarity

*
* @param {HypherHyphenator} hyphenator - an instance of a Hypher Hyphenator
* @param {DomNode} node - a DOM node containing text to be hyphenated
* @param {String} separatorMarkup - the markup to be injected and used to indicate the separators/hyphens
* @param {String} softHyphenPlaceholderMarkup - the markup to be injected and used to indicate the original
* location of any soft hyphens inclded in `node`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo inclded

},
strings: {
languageUnavailable: "Syllabification not available for %lang",
patternLoadError: "The pattern file %src could not be loaded. %errorMsg"
},
markup: {
separator: "<span class=\"flc-syllabification-separator fl-syllabification-separator\"></span>"
separator: "<span class=\"flc-syllabification-separator fl-syllabification-separator\"></span>",
softHyphePlaceholder: "<span class=\"flc-syllabification-softHyphenPlaceholder\"></span>"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo name softHyphePlaceholder which at least was copy and pasted correctly at line 161 : P

@jobara
Copy link
Member Author

jobara commented Sep 30, 2020

@amb26 my spelling woes continue. Thanks for spotting these. I've pushed up fixes, ready for more review.

* markup as the separators and split the original textnode at the hyphenation points. If there are no places to
* inject a hyphenation point, the node will be left unchanged. If the original textnode inclues soft hyphens (i.e.
* &shy;) characters, these will be used as a hyphenation point with the original location in the text node
* replacded with the `softHyphenPlaceholderMarkup` to allow it to be restored when syllabification is removed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replacded -> replaced.

var separator = $(separatorMarkup)[0]; // converts the separatorMarkup markup string into a Dom Node
node = node.splitText(seg.length);
node.parentNode.insertBefore(separator, node);
if (node.textContent.startsWith("\u00AD")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem is, when a soft hyphen is in a middle of a word, syllabification breaks the word into 2 syllables at where the soft hyphen is regardless if these 2 sections are proper syllables.

For example, back is one single syllable. When a soft hyphen is in between ba and ck, hypher will mark them as 2 separate syllables, which might be misleading.

I was thinking about removing soft hyphens from texts before running them thru the hypher and inserting soft hyphens back. But the difficult part is how to keep track of positions of those soft hyphens.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion I can think of is to simply clone any DOM that the soft hyphens are removed from and hold the original, together with a pointer to its parent, it in a private fragment out of the document context. This can then be swapped back in again when the enactor is turned off.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing soft hyphens without inserting them back when the "syllable" setting is on will lead to the issue below that hyphenations are gone.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cindyli I can see your point. However, the ability to break on soft hyphens provides a means for an integrator to define where the syllables are split. Unfortunately the hyphenation engine we use it's 100% perfect. If I remember correctly, it's closer to 80% accurate. So providing an ability for an integrator to explicitly define where breaks should happen may be useful. Also for novel words or localization specific pronunciation that we may not cover.

I'd suggest not using soft hyphens where a word wouldn't have a natural hyphenation point. Although I'm not sure if there are reasons that would contradict this nor is it something we could enforce in anyway.

More thoughts/comments welcome.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's reasonable to imagine that turning on the syllabification enactor represents the user's intention "I would like to substitute UIO's idea for where syllables should break for the original page author's, if any". So long as we restore the markup to its original condition when the enactor is deactivated, I think this will be accepted. Of course it would be desirable if we could somehow incorporate the "hint" from the markup author provided by the soft hyphen but this doesn't seem to be possible with hypher.

if (node.textContent.startsWith("\u00AD")) {
// converts the softHyphenPlaceholderMarkup markup string into a Dom Node
var placeholder = $(softHyphenPlaceholderMarkup)[0];
var newNode = fluid.prefs.enactor.syllabification.insertIntoTextNode(node, placeholder, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue with replacing soft hyphens with something else is hyphenation that indicates the join of the word is gone when the "syllable" setting is turned on:
Screen Shot 2020-10-01 at 4 24 26 PM

If the soft hyphen is still in place, the hyphenation persists when the "syllable" is on:
Screen Shot 2020-10-01 at 4 24 41 PM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good remarks, @cindyli!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cindyli I've pushed up some changes which add a soft hyphen to the placeholder pseudo element. Please let me know if this addresses the issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cindyli pointed found a bug in the updated implementation if the soft hyphen appears in the last segment, it doesn't get replaced with the placeholder. This is because the algorithm removes the last segment as we don't want to add a separator after it. However, we do need to replace the soft hyphen in all segments.

@jobara
Copy link
Member Author

jobara commented Oct 7, 2020

@cindyli and @amb26 ready for more review.

@cindyli
Copy link
Member

cindyli commented Oct 7, 2020

Could you add tests to make sure all soft hyphens, including when it's on the last syllable segment, are replaced by the soft hyphen placeholder? Thanks.

@cindyli cindyli merged commit d1a0055 into fluid-project:master Oct 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants