Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(common/models): mid-context suggestions & reversions, fix(common/models): correction-search SMP issues #4427

Merged
merged 19 commits into from
Feb 26, 2021

Conversation

jahorton
Copy link
Contributor

@jahorton jahorton commented Feb 5, 2021

While working on #4411, I noticed that our predictive text has, to this point, often assumed that it will always be operating at the end of the current context. This PR seeks to round out that rough edge and provide support for mid-context scenarios:

  • accepting suggestions at the end of a word in the middle of the context (easy)
  • reverting such a suggestion
  • accepting suggestions in the middle of a word/token
  • reverting suggestions that were accepted mid-word/token (hard)

Support won't be completely perfect, but it's a definite upgrade from how things were before. The main issue: the caret will always be placed at the end of text affected by a reversion, even if it was originally before some of the reverted characters. (Because a suggestion triggered right-deletions.)

Also note that no post-caret text will actually be used by the predictive text engine, same as before.

jahorton added 2 commits February 8, 2021 09:31
…redictions' into feat/common/models/mid-context-suggestions
…redictions' into feat/common/models/mid-context-suggestions
Comment on lines 354 to 360
if numCharsToRightDelete > 0 {
for _ in 0..<numCharsToRightDelete {
textDocumentProxy.adjustTextPosition(byCharacterOffset: 1)
textDocumentProxy.deleteBackward()
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be done more properly. It was sufficient for initial testing, at least.

Unfortunately, there is no deleteForward command on that object, hence the text-position shenanigans.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not possible to adjustTextPosition(byCharacterOffset: numCharsToRightDelete)?

I am guessing this needs some real testing for interactions with clusters, SMP because it's likely that the cursor cannot be positioned in the middle of a cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure it's possible to do that... but it may make any SMP-double-checking even rougher.

I'll definitely need to reference this block:

for _ in 0..<numCharsToDelete {
let oldContext = textDocumentProxy.documentContextBeforeInput ?? ""
textDocumentProxy.deleteBackward()
let newContext = textDocumentProxy.documentContextBeforeInput ?? ""
let unitsDeleted = oldContext.utf16.count - newContext.utf16.count
if unitsDeleted > 1 {
if !InputViewController.isSurrogate(oldContext.utf16.last!) {
let lowerIndex = oldContext.utf16.index(oldContext.utf16.startIndex,
offsetBy: newContext.utf16.count)
let upperIndex = oldContext.utf16.index(lowerIndex, offsetBy: unitsDeleted - 1)
textDocumentProxy.insertText(String(oldContext[lowerIndex..<upperIndex]))
}
}

Noting the step-by-step deletion procedure it follows... and the need to compare/contrast Swift's preferred UTF-8 representation and UTF-16, it's probably best to go step-by-step here too.

@jahorton
Copy link
Contributor Author

jahorton commented Feb 8, 2021

Android will need a few tweaks to support right-deletions in order to better support mid-token suggestion acceptance - but only in-app. Oddly, it's already supported for the system keyboard... just not the in-app one. Huh.

@darcywong00 Might need your help on this one. I can identify the system-keyboard block that handles this pretty easily:

// Perform right-deletions
for (int i = 0; i < dr; i++) {
CharSequence chars = ic.getTextAfterCursor(1, 0);
if (chars != null && chars.length() > 0) {
char c = chars.charAt(0);
SystemKeyboardShouldIgnoreSelectionChange = true;
if (Character.isHighSurrogate(c)) {
ic.deleteSurroundingText(0, 2);
} else {
ic.deleteSurroundingText(0, 1);
}
}
}

There simply is no equivalent for the in-app, and the surrounding code is remarkably different between the two cases. All the in-app one does:

if(dr != 0) {
Log.d(TAG, "Right deletions requested but are not presently supported by the in-app keyboard.");
}

I'd prefer not to make things worse there than they already are, hence the call for help here.

@jahorton jahorton marked this pull request as ready for review February 8, 2021 06:29
@darcywong00
Copy link
Contributor

There simply is no equivalent for the in-app, and the surrounding code is remarkably different between the two cases. All the in-app one does:

I hope you've got a time machine cause you wrote both right-deletion blocks in #1732 😄

@jahorton
Copy link
Contributor Author

jahorton commented Feb 8, 2021

There simply is no equivalent for the in-app, and the surrounding code is remarkably different between the two cases. All the in-app one does:

I hope you've got a time machine cause you wrote both right-deletion blocks in #1732 😄

And I can see why I didn't - take a look at how different the rest of the insertText code is for in-app vs system! Were it up to me, I'd refactor one of the two to use the other's approach if possible.

Copy link
Member

@mcdurdin mcdurdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the improvements in general. I have a few questions, and a concern around testing on iOS (presumably similar on Android although that may be coming later?)

common/predictive-text/worker/model-compositor.ts Outdated Show resolved Hide resolved
Comment on lines 354 to 360
if numCharsToRightDelete > 0 {
for _ in 0..<numCharsToRightDelete {
textDocumentProxy.adjustTextPosition(byCharacterOffset: 1)
textDocumentProxy.deleteBackward()
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not possible to adjustTextPosition(byCharacterOffset: numCharsToRightDelete)?

I am guessing this needs some real testing for interactions with clusters, SMP because it's likely that the cursor cannot be positioned in the middle of a cluster.

@@ -16,7 +16,7 @@ protocol KeymanWebDelegate: class {
/// - Parameters:
/// - numCharsToDelete: The number of UTF-16 code units to delete before inserting the new text.
/// - newText: The string to insert.
func insertText(_ keymanWeb: KeymanWebViewController, numCharsToDelete: Int, newText: String)
func insertText(_ keymanWeb: KeymanWebViewController, numCharsToLeftDelete: Int, newText: String, numCharsToRightDelete: Int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have kinda liked this to have a different parameter order:

Suggested change
func insertText(_ keymanWeb: KeymanWebViewController, numCharsToLeftDelete: Int, newText: String, numCharsToRightDelete: Int)
func insertText(_ keymanWeb: KeymanWebViewController, numCharsToLeftDelete: Int, numCharsToRightDelete: Int, newText: String)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From Web:

/**
* @param {number} dn Number of pre-caret characters to delete
* @param {string} s Text to insert
* @param {number=} dr Number of post-caret characters to delete
*/
['oninserttext']: (dn: number, s: string, dr?: number) => void;

if(keyman.isEmbedded) {
// A special embedded callback used to setup direct callbacks to app-native code.
keyman['oninserttext'](ruleTransform.deleteLeft, ruleTransform.insert, ruleTransform.deleteRight);

From Android:

// This annotation is required in Jelly Bean and later:
@JavascriptInterface
public void insertText(final int dn, final String s, final int dr) {

That said, both of these arose during 14.0 - there's no evidence of them in 13.0. In their original versions, they closely matched the iOS function above: deleteLeft, then newText.

Admittedly, the order was chosen b/c it's a new parameter, and new things - especially potentially-undefined parameters (b/c JS/TS) - go on the right-hand side. That said, there is a beauty to the order:

abc de | fg hij

With respect to the caret, we first delete-left, then insert the text at the caret's position, then delete-right if needed after the caret. Though, I suppose that temporal order doesn't particularly matter - it's largely transitive as long as delete-lefts come before text insertion. It does make sense with spatial order, though.

That said... I don't mind changing it... but only if we also change the parameter order in those locations. And I think that's best left to a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay with that refactor being a separate PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked as #4529.

// Use it when we're ready to implement that.
// Our .insertText will need to be adjusted accordingly.
_ = Int(fragment[drRange.upperBound...])!
let dr = Int(fragment[drRange.upperBound...])!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let dr = Int(fragment[drRange.upperBound...])!
let numCharsToRightDelete = Int(fragment[drRange.upperBound...])!

Perhaps could rename numCharsToDelete to numCharsToLeftDelete as well?

if(postContextTokenization) {
// Handles display string for reversions triggered by accepting a suggestion mid-token.
revertedPrefix = postContextTokenization.left[postContextTokenization.left.length-1];
revertedPrefix += postContextTokenization.caretSplitsToken ? postContextTokenization.right[0] : '';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be certain that postContextTokenization.right always contains at least one element?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If postContextTokenization.caretSplitsToken == true, yes. Otherwise, no.

Comment on lines +524 to +528
suggestions.forEach(function(suggestion) {
// A reversion's transform ID is the additive inverse of its original suggestion;
// we revert to the state of said original suggestion.
suggestion.transformId = -reversion.transformId;
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is repeated on linees 565-569, do you want to extract it into a function?

@@ -255,7 +255,7 @@ namespace com.keyman.text.prediction {
// the input will be automatically rewound to the preInput state.
transform: original.transform,
// The ID part is critical; the reversion can't be applied without it.
transformId: original.token, // reversions use the additive inverse.
transformId: -original.token, // reversions use the additive inverse.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand the significance of this change. Is it a bug fix -- the comment suggests as such? Or is it just to support the other changes you are making now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug fix. I think at some point, the intent was to use the base ID (which already used the additive inverse), but transformID is the field in active use for ID checks now. Not sure when reversions broke. Anyway, that comment was very important in figuring out why things were broken.

Base automatically changed from fix/common/models/context-reset-predictions to beta February 9, 2021 08:46
@jahorton
Copy link
Contributor Author

jahorton commented Feb 9, 2021

Right now, iOS seems covered, even for SMP cases... except for reverting suggestions that were accepted mid-word. (The reversions aren't showing up at the moment; it's related to async [sadly] context-reset ops within the iOS Keyman engine.)

Android in-app also still needs work.

@keyman-server
Copy link
Collaborator

keyman-server commented Feb 11, 2021

Changes in this pull request will be available for download in Keyman version 14.0.242-beta

@jahorton
Copy link
Contributor Author

Well... thanks, Apple:

Screen Shot 2021-02-12 at 8 14 01 AM

Screen Shot 2021-02-12 at 8 13 51 AM

We were definitely right to be concerned about how clusters would be handled. For those who can't read Khmer script, that's a four-character jump. (Three of them visible.)

So, my assumption later in the loop for repositioning the caret is incorrect, causing issues on later loop iterations.

@jahorton
Copy link
Contributor Author

Okay, I've got a fair bit of the core worked out there, though there's something really weird going on now. There seems to be a desync between the text-manipulation method and what actually gets output - of course, only when right-deletions are happening.

So, let's take this as our starting point:

Screen Shot 2021-02-12 at 9 45 36 AM

I've confirmed via temporary debug-log statements that this, according to the textDocumentProxy object used for text manipulation, has an expected final context of ម្រាយ បន្ស៊ី . Exactly what a user would expect. So, of course, what do we get?

Possibility 1

Screen Shot 2021-02-12 at 9 08 02 AM

Uh... that's not what textDocumentProxy told us we'd get. The heck?

Possibility 2

(Note: these screenshots were taken from a clean context, rather than with the English text present at the start.)

Screen Shot 2021-02-12 at 9 55 08 AM

Uh... what, mate? Didn't even do anything?

Turns out, it actually did. If you hit BKSP, it'll remove the hidden 'subconsonant' marker. Alternatively, if you reselect the same suggestion again...

Screen Shot 2021-02-12 at 9 55 16 AM

And again...

Screen Shot 2021-02-12 at 9 55 21 AM

So... the true result incrementally inches closer to the desired suggestion. Wha?


Again, note that in both cases, the actual text-manipulation handling itself computes the correct text immediately, and even the textDocumentProxy confirms this. Something is interfering with this process. The question is... is there some yet-undiscovered bug in our code that has only appeared now, and only for right-deletions at that... or is it an Apple-side bug?

There's also the fact that the result isn't even 100% predictable, as noted by the two variations seen above!

@jahorton
Copy link
Contributor Author

Since the iOS engine is having trouble with right-deletions and a resolution is proving tricky, I've gone ahead and turned off predictive text's right-deletions for now. Instead, any suggestions accepted mid-token will insert a standard word-break afterward. (Note: this is a perfect match for the behavior of iOS's default predictive text; it doesn't right-delete.)

I can simply add the right-deletion aspect as a 'feature request' for the future, allowing us to revisit it at another time.

I have tried a few other approaches, and one seemed to get remarkably close much of the time... the issue being that it also gave way worse results some of the other times. So... yeah, not changing it over until it's stable.

@mcdurdin
Copy link
Member

I can simply add the right-deletion aspect as a 'feature request' for the future, allowing us to revisit it at another time.

Sounds good to me. Have you opened an issue for this yet?

@jahorton
Copy link
Contributor Author

I can simply add the right-deletion aspect as a 'feature request' for the future, allowing us to revisit it at another time.

Sounds good to me. Have you opened an issue for this yet?

It's now up as #4538.

@jahorton
Copy link
Contributor Author

jahorton commented Feb 26, 2021

Code related to the new issue (for the deferred right-deletion functionality) has been split off into #4541.

Note that the most recent commit here (which reverted them for this PR) was hand-written, with #4541's first commit a reversion of that.

Copy link
Member

@mcdurdin mcdurdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Appreciate you going the extra mile on this one -- it's been a bit of a challenge to get it right given the trickiness of the iOS API around right-deletion.

If I was going to be really nitpicky, I'd suggest reverting the whitespace only change in InputViewController.swift so that we have no changes to the Swift code at all, but it's pretty unimportant!

@jahorton jahorton merged commit c7cf3d8 into beta Feb 26, 2021
@jahorton jahorton deleted the feat/common/models/mid-context-suggestions branch February 26, 2021 05:11
@keyman-server
Copy link
Collaborator

Changes in this pull request will be available for download in Keyman version 14.0.248-beta

@MakaraSok
Copy link
Collaborator

Retest on Android 10 (on both emulator and physical device) based on #4427 (comment):

  • "accepting suggestions in the middle of a word/token" does not delete the post-half of the word, even though this is the intention, it is not quite helpful because the post-half is not intelligible and has to be manually delete anyway.

  • I like the ability to switch back and forth to the suggested word automatically when tapped on. For Khmer language, it seems like there is no space after the word after a suggestion is chosen in a new line:

    Khmer Angkor - in the first line, a space is seen after a suggestion is chosen; in the second line, it takes three taps on the spacebar to get a regular spacẹ
    https://www.youtube.com/watch?v=VcaL1R7X-L8

    EuroLatin (SIL) - no space after the chosen suggestion, it takes two taps on the spacebar to output a regular space
    https://youtu.be/lC7ZLZ94ALw

  • The globe key on the emulator does not respond as expected -- making it impossible to switch between keyboard unless doing it from within Keyman app
    https://youtu.be/1p3qXUun3aA

For any more specific test, ping me again. :)

@keyman-server
Copy link
Collaborator

Changes in this pull request will be available for download in Keyman version 15.0.19-alpha

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

5 participants