Skip to content
This repository has been archived by the owner on Nov 29, 2023. It is now read-only.

Fix: not all signature images are uploaded in offline-capable forms #373

Merged

Conversation

eyelidlessness
Copy link
Contributor

Closes enketo/enketo#1043.

The details of the bug are described in more depth here.

I'm really not sure what the actual name of this solution should be (please feel free to bikeshed it), but this is how it works:

  1. Autosave is delayed by 500ms to work around Signature not recorded in touchscreen view enketo#1124
  2. That delay is now wrapped with a Promise, which acts as a lock, blocking the request to getCurrentFiles when saving an offline record, ensuring signatures are auto-saved before the record is saved
  3. As a matter of extra caution, multiple calls to delay only execute the last call if prior timers have not completed
  4. Also as a matter of extra caution, multiple calls to delay where prior timers have completed block wait until they all resolve.

I have verified this PR works with

  • Online form submission
  • Offline form submission
  • Saving offline drafts
  • Submitting offline drafts
  • [ ] Editing submissions
  • [ ] Form preview

What else has been done to verify that this works as intended?

Manual testing, with a particularly long delay and a lot of logging to be sure the claims I made above are true.

Why is this the best possible solution? Were any other approaches considered?

It's probably not the best solution, see discussion of that in the comment linked above.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

Locks are hard, this might deadlock in ways I haven't thought about.

Do we need any specific form for testing your changes? If so, please attach one.

See enketo/enketo#1043.

@jnm
Copy link
Member

jnm commented Jan 24, 2022

Hi @eyelidlessness, thanks for working on this. I have some questions about the code, but before getting into that, did you plan on changing the 1500 ms delay in enketo-core to complement this PR? When testing this change alone in the browser, I still observed data loss when clicking submit less than 1500ms after ending the last stroke in a signature widget.

Although it's not useful for multi-stroke signatures, a quick visual indicator as to whether or not the signature would be included in the submission was whether or not the download icon appeared:
image
(In case anyone wonders why, including myself the next time I read this, it works as an indicator because the onEnd handler of the SignaturePad calls setTimeout( that._updateValue.bind( that ), DELAY ), and _updateValue() in turn calls updateDownloadLink().)

@eyelidlessness
Copy link
Contributor Author

eyelidlessness commented Jan 24, 2022

did you plan on changing the 1500 ms delay in enketo-core to complement this PR? When testing this change alone in the browser, I still observed data loss when clicking submit less than 1500ms after ending the last stroke in a signature widget.

Oof, in hindsight this is totally unsurprising! That's what I get for testing with a 5000ms timer for convenience.

I'll probably want to sleep on this, but my gut instinct right now is:

  1. I don't know what a safe delay would be, and I'm hesitant to keep and reduce the delay as it exists. I'm unsure both because I don't know about the timing characteristics of the underlying technologies, and because I assume the delay is intended to account for humans interacting with the field, and I don't know that I'd feel comfortable arbitrarily cutting 2/3 of the time a user can enter a signature without a lot more usage information.

  2. Delaying 1500ms here is probably unreasonable UX.

  3. Something needs to change in core to address this. I'll want to sleep on this, but here's some options that feel reasonable right now:

    • Ideally, the readiness of SignaturePad's canvas can be detected programmatically, e.g. a (shorter?) debounce in the event listener, producing a Promise which can be used with this change.

    • If that isn't possible for some reason, keep the delay as is, and still produce a Promise to use with this change. This approach would impose that 1.5 second worst case delay, but restrict it to forms using this particular widget.

    • Maybe this isn't the right change at all, and the form should provide a general Promise which says its state is ready to be submitted? This is probably more future proof, even if it'll be incomplete. Allllll of the caveats in the issue comments apply here.

    In all of these cases this would be exposed at the form level, at least starting to move in the direction of a general way to know that a form's state is ready to be submitted.

Longer term, I think it's worth looking at how libraries designed for consistent, concurrent state management and rendering handle this. (Or, just using one.) Like I said in the issue tracking this bug, I'm expecting there are more issues like this we haven't discovered yet.

@eyelidlessness
Copy link
Contributor Author

@jnm I've pushed up a much smaller, simpler solution (though unfortunately without automated tests). I've also removed the delay here, because I found it's unnecessary.

Part of the reason it took so long to get here is that, while I thought was able to verify data loss as you described with the previous change, I had forgotten that Enketo caches the offline script and does not (currently) invalidate that cache for dev changes (because it's cached based on the version number). So I was actually verifying it against a different build, from a different branch I'd worked on in the meantime. This is an issue I intend to fix next, because it's been a frequent source of false positives/negatives and time lost.

Anyway, I bring this up because I suspect this may be why you continued to see data loss as well. With these changes (as with the previous approach), I was able to verify successful submission even with the enketo-core delay extended well past submission.

The reason this works (even after removing the delay here) is:

  1. Drawing on the widget starts the delay timer
  2. Clicking on the submit button (or performing any other interaction I'm aware of which could potentially submit the form) triggers a blur event on the widget
  3. The blur event handler calls _forceUpdate, which cancels the outstanding timer and updates immediately
  4. This change now triggers auto-save immediately on change, producing a Promise reference
  5. Submitting (or saving a draft) now waits for that Promise to resolve before it can proceed

All of that said, you should be able to verify this by ensuring the offline cache is up to date. There are two ways to do that:

  1. Start a new private/incognito mode browser session (you'll likely need to also close all currently open private/incognito mode windows, as they generally share state until they're all closed).
  2. In Chrome dev tools:
  3. Go to the Application tab.
  4. Expand the Cache Storage.
  5. Right click enketo-common-[...].
  6. Click Delete in the menu.
  7. Reload.

@lognaturel
Copy link
Contributor

The reasoning here seems sound to me. Let’s give @jnm a day or two to chime in and if he doesn’t I’ll do a deeper review and we can decide whether or not to release with it.

@jnm
Copy link
Member

jnm commented Feb 2, 2022

Catching up here now. Thanks for your patience!

@jnm
Copy link
Member

jnm commented Feb 3, 2022

Aha, the reason I could (and still can) reproduce data loss within the 1500-ms enketo-core delay interval is that I'm using a touchscreen on a desktop browser 🙃 The blur event never fires, and the alternative
find( '.hide-canvas-btn' ).on( 'click', () => { /* snip */ that._forceUpdate(); /* etc. */ } ) approach only applies to mobile (i.e. when touch is true). Sure enough, there's a comment about this that I should've given more attention:

// This event does not fire on touchscreens for which we use the .hide-canvas-btn click
// to do the same thing.

I'm happy to confirm that with the 500-ms delay removed, I can no longer reproduce any signature data loss with a mouse, nor can I reproduce it using touch on an Android phone. 🎉

My only remaining question: is the locking around getCurrentFiles() still necessary? From the earlier rationale:

That delay is now wrapped with a Promise, which acts as a lock, blocking the request to getCurrentFiles when saving an offline record, ensuring signatures are auto-saved before the record is saved

I see that both _autoSaveRecord() (called when XFormsValueChanged fires) and _saveRecord() (called 100 ms after the submit button is clicked) call getCurrentFiles(), and that makes me wonder if it's necessary, sequentially, to "ensur[e] signatures are auto-saved before the record is saved". A cursory look at getCurrentFiles() makes me think it always returns a new array of blobs without mutating any other state. Also, _saveRecord() appears to build a new record from scratch, without referencing the work done by _autoSaveRecord(), discarding that work instead.

@jnm
Copy link
Member

jnm commented Feb 4, 2022

Also, I'm in favor of releasing a change that removes some risk of data loss, even if it doesn't cover touch on a desktop browser. Sorry to have not made that clear in my last post. I think that's a fairly rare use case so far(?) or at least none of these reports reference that scenario:

I tried to reproduce data loss after simply removing the 500 ms delay (and not adding any locking around getCurrentFiles()), but my signatures were saved every time. I am curious, though, if @MartijnR remembers how to reproduce enketo/enketo#1124, since that delay was added for a reason—and maybe that reason points to the necessity of @eyelidlessness' locking approach.

I never quite got to the bottom of this, but simply delaying the autosave until after the xforms-value-changed event works around the issue.
Somehow the timing of the event handler autosave action resulted in getting an empty file from the canvas.
_Originally posted by @MartijnR in enketo/enketo#1124

@eyelidlessness
Copy link
Contributor Author

@jnm thank you for the detail around touch, that definitely explains why you're still seeing data loss in that case. While it may seem like a rare use case, I can easily imagine this being an issue on tablets/larger form factor touch devices in general. So I'm thinking about next steps on how to address that.

My only remaining question: is the locking around getCurrentFiles() still necessary?

Unfortunately it is necessary without other changes (more on that below). Without locking (or, alternatively, awaiting a potentially redundant call to _autoSaveRecord), it creates a race condition between the call to _autoSaveRecord. Then:

  1. _saveRecord calls records.save
  2. Which calls set
  3. Which overwrites the record's files with the auto-saved record's files

According to the comment there (and I can confirm), this is to ensure that any unchanged files which were populated from a previous auto-saved record are included when queueing the record for submission. This is also the case when submitting a previously saved draft.

I've investigated changing the way this works, to use only the current state of the record. I would like to make that change, but I've found it difficult to follow all of the data flow in order to find a coherent way to make that change. As far as I can tell, that data flow goes through at least:

  • enketo-express:
    • controller-webform
    • file-manager
    • records-queue
  • enketo-core:
    • form
    • form-model
    • widget
    • draw-widget
    • ... probably more that I'm not remembering

I decided to prioritize getting this simpler (albeit incomplete) fix through, and investigate a more holistic change to simplify and (hopefully) unify that flow later on.

I tried to reproduce data loss after simply removing the 500 ms delay (and not adding any locking around getCurrentFiles()), but my signatures were saved every time.

Just checking out of an abundance of caution: are you sure that the offline script was updated after you made that change? Until #382 (or something like it) lands, the caching issue I described above applies for every change made in dev.

I can imagine it becomes much harder to trigger that data loss (with a mouse or on a touchscreen phone) by removing the delay, but because _autoSaveRecord is async it's still technically a race condition.

Copy link
Member

@jnm jnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Punch list for myself of things to actually test:

  • Named drafts
  • Auto-saved drafts
  • Encrypted forms
    • Named drafts
    • Auto-saved drafts

public/js/src/module/controller-webform.js Outdated Show resolved Hide resolved
public/js/src/module/controller-webform.js Show resolved Hide resolved
@jnm
Copy link
Member

jnm commented Feb 7, 2022

  1. _saveRecord calls records.save
  2. Which calls set
  3. Which overwrites the record's files with the auto-saved record's files

That's quite eye-opening, thanks! Knowing that now, I'm surprised

  • that _saveRecord() bothers calling getCurrentFiles() at all (although I guess simplifying this was contemplated when the record.files = autoSavedRecord.files overwrite was added?)
  • that a simple overwrite was done instead of using the update() logic that's invoked when auto-saving (which, to be honest, I don't have enough time to understand fully right now)

…but perhaps the rationale is lost to history. In any case, I certainly recognize now why you've proposed this locking and fully agree that it's necessary.

Thanks also for emphasizing the caching pitfall. I feel confident that I'm running the intended code because I'm starting a factory-fresh browser with a temporary home directory each time I test a new revision. It's likely not cross-platform, but in case you're curious, I'm using this bash function:

tmpchrome () 
{ 
    td=$(mktemp --tmpdir --directory tmpchrome.XXX);
    mkdir -p "$td/.config/google-chrome";
    touch "$td/.config/google-chrome/First Run";
    HOME="$td" google-chrome "$@";
    rm -r "$td"
}

I added a few questions to the code in review form. Thank you again 🙏

- No longer assigns `null` to `autoSavePromise`
- Auto-save locks/waits on its previously assigned promise directly, so outside calls to it do not need to `await` it implicitly
@eyelidlessness
Copy link
Contributor Author

@jnm Thank you for taking time to review! I've addressed your questions. I also:

  • Removed the null assignments to autoSavePromise
  • Moved the lock for a previously assigned autoSavePromise out of the event handler, into _autoSaveRecord itself, which feels safer and more clear

We're hoping to include this in the pending #384 3.0.5 release, would you mind taking another look to see if these changes look good?

@lognaturel
Copy link
Contributor

@yanokwa can we get this branch on the QA server, please? We can do a quick round with it before merging and releasing.

@yanokwa
Copy link
Member

yanokwa commented Feb 7, 2022

QA server now has this change in enketo.dockerfile

-FROM ghcr.io/enketo/enketo-express:3.0.4
+FROM ghcr.io/eyelidlessness/enketo-express:fix-339-signatures-data-loss

@jnm
Copy link
Member

jnm commented Feb 8, 2022

I tested a variety of scenarios and everything looked good 🚀 I did encounter enketo/enketo#1023 (TypeError: Require Blob) and noticed that auto-save is disabled for encrypted forms (but it was on 2.8.1 as well). Apparently, though, both auto-save and named drafts are supposed to be disabled for encrypted forms altogether(?) Or at least that's what enketo/enketo#1057 and https://blog.enketo.org/encryption/ say.

@lognaturel
Copy link
Contributor

Thanks so much for the careful follow-up here, @jnm!!

Similar results from QA -- looks like they also ran into enketo/enketo#1023.

@lognaturel lognaturel merged commit fcfb346 into enketo:master Feb 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data loss: not all signature images are uploaded in offline-capable forms
4 participants