
decide on synchronization strategy when pushing to servers #51

Closed
redshiftzero opened this issue Oct 19, 2018 · 9 comments

Comments

@redshiftzero
Contributor

so far our sync logic is pull logic - it fetches everything on the server, and treats it as the canonical source of truth for updating local databases in securedrop clients. this issue is about a strategy for pushing new changes to servers.

something to bear in mind is that interfaces feel sluggish to humans when response times exceed ~100ms. the mean round-trip latency over tor is greater than 100ms, so any activity involving interaction with the securedrop server is going to feel sluggish to users if we wait for changes to be successfully pushed to servers before reflecting them in the GUI.

push option 1: apply to the server prior to reflecting locally and make the user wait, i.e. let it be slow. UX-wise we'd need to build in a lot of spinning beach balls so the user knows something is happening. this is easier logic-wise and it's what i've done in #50, but it's not necessarily the best strategy long term, as spinning beach balls are probably pretty annoying to users.

push option 2: make the change locally, then attempt to apply it on the server. rollback if there was a failure applying the change to the server (there are lots of reasons this can happen: tor is flaky, securedrop application server goes down, etc.). we'd need to present the failure to push the change to the server to the user in this scenario somehow.
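Option 2 amounts to an optimistic local update with rollback. A minimal sketch of that flow, assuming a hypothetical `LocalStore` and `api_client` (illustrative names, not the real client code):

```python
# Hypothetical sketch of "push option 2": apply the change locally first,
# then push to the server, rolling back the local change on failure.

class LocalStore:
    """Minimal in-memory stand-in for the client's local database."""
    def __init__(self):
        self.starred = {}

    def set_starred(self, source_id, value):
        previous = self.starred.get(source_id, False)
        self.starred[source_id] = value
        return previous  # caller keeps this so it can roll back


def star_source(store, api_client, source_id):
    previous = store.set_starred(source_id, True)  # optimistic local update
    try:
        api_client.star(source_id)                 # may be slow over Tor
    except Exception:
        store.set_starred(source_id, previous)     # roll back local change
        return False                               # caller surfaces an error
    return True
```

The UI reads only from the local store, so the change appears immediately and quietly disappears again if the push fails.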

@heartsucker
Contributor

I'm in favor of option 2. This is how chat apps (Signal, Whatsapp, Gitter) work. The message appears in the conversation with a spinner then a check when it's received or a (!) when it fails. Clicking the failure icon could present a notification that says "Failed to send" or whatever with options to delete or retry. However, this only makes sense in the conversation view for sending/deleting messages. An action like assigning a source, starring a source, etc. would need a "flash" message at the top saying "there was an error starring source foo".

@joshuathayer
Contributor

I'm also in favor of option 2 from a UX perspective. Of course, properly implementing two-way sync is a known hard problem :)

There's been a bunch of exciting movement in the world of distributed, eventually-consistent databases, particularly with the popularization of offline-first application architectures. Apache's CouchDB, for example, says right in the marketing copy

CouchDB’s unique Replication Protocol is the foundation for a whole new generation of “Offline First” applications for Mobile applications and other environments with challenging network infrastructures.

"Challenging network infrastructure" sounds like our environment!

It's probably a bit late in the game to change our client's underlying database, but we should be aware that there's a whole world of research and established techniques for distributed, eventually consistent applications, and, to the extent possible and reasonable, try to incorporate those ideas in whatever we design.

@redshiftzero
Contributor Author

I'm in favor of option 2. This is how chat apps (Signal, Whatsapp, Gitter) work. The message appears in the conversation with a spinner then a check when it's received or a (!) when it fails.

yep - 2 is definitely better and what we want, but it's non-trivial

hm, CouchDB looks very interesting @joshuathayer! agreed re: trying to use standard techniques where we can. for the alpha we'll stick with a relatively simple approach. that said, if there are systems (e.g. CouchDB) that can do some of this work for us, we're definitely down to consider them, given the importance of maintaining consistent state and not reinventing the wheel.

what about this way forward for now:

  1. send change to server
  2. display change in UI, with spinning beach ball type user feedback next to the user action
  3. if success: remove beach ball, and possibly put a little check or something on sent messages
     if fail: display error message to user ("could not send to server, try again") and make the UI change back to what it was before

implementation-wise: the UI reflects the state of the local database, so one way to implement this is to commit the change to the database (we do need to commit the change for the UI to update), then revert it if a failure occurs in step 3 (bearing in mind that other user actions can happen between "change A sent to server" and "change A confirmed by server").

we could do this UI rollback logic via callbacks passed to Client.call_api. we'd have a custom callback method for success/failure for each possible server action. we'd need to add UI functionality for "spinny beach ball" next to each UI action. it's a fair bit of code but we'd have:

  1. better UX as the UI change would be shown immediately
  2. consistent state, favoring the server (what we want)
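The callback idea above could be wired roughly like this. `Client.call_api` is mentioned in this thread, but the dispatcher shape and callback signatures here are hypothetical:

```python
# Sketch of per-action success/failure callbacks around an API call.
# The real Client.call_api likely runs the call in a background thread;
# this synchronous version just shows the dispatch logic.

def call_api(api_fn, success_cb, failure_cb, *args):
    """Run an API call and dispatch the result to the matching callback."""
    try:
        result = api_fn(*args)
    except Exception as exc:
        return failure_cb(exc)   # e.g. revert the UI, show an error
    return success_cb(result)    # e.g. replace the spinner with a check

# Example per-action callbacks (hypothetical names):
def on_star_success(result):
    return ("ok", result)        # UI change stays; remove spinner

def on_star_failure(exc):
    return ("err", exc)          # roll back star locally; report error
```

Each server action would get its own success/failure pair, which is where the "fair bit of code" comes from.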

thoughts?

@joshuathayer
Contributor

joshuathayer commented Oct 19, 2018

Yep that all makes sense!

One danger comes from here:

(bearing in mind that other user actions can happen between the "change A sent to server" and "change A confirmed by server")

Imagine a state machine like

(start) -- A --> (state X) -- B --> (state Y)

where A and B are change actions. Consider a situation where the user takes action A and we optimistically advance application state to X. Then the user takes action B, and we advance application state to Y. If A and B eventually succeed on the server, then everything is fine. But! If A fails on the server and B succeeds, then we're in a weird state: the server is in some state (start) -- B --> (state Z), and locally we've illegally advanced the application to state Y. What does the failure callback do in that case?

Less likely but still possible are out-of-order problems: what happens if A and B happen in the client in rapid succession, but because of weirdness over Tor, B hits the server first? If A and B are both valid transitions from the state of the server but aren't commutative, the server transparently arrives at a different state than the client (A(B(state)) vs B(A(state))).

I promise I'm not trying to be a jerk! And I don't have simple answers to these problems, aside from the laborious process of examining the actions we'd actually be taking, considering ways in which their success and failure could interact, and programming defensively against those scenarios. It could be that in practice we don't have to worry much about the more tangled consistency issues.

@joshuathayer
Contributor

joshuathayer commented Oct 19, 2018

We could also take a hybrid approach: actions which are idempotent and commutative (starring, tagging...) could be done optimistically, and failure could be handled simply: "enqueue this request and try again until it works". Actions whose eventual failure (or misordering) could land the system in an inconsistent or incorrect state could be done synchronously, either beachballing the entire UI or some elements of it until the request either succeeds or fails.
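The hybrid split could be dispatched along these lines. A sketch under stated assumptions: the action names and the two sets are illustrative, not the real client's action registry:

```python
# Hypothetical hybrid dispatcher: idempotent/commutative actions apply
# optimistically; anything that could leave an inconsistent state blocks
# until the server answers.

OPTIMISTIC = {"star", "unstar", "flag"}       # safe to apply-then-retry
SYNCHRONOUS = {"delete_source", "add_reply"}  # must wait for the server

def dispatch(action, apply_locally, push_to_server):
    if action in OPTIMISTIC:
        apply_locally()     # update UI immediately
        push_to_server()    # enqueue; retried on transient failure
        return "optimistic"
    # beachball until the server confirms, then reflect locally
    push_to_server()
    apply_locally()
    return "synchronous"
```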

@redshiftzero
Contributor Author

excellent points, thank you @joshuathayer! i thought a bit more about this - what follows is each write action possible by securedrop clients, and proposals for what the client should do, such that we don't enter into any of the inconsistent states you describe. feel free to point out issues here!

considering first this state machine and scenario:

(start) -- A --> (state X) -- B --> (state Y)

Consider a situation where the user takes action A and we optimistically advance application state to X. Then the user takes action B, and we advance application state to Y. If A and B eventually succeed on the server, then everything is fine. But! If A fails on the server, then we've illegally advanced the application to state Y.

my understanding is that for this to happen, there have to be actions that are valid at state Y that were invalid at state X. we are fortunate (for now - we'll have to be careful about this until we modify our synchronization process) in that for securedrop we only have two of these:

  • login: this enables many actions but we're ok here as it blocks any other interaction with the server
  • flag a source: enables one to reply to a source, but again we're ok here as we are forced to wait until it successfully completes, because we need a result from flagging a source (generation of the source's public key) prior to actually sending a reply

(we do however have actions that are invalid at state Y that were valid at state X, i.e. all delete actions, so we may block some actions which is ok)

Less likely but still possible are out-of-order problems:

the good news is that many of our actions are idempotent (see below section). the bad news is that two of these actions do not commute: unstarring and starring do not commute with each other (star(unstar(x)) = star(x) and unstar(star(x)) = unstar(x)), so we'll have to block the star part of the UI to prevent the user from clicking again until the first action has completed. this doesn't handle out-of-order arrival at the server of starring or unstarring actions from other clients around the same time, so this can produce user confusion. i think for the securedrop scenario this is rare enough to be acceptable for now (since there are few clients per server).
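The non-commutativity claim is easy to check mechanically. Modeling star/unstar as last-write-wins operations on a boolean:

```python
# star/unstar are each idempotent, but their order matters: applying them
# in different orders produces different final states.

def star(starred):
    return True     # idempotent: star(star(x)) == star(x)

def unstar(starred):
    return False    # idempotent: unstar(unstar(x)) == unstar(x)

x = False
assert star(unstar(x)) != unstar(star(x))  # not commutative
assert star(star(x)) == star(x)            # idempotence holds
```

This is exactly why the UI needs to serialize star/unstar requests from one client, even though retrying either one individually is harmless.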

in the non-idempotent cases we have deletion, but the server will report a failure for any actions attempted after a deletion has occurred, so we can safely rollback (and deleting the source does not enable any other previously-disallowed actions). replying is the one case where the ordering really does matter from the user's perspective, so we should sync the API often to update the replies locally. but it still may be the case that replies arrive at approximately the same time just as in other chat applications, and clients may not agree on the ordering. in this case, the server keeps track of the true global ordering (based on time of arrival) and clients will update to reflect that after sync.

summary of possible write actions and the failure behavior

idempotent actions

  • star a source
    • returns 404 if source has been deleted
  • remove a star
    • returns 404 if source has been deleted
  • flag a source
    • returns 404 if source has been deleted

non-idempotent actions

  • delete reply
    • returns 404 if reply has already been deleted
  • add reply
  • delete submission
    • returns 404 if submission already deleted
  • delete source (and all submissions)
    • returns 404 if already deleted

proposed callback behavior for each action, with implementation notes

idempotent actions

  • starring, unstarring or flagging a source
    • while starring or unstarring, the in-progress UI should prevent the user from sending another API request to star/unstar (these could get processed out of order and they do not commute)
    • if success: UI change stays
    • if failure:
      • remove star (for starring case, add star for the unstarring case, mark source not flagged for the flagging a source case) locally, report error to the user
      • update UI to reflect local db
    • if failure due to 404:
      • sync API, source is deleted, report the source was deleted
      • (this delete scenario is why we should avoid enqueuing idempotent changes until we succeed, we can enter a deadlock state if the source was deleted on the server)

non-idempotent actions

  • delete reply, submission, or source
    • important: for all the deletion actions, we need to be able to rollback. so we need to first update the UI and only actually delete the item once we have confirmed the deletion on the server. implementation-wise this means we need to add a column like is_marked_for_deletion on the reply, submission, and source objects. the “delete” action locally would then just be marking the object as deleted in the database. the UI should only display rows in each table that have not been marked as deleted.
    • if success:
      • UI change stays in place
      • now we carry out the deletion of both the row and the files on disk.
    • if failure:
      • mark the row as not marked for deletion, and display to the user that an error occurred deleting the object (“try again”)
      • update UI to reflect local db
    • if failure due to 404:
      • sync API, object is deleted, report the object was deleted to user
  • add reply
    • this is the situation where ordering matters a lot: what if journalist A and journalist B reply at almost the same time? i suggest we should allow this to happen (just as it can in any other chat application). the server will record the ordering it sees, and that will be propagated to clients during a sync.
    • (also: we want the replies list to be up to date, so we should probably sync the API when the user starts typing in the reply field)
    • if success:
      • UI change stays in place
    • if failure:
      • remove row from reply table locally
      • important from user perspective: move the reply text back to the compose reply area in the UI otherwise they will lose their draft
    • if failure due to 404:
      • sync API, source is deleted, report that another journalist deleted that source and one can no longer reply
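The mark-for-deletion idea above can be sketched with stdlib sqlite3. The table layout and the `marked_for_deletion` column name here are illustrative, not the real client schema:

```python
# Soft-delete sketch: "delete" locally just sets a flag the UI filters on;
# the row (and files on disk) are only removed once the server confirms.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sources (
        id INTEGER PRIMARY KEY,
        name TEXT,
        marked_for_deletion INTEGER DEFAULT 0
    )
""")
conn.execute("INSERT INTO sources (name) VALUES ('foo'), ('bar')")

def mark_for_deletion(source_id):
    # local "delete": the row stays, the UI just stops showing it
    conn.execute("UPDATE sources SET marked_for_deletion = 1 WHERE id = ?",
                 (source_id,))

def confirm_deletion(source_id):
    # server confirmed: now actually remove the row (and files on disk)
    conn.execute("DELETE FROM sources WHERE id = ?", (source_id,))

def rollback_deletion(source_id):
    # server failed: un-mark so the UI shows the source again
    conn.execute("UPDATE sources SET marked_for_deletion = 0 WHERE id = ?",
                 (source_id,))

def visible_sources():
    return [row[0] for row in conn.execute(
        "SELECT name FROM sources WHERE marked_for_deletion = 0")]
```

The UI queries `visible_sources()`, so deletion appears instant while remaining reversible until the server answers.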

other thoughts

timeouts

if we just don't get a response from any of the above actions, then the change may or may not have been committed to the server, in which case we should sync the API.

durability

  • what is the behavior if the client application crashes before one of these callbacks finishes?
    • in this case, when the application starts back up and the user logs in, it syncs from the server again (the current behavior) and fetches whatever was committed. we favor the server when making changes in the local database, i.e. any half-completed actions are effectively rolled back and we don't end up in an inconsistent halfway state

queue/retry

proposal for at least beta: we add a queue for processing these API calls and add retry logic (ideally only after transient errors like network issues).
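The retry logic could distinguish transient from permanent failures along these lines. A sketch; `TransientError` is a hypothetical classification, not a real exception type in the client:

```python
# Retry only transient failures (e.g. Tor flakiness, timeouts); anything
# else (e.g. a 404 because the object was deleted) propagates immediately
# so we never deadlock retrying an impossible request.

class TransientError(Exception):
    """Hypothetical marker for errors worth retrying."""

def run_with_retries(action, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the error to the user
        # non-transient exceptions are not caught and propagate at once
```

This matches the 404 caveat in the idempotent-actions section: blindly enqueueing "retry until it works" would spin forever on a source that was deleted server-side.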

@ntoll
Contributor

ntoll commented Oct 22, 2018

Folks, thanks for such a stimulating set of comments. @redshiftzero that's an epic and really useful summary of the various actions and states. Nice one!

I'm not sure I can add much more to this discussion because:

  • I agree with you all that 2 is the preferred option.
  • Your analysis and pointers to other similar projects is spot on.
  • I think simplicity is key, but that goes without saying.
  • This reminds me of the DVCS ecosystem (such as Git / GitHub, which ironically failed today).

Thoughts on UI: Instead of Beachball-mageddon, why not have a status bar across the bottom of the application (as browsers used to do) which contains a message/icon indicating the current activity/state..? It's a single place for the user to look and means we only have one beachball to worry about IYSWIM from a code perspective. Of course UX may indicate such a status bar should go somewhere else... I'm anxious we retain simple and easy to maintain code.

@redshiftzero
Contributor Author

Thoughts on UI: Instead of Beachball-mageddon, why not have a status bar across the bottom of the application (as browsers used to do) which contains a message/icon indicating the current activity/state..? It's a single place for the user to look and means we only have one beachball to worry about IYSWIM from a code perspective. Of course UX may indicate such a status bar should go somewhere else... I'm anxious we retain simple and easy to maintain code.

ah right this was picked up in #63

otherwise I think we have a decent plan for the alpha behavior and I'm going to close this... I'll link out to my comment re: proposed callbacks on the relevant issues

note there is room after the alpha for further thought on this (i.e. i don't want to shut down the conversation if people have more ideas on sync strategy in the future, but i do want to signal that this is the plan for the alpha 😉)

@ninavizz
Member

Jeebus, my brain hurts reading all of that! GitHub needs a nerd-heart emoji, as the simple red heart just doesn't cut it sometimes. :)

@ntoll I'm guessing you guys have a set solution in place for Alpha (that will be great, of course!), and that the timed/animated Invision clickthroughs I posted last week adequately display intent-goals for Beta. Pls let me know if anything else is needed; happy to provide!

legoktm pushed a commit that referenced this issue Dec 11, 2023
Users have reported printers not working because of `ppdc` warning
messages reported in the client (asking them to contact an
administrator). These warnings do not result in non-zero return codes
and are seemingly really just warnings, so no need to get users
involved.

Fixes #51
legoktm pushed a commit that referenced this issue Dec 15, 2023
Supports logout endpoint (fixes #51)