Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add offline use case for 'migrateFromSessionToken' #2492

Merged
merged 1 commit into from Jan 31, 2020

Conversation

vladikoff
Copy link
Contributor

@vladikoff vladikoff commented Jan 20, 2020

Fixes #2396

Pull Request checklist

  • Quality: This PR builds and tests run cleanly
    • cargo test --all produces no test failures
    • cargo clippy --all --all-targets --all-features runs without emitting any warnings
    • cargo fmt does not produce any changes to the code
    • ./gradlew ktlint detekt runs without emitting any warnings
    • swiftformat --swiftversion 4 megazords components/*/ios && swiftlint runs without emitting any warnings or producing changes
    • Note: For changes that need extra cross-platform testing, consider adding [ci full] to the PR title.
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry in CHANGES_UNRELEASED.md or an explanation of why it does not need one
    • Any breaking changes to Swift or Kotlin binding APIs are noted explicitly
  • Dependencies: This PR follows our dependency management guidelines
    • Any new dependencies are accompanied by a summary of the due dilligence applied in selecting them.

@vladikoff
Copy link
Contributor Author

@eoger @rfk simpler approach to this now based on what we learned in the past PR. r?

@rfk rfk requested a review from grigoryk January 21, 2020 04:38
@rfk
Copy link
Contributor

rfk commented Jan 21, 2020

Thanks @vladikoff; I'm adding @grigoryk as reviewer for the proposed approach from the consumer side.

Copy link
Contributor

@rfk rfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a lot simpler!

Both of my high-level comments from the previous PR still kind of apply here though:

  • How do we handle the difference between transient and fatal errors?
  • How can we effectively test the above in CI?

(I don't want to get too hung up on tests here, because we plan for this code to have a short shelf-life. But we also don't want to be accidentally breaking it as we land other fixes in the leadup to the migration release).

### Features

* `migrateFromSessionToken` now handles offline use cases. It caches the data the consumers originally provide.
If there's no network connectivity then the migration could be retried using the new `retryMigrateFromSessionToken` method.

This comment was marked as resolved.

Ok(migration_result) => migration_result,
Err(err) => {
println!("Error: {}", err);
// test offline behaviour
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless this gets run in CI on a regular basis, I'm not sure we can claim it as a "test" ;-)

let retry = fxa.try_migration();
match retry {
Ok(result) => break result,
Err(_) => println!("Retrying... Are you connected to the internet?"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing discussion from the previous PR, each migration attempt could fail for either both fatal or non-fatal reasons. It would be useful for this example to show how to handle the two different cases. For example right now, if I enter an invalid sessionToken into this example script, then IIUC it will loop forever asking me whether I'm connected to the internet. We don't want to get anyone's migrated Fennec stuck in such a loop in practice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'll also note that "not connected to the internet" is not necessarily the only transient error; 500-level server errors should probably also be treated as transient and retried).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's only for testing purposes, but I'd like to see this example use the same logic as we expect consumers to use in practice, which IIUC is basically:

match migrate_from_session_token(...) {
  Ok(result) => result,
  Err(...) => {
    while is_in_migration_state() {
       sleep(...)
       match retry_migrate_from_session_token() {
         Ok(result) => result,
         Err(...) => continue,
        }
    }
  }
}

Copy link
Contributor

@grigoryk grigoryk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start, but I'm not sure how well the proposed API will work in practice.

From an a-c perspective, I expected to see one of:

  • automagical behaviour - a failed attempt persists FxA state, pretends to succeed, and will attempt to re-auth on FxA API access. On startup, a-c would detect that there's persisted state, and a call to ensureCapabilities will attempt to re-auth as well. Non-transient failures will reset this process.
  • diy approach - an ability to detect if there was an attempt to sign-in, which failed and could be retried

First option will make the integration much smoother, of course - we probably won't need to change anything in a-c, nor introduce any new UI states in Fenix. Second option is arguably much simpler for ya'll, but the complexity is moved upwards - in a-c/fenix, we now get a new state (attempted to sign-in via migration data, failed, but could retry) that needs to be managed and integrated with the existing flow.

What I see is close to second, but with some details missing. How can I figure out if I need to retry? What is the state of the world after a failed retry attempt? What actions will work, and what won't work? What state should the client end-up in right after this failure? Can a client attempt a fresh sign-in? What should happen after an app restart?

We already have a possibility of transient network issues at the very end of the sign-in process - perhaps we can adopt a similar approach for both of these problems (transient migration failures and transient regular sign-in flow failures)? E.g. IIUC, completeOAuthFlow failures could be treated similarly, with a caveat that code(and maybe state?) has a shorter lifespan.

Also, nit - it'll be very nice if there were some docs on what's in the returned JSON blobs - or at least a comment with a struct name, to make looking this stuff up easier.

@vladikoff
Copy link
Contributor Author

automagical behaviour - a failed attempt persists FxA state, pretends to succeed, and will attempt to re-auth on FxA API access.

We tried that in the other approach and it didn't pan out. We will possibly revisit that in the future once we simplify the state persistence

@rfk
Copy link
Contributor

rfk commented Jan 23, 2020

We tried that in the other approach and it didn't pan out

To add a bit of context from the previous conversations, that approach turned out to be surprisingly invasive, with lots of methods suddenly having the potential to modify the account state and need to trigger the persistence callback.

@vladikoff vladikoff force-pushed the migration-offline-4 branch 4 times, most recently from a781306 to 60dfc28 Compare January 26, 2020 17:55
@vladikoff
Copy link
Contributor Author

Updated!

Copy link
Contributor

@rfk rfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vladikoff. The shape of the API here looks inline with what we discussed in the meeting last week.

However, I still think we're missing some logic to differentiate between transient errors (e.g. network failure, server error) and permanent errors (e.g. bad session token). We don't need to expose that over the FFI, but we do still need to handle it internally, otherwise we risk being permanently stuck with isInMigrationState returning true but retryMigrateFromSessionToken failing with the same permanent error over and over again.

My suggestion would be to have an outer catch block around try_migration, something like:

pub fn try_migration() {
  let res = { ... all the code that's there currently ... }
  if let Err(err) = res {
    // If it's a transient error, leave the state so we can retry later.
    if (err is a network error or  500-server-error) {
      return res;
    }
  }
  // We've either succeeded, or are never going to succeed.
  self.state.in_flight_migration = None;
  res
}

let retry = fxa.try_migration();
match retry {
Ok(result) => break result,
Err(_) => println!("Retrying... Are you connected to the internet?"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's only for testing purposes, but I'd like to see this example use the same logic as we expect consumers to use in practice, which IIUC is basically:

match migrate_from_session_token(...) {
  Ok(result) => result,
  Err(...) => {
    while is_in_migration_state() {
       sleep(...)
       match retry_migrate_from_session_token() {
         Ok(result) => result,
         Err(...) => continue,
        }
    }
  }
}

* `migrateFromSessionToken` now handles offline use cases. It caches the data the consumers originally provide.
If there's no network connectivity then the migration could be retried using the new `retryMigrateFromSessionToken` method.
Consumers may also use the `isInMigrationState` method to check if there's a migration in progress.
([#2492](https://github.com/mozilla/application-services/pull/2492))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest rewording this for clarity, to focus on the things that consumers need to know in the order they need to know them. Along the lines of:

  • migrateFromSessionToken now has special handling for transient failures such as network errors. If the migration fails due to a transient error, then the provided credentials will be cached so that the migration can be retried later. Consumers can call the isInMigrationState method to check if there is a cached migration in progress, and retryMigrateFromSessionToken to retry it.

self.try_migration()
}

/// Check if the client is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the client is what? :-P

self.state.in_flight_migration.is_some()
}

pub fn try_migration(&mut self) -> Result<FxAMigrationResult> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming nit: I think the overall structure of the code would be a bit clearer if this were called retry_migrate_from_session_token for symmetry with what it's called in the higher layers.

Copy link
Contributor

@rfk rfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have many remaining nits, but let's get this into a build and try it out ;-)

@vladikoff vladikoff merged commit 807c176 into master Jan 31, 2020
@rfk rfk deleted the migration-offline-4 branch June 7, 2021 10:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Graceful handling of the offline case during a sessionToken migration
4 participants