Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

FxA Account Manager refactor #7856

Merged
merged 5 commits into from
Sep 22, 2020

Conversation

grigoryk
Copy link
Contributor

@grigoryk grigoryk commented Jul 27, 2020

State machine changes:

account manager changes:

  • removed ability to "set sync config" from the public API. we weren't using it (and have no plans to use this), so the added complexity wasn't benefiting anything.

async/suspend API changes:

account manager, account and device constellation APIs are now suspend, which makes both internal code and consuming of these APIs nicer and more flexible. Actual semantics of what's going on didn't change.

sync manager changes:

introduced a delay to the periodic sync scheduling, so that we can properly manage what goes on during startup (Closes #7335)

Tests are half-way being updating, so this is still a WIP.

cc @csadilek @rfk @eoger @jonalmeida


Pull Request checklist

  • Quality: This PR builds and passes detekt/ktlint checks (A pre-push hook is recommended)
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry or does not need one
  • Accessibility: The code in this PR follows accessibility best practices or does not include any user facing features

After merge

  • Milestone: Make sure issues closed by this pull request are added to the milestone of the version currently in development.
  • Breaking Changes: If this is a breaking change, please push a draft PR on Reference Browser to address the breaking issues.

@grigoryk grigoryk requested a review from csadilek July 27, 2020 06:03
@grigoryk grigoryk force-pushed the issue7536PairingCrash branch 2 times, most recently from 4c9ddfa to 8432ab9 Compare July 27, 2020 19:58
@csadilek csadilek self-assigned this Jul 27, 2020
Copy link
Contributor

@rfk rfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot going on here, but I did a first pass and left some notes. I think you're right that it will make the state-machine easier to understand overall.

FWIW the PR would be easier to digest with the async->suspend stuff landed separately 😁

*/
data class SyncConfig(
val supportedEngines: Set<SyncEngine>,
val syncPeriodInMinutes: Long? = null
val periodicSyncConfig: PeriodicSyncConfig?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I continue to be sad that we have to plumb a bunch of sync-related things through here, but oh well, not really anything you can do about it in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so sure this is bad, the upside is that it gives consumers a single point of interaction with all of these things.

If you mean something closer to "it's too bad sync manager is managed in one repo, but sync scheduling in another", then yes, I agree that it is unfortunate and something we should look into fixing 👍

clientId: String,
scopes: Array<String>,
state: String,
accessType: AccessType
) = scope.async {
) = withContext(scope.coroutineContext) {
handleFxaExceptions(logger, "authorizeOAuthCode", { null }) {
inner.authorizeOAuthCode(clientId, scopes, state, accessType.msg)
}
}

override fun getSessionToken(): String? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do any of your consumers actually use getSessionToken, or is it exposed mostly for completeness? It doesn't feel like anyone should be calling this method in practice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, this is just used internally by the account manager (to get sync going).

In theory, if consumers aren't using the account manager this lets them do something useful with the account object beyond just fetching a profile, but... yeah, not really something we want to encourage.

Next on my plate is to work through the "we have a firefox-shaped oauth account abstraction with just a single implementation" bit, and simplify things on that front... Perhaps we can clean this up as part of that work.

AuthType.Signin,
AuthType.Signup,
AuthType.Pairing,
AuthType.MigratedCopy -> DeviceFinalizeAction.Initialize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, the fact that there are so many ways to spell "did a fresh signin" seems like a tiny bit of a code-smell to me. What things do we need to do that differ between Signin, Signup and Pairing? (I seem to recall this being added initially for metrics purposes?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another place where we use this is to kick-off sync after an onAuthenticated event (first sync vs startup sync), with a different mapping from device's init/ensure.

Otherwise, this is plumbed around in such detail mostly for telemetry, yeah.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll think about perhaps a different internal/external representation of this.

) = withContext(coroutineContext) {
when (val s = state) {
// Can't sync while we're still doing stuff.
is State.Active -> Unit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this means the syncNow request is just dropped on the floor. Is it possible that this might cause a bad user experience where they request a "sync now" but we don't honour it because we happened to be doing something else at the time?

Copy link
Contributor Author

@grigoryk grigoryk Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, we're dropping it here. FWIW, I think this can't really happen in practice - if we're in any of the "active" states, there isn't really a way for the user to request a sync. We're either starting/completing auth, logging out, migrating, etc - Fenix's UI, at least, doesn't expose "sync" button in any of these scenarios.

I'm not quite sure if having this check is necessary (it shouldn't happen), but left it here for completeness. This may as well produce a caught exception that we submit into Sentry, since it is an illegal event essentially (but, it exists outside of the state machine).

Another alternative is to make "sync" an "external event" on the state machine itself, but I'm not sure that's significantly better really, and it starts to introduce "sync" bits into what's otherwise an fxa-only thing.

@grigoryk
Copy link
Contributor Author

grigoryk commented Jul 28, 2020

Thanks for taking a look, @rfk! I'm still making some tweaks, but the core of it is there.

FWIW the PR would be easier to digest with the async->suspend stuff landed separately 😁

Yup, I got a little carried away there... At this point, it'll be quite a task to split the comments into a sensible "story", happy to chat on zoom/slack about any of this.

Copy link
Contributor

@csadilek csadilek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, completed the first pass now.

The state machine looks really clean now with the different type of states! One suggestion below.

I also have one question about changing scopes in the push feature if I read this correctly and whether it was intentional.

Did a round of manual testing as well. All looking good.

processRawEventAsync(String(rawEvent))
}
accountManager.withConstellation { CoroutineScope(Dispatchers.Main).launch {
processRawEvent(String(rawEvent))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to run this on the main thread? Before this ran on AutoPushFeature.coroutineScope via notifyObservers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we maybe want to make these observer methods suspend functions as well, so we don't need to context switch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this will run as before in practice (underneath we have a dedicated dispatcher), and marking the interface funs as suspend will avoid switching and certainly make the intention here clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #7899

Copy link
Contributor

@jonalmeida jonalmeida Aug 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grigoryk I don't follow why we need to switch to the main dispatcher here. Previously the FxaPushSupportFeature would execute the call on the autopush dispatcher since account manager had it's own dispatcher. Shouldn't there be the ability to add a process a raw message that doesn't come from a coroutine on an internal dispatcher?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonalmeida Previously the FxaPushSupportFeature would execute the call on the autopush dispatcher i don't think that's quite correct. Yes, it'll execute, say, processRawEventAsync on the autopush dispatcher (or, in whichever context onMessageReceived will be invoked). However processRawEventAsync itself will do the actual work on the account manger's dispatcher. With the suspend function, the actual "work" is still happening on the account manager's dispatcher, just as before, it's just how we "kick it off" is different now - but I don't think that actually changes semantics here in a way that matters?

I agree that it's a little awkward, hence #7889. However we can't make observer methods suspend since Observable doesn't currently support "notifying" suspend observers, and I don't think we want to expand that pattern.

Another option is to remove withcontext wrapper from impl suspend methods, and rely on callers to correctly call these functions (e.g. on a worker dispatcher of sorts). Current approach holds your hand a bit, and doesn't let you screw up too badly. I'm not sure if that's necessary.

@@ -210,7 +215,9 @@ internal class AutoPushObserver(
return@subscribe
}

account.deviceConstellation().setDevicePushSubscriptionAsync(subscription.into())
CoroutineScope(Dispatchers.Main).launch {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above re: main scope.

@grigoryk grigoryk force-pushed the issue7536PairingCrash branch 4 times, most recently from 6d6d7d4 to 8af69cf Compare July 29, 2020 01:57
bors bot pushed a commit that referenced this pull request Jul 29, 2020
7915: Make sure device finalization succeeds before proceeding to postAuth r=csadilek a=grigoryk

Closes #7536 

Proper fix is coming in #7856, but this gets the crashes to go away.



Co-authored-by: Grisha Kruglov <gkruglov@mozilla.com>
bors bot pushed a commit that referenced this pull request Jul 30, 2020
7915: Make sure device finalization succeeds before proceeding to postAuth r=csadilek a=grigoryk

Closes #7536 

Proper fix is coming in #7856, but this gets the crashes to go away.



Co-authored-by: Grisha Kruglov <gkruglov@mozilla.com>
@grigoryk
Copy link
Contributor Author

grigoryk commented Aug 6, 2020

One remaining aspect of this PR that I need to change before this can land is improving how we handle the "offline startup" case. Current version will likely fail to finalize the account after multiple attempts, and will drive the user to the NotAuthenticated state, which isn't really something we'd ship. Existing state machine would just kind of give up at that point, and some things may not work correctly, but at least the user remains signed-in.

Ideally, we shouldn't be concerned with this stuff at the startup at all. Current fxaclient APIs don't make any guarantees around what happens during ensure/init calls, so we assume they talk to the network during a regular startup flow.
It would be nice if this wasn't the case - regular startup shouldn't require any network interactions.

We can "paper over" this at the state machine level, but I'm inclined to first investigate changing the a-s APIs first. cc @rfk

@grigoryk
Copy link
Contributor Author

This should be good to land now. However, since I'm away until mid-September, and due to how various releases are aligning right now, let's land it once I'm back on the 16th.

@grigoryk grigoryk force-pushed the issue7536PairingCrash branch 2 times, most recently from 9f892e0 to ae3c302 Compare September 16, 2020 20:31
@grigoryk
Copy link
Contributor Author

@csadilek once you're back from PTO, let's get this landed. Should be good to go.

@grigoryk grigoryk added the 🕵️‍♀️ needs review PRs that need to be reviewed label Sep 19, 2020
Copy link
Contributor

@csadilek csadilek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grigoryk OK did another pass and also looked at latest commits. Just some nits, but good to land from my perspective!

@grigoryk
Copy link
Contributor Author

grigoryk commented Sep 22, 2020

@csadilek addressed your feedback, thanks! Let's land this 👍

bors bot pushed a commit that referenced this pull request Sep 22, 2020
7856: FxA Account Manager refactor r=csadilek a=grigoryk

### State machine changes:
- "profile" states have been removed, `profile` is now handled outside of the state machine
- states and events have been split into two subtypes: external and internal. External states are the `AccountState`s (authenticated, not authenticated, auth problem, etc) and internal states are `InternalState` (beginning auth, completing auth, recovering from auth problems, etc)
- this split should hopefully make navigating the state machine more intuitive
- retry logic and error handling was added for all internal transitions which require networking (Closes #7536)

### account manager changes:
- removed ability to "set sync config" from the public API. we weren't using it (and have no plans to use this), so the added complexity wasn't benefiting anything.

### async/suspend API changes:
account manager, account and device constellation APIs are now `suspend`, which makes both internal code and consuming of these APIs nicer and more flexible. Actual semantics of what's going on didn't change.

### sync manager changes:
introduced a delay to the periodic sync scheduling, so that we can properly manage what goes on during startup (Closes #7335)  

Tests are half-way being updating, so this is still a WIP.

cc @csadilek @rfk @eoger @jonalmeida 



Co-authored-by: Grisha Kruglov <gkruglov@mozilla.com>
@grigoryk grigoryk added 🛬 needs landing PRs that are ready to land and removed 🕵️‍♀️ needs review PRs that need to be reviewed labels Sep 22, 2020
@mozilla-mobile mozilla-mobile deleted a comment from bors bot Sep 22, 2020
@mergify mergify bot merged commit a49be43 into mozilla-mobile:master Sep 22, 2020
@grigoryk grigoryk deleted the issue7536PairingCrash branch September 22, 2020 20:40
grigoryk pushed a commit to grigoryk/android-components that referenced this pull request Nov 13, 2020
Prior to mozilla-mobile#7856 we only had a NotAuthenticated
state, which user would remain in until the very end of the authentication process (receiving a 'finished...' api call).

After the refactor, we introduced an in-progress states - specifically, BeginningAuthentication. If the user
cancels the auth flow (e.g. closes a custom tab), they will remain in this state. Subsequent login attempts
would then fail.

This patch fixes this by introducing a Cancel event which we trigger whenever handling 'beginAuthentication' calls.
In normal scenarios, this event is ignored. Otherwise, it "resets" the state machine into NotAuthenticated
state before emitting any Begin* events.
mergify bot pushed a commit that referenced this pull request Nov 13, 2020
Prior to #7856 we only had a NotAuthenticated
state, which user would remain in until the very end of the authentication process (receiving a 'finished...' api call).

After the refactor, we introduced an in-progress states - specifically, BeginningAuthentication. If the user
cancels the auth flow (e.g. closes a custom tab), they will remain in this state. Subsequent login attempts
would then fail.

This patch fixes this by introducing a Cancel event which we trigger whenever handling 'beginAuthentication' calls.
In normal scenarios, this event is ignored. Otherwise, it "resets" the state machine into NotAuthenticated
state before emitting any Begin* events.
st3fan pushed a commit that referenced this pull request Nov 14, 2020
Prior to #7856 we only had a NotAuthenticated
state, which user would remain in until the very end of the authentication process (receiving a 'finished...' api call).

After the refactor, we introduced an in-progress states - specifically, BeginningAuthentication. If the user
cancels the auth flow (e.g. closes a custom tab), they will remain in this state. Subsequent login attempts
would then fail.

This patch fixes this by introducing a Cancel event which we trigger whenever handling 'beginAuthentication' calls.
In normal scenarios, this event is ignored. Otherwise, it "resets" the state machine into NotAuthenticated
state before emitting any Begin* events.
st3fan pushed a commit that referenced this pull request Nov 14, 2020
Prior to #7856 we only had a NotAuthenticated
state, which user would remain in until the very end of the authentication process (receiving a 'finished...' api call).

After the refactor, we introduced an in-progress states - specifically, BeginningAuthentication. If the user
cancels the auth flow (e.g. closes a custom tab), they will remain in this state. Subsequent login attempts
would then fail.

This patch fixes this by introducing a Cancel event which we trigger whenever handling 'beginAuthentication' calls.
In normal scenarios, this event is ignored. Otherwise, it "resets" the state machine into NotAuthenticated
state before emitting any Begin* events.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
🛬 needs landing PRs that are ready to land
Projects
None yet
4 participants