
CPU spikes every 10 minutes due to accounts-base package clearing login tokens #10462

Closed
wildhart opened this issue Feb 22, 2019 · 11 comments

Comments

@wildhart
Contributor

I have a single combined Meteor server and MongoDB instance (I know, but it's appropriate to the size of my app for now), and precisely every 10 minutes I see a CPU spike:
[screenshot: CPU usage graph showing a spike every 10 minutes]

I'm currently using Meteor 1.8.1-beta.17 but this has been happening for a long while now and unfortunately I can't remember in which version of Meteor it started happening. Nor can I revert my production server to old releases to find out, sorry.

I've tracked it down to the following code in accounts-base/accounts_server.js:

const setExpireTokensInterval = accounts => {
  accounts.expireTokenInterval = Meteor.setInterval(() => {
    accounts._expireTokens();
    accounts._expirePasswordResetTokens();
    accounts._expirePasswordEnrollTokens();
  }, EXPIRE_TOKENS_INTERVAL_MS);
};

I've confirmed this code is the culprit by changing EXPIRE_TOKENS_INTERVAL_MS from 10 minutes to 15 minutes; the spikes shift accordingly.

To see how long this code is taking and how many tokens are removed, I modified it to:

const setExpireTokensInterval = accounts => {
  accounts.expireTokenInterval = Meteor.setInterval(() => {
    const a0 = new Date().valueOf();
    const r1 = accounts._expireTokens(); // I've hacked this to return the number of records updated
    const a1 = new Date().valueOf();
    const r2 = accounts._expirePasswordResetTokens(); // I've hacked this to return the number of records updated
    const a2 = new Date().valueOf();
    const r3 = accounts._expirePasswordEnrollTokens(); // I've hacked this to return the number of records updated
    const a3 = new Date().valueOf();
    console.log('expire tokens:', a1-a0, a2-a1, a3-a2, 'total='+(a3-a0), 'removed:', r1, r2, r3);
  }, EXPIRE_TOKENS_INTERVAL_MS);
};

This gives the following output:

20T22:48:37 expire tokens: 2825 447 1268 total=4540 removed: 0 0 0
20T22:58:38 expire tokens: 2572 630 2109 total=5311 removed: 0 0 0
20T23:08:37 expire tokens: 2810 1461 387 total=4658 removed: 0 0 0
20T23:18:36 expire tokens: 2132 784 519 total=3435 removed: 0 0 0
20T23:28:36 expire tokens: 2322 694 471 total=3487 removed: 0 0 0
20T23:38:36 expire tokens: 2280 1354 384 total=4018 removed: 0 0 0
...

Since node and mongo are running on the same server, I tracked the CPU usage of each using htop during one of these intervals:
[screenshot: htop CPU usage of node and mongo during one interval]

accounts._expireTokens() seems to be the worst offender of the three functions within the interval.

My users collection only contains 946 users, and 407 of these contain "services.resume.loginTokens.when", but none of those tokens are over 90 days old. However, most of them contain multiple loginTokens (up to 50 in some cases).

When I run a similar query directly in the mongo shell using .explain("executionStats"), it takes only a couple of milliseconds and reports that it is using an index.
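The query I tested looks roughly like this (a sketch; the selector shape and the 90-day constant are assumptions based on accounts-base, not copied from it):

```javascript
// Sketch (assumed selector shape): build the 90-day cutoff the way
// _expireTokens does, then check the plan in the mongo shell with
// db.users.find(selector).explain("executionStats").
const LOGIN_EXPIRATION_DAYS = 90;
const oldestValidDate = new Date(
  Date.now() - LOGIN_EXPIRATION_DAYS * 24 * 60 * 60 * 1000
);
const selector = {
  "services.resume.loginTokens.when": { $lt: oldestValidDate },
};
```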

Since the query should be fast, the working hypothesis on this thread in the Meteor forums is that the collection.update() function is yielding to another fiber, but I don't know how to track down which other code is causing the delay, or why it would only happen during these regular queries. No user documents are being updated, so I don't think it can be an observer on the Meteor.users collection.

Meteor is deployed to Digital Ocean on an Ubuntu server. Sorry I can't produce a repro - the CPU spike is only apparent on my production server.

Any ideas what could be causing the CPU spike?

Also a side question - since the DEFAULT_LOGIN_EXPIRATION_DAYS, DEFAULT_PASSWORD_RESET_TOKEN_EXPIRATION_DAYS and DEFAULT_PASSWORD_ENROLL_TOKEN_EXPIRATION_DAYS periods are 90, 3 and 30 days respectively, why is the default interval running every 10 minutes?

@nathan-muir
Contributor

nathan-muir commented Feb 25, 2019

@wildhart Can you post a gist with your .meteor/packages / .meteor/versions.

wildhart reopened this Feb 25, 2019
@wildhart
Contributor Author

I’ve cloned my production db onto my local machine and also onto a staging server identical to production. Neither exhibits the same spikes, and the expireTokens interval only takes 20-150ms total. So it only appears to be a problem when users are connected (the production server varies between 50-100 concurrent sessions; spikes are worse when there are more active sessions).

@pmcochrane

If I were in your situation, I would try adding indexes on the users collection for the token fields that are queried by each of the three functions called in the accounts package, and see if that reduces the times recorded for the update queries. I find that having a few thousand records in the users collection does cause a few problems.

My other thought would be to clear the 50 tokens for a user when they log out. I haven't noticed this pattern of token build-up myself, however.
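Concretely, that suggestion amounts to something like the following (the field names here are assumptions based on this thread, not verified against accounts-base):

```javascript
// Sketch (assumed field names): index specs for the fields the three expiry
// queries filter on. In Meteor server startup code, each spec could be passed
// to Meteor.users.rawCollection().createIndex(spec).
const expiryIndexSpecs = [
  { "services.resume.loginTokens.when": 1 },
  { "services.password.reset.when": 1 },
  { "services.password.enroll.when": 1 },
];
```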

@wildhart
Contributor Author

I've added an index to the "services.password.reset.reason" field, all the other fields involved were already indexed. It didn't really help much.

I've expired all tokens older than 30 days, which helped a bit. I also figured out why I had so many tokens: a bug in my signIn.vue component was causing Meteor.loginWithPassword to be called twice. Essentially:

<form @submit="submit">
  ...
  <button @click="submit" type="submit">Sign in</button>
</form>

was causing the submit method to be called twice: the button's click handler fired, and its type="submit" also triggered the form's submit handler. This created two new resume tokens for every login, but when a user logs out only one of them is removed (not that many of my users log out, though).
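A small client-side guard also prevents the duplicate call from starting a second login (a sketch; makeSubmitGuard is a hypothetical helper, not from my app):

```javascript
// Sketch (hypothetical helper): ignore a second submit while a login attempt
// is still in flight, so only one resume token is created per login.
function makeSubmitGuard() {
  let pending = false;
  return function submitOnce(doLogin) {
    if (pending) return false;           // duplicate event within one attempt
    pending = true;
    doLogin(() => { pending = false; }); // caller resets the guard when done
    return true;
  };
}
```

With this, wiring both the form's submit event and the button's click to the same guarded handler becomes harmless.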

Removing tokens over 30 days old reduced the number of tokens by about half (1125 to 602) and the maximum number of tokens per user from 55 down to 26. Over time more of the older tokens should expire as well, improving things further. However, I'm still getting significant spikes when the expire-tokens interval runs.

I've made a new discovery:

One of my publications is also spiking every 10 minutes coinciding with the token expiry setInterval:
[screenshot: publication activity graph spiking every 10 minutes]

My user_teams publication is where I send all my custom user data to logged in users, and looks like this:

Meteor.publish("user_teams", function() {
	if (this.userId) {
		const result = Meteor.users.find(this.userId, {fields: {
			teams: 1,
			teamAdmin: 1,
			view: 1,
			set: 1,
			superAdmin: 1,
			"profile.name": 1,
			userTour: 1,
			hideHeader: 1,
			loneWorkerAccepted: 1,
			isLoneWorker: 1,
			loneWorker: 1,
			loneWorkerHistory: 1,	// keep this separate so can send all of loneWorker to other users.
			commsOptOut: 1,
			devices: 1,
			"mobile.number": 1,
			"mobile.verified": 1,
			notifications: 1,		// a list of unacknowledged notifications
			showConsole: 1,
		} });
		// console.log("meteor.publish user_teams", JSON.stringify(result.fetch()));
		return result;
	}
	return [];
});

None of these fields are being changed during the token removal updates (nothing is being changed because most of the time no old tokens are found to expire).

So why would this publication be re-running for lots of users after the token expiry query?

I'm using peerlibrary:reactive-publish by @mitar, but this publication isn't using an autorun.

@nathan-muir
Contributor

@wildhart Can you tell if that publication is using the Polling or Oplog driver?

@wildhart
Contributor Author

@nathan-muir it was using Polling, because I had deployed with mup without enabling the oplog at all.

I've since redeployed with oplog enabled. The user_teams publication now looks like this (redeployed at 16:03):
[graph: user_teams publication activity, redeployed at 16:03]

There is still a spike in overall fetched documents every 10 min:

[graph: total fetched documents, still spiking every 10 minutes]

and still a small spike in CPU:

[graph: CPU usage, smaller spikes after enabling oplog]

both coinciding with the token expiry, but much less than before. Overall CPU is down.

@nathan-muir
Contributor

nathan-muir commented Feb 27, 2019

Yeah, thought it was polling.

When using the polling driver, Meteor.users.update(selectorWithoutId, modifier) will cause every active publication[1] with a Meteor.users.find(...) to re-query all of its documents.

When using the oplog driver, it only fetches the modified documents (from the oplog) and then checks whether each doc matches any of the active publications (which can be done in Node.js for simple publications).

[1] Technically, the polling driver will re-query every instance of Meteor.users.find(...).observeChanges(), which Meteor.publish does behind the scenes.
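A toy model makes the cost difference concrete (an illustration only, not Meteor's actual implementation):

```javascript
// Toy model: work done to react to one update under each driver.
// Polling: every observer re-runs its full query against the collection.
function pollingWork(collection, observers) {
  let docsExamined = 0;
  for (const matches of observers) {
    for (const doc of collection) {
      docsExamined++;
      matches(doc); // re-evaluate every document for every observer
    }
  }
  return docsExamined; // O(observers x documents)
}

// Oplog: only the changed document is tested against each observer's query.
function oplogWork(changedDoc, observers) {
  let docsExamined = 0;
  for (const matches of observers) {
    docsExamined++;
    matches(changedDoc);
  }
  return docsExamined; // O(observers)
}
```

With 946 users and ~50 live publications, one no-op update costs ~47,300 document evaluations under polling but only ~50 under oplog, which matches the spikes described above.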

@wildhart
Contributor Author

Thanks for your help.

I'm happy to consider this issue closed, but I'll leave it to a Meteor dev to close, because I still think that, given the expire-tokens setInterval has a noticeable impact on performance (albeit quite small when oplog is enabled), an interval of every 10 minutes is excessive when the DEFAULT_LOGIN_EXPIRATION_DAYS, DEFAULT_PASSWORD_RESET_TOKEN_EXPIRATION_DAYS and DEFAULT_PASSWORD_ENROLL_TOKEN_EXPIRATION_DAYS periods are 90, 3 and 30 days respectively.

If that's unlikely to be changed then go ahead and close this issue.

@stale

stale bot commented Dec 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale-bot label Dec 10, 2019
@stale

stale bot commented Dec 18, 2019

This issue has been automatically closed because it has not had recent activity.

stale bot closed this as completed Dec 18, 2019