New mongo package options to optimize Oplog tailing performance to include/exclude certain collections #13009
… exclude certain collections
I think the most important part will be to add tests. At least:
- Configure both and see if it fails.
- Configure inclusion and check if non-included ones are not processed.
- Configure exclusion and check if non-excluded ones are processed.
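The three cases above can be sketched as a small filter. This is only an illustration under stated assumptions: `makeOplogCollectionFilter` and its `include`/`exclude` parameters are hypothetical names, not the PR's actual implementation.

```javascript
// Hypothetical helper (not the PR's code): decides whether oplog entries
// for a given collection should be processed.
function makeOplogCollectionFilter({ include = [], exclude = [] } = {}) {
  // Case 1: configuring both lists is treated as a configuration error.
  if (include.length > 0 && exclude.length > 0) {
    throw new Error('Configure either an include list or an exclude list, not both.');
  }
  const included = new Set(include);
  const excluded = new Set(exclude);
  return function shouldProcess(collectionName) {
    // Case 2: with an include list, only listed collections are processed.
    if (included.size > 0) return included.has(collectionName);
    // Case 3: with an exclude list, everything except the listed ones is processed.
    return !excluded.has(collectionName);
  };
}

const onlyOrders = makeOplogCollectionFilter({ include: ['orders'] });
const allButMetrics = makeOplogCollectionFilter({ exclude: ['metrics'] });
console.log(onlyOrders('orders'), onlyOrders('metrics'));       // true false
console.log(allButMetrics('orders'), allButMetrics('metrics')); // true false
```

A test suite along these lines would assert on each of the three branches separately.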
Why can the configuration be set from settings.json and not the code?
// This is pseudo code so don't judge
import { mongo } from 'meteor/mongo'
mongo.oplogExcludeCollections = [];
mongo.oplogWatchCollections = [];
I've noticed this trend in Meteor lately and I believe it's getting out of hand. The settings file is getting bloated with configurations; it's supposed to hold ENV variables/secrets. 🤷‍♂️ cc @zodern |
@harryadel The settings file is kinda "always loaded". Also I tried to follow the logic that already exists for the mongo package - it already uses the Meteor.settings for other package options. |
@Twisterking I understand why you went this path. I'm merely discussing the idea not trying to put you down :)
But isn't that what
Yeah, this trend has been going on for a while so naturally you continued down this trail. I just don't think this path is the best IMO. |
@harryadel understood, no worries! No harm done! I will write some tests as suggested and will also create a PR for the meteor docs so this new setting is documented. Any other feedback? As I said I would love to get this PR ready for merging asap! 👍 |
@Twisterking I'm for leaving it in the settings. Doing it in the code would be problematic, as it'd need to execute before the connection is established and that's deep. @harryadel The |
@radekmie I am very happy to do that! The issue is, I don't really have any experience with
process.env.MONGO_OPLOG_URL && Tinytest.addAsync(
  'mongo-livedata - oplog - options.oplogExcludeCollections',
  async test => {
    if (!Meteor.settings.packages) Meteor.settings.packages = {};
    if (!Meteor.settings.packages.mongo) Meteor.settings.packages.mongo = {};
    Meteor.settings.packages.mongo.oplogExcludeCollections = ['foobar'];
    console.log('oplogExcludeCollections', Meteor.settings.packages.mongo);
    test.equal(true, true);
  }
);
This of course works, but it is "too late": the oplog file itself is already loaded by then. How do I define a Meteor setting in a test "early enough"? |
The only easy option I see would be to change the |
I am sorry, but I am not familiar enough with the meteor core to do this based on this comment! :/ I will need some more detailed help please! Maybe @zodern can help? |
This will be very helpful for some apps. I'm not very familiar with this code, but I have a couple questions.
To give an example in case my description isn't clear. Let's say the database that is being observed is updated every 10 minutes on average, and there is another database for metrics that has hundreds of updates per second. Whenever Meteor restarts tailing the oplog (which would happen almost every 30 seconds since there are rarely oplog entries for the observed database), it would query for all of the relevant docs newer than the last oplog entry it received. Since the last oplog entry Meteor received might have been up to 10 minutes ago, there could be a large number of other oplog entries mongo would have to filter.
This would probably involve OplogHandle tracking the last ts it's seen: meteor/packages/mongo/oplog_tailing.js Line 256 in c26569d
and then stopping and recreating the oplog tail handle as needed. I'm not sure if there would be a performance issue with listing too many databases or collections.
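As a rough sketch of that idea, the selector used when the tail is recreated could combine the last seen `ts` with the collection lists. Everything here is an assumption for illustration: `buildOplogSelector`, its parameters, and the use of a plain number for `lastTs` (real oplog entries carry a BSON Timestamp) are hypothetical and do not reflect the code in oplog_tailing.js.

```javascript
// Escape a string so it can be embedded literally in a regular expression.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical sketch: build the query selector for (re)starting an oplog tail.
// `lastTs` is the `ts` of the last oplog entry seen, if any.
function buildOplogSelector({ dbName, lastTs, include = [], exclude = [] }) {
  const toNs = name => `${dbName}.${name}`; // oplog namespaces are "<db>.<collection>"
  const selector = {};
  // Only ask for entries newer than the last one already handled.
  if (lastTs !== undefined) selector.ts = { $gt: lastTs };
  if (include.length > 0) {
    // Watch only the listed collections.
    selector.ns = { $in: include.map(toNs) };
  } else {
    // Watch the whole database, minus any excluded collections.
    selector.ns = { $regex: `^${escapeRegExp(dbName)}\\.` };
    if (exclude.length > 0) selector.ns.$nin = exclude.map(toNs);
  }
  return selector;
}

console.log(buildOplogSelector({ dbName: 'app', lastTs: 5, exclude: ['metrics'] }));
```

Because the selector is rebuilt from the stored `ts`, stopping and recreating the tail handle with a changed collection list would not need to rescan entries that were already processed.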
@zodern ad 1) good point indeed, to be honest I don't know. Another idea ad 3): I could also at least implement a simple check whenever a Collection is Is there anyone else who is willing to contribute here? I might be biased of course but I honestly believe this change can help with some Meteor scaling issues a lot for some projects out there! Really any help is very much appreciated! ❤️ |
Okay, so I finally managed to make the tests work! 🎉 Please have a look! |
For 1), it might be worth mentioning in the docs that it would be inefficient if the watched collections are rarely updated, and other collections have a very large number of updates.
On the discussion of where to place the configuration: in terms of developer experience, is it possible to do this in the definition of the collection (which is also where you normally add the schema)? Then, when a developer creates a new collection, they can enable or disable oplog tailing for that collection in the same file. If the setting lives apart from the collection's creation, most developers will forget to disable/enable oplog tailing. This also has the benefit that it can be added to a snippet/template for Collection files. Just an idea, and of course only if it is even possible [addendum] |
@zodern: Thank you, I will have a look into
EDIT: @zodern: That was really helpful, thank you! I have now implemented it so that in these cases Meteor indeed falls back to long polling! (see here) |
…d by the Oplog. We now also fall back to long polling in these cases.
It may be a visibly long operation, but it should not be an issue - oplog collection is usually already in memory, so it'll be really fast to grep through.
It could, sure. I'd rather make it in a separate step to get this PR out sooner. |
@radekmie thanks for your feedback! PR updated! |
- fixed some tests in collection_tests.js which were not able to be re-run, because of already existing Mongo collections (collection names were missing a "+ test.id")
@radekmie one more update done! Thanks again for your feedback! |
Hello everyone! Thanks for approving the PR @radekmie ! |
I would be for Meteor 2.16 with #12436 asap. |
You can already use this without waiting for a new release since the changes are in the package and not in the core |
How? The version of the package was not released yet, no? Sorry, I don't fully understand how this works! Also, I forgot to increase the version number of the What exactly do I need to do to use this version of the |
https://guide.meteor.com/writing-atmosphere-packages#local-packages |
That won't work here! I tried to install my package as a local package. The issue is that all the other packages still used the "default mongo package" behind the scenes, not my "custom" local package. I couldn't get it to work when I tried it. |
Short of releasing this package, the things that can be tested are:
If you use meteor from local checkout with this branch that should do the trick if local package override is failing. |
Any updates on when the release of 2.16 will happen? Hopefully incl. this PR please! ✌️ |
Updates? Is there a planned release date for 2.16 yet? We really need this change asap ... |
No estimated date yet. First, we're focused on the 3.x RC. Meanwhile, we aim to merge candidates and make quick gains into the 2.16 branch. The PR has been updated to provide more details about the plan. We'll announce a date as soon as we have one. |
Hello all,
Following several conversations on the drawbacks of Meteor's oplog implementation and its limitations when it comes to scaling (see: my forums post, a slightly older blog post and a 2 year old github discussion), I dug into the core a bit and propose a simple solution which, I hope, should work for many users who have issues with this.
Please consider our situation at Orderlion: We have several collections where huge amounts of data are updated on a daily basis. We have not a single publication on any of these collections, so we don't need any reactivity for them.
Due to Meteor's Oplog implementation, even in these cases a big batch insert/update of data often results in very large, sustained CPU spikes on the Meteor app server.
Internal tests I did with my suggested implementation have shown that the CPU load caused by Oplog tailing of collections with big batch updates can be reduced by about 100x by completely ignoring those collections.
During an import of about 600k documents into a test collection, I could reduce the CPU load from an average of 25-30% down to approx. 0.2-0.3%.
The usage is actually very straightforward: Consider a settings.json file like this:
Please feel free to comment but I really hope this can be merged soon. It would help us at Orderlion a lot and solve many unwanted server restarts (server is auto-restarting because it becomes unresponsive due to the sustained high CPU load).
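For reference, a settings.json using the exclusion option might look like the sketch below. The key path `packages.mongo.oplogExcludeCollections` matches the `Meteor.settings.packages.mongo.oplogExcludeCollections` access used in the test snippet earlier in this thread; the collection names are placeholders, not Orderlion's real collections.

```json
{
  "packages": {
    "mongo": {
      "oplogExcludeCollections": ["products", "prices"]
    }
  }
}
```

An include list is the alternative to an exclude list; as the review comments note, configuring both at once is expected to fail.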
Cheers, Patrick