meta collection alternative #44
Comments
When you customize routing, you cannot do a get without the routing info. Do you need deletes in MongoDB to propagate to ES? If not, you don't need meta. Inserts and updates are fine because your JS sets the routing. On a delete, all we have from MongoDB is the document _id, which doesn't give us the routing.
What about an ids search? It's probably not as fast as a get, but it shouldn't be much slower.
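To make the ids search suggestion concrete, here is a minimal sketch in Python of the request body and of reading the routing value back from a hit. It assumes the hit carries a `_routing` field when the document was indexed with custom routing; the helper names and the sample response are hypothetical and only illustrate the shape, not how monstache does it.

```python
from typing import Any, Dict, Optional


def build_ids_query(doc_id: str) -> Dict[str, Any]:
    """Build a search body that looks a document up by _id without routing."""
    return {
        "size": 1,
        "_source": False,  # only the hit metadata is needed
        "query": {"ids": {"values": [doc_id]}},
    }


def routing_from_response(response: Dict[str, Any]) -> Optional[str]:
    """Return the routing value of the first hit, or None if absent."""
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    return hits[0].get("_routing")


# Hypothetical response, trimmed to the fields used here.
sample_response = {
    "hits": {"hits": [{"_index": "docs", "_id": "abc", "_routing": "tenant-42"}]}
}

print(build_ids_query("abc"))
print(routing_from_response(sample_response))
```

Because a search without routing fans out to every shard, this trades the meta collection's storage for a slightly more expensive lookup on each delete.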
Why do you think meta is an issue? Do you see errors?
Are the meta upserts to MongoDB bulked? I'm not sure that's the bottleneck, but syncing is very slow. I have yet to fully sync a db of 10 million docs without it breaking. I think the fix you did last night helped: it was able to sync 4 million, whereas before it could only do 2 million. The meta collection also takes a lot of space in the db.
Got you. Definitely could be bulked. But is the indexing count going up slowly?
Ya, I think it gets slower as the count goes higher. It's at 2.7 million right now and I started syncing about 6 hrs ago.
Can you try with direct-read-limit set really high? That means fewer queries. Read up on the direct-* options. Also, I noticed from your comment yesterday about the error that the direct read query failed with a timeout. The query actually sorts the entire collection by _id, then seeks to the offset and applies the limit. That is why I suggest a really high limit. The default is 5000, I think. For 10 million docs that's still 2000 queries, and as the offset gets higher each query has to seek past more documents, so it gets slower.
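The arithmetic behind that suggestion can be sketched as follows. This is a simplified cost model in Python, not a measurement: it assumes each skip/limit query walks past every document before its offset, and the function name is made up for illustration.

```python
import math
from typing import Tuple


def direct_read_cost(total_docs: int, limit: int) -> Tuple[int, int]:
    """Return (number of queries, total documents skipped) for skip/limit paging."""
    queries = math.ceil(total_docs / limit)
    # Query k (0-based) has to seek past k * limit documents before its page starts.
    skipped = sum(k * limit for k in range(queries))
    return queries, skipped


# Figures from this thread: 10 million docs with a limit of 5000.
print(direct_read_cost(10_000_000, 5_000))    # (2000, 9995000000)
# The same collection with a much higher limit.
print(direct_read_cost(10_000_000, 500_000))  # (20, 95000000)
```

Under this model, raising the limit by a factor of 100 cuts the skipped-document work by roughly the same factor, which is why a high limit helps even though each page is larger.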
Also, did you up the thread pool on the ES side? And have you considered setting the refresh interval to -1?
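For the refresh interval part, a minimal sketch of the index settings bodies involved, written as Python dicts. These would be sent to the index `_settings` endpoint with whatever client is in use; the helper name is hypothetical, and restoring to `1s` assumes the index used the default interval before the bulk load.

```python
import json
from typing import Any, Dict


def refresh_settings(interval: str) -> Dict[str, Any]:
    """Build an index settings body that sets the refresh interval."""
    return {"index": {"refresh_interval": interval}}


# Disable periodic refresh while the initial sync runs...
disable_refresh = refresh_settings("-1")
# ...and restore it once the sync has finished (assumed previous value).
restore_refresh = refresh_settings("1s")

print(json.dumps(disable_refresh))
print(json.dumps(restore_refresh))
```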
Are you using skip?
Filtering with $gt on the last seen _id should fix this.
Yes, skip is used. And $gt is a good idea. I wonder if $gt would work if someone used a strange _id like an object? The query would be like { _id: { $gt: { x: 1 } } }. I'd have to try it because it needs to work in the general case. I guess if we're sorting by _id, it must work for any value of _id.
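The difference between the two paging styles can be sketched without a database. The Python below simulates a collection as a sorted list of integer ids, which is an assumption made for illustration; against MongoDB the range variant would correspond to filtering on `{ _id: { $gt: <last seen _id> } }`, sorting by _id and applying the limit. Both function names are hypothetical.

```python
from bisect import bisect_right
from typing import Iterator, List, Optional


def skip_pages(sorted_ids: List[int], limit: int) -> Iterator[List[int]]:
    """Page with an ever-growing offset, as skip/limit does."""
    offset = 0
    while True:
        page = sorted_ids[offset:offset + limit]
        if not page:
            return
        yield page
        offset += limit


def range_pages(sorted_ids: List[int], limit: int) -> Iterator[List[int]]:
    """Page by seeking directly past the last id seen, as $gt on _id does."""
    last_id: Optional[int] = None
    while True:
        # An index lookup finds the start of the next page; no offset is counted.
        start = 0 if last_id is None else bisect_right(sorted_ids, last_id)
        page = sorted_ids[start:start + limit]
        if not page:
            return
        yield page
        last_id = page[-1]


ids = list(range(0, 100, 3))
print(list(range_pages(ids, 7)) == list(skip_pages(ids, 7)))
```

Both generators return the same pages; the range version only needs the last id of the previous page, so its per-query cost does not grow with how far the sync has progressed.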
I think using the range selector instead of skip is a huge performance gain! I'll fix it and publish a new release on Monday. Thanks for your help!
@benan789 give it another try with the latest release when you get a chance. I'm seeing collections with millions of documents getting synced pretty quickly now.
Much better! Thank you for fixing it so fast!
Does the meta collection serve any purpose other than getting the routing info? If not, wouldn't it be better to just query Elasticsearch for the routing info?