Monstache stalling while indexing large collection holding 700K documents #509
Comments
Hi @jayminkapish, what version of Elasticsearch do you have in production? I wonder from that error message if you might be running into a problem like the one described at elastic/elasticsearch#50670. You may want to compare the results of a call to This page may also be helpful: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-settings-limit.html
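One way to check whether a collection with many unique fields is approaching the limit described on that page is to count the fields in the index mapping and compare the total with `index.mapping.total_fields.limit` (1000 by default). The sketch below is only an illustration: `count_mapped_fields` and the sample mapping are hypothetical, the input is assumed to be the `properties` section of a `GET <index>/_mapping` response, and counting multi-fields toward the total is an assumption.

```python
DEFAULT_TOTAL_FIELDS_LIMIT = 1000  # Elasticsearch default for index.mapping.total_fields.limit


def count_mapped_fields(properties):
    """Recursively count field and object mappings under a `properties` dict."""
    total = 0
    for field in properties.values():
        total += 1  # the field (or object) itself
        # Object and nested fields keep their children under "properties".
        total += count_mapped_fields(field.get("properties", {}))
        # Multi-fields (for example a keyword sub-field) live under "fields".
        total += count_mapped_fields(field.get("fields", {}))
    return total


# Hypothetical mapping, shaped like the "properties" part of a mapping response.
sample_properties = {
    "title": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "owner": {
        "properties": {
            "name": {"type": "text"},
            "age": {"type": "long"},
        }
    },
}

field_count = count_mapped_fields(sample_properties)
print(f"{field_count} mapped fields, limit {DEFAULT_TOTAL_FIELDS_LIMIT}")
```

If the count for the production index is close to the limit, new dynamically mapped fields would be rejected, which might explain repeated mapping errors during the initial sync.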
I should've provided these earlier:
Thanks for the pointers; we will look at them. I am assuming you found our TOML config fine for syncing the entire Mongo collection onto the Elasticsearch cluster.
Yes, the production data size is pretty big compared to staging, and there are many unique fields (because of the way the Mongo collection is designed). This must be adding to the dynamic mapping timeouts. We had a similar issue with mongo-connector, but adjusting the bulk size got us to the finish line. We are going to try limiting the bulk size via
I think we are just looking for ways to slow down monstache for the initial sync.
@jayminkapish I'm having the same issue. It looks like it's taking forever to sync the data. @rwynn, any opinion?
stats {
"Flushed": 280,
"Committed": 380,
"Indexed": 403,
"Created": 0,
"Updated": 0,
"Deleted": 0,
"Succeeded": 403,
"Failed": 0,
"Workers": [
{
"Queued": 0,
"LastDuration": 7000000
},
{
"Queued": 0,
"LastDuration": 5000000
},
{
"Queued": 0,
"LastDuration": 6000000
},
{
"Queued": 0,
"LastDuration": 6000000
},
{
"Queued": 0,
"LastDuration": 5000000
},
{
"Queued": 0,
"LastDuration": 5000000
},
{
"Queued": 0,
"LastDuration": 5000000
},
{
"Queued": 0,
"LastDuration": 7000000
},
{
"Queued": 0,
"LastDuration": 5000000
},
{
"Queued": 0,
"LastDuration": 4000000
}
]
}
Config:
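A stats dump like the one above can be summarized with a few lines of Python. This is a rough sketch, not part of monstache: `summarize` is a hypothetical helper, and it assumes that `LastDuration` is a Go `time.Duration` serialized as nanoseconds and that `Indexed` and `Committed` count documents and bulk commits respectively.

```python
# Values copied from the stats output in this thread.
last_durations_ns = [7000000, 5000000, 6000000, 6000000, 5000000,
                     5000000, 5000000, 7000000, 5000000, 4000000]
stats = {
    "Flushed": 280,
    "Committed": 380,
    "Indexed": 403,
    "Succeeded": 403,
    "Failed": 0,
    "Workers": [{"Queued": 0, "LastDuration": d} for d in last_durations_ns],
}


def summarize(stats):
    """Reduce a stats dump to a few numbers that hint at where the time goes."""
    workers = stats["Workers"]
    return {
        # Requests still waiting in worker queues.
        "queued": sum(w["Queued"] for w in workers),
        # Assumption: LastDuration is in nanoseconds, so divide to get ms.
        "max_last_duration_ms": max(w["LastDuration"] for w in workers) / 1_000_000,
        # Average number of documents carried by each bulk commit.
        "docs_per_commit": stats["Indexed"] / stats["Committed"],
        "failed": stats["Failed"],
    }


summary = summarize(stats)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under those assumptions, the workers have nothing queued, each bulk request finishes in a few milliseconds, and each commit carries roughly one document. That would suggest Elasticsearch is not the bottleneck in this sample and that documents are arriving at the bulk processor slowly.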
We've paused the sync since my last comment. We're hoping to resume this work in July.
@rwynn any updates on this?
@asmaaelk can you describe the error or behavior you are seeing?
Monstache has been working really well for us in the staging environment for the past couple of weeks. We were excited to see it sync 23K documents from the staging database to the staging Elasticsearch cluster very quickly (< 10m). We then moved the deployment to production 3 days ago with the same TOML configuration file as staging, except that the production collection is very large. COLLECTION SIZE: 4.97GB and TOTAL DOCUMENTS: 713458
We are looking to sync the entire Mongo collection onto the Elasticsearch cluster and then tail the oplog.
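For a rough sense of scale, the figures above can be turned into a back-of-envelope estimate. This is only a sketch: it assumes "GB" means 10**9 bytes, treats the staging run as exactly 10 minutes (it was somewhat faster), and uses the upper end of the 2-5 documents a minute described below as the stalled rate. `hours_to_sync` is a hypothetical helper.

```python
TOTAL_DOCS = 713_458
COLLECTION_BYTES = 4.97e9      # assumption: 1 GB = 10**9 bytes
STAGING_DOCS = 23_000
STAGING_SECONDS = 10 * 60      # upper bound; staging finished in under 10 minutes
STALLED_DOCS_PER_MINUTE = 5    # upper end of the observed 2-5 documents a minute


def hours_to_sync(total_docs, docs_per_second):
    """Hours needed to index total_docs at a constant rate."""
    return total_docs / docs_per_second / 3600


avg_doc_bytes = COLLECTION_BYTES / TOTAL_DOCS
staging_rate = STAGING_DOCS / STAGING_SECONDS    # documents per second
stalled_rate = STALLED_DOCS_PER_MINUTE / 60      # documents per second

print(f"average document size: {avg_doc_bytes / 1000:.1f} kB")
print(f"at the staging rate:   {hours_to_sync(TOTAL_DOCS, staging_rate):.1f} h")
print(f"at the stalled rate:   {hours_to_sync(TOTAL_DOCS, stalled_rate) / 24:.0f} days")
```

At the staging rate the whole collection would take on the order of five hours, while the stalled rate would stretch the same work to months, so the slowdown is far larger than the difference in collection size alone would explain.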
And we have the following env vars:
Monstache kicked off the sync at a higher rate, but after just a few hours it seems to be stalling and is only indexing 2-5 documents a minute. It logged "Direct reads completed" after about 6h, and the stats timer logs stats
Monstache also logged the following error about 800 times in the first few hours of production sync kick off:
We've allocated 2 CPUs and 4 GB of memory to monstache, and it is hardly using 2% of that at the moment.
Looking at the config, can you tell us what we can do to speed up the sync?
Thanks in advance.