Bulk insertion timeout #25
@miklosbarabas,
Ran 2 processes like that, and it looked like when I killed one the other took over and was still able to connect and send bulk requests. There is currently a hard-coded 10s timeout on the HTTP client given to us for Elasticsearch: https://github.com/rwynn/monstache/blob/master/monstache.go#L1101. I'm not sure if that's the problem or not. Do you also see this issue on a simple local install of Elasticsearch?
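For context, a hard-coded client timeout of that kind looks roughly like this. This is a minimal sketch against the olivere/elastic client library monstache uses, not the actual monstache code; the URL and version are illustrative:

```go
package main

import (
	"net/http"
	"time"

	"gopkg.in/olivere/elastic.v3"
)

func main() {
	// A 10-second cap on every request, including waiting for response
	// headers; a slow bulk response surfaces as exactly the
	// "Client.Timeout exceeded while awaiting headers" error reported above.
	httpClient := &http.Client{Timeout: 10 * time.Second}

	client, err := elastic.NewClient(
		elastic.SetURL("http://elastichost:9200"), // host illustrative
		elastic.SetHttpClient(httpClient),
	)
	if err != nil {
		panic(err)
	}
	_ = client
}
```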
Just published a new version with a configurable timeout. It now defaults to 60 seconds.
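In the TOML config, that knob should look something like the following. The option name is taken from the current monstache documentation, so verify it against your version; the value here is just an example:

```toml
# seconds before a request to Elasticsearch is timed out (defaults to 60)
elasticsearch-client-timeout = 120
```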
Sorry for the late response, and thank you for your fast reaction and for the new release! Some additions to the scenario:
Also, the error was happening during the direct-read-namespaces stage (new oplog entries weren't being synced at that point), and it somehow left the ES cluster stuck: no errors in the logs and the REST endpoint seemingly up, though apps could no longer connect on the transport protocol (9300). The metrics showed normal resource consumption and the cluster was in a green state, but it was not really fully usable... Finally, after rereading the monstache documentation (:+1:), I found the thread link.
So far the above-mentioned scenario is working properly. I could even roll out your new version without any issue, with the other hosts' workers picking up the job!
@rwynn I have some other questions, hope you don't mind if I put them down here as well:
Many thanks!
@miklosbarabas,
I pushed a new release which should fix the ts field in the monstache collection. The problem was that it was recording timestamps from direct reads, which it should not. Now the ts should only progress forward and come from the oplog data only. I also added some documentation about retry; hopefully that will explain it. Have you tried running only the direct reads via a cron job, instead of in the process with clustering and workers? Direct reads are good for catching up periodically: the long-running clustered processes can tail the oplog without doing direct reads, while another single, non-clustered process runs periodically to do the direct reads and exit.
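Concretely, that split might look like the sketch below. Option names are taken from the monstache documentation; the cluster name and namespace are illustrative:

```toml
# tailer.toml -- long-running clustered processes: tail the oplog only
cluster-name = "mycluster"

# catchup.toml -- run periodically (e.g. from cron) as a single,
# non-clustered process: do the direct reads and exit
direct-read-namespaces = ["mydb.mycoll"]
exit-after-direct-reads = true
```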
@rwynn It is happening after around the same number of documents indexed, which is ~3 million, and only for this collection, which holds ~6 million. I tried running only one monstache, in non-worker and non-HA mode, just to direct-read the data, set to exit when it's done. I even tried to increase the gtm values like this:
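(A hypothetical example of such gtm tuning in monstache's TOML config; the values below are made up, and the [gtm-settings] table is as described in the monstache docs:)

```toml
[gtm-settings]
# size of the channels gtm uses to buffer events (illustrative value)
channel-size = 1024
# how many documents to buffer during direct reads before flushing
buffer-size = 64
buffer-duration = "500ms"
```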
Following the trace file, it looks like it really is inserting mostly 1-3 index operations per request, but there are times when one or two larger (1000-document) bulks appear.
Also, after enabling it, the application stops after a while, so the direct read finished its job; but comparing the indexed document count in ES with the count of the collection in MongoDB, it still seems to be missing ~1 million docs.
The oplog-time is not always available for direct reads. If the _id of the document is an ObjectID then it can be determined, because ObjectIDs embed their creation time. If not, the timestamp will just be 0.
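For context, this is the ObjectID property being relied on; a minimal sketch using the mgo-era bson package:

```go
package main

import (
	"fmt"

	"gopkg.in/mgo.v2/bson"
)

func main() {
	// An ObjectId packs its creation time into its leading 4 bytes, which
	// is why a timestamp can be recovered during a direct read when, and
	// only when, the document's _id is an ObjectId.
	id := bson.NewObjectId()
	fmt.Println(id.Time()) // creation time decoded from the id itself
}
```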
My current guess is that the slowdown may be occurring because of the nature of the direct read query. It uses a $natural sort, which might not be optimal. I can try changing this to an _id sort to see if it then uses the index to speed things up.
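To make the difference concrete, here is a sketch of the two query shapes using the mgo driver that gtm was built on at the time. This is not the actual gtm code, and the host and collection names are illustrative:

```go
package main

import (
	"fmt"

	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

func main() {
	session, err := mgo.Dial("localhost") // host illustrative
	if err != nil {
		panic(err)
	}
	defer session.Close()
	coll := session.DB("mydb").C("mycoll") // names illustrative

	// A $natural sort walks documents in on-disk order and cannot use
	// an index.
	slow := coll.Find(nil).Sort("$natural").Iter()
	_ = slow

	// An _id sort can walk the default _id index instead, which is the
	// change being proposed for the direct-read query.
	var doc bson.M
	iter := coll.Find(nil).Sort("_id").Iter()
	for iter.Next(&doc) {
		fmt.Println(doc["_id"])
	}
}
```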
I could give it a try straight away if you release one based on the _id sort!
@miklosbarabas, just pushed a new release. Give that one a try and let me know if the problem persists. Thanks for reporting this issue.
The new release includes an updated version of the github.com/rwynn/gtm library. That is where the changes related to the slowness issue were made.
@miklosbarabas, any luck with the latest release?
@rwynn: with the new release, the missing ~1 million docs got inserted properly, and I didn't experience the slowdown.
Thanks for all the help!!!
Glad to hear it helped. You might be able to squeeze out a little more performance with the workers option and multiple processes. That will increase the overall query load on MongoDB by a multiple of N workers, but it will decrease the Elasticsearch indexing load on each process by (N-1)/N. Depending on how well Mongo can handle the queries, it might give you an overall boost.
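A sketch of that setup, with the config option and flag names per the monstache documentation and the worker names being arbitrary labels:

```toml
# workers.toml -- shared by every process
workers = ["tweedledee", "tweedledum"]

# Then start one process per worker name (shell), e.g.:
#   monstache -f workers.toml -worker tweedledee
#   monstache -f workers.toml -worker tweedledum
```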
Hi!
I am running monstache with HA mode on, which goes fine for the first few minutes, but then:
ERROR 2017/05/22 13:59:53 elastic: bulk processor "monstache" failed but will retry: Post http://elastichost:9200/_bulk: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
It seems like it cannot reconnect to the cluster. The Elasticsearch log only shows that it was updating, but no errors.
Any idea why this could happen? Is there any timeout in monstache that can be set against the Elasticsearch cluster (like the MongoDB timeout options)?
Thanks for the help in advance!
M