Make sure dequeued batches have a minimum size #495
It seems that some outputs, like the Elasticsearch one (though clearly not all of them), would benefit if rsyslog made sure it batches multiple messages instead of sending many small batches under light load. The worst scenario is many rsyslog instances on many machines hammering a small ES cluster with 1-doc bulks.
@rgerhards says it can be done in the queue engine so that we have the best of both worlds: send as fast as we can, like we do now, for some outputs, and ensure a minimum batch size (or maybe some other solution?) for outputs like omelasticsearch. This issue is more like a reminder for him :) Reference mailing list thread: http://search-devops.com/m/PamuZV4TVQ1M0AJg&subj=Re+rsyslog+Can+we+have+a+minimum+bulk+size+for+omelasticsearch+
Rough idea: add config params to set a minimum number of messages m and a timeout t. In the queue dequeue operation, iterate until the number of messages pulled, n, is at least m. If no data is present while n < m, wait on the not-empty signal, releasing the queue mutex, with timeout t. If t has expired, re-check whether the queue is empty: if so, finish the batch, else continue the iteration. We need to compute t once at the beginning of the function, so that successive iterations do not increase the overall timeout.
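To make the rough idea concrete, here is a minimal sketch in Python (not rsyslog's actual C queue code). The class `BatchQueue` and the parameters `min_size`, `max_size` and `timeout` are hypothetical names chosen for illustration; they are not existing rsyslog config parameters. The deadline is computed once on entry, so repeated waits cannot extend the overall timeout.

```python
import threading
import time
from collections import deque


class BatchQueue:
    """Toy queue illustrating a minimum batch size with a bounded wait."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()
        # Condition plays the role of the "not empty" signal; wait()
        # releases the queue mutex while blocked and re-acquires it after.
        self._not_empty = threading.Condition(self._lock)

    def enqueue(self, msg):
        with self._not_empty:
            self._items.append(msg)
            self._not_empty.notify()

    def dequeue_batch(self, min_size, max_size, timeout):
        # Hypothetical parameters: min_size is m, timeout is t (seconds).
        # Compute the deadline once, so successive iterations do not
        # increase the overall timeout.
        deadline = time.monotonic() + timeout
        batch = []
        with self._not_empty:
            while len(batch) < max_size:
                if self._items:
                    batch.append(self._items.popleft())
                    continue
                # Queue is empty: stop if the minimum has been reached.
                if len(batch) >= min_size:
                    break
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    # Timeout expired and the queue is still empty.
                    break
                # Wait for new data; the loop re-checks the queue afterwards.
                self._not_empty.wait(remaining)
        return batch
```

With `min_size=1` this behaves like "send as fast as we can": whatever is available is returned without waiting for more. Note one simplification: the sketch returns an empty batch if nothing arrives before the deadline, whereas a real worker would more likely keep waiting for the first message.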