I have a problem with the logstash-input-http #76
Can you provide your configuration? I've done a basic test to see the speed of 3.0.8 in logstash 6.1:
On the producer side:
On the consumer side running a couple of minutes:
204k events per second on a local machine.
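The producer and consumer commands referenced above did not survive in this page. A hypothetical reconstruction of that kind of pv-based throughput test (port, payload, and pipeline are assumptions, not the original commands) might look like:

```shell
# Consumer: a minimal Logstash pipeline; the dots codec writes one byte
# per event, so pv's rate readout is effectively events per second.
bin/logstash -e 'input { http { port => 8080 } } output { stdout { codec => dots } }' \
  | pv -r > /dev/null

# Producer: POST one small JSON event per request in a tight loop;
# pv -l counts lines, i.e. requests per second.
while true; do
  curl -s -XPOST http://127.0.0.1:8080 -d '{"message":"hello"}'
  echo
done | pv -l > /dev/null
```
These commands assume a running Logstash with the http input bound to port 8080; they are a sketch of the measurement technique, not the exact test run here.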
@jsvd I'm sorry, I might not have described it clearly. I set up an ELK cluster; X-Pack is installed to monitor some key metrics. When I found that data ingestion was very slow, I tried a separate test of logstash-http-input with this configuration file:
@jsvd I downloaded a 6.1.0 version of Logstash and then copied your example for testing (base configuration, but with X-Pack installed):
This is really weird indeed. So that we can test exactly the same thing, can you install/use the pv tool like I did in my test? Producer:
consumer:
@jsvd Using the pv tool (the version I use is probably different from yours, and my pv doesn't have the -a flag):
@jsvd @ph @colinsurprenant @elasticdog @karmi During the test, I also found several phenomena:
After trying many times, I believe the cause of the slow speed is in logstash-http-input itself. If I increase the Java client's send rate (to about 500 events/s) while the http input uses the default configuration, it returns 429. And when the input data is produced by the generator, SocketTimeout keeps being thrown. I am very sorry for the direct @-mentions; I have been bothered by this question for many days. I hope to get your help, and thanks again for the reply, @jsvd!
Please don't ping people directly, especially since I'm working on this with you. After some testing I'm also seeing a very low throughput; my earlier numbers were misleading because I was not measuring the dots codec output. I am investigating this and will report back when I know more.
@Ccxlp can you try starting the http input receiver with an IP instead of a hostname and check the speed? Try:
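The suggested configuration was stripped from this page; something along these lines would exercise it (port and output are assumptions; the point is the explicit IP in `host`):

```
input {
  http {
    host => "127.0.0.1"   # explicit IP rather than a hostname such as "localhost"
    port => 8080
  }
}
output {
  stdout { codec => dots }
}
```
Binding to an IP skips any hostname resolution on the listener side, which is what this test is meant to isolate.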
@jsvd I just gave it a try:
I'm seeing slow speeds as well when testing with Logstash as an event producer. However, using a tool like siege gives much better numbers:
siege test: 10579.81 trans/sec. So around 8.5-10k eps.
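The exact siege invocation was not preserved; a hypothetical benchmark along these lines (URL, payload, concurrency, and duration are assumptions) produces a trans/sec figure like the one quoted:

```shell
# siege: 25 concurrent clients POSTing a small JSON body for 1 minute.
siege -c 25 -t 1M 'http://127.0.0.1:8080 POST {"message":"hello"}'

# ab equivalent: 100000 requests, 25 concurrent, body read from a file.
echo '{"message":"hello"}' > payload.json
ab -n 100000 -c 25 -p payload.json -T application/json http://127.0.0.1:8080/
```
Both assume the http input is listening on port 8080; siege reports "Transaction rate" and ab reports "Requests per second", which map to events/s when each request carries one event.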
@jsvd Thank you for your answer. However, between logstash-http-input and the Java HttpClient, the interaction is so slow that processing a single message takes 200ms, and by capturing the traffic (with Fiddler) you can see that logstash-http-input is slow to respond. What do you suggest?
@jsvd
@Ccxlp jsvd: Were you able to find any solution for this? |
I'm also experiencing this issue. Any updates? |
Can you try version 3.1.0 which replaces puma with netty? |
@jsvd 3.1.0 has issues with both ab and siege
Interesting, this seems to be an issue with how ab does HTTP 1.0.
I tried siege locally and it worked:
I guess 16 seconds is not enough; try it with --time=2M on both the new plugin and the old one, and the new one will fail. The transaction rate is also lower in my testing, with a bunch of connections stuck in the TIME_WAIT state.
My results with 3.1.0:
and with version 3.0.10:
This is strange indeed, a curl works correctly, siege works too, but ab fails:
Just some information, as I have been investigating this plugin for a number of weeks now and the performance is lower than expected, even with the new 3.1.0 version. I'm not sure what machine you have, but even from your results this is not exceptional. Replacing the http input with generator I was able to get 2-3x more documents, and by benchmarking Elasticsearch with Rally I was seeing the same: Elasticsearch can process far more documents than this plugin can provide. Testing the same with the tcp input confirms this again. In the end we continued by writing a custom Node.js HTTP and then HTTP/2 service using the Elasticsearch bulk API, and throughput went up 3-4x on the same system with the same identical data set, while also doing some processing on the fly (parsing, modification, stringification). I looked into implementing HTTP/2 support for this plugin, as netty does support it, but this is beyond my scope and expertise and I'm not sure how much of a boost it would provide here.
Thanks for the feedback @pkoretic.
This is certainly expected: generator performs no networking calls and no waiting; it's just a tight loop allocating Logstash events and pushing them to the queue.
I'm not sure I understand the comparison here. The http input doesn't provide events; it just accepts HTTP requests and pushes them to the Logstash queue to be processed by filters + outputs.
Again, I don't think I understand the comparison: if you're writing data to Elasticsearch directly, then the http input is not necessary. Is your architecture …? Also, the benchmarks you're performing for this plugin are based on each request carrying a single event. The one receiving data and showing event count per second:
Another instance generating events as fast as possible and pushing entire batches of events in a single request:
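Neither configuration survived in this page. A plausible sketch of the pair, as two separate config files (ports and the `json_batch` format are assumptions), is:

```
# --- receiver.conf (instance 1): show event count per second ---
input  { http { port => 8080 } }
output { stdout { codec => dots } }

# --- sender.conf (instance 2): generate events as fast as possible
# and ship whole batches of events in a single HTTP request ---
input  { generator {} }
output {
  http {
    url         => "http://127.0.0.1:8080"
    http_method => "post"
    format      => "json_batch"   # one request per batch, not per event
  }
}
```
The batching on the sender side is what distinguishes this test from the one-event-per-request benchmarks discussed earlier in the thread.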
On my 13" MBP from 2018 I see about 75-85k e/s.
Sorry if I haven't explained this properly, but we are not trying to do anything special. We were trying to use the http input plugin for the obvious reason that we can reuse our Logstash, with the http input being just one input provider, while keeping all the filter and index logic that Logstash provides. The http input is, as expected, slower, but it should not be this much slower, which we easily confirmed by comparing the generator, tcp, and http inputs with the stdout and elasticsearch outputs, and then by writing a Node.js service that mimics this plugin's behavior. On an m5.2xlarge AWS instance we had trouble getting over 30k e/s with this plugin. We use and develop a lot of HTTP services, so we easily noticed that this one performs poorly. I would have to recheck, but I also remember Graylog with the HTTP GELF input, which is somewhat equivalent to this, performing as expected. Another issue was that this plugin returns busy if you use a number of clients higher than the number of threads (try siege -c 10), which is not a proper reactor pattern; that should be solved by using netty and 3.1.0.
The previous (< 3.1.0) implementation supports multiple clients above the number of threads, but it responds immediately with 429 if those threads are busy. The netty implementation adds a queue to buffer bursts of traffic and micro delays.
Building a service that performs the same task as the http input will always be much faster, as Logstash is a general-purpose event pipeline with a lot of built-in flexibility at the cost of performance. Can you talk a bit more about the load you're putting on the http input? Is it high volume per connection, or many connections, or both? As for this plugin being slow: after the netty rewrite, and once PR #85 gets merged, we'll need Java-based codecs to make it faster. That said, the plugin is at this point very light in what it does. It's a standard netty HTTP processing pipeline: requests are read from the socket, transformed into HttpRequest objects, the content is decoded using the Ruby codecs (the costliest part), and then pushed to the Logstash queue. Another option here is to support HTTP pipelining; we can use elastic/elasticsearch#8299 as inspiration.
It's mostly a lot of connections, but there are spikes with a big volume of events per connection; an event is generally under 100 characters. In the end I personally think HTTP/2 would also be a good direction, given that it has proper multiplexing and binary data transfer, which would replace the pipelining and compressed-request efforts for HTTP/1.x.
Thank you all for the great feedback. Since this issue has now diverged from the initial topic and the internal implementation has changed I'm going to close this. If you see a performance regression or any odd behaviour please open a new issue. |
I am using the latest version, 3.0.8, with Logstash. POST requests sent from the Java client HttpClient are filtered through Logstash and placed into ES. The problem is that requests sent by HttpClient enter Logstash at a very slow rate: 5/s. Captured with Fiddler, the timing information is as follows:
GotRequestHeaders: 20:44:57.472
ClientDoneRequest: 20:44:57.472
Determine Gateway: 0ms
DNS Lookup: 0ms
TCP/IP Connect: 0ms
HTTPS Handshake: 0ms
ServerConnected: 20:44:56.272
FiddlerBeginRequest: 20:44:57.472
ServerGotRequest: 20:44:57.472
ServerBeginResponse: 20:44:57.475
GotResponseHeaders: 20:44:57.475
ServerDoneResponse: 20:44:57.675
ClientBeginResponse: 20:44:57.675
ClientDoneResponse: 20:44:57.675
Overall Elapsed: 00:00:00.2030116
Assembling the response headers cost 200ms? At first I thought it was an HttpClient problem, but when I sent messages to a Tomcat server inside the same intranet, the speed was 500/s (single thread). Next, I used a curl script to send messages to the http input on the server where Logstash is located, and it also reached about 300/s. Note that the HttpClient, Logstash, and Tomcat services are all on the same intranet.
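The curl script itself was not preserved; a single-threaded loop along these lines (port, payload, and iteration count are assumptions) reproduces the kind of test described:

```shell
# Hypothetical single-threaded curl test against the http input.
# Timing 1000 sequential requests gives a rough events-per-second rate.
time for i in $(seq 1 1000); do
  curl -s -XPOST http://127.0.0.1:8080 \
       -H 'Content-Type: application/json' \
       -d '{"message":"hello"}' > /dev/null
done
```
A rate of ~300/s from this loop versus ~5/s from HttpClient would point at client-side behavior (e.g. connection reuse or Expect: 100-continue handling) rather than raw server throughput.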
logstash.yml:
pipeline.output.workers: 8
pipeline.workers: 10
pipeline.batch.size: 500
Who can help me explain why this happens?