I have a problem with the logstash-input-http #76

Closed
Ccxlp opened this issue Dec 18, 2017 · 27 comments
Ccxlp commented Dec 18, 2017

I am using the latest plugin version, 3.0.8, with Logstash. POST requests sent from a Java HttpClient are filtered through Logstash and written to Elasticsearch, but the requests from HttpClient enter Logstash at a very slow rate: about 5/s. Capturing the traffic with Fiddler shows the following:
GotRequestHeaders: 20:44:57.472
ClientDoneRequest: 20:44:57.472
Determine Gateway: 0ms
DNS Lookup: 0ms
TCP/IP Connect: 0ms
HTTPS Handshake: 0ms
ServerConnected: 20:44:56.272
FiddlerBeginRequest: 20:44:57.472
ServerGotRequest: 20:44:57.472
ServerBeginResponse: 20:44:57.475
GotResponseHeaders: 20:44:57.475
ServerDoneResponse: 20:44:57.675
ClientBeginResponse: 20:44:57.675
ClientDoneResponse: 20:44:57.675
Overall Elapsed: 00:00:00.2030116
Why does assembling the response take about 200 ms? At first I thought it was an HttpClient problem, but when I sent messages to a Tomcat server on the same intranet, the rate was 500/s (single thread). Next, I used a curl script to send messages to the HTTP input on the server where Logstash runs, and that also reached about 300/s. Note that the HttpClient, Logstash, and Tomcat services are all on the same intranet.
logstash.yml:
pipeline.output.workers: 8
pipeline.workers: 10
pipeline.batch.size: 500
Can anyone explain why this happens?
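
For reference, a minimal pipeline of the shape described above might look like this (a sketch only; the port and Elasticsearch host are placeholders, not the exact production config):

input {
  http {
    port => 10001
    additional_codecs => { "application/json" => "json" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # placeholder host
  }
}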

jsvd (Member) commented Dec 18, 2017

Can you provide your configuration?

I've done a basic test to see the speed of 3.0.8 in logstash 6.1:

/tmp/logstash-6.1.0 % bin/logstash-plugin list --verbose http
logstash-input-http (3.0.8)

On the producer side:

/tmp/logstash-6.1.0 % bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://localhost:8080' http_method => post } }" --path.data data2

On the consumer side running a couple of minutes:

/tmp/logstash-6.1.0 % bin/logstash -e "input { http {port => 8080} }" | pv -War > /dev/null
[326KiB/s] [ 204KiB/s]

204k events per second on a local machine.

Ccxlp (Author) commented Dec 19, 2017

@jsvd I'm sorry, I might not have described it clearly. I set up an ELK cluster with X-Pack installed to monitor some key metrics. When I found that data was coming in very slowly, I ran a separate test for logstash-input-http with this configuration file:
input {
  http {
    host => "0.0.0.0"
    port => 10001
    additional_codecs => { "application/json" => "json" }
    threads => 10
  }
}
output {
  stdout {
    codec => dots
  }
}
In addition, the Java HttpClient that sends the HTTP requests runs on Windows. When I noticed the slow speed, I pointed it at a local Tomcat server instead, which gave a satisfactory 500 events/s. To test the throughput of logstash-input-http, I also tried sending HTTP requests with curl at the same time:
function fun() {
  for (( i = 0; i < 300000; i++ )); do
    line="http://172.16.0.50:10001/ear/erererer"
    # capture the HTTP status code returned for each POST
    isok=$(curl -s -X POST --data '{"clientHost":"172.16.0.50","requestTime":"11/Aug/2017:17:21:05 +0800","method":"post","page":"/openlayers3/index3","protocol":"HTTP/1.1","stateCode":"200","bytes":"3245"}' -w '%{http_code}' "$line" -o /dev/null)
  done
}
# run 5 senders in parallel
for (( j = 0; j < 5; j++ )); do
  fun > /dev/null &
done
Throughput is about 1000 events/s. That is not satisfactory, but it may also be limited by how fast curl can send. So, inspired by you, I also tried generator + http output to produce data and http input + stdout { codec => dots } to consume it; the result is far from the numbers you gave.
Thank you very much for your reply. I look forward to hearing back from you!

Ccxlp (Author) commented Dec 19, 2017

@jsvd I downloaded Logstash 6.1.0 and copied your example for testing (base configuration, but with X-Pack installed):
On the producer side:
bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://localhost:8080' http_method => post } }" --path.data data2
On the consumer side running a couple of minutes:
bin/logstash -e "input { http {port => 8080} } " > /dev/null
Viewing throughput in Kibana shows about 100 events/s.
At the same time, some HTTP connection timeout errors show up in the Logstash log:
[2017-12-19T17:02:53,307][ERROR][logstash.outputs.http ] [HTTP Output Failure] Could not fetch URL {:url=>"http://localhost:8080", :method=>:post, :body=>"{"@timestamp":"2017-12-19T09:02:22.640Z","message":"Hello world!","@Version":"1","host":"site50","sequence":3707}", :headers=>{"Content-Type"=>"application/json"}, :message=>"Read timed out", :class=>"Manticore::SocketTimeout", :backtrace=>nil, :will_retry=>true}
I really don't know where this gap comes from. It's frustrating.

jsvd (Member) commented Dec 19, 2017

This is really weird indeed. So that we can test the exact same thing, can you install/use the pv tool like I did in my test?

producer:

bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://127.0.0.1:8080' http_method => post } }" --path.data data2

consumer:

bin/logstash -e "input { http {port => 8080} }" | pv -War > /dev/null

Ccxlp (Author) commented Dec 19, 2017

@jsvd I used the pv tool (the version I have is probably different from yours, and my pv doesn't have the 'a' parameter):
producer:
bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://127.0.0.1:8080' http_method => post } }" --path.data data2
consumer:
bin/logstash -e "input { http {port => 8080} }" | pv -Wr > /dev/null
Stable rate: [64.8KB/s]

Ccxlp (Author) commented Dec 19, 2017

@jsvd @ph @colinsurprenant @elasticdog @karmi During the tests, I also noticed several things:

  1. I used version 5.6 before; when I tried to run the above example in 5.6, Logstash started and generated data for about two seconds before reporting an OOM:
    Error: Your application used more memory than the safety cap of 4G.
    Specify -J-Xmx####m to increase it (#### = cap size in MB).
    Specify -w for full OutOfMemoryError stack trace
  2. Increasing the number of threads on logstash-input-http increases the processing speed proportionally.

After many tries, I believe the slowness is in logstash-input-http itself: if I increase the Java client's send rate (to about 500 events/s) while the HTTP input uses the default configuration, it returns 429; and when the input data is produced by the generator, SocketTimeout errors keep being thrown.

I am very sorry for the @ mentions. This problem has been bothering me for many days. I hope to get your help, and thanks again to jsvd for the replies!

jsvd (Member) commented Dec 19, 2017

Please don't ping people directly, especially since I'm already working on this with you.

After some testing I'm also seeing very low throughput; my earlier numbers were misleading because I was not measuring the dots codec output. I am investigating this and will report back when I know more.

jsvd (Member) commented Dec 19, 2017

@Ccxlp can you try starting the http input receiver bound to an IP address instead of a hostname and check the speed?
So instead of:
bin/logstash -e "input { http {port => 8080 } } output { stdout { codec => dots } }" | pv -Wr > /dev/null

Try:
bin/logstash -e "input { http {host => '127.0.0.1' port => 8080 } } output { stdout { codec => dots } }" | pv -Wr > /dev/null

Ccxlp (Author) commented Dec 19, 2017

@jsvd I just gave it a try:
producer:
bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://127.0.0.1:8080' http_method => post } }" --path.data data2
consumer:
bin/logstash -e "input { http {host => '127.0.0.1' port => 8080 } } output { stdout { codec => dots } }" | pv -Wr > /dev/null
[100B/s]
What results are you seeing on your side?

jsvd (Member) commented Dec 19, 2017

I'm seeing slow speeds as well when testing with Logstash as the event producer. However, using a tool like ab or siege I see normal speeds:

siege --content-type "application/json" 'http://127.0.0.1:8080 POST {"ids": ["1","2","3"]}'          
** SIEGE 4.0.4
** Preparing 25 concurrent users for battle.
The server is now under siege...^C
Lifting the server siege...
Transactions:		       89082 hits
Availability:		      100.00 %
Elapsed time:		        8.42 secs
Data transferred:	        0.17 MB
Response time:		        0.00 secs
Transaction rate:	    10579.81 trans/sec
Throughput:		        0.02 MB/sec
Concurrency:		        3.78
Successful transactions:       89082
Failed transactions:	           0
Longest transaction:	        0.02
Shortest transaction:	        0.00
ab -p data.json -n 100000 -T application/json -m POST -k "http://127.0.0.1:8080/"  
This is ApacheBench, Version 2.3 <$Revision: 1796539 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /
Document Length:        2 bytes

Concurrency Level:      1
Time taken for tests:   11.720 seconds
Complete requests:      100000
Failed requests:        0
Keep-Alive requests:    100000
Total transferred:      9000000 bytes
Total body sent:        17400000
HTML transferred:       200000 bytes
Requests per second:    8532.58 [#/sec] (mean)
Time per request:       0.117 [ms] (mean)
Time per request:       0.117 [ms] (mean, across all concurrent requests)
Transfer rate:          749.93 [Kbytes/sec] received
                        1449.87 kb/s sent
                        2199.80 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    0   0.1      0       8
Waiting:        0    0   0.1      0       8
Total:          0    0   0.1      0       8

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%      0
 100%      8 (longest request)

siege test: 10579.81 trans/sec
ab test: 8532.58 [#/sec] (mean)

So around 8.5-10k eps

Ccxlp (Author) commented Dec 20, 2017

@jsvd Thank you for your answer. However, between logstash-input-http and the Java HttpClient the interaction is so slow that processing a single message takes about 200 ms, and from the capture tool (Fiddler) you can see that the http input is slow to respond. What do you suggest?
Fiddler capture of the HTTP request:
GotRequestHeaders: 20:44:57.472
ClientDoneRequest: 20:44:57.472
Determine Gateway: 0ms
DNS Lookup: 0ms
TCP/IP Connect: 0ms
HTTPS Handshake: 0ms
ServerConnected: 20:44:56.272
FiddlerBeginRequest: 20:44:57.472
ServerGotRequest: 20:44:57.472
ServerBeginResponse: 20:44:57.475
GotResponseHeaders: 20:44:57.475
ServerDoneResponse: 20:44:57.675
ClientBeginResponse: 20:44:57.675
ClientDoneResponse: 20:44:57.675
Overall Elapsed: 00:00:00.2030116

Ccxlp (Author) commented Dec 20, 2017

@jsvd
Did you do anything else when setting up your Logstash environment?
I used another virtual machine, downloaded version 6.1.0, installed jdk1.8.0_74 without configuring JAVA_HOME (instead I specified JAVACMD in the logstash.lib.sh file), kept the default configuration, and used the ab tool to repeat your procedure, but my speed is still only about 25 B/s.
Am I missing something?
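
For reference, the environment-variable route is another way to point Logstash at a JDK, rather than setting JAVACMD in logstash.lib.sh (a sketch; the install path is a placeholder):

export JAVA_HOME=/opt/jdk1.8.0_74      # placeholder path to the JDK install
export PATH="$JAVA_HOME/bin:$PATH"
bin/logstash -e "input { http { port => 8080 } } output { stdout { codec => dots } }"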

varun0002 commented Mar 27, 2018

@Ccxlp jsvd: Were you able to find any solution for this?

troymevosus commented:

I'm also experiencing this issue. Any updates?

jsvd (Member) commented Apr 26, 2018

Can you try version 3.1.0, which replaces Puma with Netty?
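
In case it helps, one way to check the installed version and move to 3.1.0 with the plugin CLI (a sketch; run from the Logstash install directory):

bin/logstash-plugin list --verbose logstash-input-http            # shows the currently installed version
bin/logstash-plugin update logstash-input-http                    # pull the latest published version
bin/logstash-plugin install --version 3.1.0 logstash-input-http   # or pin the version explicitly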

pkoretic commented:

@jsvd 3.1.0 has issues with both ab and siege:
ab stalls after the first few requests, and siege uses up all the ports and fails. There are no issues like that with 3.0.10. Any idea?

jsvd (Member) commented Apr 26, 2018

Interesting, it seems to be an issue with how ab does HTTP 1.0.
What is the problem with siege?

jsvd (Member) commented Apr 26, 2018

I tried siege locally and it worked:

% siege  http://localhost:8888
** SIEGE 4.0.4
** Preparing 10 concurrent users for battle.
The server is now under siege...^C
Lifting the server siege...
Transactions:		       73917 hits
Availability:		      100.00 %
Elapsed time:		       16.64 secs
Data transferred:	        0.14 MB
Response time:		        0.00 secs
Transaction rate:	     4442.13 trans/sec
Throughput:		        0.01 MB/sec
Concurrency:		        9.86
Successful transactions:       73917
Failed transactions:	           0
Longest transaction:	        0.27
Shortest transaction:	        0.00

pkoretic commented:

I guess 16 seconds is not enough. Try it with --time=2M against both the new plugin and the old one; the new one will fail, and the transaction rate is also lower in my testing. With or without keepalive I get:
Cannot assign requested address

and a bunch of connections stuck in the TIME_WAIT state.

jsvd (Member) commented Apr 26, 2018

my results with 3.1.0:

% siege  http://localhost:8888 --time=2M
** SIEGE 4.0.4
** Preparing 10 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions:		     1297582 hits
Availability:		      100.00 %
Elapsed time:		      119.91 secs
Data transferred:	        2.47 MB
Response time:		        0.00 secs
Transaction rate:	    10821.30 trans/sec
Throughput:		        0.02 MB/sec
Concurrency:		        9.69
Successful transactions:     1297582
Failed transactions:	           0
Longest transaction:	        0.19
Shortest transaction:	        0.00

and with version 3.0.10:

siege  http://localhost:8889 --time=2M
** SIEGE 4.0.4
** Preparing 10 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions:		     1054378 hits
Availability:		      100.00 %
Elapsed time:		      119.70 secs
Data transferred:	        2.04 MB
Response time:		        0.00 secs
Transaction rate:	     8808.50 trans/sec
Throughput:		        0.02 MB/sec
Concurrency:		        4.76
Successful transactions:     1053448
Failed transactions:	           0
Longest transaction:	        0.28
Shortest transaction:	        0.00

jsvd (Member) commented Apr 26, 2018

This is strange indeed, a curl works correctly, siege works too, but ab fails:

% curl --http1.0 -v localhost:8888/ 
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8888 (#0)
> GET / HTTP/1.0
> Host: localhost:8888
> User-Agent: curl/7.54.0
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Content-Length: 2
< Content-Type: text/plain
< 
* Closing connection 0
ok
% ab -n 100 -c 1 -s 5 http://127.0.0.1:8888/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)...apr_pollset_poll: The timeout specified has expired (70007)

% siege  http://localhost:8888 --time=2M
** SIEGE 4.0.4
** Preparing 10 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions:		     1297582 hits
Availability:		      100.00 %
Elapsed time:		      119.91 secs
Data transferred:	        2.47 MB
Response time:		        0.00 secs
Transaction rate:	    10821.30 trans/sec
Throughput:		        0.02 MB/sec
Concurrency:		        9.69
Successful transactions:     1297582
Failed transactions:	           0
Longest transaction:	        0.19
Shortest transaction:	        0.00

pkoretic commented May 3, 2018

Just some information: I have been investigating this plugin for a number of weeks now, and the performance is lower than expected, even with the new 3.1.0 version. I'm not sure what machine you have, but even your results are not exceptional.
With just this Logstash http input and an Elasticsearch output, no filters, I was not able to get more than 10k trans/sec while playing with memory, batch size and worker count on my machine (i7 7700, 8GB RAM, Intel SSD), and Logstash was using all the CPU it could get.

Replacing the http input with the generator input I was able to get 2-3x more documents, and benchmarking Elasticsearch with Rally showed the same thing: Elasticsearch can process far more documents than this plugin can provide. Testing with the TCP input confirmed this again.

In the end we went on to write a custom Node.js HTTP (and later HTTP/2) service using the Elasticsearch bulk API, and throughput went up 3-4x on the same system with the identical data set, while also doing some light processing on the fly (parsing, modification, stringification).

I looked into implementing HTTP/2 support for this plugin, since Netty does support it, but this is beyond my scope and expertise, and I'm not sure how much of a boost it would provide here.

jsvd (Member) commented May 3, 2018

thanks for the feedback @pkoretic

Replacing the http input with the generator input I was able to get 2-3x more documents

This is certainly expected: the generator performs no networking calls and no waiting; it's just a tight loop allocating Logstash events and pushing them to the queue.

benchmarking Elasticsearch with Rally showed the same thing: Elasticsearch can process far more documents than this plugin can provide.

I'm not sure I understand the comparison here. The http input doesn't provide events on its own; it just accepts HTTP requests and pushes them onto the Logstash queue to be processed by filters + outputs.

In the end we went on to write a custom Node.js HTTP (and later HTTP/2) service using the Elasticsearch bulk API, and throughput went up 3-4x on the same system with the identical data set, while also doing some light processing on the fly (parsing, modification, stringification).

Again, I don't think I understand the comparison; if you're writing data directly to Elasticsearch then the http input is not necessary.

Is your architecture http endpoint -> processing -> elasticsearch? For that use case you have ingest nodes in Elasticsearch.

Also, the benchmarks you're performing for this plugin are based on each request carrying a single event.
You can send a JSON document that is an array of objects, so the http input will produce one event per object.
You can see the performance boost of bulk vs. single item by running two Logstash instances:

The one receiving data and showing event count per second:

bin/logstash -e "input { http {} } output { stdout { codec => dots } }" | pv -War > /dev/null

Another instance generating events as fast as possible and pushing entire batches of events in a single request:

bin/logstash -e "input { generator {} } output { http { url => 'http://localhost:8080' format => json_batch http_method => post } }"

On my 13" MBP from 2018 I see about 75-85k e/s.
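
For a quick manual check of the same batching behaviour, a single request whose body is a JSON array should come out of the http input as one event per element (a sketch; the endpoint and payload are placeholders):

curl -s -X POST -H 'Content-Type: application/json' \
     --data '[{"message":"one"},{"message":"two"},{"message":"three"}]' \
     http://127.0.0.1:8080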

pkoretic commented May 4, 2018

Sorry if I haven't explained this properly but we are not trying to do anything special.
And we did test with batches of events in one request, which, as expected, performs much better. But again, this http input is very slow, so batching multiple events only confirms and works around that, and it also pushes batching logic onto all the clients.

We were trying to use the http input plugin for the obvious reason that we can reuse our Logstash setup, with the http input being just one input provider, while keeping all the filter and indexing logic that Logstash provides. The http input is, as expected, slower, but it should not be this much slower, which we easily confirmed by comparing the generator, tcp, and http inputs with stdout and Elasticsearch outputs, and then by writing a Node.js service that mimics this plugin's behavior.

On an m5.2xlarge AWS instance we had trouble getting over 30k e/s with this plugin. We use and develop a lot of HTTP services, so we noticed quickly that this one performs poorly. I would have to recheck, but I also remember Graylog with the HTTP GELF input, which is somewhat equivalent to this, performing as expected.

Another issue was that this plugin returns a busy response if the number of clients is higher than the number of threads (try with siege -c 10), which is not a proper reactor pattern; that should be solved by using Netty in 3.1.0.

jsvd (Member) commented May 4, 2018

Another issue was that this plugin returns a busy response if the number of clients is higher than the number of threads (try with siege -c 10), which is not a proper reactor pattern; that should be solved by using Netty in 3.1.0.

The previous (< 3.1.0) implementation supports more clients than threads, but it responds immediately with 429 if those threads are busy. The Netty implementation adds a queue to buffer bursts of traffic and micro delays.
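
To see that from the client side, one illustrative probe is to fire more concurrent POSTs than the input has threads and look at the status codes (a sketch against a local input; the port and payload are placeholders):

for i in $(seq 1 50); do
  curl -s -o /dev/null -w "%{http_code}\n" -H 'Content-Type: application/json' \
       -X POST --data '{"probe":true}' http://127.0.0.1:8080 &
done
wait
# with the pre-3.1.0 input, busy threads show up as immediate 429s; 3.1.0 queues the burst instead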

but it should not be this much slower, which we easily confirmed by comparing the generator, tcp, and http inputs with stdout and Elasticsearch outputs, and then by writing a Node.js service that mimics this plugin's behavior.

Building a dedicated service that performs the same task as the http input will always be much faster, as Logstash is a general-purpose event pipeline with a lot of built-in flexibility at the cost of performance.

Can you talk a bit more about the load you're putting on the http input? Is it high volume per connection, or many connections, or both?

As for this plugin being slow: after the Netty rewrite and PR #85 get merged, we'll need Java-based codecs to get faster. That said, the plugin is at this point very light in what it does; it's a standard Netty HTTP processing pipeline: requests are read from the socket, transformed into HttpRequest objects, the content is decoded using the Ruby codecs (the costliest part) and then pushed onto the Logstash queue.

Another option here is to support HTTP pipelining; we can use elastic/elasticsearch#8299 as inspiration.

pkoretic commented May 4, 2018

It's mostly a lot of connections, but there are spikes where there is also a large volume of events per connection; an event is generally under 100 characters.
We have tried the plain codec with a filter to extract the data, and that gave some improvement, but not as much as we would have hoped; I tried msgpack as well, but that was abandoned for other reasons. I haven't tried the plain codec with 3.1.0 though.

In the end I personally think HTTP/2 would also be a good direction, given that it has proper multiplexing support and binary data transfer, which would replace the pipelining and compressed-request efforts for HTTP/1.x.
It should be noticeably faster and lighter for both data ingestion and processing speed, and in our service the difference between HTTP/1.x and HTTP/2 is more than visible.
But I do understand this could be a big task to implement, given that Elasticsearch itself doesn't support it yet even though it also uses Netty, so I guess there is a reason for that: elastic/elasticsearch#10981.
Thanks for all the work on the plugin!

jsvd (Member) commented May 14, 2018

Thank you all for the great feedback. Since this issue has now diverged from the initial topic and the internal implementation has changed I'm going to close this. If you see a performance regression or any odd behaviour please open a new issue.

jsvd closed this as completed May 14, 2018