Add proxy_user, proxy_password and move IOError (EOFError) to its own rescue clause #59

guyboertje · 2018-10-17T09:35:57Z

I also removed the integration tests. Twitter has now blocked the Publisher App, which makes them unreliable and of little value.

Pere had a PR to add the proxy auth (#33); I will close it.

A contributor submitted a PR a long time ago (#46) that changed the patch code to the twitter gem, but I feel that it introduces a loop that can't check for stop.

IMHO the streaming part of this plugin needs to be rewritten. I don't think that we, via the twitter gem, are handling the semantics of the streaming API very well, especially not when going through a proxy.

Either we rewrite or we remove proxy support.

Fixes logstash-plugins#58 Fixes logstash-plugins#39

guyboertje · 2018-10-17T09:46:34Z

I tried mitmproxy (hit some HTTPS limitation around full vs relative URLs after CONNECT), squid (nothing happens that I can tell, logs EOFError with retry) and OWASP ZAP (more traction, but only GatewayTimeout and TooManyRequests errors from Twitter). Note that with ZAP I could connect to Twitter from a browser after setting up the CA cert and proxy info in Firefox. This was on a MBP (time to get a Linux laptop, I think).

guyboertje · 2018-10-17T09:47:47Z

About the rewrite. I found that Twitter have a Java based Streaming Client with proxy support...

https://github.com/twitter/hbc/blob/master/hbc-example/src/main/java/com/twitter/hbc/example/FilterStreamExample.java

and an unofficial one that has seen more recent upkeep...

https://github.com/Twitter4J/Twitter4J/blob/master/twitter4j-examples/src/main/java/twitter4j/examples/stream/PrintFilterStream.java

jsvd · 2018-10-17T10:32:27Z

lib/logstash/inputs/twitter.rb

@@ -98,11 +99,24 @@ class LogStash::Inputs::Twitter < LogStash::Inputs::Base
  # Port where the proxy is listening, by default 3128 (squid)
  config :proxy_port, :validate => :number, :default => 3128

+  # Username where the proxy is listening, by default 3128 (squid)
+  config :proxy_port, :validate => :number, :default => 3128


duplicate config item here

jsvd · 2018-10-17T10:33:15Z

lib/logstash/inputs/twitter.rb

  # Duration in seconds to wait before retrying a connection when twitter responds with a 429 TooManyRequests
  # In some cases the 'x-rate-limit-reset' header is not set in the response and <error>.rate_limit.reset_in
  # is nil. If this occurs then we use the integer specified here. The default is 5 minutes.
  config :rate_limit_reset_in, :validate => :number, :default => 300

+
+  # Duration in seconds to wait before retrying when Twitter client socket raises a EOF on network interruption
+  config :steam_eof_wait_duration, :validate => :number, :default => 30


Should we just implement an exponential backoff, with no need for more configuration items?

Now, if only we had an exponential backoff widget.
I will add the backoff.
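
For illustration, a capped exponential backoff could look something like the sketch below. `ExponentialBackoff`, `next_interval` and `reset` are hypothetical names made up for this example, not something that exists in the plugin or in Stud, and the base and cap values are arbitrary.

```ruby
# Hypothetical helper, not part of the plugin: capped exponential backoff.
class ExponentialBackoff
  def initialize(base: 1, cap: 300)
    @base = base      # first wait, in seconds
    @cap = cap        # upper bound on any single wait
    @attempt = 0
  end

  # Returns the next wait in seconds: base, 2*base, 4*base, ... up to cap.
  def next_interval
    interval = [@base * (2**@attempt), @cap].min
    @attempt += 1
    interval
  end

  # Call after a successful connection so the next failure starts small again.
  def reset
    @attempt = 0
  end
end

backoff = ExponentialBackoff.new(base: 1, cap: 30)
waits = Array.new(7) { backoff.next_interval }
# waits == [1, 2, 4, 8, 16, 30, 30]
```

In the rescue clause the returned interval would be passed to `Stud.stoppable_sleep(interval) { stop? }` in place of a configured duration, so the wait still checks for stop.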

jsvd · 2018-10-17T10:34:30Z

lib/logstash/inputs/twitter.rb

@@ -145,6 +159,10 @@ def run(queue)
      @logger.warn("Twitter too many requests error, sleeping for #{sleep_for}s")
      Stud.stoppable_sleep(sleep_for) { stop? }
      retry
+    rescue IOError => e
+      @logger.info("Twitter error: #{e.message}, retry in #{steam_eof_wait_duration}s")


maybe @logger.error here?

Suggested change

-      @logger.info("Twitter error: #{e.message}, retry in #{steam_eof_wait_duration}s")
+      @logger.error("Socket error, retrying in #{steam_eof_wait_duration}s", :error => e.message)

I chose INFO because it is not an error per se. When the user sees the ERROR log line they panic as if there is something that they need to do/fix. The message can be more INFO and not ERROR in its wording.

Maybe warn is a good compromise? :D Since the occurrence of this event means that there is no tweet ingestion, I think we should log at a level above info.
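
As a small standalone sketch of the idea in this thread: `EOFError` is a subclass of `IOError` in Ruby, so a single `rescue IOError` clause catches both. The example uses the stdlib `Logger` rather than the Logstash logger, simulates the dropped socket with a plain `raise`, and logs at warn. The message wording is only an assumption about what the compromise could look like.

```ruby
require "logger"
require "stringio"

# Capture log output in memory so the example has no side effects.
log_output = StringIO.new
logger = Logger.new(log_output)

attempts = 0
begin
  attempts += 1
  # Simulate the socket dropping on the first two attempts.
  raise EOFError, "end of file reached" if attempts < 3
  result = :connected
rescue IOError => e
  # EOFError is a subclass of IOError, so it lands here too.
  logger.warn("Twitter stream interrupted: #{e.message}, retrying")
  retry
end
```

A real run loop would sleep between attempts and check `stop?` before retrying; that is left out here to keep the example short.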

jsvd · 2018-10-17T10:49:24Z

lib/logstash/inputs/twitter.rb

-        proxy_address: @proxy_address,
-        proxy_port: @proxy_port,
-      }
+      uri = URI.parse('')


It seems the twitter client docs suggest you can just place the host and port into the hash, as we were doing before. What is the need for creating the URI object here?

I misread the part in the patch that uses the http gem's Request object. I thought it was using the typhoeus gem, which uses a URI.

Still, now that I'm reading the http gem code I'm more confused.
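
To make the "plain hash" option concrete, here is a minimal sketch of building the proxy settings without a URI object. `proxy_settings` is a hypothetical helper. The `proxy_address` and `proxy_port` keys are taken from the lines removed in this diff, while `proxy_username` and `proxy_password` are assumed key names that would need checking against the http gem version the patch targets.

```ruby
# Hypothetical helper: build the proxy settings as a plain hash.
# :proxy_username and :proxy_password are assumed key names.
def proxy_settings(address:, port:, user: nil, password: nil)
  settings = { proxy_address: address, proxy_port: port }
  # Only include credentials when both are present.
  if user && password
    settings[:proxy_username] = user
    settings[:proxy_password] = password
  end
  settings
end

anonymous = proxy_settings(address: "127.0.0.1", port: 3128)
authenticated = proxy_settings(address: "127.0.0.1", port: 3128, user: "u", password: "p")
```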

guyboertje · 2018-10-18T09:47:44Z

@jsvd
After reading through the http gem code I figured that we were not interacting with the proxy correctly. I changed this:

            if request.using_proxy?
              request.connect_using_proxy(socket)
            else
              socket = ssl_stream(socket)
              request.stream(socket)
            end

I used wireshark and owasp zap to check the before/after behaviour.
After:

With Wireshark capturing on the loopback adapter, I see the "CONNECT" conversation between LS and the proxy.
With Wireshark capturing on the wifi adapter, I see SSL conversations with Twitter, including plenty of data (not readable) and keep-alives.
I see no output in ZAP.
I see no tweet events, only "Twitter error: End of file reached" periodically.

Before:

With Wireshark capturing on the loopback adapter, I see the "POST" to https://stream.twitter.com:443/1.1/statuses/filter.json.
With Wireshark capturing on the wifi adapter, I see no conversations with Twitter.
I see no output in ZAP.
I see no tweet events, only "Twitter error: End of file reached" periodically.
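
For reference, the "CONNECT" conversation in the After case starts with a request roughly like the one built below. This is a sketch of the generic HTTP CONNECT form with optional Basic proxy credentials, not the http gem's actual implementation. `connect_request` is a hypothetical helper and the credentials are placeholders.

```ruby
# Build the raw CONNECT request a client sends to an HTTP proxy before starting TLS.
def connect_request(host, port, user: nil, password: nil)
  lines = ["CONNECT #{host}:#{port} HTTP/1.1", "Host: #{host}:#{port}"]
  if user && password
    # Basic credentials are base64("user:password"); pack("m0") encodes without line breaks.
    token = ["#{user}:#{password}"].pack("m0")
    lines << "Proxy-Authorization: Basic #{token}"
  end
  # A blank line terminates the request headers.
  lines.join("\r\n") + "\r\n\r\n"
end

request = connect_request("stream.twitter.com", 443, user: "user", password: "pass")
```

Once the proxy answers with a 2xx status, the client starts TLS on the same socket and sends the POST inside the tunnel. That would be consistent with the POST being visible on loopback only in the Before case.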

I can't say whether this plugin actually works with the types of proxies used in the enterprise.

I would like to park/close this PR until I have some time to experiment with the Java libraries mentioned earlier.

Guy Boertje added 2 commits October 15, 2018 11:38

remove integration specs as Twitter has blocked the test accounts

158b129

Add proxy user and password. Add rescue clause for EOFError and IOError

2278c20

Fixes logstash-plugins#58 Fixes logstash-plugins#39

guyboertje requested a review from jsvd October 17, 2018 10:31

jsvd reviewed Oct 17, 2018

guyboertje closed this Oct 19, 2018
