Add proxy_user, proxy_password and move IOError (EOFError) to its own rescue clause #59

Closed
wants to merge 2 commits into from

guyboertje

I also removed the integration tests. Twitter has now blocked the Publisher App, which makes them unreliable and of little value.

Pere had a PR to add the proxy auth (#33); I will close it.

A contributor submitted a PR a long time ago (#46) that changed the patch code to the twitter gem, but I feel that it introduces a loop that can't check for stop.

IMHO the streaming part of this plugin needs to be rewritten, because I don't think that we, via the twitter gem, handle the semantics of the streaming API very well, especially not through a proxy.

Either we rewrite or we remove proxy support.

@guyboertje
Author

I tried mitmproxy (hit some HTTPS limitation in full vs relative URLs after CONNECT), squid (nothing happens that I can tell, logs EOFError with retry) and OWASP ZAP (more traction, but only GatewayTimeout and TooManyRequests errors from Twitter). Note that with ZAP I could connect to Twitter from a browser after setting up the CA cert and proxy info in Firefox. This was on an MBP (time to get a Linux laptop, I think).

@guyboertje
Author

guyboertje commented Oct 17, 2018

About the rewrite. I found that Twitter have a Java based Streaming Client with proxy support...

https://github.com/twitter/hbc/blob/master/hbc-example/src/main/java/com/twitter/hbc/example/FilterStreamExample.java

and an unofficial one that has seen more recent upkeep...

https://github.com/Twitter4J/Twitter4J/blob/master/twitter4j-examples/src/main/java/twitter4j/examples/stream/PrintFilterStream.java

@guyboertje guyboertje requested a review from jsvd October 17, 2018 10:31
@@ -98,11 +99,24 @@ class LogStash::Inputs::Twitter < LogStash::Inputs::Base
# Port where the proxy is listening, by default 3128 (squid)
config :proxy_port, :validate => :number, :default => 3128

# Username where the proxy is listening, by default 3128 (squid)
config :proxy_port, :validate => :number, :default => 3128
Contributor

duplicate config item here

# Duration in seconds to wait before retrying a connection when twitter responds with a 429 TooManyRequests
# In some cases the 'x-rate-limit-reset' header is not set in the response and <error>.rate_limit.reset_in
# is nil. If this occurs then we use the integer specified here. The default is 5 minutes.
config :rate_limit_reset_in, :validate => :number, :default => 300


# Duration in seconds to wait before retrying when Twitter client socket raises a EOF on network interruption
config :steam_eof_wait_duration, :validate => :number, :default => 30
Contributor

should we just implement an exponential backoff with no need for more configuration items?

Author

Now, if only we had an exponential backoff widget.
I will add the backoff.
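
For reference, a minimal sketch of what such a backoff could look like. The `Backoff` class, its `base` and `max` values and its method names are hypothetical and not part of this plugin or of Stud; only `Stud.stoppable_sleep` and `stop?` come from the diff in this PR.

```ruby
# Hypothetical helper (not part of the plugin): exponential backoff with a cap.
class Backoff
  attr_reader :attempts

  def initialize(base: 1, max: 300)
    @base = base      # first delay in seconds (assumed value)
    @max = max        # upper bound in seconds (assumed value)
    @attempts = 0
  end

  # Returns the next delay: base, 2*base, 4*base, ... capped at max.
  def next_delay
    delay = [@base * (2**@attempts), @max].min
    @attempts += 1
    delay
  end

  # Call after a successful read so the next failure starts from base again.
  def reset
    @attempts = 0
  end
end

# Usage sketch inside the rescue clause, assuming a `backoff` instance exists:
#   rescue IOError => e
#     Stud.stoppable_sleep(backoff.next_delay) { stop? }
#     retry
```

With something like this the wait grows on repeated EOFs without a new config item, and the sleep stays interruptible by `stop?`.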

@@ -145,6 +159,10 @@ def run(queue)
@logger.warn("Twitter too many requests error, sleeping for #{sleep_for}s")
Stud.stoppable_sleep(sleep_for) { stop? }
retry
rescue IOError => e
@logger.info("Twitter error: #{e.message}, retry in #{steam_eof_wait_duration}s")
Contributor

maybe @logger.error here?

Suggested change
- @logger.info("Twitter error: #{e.message}, retry in #{steam_eof_wait_duration}s")
+ @logger.error("Socket error, retrying in #{steam_eof_wait_duration}s", :error => e.message)

Author

I chose INFO because it is not an error per se. When the user sees the ERROR log line they panic as if there is something that they need to do/fix. The message can be more INFO and not ERROR in its wording.

Contributor

maybe a good compromise with warn? :D Since the occurrence of this event means that there's no tweet ingestion, I think we should do > info

proxy_address: @proxy_address,
proxy_port: @proxy_port,
}
uri = URI.parse('')
Contributor

it seems the twitter client docs suggest you could just place the host and port into the hash, as we were doing before. What is the need for creating the URI object here?

Author

I misread the part in the patch that uses the http gem's Request object. I thought it was using the typhoeus gem, which uses a URI.

Still, now that I'm reading the http gem code I'm more confused.

@guyboertje
Author

@jsvd
After reading through the http gem code I figured that we were not interacting with the proxy correctly. I changed this:

            if request.using_proxy?
              request.connect_using_proxy(socket)
            else
              socket = ssl_stream(socket)
              request.stream(socket)
            end

I used wireshark and owasp zap to check the before/after behaviour.
After:

  • with wireshark capturing on the loopback adapter, I see the "CONNECT" conversation between LS and the proxy.
  • with wireshark capturing on the wifi adapter, I see SSL conversations with twitter incl plenty of data (not readable) and keep alives.
  • I see no output in zap
  • I see no tweet events, only "Twitter error: End of file reached" periodically.

Before:

  • with wireshark capturing on the loopback adapter, I see the "POST" to https://stream.twitter.com:443/1.1/statuses/filter.json.
  • with wireshark capturing on the wifi adapter, I see no conversations with twitter.
  • I see no output in zap
  • I see no tweet events, only "Twitter error: End of file reached" periodically.
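
For context, a rough sketch of the "CONNECT" conversation seen in the After capture: the client asks the proxy for a tunnel in plain text, and only after a 2xx reply does the TLS handshake with the target start over the same socket. The helper names below are hypothetical and are not taken from the http gem or the twitter gem; this only illustrates the standard HTTP CONNECT exchange.

```ruby
# Hypothetical helper: builds the request a client sends to an HTTP proxy
# to open a tunnel to host:port. Credentials, if given, are sent as
# Basic auth in a Proxy-Authorization header.
def connect_request(host, port, user: nil, password: nil)
  lines = ["CONNECT #{host}:#{port} HTTP/1.1", "Host: #{host}:#{port}"]
  if user
    # pack("m0") is Base64 without line breaks
    credentials = ["#{user}:#{password}"].pack("m0")
    lines << "Proxy-Authorization: Basic #{credentials}"
  end
  lines.join("\r\n") + "\r\n\r\n"
end

# Hypothetical helper: true when the proxy's status line reports a 2xx,
# which means the tunnel is open and TLS can start on the same socket.
def tunnel_established?(status_line)
  !!(status_line =~ %r{\AHTTP/\d\.\d 2\d\d(\s|\z)})
end

puts connect_request("stream.twitter.com", 443)
```

This would match the captures above: the CONNECT exchange is visible on the loopback adapter (LS to proxy), while only encrypted traffic is visible on the wifi adapter (proxy to Twitter).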

I can't say whether this plugin actually works with the types of proxies used in the enterprise.

I would like to park/close this PR until I have some time to experiment with the Java libraries mentioned earlier.

@guyboertje guyboertje closed this Oct 19, 2018