Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't busywait forever if Twitter hangs up on us. #194

Closed
wants to merge 20 commits into from
Closed

Don't busywait forever if Twitter hangs up on us. #194

wants to merge 20 commits into from

Conversation

alin-luminoso
Copy link

Twitter will sometimes disconnect you while you're listening to a stream. (In our experience this happens because the credentials you used are being used somewhere else.) Currently that causes the stream iterator to spin forever reading zero bytes from the socket. This patch causes it to return (ending the iterator) instead.

I am not sure what should happen in the case of timeout or non-blocking mode and did not attempt to address those.

alin-luminoso and others added 20 commits December 19, 2013 19:00
Twitter is moving all of its APIs towards HTTP1.1.
https://dev.twitter.com/blog/deprecating-http-1.0-streaming-api
Userstream passed already mid-november and has been borken for this lib since
On Jan 6, Twitter will apply it as weel for all the other stream methods,
meaning they probably would be all broken.

The problem was in two pieces.
First with this commit, we stop asking for gzip encoded data for streaming
calls: for the HTPP1.0 calls, this was not taken into account anyway, but with
HTTP1.1 the answer is now chunked and should therefore not caught zipped.

(First part of fix for userstream and soon all streams cf #190 #116)
Chunked data sent via HTTP1.1 by Twitter is not as straight forward to read,
chunks indicating size of tweets are present no matter what, so we find the
json start in the buffer and forget the rest for now.

(Second part of correction for userstream and soon all streams. Fix #190 #116)
Instead of trashing it as in previous commit, return the sizes of the messages
sent by the API as regular messages if requested through the normal API
argument 'delimited=length'
https://dev.twitter.com/docs/streaming-apis/parameters#delimited

(Last part of userstream and other streams fix)
When running a stream with a low traffic, the current timeout logic was
hanging indefinitely without processing.
This rewritten logic seems to be working fine now with both http1.0 and 1.1
and active or low traffic.

@dkanygin: would love to get your code review on this (cf #178)
The existing error was never reached, luckily since it was using
undefined arguments. moved it in its right place and added handling
ssl errors as TwitterHTTPError.

(Fix #136)
Added all missing methods from https://dev.twitter.com/docs/api/1.1
Also included some of the streaming methods which work with both GET
and POST but accept arguments like "track" which can quickly require
POST.

(Closes #187 #145 #188)
Oups, the default for all other methods should be True, not False
Adapt argument calls to the intended behavior

Fixes #177 #192 #126 #10
So the calls to send media (update_with_media, update_profile_image,
update_profile_background_image and update_profile_banner) require
to send queries differently, as multipart form, and oauth needs to be
signed without taking any parameter into account.
This does the trick, following the rules here and there:
https://dev.twitter.com/docs/uploading-media
https://dev.twitter.com/docs/api/1.1/post/statuses/update_with_media
https://dev.twitter.com/discussions/1059

Closes #127
The fix #64 by @grahame was fixing the issue of sending tweets with
'~' for python 3 but not for python 2 due to urllib.urlencode's lack
of safe functions. Here's a dirty hack to do it anyway.
Any better fix welcome :)
convert non string before replacing in it...
(follows da33e70 )
Even when there is a low traffic, we know we didn't lose the
connection yet when ready_to_read indicates a signal from Twitter,
so we don't yield any timeout message and continue, but we should
at least yield None like in the non-blocking mode to allow
iterator's use to get the hand back and exit cleanly if required.

Completes 3df3413
Ping @dkanygin #178
TwitterHTTPError should not be tried to be made from SSLError, the
original behavior of just raising this is best, my bad...

(Follows 040f13e )
@sixohsix
Copy link
Collaborator

sixohsix commented Feb 4, 2014

Hi,

After merging #196 and #197 , this PR doesn't merge correctly. Do the other two PR's fix the problem described here? If not, could you resubmit this PR?

Sorry for not getting to this for such a long time, but I do want to merge your work.

@RouxRC
Copy link
Member

RouxRC commented Feb 4, 2014

I believe the question addressed here is handled by the "hangup" part that @adonoho added to his stream fix, should be checked.

@alin-luminoso
Copy link
Author

I'll check soon and get back.

FYI, the hangups can be reproduced by attaching to a Twitter stream, and then separately attaching to another stream using the same credentials. Twitter notices and kicks you off.

@RouxRC
Copy link
Member

RouxRC commented Feb 4, 2014

indeed good call :)

@alin-luminoso
Copy link
Author

Okay, I checked what it does now. If you get disconnected, the iterator will yield {'hangup': True} repeatedly (forever).

This is certainly better than entering a busy-hang, but maybe not ideal; users have to notice the hangup message and break out themselves, or their own code will probably loop forever. It might be better to exit the iterator explicitly by adding a "return" after (or insted of) https://github.com/sixohsix/twitter/blob/master/twitter/stream.py#L87 .

Thanks!

@sixohsix
Copy link
Collaborator

sixohsix commented Feb 4, 2014

Yeah, it's not great that it yields forever. @adonoho can you explain some things here?

I'm a bit out of the loop on the specifics. It looks like we now have three "unusual" yields: None, {'timeout': True} and {'hangup': True}. The dict ones are new to me.

Originally None was specified as the result for a non-blocking read timeout, so I'm not sure what the difference is between that and {'timeout': True}. These are both recoverable conditions, right?

{'hangup': True} seems to be non-recoverable. I'd suggest replacing it with:

raise StopIteration('hangup')

Alternately, yield the dict, then return right after to stop iteration.

Does that make sense?

@sixohsix
Copy link
Collaborator

sixohsix commented Feb 4, 2014

Oh, check out #198 . I can barely keep up with y'all. 😄

@adonoho
Copy link
Contributor

adonoho commented Feb 4, 2014

The {'timeout': True} yields were already in the code. I added the {'hangup': True} to align with this pre-existing mechanism. I have pushed a fix that breaks out of the loop letting the normal generator termination code execute. I have run this code since about midnight last evening to no ill effect. It has not been tested under Python v2.7.6. I pushed it up because folks appear to need it now.

Anon,
Andrew

@sixohsix
Copy link
Collaborator

sixohsix commented Feb 4, 2014

Cool. I merged #198 and released it as 1.12.1. It's out there now. I'll look at the unusual yields later.

@alin-luminoso any further work, or are we cool to close this?

@adonoho
Copy link
Contributor

adonoho commented Feb 4, 2014

The yield None occurs only during non-blocking reads that do not result in valid JSON. IMO, it is probably the right answer for the polling case. That predated my involvement with the code. In the blocking or timeout cases, we need to decide if we want to throw exceptions or these non-standard dicts. I just went with the flow and added the hangup message to the already existing timeout message. Being a new Pythonista, I'm not sure which is the preferred method to signal this situation. A timeout is clearly an exceptional situation and should signal, IMO, via a Python exception. As there isn't really an end of stream error code in the HTTP spec, an HTTP exception is not semantically the right answer. It is an expected state of the protocol. As a result, breaking the loop silently is acceptable. If we choose to use exceptions, then I would bring recv_chunk back into the object and throw a TwitterHTTPError.

@RouxRC
Copy link
Member

RouxRC commented Feb 4, 2014

Hey guys, great work again!

Regarding the timeouts, we need to ensure it also works properly on low traffic queries. When I did my first tryouts, I tried both high traffic (track="bieber gaga") and low traffic (tracking whatever unusual word you like which you can then tweet after a few minutes without any tweet to test you still get tweets).

I can't recall exactly the behavior, but I remember the existing code was actually either breaking into an infinite loop sometimes, either sending plenty of timeouts even though the connection was still up, depending on using the timeout and non blocking methods (or reciprocally).

I'll try and take a bit of time to run my battery of combined different testcases on the new version this week-end, but it seems to me like this question should be checked before taking the risk to always throw exception and break connections when there might actually just not be any coming tweet in the pipe since the past "timeout" time. If eveyrthing is handled properly this shouldn't happen, since twitter still signals the connection is up to the sock in the timeout tester part.

Again this might already be the case, just felt like sharing my 2 cents warnings since I've been dealing a bit with this over last christmas :)

@alin-luminoso
Copy link
Author

#198 fixes the problem this patch was originally addressing. I don't know anything about the behavior on timeout @RouxRC was referring to, which might need a new issue. But this one can certainly be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants