Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix streams timeout & hangup behavior + ensure python2.6 compat #203

Merged
merged 8 commits into from Feb 17, 2014

Conversation

RouxRC
Copy link
Member

@RouxRC RouxRC commented Feb 16, 2014

Here are the changes to fix the misbehavior of the timeout in case of low tweets by catching Twitter's keep-alive heartbeat signals thanks to the select.select originally added in #178 and still problematic as pointed out by @ksecrist in #202

I also generalized the hangup to all cases since there is no reason to stay in infinite loop after a hangup in non-blocking mode.

And to make things easier and avoid merging issues, I adapted the refacto and fixed python2.6 compatibility from @adonoho's #201

@RouxRC
Copy link
Member Author

RouxRC commented Feb 16, 2014

FYI, here is the script I use to test the different combinations of modes/traffic/endpoint/python_version. I've successfully tried timeout and hangup situations on most combinations. Here are examples to run different possible combinations (assuming you have a simple keys.py file with your KEY, SECRET, OAUTH_TOKEN, OAUTH_SECRET set in the same dir):

python2.6 test_stream.py block low regular
python2.7 test_stream.py noblock low regular 
python3.2 test_stream.py noblock high user
python2.6 test_stream.py timeout20 low+ user
python2.6 test_stream.py timeout60 low+ user
...

A good way to test Twitter's keepalive signal is to use timeout lower than 30 (which seems to be the average seconds between two signals) on the lowtraffic and then to 60 (twitter recommands 90) to understand the problem @ksecrist was pointing out.

Interesting thing as well is that with the userstream, it turns out we can apparently run multiple connections at once (it refuses after a various total number which seems to be lower when in timeout mode)

# -*- coding: utf-8 -*-
import sys
from twitter import OAuth, TwitterStream
from keys import KEY, SECRET, OAUTH_TOKEN, OAUTH_SECRET
# USAGE:
# python test_stream.py [block|noblock|timeout] [low|high|high][+] [regular|user]
if len(sys.argv) > 3:
    stream = sys.argv[3].lower()
else: stream = "regular"
if len(sys.argv) > 2:
    traffic = sys.argv[2].lower()
else: traffic = "low"
if len(sys.argv) > 1:
    mode = sys.argv[1].lower()
else: mode = 'block'
args = {'auth': OAuth(OAUTH_TOKEN,OAUTH_SECRET,KEY,SECRET)}
qargs = {}
if mode == 'noblock':
    args['block'] = False
if mode.startswith('timeout'):
    args['timeout'] = int(mode.replace('timeout', ''))
if stream == 'user':
    args['domain'] = 'userstream.twitter.com'
    method = 'user'
else: method = 'statuses/filter'
if traffic.endswith('+'):
    qargs['delimited'] = 'length'
if traffic.startswith('high'):
    qargs['track'] = 'bieber,gaga'
else: qargs['track'] = 'youpiiiiii'
ct = 0
for a in getattr(TwitterStream(**args), method)(**qargs):
    if not a:
        continue
    ct += 1
    if isinstance(a, dict):
        if "id_str" in a:
            print ct, a['id_str'], a['text']
        elif "friends" in a:
            print len(a['friends'])
        else:
            print a
    else:
        print a 

@RouxRC
Copy link
Member Author

RouxRC commented Feb 16, 2014

I've versionned this as a gist for further use https://gist.github.com/RouxRC/9028951

@sixohsix
Copy link
Collaborator

@RouxRC This patch causes twitter-stream-example to fail. On master it works fine (Python 3.3 and 2.7). When this PR is checked out I get this:

Traceback (most recent call last):
  File "/Users/mike/.venvs2.7/twitter_master/bin/twitter-stream-example", line 9, in <module>
    load_entry_point('twitter==1.13.1', 'console_scripts', 'twitter-stream-example')()
  File "/Users/mike/projects/twitter/twitter/stream_example.py", line 56, in main
    if tweet.get('text'):
AttributeError: 'NoneType' object has no attribute 'get'

The TwitterStream is returning None objects even though I did not specify non-blocking, or a timeout. Perhaps None is being used as a number (where it is interpreted as 0)?

@adonoho This branch seems to include your work from the other PR. Are you willing to close that one? Further, are there any specific tests you'd like to run with this PR to ensure that it supports your needs?

@RouxRC
Copy link
Member Author

RouxRC commented Feb 16, 2014

@sixohsix I guess when ran without option, twitter-stream-example runs with a timeout of 60 looking at here https://github.com/sixohsix/twitter/blob/master/twitter/stream_example.py#L48
I added the fact that with a timeout the stream should yield none as in non blocking mode
I believe this "break" was already the case when using block=False which nis not actually tried in this sample tryer.
I've always used a if not data: continue in my uses of the iterator stream because of this.
I believe in a timeout way of thinking it is more logical to send messages to the user at all times by yielding none as keepalive messages but this is a design choice to make.
So I'd say it's your choice:

@adonoho what do you think?

@sixohsix
Copy link
Collaborator

@RouxRC the weird thing is that it returned none immediately, not after 60 sec. If you prevent the immediate return, I think that's fine.

If I explicitly set timeout=None will I get None results.? I would prefer not to get them. But if so, that can be fixed in another PR.

@RouxRC
Copy link
Member Author

RouxRC commented Feb 16, 2014

@sixohsix If you just remove , timeout=60.0 from line 49 of https://github.com/sixohsix/twitter/blob/master/twitter/stream_example.py#L48 it works

@RouxRC
Copy link
Member Author

RouxRC commented Feb 16, 2014

Regarding the first yield, this is due to the fact the read buffer loop happens before the read socket one, so at the first start it necessarily yeilds none except if we add some specific logic. This was the current behavior already with non-blocking mode, I don't feel like this is a real issue that would need a specific if

@sixohsix
Copy link
Collaborator

I believe it should be fixed. It utterly broke the test script. How many other users will update and discover their scripts immediately fail, despite never failing before?

I can fix it myself later if you're busy, but I want to remove that initial None before I put this in the next release.

Sorry for brevity. Writing from my phone.

@RouxRC
Copy link
Member Author

RouxRC commented Feb 16, 2014

In this case, I believe it's better to back port to the no None yielded in timeout mode. This is the only way to ensure total portability to the existing users since removing only the first yield won't change the fact they will get other ones later when they might not expect them.
That's why I was asking. As i said, It sounds to me like such yield would make sense in a timeout case but again this is totally a feeling question, so i'll rather backport, and instead think about ways to further allow in a different PR the possibility to use a both timeout and non-blocking mode that would behave as such (right now using timeout forces the use of blocking mode however the option is set).

@sixohsix
Copy link
Collaborator

Ahh now I understand! Yes I'd like you to backport. Thank you!

@RouxRC
Copy link
Member Author

RouxRC commented Feb 16, 2014

done. I kept the change in the twitter-stream-sample though since It sounds more logical to me that the sample would use the same mode for each optional stream (the user and sitestream ones are already using the regular blocking mode, public one was the only one set with timeout)

@adonoho
Copy link
Contributor

adonoho commented Feb 17, 2014

@RouxRC/Benjamin,

While we have very different styles of coding, I've been very respectful of your code and ask the same in return. Making changes to my pull requests without any public justification is not an OK process.

I've chosen my variable names and order of tests for specific reasons. If you are going to pull code out of my patches, please stick with the order I've chosen. For example, you've changed the sense of the boolean from modern_python to python26. The common case, modern_python, should be the True branch of the if statement. It needs to be easily apparent to the next programers that they can easily excise the Python v2.6 support when that time comes. Python v2.6 is the declining use case. It is obsolete technology and we should not be doing anything other than maintaining compatibility for it as we move towards the future. It should certainly not take the fast path in any branches in the core of the streaming class.

Please, be respectful. Return my variable names to what I chose. If you think they should be different, make the case for your opinion in this public fora. Let the rest of us judge your decision on the technical merits.

Anon,
Andrew

@RouxRC
Copy link
Member Author

RouxRC commented Feb 17, 2014

@adonoho I must admit I do not understand your reaction. I just reproposed your refacto since my PR was fixing a line from the code that was disappearing. Exactly as you did earlier with a previous concurrent PR of mine and which I never complained about even though I then had to repropose many things, and we are here now still repushing some fixes I did 2 months ago.

The only change I did from your PR was the change of the variable name, since I believe saying modern_python is a judgment choice which should not be reflected in some code. But I seriously do not care at all about the name and will definitely not lose my time fighting about this... Same about the order of the if, I'll change it right now just for your pleasure since you seem to take this very seriously...

Regarding versions of Python, I respect your position, just know that python3 users are highly less numerous than python2, and that python 2.6 is still the official release on various linux distributions like debian stable, which host thousands of production environments...

@RouxRC
Copy link
Member Author

RouxRC commented Feb 17, 2014

Apart from that, any mind on the real questions at stake here?

@sixohsix
Copy link
Collaborator

I've never had to set up contributor guidelines before, but this is the first time two people have been competing for bugfixes and questioning some fundamental stuff in this project. So, for what it's worth...

If you do build on top of someone else's PR, please fetch and checkout their branch instead of including the work as a diff. By doing that you will keep their credentials in the history, which makes git blame output more helpful. It's also polite.

It's my fault for not calling this out in the past. I'll keep my eyes on it better in the future. This PR is fine as it is, but it'll be the last one to do so.

I'm not hugely interested in variable naming controversies, but since it's become a hot topic, the name modern_python is vague and mean-spirited. Vague in that many people would argue that 3.4 is the only modern Python (myself included). Mean spirited in that it accuses anyone who disagrees as being "not modern". But having the most-common case as the first chunk of an if/else is preferable. So, python27_3 is fine. It's a bit awkward, but it is precise. Explicit is better than implicit.

As for supporting 2.6, I have no trouble doing that perpetually for the API component, so long as it doesn't slow down other work. Right now the extent of that support is about six lines of code. That's fine by me. 2.6 stays in the codebase.

I'll run a few last tests, and then, hopefully, merge and release this.

Again, thank you for your contributions, and your patience with me.

@RouxRC
Copy link
Member Author

RouxRC commented Feb 17, 2014

Thanks for your patience and openness Mike, and apologies for the situation which I really didn't expect at all and would have tried to avoid otherwise, sorry again about that.
I will definitely cherrypick other commits in the future if I'm ever in a similar situation.

sixohsix added a commit that referenced this pull request Feb 17, 2014
Fix streams timeout & hangup behavior + ensure python2.6 compat
Here are the changes to fix the misbehavior of the timeout in case of low tweets by catching Twitter's keep-alive heartbeat signals thanks to the select.select originally added in #178 and still problematic as pointed out by @ksecrist in #202

I also generalized the hangup to all cases since there is no reason to stay in infinite loop after a hangup in non-blocking mode. 

And to make things easier and avoid merging issues, I adapted the refacto and fixed python2.6 compatibility from @adonoho's #201
@sixohsix sixohsix merged commit 1d59e1e into python-twitter-tools:master Feb 17, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants