Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix broken userstream and all other streams supposed to break after Jan 13 + add image support + fix diverses issues #195

Closed
wants to merge 24 commits into from

Conversation

RouxRC
Copy link
Member

@RouxRC RouxRC commented Dec 27, 2013

Merry Christmas!... Twitter is moving all of its streaming APIs towards HTTP1.1 chunked encoded transfer and breaking the whole lib for us! https://dev.twitter.com/blog/deprecating-http-1.0-streaming-api

Userstream passed already mid-november and has been broken since, cf #190
I first didn't pay much more attention to it than reading the issue thread since I only use the public statuses.filter stream. But I just found out about this announce and browsing issues it became clear the userstream issue here was coming from this change, which did happen on November 23rd in fact (https://dev.twitter.com/discussions/23224#comment-53758) just as the issues reported here confirm.

The thing is the announce also says that on January 13, Twitter will apply it as well for all the other streaming calls, meaning they probably will all be broken as well.
So this motivated me to fix the issue for userstream and supposedly for the other ones further on.

The solution was to stop asking for gzip encoded data for streaming calls since they already answer not zipped with the HTTP1.0 calls (like statuses/filter for now) and should not be zipped when dealing with chunked transfer. This also required to adjust the parsing of the chunks to separate "delimited" length sent by twitter from actual data.

I've tried it out successfully on a combination of calls on (public stream HTTP1.0 | user stream HTTP1.1chunked) x (not blocking | blocking | with timeout) x (low traffic | high traffic | high traffic with argument 'delimited') with python 2.6 & 2.7 only. I don't believe I've modified anything incompatible but any tryout from py3 users would be great.

I also took the occasion to fix a couple other issues, including fixing some hanging when using the timeout with low traffic, providing TwitterHTTPErrors on streaming calls errors, fixing the broken follow create/destroy with cmdline and the irc bot, and completing the POST methods list.

Considering the urge, I will personnally migrate my IRC bot gazouilleur to my branch until this is pulled or handled otherwise.

git+https://github.com/RouxRC/twitter.git@fix-broken-streams

Happy tweeting holidays!

PS: Twitter also announced an SSL change for Jan 6, but since all the calls are already using secure by default, we should't be at risk here.

edit 28/12:
I've completed this with fixing the tilde sending issue for python 2, and I added the support for API calls to send images as discussed in #127 , it now works when feeding the corresponding argument with raw read of a file. For instance:

twitter.update_with_media(status="Youpi !", 'media[]' = open('img.jpg', 'rb').read())

Twitter is moving all of its APIs towards HTTP1.1.
https://dev.twitter.com/blog/deprecating-http-1.0-streaming-api
Userstream passed already mid-november and has been borken for this lib since
On Jan 6, Twitter will apply it as weel for all the other stream methods,
meaning they probably would be all broken.

The problem was in two pieces.
First with this commit, we stop asking for gzip encoded data for streaming
calls: for the HTPP1.0 calls, this was not taken into account anyway, but with
HTTP1.1 the answer is now chunked and should therefore not caught zipped.

(First part of fix for userstream and soon all streams cf python-twitter-tools#190 python-twitter-tools#116)
Chunked data sent via HTTP1.1 by Twitter is not as straight forward to read,
chunks indicating size of tweets are present no matter what, so we find the
json start in the buffer and forget the rest for now.

(Second part of correction for userstream and soon all streams. Fix python-twitter-tools#190 python-twitter-tools#116)
Instead of trashing it as in previous commit, return the sizes of the messages
sent by the API as regular messages if requested through the normal API
argument 'delimited=length'
https://dev.twitter.com/docs/streaming-apis/parameters#delimited

(Last part of userstream and other streams fix)
When running a stream with a low traffic, the current timeout logic was
hanging indefinitely without processing.
This rewritten logic seems to be working fine now with both http1.0 and 1.1
and active or low traffic.

@dkanygin: would love to get your code review on this (cf python-twitter-tools#178)
The existing error was never reached, luckily since it was using
undefined arguments. moved it in its right place and added handling
ssl errors as TwitterHTTPError.

(Fix python-twitter-tools#136)
Added all missing methods from https://dev.twitter.com/docs/api/1.1
Also included some of the streaming methods which work with both GET
and POST but accept arguments like "track" which can quickly require
POST.

(Closes python-twitter-tools#187 python-twitter-tools#145 python-twitter-tools#188)
Oups, the default for all other methods should be True, not False
So the calls to send media (update_with_media, update_profile_image,
update_profile_background_image and update_profile_banner) require
to send queries differently, as multipart form, and oauth needs to be
signed without taking any parameter into account.
This does the trick, following the rules here and there:
https://dev.twitter.com/docs/uploading-media
https://dev.twitter.com/docs/api/1.1/post/statuses/update_with_media
https://dev.twitter.com/discussions/1059

Closes python-twitter-tools#127
The fix python-twitter-tools#64 by @grahame was fixing the issue of sending tweets with
'~' for python 3 but not for python 2 due to urllib.urlencode's lack
of safe functions. Here's a dirty hack to do it anyway.
Any better fix welcome :)
convert non string before replacing in it...
(follows da33e70 )
Even when there is a low traffic, we know we didn't lose the
connection yet when ready_to_read indicates a signal from Twitter,
so we don't yield any timeout message and continue, but we should
at least yield None like in the non-blocking mode to allow
iterator's use to get the hand back and exit cleanly if required.

Completes 3df3413
Ping @dkanygin python-twitter-tools#178
TwitterHTTPError should not be tried to be made from SSLError, the
original behavior of just raising this is best, my bad...

(Follows 040f13e )
@RouxRC
Copy link
Member Author

RouxRC commented Jan 6, 2014

The deadline was pushed by Twitter from Jan 6 to Jan 13 : https://dev.twitter.com/blog/deprecating-http-1.0-streaming-api

@rspeer
Copy link

rspeer commented Jan 13, 2014

Thank you, RouxRC.

Would you recommend that those of us who need the streaming API switch to your fork?

@RouxRC
Copy link
Member Author

RouxRC commented Jan 13, 2014

Hi @rspeer, well apparently my fix was not enough, userstream was still broken which I just fixed with the previous commit.
When I started working on it, the filter stream seemed to still be working with the regular release, but now I get tweets with my fork and no more with the release, so yeah I guess until it is fixed in the main branch or pulled you can switch to my fork:

pip uninstall twitter
pip install git+https://github.com/RouxRC/twitter.git@fix-broken-streams

@rspeer
Copy link

rspeer commented Jan 13, 2014

There was a terrifying moment today when both the HTTP 1.0 and HTTP 1.1 streams went down. But I can confirm that, after that interruption, I could access the streaming API through RouxRC's branch.

@kendlete
Copy link

I had no idea about this change until this morning when the stream stopped working. RouxRC branch worked for me too.

@adonoho
Copy link
Contributor

adonoho commented Jan 14, 2014

RouxRC,

First thank you for your changes. I have a few comments.

  1. You have a lot of changes fixing several things in one pull request. Some address problems I'm having, others don't. In the future, please make your pull requests smaller. This will help you build a reputation as an awesome engineer and help folks embrace your changes.

  2. twitter is a project that runs on both Python2 and Python 3. When I attempted to use it under Python 3, it fails. In particular your most recent patch involving the re system is the cause. Being a new Pythonista, I hesitate to dig into such a new addition to the stack while Twitter is changing the rules underneath us. Please endeavor to keep this project working with both environments. It isn't hard when you're writing new code and it will improve your engineering skills. Currently, I've rolled back to using Python 2 but doing so has caused me to add code that was unnecessary in Python 3. I would prefer to use my simpler Python3 stack.

  3. You use a deprecated method in stream.py:

    return iter(TwitterJSONIter(handle, uri, arg_data, block, timeout=timeout,
         display_sizes=("delimited=length" in req.get_data().lower())))

req.get_data() is deprecated in the url lib. You may wish to use req.data instead. Also, it sometime returns None. That needs to be caught and handled properly. Calling .lower() on None causes a crash. This crash was the major breakage causing me to step through your changes.

Again, thank you for making your changes. They are helpful.

Anon,
Andrew

@RouxRC
Copy link
Member Author

RouxRC commented Jan 14, 2014

Hi @adonoho,

First of all, yes I regret having done all of these changes altogether. The thing is I started working with it and therefore kept completing my fix after having immegred myself into the code. But I should definitely have separated the commits into different pull requests per functionality, my bad.

Regarding Python 3, as I explained at first, I did not get a chance to try it with python 3 and that's definitely a mistake. I'm surprised though that the use of "re" would break python3 use, it is fairly standard generic code, could you explain what's breaking?

Finally, thanks for req.data and the None on lower thing, it does indeed work with both urllib's python3 and urllib2's python 2 now that get_data was removed in python 3.4, I'll add this fix in a commit right away.

@adonoho
Copy link
Contributor

adonoho commented Jan 14, 2014

RouxRC,

Allow me to say again, thank you for your changes. My comments were intended to be constructive criticism and I was hoping that you took them as such.

Large pull requests are hard for any project owner to digest. As Mr. Verdone has been silent during this little Twitter crisis, I can only assume he's busy with more pressing issues. Regardless, when he returns, a smaller pull request that just addresses the streaming issue would probably be welcome. (I'm not opining on the worthiness of any of the changes. This is just a process observation about how to help the project owner digest your changes.)

I will dig into what was happening with Python 3 later today. In particular, there was something weird causing Python to crash using strings. As that is the major difference between the two languages and you did not explicitly test with Python 3, I just rolled back to Python 2 to get my stream back online. I'll get you more data soon. Allow me to add though that putting the regular expression engine into the middle of tweet processing is probably not an efficient solution. What are you trying to really do here? What was the motivation for this fix? And what other ideas did you examine before you fired up the re engine?

I will help you track this down. Nonetheless, when one submits pull requests to a library that publicly and not accidentally supports both Python 2 & 3, then good engineering dictates that one should test with both worlds. I want to encourage you to be the best engineer you can be and not just a coder.

Again, thank you for your changes.

Anon,
Andrew

@RouxRC
Copy link
Member Author

RouxRC commented Jan 14, 2014

Andrew, as I already said, I totally understand the pain of such a big pull request and I certainly should have try and make it more atomic. A few of these commits could certainly be taken totally separately as other pull requests. But others also build corrections on top of the new streaming handling. So splitting the lot is going to be time-consuming and I won't have the time in the coming days to resplit it soon unfortunately.

Regarding re, the choice was made in order to handle weird hexadecimals lines by which begin some paquets sent by Twitter through certain streams (the userstream especially) looking like that sometimes two sometimes four characters:

7d3
{status:"....

My issue was to remove them and only them (without removing the delimited length of tweets that twitter sends when delimited=true is set and which looks exactly the same). This regexp use does the trick by catching heading hexa and hexa-like on a separate line but not the delimited length that would start the packet after the removal of the first hexa head. But indeed using regexp can be core-consuming and maybe some other logic could replace it more efficiently. Any proposition is very welcome.

Again I did all of this as an urgent fix when realizing the library was gonna break, because I'm using the library and wanted my streams to keep working, there may be better way to address all these problems and I'll be happy to use them if they're implemented another way.

boogheta and others added 3 commits January 15, 2014 17:59
Update to use OAuth, take in command line arguments and modify the imports to function from within the module.
@geeknik
Copy link
Contributor

geeknik commented Jan 16, 2014

I removed the old twitter package using pip and installed your twitter package, and the ircbot connects and joins but does nothing. In the console, I see this error:

Exception while querying twitter:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/twitter/ircbot.py", line 168, in check_statuses
updates = reversed(self.twitter.statuses.friends_timeline())
File "/usr/local/lib/python2.7/dist-packages/twitter/api.py", line 209, in call
return self._handle_response(req, uri, arg_data, _timeout)
File "/usr/local/lib/python2.7/dist-packages/twitter/api.py", line 240, in _handle_response
raise TwitterHTTPError(e, uri, self.format, arg_data)
TwitterHTTPError: Twitter sent status 410 for URL: 1/statuses/friends_timeline.json using parameters: (oauth_consumer_key=xxx&oauth_nonce=xxx&oauth_signature_method=HMAC-SHA1&oauth_timestamp=1389846426&oauth_token=xxx&oauth_version=1.0&oauth_signature=xxx)
details: {"errors": [{"message": "The Twitter REST API v1 is no longer active. Please migrate to API v1.1. https://dev.twitter.com/docs/api/1.1/overview.", "code": 68}]}

I verified that /usr/local/lib/python2.7/dist-packages/twitter/ was gone after I ran 'pip uninstall twitter' so there shouldn't be any old files hanging around unless I'm missing something else.

@RouxRC
Copy link
Member Author

RouxRC commented Jan 16, 2014

Hello @geeknik I don't know much about the package's ircbot, but looking at the error and the code, it looks like it is specifying the api_version to 1 instead of using the default 1.1 (and therefore I guess the bot was probably not working much more with the old version).

It's easy to fix, I've just included a commit which should resolve it. You can also just delete line 151 of twitter/ircbot.py

@geeknik
Copy link
Contributor

geeknik commented Jan 16, 2014

Thanks @RouxRC, that solved one problem, but now we've got another one and I don't know anything about Twitter's changes to be able to fix this one on my own:

Exception while querying twitter:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/twitter/ircbot.py", line 167, in check_statuses
updates = reversed(self.twitter.statuses.friends_timeline())
File "/usr/local/lib/python2.7/dist-packages/twitter/api.py", line 238, in call
return self._handle_response(req, uri, arg_data, _timeout)
File "/usr/local/lib/python2.7/dist-packages/twitter/api.py", line 269, in _handle_response
raise TwitterHTTPError(e, uri, self.format, arg_data)
TwitterHTTPError: Twitter sent status 404 for URL: 1.1/statuses/friends_timeline.json using parameters: (oauth_consumer_key=xxx&oauth_nonce=xxx&oauth_signature_method=HMAC-SHA1&oauth_timestamp=1389890991&oauth_token=xxx&oauth_version=1.0&oauth_signature=xxx)
details: {"errors":[{"message":"Sorry, that page does not exist","code":34}]}

Any ideas? =)

@geeknik
Copy link
Contributor

geeknik commented Jan 16, 2014

Thanks to @boogheta for solving my last group of errors, my IRC bot is back in business! ;)

@RouxRC
Copy link
Member Author

RouxRC commented Jan 16, 2014

@geeknik I've never used it but it looks to me like the ircbot part of this lib hasn't been maintained for a while, this call to friends_timeline is a method that was deprecated by Twitter in API v1.1 : https://dev.twitter.com/docs/api/get-statusesfriends_timeline
I just replaced it by its equivalent but there may be other deprecated calls in the code that will keep on crashing it.

But if you want a proper IRC bot using twitter, I can only point you to the project I'm maintaining which is my reason for helping on this library :) It's called gazouilleur with a lot of other features: http://github.com/RouxRC/gazouilleur

PS: @boogheta and myself are the same, it's just my professional account

@geeknik
Copy link
Contributor

geeknik commented Jan 16, 2014

@RouxRC The issue I posted is fixed by @boogheta's commit. I'll check yours out though, thanks!

@RouxRC
Copy link
Member Author

RouxRC commented Jan 16, 2014

As said in PS :  @boogheta and myself are the same, my message was just intented to explain the commit :)

@geeknik
Copy link
Contributor

geeknik commented Jan 16, 2014

Oh, sorry, I missed the PS. Thanks for your help! =)

@ghost
Copy link

ghost commented Jan 20, 2014

I don't know if this is connected with just this version, but I found, that searching for keyword phrases stopped working, for instance, if I am searching for "icecream" I get results, but searching for "ice cream" ( i.e. [...].statuses.filter(track = 'ice+cream') returns no results.

@RouxRC
Copy link
Member Author

RouxRC commented Feb 3, 2014

@gnembon Have you tried track='ice cream' ? I do not recall ever having to use "+" in the stream queries and I do get results without it when I do not with the +.

By the way, I believe @adonoho's way of dealing with the streams is definitely more efficient than what I did here and if you only need the streams fixed, you should probably use his branch (see #196).
Since I've been using the other features I added here in production myself, I will keep staying on this branch until @sixohsix pulls @adonoho's request, so I can then propose the other changes like image support on top of it.

@sixohsix
Copy link
Collaborator

sixohsix commented Feb 3, 2014

Hi @RouxRC , I cherry-picked your update to twitter-globals, and your authorship change since those were pretty free-standing. For the rest, resubmit whenever you have the time and inclination.

Thanks for your work! It is appreciated.

@RouxRC
Copy link
Member Author

RouxRC commented Feb 3, 2014

Closing this here and reopening with merged cherrypicks of the only needed commits (see #197)

@RouxRC RouxRC closed this Feb 3, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants