9.4 + 9.7 Twitter Search + Saving to MongoDB #212
In terms of why you never get more than 200 results, it is entirely possible that Twitter is limiting the search results to 200 as a maximum value at this point in time (all subject to their platform's operational capacity). Per their own API docs [1], the code looks for the 'next_results' node in the response and bails out when it doesn't find it, since that's the way you're supposed to navigate to the next batch of results.

In terms of why you always get 200 results instead of fewer results (say, 10 results, 100 results, or 142 results as specified by the `max_results` parameter):

Does that help? So, in the former, I think it's just a current (possibly semi-permanent -- who knows?) limitation of the Search API, where Twitter has been known to adjust API responses as needed to maintain platform performance. I can't see a problem with the code as written, though maybe I just have a blind spot... In the latter, it's a mostly harmless bug where the list slice is missing.

[1] https://dev.twitter.com/docs/api/1.1/get/search/tweets

On Aug 10, 2014, at 11:45 AM, curtiswallen notifications@github.com wrote:
> That makes sense. Thanks!
>
> So then, a follow-up question: if I ran the request multiple times (scraping 200 tweets at a time), can I prevent the collection of duplicate results? Is there a way to pull a 'next_results' node from the last tweet stored to the DB, so I could crawl back through the history of the query? Or is that something I'll need to figure out on my own? ;-)
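The 'next_results' paging described above can be sketched in a short harvesting loop. This is a minimal sketch, assuming a `twitter_api` object that behaves like the `twitter`-package client used in the book; the function name, `batch_size`, and defaults are illustrative:

```python
# A minimal sketch of paging the Search API via 'next_results'
# (assumption: `twitter_api` behaves like the twitter-package client
# used in the book; names and defaults are illustrative).
import urllib.parse  # the 2014-era Python 2 code used `import urlparse`

def search_tweets(twitter_api, q, max_results=200, batch_size=100):
    search_results = twitter_api.search.tweets(q=q, count=batch_size)
    statuses = search_results['statuses']

    while len(statuses) < max_results:
        try:
            # 'next_results' only appears while more pages exist; when
            # it is missing, the API has nothing further to return.
            next_results = search_results['search_metadata']['next_results']
        except KeyError:
            break
        # next_results looks like '?max_id=...&q=...&count=100';
        # parse it into keyword arguments for the next request.
        kwargs = dict(urllib.parse.parse_qsl(next_results[1:]))
        search_results = twitter_api.search.tweets(**kwargs)
        statuses += search_results['statuses']

    # Clip to max_results -- the list slice mentioned in the replies.
    return statuses[:max_results]
```

The `max_id` carried inside 'next_results' is also what lets you crawl backward through a query's history across runs, as asked above.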
The best advice I can offer at this very moment is to carefully review the official Search API docs at https://dev.twitter.com/docs/api/1.1/get/search/tweets, since the API client used in the code is literally just a Pythonic wrapper around this API. In other words, that API doc is the authority, and we'd need to do the same tinkering and experimenting that it sounds like you're already doing to get to the bottom of some of these things.

I think your best bet is probably to make sure that tweets are keyed on their tweet id so that you can trivially avoid duplicate results by effectively overwriting any pre-existing info that you'd get in subsequent batches. Or filter out duplicates at query time. Whichever is easiest for you.

On Aug 10, 2014, at 12:05 PM, curtiswallen notifications@github.com wrote:
> Cheers! Thanks so much, Matthew. Love love love the book, and I tremendously admire/appreciate both your activity on GitHub and all the work you've done to make the concepts and content so accessible. Can't wait to see what's next!
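The "key tweets on their tweet id" advice above can be sketched as an upsert into MongoDB. This is a sketch under assumptions: `coll` is a pymongo-style `Collection` (e.g. `MongoClient()['search_results']['tweets']`), and the function name is illustrative:

```python
# Sketch of deduplicating stored tweets by keying on the tweet id
# (assumption: `coll` is a pymongo-style Collection; names illustrative).
def save_tweets(statuses, coll):
    for status in statuses:
        # Use the tweet's id as MongoDB's _id: re-saving the same tweet
        # in a later batch overwrites the earlier document instead of
        # creating a duplicate.
        doc = dict(status, _id=status['id'])
        coll.replace_one({'_id': doc['_id']}, doc, upsert=True)
```

With this in place, overlapping batches from repeated harvests are harmless: each tweet occupies exactly one document.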
Thanks! So glad to hear it. Once you work through things some more, I'd [...]

On Sun, Aug 10, 2014 at 12:13 PM, curtiswallen notifications@github.com wrote:
Thank you, Matthew, for your amazing work. I had exactly the same issue as Curtis with the Twitter API exercises: 200 statuses returned every time. Now it seems to be working: indeed, my problem was due to the fact that next_results was URL-encoded twice, so the hashtag #Obama ended up double-encoded (encoding `#` once gives `%23`; encoding it again gives `%2523`).

So I replaced the statements below:

```python
kwargs = dict([ kv.split('=') for kv in next_results[1:].split("&") ])
```

with the following ones:

```python
next_results = urlparse.parse_qsl(next_results[1:])
kwargs = dict(next_results)
```

Hope it can help. Waiting for your next books!
Thanks so much for this update. I'll take a closer look and update the code in the repo soon. |
Thanks a lot LisaCastellano. Your solution works great for me! |
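The double-encoding that LisaCastellano diagnosed can be reproduced in isolation. The `next_results` string below is illustrative, and the snippet uses Python 3's `urllib.parse` in place of the Python 2 `urlparse` module from the thread:

```python
# Why the naive '='-split re-encodes the hashtag while parse_qsl does not
# (the next_results string below is illustrative).
import urllib.parse  # Python 3 home of the old urlparse functions

next_results = '?max_id=476992744395239425&q=%23Obama&count=100'

# Original approach: values stay percent-encoded, so q is sent back as
# '%23Obama' and the client encodes the '%' a second time ('%2523Obama').
naive = dict(kv.split('=') for kv in next_results[1:].split('&'))

# parse_qsl decodes each value exactly once, so q round-trips as '#Obama'.
decoded = dict(urllib.parse.parse_qsl(next_results[1:]))

print(naive['q'])    # %23Obama
print(decoded['q'])  # #Obama
```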
For some reason, no matter what value I pass for max_results it always collects 200 tweets, no more, no less.
Code:
In the terminal I get:
Then when I check the DB, 200 results. Every time.
I've tried passing "10" for max_results, still 200.
I've tried passing "1000" for max_results (as shown), still 200.
Thoughts?