jcaguirre89/twitter-scrape

Twitter search tool

Simple command line tool to interact with Twitter's search API. Built on top of python-twitter, it provides a simpler interface to just the GetSearch method; I built it mostly because I needed a tool for historical searches. It accepts a list of terms, a language and a start ID, and searches historical tweets. It can't go back further than 7 days, as that's the oldest the open Search API will go.
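As a rough illustration of how a historical search over GetSearch can work, here is a minimal sketch of paging backwards through results. It is not the code in get_tweets.py: the names `build_query` and `page_backwards` are hypothetical, joining the terms with `OR` is an assumption based on the sample usage, and so is passing the start ID through as `since_id`. The `search` argument stands in for python-twitter's `Api.GetSearch`, whose `term`, `lang`, `count`, `since_id`, `max_id` and `result_type` parameters are used here.

```python
from typing import Callable, Iterator, List, Optional


def build_query(terms: str) -> str:
    """Turn 'Chile,Santiago' into the search query 'Chile OR Santiago'."""
    return " OR ".join(t.strip() for t in terms.split(",") if t.strip())


def page_backwards(
    search: Callable[..., List],
    query: str,
    lang: str = "en",
    since_id: Optional[int] = None,
) -> Iterator:
    """Yield statuses newest-first by repeatedly lowering max_id."""
    max_id = None
    while True:
        batch = search(term=query, lang=lang, count=100,
                       since_id=since_id, max_id=max_id,
                       result_type="recent")
        if not batch:
            return  # nothing older is available
        yield from batch
        # Next page: only tweets strictly older than the oldest one seen.
        max_id = min(status.id for status in batch) - 1
```

With a configured python-twitter client named `api`, this would be called as `page_backwards(api.GetSearch, build_query("Chile,Santiago"), lang="es")`.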

Once done, it saves a pickled pandas dataframe with the resulting tweets. It also saves intermediate checkpoints (every 50k tweets by default) in case the program crashes for any reason. It can take a long time to run, as the API has a rate limit and python-twitter will sleep when it's reached, which happens about every 5k tweets downloaded. A 1.5M-tweet download took around 20 hours to run (probably 90% of this time was spent sleeping anyway).
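The checkpointing idea can be sketched roughly as follows. This is only an illustration under assumptions: the function name `save_with_checkpoints`, the file names, and the choice to write the full accumulated dataframe at each checkpoint are all hypothetical and may differ from what the tool actually does. It uses pandas' `DataFrame.to_pickle` to write each file.

```python
from pathlib import Path
from typing import Dict, Iterable, List

import pandas as pd


def save_with_checkpoints(
    tweets: Iterable[Dict],
    out_dir: Path,
    checkpoint_every: int = 50_000,
) -> Path:
    """Collect tweet dicts, pickling a DataFrame every `checkpoint_every` rows."""
    out_dir.mkdir(parents=True, exist_ok=True)
    rows: List[Dict] = []
    for row in tweets:
        rows.append(row)
        if len(rows) % checkpoint_every == 0:
            # One file per milestone, holding everything collected so far.
            pd.DataFrame(rows).to_pickle(out_dir / f"checkpoint_{len(rows)}.pkl")
    # Final result once the iterable is exhausted.
    final_path = out_dir / "tweets.pkl"
    pd.DataFrame(rows).to_pickle(final_path)
    return final_path
```

After a crash, the most recent checkpoint could be loaded back with `pd.read_pickle(path)`.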

Sample usage:

# Search for all tweets that contain the terms 'Chile' or 'Santiago', in Spanish, going as far back as possible
python get_tweets.py --terms Chile,Santiago --lang es

The only mandatory argument is --terms, which must be a comma-separated string. Additional arguments are --lang for the language and --start_id to define how far back to search. The defaults are:

  • start_id: As far back as possible (around 7 days)
  • lang: en
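The arguments and defaults above could be declared with argparse along these lines. This is a sketch rather than the script's actual parser: the function name `parse_args`, the help texts, and treating `--start_id` as an integer tweet ID are assumptions.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Search recent tweets by term.")
    parser.add_argument(
        "--terms", required=True,
        # Split the comma-separated string into a clean list of terms.
        type=lambda s: [t.strip() for t in s.split(",") if t.strip()],
        help="comma-separated search terms, e.g. Chile,Santiago",
    )
    parser.add_argument("--lang", default="en", help="language code (default: en)")
    parser.add_argument(
        "--start_id", type=int, default=None,
        help="tweet ID bounding the search; omit to go back as far as the API allows",
    )
    return parser.parse_args(argv)
```

Calling `parse_args(["--terms", "Chile,Santiago", "--lang", "es"])` yields `terms` as a two-element list, `lang` as `"es"`, and `start_id` as `None`.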

Requirements

You must create a secrets.py file in the working directory with the following form:

from collections import namedtuple


ApiKey = namedtuple('ApiKey', [
    'CONSUMER_KEY',
    'CONSUMER_SECRET',
    'ACCESS_TOKEN',
    'ACCESS_TOKEN_SECRET'
])

# Replace these strings with the corresponding keys/tokens
api_key = ApiKey(
    'consumer-key',
    'consumer-secret',
    'access-token',
    'access-token-secret',
)
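For context, the fields of `api_key` correspond to the credentials that python-twitter's `twitter.Api` constructor expects. The sketch below shows one way they could be mapped; the helper `api_kwargs` is hypothetical, and enabling `sleep_on_rate_limit` is an assumption inferred from the sleeping behaviour described earlier. The `ApiKey` definition is repeated so the block stands alone.

```python
from collections import namedtuple
from typing import Any, Dict

ApiKey = namedtuple('ApiKey', [
    'CONSUMER_KEY',
    'CONSUMER_SECRET',
    'ACCESS_TOKEN',
    'ACCESS_TOKEN_SECRET'
])


def api_kwargs(key: ApiKey) -> Dict[str, Any]:
    """Map the namedtuple fields onto twitter.Api keyword arguments."""
    return {
        'consumer_key': key.CONSUMER_KEY,
        'consumer_secret': key.CONSUMER_SECRET,
        'access_token_key': key.ACCESS_TOKEN,
        'access_token_secret': key.ACCESS_TOKEN_SECRET,
        # Sleep instead of raising when the rate limit is hit.
        'sleep_on_rate_limit': True,
    }

# In a real script (requires python-twitter and your own secrets.py):
#   import twitter
#   from secrets import api_key
#   api = twitter.Api(**api_kwargs(api_key))
```

Note that a local secrets.py takes precedence over the standard-library `secrets` module when the script is run from the same directory.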
