Skip to content
Scrapes twitter for a given term
Python
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.gitignore
LICENSE
README.md
get_tweets.py
requirements.txt
sample.env

README.md

Twitter search tool

Simple command line tool to interact with Twitter's search API. Built on top of the python-twitter to provide a simpler interface just to the GetSearch method, built mostly because I needed a tool for historical searches. It accepts a list of terms, a language and a start ID and searches historical tweets (can't go back further than 7 days as that's the oldest the open Search API will go).

Once done it saves a pickled pandas dataframe with the resulting tweets. Also saves intermediate checkpoints (every 50k by default) in case the program crashes for any reason. It can take a long time to run, as the API has a rate limit and python-twitter will sleep when it's reached, which happens about every 5k tweets downloaded. for a 1.5M download it took around 20 hours to run (probably 90% of this time was spent sleeping anyway).

Sample usage:

# Search for all tweets that have the terms 'Chile' or 'Santiago', in spanish, going as far back as possible
python get_tweets.py --terms Chile,Santiago --lang es

Only mandatory argument is --terms, which must be a comma-separated string. Additional arguments are --lang for the language and --start_id to define how far back to search. The defaults are:

  • start_id: As far back as possible (around 7 days)
  • lang: en

Requirements

Must create a secrets.py file in the working directory with the following form:

from collections import namedtuple


ApiKey = namedtuple('ApiKey', [
    'CONSUMER_KEY',
    'CONSUMER_SECRET',
    'ACCESS_TOKEN',
    'ACCESS_TOKEN_SECRET'
])

# Replace these strings with the corresponding keys/tokens
api_key = ApiKey(
    'consumer-key',
    'consumer-secret',
    'access-token',
    'access-token-secret',
)
You can’t perform that action at this time.