Twitter search tool

A simple command-line tool for Twitter's search API. It is built on top of python-twitter and exposes a simpler interface to just the GetSearch method; I wrote it mostly because I needed a tool for historical searches. It accepts a list of terms, a language, and a start ID, and searches historical tweets. It cannot go back further than about 7 days, since that is as far as the open Search API reaches.

Once done, it saves a pickled pandas DataFrame with the resulting tweets. It also saves intermediate checkpoints (every 50k tweets by default) in case the program crashes for any reason. It can take a long time to run: the API is rate limited and python-twitter sleeps whenever the limit is reached, which happens about every 5k tweets downloaded. A 1.5M-tweet download took around 20 hours to run (probably 90% of that time was spent sleeping).
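The paging and checkpointing described above can be sketched roughly as follows. This is not the code in get_tweets.py, only an illustration under stated assumptions: `fetch_page` is a hypothetical callable standing in for a call to python-twitter's GetSearch with a `max_id` bound, tweets are plain dicts with an integer "id", and checkpoints are pickled lists rather than pandas DataFrames so that the sketch has no third-party dependencies.

```python
import pickle
from pathlib import Path


def collect(fetch_page, checkpoint_every=50_000, out_dir="."):
    """Page backwards through search results, checkpointing along the way.

    fetch_page(max_id) is a hypothetical callable that returns a list of
    dicts with an integer "id" key, all with id <= max_id (or the newest
    tweets when max_id is None). An empty list means nothing is left.
    """
    tweets = []
    max_id = None
    next_checkpoint = checkpoint_every
    while True:
        page = fetch_page(max_id)
        if not page:
            break
        tweets.extend(page)
        # Ask next for tweets strictly older than the oldest one seen so far.
        max_id = min(t["id"] for t in page) - 1
        if len(tweets) >= next_checkpoint:
            # Save everything collected so far, so a crash loses little work.
            path = Path(out_dir) / f"checkpoint_{len(tweets)}.pkl"
            with open(path, "wb") as fh:
                pickle.dump(tweets, fh)
            next_checkpoint += checkpoint_every
    return tweets


def fake_fetch(max_id, newest=1000, oldest=1, page_size=100):
    """Stand-in for the real API: serves IDs from newest down to oldest."""
    top = newest if max_id is None else min(max_id, newest)
    bottom = max(top - page_size, oldest - 1)
    return [{"id": i} for i in range(top, bottom, -1)]
```

Calling `collect(fake_fetch, checkpoint_every=250)` walks the fake timeline from the newest ID to the oldest and writes a checkpoint each time the running total crosses another multiple of 250. In the real tool the waiting is done by python-twitter when the rate limit is hit, so a loop like this would simply block inside the fetch call.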

Sample usage:

# Search for all tweets that contain the terms 'Chile' or 'Santiago', in Spanish, going as far back as possible
python get_tweets.py --terms Chile,Santiago --lang es

The only mandatory argument is --terms, which must be a comma-separated string. The optional arguments are --lang for the language and --start_id to define how far back to search. The defaults are:

start_id: As far back as possible (around 7 days)
lang: en
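For illustration, the arguments above could be parsed along these lines. This is a sketch, not the actual parsing code in get_tweets.py: the function names `parse_args` and `build_query` are hypothetical, and joining the terms with Twitter's OR search operator is an assumption based on the sample usage ("Chile" or "Santiago").

```python
import argparse


def parse_args(argv=None):
    """Parse the three documented arguments; only --terms is required."""
    parser = argparse.ArgumentParser(description="Search historical tweets.")
    parser.add_argument("--terms", required=True,
                        help="comma-separated search terms, e.g. Chile,Santiago")
    parser.add_argument("--lang", default="en",
                        help="language code (default: en)")
    parser.add_argument("--start_id", type=int, default=None,
                        help="tweet ID bounding the search "
                             "(default: as far back as possible)")
    args = parser.parse_args(argv)
    # Turn the comma-separated string into a clean list of terms.
    args.terms = [t.strip() for t in args.terms.split(",") if t.strip()]
    return args


def build_query(terms):
    # Assumption: terms are combined with the OR search operator.
    return " OR ".join(terms)
```

With this sketch, `parse_args(["--terms", "Chile,Santiago", "--lang", "es"])` yields the term list `["Chile", "Santiago"]`, and leaving out `--lang` falls back to `en`.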

Requirements

You must create a secrets.py file in the working directory with the following form:

from collections import namedtuple


ApiKey = namedtuple('ApiKey', [
    'CONSUMER_KEY',
    'CONSUMER_SECRET',
    'ACCESS_TOKEN',
    'ACCESS_TOKEN_SECRET'
])

# Replace these strings with the corresponding keys/tokens
api_key = ApiKey(
    'consumer-key',
    'consumer-secret',
    'access-token',
    'access-token-secret',
)
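As a rough sketch of how these credentials might be handed to python-twitter: its `twitter.Api` constructor accepts `consumer_key`, `consumer_secret`, `access_token_key`, `access_token_secret`, and `sleep_on_rate_limit`. The helper name `api_kwargs` is hypothetical, and whether get_tweets.py wires things up this way is an assumption. The ApiKey definition is repeated here only to keep the example self-contained.

```python
from collections import namedtuple

ApiKey = namedtuple("ApiKey", [
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
])


def api_kwargs(key):
    """Map the ApiKey fields onto python-twitter's Api keyword arguments."""
    return {
        "consumer_key": key.CONSUMER_KEY,
        "consumer_secret": key.CONSUMER_SECRET,
        "access_token_key": key.ACCESS_TOKEN,
        "access_token_secret": key.ACCESS_TOKEN_SECRET,
        # Let python-twitter sleep instead of failing when rate limited.
        "sleep_on_rate_limit": True,
    }


# With python-twitter installed, the client could then be built as:
#   import twitter
#   from secrets import api_key
#   api = twitter.Api(**api_kwargs(api_key))
```

Note that a local secrets.py in the script's directory takes precedence over the standard library module of the same name, which is why `from secrets import api_key` resolves to this file.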

Repository: jcaguirre89/twitter-scrape
