Twint is a Python package that provides a very high-level abstraction layer over the Twitter API and returns the preprocessed results in a pandas dataframe. The system has been built with several key concerns in mind:
- availability of key scores (influence, reach, etc.)
- applicability of the data for neural networks
- identification of spam and other bots
- single-line commands for all four important methods:
  - streaming API for both keywords and users
  - REST API for keywords
  - REST API for user timelines
  - flat-file ingestion from JSON (from the Twitter API)
- all methods return an identical dataframe

Install directly from GitHub:

    pip install git+https://github.com/mikkokotila/twint.git

Each of the four methods is a single-line command:

    search('deep learning', 500)
    timeline('realdonaldtrump')
    stream('cars')
    flatfile('some_tweets.json')

SIGNAL | SOURCE |
---|---|
influence_score | Twint |
reach_score | Twint |
quality_score | Twint |
compound | NLTK |
neu | NLTK |
neg | NLTK |
pos | NLTK |
days_since_creation | |
user_tweets | |
user_favourites | |
user_followers | |
user_following | |
user_listed | |
handle | |
created_at | |
default_profile | |
egg_account | |
description | |
location | |
timezone | |
expanded_url | |
url | |
site_url | |
retweet_count | |
text | |
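
Because every method returns the same columns, downstream code can rely on the signal names in the table. The sketch below is a rough illustration only: plain dictionaries stand in for dataframe rows (roughly what `DataFrame.to_dict('records')` would give), and `missing_columns`, `likely_genuine`, `make_row`, the `min_days` threshold, the sample values, and the assumption that `egg_account` and `default_profile` are boolean flags are all hypothetical, not part of Twint.

```python
# Column names copied from the signal table above.
EXPECTED_COLUMNS = [
    "influence_score", "reach_score", "quality_score",
    "compound", "neu", "neg", "pos",
    "days_since_creation", "user_tweets", "user_favourites",
    "user_followers", "user_following", "user_listed",
    "handle", "created_at", "default_profile", "egg_account",
    "description", "location", "timezone",
    "expanded_url", "url", "site_url", "retweet_count", "text",
]


def missing_columns(row):
    """Return the expected column names that are absent from a row."""
    return [name for name in EXPECTED_COLUMNS if name not in row]


def likely_genuine(rows, min_days=30):
    """Keep rows that pass a simple, hypothetical spam heuristic.

    Assumes egg_account and default_profile are boolean flags.
    The rule and threshold are illustrative, not Twint's own logic.
    """
    return [
        row for row in rows
        if not row["egg_account"]
        and not row["default_profile"]
        and row["days_since_creation"] >= min_days
    ]


def make_row(**overrides):
    """Build a row with a placeholder value for every expected column."""
    row = {name: None for name in EXPECTED_COLUMNS}
    row.update(overrides)
    return row


# Made-up sample rows, for illustration only.
rows = [
    make_row(handle="alice", egg_account=False,
             default_profile=False, days_since_creation=400),
    make_row(handle="new_egg", egg_account=True,
             default_profile=True, days_since_creation=2),
]

print(missing_columns(rows[0]))
print([row["handle"] for row in likely_genuine(rows)])
```

With a real dataframe, the same filter would more naturally be written as boolean indexing on the `egg_account`, `default_profile`, and `days_since_creation` columns.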