Get Old Tweets Programmatically

A project written in Python to get old tweets. It bypasses some limitations of the official Twitter API.

Details

The official Twitter API has a bothersome time limitation: you can't get tweets older than a week. Some tools provide access to older tweets, but most of them require payment up front. I searched for other tools to do this job and didn't find any, so I analyzed how Twitter Search works in the browser until I understood its flow. Basically, when you open a Twitter page a scroll loader starts; as you scroll down you get more and more tweets, all through calls to a JSON provider. By mimicking those calls we get the main advantage of Twitter Search in the browser: it can reach the deepest, oldest tweets.
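
The scroll-loader flow described above can be sketched as a cursor-following loop. This is only an illustration, not the project's actual code: `fetch_page`, `collect_tweets`, and the fake in-memory pages are hypothetical names standing in for calls to the JSON provider, and the cursor format is an assumption.

```python
def collect_tweets(fetch_page, max_tweets=0):
    """Follow cursors until the provider runs dry or max_tweets is reached.

    fetch_page(cursor) is a hypothetical callable returning
    (items, next_cursor); it stands in for one call to the JSON
    provider behind the scroll loader.
    """
    results = []
    cursor = None  # None means "first page", an assumption of this sketch
    while True:
        items, cursor = fetch_page(cursor)
        if not items:
            break
        results.extend(items)
        # A limit lower than 1 means "retrieve everything".
        if max_tweets > 0 and len(results) >= max_tweets:
            return results[:max_tweets]
        if cursor is None:
            break
    return results


# Fake in-memory provider so the sketch runs without network access.
FAKE_PAGES = {
    None: (["t1", "t2"], "c1"),
    "c1": (["t3", "t4"], "c2"),
    "c2": (["t5"], None),
}


def fake_fetch(cursor):
    return FAKE_PAGES[cursor]


all_tweets = collect_tweets(fake_fetch)
first_three = collect_tweets(fake_fetch, max_tweets=3)
print(all_tweets)
print(first_three)
```

Each iteration corresponds to one "scroll down": the cursor returned with a page is sent back to ask for the next, older page.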

Prerequisites

This package assumes Python 2.x. The Python 3 "got3" folder is maintained as experimental and is not officially supported.

The expected package dependencies are listed in the "requirements.txt" file for pip. Run the following command to install them:

pip install -r requirements.txt

Components

  • Tweet: Model class that holds information about a specific tweet.

    • id (str)
    • permalink (str)
    • username (str)
    • text (str)
    • date (date)
    • retweets (int)
    • favorites (int)
    • mentions (str)
    • hashtags (str)
    • geo (str)
  • TweetManager: A manager class that retrieves tweets as Tweet model objects.

    • getTweets (TweetCriteria): Return the list of tweets retrieved by using an instance of TweetCriteria.
  • TweetCriteria: A collection of search parameters to be used together with TweetManager.

    • setUsername (str): An optional specific username from a twitter account. Without "@".
    • setSince (str. "yyyy-mm-dd"): A lower bound date to restrict search.
    • setUntil (str. "yyyy-mm-dd"): An upper bound date to restrict search.
    • setQuerySearch (str): A query text to be matched.
    • setTopTweets (bool): If True only the Top Tweets will be retrieved.
    • setNear (str): A reference location area from where tweets were generated.
    • setWithin (str): A distance radius from "near" location (e.g. 15mi).
    • setMaxTweets (int): The maximum number of tweets to be retrieved. If this number is unset or lower than 1, all possible tweets will be retrieved.
  • Main: Examples of how to use.

  • Exporter: Export tweets to a csv file named "output_got.csv".
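
The criteria setters are meant to be chained, which works when each setter returns the object itself. The following is a minimal sketch of that pattern only: `CriteriaSketch` is a hypothetical stand-in written for illustration, not the project's class, and its internal `params` dictionary is an assumption.

```python
class CriteriaSketch(object):
    """Hypothetical stand-in for a criteria object; not the project's class."""

    def __init__(self):
        self.params = {}

    def _set(self, key, value):
        self.params[key] = value
        return self  # returning self is what makes chaining possible

    def setUsername(self, username):
        return self._set("username", username)

    def setSince(self, since):
        return self._set("since", since)

    def setUntil(self, until):
        return self._set("until", until)

    def setMaxTweets(self, max_tweets):
        return self._set("max_tweets", max_tweets)


criteria = (CriteriaSketch()
            .setUsername("barackobama")
            .setSince("2015-09-10")
            .setUntil("2015-09-12")
            .setMaxTweets(1))
print(sorted(criteria.params.items()))
```

Because every setter hands back the same object, the search can be described in one expression and passed straight to a manager.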

Examples of python usage

  • Get tweets by username
    tweetCriteria = got.manager.TweetCriteria().setUsername('barackobama').setMaxTweets(1)
    tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]

    print(tweet.text)
  • Get tweets by query search
    tweetCriteria = got.manager.TweetCriteria().setQuerySearch('europe refugees').setSince("2015-05-01").setUntil("2015-09-30").setMaxTweets(1)
    tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]

    print(tweet.text)
  • Get tweets by username and bound dates
    tweetCriteria = got.manager.TweetCriteria().setUsername("barackobama").setSince("2015-09-10").setUntil("2015-09-12").setMaxTweets(1)
    tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]

    print(tweet.text)
  • Get the last 10 top tweets by username
    tweetCriteria = got.manager.TweetCriteria().setUsername("barackobama").setTopTweets(True).setMaxTweets(10)
    # Take the first of the (up to) 10 tweets returned
    tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]

    print(tweet.text)

Examples of command-line usage

  • Get usage help
    python Exporter.py -h
  • Get tweets by username
    python Exporter.py --username "barackobama" --maxtweets 1
  • Get tweets by query search
    python Exporter.py --querysearch "europe refugees" --maxtweets 1
  • Get tweets by username and bound dates
    python Exporter.py --username "barackobama" --since 2015-09-10 --until 2015-09-12 --maxtweets 1
  • Get the last 10 top tweets by username
    python Exporter.py --username "barackobama" --maxtweets 10 --toptweets

Extensions for retrieving tweets with Dow Jones cashtags

Getting tweets

  • To loop through all dates from Sept 1-Dec 31 in 2016 for all Dow Jones symbols, run sh loop_through_symbols_dates.sh
  • To loop through only a subset of symbols or a subset of dates (or dates in 2017), you can change the date ranges and symbol ranges easily in the file.
    • You can also run sh loop_through_dates.sh if you are only interested in tweets about one symbol or search term. You will need to change the parameters in that file.
  • To then merge the files for each symbol into one CSV file, run sh concatenate_files.sh
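
For readers who prefer Python to shell, the date loop can be sketched as below. This is an illustration under assumptions, not a port of the shell scripts: `daily_windows` and `exporter_command` are hypothetical helpers, one-day windows are assumed, and only the `--querysearch`, `--since`, and `--until` options shown earlier in this README are used.

```python
from datetime import date, timedelta


def daily_windows(start, end):
    """Yield (since, until) 'yyyy-mm-dd' pairs, one per day from start to end inclusive."""
    day = start
    while day <= end:
        next_day = day + timedelta(days=1)
        yield day.isoformat(), next_day.isoformat()
        day = next_day


def exporter_command(term, since, until):
    """Build the argument list for one Exporter.py run (hypothetical helper)."""
    return ["python", "Exporter.py",
            "--querysearch", term,
            "--since", since,
            "--until", until]


windows = list(daily_windows(date(2016, 9, 1), date(2016, 12, 31)))
print(len(windows))
print(exporter_command("$aapl", *windows[0]))
```

The argument list could be handed to `subprocess.call` once per window and per symbol; whether the upper bound is treated as exclusive is an assumption to verify against real output.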

Sentiment analysis of tweets

  • The script get_sentiment_scores.py saves the sentiment scores for tweets for a stock. Usage (once you have all the CSV files for that stock in data/): python get_sentiment_scores.py $STOCKCASHTAG. E.g. python get_sentiment_scores.py \$aapl
    • In its current form the first character is discarded: what we really want is 'aapl'. So you could run python get_sentiment_scores.py 0aapl and it would be the same. You can change this in the Python file.
  • The script sentiment_for_all_symbols.sh loops through all 30 Dow Jones index symbols and runs python get_sentiment_scores.py $STOCKCASHTAG for each one. As before, you need to have all the CSVs ready in data/ before you run this.
    • Usage: sh sentiment_for_all_symbols.sh.
  • The scores are in scores/ and the mean scores are in mean_scores/
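
The first-character behaviour noted above can be illustrated with a small sketch. Both function names are hypothetical and not taken from get_sentiment_scores.py; the second function is one possible change, shown only as an assumption about what a stricter version might look like.

```python
def symbol_from_argument(arg):
    """Mimic the described behaviour: drop the first character, whatever it is."""
    return arg[1:]


def symbol_from_cashtag(arg):
    """Possible alternative: strip a leading '$' only when it is present."""
    return arg[1:] if arg.startswith("$") else arg


print(symbol_from_argument("$aapl"))
print(symbol_from_argument("0aapl"))
print(symbol_from_cashtag("aapl"))
```

With the first function, "$aapl" and "0aapl" both yield the same symbol, which matches the note above; the second accepts the symbol with or without the "$".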