Visualize geo-located tweets in real time, parse them, use them to write bot-assisted poetic-text, then ship that text to people within certain geographic boundaries
Sentence Parse.PNG

Saito Group 1-17-2017


This library is composed of several tools for scraping geolocated tweets and visualizing data gleaned from them. It also has a robotic assistant tool, called suggest_bot, which can help you write poems in the style of a document you pass in. Another tool, called scan_and_respond, lets you scan an area for search terms and then tweet at the people who used them!

Geo-tag your tweets!

We rely on geo-tagged tweets. Please allow your location to be seen when tweeting, especially when using this application! To enable this, log into your main Twitter account and, under "Security and Privacy", check the box next to "Tweet location". THANKS!


git, python 2.7.X, pip

Python packages required: tweepy, nltk, matplotlib, geopy, argparse, curses, bs4 (beautiful soup), locale

On Windows: upgrade powershell (you may still have unicode problems when printing to command line)

python -m pip install

For each required package listed above run:

pip install <package>

Now we need some data, so we'll use the nltk downloader. Run a python shell from the command line:

import nltk

On the main page, highlight book and click download; that should be it. If you want less data, these are the exact nltk packages that are required:

  1. under corpora -> highlight stopwords
  2. under corpora -> highlight treebank
  3. under all packages -> highlight punkt
  4. under models -> highlight averaged-perceptron-tagger

This creates a folder called "nltk_data" in your home folder, which the program uses.
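If you would rather skip the downloader window, the four packages listed above can also be fetched by identifier. This is a minimal sketch, assuming the standard NLTK identifiers `stopwords`, `treebank`, `punkt`, and `averaged_perceptron_tagger`; the helper name `download_required` is made up for this example and is not part of this library.

```python
# Identifiers as they appear in the NLTK downloader (assumed to match the list above).
REQUIRED_NLTK_PACKAGES = [
    "stopwords",
    "treebank",
    "punkt",
    "averaged_perceptron_tagger",
]


def download_required(packages=REQUIRED_NLTK_PACKAGES):
    """Download each package into the default nltk_data folder."""
    import nltk  # imported here so the list above can be inspected without nltk

    results = {}
    for name in packages:
        # nltk.download returns True when the package is installed or up to date
        results[name] = nltk.download(name)
    return results
```

Calling `download_required()` from a python shell needs a network connection and should leave the same "nltk_data" folder behind.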

Navigate to the folder where you want geotweets to be

git clone

Get consumerkeyandsecret (see below) and put it in the folder, cd into the folder, then run from the command line (see below).

Consumer Key and Secret:

The program looks for a file in the geotweets folder called consumerkeyandsecret. This should have at least 2 lines, with the consumer key on the first line and the secret (the longer one) on the second. For streaming and posting it needs 2 more lines: an access token on the 3rd and the access token secret on the 4th. You can get these by creating an app through Twitter's developer site in a web browser, then hitting the button to create access tokens. You may have to set the app permissions to "read and write" if you want to use this to send tweets on your behalf. After creating the app, copy the 4 alphanumeric keys into a blank file called "consumerkeyandsecret" as described above and put this file in your "geotweets" folder.
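To make the file layout concrete, here is a small sketch of how such a file could be read. It is not the library's own loader; the names `Credentials` and `read_credentials` are hypothetical, and it assumes one key per line in the order described above.

```python
from collections import namedtuple

# Hypothetical container; the field names are chosen for this example only.
Credentials = namedtuple(
    "Credentials",
    ["consumer_key", "consumer_secret", "access_token", "access_token_secret"],
)


def read_credentials(path="consumerkeyandsecret"):
    """Read the 2 or 4 non-empty lines of the key file described above."""
    with open(path) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected at least a consumer key and a secret")
    if len(lines) < 4:
        # Fewer than 4 lines: enough for searching, not for streaming or posting.
        return Credentials(lines[0], lines[1], None, None)
    return Credentials(lines[0], lines[1], lines[2], lines[3])
```

A caller can check `access_token is None` to decide whether streaming and posting are available.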



A simple tool, called 'sample', allows you to scrape and save up to 100 geolocated tweets in batch form. You can optionally search within this set for specific words or hashtags and visualize the top word frequencies. For details, run from the command line:

python --help
python --doc


python [-h][-d][-v][-f FILENAME][-o OUTPUT][-vis]


Given a URL, this will scrape a website and save the text to scraped_text.txt.

[-d][-h][-u URL][-o OUTPUT_FILE]
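The idea behind the scraper can be sketched with the standard library alone. Note that this swaps in Python 3's `html.parser` in place of bs4, so it is an illustration of the approach and not the tool's actual code; `html_to_text` and `scrape` are hypothetical names, and only the default file name scraped_text.txt comes from the description above.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text chunks of an HTML string, one per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def scrape(url, output_file="scraped_text.txt"):
    """Fetch a page, strip the markup, and save the text."""
    raw = urlopen(url).read().decode("utf-8", errors="replace")
    text = html_to_text(raw)
    with open(output_file, "w", encoding="utf-8") as handle:
        handle.write(text)
    return text
```

The saved file can then be passed to suggest_bot as an input text.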

real time visualizer:

Another tool, called 'real_time_vis', creates a word frequency distribution chart that grows and changes in near real time as more tweets are grabbed. If you use -s, you'll get streaming results, which are tweets being posted right now. Otherwise you will get batched results every 5 seconds using the REST API, which returns tweets from the recent past. For details, run from the command line:

python --help
python --doc


python [-h][-d][-f FILENAME][-n NUMBER][-s][-a ADDRESS]
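The core of a growing word frequency distribution is a running count that each new batch of tweets is added to. The sketch below shows only that counting step, under the assumption that tweets arrive as plain strings; the function name, the regular expression, and the sample tweets are made up for illustration, and the charting itself is left out.

```python
import re
from collections import Counter

# Words, optionally prefixed by # or @ so hashtags and mentions stay intact.
WORD_PATTERN = re.compile(r"[#@]?\w+")


def update_frequencies(counts, tweets, stopwords=()):
    """Add the words from a new batch of tweet texts to a running Counter."""
    skip = set(word.lower() for word in stopwords)
    for text in tweets:
        for word in WORD_PATTERN.findall(text.lower()):
            if word not in skip:
                counts[word] += 1
    return counts


counts = Counter()
# First batch, then a later batch: the same Counter keeps growing.
update_frequencies(counts, ["Coffee in NYC", "more coffee please"], stopwords=["in"])
update_frequencies(counts, ["#coffee time"])
print(counts.most_common(3))
```

Redrawing a chart from `counts.most_common(n)` after each batch would give the "grow and change" effect described above.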

Both tools use a parameter file with geolocation and search terms. See params.txt for an example.

You may have to adjust your PYTHONPATH variable to run the program from the command line. Otherwise, you can run it from within the python interpreter.


This is a robotically assisted poetry engine. The user can create poems using a large supplied word corpus or use their own. It can also add words to the corpus from the twitter-sphere using the search option. It can also parse those twitter messages into phrases using natural language processing.


python [-d][-h][-p PARAMS][-i INPUT | -m INPUT][-o OUTPUT][-a ADDRESS]
  1. Once you are running the program, call the 's' command to search Twitter. This uses the parameters in the params.txt file as usual.

  2. If you want to parse the tweets and then use phrases, simply respond 'y' to the query after you hit 's'.

  3. There is also a default set of words, which you can use by calling the 'd' command.

  4. You can also load your own corpus, which will then just use those words randomly as suggestions.

  5. While using the word suggester, if you ever find that you made an error, simply hit 'e' and an inline editor will pop up. There is currently an error that was patched but hasn't been pushed to all python versions, so you currently cannot insert words. Sorry!

  6. Finally, I would suggest trying out the markov chain poetry assistant. It can help create poems that mimic the natural statistics of the input text. Simply supply the program with a grammatical text of poems or literature.

python -m <your_text_file_here.txt>
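For readers curious why the markov option needs grammatical text, here is a generic sketch of a word-level markov chain. It is not this library's module; the function names and the default order of 2 are assumptions. Each run of words is mapped to the words that followed it in the input, so suggestions only make sense when the input has natural word order.

```python
import random
from collections import defaultdict


def build_chain(words, order=2):
    """Map each run of `order` words to the words seen right after it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def suggest(chain, state, rng=random):
    """Pick a follower for `state` in proportion to how often it occurred."""
    followers = chain.get(tuple(state))
    return rng.choice(followers) if followers else None


def generate(chain, seed, length=10, rng=random):
    """Extend `seed` word by word until `length` words or a dead end."""
    output = list(seed)
    order = len(seed)
    while len(output) < length:
        next_word = suggest(chain, output[-order:], rng)
        if next_word is None:
            break
        output.append(next_word)
    return output
```

With a text of random words, almost every state has a single follower or none, which is why the -i option treats its input as a plain word list instead.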


This tool scans tweets and asks the user to verify them before sending a tweet response. The relevant tweets are also saved to a JSON file. This requires write access, which means the consumerkeyandsecret file must contain all 4 lines.

[-h] [-d] [-f FILENAME] [-a ADDRESS] [-o OUTPUT]
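The scan, verify, and save flow can be sketched without touching the Twitter API. Everything here is an assumption made for illustration: tweets are plain dictionaries with a "text" key, `confirm` is any function that returns True when the user approves, and tweets.json is a placeholder file name. Sending the reply is deliberately left out.

```python
import json


def matches(text, search_terms):
    """True when any search term occurs in the tweet text (case-insensitive)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in search_terms)


def scan_and_confirm(tweets, search_terms, confirm, output="tweets.json"):
    """Save the relevant tweets to JSON and return the ones the user approved."""
    relevant = [tweet for tweet in tweets if matches(tweet["text"], search_terms)]
    with open(output, "w") as handle:
        json.dump(relevant, handle, indent=2)
    # Only approved tweets would go on to receive a response.
    return [tweet for tweet in relevant if confirm(tweet)]
```

At the command line, `confirm` could be as simple as `lambda tweet: input(tweet["text"] + " reply? [y/n] ") == "y"` (use `raw_input` on python 2.7).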


All programs can be run from the command line (a.k.a. terminal in OS X).

Typing python <program_name> -h gives you help on the various command line options. Typing python <program_name> -d gives you the program's documentation string. If a parameter says something like -o OUTPUT, simply substitute a file for the capitalized word, like so:

python -m my_poetic_text.txt

If a USAGE says something like [-x | -y] then you can only use parameter x OR y but not both.
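Usage strings of this shape are what Python's argparse module prints. The following sketch shows how a placeholder like OUTPUT and an either/or pair like [-i INPUT | -m INPUT] can be declared; the long option names `--input` and `--markov` and the program name are assumptions for this example, not taken from the tools themselves.

```python
import argparse


def build_parser():
    # -h/--help is added by argparse automatically.
    parser = argparse.ArgumentParser(prog="example_tool")
    parser.add_argument("-d", "--doc", action="store_true",
                        help="print the documentation string")
    # metavar is the capitalized placeholder shown in the usage line.
    parser.add_argument("-o", "--output", metavar="OUTPUT",
                        help="file to write results to")
    # Options in this group cannot be combined; usage shows them as [-i ... | -m ...].
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-i", "--input", metavar="INPUT",
                        help="file of words to suggest from")
    source.add_argument("-m", "--markov", metavar="INPUT",
                        help="grammatical text for the markov chain")
    return parser


args = build_parser().parse_args(["-m", "my_poetic_text.txt", "-o", "output.txt"])
print(args.markov, args.output)
```

Passing both -i and -m makes argparse print an error and exit, which is the "x OR y but not both" rule in practice.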


Grabbing geo-located tweets using parameter file params.txt (default), printing to the command line and writing to output.txt (default):

python --verbose

Visualizing the data, using params.txt (default):


Streaming real time data to create a word frequency chart using a local address:

python -a "175 5th Avenue NYC" -s

Scraping a website and saving to an output file:

python -u -o scraped_text.txt

Using suggest_bot with a file of random words, which will NOT be a markov chain:

python -i random_not_necessarily_grammatical_text.txt


These modules contain methods to assist the "tools" listed above:

  1. allows you to tweet at people programmatically
  2. returns geocoordinates for a given address
  3. searches the REST API
  4. creates a multithreaded twitter API streamer
  5. creates a command line editor
  6. creates a markov chain ngram word generator


This program classifies tweets into phrase types and produces a JSON array containing these, called phrases.json. It uses parameters from params.txt. This requires quite a bit of processing time, which can be reduced by using a lower "count".

The below two modules run unit tests:
