TweetMotif is a faceted/topic/summarizing search system for Twitter, built on top of the search.twitter.com API. http://tweetmotif.com
Do you just want the tokenizer?
All you need is two files:
If you use it in research, please cite:
- Brendan O'Connor, Michel Krieger, and David Ahn. TweetMotif: Exploratory Search and Topic Summarization for Twitter. ICWSM-2010.
There is now a Scala port: https://bitbucket.org/jasonbaldridge/twokenize/
More on TweetMotif
By Brendan O'Connor, Michel Krieger, and David Ahn. Written over April-May 2009 and released April 2010.
The TweetMotif paper (inside EXAMPLES_AND_WRITING, or a copy at this link) overviews the system.
- Tokyo Cabinet
- Tokyo Tyrant
- Python: version 2.5 works
There are precompiled versions of the Tokyo infrastructure in platform/, for Mac OS X 10.5 and Ubuntu 8.04-ish. On the off chance they work on your system, uncomment the code that specifies to use them (grep platform *.py). You may also have to muck around with ldconfig (on Linux) to get mod_wsgi, which runs inside Apache, to see them.
You also need to be running Tokyo Tyrant for the query cache. This is usually inconvenient for just getting started; in which case, disable it by commenting out the lines
# the_cache = ....
# @the_cache.wrap
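To illustrate what a wrap-style query cache does, and why commenting out those two lines simply removes it, here is a minimal sketch. It is not TweetMotif's actual cache: DictCache, its enabled flag, and run_query are hypothetical names, and an in-memory dict stands in for Tokyo Tyrant. It is written for a current Python 3 interpreter.

```python
import functools


class DictCache(object):
    """Hypothetical in-memory stand-in for a Tokyo Tyrant-backed query cache."""

    def __init__(self, enabled=True):
        self.enabled = enabled
        self.store = {}

    def wrap(self, func):
        """Decorator: memoize func on its (hashable) positional arguments."""
        @functools.wraps(func)
        def wrapper(*args):
            if not self.enabled:
                # Cache switched off: always call through to the real function.
                return func(*args)
            key = (func.__name__,) + args
            if key not in self.store:
                self.store[key] = func(*args)
            return self.store[key]
        return wrapper


the_cache = DictCache(enabled=True)
calls = []  # records every time the underlying function really runs


@the_cache.wrap
def run_query(q):
    """Hypothetical expensive query; the result shape is a placeholder."""
    calls.append(q)
    return {"query": q, "topics": []}


run_query("coffee")
run_query("coffee")  # served from the cache; run_query's body is not re-run
```

Without the decorator, run_query is just called directly every time, which is the effect of commenting out the cache lines.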
There is a backend and frontend. The backend talks to search.twitter.com and does all text processing, clustering, etc. The frontend is a Django web site with normal and iPhone versions.
The backend makes extensive use of Tokyo Cabinet and Tyrant databases for the language model and the query cache.
Both the backend and frontend are WSGI apps. Everything is set up to run through mod_wsgi. They communicate via JSON-over-HTTP.
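As a rough picture of the JSON-over-HTTP arrangement, the sketch below shows a tiny WSGI app that answers a query with a JSON body, plus a helper that calls it in-process the way a client would decode the reply. Everything here is an assumption made for illustration: backend_app, call_app, the q parameter, and the payload shape are hypothetical and are not taken from the TweetMotif code. It targets Python 3 rather than the Python 2.5 the project was written for.

```python
import json
from urllib.parse import parse_qs


def backend_app(environ, start_response):
    """Hypothetical WSGI backend: reads ?q=... and replies with JSON."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", [""])[0]
    payload = {"query": query, "topics": []}  # placeholder result shape
    body = json.dumps(payload).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, query_string):
    """Invoke a WSGI app in-process and decode its JSON reply."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": query_string}
    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body.decode("utf-8"))


status, data = call_app(backend_app, "q=coffee")
```

In a real deployment the frontend would issue an HTTP request to the backend's URL instead of calling the app in-process; the in-process call just keeps the sketch self-contained.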
The backend is run through, confusingly enough, frontend.py. It also has a primitive frontend for development purposes there.
The frontend is Django. See djfrontend/.
TweetMotif is licensed under the Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0.html
Copyright Brendan O'Connor, Michel Krieger, and David Ahn, 2009-2010.