Topical search for Twitter. See twokenize.py, emoticons.py for tokenization.
Switch branches/tags
Nothing to show
Pull request Compare This branch is 2 commits behind brendano:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
everything_else
README.md
emoticons.py
twokenize.py

README.md

TweetMotif

TweetMotif is a faceted/topic/summarizing search system for Twitter, built on top of the search.twitter.com API. http://tweetmotif.com

Do you just want the tokenizer?

All you need is two files:

If you use it in research, please cite:

  • Brendan O'Connor, Michel Krieger, and David Ahn. TweetMotif: Exploratory Search and Topic Summarization for Twitter. ICWSM-2010.

Scala port

There is now a Scala port: https://bitbucket.org/jasonbaldridge/twokenize/

More on TweetMotif

By Brendan O'Connor, Michel Krieger, and David Ahn. Written over April-May 2009 and released April 2010.

The TweetMotif paper (inside EXAMPLES_AND_WRITING, or a copy at this link) overviews the system.

Running TweetMotif

Prerequisites

  • Tokyo Cabinet
  • Tokyo Tyrant
  • mod_wsgi
  • Python: version 2.5 works

There are precompiled versions of the Tokyo infrastructure in platform/, for Mac OSX 10.5 and Ubuntu 8.04-ish. In the off-chance they will work for your system, uncomment the code that specifies to use them (grep platform *.py). You may also have to muck around with ld.so.conf.d and ldconfig (on Linux) to get mod_wsgi, which is inside Apache, to see them.

You also need to be running Tokyo Tyrant for the query cache. This is usually inconvenient for just getting started; in which case, disable it by commenting out the lines

# the_cache = ....
# @the_cache.wrap

In query_cache.py

Architecture

There is a backend and frontend. The backend talks to search.twitter.com and does all text processing, clustering, etc. The frontend is a Django web site with normal and iPhone versions.

The backend makes extensive use of Tokyo Cabinet and Tyrant databases: for the language model, and the query cache.

Both the backend and frontend are WSGI apps. Everything is set up to run through mod_wsgi. They communicate via JSON-over-HTTP.

Backend

The backend is run through, confusingly enough, frontend.py. It also has a primitive frontend for development purposes there.

Frontend

The frontend is Django. See djfrontend/.

License

TweetMotif is licensed under the Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0.html

Copyright Brendan O'Connor, Michel Krieger, and David Ahn, 2009-2010.