Extract content from tweets
Ruby
Latest commit 334026e Sep 21, 2010 @threedaymonk threedaymonk Version bump.

tweetparser

Extract content from tweets in the form of an s-expression.

Usage

require "tweetparser"
tweet = "Hey @threedaymonk, here is a tweet with #hashtags and a http://example.com/url"
TweetParser.parse(tweet)

This gives:

[[:text, "Hey"], [:space, " "],
 [:atref, "@threedaymonk"], [:text, ","], [:space, " "],
 [:text, "here"], [:space, " "],
 [:text, "is"], [:space, " "],
 [:text, "a"], [:space, " "],
 [:text, "tweet"], [:space, " "],
 [:text, "with"], [:space, " "],
 [:hashtag, "#hashtags"], [:space, " "],
 [:text, "and"], [:space, " "],
 [:text, "a"], [:space, " "],
 [:url, "http://example.com/url"]]
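Because the result is a plain array of [type, value] pairs, ordinary Enumerable methods are enough to work with it. The following is a minimal sketch, not part of tweetparser: values_of is a hypothetical helper, and the parts array is a literal copy of the output above so that the snippet runs without the gem installed.

```ruby
# Literal copy of the output shown above; in real use this array would
# come from TweetParser.parse(tweet).
parts = [
  [:text, "Hey"], [:space, " "],
  [:atref, "@threedaymonk"], [:text, ","], [:space, " "],
  [:text, "here"], [:space, " "],
  [:text, "is"], [:space, " "],
  [:text, "a"], [:space, " "],
  [:text, "tweet"], [:space, " "],
  [:text, "with"], [:space, " "],
  [:hashtag, "#hashtags"], [:space, " "],
  [:text, "and"], [:space, " "],
  [:text, "a"], [:space, " "],
  [:url, "http://example.com/url"]
]

# values_of is a hypothetical helper (an assumption for illustration):
# it collects the values of every part with the given type.
def values_of(parts, type)
  parts.select { |kind, _| kind == type }.map { |_, value| value }
end

hashtags = values_of(parts, :hashtag)
urls     = values_of(parts, :url)

# For this output, joining the values in order gives back the input tweet.
rebuilt = parts.map { |_, value| value }.join
```

Keeping whitespace and punctuation as separate parts is what makes the round trip in the last line possible for this example.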

The full list of tweet parts recognised is as follows:

  • :url (http://example.com/ or www.example.com)
  • :username (@username); this was :atref in version 0.1.0, and the example output above still uses the older :atref name
  • :list (@username/listname)
  • :hashtag (#hashtag)
  • :slash (/via)
  • :text
  • :space (whitespace between parts, as in the example output above)
  • :newline
  • :html (pre-composed HTML)
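One way to use these part types is to dispatch on them when turning a parsed tweet into HTML. The sketch below is an illustration under stated assumptions, not the library's own renderer: render_part, render and escape_html are hypothetical helpers, the twitter.com link targets are illustrative, and passing :html values through unescaped is a guess at what "pre-composed HTML" implies.

```ruby
# Minimal HTML escaping so the sketch needs no extra libraries.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# render_part is a hypothetical helper, not part of tweetparser.
# The link targets are assumptions chosen for illustration.
def render_part(type, value)
  safe = escape_html(value)
  case type
  when :url
    # Bare "www.example.com" URLs need a scheme to work as a link target.
    href = value =~ %r{\Ahttps?://} ? value : "http://#{value}"
    %(<a href="#{escape_html(href)}">#{safe}</a>)
  when :username, :list
    # "@name" or "@name/listname" becomes a path without the leading "@".
    path = escape_html(value.sub(/\A@/, ""))
    %(<a href="https://twitter.com/#{path}">#{safe}</a>)
  when :hashtag
    tag = escape_html(value.sub(/\A#/, ""))
    %(<a href="https://twitter.com/hashtag/#{tag}">#{safe}</a>)
  when :newline
    "<br>"
  when :html
    value # assumed to be safe, pre-composed HTML; passed through as-is
  else
    safe # :text, :slash and any type this sketch does not know about
  end
end

def render(parts)
  parts.map { |type, value| render_part(type, value) }.join
end
```

For example, render(TweetParser.parse(tweet)) would return a single HTML string. Unknown types fall through to escaped text, so the sketch degrades gracefully if the list of part types changes between versions.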

Dependencies

  • treetop
  • polyglot
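
A quick way to confirm that both dependencies are installed is to try requiring them. This is only a convenience sketch; the variable name missing is arbitrary, and the snippet reports what it cannot load rather than installing anything.

```ruby
# Try to load each dependency and collect the ones that are not installed.
missing = %w[treetop polyglot].reject do |lib|
  begin
    require lib
    true
  rescue LoadError
    false
  end
end

puts "Missing gems: #{missing.join(', ')}" unless missing.empty?
```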

After checking out the code via git, you need to fetch the conformance test submodule:

git submodule init
git submodule update

Known bugs

  • The maximum length of a username or list is not checked.
  • A username etc. immediately following punctuation is not recognised.
  • Japanese text is not handled correctly.
  • Hashtags containing accents are not supported.
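
Until the length check is implemented in the parser, a caller could work around the first bug by post-processing the result. The sketch below is a possible workaround, not part of tweetparser: demote_long_usernames is a hypothetical helper, and the limit of 15 characters is an assumption about Twitter's username rules that should be verified before relying on it.

```ruby
MAX_USERNAME_LENGTH = 15 # assumed limit, excluding the leading "@"

# demote_long_usernames is a hypothetical helper, not part of tweetparser.
# It turns over-long :username parts back into plain :text parts and
# leaves every other part unchanged.
def demote_long_usernames(parts, max = MAX_USERNAME_LENGTH)
  parts.map do |type, value|
    name = value.sub(/\A@/, "")
    if type == :username && name.length > max
      [:text, value]
    else
      [type, value]
    end
  end
end
```

It would be applied directly to the parser output, for example demote_long_usernames(TweetParser.parse(tweet)). A similar filter could be written for :list parts if a limit for list names is known.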