Skip to content
This repository

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Extract content from tweets

branch: master

Fetching latest commit…

Octocat-spinner-32-eaf2f5

Cannot retrieve the latest commit at this time

Octocat-spinner-32 lib
Octocat-spinner-32 test
Octocat-spinner-32 .gitignore
Octocat-spinner-32 .gitmodules
Octocat-spinner-32 COPYING
Octocat-spinner-32 README.md
Octocat-spinner-32 Rakefile
README.md

tweetparser

Extract content from tweets in the form of an s-expression.

Usage

require "tweetparser"
tweet = "Hey @threedaymonk, here is a tweet with #hashtags and a http://example.com/url"
TweetParser.parse(tweet)

This gives:

[[:text, "Hey"], [:space, " "],
 [:atref, "@threedaymonk"], [:text, ","], [:space, " "],
 [:text, "here"], [:space, " "],
 [:text, "is"], [:space, " "],
 [:text, "a"], [:space, " "],
 [:text, "tweet"], [:space, " "],
 [:text, "with"], [:space, " "],
 [:hashtag, "#hashtags"], [:space, " "],
 [:text, "and"], [:space, " "],
 [:text, "a"], [:space, " "],
 [:url, "http://example.com/url"]]

The full list of tweet parts recognised is as follows:

  • :url (http://example.com/ or www.example.com)
  • :username (@username) This was :atref in version 0.1.0
  • :list (@username/listname)
  • :hashtag (#hashtag)
  • :slash (/via)
  • :text
  • :newline
  • :html (pre-composed HTML)

Dependencies

  • treetop
  • polyglot

After checking out the code via git, you need to fetch the conformance test submodule:

git submodule init
git submodule update

Known bugs

  • The maximum length of a username or list is not checked.
  • A username etc. immediately following punctuation is not recognised.
  • Japanese text is not handled correctly.
  • Hashtags containing accents are not supported.
Something went wrong with that request. Please try again.