As part of my learning process, I rewrote Storm's wordcount topology in Scala with some extra toys: it uses Lucene's ShingleFilter to count word 2-grams in the Twitter sample firehose. An output bolt pushes the results into Redis.
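
To make the idea of counting word 2-grams concrete, here is a minimal sketch in plain Scala. It is not the topology's code: it swaps in Scala's `sliding(2)` for Lucene's ShingleFilter, uses a simple regex tokenizer of my own choosing, and leaves out Storm and Redis entirely. The names `BigramCount`, `tokenize`, `bigrams`, and `countBigrams` are hypothetical and exist only for illustration.

```scala
object BigramCount {
  // Assumption: a crude tokenizer that lowercases the text and splits on
  // anything that is not a letter, digit, or apostrophe. Lucene's
  // tokenizers behave differently in the details.
  def tokenize(text: String): Vector[String] =
    text.toLowerCase.split("[^\\p{L}\\p{N}']+").filter(_.nonEmpty).toVector

  // Adjacent word pairs joined by a single space. Texts with fewer than
  // two tokens yield no pairs, because the pattern only matches windows
  // of exactly two elements.
  def bigrams(text: String): Vector[String] =
    tokenize(text).sliding(2).collect { case Vector(a, b) => s"$a $b" }.toVector

  // Count every 2-gram across a batch of texts (for example, tweet bodies).
  def countBigrams(texts: Seq[String]): Map[String, Int] =
    texts.flatMap(bigrams).groupBy(identity).map { case (k, v) => k -> v.size }
}
```

For example, `BigramCount.countBigrams(Seq("to be or not to be", "to be sure"))` maps `"to be"` to 3 and every other pair to 1. In the real topology the counts are kept incrementally per bolt rather than computed over a batch like this.
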
Sample results:
Uses language detection code from