#BirdWatch
BirdWatch is a reactive web application for visualizing a stream of live Tweets. It makes use of AngularJS, Bootstrap, Crossfilter, D3.js, ElasticSearch and Play Framework (in alphabetical order).
Here is an overview of the information flow in the system:
A Play application connects to the Twitter Streaming API and receives all Tweets that include at least one of a set of configured words. Twitter caps this stream at 1% of the FireHose, which means that the application will never receive more than one percent of all Tweets at any given moment. This limit still amounts to millions of Tweets per day, so a well-defined area of interest should fit in comfortably.
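The keyword filter described above can be pictured as a "match any term" test. This is a minimal sketch, not the application's code and not Twitter's implementation: `TRACKED_TERMS` and `matches_any_term` are hypothetical names, the example terms are made up, and the whitespace tokenization is a deliberate simplification.

```python
# Hypothetical illustration of "at least one of a set of configured words".
TRACKED_TERMS = {"scala", "elasticsearch", "angularjs"}  # assumed example terms


def matches_any_term(text, terms=TRACKED_TERMS):
    """Return True if any whitespace-separated word of text is a tracked term."""
    # Lower-case each word and strip common leading/trailing punctuation.
    words = {word.lower().strip("#@.,!?:;") for word in text.split()}
    return not words.isdisjoint(terms)


print(matches_any_term("Learning #Scala today!"))
print(matches_any_term("Nice weather"))
```

The real term list for an instance lives in its configuration; the set above only stands in for it.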
Incoming Tweets are inserted into an ElasticSearch index, where they are available for querying almost instantly. Each Tweet is also matched against the percolation queries: every connected client has a pre-registered query, and every such query is run on every new Tweet. Whenever a query matches a Tweet, the corresponding client is informed immediately by means of Server-Sent Events (SSE).
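The percolation idea can be sketched in a few lines of plain Python. This is an in-memory illustration under stated assumptions, not the ElasticSearch percolator API and not the application's code: the `Percolator` class, `sse_frame`, the client ids and the sample Tweet are all hypothetical. The only external fact relied on is the SSE wire format, in which a message is a `data:` line followed by a blank line.

```python
import json


class Percolator:
    """Keeps one pre-registered query per client and runs all of them on each new Tweet."""

    def __init__(self):
        self.queries = {}  # client id -> predicate taking a Tweet dict

    def register(self, client_id, predicate):
        self.queries[client_id] = predicate

    def percolate(self, tweet):
        """Return the ids of all clients whose query matches the Tweet."""
        return [cid for cid, matches in self.queries.items() if matches(tweet)]


def sse_frame(tweet):
    """Format a Tweet as one Server-Sent Events message (a data line plus a blank line)."""
    return "data: " + json.dumps(tweet) + "\n\n"


percolator = Percolator()
percolator.register("client-a", lambda t: "scala" in t["text"].lower())
percolator.register("client-b", lambda t: "java" in t["text"].lower())

tweet = {"id": 1, "text": "Scala and Play make a nice pair"}
matched = percolator.percolate(tweet)
for client_id in matched:
    # In a real server this frame would be written to the client's open SSE connection.
    print(client_id, repr(sse_frame(tweet)))
```

The point of the design is that the direction of search is inverted: instead of running one query over many stored documents, each new document is run against many stored queries.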
AngularJS clients hold a local copy of all the Tweets they have asked for. Queries use the ElasticSearch query syntax, with 'AND' being the default operator. Every query is not only run on the existing Tweets in the ElasticSearch index but is also registered as a percolation query. A user-selectable number of previous Tweets is loaded, and then every new Tweet that matches the query is appended immediately, allowing Tweet analysis in near real time. Queries are bookmarkable, which makes it easy to revisit interesting and potentially complex queries.
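The client-side flow (load a number of earlier matches, then append live matches) can be sketched as follows. This is a simplified model, not the AngularJS code: `ClientView`, `parse_query` and `matches` are hypothetical names, Tweets are reduced to plain strings, and the query handling covers only space-separated terms combined with AND, whereas the real ElasticSearch query syntax supports far more.

```python
def parse_query(query):
    """Split a query into lower-cased terms; all must match (AND as the default operator)."""
    return [term.lower() for term in query.split()]


def matches(query, tweet_text):
    """Return True if every query term occurs as a word in the Tweet text."""
    words = set(tweet_text.lower().split())
    return all(term in words for term in parse_query(query))


class ClientView:
    """Local copy of the Tweets a client asked for."""

    def __init__(self, query, indexed_tweets, preload):
        self.query = query
        hits = [t for t in indexed_tweets if matches(query, t)]
        # Load at most `preload` of the most recent existing matches.
        self.tweets = hits[-preload:] if preload > 0 else []

    def on_new_tweet(self, text):
        # Live Tweets are appended only when the registered query matches.
        if matches(self.query, text):
            self.tweets.append(text)


index = ["play scala intro", "scala tips", "play framework scala", "java news"]
view = ClientView("scala play", index, preload=1)
view.on_new_tweet("new play release for scala")
view.on_new_tweet("scala only")
print(view.tweets)
```

Here `preload` stands in for the user-selectable number of previous Tweets, and `on_new_tweet` for the handler that would receive matches pushed by the server.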
Client-side analysis of the (live) search result is performed using Crossfilter.
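To give a rough idea of what such an analysis involves, the sketch below filters records along one dimension and then groups the remainder. It is plain Python rather than Crossfilter, which is a JavaScript library with its own API; the function names and the sample records are invented for illustration.

```python
from collections import Counter

# Invented sample data standing in for the client's local Tweet copy.
tweets = [
    {"user": "alice", "followers": 120, "text": "scala play"},
    {"user": "bob", "followers": 15000, "text": "play framework"},
    {"user": "alice", "followers": 120, "text": "scala rocks"},
]


def word_counts(records):
    """Group: count how often each word occurs across the given Tweets."""
    return Counter(word for r in records for word in r["text"].split())


def filter_by_followers(records, low, high):
    """Dimension filter: keep Tweets whose author's follower count lies in [low, high)."""
    return [r for r in records if low <= r["followers"] < high]


small_accounts = filter_by_followers(tweets, 0, 1000)
print(word_counts(small_accounts).most_common(1))
```

Because the data already sits in the browser, changing a filter only requires recomputing the groups locally; no further round trip to the server is needed.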
A live version of this application is available. This instance listens to a number of software- and data-related terms; see the application.conf file for details. Interesting queries on this data set include:
Please feel free to contribute; pull requests are happily accepted. I use this project to study the technologies involved, and I would appreciate learning better ways of doing things.
A detailed description of the application can be found on my blog.
##Setup
Play Framework: you need a JVM on your machine. On a Mac, the easiest way is then to install Play using Homebrew:
    brew install play
If Homebrew was already installed on your machine, you may want to run this first:
    brew update
    brew upgrade
You also need ElasticSearch:
    brew install elasticsearch
You then run it:
    elasticsearch -f
Before running the Play application for the first time, you need to create the percolation index:
    curl -XPUT localhost:9200/queries
Then, inside the application folder, run:
    play run
A Twitter API consumer key and access token are required to consume the Twitter Streaming API. You need to create a Twitter application and store its keys and secrets in a twitter.conf file, using the commented-out section in application.conf as a template.
That should be all there is to it before you can run your own instance listening on localhost:9000.
You may want to remove the Google Analytics script in main.scala.html or adapt the Analytics setting in application.conf to your own needs.
###Streaming API limitations
Please be aware that only one connection to the Twitter Streaming API is possible from any one public IP address. Starting a connection to the Streaming API will potentially end other connections from the same network if NAT is in place using the same public IP address. Access from mobile networks is discouraged and most likely won't work.
This software is licensed under the Apache 2 license, quoted below.
Copyright © 2013 Matthias Nehlsen.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this project except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.