** webigator
**   by Graham Neubig
**   neubig at gmail dot com

Webigator is a program to extract useful information from large amounts of text.
In particular, it was developed considering the information needs after
disasters, such as:

* Locations at which people can evacuate, get water or supplies
* Text about the safety of people in the disaster-affected areas
* Requests for rescue or disaster supplies

At the moment, we are mainly focusing on filtering this information from Twitter.

-------------- Install -------------------

The latest version of webigator can be found here:
under the Eclipse Common License.

webigator works on Linux, and will likely work on Mac or Cygwin. It depends on
a number of libraries, including Boost and XML-RPC, which can be installed on
Ubuntu as follows:

$ sudo apt-get install libboost-all-dev libxmlrpc-c3-dev

In addition, for the Perl interface you will want XML::RPC:

$ cpan XML::RPC

Next, we build the server:

$ autoreconf -i
$ ./configure
$ make

If the build succeeded, the following command should print a help message:

$ src/bin/webigator --help

If this doesn't work, please contact the authors.

---------- Setting up the Web Interface ----------

Webigator runs best as a web interface, so we will want to install this too.
First, you will need to locate two directories: your web server home directory
and your cgi-bin directory. If you are running Apache on Ubuntu with the default
settings, these are /var/www and /usr/lib/cgi-bin respectively. To set up the
web interface, run the following commands.

Create the webigator-demo directory:

$ mkdir /var/www/webigator-demo

Copy all of the images, CSS files, etc. to this directory, and all of the
program files to the cgi-bin directory:

$ cp -r script/cgi/{css,img,js} /var/www/webigator-demo/
$ cp script/cgi/*.{pl,cgi,tpl,pm} /usr/lib/cgi-bin

If this works, you will be able to access the webigator tool at:


(Or replace "localhost" with a different name if you are running on a server.)
There are also a number of settings in /usr/lib/cgi-bin/ that you
can change, such as the UI language and the tokenization method (the default
is by characters, for Japanese).

--------------- Using the tool -------------------

Once installation has finished, run the server:

$ src/bin/webigator

Next, we can create a task by going to the web interface and clicking "create

Next, we start a program that streams data to the webigator server. Eventually
this should gather data from the Twitter stream, but for the time being, let's
assume that you have some data in tab-separated format in the file data.csv:

$ script/ -data data.csv
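
As a rough illustration of the kind of input such a streaming program reads,
the sketch below parses tab-separated lines into (id, text) pairs. The
two-column layout (an integer ID, a tab, then the text) is an assumption based
on the id and text fields of the API; the column order the script actually
expects is not stated here, and read_tsv is a hypothetical helper name.

```python
import csv
import io


def read_tsv(stream):
    """Yield (id, text) pairs from a tab-separated stream.

    Assumed layout (not confirmed by webigator): integer ID, a tab, the text.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        yield int(row[0]), row[1]


# Made-up sample data, for illustration only.
sample = io.StringIO(
    "1\tShelter open at the city hall\n"
    "2\tNeed water near the station\n"
)
records = list(read_tsv(sample))
```

Each pair could then be sent to the server as an unlabeled example to be
scored.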

Next, we access the web interface again, and start labeling the data for the task.

-------------------- API -------------------------

The Perl programs and the server interact through an XML-RPC API:

***** Add a keyword
    id = Data ID (int)
    text = Data (string)
    lab = Is the keyword for useful info or not useful info (1 or 0)

***** Add an unlabeled example to be scored
    id = Data ID (int)
    text = Data (string)

***** Add an instance that has been labeled
    id = Data ID (int)
    text = Data (string)
    lab = Is the data for useful info or not useful info (1 or 0)

***** Return the highest scoring value from the server

***** Rescore all of the values on the server
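
To make the calls above more concrete, here is a minimal sketch that builds
the XML-RPC request body for one labeled instance, using Python's standard
xmlrpc.client module rather than the Perl interface. The method name
"add_labeled" is hypothetical (the real method names are not listed above),
and passing id, text and lab as positional parameters is also an assumption.

```python
import xmlrpc.client

# Hypothetical method name; the real webigator method names are not listed above.
METHOD = "add_labeled"


def build_request(data_id, text, lab):
    """Serialize one labeled instance as an XML-RPC request body.

    Passing id, text and lab as positional parameters is an assumption.
    """
    if lab not in (0, 1):
        raise ValueError("lab must be 1 (useful) or 0 (not useful)")
    return xmlrpc.client.dumps((data_id, text, lab), methodname=METHOD)


payload = build_request(42, "Water is available at the school gym", 1)

# Decode the payload again to check that the round trip preserves the values.
params, method = xmlrpc.client.loads(payload)
```

To send such a request to a running server, one would typically use
xmlrpc.client.ServerProxy with the server's URL, which depends on how the
webigator server was started.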

--------------- TODO ------------------

* Improve the scoring function so it doesn't over-value short text at the beginning
* Create a program to stream tweets directly from Twitter


A program to aggregate, rank, and search text information



