A program to aggregate, rank, and search text information

** webigator
**   by Graham Neubig
**   neubig at gmail dot com

Webigator is a program to extract useful information from large amounts of text.
In particular, it was developed considering the information needs after
disasters, such as:

* Locations at which people can evacuate, get water or supplies
* Text about the safety of people in the disaster-affected areas
* Requests for rescue or disaster supplies

At the moment, we are mainly focusing on filtering this information from Twitter.

-------------- Install -------------------

The latest version of webigator can be found here:
http://www.phontron.com/webigator
http://www.github.com/neubig/webigator
under the Eclipse Common License.

webigator works on Linux, and will likely work on Mac or Cygwin. It
depends on a number of libraries, including Boost and XML-RPC, which can be
installed as follows on Ubuntu:

$ sudo apt-get install libboost-all-dev libxmlrpc-c3-dev

In addition, for the Perl interface you will want XML::RPC:

$ cpan XML::RPC

Next, we build the server:

$ autoreconf -i
$ ./configure
$ make

If the build succeeded, the following command should print a help message:

$ src/bin/webigator --help

If it does not, please contact the authors.

---------- Setting up the Web Interface ----------

Webigator runs best as a web interface, so we will want to install this too.
First, you will need to locate two directories: your web server home directory
and your cgi-bin directory. If you are running Apache on Ubuntu with the default
settings, these are /var/www and /usr/lib/cgi-bin respectively. To set up
the web interface, run the following commands.

Create the webigator-demo directory:

$ mkdir /var/www/webigator-demo

Copy all of the images, CSS files, etc. to this directory, and all of the program
files to the cgi-bin directory:

$ cp -r script/cgi/{css,img,js} /var/www/webigator-demo/
$ cp script/cgi/*.{pl,cgi,tpl,pm} /usr/lib/cgi-bin

If this works, you will be able to access the webigator tool at:

http://localhost/cgi-bin/webigator-demo.cgi

(Or replace "localhost" with a different name if you are running on a server.)
There are also a number of settings in /usr/lib/cgi-bin/settings.pl that you
can change, such as the UI language and the tokenization method (the default
is by characters, for Japanese).

--------------- Using the tool -------------------

Once installation has finished, run the server:

$ src/bin/webigator

Next, we can create a task by going to the web interface and clicking "create
task."

Then, start a program that streams data to the webigator server. Eventually this
should gather data from the Twitter stream, but for the time being, let's
assume that you have some data in tab-separated format in the file data.csv:

$ script/data-streamer.pl -data data.csv
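
The column layout of data.csv is not specified in this README. As a minimal
sketch, assuming each line holds an integer ID and the text separated by a
single tab (a hypothetical layout, not the documented format), such a file
could be read like this in Python. The function name read_tsv is made up for
illustration and is not part of webigator:

```python
import csv
import io


def read_tsv(handle):
    """Yield (id, text) pairs from a tab-separated stream.

    Assumed layout (hypothetical): column 1 is an integer ID,
    column 2 is the text.
    """
    # QUOTE_NONE keeps quotation marks inside the text untouched.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        yield int(row[0]), row[1]


# In-memory stand-in for data.csv, so the sketch runs without a file.
sample = io.StringIO(
    "1\tShelter open at the city gym\n"
    "2\tRoad closed near the bridge\n"
)
records = list(read_tsv(sample))
```

To read a real file, pass an open file handle instead of the in-memory sample.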

Next, we access the web interface again, and start labeling the data for the task.

-------------------- API -------------------------

The Perl programs and the server interact through an XML-RPC API:

***** Add a keyword
add_keyword:
    id = Data ID (int)
    text = Data (string)
    lab = Is the keyword for useful info or not useful info (1 or 0)

***** Add an unlabeled example to be scored
add_unlabeled:
    id = Data ID (int)
    text = Data (string)

***** Add an instance that has been labeled
add_labeled:
    id = Data ID (int)
    text = Data (string)
    lab = Is the data for useful info or not useful info (1 or 0)

***** Return the highest scoring value from the server
pop_best:

***** Rescore all of the values on the server
rescore:
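
To illustrate how a call such as add_labeled might look on the wire, here is a
minimal Python sketch that serializes an XML-RPC request using only the
standard library. It assumes the parameters are packed into a single struct
keyed by the names listed above; the server may instead expect positional
parameters, so treat this as an assumption to check against the Perl scripts.
The helper name encode_call is hypothetical:

```python
import xmlrpc.client


def encode_call(method, **params):
    """Serialize one XML-RPC request with the parameters packed in a struct.

    Assumption: the server accepts a single struct argument; it may
    instead expect positional parameters.
    """
    return xmlrpc.client.dumps((params,), methodname=method)


# Build an add_labeled request for one example marked as useful (lab = 1).
request_xml = encode_call(
    "add_labeled",
    id=42,
    text="Water available at the town hall",
    lab=1,
)

# Decode it again to confirm what a server would receive.
decoded_params, decoded_method = xmlrpc.client.loads(request_xml)
```

To send requests to a running server rather than just encoding them,
xmlrpc.client.ServerProxy can be pointed at the server's address, which is not
documented in this README.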

--------------- TODO ------------------

* Improve the scoring function so it doesn't over-value short text at the beginning
* Create a program to stream tweets directly from Twitter