forked from lintool/warcbase

A web archive browser built on HBase

graemon/warcbase

WarcBase

A web archive browser built on HBase.

Getting Started

Once you check out the repo, build WarcBase:

$ mvn clean package appassembler:assemble

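Ingesting WARC files (the setenv line is csh/tcsh syntax; in sh or bash, use export CLASSPATH_PREFIX="/etc/hbase/conf/" instead):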

$ setenv CLASSPATH_PREFIX "/etc/hbase/conf/"
$ sh target/appassembler/bin/IngestWarcFiles -dir /path/to/warc/ -name archive_name -create

Command-line options:

  • Use the -dir option to specify the directory containing the WARC files.
  • Use the -name option to specify the name of the archive, which is also used as the HBase table name.
  • Use the -create option to create a new table; if a table with the same name already exists, it is dropped first. Alternatively, use the -append option to add to an existing table.
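
Because -create drops an existing table, it can help to make the choice between the two modes explicit. The following is a minimal sketch, not part of WarcBase: build_ingest_cmd is a hypothetical helper that only prints the command line (it does not run it), and it assumes, based on the option descriptions above, that exactly one of -create or -append is passed.

```shell
# Hypothetical helper (not shipped with WarcBase): print the IngestWarcFiles
# command for a given directory, archive name, and mode.
# Assumption: exactly one of -create / -append is used per invocation.
build_ingest_cmd() {
  dir=$1
  name=$2
  mode=$3

  # Accept only the two modes described in the option list.
  case "$mode" in
    create|append) ;;
    *)
      echo "mode must be 'create' or 'append'" >&2
      return 1
      ;;
  esac

  # Arguments are printed unquoted; paths containing spaces would need quoting.
  printf 'sh target/appassembler/bin/IngestWarcFiles -dir %s -name %s -%s\n' \
    "$dir" "$name" "$mode"
}
```

For example, build_ingest_cmd /path/to/warc/ archive_name append prints the -append form of the ingest command, which leaves an existing table in place.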

Starting the browser:

$ setenv CLASSPATH_PREFIX "/etc/hbase/conf/"
$ sh target/appassembler/bin/WarcBrowser -port 9191 -server http://myhost:9191/

Navigate to http://myhost:9191/ to browse the archive.
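
In the example above, the port appears twice: once in -port and once inside the -server URL. A small wrapper can keep the two in sync. This is a sketch under assumptions: build_browser_cmd is a hypothetical helper that is not part of WarcBase, it only prints the command, and it assumes the -server URL should carry the same port as -port, as it does in the example.

```shell
# Hypothetical helper (not shipped with WarcBase): print the WarcBrowser
# command so that -port and the -server URL use the same port number.
build_browser_cmd() {
  host=$1
  port=$2

  # Reject empty or non-numeric ports before building the command.
  case "$port" in
    ''|*[!0-9]*)
      echo "port must be a number" >&2
      return 1
      ;;
  esac

  printf 'sh target/appassembler/bin/WarcBrowser -port %s -server http://%s:%s/\n' \
    "$port" "$host" "$port"
}
```

Calling build_browser_cmd myhost 9191 prints the same command line as the example above.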
