paulhoule edited this page Oct 1, 2014 · 90 revisions

Version 2


To address the challenge of larger files, such as recent Freebase dumps, the Infovore framework is being rebuilt on Hadoop. Progress is rapid; join our Google Group to follow it.

You will probably control Infovore 2 with the Haruhi command-line application. Once installed, Haruhi can launch jobs on a locally available Hadoop cluster (i.e., the `hadoop` command is in your $PATH) or provision a cluster in Amazon EMR at prices starting from 7.5 cents an hour.

Most of Infovore 2 is packaged in the Bakemono super jar, a single jar that contains multiple Hadoop applications. Infovore also includes the Haruhi super jar, which can deploy a Bakemono configuration to any Hadoop-compatible platform for ease of manual use and automation.

Bakemono contains a number of applications (see the list of applications), such as freebaseRDFPrefilter, pse3, sieve3, and ranSample.
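Bakemono's application internals aren't documented on this page, but the kind of work a prefilter step such as freebaseRDFPrefilter does can be sketched in plain Python: drop dump lines that are not well-formed N-Triples before heavier downstream processing. The regex and function below are illustrative assumptions, not Bakemono's actual code.

```python
import re

# Loose N-Triples shape: subject, predicate, object, terminated by " ."
# This pattern is an illustrative approximation, not Bakemono's real filter.
NTRIPLE = re.compile(
    r'^(<[^>]+>|_:\S+)\s+'                              # subject: IRI or blank node
    r'<[^>]+>\s+'                                       # predicate: IRI
    r'(<[^>]+>|_:\S+|"(?:[^"\\]|\\.)*"\S*)\s*\.\s*$'    # object: IRI, blank node, or literal
)

def prefilter(lines):
    """Yield only lines that look like well-formed N-Triples."""
    for line in lines:
        if NTRIPLE.match(line):
            yield line

sample = [
    '<http://rdf.freebase.com/ns/m.0abc> <http://rdf.freebase.com/ns/type.object.name> "Example"@en .',
    'garbled line with no triple structure',
    '_:b0 <http://example.org/p> <http://example.org/o> .',
]
kept = list(prefilter(sample))
# kept retains the first and third lines only
```

In the real system this filtering runs as a Hadoop map step over the full dump rather than over an in-memory list.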

Current work includes:

  • perfecting the automation of a process that deploys :BaseKB Lime in the AMZN cloud weekly
  • developing tools for rapidly exploring large RDF files with Pig (the Chopper project)
  • developing Bakemono apps for further processing of data sets

To get started quickly, see Hadoop for the Impatient, how to run your own Jar, and how to use Persistent AWS Clusters. Paul gave a talk about the application of Infovore to Freebase at SemTechBiz 2013 NY -- see the slides.

See Academic Papers About BaseKB and Projects that use :BaseKB

Infovore 2 Documentation

Historical Versions of Infovore

The development of Infovore has passed through three phases so far.

  • Infovore 1.0 -- a proprietary system for converting the old Freebase quad dump to RDF that was later released as open source
  • Infovore 1.1 -- an open source toolset for processing data sets such as Wikipedia and Freebase; like Infovore 1.0, it achieved single-machine concurrency with our own Millipede framework.
  • Infovore 2 -- as growing data sets broke components of Millipede, we switched to Hadoop

Documentation for era 1 Infovore