To address the challenge of larger files, such as recent Freebase dumps, the Infovore framework is being rebuilt on Hadoop. Progress is now rapid; join our Google Group to follow it.
You will probably control Infovore 2 with the Haruhi command-line application. Once installed, Haruhi can launch jobs on a locally available Hadoop cluster (i.e. the "hadoop" command is in the $PATH), or it can provision a cluster in Amazon EMR for prices starting from 7.5 cents an hour.
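The "locally available" condition above comes down to whether a `hadoop` executable can be found on the search path. A minimal Python sketch of that check follows; it is not part of Infovore or Haruhi, and the function name `local_hadoop_available` is a hypothetical one chosen for illustration.

```python
import shutil


def local_hadoop_available(command="hadoop"):
    """Return the full path of `command` if it is on $PATH, else None.

    Hypothetical helper for illustration only; Haruhi may detect a
    local cluster differently.
    """
    return shutil.which(command)


if __name__ == "__main__":
    location = local_hadoop_available()
    if location:
        print(f"Found hadoop at {location}; local job launch should be possible.")
    else:
        print("No hadoop command on $PATH; the Amazon EMR option may be the better fit.")
```

Running the script before installing Haruhi gives a quick indication of which of the two launch modes applies to your machine.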
Most of Infovore 2 is packaged in the Bakemono super jar -- a jar that contains multiple Hadoop applications. Infovore also contains a second super jar, Haruhi, that can deploy a Bakemono configuration to any Hadoop-compatible platform for ease of manual use and automation.
To get started quickly, see Hadoop for the Impatient, how to run your own Jar, and how to use Persistent AWS Clusters.
Paul gave a talk about the application of Infovore to Freebase at SemtechBiz 2013 NY -- see the slides.
Notes on Hadoop
The development of Infovore has passed through three phases so far.