To address the challenge of larger files, such as recent Freebase dumps, Infovore is being rebuilt on the Hadoop framework. Progress on this is now rapid; join our Google Group to follow it.
You will probably control Infovore 2 with the Haruhi command-line application. Once installed, Haruhi can launch jobs on a locally available Hadoop cluster (i.e. the "hadoop" command is on the $PATH), or it can provision a cluster in Amazon EMR for prices starting from 7.5 cents an hour.
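Whether a local cluster counts as "available" in the sense above comes down to whether the "hadoop" command resolves on the $PATH. A minimal sketch of that check in Python follows; the helper name `find_hadoop` is hypothetical and is not part of Haruhi, whose own detection logic may differ.

```python
import shutil


def find_hadoop(command="hadoop"):
    """Return the full path of `command` if it is on the PATH, else None.

    Hypothetical helper for illustration only; not part of Haruhi.
    """
    return shutil.which(command)


if __name__ == "__main__":
    path = find_hadoop()
    if path:
        print(f"hadoop found at {path}; a local cluster can be targeted")
    else:
        print("hadoop is not on the PATH; add Hadoop's bin directory to PATH first")
```

If the check fails, adding the `bin` directory of your Hadoop installation to the PATH is normally enough to make the command visible.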
Most of Infovore 2 is packaged in the Bakemono super jar -- a jar that contains multiple Hadoop applications. Infovore also contains a second super jar, Haruhi, that can deploy a Bakemono configuration to any Hadoop-compatible platform for ease of manual use and automation.
Current work includes:
- perfecting the automation of a process that deploys :BaseKB Lime in the AMZN cloud weekly
- developing tools for rapidly exploring large RDF files with Pig (the Chopper project)
- developing bakemono apps for further processing of data sets.
To get started quickly, see Hadoop for the Impatient, how to run your own Jar, and how to use Persistent AWS Clusters.
Paul gave a talk about the application of Infovore to Freebase at SemtechBiz 2013 NY -- see the slides.
Infovore 2 Documentation
- Editions of :BaseKB
- Developer's Notes
- Notes on Hadoop
- Particulars for Freebase
Historical Versions of Infovore
The development of Infovore has passed through three phases so far.
- Infovore 1.0 -- a proprietary system for converting the old Freebase quad dump to RDF that was later released as open source
- Infovore 1.1 -- an open source toolset for processing data sets such as Wikipedia and Freebase; like Infovore 1.0, one-computer concurrency was enabled with our own Millipede framework.
- Infovore 2 -- as growing data sets broke components of Millipede, we switched to Hadoop
Documentation for era 1 Infovore
- Command Line Utilities
- Configuring Tools
- Converting a Freebase Quad Dump to :BaseKB Pro
- Creating :BaseKB Lite
- Grounded SPARQL
- Infovore and Jena
- Installing :BaseKB
- Understanding the millipede framework
- Running Integration Tests
- System requirements
- UNA and the :BaseKB Point of View
- User documentation for :BaseKB itself