
Converting the quad dump

paulhoule edited this page Mar 4, 2013 · 5 revisions

You can convert a Freebase quad dump to RDF with the Hydroxide application, which is built on the Millipede parallel processing framework. Use of the resulting product, :BaseKB Pro, is documented here.

Get Started

You'll need a copy of the Freebase quad dump from

http://download.freebase.com/datadumps/

Hydroxide has been tested against the 2012-11-04 quad dump as well as selected prior quad dumps.

You'll need to select a base directory and an instance name for your copy of Freebase. Configure these with shell environment variables, for instance:

$ export INFOVORE_BASE=/freebase 
$ export INFOVORE_INSTANCE=2012-11-04

With these settings, install your data dump at

/freebase/data/2012-11-04/input/freebase-datadump-quadruples.tsv.bz2
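Judging from the example above, the dump location appears to follow the pattern base directory, then data, then instance name, then input. The sketch below derives that path from the two environment variables. The helper name infovore_dump_path is hypothetical and is not part of Infovore; the fallback values are simply the example values from this page.

```shell
# Hypothetical helper (not part of Infovore): print where the quad dump
# is expected to live for the current INFOVORE_BASE / INFOVORE_INSTANCE.
# Assumption: the path pattern is inferred from the example on this page.
infovore_dump_path() {
    printf '%s/data/%s/input/freebase-datadump-quadruples.tsv.bz2\n' \
        "${INFOVORE_BASE:-/freebase}" "${INFOVORE_INSTANCE:-2012-11-04}"
}

# Example use (commented out so that sourcing this file has no side effects):
#   mkdir -p "$(dirname "$(infovore_dump_path)")"
#   mv freebase-datadump-quadruples.tsv.bz2 "$(infovore_dump_path)"
```

Deriving the path in one place keeps the directory layout consistent if you later switch to a different instance name.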

Hydroxide will write temporary files to the work subdirectory of the instance directory and will write final output to the output subdirectory.

As currently configured, the instance directory grows to 80 GB in the process of creating baseKBLite and baseKBPro. Future versions of Infovore may reduce disk consumption, but currently intermediate files are saved in case they are needed for research and debugging.
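Since the instance directory can reach about 80 GB, it may be worth checking free space before starting. The following is a minimal sketch using the standard df and awk tools; the function name has_free_gb is hypothetical, and it assumes the file system name reported by df contains no spaces.

```shell
# Hypothetical helper (not part of Infovore): succeed if the file system
# holding directory $1 has at least $2 gigabytes available.
has_free_gb() {
    # df -Pk prints one POSIX-format line per file system, sizes in 1 KB blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Example use (commented out so that sourcing this file has no side effects):
#   has_free_gb "$INFOVORE_BASE" 80 || echo "less than 80 GB free" >&2
```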

Running Hydroxide

First, build Infovore, then source the script that sets up the path and environment variables needed to run it:

$ mvn clean install 
$ source hydroxide-apps/path.sh

Then run

$ createPro.sh

to create :BaseKB Pro, a complete rendition of Freebase in RDF. This process takes about ten hours on a computer with a four-core processor.
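Because the run takes hours, a quick check of the setup before launching can save a wasted start. The sketch below is one possible pre-flight check, not something shipped with Infovore: the function name infovore_preflight and the log file name createPro.log are hypothetical, and the dump path pattern is inferred from the example earlier on this page.

```shell
# Hypothetical pre-flight check (not part of Infovore): verify that both
# environment variables are set and that the quad dump is in place.
infovore_preflight() {
    if [ -z "${INFOVORE_BASE:-}" ] || [ -z "${INFOVORE_INSTANCE:-}" ]; then
        echo "INFOVORE_BASE and INFOVORE_INSTANCE must both be set" >&2
        return 1
    fi
    # Assumed layout: <base>/data/<instance>/input/<dump file>
    dump="$INFOVORE_BASE/data/$INFOVORE_INSTANCE/input/freebase-datadump-quadruples.tsv.bz2"
    if [ ! -f "$dump" ]; then
        echo "quad dump not found at $dump" >&2
        return 1
    fi
}

# Example use (commented out): start the conversion only if the check
# passes, and keep a log of the long run.
#   infovore_preflight && createPro.sh > createPro.log 2>&1
```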
