  Freebase 2 RDF
  ==============

  Freebase, thanks to Google, still publishes a dump of its data here:
  http://download.freebase.com/datadumps/latest/
  
  Freebase2RDF is a small Java program which transforms the Freebase 
  data dump into RDF. The conversion is naive: no attempt is made to do 
  anything clever with literals (such as inferring data types) or to 
  extract a schema from the use of 'properties'. (These are all possible 
  improvements, contributions welcome!)
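
  For illustration only, here is a hypothetical sketch of that naive 
  mapping: each tab-separated quad (source, property, destination, value) 
  becomes a single N-Triples line. The class name, base namespace and 
  literal handling below are assumptions for this example, not the 
  project's actual code.

    import java.io.BufferedReader;
    import java.io.FileReader;

    // Illustrative sketch only; assumes an already-decompressed .tsv file.
    public class NaiveQuadToTriple {

        // Hypothetical base namespace; the real converter may use another URI.
        private static final String NS = "http://rdf.freebase.com/ns";

        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line;
            while ((line = in.readLine()) != null) {
                // Each dump line has four tab-separated fields:
                // source, property, destination, value.
                String[] q = line.split("\t", -1);
                if (q.length < 4) continue;
                String subject   = "<" + NS + q[0] + ">";
                String predicate = "<" + NS + q[1] + ">";
                // Naive object handling: a non-empty destination becomes a URI,
                // otherwise the value becomes a plain literal (no datatypes).
                String object = q[2].length() > 0
                        ? "<" + NS + q[2] + ">"
                        : "\"" + q[3].replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
                System.out.println(subject + " " + predicate + " " + object + " .");
            }
            in.close();
        }
    }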


  Requirements
  ------------

  The only requirements are a Java JDK 1.6 and Apache Maven.

  Instructions on how to install Maven are here:
  http://maven.apache.org/download.html#Installation 


  How to run it
  -------------

  First, download the latest Freebase data dump:

    wget http://download.freebase.com/datadumps/latest/freebase-datadump-quadruples.tsv.bz2

  Then build and run the converter:

    cd freebase2rdf
    mvn package
    java -cp target/freebase2rdf-0.1-SNAPSHOT-jar-with-dependencies.jar cmd.freebase2rdf </path/to/freebase-datadump-quadruples.tsv.bz2> </path/to/filename.nt.gz>


  See also
  --------

   - http://basekb.com/ and http://code.google.com/p/basekb-tools/
   - http://code.google.com/p/freebase-quad-rdfize/
   - http://markmail.org/thread/mq6ylzdes6n7sc5o
   - http://markmail.org/thread/jegtn6vn7kb62zof


  MapReduce and how to use Apache Whirr
  -------------------------------------

  If you already have a Hadoop cluster, build the MapReduce job jar and 
  run it like this:

    mvn hadoop:pack
    hadoop --config ~/.whirr/hadoop jar target/hadoop-deploy/freebase2rdf-hdeploy.jar cmd.freebase2rdf4mr </path/to/freebase-datadump-quadruples.tsv.bz2> </output/path>
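
  Conceptually the MapReduce variant applies the same per-line conversion 
  as the standalone tool, as a map-only job. The mapper below is only a 
  hypothetical sketch of that idea; class name, namespace constant and 
  object handling are assumptions, not the project's actual code.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Illustrative sketch only: converts each Freebase quad into one triple.
    public class QuadToTripleMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        // Hypothetical base namespace, as in the standalone sketch above.
        private static final String NS = "http://rdf.freebase.com/ns";
        private final Text triple = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] q = value.toString().split("\t", -1);
            if (q.length < 4) return;
            String object = q[2].length() > 0
                    ? "<" + NS + q[2] + ">"
                    : "\"" + q[3].replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
            triple.set("<" + NS + q[0] + "> <" + NS + q[1] + "> " + object + " .");
            context.write(triple, NullWritable.get());
        }
    }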

  If you do not have a Hadoop cluster, here is how to use Apache Whirr:

    export KASABI_AWS_ACCESS_KEY_ID=...
    export KASABI_AWS_SECRET_ACCESS_KEY=...
    cd /opt/
    curl -O http://archive.apache.org/dist/whirr/whirr-0.7.1/whirr-0.7.1.tar.gz
    tar zxf whirr-0.7.1.tar.gz
    ssh-keygen -t rsa -P '' -f ~/.ssh/whirr
    export PATH=$PATH:/opt/whirr-0.7.1/bin/
    whirr version
    whirr launch-cluster --config hadoop-ec2.properties --private-key-file ~/.ssh/whirr
    . ~/.whirr/hadoop/hadoop-proxy.sh
    # Proxy PAC configuration here: http://apache-hadoop-ec2.s3.amazonaws.com/proxy.pac

  To shut down the cluster:

    whirr destroy-cluster --config hadoop-ec2.properties
