Paul Houle edited this page Feb 21, 2014 · 7 revisions

Weekly releases of :BaseKB Now are stored at the following location:

s3://basekb-now/{year}-{month}-{day}-{x}-{y}

where {x} and {y} are usually 00. The most recent :BaseKB Now edition, as I write this, is

s3://basekb-now/2013-10-13-00-00/
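Because release names are zero-padded dates followed by {x} and {y}, they can be generated and compared as plain strings. The Python sketch below assumes {x} and {y} are always two-digit numbers (as in 00-00) and that you already have a list of release prefixes, e.g. copied from an s3cmd listing of the bucket. `release_prefix` and `latest_release` are hypothetical helper names, not part of any :BaseKB tooling.

```python
from datetime import date
from typing import Iterable

BUCKET = "s3://basekb-now/"


def release_prefix(day: date, x: int = 0, y: int = 0) -> str:
    """Build s3://basekb-now/{year}-{month}-{day}-{x}-{y}/ for one weekly release.

    Assumption: every field is zero-padded, as in 2013-10-13-00-00.
    """
    return f"{BUCKET}{day:%Y-%m-%d}-{x:02d}-{y:02d}/"


def latest_release(prefixes: Iterable[str]) -> str:
    """Return the newest release prefix from a list of release prefixes."""
    # Zero-padded YYYY-MM-DD-xx-yy names sort chronologically as plain strings,
    # so the lexicographic maximum is the most recent release.
    return max(p.rstrip("/") for p in prefixes) + "/"
```

For example, `release_prefix(date(2013, 10, 13))` yields the edition URI given above.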

:BaseKB Now is broken into a number of files to allow parallel processing and selective download of the facts that you need. The tree structure looks like this:

  rejected/
  sieved/
    a/                     facts with the predicate rdf:type
    description/           facts with long-winded text descriptions
    key/                   facts that spell out keys completely, such as "/Wikipedia/en/Bon_Jovi"
    keyNs/                 the directed acyclic graph that most completely represents Freebase name resolution
    label/                 textual labels for concepts
    links/                 relationships where the object is a URI
    literals/              relationships where the object is a literal
    name/                  this information is largely duplicative of "label"
    notability/            information about notable types
    notableForPredicate/   notable type information in an alternative representation
    other/                 relationships that match no other category (this is currently empty)
    text/                  text blobs other than descriptions
    webpages/              links to webpages that document concepts

If you look at one of these directories, you'll see something like this:

paul@amefurashi:~$ s3cmd ls s3://basekb-now/2013-10-13-00-00/sieved/a/
2013-10-14 18:09         0   s3://basekb-now/2013-10-13-00-00/sieved/a/_temporary_$folder$
2013-10-14 18:09  36825229   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00000.nt.gz
2013-10-14 18:09  36832880   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00001.nt.gz
2013-10-14 18:09  36829571   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00002.nt.gz
2013-10-14 18:09  36813141   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00003.nt.gz
2013-10-14 18:09  36825014   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00004.nt.gz
2013-10-14 18:09  36869664   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00005.nt.gz
2013-10-14 18:09  36828550   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00006.nt.gz
2013-10-14 18:09  36801478   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00007.nt.gz
2013-10-14 18:09  36831605   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00008.nt.gz
2013-10-14 18:09  36834452   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00009.nt.gz
2013-10-14 18:09  36834333   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00010.nt.gz
2013-10-14 18:09  36836216   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00011.nt.gz
2013-10-14 18:09  36843485   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00012.nt.gz
2013-10-14 18:09  36828278   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00013.nt.gz
2013-10-14 18:09  36831094   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00014.nt.gz
2013-10-14 18:09  36831899   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00015.nt.gz
2013-10-14 18:09  36820131   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00016.nt.gz
2013-10-14 18:09  36816911   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00017.nt.gz
2013-10-14 18:09  36780096   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00018.nt.gz
2013-10-14 18:09  36850616   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00019.nt.gz
2013-10-14 18:09  36824101   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00020.nt.gz
2013-10-14 18:09  36834614   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00021.nt.gz
2013-10-14 18:09  36819400   s3://basekb-now/2013-10-13-00-00/sieved/a/a-m-00022.nt.gz
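If you want to script around a listing like this one, for example to total the size of a subset before downloading it, the output is easy to parse. This Python sketch assumes the column layout shown above (date, time, size in bytes, URI). It skips the zero-byte `_temporary_$folder$` entry, which appears to be a directory marker left by the job that wrote the files rather than data. `parse_s3cmd_ls` and `S3Object` are hypothetical names.

```python
from typing import Iterable, List, NamedTuple


class S3Object(NamedTuple):
    size: int  # bytes, taken from the third column of the listing
    uri: str


def parse_s3cmd_ls(lines: Iterable[str]) -> List[S3Object]:
    """Parse `s3cmd ls` output lines of the form: date time size uri."""
    objects = []
    for line in lines:
        parts = line.split(None, 3)
        # Skip shell prompts, blank lines and "DIR" entries, which lack
        # a numeric size column followed by an s3:// URI.
        if len(parts) != 4 or not parts[2].isdigit() or not parts[3].startswith("s3://"):
            continue
        uri = parts[3].strip()
        # Skip the zero-byte _temporary_$folder$ marker; it carries no facts.
        if uri.endswith("$folder$"):
            continue
        objects.append(S3Object(int(parts[2]), uri))
    return objects
```

Summing `size` over the parsed objects gives the compressed size of a subset before you commit to downloading it.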

To download a weekly release, I recommend installing the s3cmd program and running something like:

s3cmd get --recursive s3://basekb-now/2013-10-13-00-00/
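If you fetch releases from a script (say, a weekly cron job), it can help to build the s3cmd argument list programmatically. This Python sketch uses only the `get --recursive` invocation shown on this page; `s3cmd_get_command` and `download` are hypothetical names, and it assumes s3cmd is installed and already configured with credentials. It prints the command by default and only runs it when `dry_run=False`, using the working directory (`dest`) to choose where the files land.

```python
import shlex
import subprocess
from typing import List


def s3cmd_get_command(release_prefix: str, subdir: str = "") -> List[str]:
    """Build an `s3cmd get --recursive` call for a release or one of its subdirectories."""
    uri = release_prefix.rstrip("/") + "/"
    if subdir:
        # e.g. "sieved/a" -> s3://basekb-now/<release>/sieved/a/
        uri += subdir.strip("/") + "/"
    return ["s3cmd", "get", "--recursive", uri]


def download(release_prefix: str, subdir: str = "", dest: str = ".",
             dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it inside `dest`."""
    cmd = s3cmd_get_command(release_prefix, subdir)
    if dry_run:
        print(shlex.join(cmd))  # shlex.join needs Python 3.8+
    else:
        # Assumes s3cmd is on PATH; check=True raises if the download fails.
        subprocess.run(cmd, check=True, cwd=dest)
    return cmd
```

Passing `subdir="sieved/a"` builds the same subset command as the rdf:type example on this page.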

If you're interested in a specific subset, you can write

s3cmd get --recursive s3://basekb-now/2013-10-13-00-00/sieved/a/

to retrieve all of the rdf:type relationships.
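Because each subset is split into many gzipped N-Triples part files, the files can be processed independently and in parallel. The Python sketch below counts facts per predicate across a set of downloaded part files. It assumes one triple per line (as N-Triples requires) and that subjects and predicates contain no whitespace, which holds for IRIs and blank nodes. `count_predicates` and `count_all` are hypothetical names. A thread pool keeps the sketch simple; for heavier parsing a process pool may scale better.

```python
import gzip
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterable


def count_predicates(path: Path) -> Counter:
    """Count facts per predicate in one gzipped N-Triples part file."""
    counts: Counter = Counter()
    # errors="replace" keeps one bad byte from aborting a whole file.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # blank lines and N-Triples comments
            parts = line.split(None, 2)  # subject, predicate, rest of line
            if len(parts) == 3:
                counts[parts[1]] += 1
    return counts


def count_all(paths: Iterable[Path], workers: int = 4) -> Counter:
    """Process part files concurrently and merge the per-file counts."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for counts in pool.map(count_predicates, paths):
            total.update(counts)
    return total
```

For example, `count_all(sorted(Path("sieved/a").glob("*.nt.gz")))` would tally the files in a local `sieved/a` directory; that path is hypothetical, so point it at wherever s3cmd saved the files.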

The rejected/ directory contains ill-formed statements as well as statements that we believe are incorrect.
