Transition to Jena 2.12

@paulhoule paulhoule released this Nov 12, 2014 · 11 commits to master since this release

This release updates Infovore to a more recent version of the Jena framework. It also contains bug fixes (detailed modelling is now consistently enabled on all instances) and other operational improvements.


Transition to Hadoop 2.0

@paulhoule paulhoule released this Nov 8, 2014 · 26 commits to master since this release

The weekly job now runs on Hadoop 2 on EMR. The 3.0 series is experimental; 4.0 will happen either when things settle down or if there is another breaking change in dependencies.


Fall 2014 BaseKB Prototype

@paulhoule paulhoule released this Oct 22, 2014 · 73 commits to master since this release

This release contains a number of changes to the :BaseKB output, most importantly:

  • $-escapes are now converted to Unicode in keys and almost all raw strings
  • there is no longer a sieve3 horizontal subdivision
  • output triples are grouped and sorted by subject and divided into 210 shards
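To illustrate the first bullet: Freebase-style keys encode reserved characters as `$XXXX` escapes, where `XXXX` is the hex Unicode code point. A minimal decoder sketch (my own illustration, not Infovore's actual code) might look like:

```python
import re

def decode_key(key: str) -> str:
    """Decode Freebase-style $XXXX escapes (four hex digits naming a
    Unicode code point) back into the literal character,
    e.g. 'foo$0024bar' -> 'foo$bar'."""
    return re.sub(r'\$([0-9A-Fa-f]{4})',
                  lambda m: chr(int(m.group(1), 16)), key)

# $0028 and $0029 are the escaped parentheses
print(decode_key("the_wire$0028tv_series$0029"))  # the_wire(tv_series)
```

The real conversion runs over keys and raw strings in the output pipeline; this sketch only shows the escape scheme itself.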

Numerous changes have happened behind the scenes, the most important of which is that the Spring XML that defines the weekly job has been moved into the bakemono project and is exported in a small JAR file that haruhi reads.

This release has cleared away obstacles to some big changes in dependencies which will happen soon.


Centipede Bump and Miscellaneous Apps

@paulhoule paulhoule released this Jul 7, 2014 · 122 commits to master since this release

This version of Infovore is linked against Centipede 99.6 and includes a version bump to Spring 4.0.5.

In other news, several half-baked utilities have been checked in; for instance, you can run

haruhi run ssh i-598b673e

to ssh to a machine using an AWS instance id instead of an IP address.


Job cost accounting

@paulhoule paulhoule released this Apr 14, 2014 · 133 commits to master since this release

The major feature in this release is a job cost accounting function.


Support for Hadoop Job-Level Accounting

@paulhoule paulhoule released this Apr 8, 2014 · 142 commits to master since this release

Haruhi now tags every line item with the Hadoop job id, so we can add up the line items carrying that tag to calculate the cost of a job after the fact. When running a flow (multiple jobs), Haruhi now derives the flow's name from the flow's command-line arguments.
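The accounting step described above amounts to a group-and-sum over tagged line items. A sketch under assumed data shapes (the tag names and record layout here are hypothetical, not Haruhi's actual schema):

```python
from collections import defaultdict

# Hypothetical billing line items: (hadoop_job_id_tag, cost_in_dollars)
line_items = [
    ("job_201404081234_0001", 1.25),
    ("job_201404081234_0001", 0.40),
    ("job_201404081234_0002", 2.10),
]

def cost_by_job(items):
    """Sum line-item costs grouped by the Hadoop job id tag."""
    totals = defaultdict(float)
    for tag, cost in items:
        totals[tag] += cost
    return dict(totals)

print(cost_by_job(line_items))
```

Summing a flow's cost is then just adding up the totals for the job ids that belong to that flow.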