Releases: paulhoule/infovore

Transition to Jena 2.12

12 Nov 21:22

This release updates to a more recent version of the Jena framework. It also contains bug fixes (consistently enable detailed modelling on all instances) and other operational improvements.

Transition to Hadoop 2.0

08 Nov 21:58

The weekly job now runs on Hadoop 2 on EMR. The 3.0 series is experimental; 4.0 will happen either when things settle down or when there is another breaking change in dependencies.

Fall 2014 BaseKB Prototype

22 Oct 14:59

This release contains a number of changes to the :BaseKB output, most importantly:

  • $-escapes are now converted to Unicode in keys and almost all raw strings
  • there is no longer a sieve3 horizontal subdivision
  • output triples are grouped and sorted by subject and divided into 210 shards
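As a rough illustration of the last point, the sketch below groups triples by subject, assigns each subject to one of 210 shards, and sorts within each shard. The release notes do not say how subjects are assigned to shards, so the CRC32-based partitioner and the names `shard_of` and `shard_triples` are assumptions made for this example only.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 210  # shard count stated in the release notes


def shard_of(subject, num_shards=NUM_SHARDS):
    # Hypothetical partitioner: CRC32 of the subject modulo the shard count.
    # The partitioner actually used by the weekly job is not documented here.
    return zlib.crc32(subject.encode("utf-8")) % num_shards


def shard_triples(triples):
    """Return {shard_id: sorted list of (subject, predicate, object)}."""
    shards = defaultdict(list)
    for s, p, o in triples:
        shards[shard_of(s)].append((s, p, o))
    # Sorting tuples orders by subject first, so all triples that share
    # a subject end up adjacent within their shard.
    return {shard_id: sorted(rows) for shard_id, rows in shards.items()}
```

Because the shard is a function of the subject alone, every triple about a given subject lands in the same shard.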

Numerous changes have happened behind the scenes, the most important of which is that the Spring XML that defines the weekly job has been moved into the bakemono project and is exported in a small JAR file that haruhi reads.

This release has cleared away obstacles to some big changes in dependencies which will happen soon.

Centipede Bump and Miscellaneous Apps

07 Jul 19:53

This version of Infovore is linked against Centipede 99.6 and includes a version bump to Spring 4.0.5.

In other news, several half-baked utilities have been checked in; for instance, you can run

haruhi run ssh i-598b673e

to SSH into a machine using its Amazon EC2 instance ID instead of an IP address.

Job cost accounting

14 Apr 20:36

The major feature in this release is a job cost accounting function.

Support for Hadoop Job-Level Accounting

08 Apr 21:29

Haruhi now writes a tag with the Hadoop job id to all line items for the job, so we can add up the line items carrying this tag to calculate the cost of a job after the fact. When running a flow (multiple jobs), Haruhi now uses the command-line arguments of the flow to determine the name of the flow.
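The after-the-fact calculation amounts to filtering line items by tag and summing their costs. The sketch below assumes a hypothetical line-item shape (a dict with a `tags` mapping and a `cost` string) and a hypothetical tag key `hadoop-job-id`; neither is taken from Haruhi or from any billing format.

```python
from decimal import Decimal


def job_cost(line_items, job_id, tag_key="hadoop-job-id"):
    """Sum the cost of every line item tagged with the given job id.

    The line-item layout and the tag key are assumptions for illustration.
    """
    total = Decimal("0")
    for item in line_items:
        if item.get("tags", {}).get(tag_key) == job_id:
            # Decimal avoids binary floating-point drift when adding money.
            total += Decimal(item["cost"])
    return total


items = [
    {"cost": "0.25", "tags": {"hadoop-job-id": "job-A"}},
    {"cost": "1.10", "tags": {"hadoop-job-id": "job-A"}},
    {"cost": "3.00", "tags": {"hadoop-job-id": "job-B"}},
    {"cost": "9.99", "tags": {}},  # untagged items are ignored
]
total_a = job_cost(items, "job-A")
```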

smushObject tool and weekly flow optimization

19 Mar 15:45

Tuning job parameters has sped up the weekly flow from 2.5 hours to about 57 minutes with a small cost reduction. A job to smush objects has been created, so it is now possible to import DBpedia PageLinks into the :BaseKB space.

sumRDF

12 Mar 22:00

This release includes a sumRDF tool that sums the object fields of triples (for float-typed data). This is necessary for the creation of a SubjectiveEye3D product that works with :BaseKB.
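A minimal in-memory sketch of that kind of summation follows. The release note does not say which fields form the grouping key, so grouping by (subject, predicate) is an assumption here, and `sum_objects` is a hypothetical name, not the tool's API.

```python
from collections import defaultdict


def sum_objects(triples):
    """Sum float-valued objects per (subject, predicate) pair.

    Grouping by (subject, predicate) is an assumption for this sketch.
    """
    totals = defaultdict(float)
    for subject, predicate, obj in triples:
        totals[(subject, predicate)] += float(obj)
    return dict(totals)


scores = sum_objects([
    ("ex:a", "ex:score", "0.5"),
    ("ex:a", "ex:score", "0.25"),
    ("ex:b", "ex:score", "2.0"),
])
```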

smushSubject tool

10 Mar 13:57

smushSubject uses a reduce-side join to change the vocabulary used in the subject field.
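The general shape of a reduce-side join can be simulated in memory: both inputs are mapped to the same key (the old subject), tagged by source, grouped, and combined in the reducer. This is a sketch of the technique, not of smushSubject itself; the function names are hypothetical, and leaving unmapped subjects unchanged is an assumption.

```python
from collections import defaultdict


def map_phase(mappings, triples):
    # Key both inputs by the old subject and tag each record with its source.
    for old, new in mappings:
        yield old, ("M", new)
    for s, p, o in triples:
        yield s, ("T", (p, o))


def shuffle(pairs):
    # Stand-in for Hadoop's shuffle: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Assumption: a subject with no mapping record is emitted unchanged.
    new_subject = key
    for tag, payload in values:
        if tag == "M":
            new_subject = payload
    for tag, payload in values:
        if tag == "T":
            predicate, obj = payload
            yield (new_subject, predicate, obj)


def smush_subject(mappings, triples):
    out = []
    for key, values in shuffle(map_phase(mappings, triples)).items():
        out.extend(reduce_phase(key, values))
    return sorted(out)
```

In a real Hadoop job the reducer would see values as a one-pass iterator, so the mapping record is usually made to arrive first (for example with a secondary sort) rather than scanning the values twice as this sketch does.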

backport SelfAwareTool from telepath project

18 Feb 14:42

This release moves the "SelfAwareTool" from the telepath project into infovore; this component automatically configures a Hadoop job based on introspection of the environment of the Tool object.