Releases · paulhoule/infovore

12 Nov 21:22

b8db550

Latest

Transition to Jena 2.12

This release updates Infovore to a more recent version of the Jena framework. It also contains bug fixes (detailed modelling is now consistently enabled on all instances) and other operational improvements.

08 Nov 21:58

paulhoule

v3.0

74cb6b6

Transition to Hadoop 2.0

The weekly job now runs in Hadoop 2 on EMR. The 3.0 series is experimental; 4.0 will happen either when things settle down or when there is another breaking change of dependencies.

22 Oct 14:59

paulhoule

v2.4

5f43b64

Fall 2014 BaseKB Prototype

This release contains a number of changes to the :BaseKB output, most importantly:

$-escapes are now converted to Unicode in keys and almost all raw strings
there is no longer a sieve3 horizontal subdivision
output triples are grouped and sorted by subject and divided into 210 shards
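
As an illustration of the first item, here is a minimal sketch of unescaping a key. It assumes the escape format is a `$` followed by exactly four hexadecimal digits naming one code unit; the function name `unescape_key` is hypothetical and is not taken from Infovore, whose actual converter may handle more cases (for example surrogate pairs).

```python
import re

# Assumption: an escape is "$" followed by exactly four hex digits.
_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")


def unescape_key(key: str) -> str:
    """Replace each $XXXX escape with the character whose code is XXXX."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), key)


if __name__ == "__main__":
    # $0022 is the double quote character.
    print(unescape_key("$0022Weird_Al$0022_Yankovic"))
```

Text that contains no escapes passes through unchanged, so the function is safe to apply to every key.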

Numerous changes have happened behind the scenes, the most important of which is that the Spring XML that defines the weekly job has been moved into the bakemono project and is exported in a small JAR file that haruhi reads.

This release has cleared away obstacles to some big changes in dependencies which will happen soon.

07 Jul 19:53

paulhoule

v2.3

79c235d

Centipede Bump and Miscellaneous Apps

This version of Infovore is linked against Centipede 99.6 and includes a version bump to Spring 4.0.5.

In other news, several half-baked utilities have been checked in. For instance, you can run

haruhi run ssh i-598b673e

to ssh to a machine using an AMZN instance ID instead of an IP address.

14 Apr 20:36

paulhoule

t20140412

ae35842

Job cost accounting

The major feature in this release is a job cost accounting function.

08 Apr 21:29

paulhoule

t20140408

961c9a4

Support for Hadoop Job-Level Accounting

Haruhi now writes a tag with the Hadoop job id to all line items for the job, so we can add up the line items with this tag to calculate the cost of a job after the fact. When running a flow (multiple jobs), Haruhi now uses the command line arguments of the flow to determine the name of the flow.
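
The after-the-fact calculation could look roughly like the sketch below. The record layout (a `cost` string plus a `tags` dictionary) and the tag name `hadoop_job_id` are assumptions made for illustration; they are not the format Haruhi or the billing report actually uses.

```python
from collections import defaultdict
from decimal import Decimal


def cost_by_job(line_items, tag_key="hadoop_job_id"):
    """Add up the cost of all line items that carry the given job tag.

    Items without the tag are ignored. The field names are hypothetical.
    """
    totals = defaultdict(Decimal)
    for item in line_items:
        job_id = item.get("tags", {}).get(tag_key)
        if job_id is not None:
            totals[job_id] += Decimal(item["cost"])
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    items = [
        {"cost": "0.40", "tags": {"hadoop_job_id": "job-a"}},
        {"cost": "0.15", "tags": {"hadoop_job_id": "job-a"}},
        {"cost": "1.00", "tags": {"hadoop_job_id": "job-b"}},
        {"cost": "9.99", "tags": {}},
    ]
    print(cost_by_job(items))
```

Using `Decimal` rather than floats avoids rounding drift when many small charges are summed.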

19 Mar 15:45

paulhoule

t20140318

ba74c20

smushObject tool and weekly flow optimization

Tuning job parameters has sped up the weekly flow from 2.5 hours to about 57 minutes with a small cost reduction. A job to smush objects has been created, so it is now possible to import DBpedia PageLinks into the :BaseKB space.

12 Mar 22:00

paulhoule

t20140312

5e32414

sumRDF

This release includes a sumRDF tool that will (for float type data) sum the object fields of the triples. This is necessary for the creation of a SubjectiveEye3D product that works with :BaseKB.
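
The idea can be sketched in a few lines. This sketch assumes that the sums are grouped by (subject, predicate) and that objects arrive as plain float strings; the release note does not state either, and the real tool runs as a Hadoop job rather than in memory.

```python
from collections import defaultdict


def sum_objects(triples):
    """Sum float-valued objects, grouped by (subject, predicate).

    The grouping key is an assumption, not taken from the sumRDF source.
    """
    totals = defaultdict(float)
    for subject, predicate, obj in triples:
        totals[(subject, predicate)] += float(obj)
    return dict(totals)


if __name__ == "__main__":
    # Made-up triples, for illustration only.
    triples = [
        ("ex:a", "ex:score", "1.5"),
        ("ex:a", "ex:score", "2.25"),
        ("ex:b", "ex:score", "4.0"),
    ]
    print(sum_objects(triples))
```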

10 Mar 13:57

paulhoule

t20140306

221fda0

smushSubject tool

smushSubject uses a reduce-side join to change the vocabulary used in the subject field.
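
A reduce-side join can be simulated in memory as follows. Everything here is a sketch under assumptions: the function names are hypothetical, the mapping is taken to be a list of (old subject, new subject) pairs, and subjects without a mapping are passed through unchanged, which the release note does not specify.

```python
from collections import defaultdict


def map_phase(mapping, triples):
    """Key every record by the old subject and tag it with its origin."""
    for old, new in mapping:
        yield old, ("M", new)        # mapping record
    for s, p, o in triples:
        yield s, ("T", (p, o))       # triple record


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Join the two sides: rewrite the subject if a mapping arrived."""
    new_subjects = [v for tag, v in values if tag == "M"]
    subject = new_subjects[0] if new_subjects else key  # assumption: pass through
    for tag, v in values:
        if tag == "T":
            p, o = v
            yield (subject, p, o)


def smush_subject(mapping, triples):
    out = []
    for key, values in shuffle(map_phase(mapping, triples)).items():
        out.extend(reduce_phase(key, values))
    return out


if __name__ == "__main__":
    mapping = [("old:1", "new:A")]
    triples = [("old:1", "ex:p", "x"), ("old:2", "ex:p", "y")]
    print(sorted(smush_subject(mapping, triples)))
```

The join happens in the reducer because both the mapping and the triples are keyed by the same subject, so all records for one subject meet in a single reduce call.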

18 Feb 14:42

paulhoule

t20140217

e1e06a5

backport SelfAwareTool from telepath project

This release moves the "SelfAwareTool" from the telepath project into infovore; this component automatically configures a Hadoop job based on introspection of the environment of the Tool object.
