Latest release

Metafacture Core Distribution 5.0.0

@cboehme cboehme released this Jan 9, 2018 · 10 commits to master since this release

This is the first release of the Metafacture 5 line. With Metafacture 5 the migration from a monolithic library to smaller domain-specific libraries is completed.

Important: This release is published with new Maven coordinates and uses a new package root. As Metafacture has been split up into domain-specific libraries, there is no longer a single Maven dependency. Instead there is one for each domain-specific library. The readme explains how to find the Maven coordinates of these dependencies. The root package has been changed from org.culturegraph.mf to org.metafacture.

Additionally, metafacture-runner has been merged back into the metafacture-core repository. This makes it easier to keep it up-to-date with new releases of metafacture-core.

Updating to Metafacture 5

  1. If you are still using metafacture-core 3.5.0 or older you should first update to metafacture-core 4.0.0. In this release many classes were relocated. By updating first to metafacture-core 4.0.0 you avoid having to handle the relocation and the split-up of the metafacture-core library at the same time.
  2. Search for org.culturegraph.mf in your project and replace it with org.metafacture. This should mostly affect import statements.
  3. Remove the Maven dependency on metafacture-core and follow the explanation in the readme to add the new domain-specific library dependencies.
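
Step 2 can also be scripted. The following is a minimal sketch, assuming a plain textual replacement is sufficient for your sources. The class PackageRootMigration and its migrate method are hypothetical names used for illustration and are not part of Metafacture.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Metafacture): rewrites the old package
// root in a piece of Java source text.
final class PackageRootMigration {

    // The old root, matched only as a complete name segment so that an
    // unrelated package such as org.culturegraph.mfx is left untouched.
    private static final Pattern OLD_ROOT =
            Pattern.compile("\\borg\\.culturegraph\\.mf\\b");

    private static final String NEW_ROOT = "org.metafacture";

    private PackageRootMigration() {
    }

    // Returns the source text with every occurrence of the old package
    // root replaced by the new one.
    static String migrate(final String source) {
        return OLD_ROOT.matcher(source).replaceAll(NEW_ROOT);
    }
}
```

Applied to each source file in turn, the method changes import statements and fully qualified names alike. Review the result before committing, as string literals and comments are rewritten too.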

Changes

Breaking changes

  • Split up the monolithic library into smaller domain-specific libraries and changed the Maven group id from org.culturegraph to org.metafacture. The readme explains how to find the Maven dependencies (see 2234874)
  • Changed package root from org.culturegraph.mf to org.metafacture (see 2234874)
  • Merged metafacture-runner into the metafacture-core repository (see 7d07e91)

Bug fixes

  • Fixed #267: XmlUtil.escape does not handle Unicode correctly for codepoints above U+10000 (see 95ff1ab)
  • Changed JsonEncoder to not prefix pretty-print with spaces (thanks @blackwinter, see de8e7b3)
  • Included metafacture-files in Metafacture Distribution (see a6dc848)
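
Problems such as #267 typically arise when a string is processed char by char, because codepoints beyond the Basic Multilingual Plane occupy two chars (a surrogate pair) in Java. The sketch below shows codepoint-aware escaping in general. It is not the actual XmlUtil.escape implementation, and the choice to emit numeric character references for all non-ASCII codepoints is an assumption made for this example. The class name CodePointXmlEscaper is hypothetical.

```java
// Hypothetical example class (not Metafacture's XmlUtil): escapes a string
// for XML output by iterating over codepoints instead of chars.
final class CodePointXmlEscaper {

    private CodePointXmlEscaper() {
    }

    static String escape(final String text) {
        final StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // codePointAt() combines a surrogate pair into one codepoint.
            final int codePoint = text.codePointAt(i);
            // Advance by one or two chars depending on the codepoint.
            i += Character.charCount(codePoint);
            switch (codePoint) {
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '&':
                    out.append("&amp;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                case '\'':
                    out.append("&apos;");
                    break;
                default:
                    if (codePoint > 0x7f) {
                        // Assumption: non-ASCII is written as a decimal
                        // numeric character reference.
                        out.append("&#").append(codePoint).append(';');
                    } else {
                        out.appendCodePoint(codePoint);
                    }
            }
        }
        return out.toString();
    }
}
```

With this approach a supplementary character such as U+1F600 produces a single reference rather than two references to its surrogate halves.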

New Features

Build infrastructure

  • Migrated from Maven to Gradle (see 0f7a3b2)
  • Automated the release process. A release is now triggered by pushing an annotated tag to the GitHub repository. The release is automatically published on Maven Central and the distribution files are uploaded to GitHub (see 04276e4, b544371, 52ab09b)
  • Version numbers are generated from SCM information (see bc4a3d3)
  • Added Sonarqube analysis to the CI build process (see 8a1f500, 91a4448)
Pre-release

Metafacture Core Distribution 5.0.0-rc2

@cboehme cboehme released this Jan 4, 2018 · 13 commits to master since this release

This is the second and last release candidate for Metafacture 5. With Metafacture 5 the migration from a monolithic library to smaller domain-specific libraries is completed.

Changes

Bug fixes

  • Changed JsonEncoder to not prefix pretty-print with spaces (thanks @blackwinter, see de8e7b3)
  • Included metafacture-files in Metafacture Distribution (see a6dc848)
Pre-release

Metafacture Core Distribution 5.0.0-rc1

@cboehme cboehme released this Oct 13, 2017 · 19 commits to master since this release

This is the first release candidate for Metafacture 5. With Metafacture 5 the migration from a monolithic library to smaller domain-specific libraries is completed.

Changes

Breaking changes

  • Split up the monolithic library into smaller domain-specific libraries and changed the Maven group id from org.culturegraph to org.metafacture. The readme explains how to find the Maven dependencies (see 2234874)
  • Changed package root from org.culturegraph.mf to org.metafacture (see 2234874)
  • Merged metafacture-runner into the metafacture-core repository (see 7d07e91)

Bug fixes

  • Fixed #267: XmlUtil.escape does not handle Unicode correctly for codepoints above U+10000 (see 95ff1ab)

New Features

Build infrastructure

  • Migrated from Maven to Gradle (see 0f7a3b2)
  • Automated the release process. A release is now triggered by pushing an annotated tag to the GitHub repository. The release is automatically published on Maven Central and the distribution files are uploaded to GitHub (see 04276e4, b544371, 52ab09b)
  • Version numbers are generated from SCM information (see bc4a3d3)
  • Added Sonarqube analysis to the CI build process (see 8a1f500, 91a4448)

Metafacture Runner Distribution 4.0.0

@cboehme cboehme released this Jul 26, 2017 · 820 commits to master since this release

This release updates the metafacture-core dependency to version 4.0.0.

This is the last release of the Metafacture Runner Distribution. Starting with Metafacture 5 the distribution will be named Metafacture Core Distribution.

Changes

  • Minimum required Java version is now Java 8
  • Issues with running flux.sh on MacOS have been resolved (thanks @miku, see #9, #10)
  • visualizeMorphDefs.sh has been removed as support for Metamorph visualisation is no longer available in metafacture-core 4.0.0 (see 5b6dd56).

Please see the release notes for metafacture-core for a list of changes.

Metafacture Core 4.0.0

@cboehme cboehme released this Jan 9, 2017 · 225 commits to master since this release

This is the last release of metafacture-core as a single Maven artifact. The release concentrates on reorganising the sources to prepare for the split-up of metafacture-core in the next release.

Important: This release contains a number of breaking changes and updating from metafacture-core-3.5.0 is not trivial. The most notable changes are the update to Java 8 and the reorganisation of the package structure which changed the qualified names of all Metafacture modules and moved many of the other classes to new packages. Please refer to the list of modules per package at the end of the release notes to find the new module locations. All other relocated and renamed classes are listed below.

Changes

New features

Flux

  • Added the @FluxCommand annotation to all modules which can be used in Flux (see 4ff1ce5, b297e6d)

Metamorph

  • Fix #256: Support sameEntity in none and all. Add support for the sameEntity attribute to none and all statements. The attribute does not make sense in any statements (see 93bc48d)
  • The RestMap which looks up values by doing a REST request works now (thanks @philboeselager, see 704d4ad)
  • The new helper class InlineMorph simplifies embedding Metamorph scripts directly in Java. It was introduced to help writing test cases for Metamorph functions and collectors (see 9b6e7f1)

Metafacture modules

  • Added ForwardingStreamPipe as a base class for modules which only need to intercept some events but forward all others unmodified (see 4412623)
  • Added NullFilter which replaces null values with a replacement string or discards them (see c282c74)
  • Added JsonToElasticsearchBulk module to create Elasticsearch bulk import data from JSON (thanks @fsteeg and @blackwinter, see 9d85194, 056fe60)
  • Added AlephMabXmlHandler for the widely used Mab-Xml derivative created by Aleph exports (thanks @dr0i, see 482af42)
  • Added AseqDecoder. A decoder for aseq data (thanks @larsgsvensson, see f77b1ae)
  • Added XmlElementSplitter. The module splits an xml document at a certain element (thanks @dr0i, see 5517beb)
  • Added XmlFilenameWriter: The module extracts a file name from an xml document and saves the document to the extracted file. It's possible to store the file uncompressed and as bz2 (thanks @dr0i, see 40d290e)
  • Added XmlTee. A tee implementation for XML event streams. It allows forwarding a stream to more than one downstream module (thanks @dr0i, see 3fb3b4d)
  • PojoEncoder: Added support for populating maps in POJOs (thanks @thomasseidel, see e23bc13)
  • IdChangePipe (now RecordIdChanger): Added corresponding getters for setters (see e8300a8)
  • Utf8Normalizer (now UnicodeNormalizer): Made normalisation form configurable (see 314a641)
  • PicaDecoder: Added record ids for level 1 & 2 records. Local system records (level 1) and holding records (level 2) do not store their record id in field 003@ $0 but in field 107F $0 or 203@ $0 (the latter may include an occurrence specification). These ids are now emitted as record ids in start-record events when level 1 or level 2 records are processed (see c955f4d, a9529dd)
  • Marc21Encoder: The record identifier field (001) can now be automatically created from the record identifier of the start record event. This is configured by the setGenerateRecordId(boolean) parameter (see 6d04d69)
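
To illustrate what a configurable normalisation form means in practice, here is a minimal sketch built on the JDK's java.text.Normalizer. The class ConfigurableNormalizer is hypothetical and does not reproduce the UnicodeNormalizer module. The default of NFC is an assumption made for this example.

```java
import java.text.Normalizer;

// Hypothetical stand-in (not the Metafacture module): normalises strings
// to a Unicode normalisation form that can be changed via a setter.
final class ConfigurableNormalizer {

    // Assumption for this sketch: NFC is used unless configured otherwise.
    private Normalizer.Form normalizationForm = Normalizer.Form.NFC;

    void setNormalizationForm(final Normalizer.Form normalizationForm) {
        this.normalizationForm = normalizationForm;
    }

    Normalizer.Form getNormalizationForm() {
        return normalizationForm;
    }

    // Returns the input converted to the configured normalisation form.
    String process(final String text) {
        return Normalizer.normalize(text, normalizationForm);
    }
}
```

With NFC the sequence "e" followed by a combining acute accent (U+0301) is composed into the single codepoint U+00E9. With NFD the composed character is split into the two-codepoint sequence again.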

Other

  • Added a framework for reading and writing ISO 2709:2008 records (see 3b24df5)
  • ResourceUtil: Added readAll(InputStream, Charset) and readAll(Reader) to read a full stream into a string (see 9a70936)
  • XmlUtil: Added escape(String) method for escaping strings for xml output (see 9fb4154)

Bug fixes

Flux

  • The flux grammar supported octal escape sequences but failed to convert them into characters after parsing (see b4da5bf)
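
For readers unfamiliar with octal escapes, the sketch below shows the kind of conversion involved: a backslash followed by up to three octal digits is replaced by the character with that code. This is an illustration only and is not taken from the Flux parser. The class name OctalEscapes is hypothetical, and the sketch deliberately ignores other escape sequences, including escaped backslashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example (not the Flux parser): converts octal escape
// sequences such as \101 into the characters they denote.
final class OctalEscapes {

    // A backslash followed by one to three octal digits. Three digits are
    // only accepted if the first one is 0-3, so the value fits into 8 bits.
    private static final Pattern OCTAL =
            Pattern.compile("\\\\([0-3][0-7]{2}|[0-7]{1,2})");

    private OctalEscapes() {
    }

    static String unescape(final String text) {
        final Matcher matcher = OCTAL.matcher(text);
        final StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Parse the digits as a base-8 number and convert to a char.
            final char decoded = (char) Integer.parseInt(matcher.group(1), 8);
            // quoteReplacement() guards against '$' and '\' in the result.
            matcher.appendReplacement(out,
                    Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```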

Metamorph

  • Fix #255: Metamorph emits null as entity name (see 8adafef)
  • Fix #257: Do not reset entity if reset is false (see a680a28)
  • Fix #265: split and switch-name-value functions emitted wrong source (see a7f6785)
  • Fixed resource leak in Metamorph file-maps (see d519e8c)

Metamorph-Test

  • Fix #6: Fix test names for Metamorph Test in Intellij. Intellij did not show the name of the xml files containing the tests but only the string "xml". This was caused by Intellij interpreting the test class name as a fully qualified java class name and attempting to extract the class name from it. By making sure that the Metamorph Test names do not contain any dots this problem can be avoided (see 033e6d0)

Metafacture modules

  • StreamUnicodeNormalizer no longer fails on null values in literals but simply forwards them (see 12e3420)
  • Marc21Encoder produced invalid MARC 21 records if the record data contained unicode codepoints which required more than one byte in UTF-8 encoding (see 6d04d69)
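
One way a bug like the Marc21Encoder issue can arise is by measuring field lengths in Java chars where the record format expects lengths in bytes. Whether this was the exact cause is not stated here; the sketch below only demonstrates the difference between the two measures. FieldLengths is a hypothetical class name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example (not Metafacture code): contrasts the length of a
// string in chars with its length in UTF-8 encoded bytes.
final class FieldLengths {

    private FieldLengths() {
    }

    // Number of UTF-16 code units, which is what String.length() reports.
    static int lengthInChars(final String value) {
        return value.length();
    }

    // Number of bytes the value occupies once encoded as UTF-8.
    static int lengthInBytes(final String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For pure ASCII data both methods return the same number, which is why such errors often go unnoticed until records with umlauts or non-Latin scripts are processed.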

Removed features

Flux

  • Removed generic-xml Flux command. Use decode-xml followed by handle-generic-xml instead (see 53edfdb)
  • Removed MorphVisualizer. The tool was outdated and not well maintained. There is currently no replacement for it (see 73efd63)

Metamorph

  • Fix #226: Remove misspelled options from occurrence function. The occurrence function in Metamorph no longer supports lessThen and moreThen in its only attribute. Instead lessThan and moreThan must be used (see 800108e)
  • The constructors of the collector helper classes expected a reference to the Metamorph object. The parameter has been removed as collectors should not access the Metamorph object (see f76f39e)
  • CollectFactory, FunctionFactory and MapsFactory have been made package-private in the metamorph package. They are considered to be an internal part of Metamorph (see c805690)

Metamorph-Test

  • TestConfigurationException is removed and replaced with JUnit's InitializationError (see 35dfaf5)
  • MetamorphTestCase, MetamorphTestLoader and MetamorphTestRunner are made package-private as they are not required for using Metamorph-Test (see 35dfaf5)

Metafacture modules

  • Removed CGEntityDecoder, CGEntityEncoder, CGTextDecoder, CGEntityReader and the helper class CGEntity. Use FormetaDecoder and FormetaEncoder instead. To convert data from cg-entity or cg-text format to Formeta use metafacture-3.5.0 which contains support for both formats (see ac1c71d)
  • Removed Bzip2Opener and GzipOpener. The generic FileOpener automatically recognises and handles compressed files (see c3f24eb)
  • Removed RecordBounderyRemover. The functionality provided by this module is also provided by StreamEventDiscarder (see 305c8d4)
  • Removed ObjectExceptionLogger in favour of ObjectExceptionCatcher which provides the same functionality (see 919349b)
  • Removed RecordBatcher. Use implementations of AbstractBatcher instead (see 52da508)
  • Removed StreamFormatter. The FormetaEncoder and StreamLogger modules provide very similar functionality and should be used instead (see 2f6bc2a)
  • Removed WrappingStreamPipe. Modules with nested pipelines should manage them themselves (see 52a5f21)
  • Removed SimpleJsonEncoder. The JsonEncoder module provides the same functionality (see 7d5e467)
  • Removed EventListSource. If used with EventList it can often be replaced with a StreamBuffer (see f77539b)
  • Removed MultiOpener. There were no concrete opener implementations registered to be used by MultiOpener. So, the class was completely useless (see 566022f)
  • Removed CsvReader, GenericXmlReader, LidoReader, MabReader, MarcReader, MetsModsReader and PicaXmlReader. Users who use these readers should replace them by the corresponding combination of a record splitting module and a record decoding module (see 53edfdb)
  • CGXmlReader, FormetaReader, MarcXmlReader, PicaReader, Reader, ReaderBase, XmlReaderBase and ReaderFactory have been moved to Metamorph-Test and should not be used elsewhere. The classes might be changed or removed in future versions without notice. Users who use these readers should replace them by the corresponding combination of a record splitting module and a record decoding module (see 53edfdb)
  • Removed MultiFormatReader. There is no replacement for this module (see 53edfdb)
  • StreamValidator and WellformednessChecker have been moved to Metamorph-Test and should not be used elsewhere. The classes might be changed or removed in future versions without notice. Test cases that used the two modules should be changed to use Mockito to verify the test result. Metafacture's test cases contain many examples showing how to do this (see 497586e)
  • Removed AbstractStreamBatcher. There is no replacement (see d612287)
  • MabDecoder: Removed static process() method (see 17ab3b0)
  • LiteralExctractor (now LiteralToObject): Removed the single-argument constructor. Use setPattern(String) instead (see 1494cbe)
  • SimpleXmlEncoder: Removed setNamespace(MultiMap) setter. It can be replaced with setNamespaces(myMultiMap.getMap("namespaces")) if required (see 5bdfb85)
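
The item on Bzip2Opener and GzipOpener above states that FileOpener recognises compressed files automatically. How FileOpener does this is not described in these notes. A common general technique is to inspect the leading magic bytes of a file, which the following sketch illustrates. The class CompressionSniffer and its Kind enum are hypothetical and not part of Metafacture.

```java
// Hypothetical example (not Metafacture's FileOpener): guesses the
// compression format of a file from its first bytes.
final class CompressionSniffer {

    enum Kind { GZIP, BZIP2, NONE }

    private CompressionSniffer() {
    }

    // Expects the first few bytes of the file. Shorter arrays are fine;
    // they are simply reported as uncompressed.
    static Kind detect(final byte[] head) {
        // Gzip streams start with the two bytes 0x1f 0x8b.
        if (head.length >= 2
                && (head[0] & 0xff) == 0x1f
                && (head[1] & 0xff) == 0x8b) {
            return Kind.GZIP;
        }
        // Bzip2 streams start with the ASCII characters "BZh".
        if (head.length >= 3
                && head[0] == 'B'
                && head[1] == 'Z'
                && head[2] == 'h') {
            return Kind.BZIP2;
        }
        return Kind.NONE;
    }
}
```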

Other

  • tries package: ACNode is now package-private as it is considered to be an internal part of the tries package (see 4814059)
  • xml package: The XML filter classes CDataFilter, CommentsFilter, IgnorableWhitespaceFilter and LexicalHandlerXmlFilter have been made package-private as they are an implementation detail of the xml package (see 6788bac)
  • Removed ReflectionUtil. There is no replacement for this utility class as it was considered an internal class (see 10338d9)
  • Removed StreamConstants.SERIALIZED (now StandardEventNames). Code that uses this constant should define it itself (see f3c28fd)
  • Removed MultiHashMap. There is no replacement for this class. Use HashMap<String, HashMap<String, String>> if necessary (see 3b534e7)
  • The MemoryWarningSystem has been made package-private. There is no replacement for it (see 53d99ca)
  • The constructors of Event are now package-private as instances of the class should only be created by EventList (see 4f94377)
  • Removed ShouldNeverHappenException. Throw AssertionError instead (see 5a01e43)
  • Removed FormatException(Throwable) and MissingIdException(Throwable) constructors. These exceptions should not be created without providing an error message (see 040058f)
  • Removed exception IllegalEncodingException. This exception is not used by Metafacture. There is no replacement (see 41f3aa3)
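
Code that relied on MultiHashMap can fall back to nested maps as suggested above. The following is a minimal sketch of such a replacement. NestedMaps is a hypothetical class, and it declares the fields with the Map interface rather than HashMap, which is a stylistic choice of this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical replacement sketch for code that used MultiHashMap: a map
// of named maps, each mapping string keys to string values.
final class NestedMaps {

    private final Map<String, Map<String, String>> maps = new HashMap<>();

    // Stores a value under the given key in the named map. The inner map
    // is created on first use.
    void put(final String mapName, final String key, final String value) {
        maps.computeIfAbsent(mapName, name -> new HashMap<>()).put(key, value);
    }

    // Returns the value for the key in the named map, or null if either
    // the map or the key does not exist.
    String get(final String mapName, final String key) {
        final Map<String, String> map = maps.get(mapName);
        return map == null ? null : map.get(key);
    }
}
```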

Moved and renamed items

Metafacture framework

  • Moved Default... classes from framework package to framework.helpers (see 579065b)
  • Moved Triple to framework.objects package (see 2c7e22a)
  • Moved @FluxCommand into framework package (see 5adfc8a)
  • Moved MetafactureException, FormatException and MissingIdException to framework package (see 6aeeec6)
  • Renamed StreamConstants to StandardEventNames and moved the class to the framework package (see f3c28fd)
  • Renamed DefaultXMLReceiver to DefaultXmlReceiver (see b3995af)

Flux

  • Moved FluxParseException to flux package (see ba6225d)
  • Moved StringSender module to flux package (see 27dd303)

Metamorph

  • Moved MorphDefException and MorphException to metamorph package (see ba6225d)
  • Moved the interfaces for defining collectors, functions, maps and interceptors to the metamorph.api package. The abstract classes which implement common functionality for functions and collectors are moved to the metamorph.api.helpers sub package (see d940461)
  • Moved Entity from metamorph.collectors to metamorph package (see 5e35703)
  • Moved DomLoader to metamorph.xml package. The class is not part of the public API of Metafacture and should not be used (see a3265af)
  • Renamed morph package to metamorph (see 85ffca1)
  • Renamed MultiMap to Maps and moved it to metamorph.api (see 3b534e7)
  • Renamed Function.setMultiMap(MultiMap) to Function.setMaps(Maps) (see 3b534e7)
  • Renamed MapFile to FileMap (see 447ae60)

Metamorph-Test

  • Renamed TestSuite to MetamorphTestSuite (see 35dfaf5)

Metafacture modules

  • A new package structure for Metafacture modules has been devised. This means that all modules have been moved to new packages. The new structure groups the modules based on their functionality and business domain in which they are typically used. This should make it easier to find modules. Please refer to the list of modules per package at the end of the release notes (see 53edfdb, 6e2af14, 96bf155, 483103b, c8c1dd7, f0883cd, 9894dfe, d11c4ad, 50402af, 5926120, 32e0a10, 483103b, b92035f, eab3739, 6fe07a8, c567ee8, ca7c607, 8852fd4, 47956d4, 5308ef4, fb9c254, 042c158, a016b23, dfe6b78, 8ebddc9, ac64e1f, f0883cd)
  • Renamed StreamLiteralFormater to StreamLiteralFormatter (see ecd5602)
  • Renamed CloseSupressor to CloseSuppressor (see 802dc28)
  • Renamed BatchLogger to StreamBatchLogger (see 970be0c)
  • Renamed IdChangePipe to RecordIdChanger (see 7e8da65)
  • Renamed LiteralExctractor to LiteralToObject (see eab074a)
  • Renamed AbstractBatcher to AbstractStreamBatcher (see 4820850)
  • Renamed BatchResetter to StreamBatchResetter (see 4820850)
  • Renamed Utf8Normalizer to UnicodeNormalizer (see 314a641)
  • Renamed ObjectBuffer to ObjectCollector (see e826098)
  • Moved DigestAlgorithm into FileDigestCalculator (see 7a08bfd)
  • Moved Event into EventList (see 4f94377)
  • IdChangePipe (now RecordIdChanger): Renamed setKeepIdless(boolean) to setKeepRecordsWithoutIdLiteral(boolean) (see e8300a8)
  • RegexDecoder: Renamed defaultLiteralName to rawInputLiteral (see 44a2251)
  • StreamUnicodeNormalizer: Renamed setNormalizationType() to setNormalizationForm(). The getter is renamed accordingly (see 36cba90)
  • ObjectPipeDecoupler: Renamed DEFUALT_CAPACITY to DEFAULT_CAPACITY (see 4c6e797)

Other

  • Moved contents of util and types package to commons package (see 9904262)
  • Moved RecordIdentifier and FilenameExtractor to the xml package (see 533a559)

Changed behaviour

Flux

  • Added missing @In and @Out annotations (see b297e6d)

Metafacture framework

  • Triple now throws IOException instead of ShouldNeverHappenException (see 5a01e43)
  • Changed the serialVersionUID of FormatException, MetafactureException and MissingIdException from a random number to zero (see 040058f)

Metamorph

  • AbstractReadOnlyMap now throws an UnsupportedOperationException instead of a NotImplementedException (see a766bca)
  • DomLoader no longer throws Metamorph-specific exceptions. A generic MetafactureException is thrown instead of a MorphDefException (see 5043295)
  • Replaced util.xml.Location with a generic SourceLocation class (see 2f0ef71)
  • JndiSqlMap: Removed log messages informing about errors while closing a JDBC resource (see b36d7cf)
  • Metamorph no longer throws MorphDefException and MorphException but instances of a new exception class named MetamorphException instead. Additionally, the two exceptions have been renamed from MorphDefException to MorphBuildException and from MorphException to MorphExecutionException. The two exceptions no longer inherit from MetafactureException but directly from RuntimeException. They are only used by Metamorph internally (see 8773d33)
  • Changed the exception thrown by the occurrence function if the only attribute is invalid from MetafactureException to MorphBuildException. The latter is wrapped in a MetamorphException by Metamorph (see a40eff5)
  • Changed exception thrown if a function, collector or map type is not known from IllegalArgumentException to MorphBuildException. The latter is wrapped in a MetamorphException by Metamorph (see 168d1ea)
  • ResourceUtil: IOExceptions thrown by the methods in ResourceUtil are no longer wrapped in MetafactureExceptions (see a3554ad)

Metamorph-Test

  • WellformednessChecker: Improved error reporting and recovery. Instead of throwing a WellformednessException the module now invokes a user-supplied error handler. Additionally, the module automatically recovers from invalid input. Errors caused by invalid data no longer break the structure of the event stream (see 8a9b23a, 8557717)
  • StreamValidator: Improved error reporting. Instead of throwing a ValidationException the module now invokes a user-supplied error handler. This makes it possible to directly throw an AssertionError when using the StreamValidator in the Metamorph test runner. This solves the problem of the old implementation which caught all subclasses of FormatException. This resulted in difficult to understand error messages when an error in the input data was reported as a failed validation instead of broken data (see e2b5054)
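
The error handler approach described for both modules can be pictured as follows. This is a generic sketch of the pattern and not the API of StreamValidator or WellformednessChecker. The class HandlerBasedValidator, its constructor and the expectEquals method are hypothetical.

```java
import java.util.function.Consumer;

// Hypothetical example of the pattern (not the Metafacture classes): the
// validator reports problems to a caller-supplied handler instead of
// throwing a fixed exception type itself.
final class HandlerBasedValidator {

    private final Consumer<String> errorHandler;

    HandlerBasedValidator(final Consumer<String> errorHandler) {
        this.errorHandler = errorHandler;
    }

    // Compares two values and passes a message to the error handler if
    // they differ. What happens next is up to the handler.
    void expectEquals(final String expected, final String actual) {
        if (!expected.equals(actual)) {
            errorHandler.accept(
                    "expected '" + expected + "' but got '" + actual + "'");
        }
    }
}
```

A test runner could pass a handler such as `message -> { throw new AssertionError(message); }` so that failed validations surface as test failures, while other callers might simply collect or log the messages.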

Metafacture modules

  • Replaced MarcDecoder with Marc21Decoder. This module is a complete reimplementation based on the ISO 2709 parser. The new module made the check of the charset encoding mandatory. The static process() methods have been removed. The old MarcDecoder used a simple string-splitting based parsing algorithm which made many assumptions about the structure of the MARC 21 records. In contrast, the new module parses MARC 21 records properly. Because of this, records with errors which could be decoded by MarcDecoder may produce errors if parsed with the new Marc21Decoder (see 3b24df5, c6fd42f, 6d04d69)
  • CGXmlHandler: The handler is now namespace aware and checks the version attribute in the CG-XML root tag. It no longer emits null as record id if it is missing but instead outputs an empty string. If an entity or literal name is missing a FormatException is now thrown (see abdbf75)
  • StreamBatchMerger: The class does not inherit from AbstractBatcher anymore. The methods for returning record and batch counts are removed (see 52a5f21)
  • ObjectBatchLogger: Use qualified class name of ObjectBatchLogger as logger name instead of class BatchLogger (see 32ced23)
  • IdChangePipe (now RecordIdChanger): The constructor which accepted the name of the id literal has been removed in favour of the setIdLiteral(String) method (see e8300a8)
  • PreambleEpilogueAdder: The epilogue string is now only emitted if an object was received. The preamble is only emitted if it is not empty (as it was already the case with the epilogue string). After resetting the stream a new preamble string is emitted when the next object is received.
  • CsvDecoder now uses the opencsv library for parsing. As a consequence, the decoder no longer allows using a regular expression for defining the delimiters; only single characters are supported. The new default separator is ',' (thanks @fsteeg, see 1e51114)
  • StreamFlattener: Removed balanced entities check. Metafacture modules generally assume that the event stream is wellformed (see 7490216, 9dfb892)
  • JDomDocumentToStream: The module now implements ObjectPipe instead of ObjectReceiver and Sender separately (see 225e63b)
  • StreamToJDomDocument: The module now implements StreamPipe instead of StreamReceiver and Sender separately (see 225e63b)

Other

  • iso2709 package: The framework for processing ISO 2709 records has been heavily changed. Support for decoding data in ISO 2709 has been added. The public API of the package is now much simpler than before. Please refer to the commit comment for detailed information on the changes to the API (see 6d04d69)
  • CharMap now throws an UnsupportedOperationException instead of a NotImplementedException (see a766bca)
  • XmlUtil now throws AssertionError instead of ShouldNeverHappenException (see 5a01e43)
  • Improved utilities for dynamically loading classes and initialising them. This functionality was implemented completely in ObjectFactory. Now it is separated into three classes. ResourceUtil provides methods for loading classes and wrapping them in a ConfigurableClass. This new class manages class initialisation via constructor arguments and setters. It replaces the static ObjectFactory.newInstance and ObjectFactory.applySetters methods. Additionally, it has methods for querying the list of setters and their types. The only responsibility left in ObjectFactory is to manage a mapping from custom names to fully qualified class names. The exception type thrown on errors by these classes has been changed to ReflectionException (see 3e4a240, bad5b7c)

Other improvements

  • Metafacture 4.0.0 requires Java 8 (see 361723b)
  • Refactored many test cases in Metafacture (thanks @emopers for fixing usage of PrintStream in tests, see 428e579)
  • Updated all dependencies and Maven plugins to their latest version (see 7486d20, dfc1233, ab6b26d; thanks @sschuepbach for updating jackson-core, see 1dc4406)
  • Improved many Javadoc comments.
  • Reduced the number of dependencies on external libraries.
  • Small updates and improvements for the project presentation on GitHub and continuous integration with Travis.

New module packages

In Metafacture 4.0.0 the structure of the Metafacture module package has been heavily changed. The following lists show the new locations of the modules. If modules were renamed the old name is given in parentheses.

Package org.culturegraph.mf.biblio

This package contains modules for working with bibliographic data formats: marc21.Marc21Decoder, marc21.Marc21Encoder, marc21.MarcXmlHandler, pica.PicaDecoder, pica.PicaEncoder, pica.PicaMultiscriptRemodeler, pica.PicaXmlHandler, AlephMabXmlHandler, AseqDecoder, MabDecoder

Package org.culturegraph.mf.csv

This package contains modules for working with comma separated value (csv) files: CsvDecoder

Package org.culturegraph.mf.elasticsearch

This package contains modules for working with elasticsearch: JsonToElasticsearchBulk

Package org.culturegraph.mf.files

This package contains modules that perform file operations: DirReader, FileDigestCalculator

Package org.culturegraph.mf.flowcontrol

This package contains modules for controlling the flow of stream events and objects in a Metafacture pipeline: CloseSuppressor (was: CloseSupressor), ObjectExceptionCatcher, ObjectPipeDecoupler, StreamBatchResetter (was: BatchResetter), StreamBuffer, StreamDeferrer, StreamExceptionCatcher

Package org.culturegraph.mf.formatting

This package contains modules for formatting streams for string output: ObjectTemplate, PreambleEpilogueAdder, StreamLiteralFormatter (was StreamLiteralFormater)

Package org.culturegraph.mf.formeta

This package contains modules for the formeta data format: FormetaDecoder, FormetaEncoder, FormetaRecordsReader

Package org.culturegraph.mf.io

This package contains modules for reading and writing data: FileOpener, HttpOpener, LineReader, ObjectFileWriter, ObjectJavaIoWriter, ObjectStdoutWriter, ObjectWriter, RecordReader, ResourceOpener, StdInOpener, TarReader

Package org.culturegraph.mf.javaintegration

This package contains modules for integrating Metafacture pipelines with Java: pojo.PojoDecoder, pojo.PojoEncoder, EventList, MapToStream, NamedValueList, NamedValueSet, ObjectCollector (was: ObjectBuffer), SingleValue, StringListMap, StringListMapToStream, StringMap, ValueSet

Package org.culturegraph.mf.jdom

This package contains modules for working with JDOM documents: JDomDocumentToStream, StreamToJDomDocument

Package org.culturegraph.mf.json

This package contains modules for working with the JSON data format: JsonEncoder

Package org.culturegraph.mf.linkeddata

This package contains modules for working with linked data and linked data formats: BeaconReader, OreAggregationAdder, RdfMacroPipe

Package org.culturegraph.mf.mangling

This package contains modules that modify the object or event stream: DuplicateObjectFilter, EntityPathTracker, LiteralToObject (was LiteralExtractor), NullFilter, ObjectToLiteral, RecordIdChanger (was IdChangePipe), RecordToEntity, StreamEventDiscarder, StreamFlattener

Package org.culturegraph.mf.metamorph

This package contains Metamorph and modules building directly on the Metamorph language: Filter, Metamorph, Splitter

Package org.culturegraph.mf.monitoring

This package contains modules for monitoring and debugging a Metafacture pipeline: ObjectBatchLogger, ObjectLogger, ObjectTimer, StreamBatchLogger (was BatchLogger), StreamLogger, StreamTimer

Package org.culturegraph.mf.plumbing

This package contains modules for constructing Metafacture pipelines with branches: IdentityStreamPipe, ObjectTee, StreamBatchMerger, StreamMerger, StreamTee, XmlTee

Package org.culturegraph.mf.scripting

This package contains modules that enable users to define processing steps using scripting languages such as Javascript: JScriptObjectPipe

Package org.culturegraph.mf.statistics

This package contains modules for computing various statistics: CooccurrenceMetricCalculator, Counter, Histogram, UniformSampler

Package org.culturegraph.mf.strings

This package contains modules which modify strings: LineSplitter, RegexDecoder, StreamUnicodeNormalizer, StringConcatenator, StringDecoder, StringFilter, StringMatcher, StringReader, UnicodeNormalizer (was: Utf8Normalizer)

Package org.culturegraph.mf.triples

This package contains modules for working with Triples: StreamToTriples, TripleCollect, TripleCount, TripleFilter, TripleObjectRetriever, TripleObjectWriter, TripleReader, TripleReorder, TripleSort, TriplesToStream, TripleWriter

Package org.culturegraph.mf.xml

This package contains modules for working with xml data: CGXmlHandler, GenericXmlHandler, SimpleXmlEncoder, XmlDecoder, XmlElementSplitter, XmlFilenameWriter

Metafacture Core 3.5.0

@cboehme cboehme released this Jul 8, 2016 · 464 commits to master since this release

This is a bug fix release with some small new features

Bug fixes
  • Fix #255: Metamorph emits null as entity name (9821220)
  • Fix #257: Do not reset entity if reset is false (ee9ef99)
New Features
  • Fix #256: Support sameEntity in none and all (f8ef044)

See commits for details

Maven Coordinates

Metafacture core is available on Maven Central:

<dependency>
  <groupId>org.culturegraph</groupId>
  <artifactId>metafacture-core</artifactId>
  <version>3.5.0</version>
</dependency>

Metafacture Core 3.4.0

@cboehme cboehme released this Jul 8, 2016 · 464 commits to master since this release

This is a feature release which adds some new Metafacture modules

New Features
  • Add module RecordToEntity to turn records into entities (c337ce9)
  • Add module EntityPathTracker for returning the current entity path (0a3ff6f)
  • Add module StreamEventDiscarder (e613a26)
  • Add StreamDeferrer module (3acf7a1)
  • PicaDecoder: Make whitespace removal in field names optional (3547a43)

See commits for details.

Maven Coordinates

Metafacture core is available on Maven Central:

<dependency>
  <groupId>org.culturegraph</groupId>
  <artifactId>metafacture-core</artifactId>
  <version>3.4.0</version>
</dependency>

Metafacture Core 3.3.1

@cboehme cboehme released this Mar 16, 2016 · 475 commits to master since this release

This is a small bug fix release.

Bug fixes
  • Fix #191: TriplesCollect should not output endRecord() events in closeStream() and resetStream() events if no triples were received.

Maven Coordinates

Metafacture core is available on Maven Central:

<dependency>
  <groupId>org.culturegraph</groupId>
  <artifactId>metafacture-core</artifactId>
  <version>3.3.1</version>
</dependency>

Metafacture Core 3.3.0

@cboehme cboehme released this Dec 17, 2015 · 480 commits to master since this release

This is a minor update.

Bug fixes
  • SimpleXmlEncoder wrote a closing root tag if resetStream() was called even if no root tag was open (see #249, which describes the same issue in closeStream(), for details)
New features
  • Fix #252: Add parameter (setWriteRootTag) to make output of root tag in SimpleXmlEncoder optional.

Maven Coordinates

Metafacture core is available on Maven Central:

<dependency>
  <groupId>org.culturegraph</groupId>
  <artifactId>metafacture-core</artifactId>
  <version>3.3.0</version>
</dependency>

Metafacture Core 3.2.0

@cboehme cboehme released this Dec 16, 2015 · 486 commits to master since this release

This is mainly a bug fix release but it has some new features.

Changed Behaviour
  • The reset behaviour of the choose collector has changed when it is used with an if-condition: In the old implementation the chosen value was only cleared after it was actually emitted (so only if the condition was met). Now it is also cleared if it was attempted to be emitted but was not due to the condition not being met. This is in-line with the behaviour of the other collectors.
Bug fixes
  • Disable javadoc doclint in order to be able to create release builds and submit snapshots builds to http://oss.sonatype.org/ from Travis CI.
  • Fix #249: emits closing tag without opening tag in SimpleXmlEncoder
  • Fix #235, #237: Add PicaXmlHandler to flux-command.properties
New features
  • Resolve #49, #210, #250: The choose collector now supports reset and sameEntity and is reset even if an if-condition prevents emitting a value.
  • Resolve #247: set namespace map directly in SimpleXmlEncoder. It is no longer necessary to use a MultiMap with a specially named key that contains the actual namespace map. Instead the namespace map can be passed directly to the SimpleXmlEncoder.
  • Resolve #248: support default namespace in SimpleXmlDecoder
  • Resolve #187: The delimiter in concat is now optional
  • Resolve #238: the Travis build status is now shown in the readme file on GitHub

Maven Coordinates

Metafacture core is available on Maven Central:

<dependency>
  <groupId>org.culturegraph</groupId>
  <artifactId>metafacture-core</artifactId>
  <version>3.2.0</version>
</dependency>