structured-content-tools

This framework provides tools for processing and manipulating structured content represented in Java as a Map of Maps structure. This structure is often used to represent variable JSON data.
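As an illustration of the Map of Maps structure, here is a minimal sketch in plain Java that builds the equivalent of a small JSON document and reads a nested value from it. The class name, the field names, and the dot-separated path helper are assumptions made for this example only; they are not part of the framework.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: class, field names and helper are hypothetical, not framework API.
class MapOfMapsExample {

    // Builds the Java equivalent of:
    // {"key": "ORG-1", "fields": {"summary": "Hello", "labels": ["a", "b"]}}
    static Map<String, Object> buildIssue() {
        List<Object> labels = new ArrayList<Object>();
        labels.add("a");
        labels.add("b");

        Map<String, Object> fields = new LinkedHashMap<String, Object>();
        fields.put("summary", "Hello");
        fields.put("labels", labels);

        Map<String, Object> issue = new LinkedHashMap<String, Object>();
        issue.put("key", "ORG-1");
        issue.put("fields", fields);
        return issue;
    }

    // Walks a dot-separated path through nested Maps.
    // Returns null as soon as a step is missing or is not a Map.
    static Object get(Map<String, Object> data, String path) {
        Object current = data;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }
}
```

For example, `MapOfMapsExample.get(MapOfMapsExample.buildIssue(), "fields.summary")` yields the nested summary value, while a path through a missing field yields `null`.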

We use this framework to allow highly configurable manipulation of content before it is stored in an Elasticsearch search index, for example in the JIRA River Plugin for Elasticsearch, the Universal Remote API River Plugin for Elasticsearch, or Searchisko.

Content manipulation is performed by a chain of preprocessors. Each preprocessor must implement the org.jboss.elasticsearch.tools.content.StructuredContentPreprocessor interface. You can use org.jboss.elasticsearch.tools.content.StructuredContentPreprocessorBase as a base class for your preprocessor implementation. A chain of preprocessors can be loaded using the methods in org.jboss.elasticsearch.tools.content.StructuredContentPreprocessorFactory.
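The chain idea can be sketched in plain Java as follows. This is a conceptual sketch only: the `Preprocessor` interface, the `AddConstant` and `RemoveFields` classes, and `runChain` are hypothetical stand-ins invented for this example. They do not reproduce the real StructuredContentPreprocessor signature or the way StructuredContentPreprocessorFactory loads a chain; consult the framework sources for those.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch: every type here is hypothetical, not the framework's API.
class PreprocessorChainSketch {

    // Stand-in for a preprocessor: takes the content and returns the (possibly changed) content.
    interface Preprocessor {
        Map<String, Object> preprocess(Map<String, Object> data);
    }

    // Hypothetical preprocessor that stores a constant value in a target field.
    static class AddConstant implements Preprocessor {
        private final String field;
        private final Object value;

        AddConstant(String field, Object value) {
            this.field = field;
            this.value = value;
        }

        @Override
        public Map<String, Object> preprocess(Map<String, Object> data) {
            data.put(field, value);
            return data;
        }
    }

    // Hypothetical preprocessor that removes the listed top-level fields.
    static class RemoveFields implements Preprocessor {
        private final String[] fields;

        RemoveFields(String... fields) {
            this.fields = fields;
        }

        @Override
        public Map<String, Object> preprocess(Map<String, Object> data) {
            for (String f : fields) {
                data.remove(f);
            }
            return data;
        }
    }

    // Applies each preprocessor in order, feeding the output of one into the next.
    static Map<String, Object> runChain(List<Preprocessor> chain, Map<String, Object> data) {
        for (Preprocessor p : chain) {
            data = p.preprocess(data);
        }
        return data;
    }

    // Small demonstration: add a "source" field, then drop an "internal" field.
    static Map<String, Object> demo() {
        Map<String, Object> data = new LinkedHashMap<String, Object>();
        data.put("title", "Hello");
        data.put("internal", "x");

        List<Preprocessor> chain = new ArrayList<Preprocessor>();
        chain.add(new AddConstant("source", "jira"));
        chain.add(new RemoveFields("internal"));
        return runChain(chain, data);
    }
}
```

Keeping each step small and order-dependent is what makes the chain configurable: steps can be added, removed, or reordered without touching the others.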

You can use the methods from org.jboss.elasticsearch.tools.content.ValueUtils and org.jboss.elasticsearch.tools.content.StructureUtils to simplify preprocessor implementation.

The framework contains several generic, configurable preprocessor implementations:

  • AddValuePreprocessor - adds a value to a target field. The value can be a constant or contain a pattern with keys that are replaced with other data from the content.
  • AddMultipleValuesPreprocessor - adds multiple values to target fields. Values can be constant.
  • RemoveMultipleFieldsPreprocessor - removes one or more fields from the data structure.
  • AddCurrentTimestampPreprocessor - adds the current timestamp to a target field.
  • SimpleValueMapMapperPreprocessor - maps a simple value from a source field through a configured Map mapping structure to a target field. An optional default value can be used for values not found in the mapping Map.
  • ValuesCollectingPreprocessor - collects values from multiple source fields (some of which can contain lists), removes duplicates, and stores the values as a List in the target field.
  • ESLookupValuePreprocessor - uses a defined value from the data to look up a document in an Elasticsearch search index and puts defined fields from it into defined target fields in the data.
  • MaxTimestampPreprocessor - selects the maximum timestamp value from an array in the source field and stores it in the target field.
  • RequiredValidatorPreprocessor - checks a defined source field for the 'required' condition and throws an exception if it is not met.
  • TrimStringValuePreprocessor - trims a String value from the source field to the configured maximum length (leading and trailing whitespace is removed too) and stores it in the target field.
  • StripHtmlPreprocessor - strips HTML tags and unescapes HTML entities in the String value of the source field and stores the result in the target field.
  • LongToTimestampValuePreprocessor - interprets the numeric value of the source field as milliseconds since 1 January 1970 and stores it in the target field as a string containing an ISO formatted timestamp.
  • RegExpCapturingGroupPreprocessor - extracts substrings from a source string value using regular expression capturing groups.
  • ScriptingPreprocessor - runs a script in an engine provided through the Java Scripting API to manipulate the processed data.
  • IsDateInRangePreprocessor - checks whether a particular date is within a range given by one or two constraining dates.
  • RESTCallPreprocessor - performs a REST request (values from the data can be used in it) and puts defined fields from the JSON response into defined target fields in the data.
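To make two of the transformations in the list above more concrete, here is a plain-Java sketch of the kind of work ValuesCollectingPreprocessor and LongToTimestampValuePreprocessor are described as doing. It does not use the framework classes or their configuration. The method names, field names, and the exact ISO pattern are assumptions for illustration, and the real preprocessors may behave differently in details such as the output format.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TimeZone;

// Illustrative only: not the framework's implementation or configuration.
class TransformationSketches {

    // Collects values from several source fields (flattening collections),
    // drops duplicates while keeping first-seen order, and stores a List in the target field.
    static void collectValues(Map<String, Object> data, List<String> sourceFields, String targetField) {
        Set<Object> collected = new LinkedHashSet<Object>();
        for (String field : sourceFields) {
            Object value = data.get(field);
            if (value instanceof Collection) {
                collected.addAll((Collection<?>) value);
            } else if (value != null) {
                collected.add(value);
            }
        }
        data.put(targetField, new ArrayList<Object>(collected));
    }

    // Formats milliseconds since 1 January 1970 UTC as an ISO 8601 string.
    // The pattern and the UTC time zone are assumptions made for this sketch.
    static String toIsoTimestamp(long millis) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(new Date(millis));
    }

    // Small demonstration with hypothetical field names.
    static Map<String, Object> demo() {
        Map<String, Object> data = new LinkedHashMap<String, Object>();
        data.put("author", "alice");
        data.put("commenters", Arrays.<Object>asList("bob", "alice"));
        collectValues(data, Arrays.asList("author", "commenters"), "contributors");
        data.put("created", toIsoTimestamp(0L));
        return data;
    }
}
```

In `demo()`, the `contributors` field ends up holding each name once, and `created` holds the formatted timestamp for millisecond value 0.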

The structured-content-tools jar file is available from the JBoss.org Maven repository. You can use one of the following dependency snippets in your pom.xml.

For the Elasticsearch 1.x series and Java 1.7:

<dependency>
  <groupId>org.jboss.elasticsearch</groupId>
  <artifactId>structured-content-tools</artifactId>
  <version>1.3.11</version>
</dependency>

For the Elasticsearch 0.90.5 series and Java 1.6:

<dependency>
  <groupId>org.jboss.elasticsearch</groupId>
  <artifactId>structured-content-tools</artifactId>
  <version>1.2.11</version>
</dependency>

Please note that the Elasticsearch jars are required to run structured-content-tools. The pom file declares them as a provided dependency, so your project must supply them itself.
