Modify character streams on the fly
Rodrigo Witzel fixed: travis image URL
Latest commit ad8bb96 Jun 17, 2015

Travis build status Coveralls coverage status Apache 2

+++ 2015-03-24: Streamflyer 1.2.0 released with a new groupId. New package names everywhere! +++

+++ 2015-03-24: Streamflyer has a new home on GitHub because Google Code is closing. +++

+++ 2014-10-08: Streamflyer 1.1.3 released. Available in Maven Central. +++

+++ 2013-11-10: New wiki page: How to implement a custom modifier for release 1.1.1 +++

+++ 2013-03-10: Regular expression on InputStream: Differences to Java Regex explained +++


What it does

Wraps Java's Reader and Writer to modify characters in a stream: to apply regular expressions, to fix XML documents, or whatever else you want to do. Streamflyer is a convenient alternative to Java's FilterReader and FilterWriter.


Usage

An example:

// choose the character stream to modify
Reader originalReader = ... // this reader is connected to the original data source

// select the modifier of your choice
Modifier myModifier = new RegexModifier("edit(\\s+)stream", Pattern.CASE_INSENSITIVE, "modify$1stream");

// create the modifying reader that wraps the original reader
Reader modifyingReader = new ModifyingReader(originalReader, myModifier);

... // use the modifying reader instead of the original reader

In this example the chosen Modifier replaces the string "edit stream" with "modify stream" while preserving the white space between edit and stream. You can write your own custom modifier or use a modifier that is shipped with Streamflyer, like the RegexModifier that replaces characters by using regular expressions.

The same can be done with a Writer instead of a Reader.

You can find more information about the usage in the API documentation.

Implement custom modifiers

Read ImplementCustomModifier.

Compatibility with Java's Regular Expressions package

RegexModifier internally uses Java's Regex package. This is why it supports pattern flags, quantifiers, and capturing groups the same way Java does. Look-behind constructs are an exception; see the section Known limitations.

There is a small tutorial: AdvancedRegularExpressionsExample

Speed up your regular expressions

Have a look at streamflyer-regex-fast.

Fix invalid characters in XML streams

Sometimes you have to open XML documents that contain characters that are allowed in XML 1.1 documents but not in XML 1.0 documents. And sometimes you have to open XML documents that contain characters that are entirely forbidden. For these kinds of documents, predefined modifiers such as the XmlVersionModifier exist so that the modified stream can be opened by standard XML parsers.
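As a rough illustration of what "forbidden characters" means for XML 1.0, the sketch below removes every code point outside the Char production of the XML 1.0 specification. It uses only the JDK and works on an in-memory string; the class XmlCharSketch and the helper stripInvalidXml10 are hypothetical names and are not part of Streamflyer.

```java
import java.util.regex.Pattern;

class XmlCharSketch {

    // Matches every code point that is NOT allowed by the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static final Pattern INVALID_XML_10 = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Hypothetical helper (an assumption for this sketch): drops the invalid characters.
    static String stripInvalidXml10(String text) {
        return INVALID_XML_10.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // The input contains a BEL (U+0007) and a NUL (U+0000) character.
        String dirty = "<a>bell\u0007 and null\u0000</a>";
        System.out.println(stripInvalidXml10(dirty)); // prints "<a>bell and null</a>"
    }
}
```

A stream-based modifier would apply the same character test while the data flows through, instead of loading the whole document into a string first.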

Modify byte streams

Streamflyer does not support modifications of byte streams out of the box. But you can convert your byte stream to a character stream, wrap the character stream in a modifying character stream, and then convert the character stream back to a byte stream. Don't expect outstanding performance from this approach.

You can find examples for modifying both InputStream and OutputStream on the HowToModifyByteStreams page.
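The conversion chain described above can be sketched with the JDK alone. This is only an illustration under stated assumptions: UpperCaseReader is a hypothetical stand-in for a modifying reader (it is not Streamflyer's ModifyingReader), asUtf8InputStream is a hypothetical adapter that is deliberately simple and slow, and UTF-8 is assumed as the encoding on both sides.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ByteStreamSketch {

    // Stand-in for a modifying reader (hypothetical): upper-cases every character.
    static class UpperCaseReader extends FilterReader {
        UpperCaseReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return c == -1 ? -1 : Character.toUpperCase((char) c);
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            for (int i = off; i < off + n; i++) {
                buf[i] = Character.toUpperCase(buf[i]);
            }
            return n;
        }
    }

    // Hypothetical adapter: exposes a Reader as a UTF-8 encoded InputStream.
    // It encodes one code point at a time, which is simple but slow.
    static InputStream asUtf8InputStream(final Reader reader) {
        return new InputStream() {
            private byte[] pending = new byte[0];
            private int index = 0;

            @Override
            public int read() throws IOException {
                if (index == pending.length) {
                    int c = reader.read();
                    if (c == -1) {
                        return -1;
                    }
                    StringBuilder sb = new StringBuilder().append((char) c);
                    if (Character.isHighSurrogate((char) c)) {
                        int low = reader.read(); // keep surrogate pairs together
                        if (low != -1) {
                            sb.append((char) low);
                        }
                    }
                    pending = sb.toString().getBytes(StandardCharsets.UTF_8);
                    index = 0;
                }
                return pending[index++] & 0xFF;
            }

            @Override
            public void close() throws IOException {
                reader.close();
            }
        };
    }

    // bytes -> characters -> modified characters -> bytes
    static byte[] modify(byte[] original) {
        Reader chars = new InputStreamReader(new ByteArrayInputStream(original), StandardCharsets.UTF_8);
        Reader modified = new UpperCaseReader(chars);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream bytesOut = asUtf8InputStream(modified)) {
            int b;
            while ((b = bytesOut.read()) != -1) {
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] result = modify("edit stream".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(result, StandardCharsets.UTF_8)); // prints "EDIT STREAM"
    }
}
```

The decoding and re-encoding steps on either side of the modifying reader are the reason this approach costs more than working on a character stream directly.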

Download

Go to the Installation page to get the latest release. This page also provides the Maven coordinates, prerequisites, and information about dependencies on other libraries.

Known limitations

RegexModifier

Look-behind constructs

If your regular expression contains look-behind constructs like

  • ^
  • \b
  • \B
  • (?<=X)
  • (?<!X)

then Streamflyer's behaviour (version 1.1.1) differs from the behaviour of Java's Regex package.

What exactly is the difference? Java's String.replaceAll() finds all matches in the original string and creates a modified string in parallel. In contrast, Streamflyer looks for the next match, applies the replacement to the content, and then looks for the next match after the replacement. A look-behind construct therefore sees the already modified characters, which can lead to different results.

Examples:

| Regex | Replacement | Input | Output (Java Regex) | Output (Streamflyer) |
| ^a | (the empty string) | aaabb | aabb | bb |
| (?<=foo)bar | foo | foobarbar | foofoobar | foofoofoo |
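Both columns of output can be reproduced with plain java.util.regex. The following is a minimal sketch and not Streamflyer's implementation: replaceSequentially is a hypothetical helper that mimics the described strategy on an in-memory string (find a match, replace it, continue searching after the replacement), and it treats the replacement as literal text without group references.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookBehindDemo {

    // Hypothetical helper: replaces one match at a time and continues the search
    // on the already modified text, right after the inserted replacement.
    static String replaceSequentially(String input, String regex, String replacement) {
        Pattern pattern = Pattern.compile(regex);
        String current = input;
        int pos = 0;
        while (pos <= current.length()) {
            Matcher m = pattern.matcher(current);
            if (!m.find(pos)) {
                break;
            }
            current = current.substring(0, m.start()) + replacement + current.substring(m.end());
            pos = m.start() + replacement.length();
            if (m.end() == m.start()) {
                pos++; // skip one character after an empty match to avoid an endless loop
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Java's replaceAll evaluates look-behinds against the original input.
        System.out.println("aaabb".replaceAll("^a", ""));                      // prints "aabb"
        System.out.println("foobarbar".replaceAll("(?<=foo)bar", "foo"));      // prints "foofoobar"

        // The sequential strategy evaluates them against the modified text.
        System.out.println(replaceSequentially("aaabb", "^a", ""));            // prints "bb"
        System.out.println(replaceSequentially("foobarbar", "(?<=foo)bar", "foo")); // prints "foofoofoo"
    }
}
```

In the second example the sequential strategy matches the last "bar" because the preceding "bar" has already been turned into "foo" by the time the look-behind is evaluated.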

Streamflyer's behaviour is unexpected for Java users and therefore could be changed in the next major release. However, no new major release is planned as long as nobody asks for one.

If you want to use look-behind constructs, please keep in mind that you can replace them with other expressions in many cases. Because Streamflyer reads the entire stream, look-behind constructs are of limited use.

Boundary matcher \G

The boundary matcher that matches the end of the previous match (\G) is not supported yet.

XmlVersionModifier

This modifier does not work for XML documents with a prolog that contains more than 4096 characters.

Questions, Suggestions, Issues

Questions and suggestions are welcome and can be sent to the discussion group. Issues can be reported on the Issues page of this project.

Some answered questions can be found in the FAQ.

Please give me feedback of any kind. It is highly appreciated.

Future enhancements, third party modifiers

The next major release will change the behaviour of RegexModifier regarding look-behind constructs.

Please let us know if you made a modifier that could be useful for others. Such modifiers could ...

  • normalize unicode, i.e. transform characters into their canonical composed or decomposed form
  • include nested content, i.e. markup in the stream is replaced with the content of another stream which itself can contain such markup

If you find typos in the API documentation let me know.

Acknowledgments

The logo is based on drafts by K. Dabels.