Skip to content

reugn/gravity

Repository files navigation

Gravity

Build

Gravity is a Java string matching library with a rich multi-pattern and simple string match interfaces. It provides an ability to match against an InputStream without loading the whole target into the memory.

Gravity wraps Apache Tika to detect and extract metadata and structured text content from various document formats.

Getting Started

Build from source

./gradlew clean build

Install as a dependency

Read on how to install the gravity package from GitHub Packages.

Installation

  • Install TikaOCR to search text inside an image.

Usage examples

A single pattern matcher:

InputStream is = Utils.readResource("/apache-license-2.0.txt");
Pattern pattern = new Pattern("within third-party archives", 1);

Matcher m = new BMHMatcher();
int res = m.match(pattern, is, 2);
assertEquals(1, res)

Use an ad hoc matcher for multiple patterns:

InputStream is = Utils.readResource("/apache-license-2.0.txt");
List<Pattern> patterns = new ArrayList<>();
patterns.add(new Pattern("within third-party archives", 10, 5));
patterns.add(new Pattern("modified files", 5, 5));
patterns.add(new Pattern("class name", 2, 5));

MultiMatcher m = new ConcurrentMultiMatcher(ContainsMatcher::new, 32);
int res = m.match(patterns, is).get().get();
assertEquals(17, res);

Load patterns from a CSV file and use the Trie matcher:

InputStream patterns_is = Utils.readResource("/patterns.csv");
List<Pattern> patterns = Patterns.fromCSV(patterns_is, Filters::specialChars);
SpecifiedMatcher sm = new TrieMatcher(patterns, 3);
InputStream is = Utils.readResource("/apache-license-2.0.txt");
int res = sm.match(is).get().get();
assertEquals(18, res);

License

Licensed under the Apache 2.0 License.