InvertLight is a simple implementation of a fulltext search engine based on inverted indices.
InvertLight itself does no content extraction, we have to use another framework for this purpose. It just provides the required data structures and allows you to query the fulltext index.
Everything is available under the Apache License, Version 2.0 .
public static void main(String[] args) {
// Initialize the index
InvertedIndex theIndex = new InvertedIndex();
UpdateIndexHandler theIndexHandler = new UpdateIndexHandler(theIndex);
Tokenizer theTokenizer = new Tokenizer(new ToLowercaseTokenHandler(theIndexHandler));
// Add some content
theTokenizer.process(new Document("doc1", "this is a test"));
// Token query
Result theResult1 = theIndex.query(new TokenQuery("this", "is"));
// Token Query with wildcards
Result theResult2 = theIndex.query(new TokenQuery("th*", "i?"));
// And perform a query for a sequence of token
Result theResult3 = theIndex.query(new TokenSequenceQuery("this", "is"));
// Sequence query with wildcards
Result theResult4 = theIndex.query(new TokenSequenceQuery("th*s", "i?"));
}
InvertLight supports the following query types:
Query type | Description |
---|---|
Token Query |
Searches for documents containing all tokens in any order. |
Token Sequence Query |
Searches for documents containing all tokens in exact order |
InvertLight supports wildcards as query tokens. The following wildcards are supported:
Wildcard | Meaning |
---|---|
? |
Any character |
* |
Many characters |
Examples:
-
"*at" matches "that", "hat", "damnrat" and so son
-
"th?s" matches "this"
InvertLight supports suggestion based on Token Sequences. Here is an example:
InvertedIndex theIndex = new InvertedIndex();
UpdateIndexHandler theIndexHandler = new UpdateIndexHandler(theIndex);
Tokenizer theTokenizer = new Tokenizer(theIndexHandler);
theTokenizer.process(new Document("doc1", "this is a test"));
theTokenizer.process(new Document("doc2", "this is another test"));
theTokenizer.process(new Document("doc3", "welcome home"));
theTokenizer.process(new Document("doc4", "nothing here"));
theTokenizer.process(new Document("doc4", "nothing already"));
theTokenizer.process(new Document("doc4", "nothing already there"));
Suggester theSuggester = new TokenSequenceSuggester("nothing");
SuggestResult theResult = theIndex.suggest(theSuggester);
assertEquals(2, theResult.getSuggestions().length);
assertEquals("already", theResult.getSuggestions()[0]);
assertEquals("here", theResult.getSuggestions()[1]);