InvertLight

Overview

InvertLight is a simple implementation of a full-text search engine based on inverted indices.

InvertLight does no content extraction itself; use another framework for that purpose. It only provides the required data structures and lets you query the full-text index.

Everything is available under the Apache License, Version 2.0.

API Usage

Initialize and query an inverted index

public static void main(String[] args) {
    // Initialize the index
    InvertedIndex theIndex = new InvertedIndex();
    UpdateIndexHandler theIndexHandler = new UpdateIndexHandler(theIndex);
    Tokenizer theTokenizer = new Tokenizer(new ToLowercaseTokenHandler(theIndexHandler));

    // Add some content
    theTokenizer.process(new Document("doc1", "this is a test"));

    // Token query
    Result theResult1 = theIndex.query(new TokenQuery("this", "is"));

    // Token query with wildcards
    Result theResult2 = theIndex.query(new TokenQuery("th*", "i?"));

    // Query for a sequence of tokens
    Result theResult3 = theIndex.query(new TokenSequenceQuery("this", "is"));

    // Sequence query with wildcards
    Result theResult4 = theIndex.query(new TokenSequenceQuery("th*s", "i?"));
}

Query types

InvertLight supports the following query types:

Query type	Description
Token Query	Searches for documents containing all tokens in any order.
Token Sequence Query	Searches for documents containing all tokens in exact order.
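
The difference between the two query types can be sketched with a small positional inverted index. This is only an illustration of the idea, not InvertLight's actual implementation: the class `PositionalIndexSketch` and its methods `add`, `queryAll`, and `querySequence` are hypothetical names, and splitting on whitespace is an assumed stand-in for the real tokenizer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not part of the InvertLight API.
public class PositionalIndexSketch {

    // token -> (document name -> positions of the token in that document)
    private final Map<String, Map<String, List<Integer>>> postings = new HashMap<>();

    // Assumption: tokens are separated by whitespace.
    public void add(String docName, String content) {
        String[] tokens = content.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            postings.computeIfAbsent(tokens[i], k -> new HashMap<>())
                    .computeIfAbsent(docName, k -> new ArrayList<>())
                    .add(i);
        }
    }

    // Token query: documents containing every token, in any order.
    public Set<String> queryAll(String... tokens) {
        Set<String> result = null;
        for (String token : tokens) {
            Map<String, List<Integer>> byDoc = postings.get(token);
            if (byDoc == null) {
                return new HashSet<>(); // unknown token, so no document can match
            }
            if (result == null) {
                result = new HashSet<>(byDoc.keySet());
            } else {
                result.retainAll(byDoc.keySet());
            }
        }
        return result == null ? new HashSet<>() : result;
    }

    // Token sequence query: the tokens must appear at consecutive positions.
    public Set<String> querySequence(String... tokens) {
        Set<String> result = new HashSet<>();
        for (String doc : queryAll(tokens)) {
            for (int start : postings.get(tokens[0]).get(doc)) {
                boolean match = true;
                for (int i = 1; i < tokens.length && match; i++) {
                    match = postings.get(tokens[i]).get(doc).contains(start + i);
                }
                if (match) {
                    result.add(doc);
                    break;
                }
            }
        }
        return result;
    }
}
```

In this sketch, a document containing "is this a test" is found by `queryAll("this", "is")` but not by `querySequence("this", "is")`, because the tokens are present but not in that order.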

Wildcards

InvertLight supports wildcards as query tokens. The following wildcards are supported:

Wildcard	Meaning
?	Any single character
*	Many characters

Examples:

"*at" matches "that", "hat", "damnrat" and so on
"th?s" matches "this"
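
One way to picture the wildcard semantics is to translate a wildcard token into a regular expression. The sketch below is not how InvertLight matches wildcards internally; `WildcardSketch`, `toPattern`, and `matches` are hypothetical names. It assumes that "?" stands for exactly one character and that "*" stands for zero or more characters, which the table above does not state precisely.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not part of the InvertLight API.
public class WildcardSketch {

    // Translates a wildcard token into a regular expression.
    // Assumption: '?' is exactly one character, '*' is zero or more characters.
    public static Pattern toPattern(String wildcardToken) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcardToken.toCharArray()) {
            if (c == '?' || c == '*') {
                if (literal.length() > 0) {
                    // Quote literal text so characters such as '.' lose their regex meaning.
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '?' ? "." : ".*");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    // True if the whole token matches the wildcard token.
    public static boolean matches(String wildcardToken, String token) {
        return toPattern(wildcardToken).matcher(token).matches();
    }
}
```

With these assumptions, `WildcardSketch.matches("*at", "damnrat")` and `WildcardSketch.matches("th?s", "this")` both hold, while `WildcardSketch.matches("th?s", "ths")` does not.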

Suggestion queries

InvertLight supports suggestions based on token sequences. Here is an example:

InvertedIndex theIndex = new InvertedIndex();
UpdateIndexHandler theIndexHandler = new UpdateIndexHandler(theIndex);
Tokenizer theTokenizer = new Tokenizer(theIndexHandler);

theTokenizer.process(new Document("doc1", "this is a test"));
theTokenizer.process(new Document("doc2", "this is another test"));
theTokenizer.process(new Document("doc3", "welcome home"));
theTokenizer.process(new Document("doc4", "nothing here"));
theTokenizer.process(new Document("doc5", "nothing already"));
theTokenizer.process(new Document("doc6", "nothing already there"));

Suggester theSuggester = new TokenSequenceSuggester("nothing");
SuggestResult theResult = theIndex.suggest(theSuggester);

assertEquals(2, theResult.getSuggestions().length);
assertEquals("already", theResult.getSuggestions()[0]);
assertEquals("here", theResult.getSuggestions()[1]);
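
The idea behind the example above can be sketched as: find every occurrence of the given token sequence and collect the token that follows it. The code below is a plain scan over document texts, not InvertLight's index-based implementation, and `SuggestSketch` and `suggest` are hypothetical names. The ordering (most frequent follower first, ties alphabetical) is an assumption that happens to agree with the assertions above; the source does not say how suggestions are ranked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not part of the InvertLight API.
public class SuggestSketch {

    // Returns the tokens that directly follow the given token sequence.
    public static List<String> suggest(List<String> documents, String... prefix) {
        // TreeMap keeps the candidate tokens in alphabetical order.
        Map<String, Integer> counts = new TreeMap<>();
        for (String document : documents) {
            String[] tokens = document.trim().split("\\s+");
            for (int i = 0; i + prefix.length < tokens.length; i++) {
                String[] window = Arrays.copyOfRange(tokens, i, i + prefix.length);
                if (Arrays.equals(window, prefix)) {
                    counts.merge(tokens[i + prefix.length], 1, Integer::sum);
                }
            }
        }
        List<String> suggestions = new ArrayList<>(counts.keySet());
        // Assumed ranking: most frequent first; the stable sort keeps ties alphabetical.
        suggestions.sort((a, b) -> Integer.compare(counts.get(b), counts.get(a)));
        return suggestions;
    }
}
```

For the six documents in the example, `SuggestSketch.suggest(docs, "nothing")` yields "already" followed by "here".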

Performance Metrics (Beta)

Number of Documents	Size on Disk (plain text)	Number of Tokens	Java Heap Size	Phrase Query Time
846	73 MB	383,477	86 MB	960 ms / 100,000 queries
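
To put the phrase query figure into per-query terms, the arithmetic can be sketched as below. The helper names `QueryTimeSketch`, `microsPerQuery`, and `queriesPerSecond` are hypothetical, and the result is derived only from the single measurement in the table, so it says nothing about variance or other workloads.

```java
// Hypothetical helper for the arithmetic; not part of the InvertLight API.
public class QueryTimeSketch {

    // Average time per query in microseconds.
    public static double microsPerQuery(double totalMillis, long queries) {
        return totalMillis * 1000.0 / queries;
    }

    // Average number of queries per second.
    public static double queriesPerSecond(double totalMillis, long queries) {
        return queries * 1000.0 / totalMillis;
    }
}
```

For 960 ms and 100,000 queries this works out to about 9.6 microseconds per query, or roughly 104,000 queries per second.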
