Termvectors #158

Merged
merged 8 commits into rnewson:master from nesteffe/termvectors Apr 26, 2012

Contributor

nesteffe commented Apr 24, 2012

Added a fast vector highlighter plugin, with parameters to configure its use.

highlights <int, default 0>: number of highlight matches to return.
highlight_length <int, default and minimum 18>: number of characters to include in each highlight line.
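As a rough illustration of the documented defaults, here is a minimal Python sketch of how a request handler might read these two options. The helper name `parse_highlight_params` is hypothetical (it is not the plugin's code), and clamping a too-small highlight_length up to 18, rather than rejecting it, is an assumption about what "min=18" means.

```python
from typing import Mapping, Tuple

# Defaults as described in the PR text: highlights defaults to 0,
# highlight_length defaults to 18 and may not go below 18.
DEFAULT_HIGHLIGHTS = 0
MIN_HIGHLIGHT_LENGTH = 18


def parse_highlight_params(params: Mapping[str, str]) -> Tuple[int, int]:
    """Hypothetical helper: read highlight options from query-string values.

    Returns (highlights, highlight_length). Missing values fall back to the
    documented defaults; highlight_length is clamped up to the minimum
    (an assumption; the real plugin might reject short values instead).
    """
    highlights = int(params.get("highlights", DEFAULT_HIGHLIGHTS))
    highlight_length = int(params.get("highlight_length", MIN_HIGHLIGHT_LENGTH))
    # Enforce the documented minimum of 18 characters.
    highlight_length = max(highlight_length, MIN_HIGHLIGHT_LENGTH)
    return highlights, highlight_length


if __name__ == "__main__":
    print(parse_highlight_params({}))
    print(parse_highlight_params({"highlights": "3", "highlight_length": "5"}))
```

With no parameters the sketch yields the defaults (0, 18); a request for 3 highlights of length 5 comes back as (3, 18) because of the clamp.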

Also added parameters to control whether term vectors or fields are included in the result set. I am not sure whether this is needed, but it made debugging easier :)

include_termvectors <boolean, default false>: include term vectors in results.
include_fields <boolean, default false>: include fields in results.
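A sketch of a search request that turns these options on, assuming the usual couchdb-lucene `_fti/local/<db>/_design/<ddoc>/<index>` URL layout. The host, database, design document and index names are placeholders, and sending booleans as lowercase `true`/`false` is an assumption. It leaves out include_fields, since later in this thread that option is renamed include_storedfields and split into a separate pull request.

```python
from urllib.parse import urlencode

# Placeholder index URL (hypothetical host, database, design doc and index).
BASE_URL = "http://localhost:5984/_fti/local/mydb/_design/foo/by_content"


def build_search_url(query: str, highlights: int = 0,
                     highlight_length: int = 18,
                     include_termvectors: bool = False) -> str:
    """Build a search URL carrying the highlighting options from this PR."""
    params = {
        "q": query,
        "highlights": highlights,
        "highlight_length": highlight_length,
        # Assumption: booleans are passed as lowercase "true"/"false".
        "include_termvectors": str(include_termvectors).lower(),
    }
    # urlencode keeps dict insertion order and percent-encodes keys and values.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("body:couchdb", highlights=3,
                           highlight_length=100, include_termvectors=True))
```

The defaults in the function signature mirror the documented defaults, so a plain `build_search_url("body:couchdb")` requests no highlights and no term vectors.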

rnewson commented on an outdated diff Apr 24, 2012

@@ -73,6 +73,11 @@
<version>${tika-version}</version>
</dependency>
<dependency>
+ <groupId>org.apache.lucene</groupId>
+ <artifactId>lucene-fast-vector-highlighter</artifactId>
+ <version>3.0.3</version>
rnewson Apr 24, 2012

Owner

I believe the vector highlighter has moved to the contrib-highlighter package (which is available in 3.6.0). I would require that we use the version that matches the other Lucene dependencies.

Contributor

nesteffe commented Apr 25, 2012

I have updated the dependency. Is this the package you are referring to?

Owner

rnewson commented Apr 25, 2012

That's the one. Did you try it out?

Contributor

nesteffe commented Apr 25, 2012

Apparently I didn't restart Lucene when I tested it... my bad. Give me a little while to debug it.

Owner

rnewson commented Apr 25, 2012

Looks good to me. Would you also update the README.md to reflect the new options? Then I can merge this puppy!

Owner

rnewson commented Apr 25, 2012

Oh, and pop your name in the THANKS.md file too!

Contributor

nesteffe commented Apr 25, 2012

Weird: I thought I had a bug, but after rebuilding, everything worked fine. I am pretty much done with the documentation updates, but I was wondering about the include_storedfields parameter I added (previously include_fields, until I realized I had duplicated that name). It changes the current default behavior, which is to include stored fields if they exist. Should I find a way to change this behavior, or should it just be noted in the documentation?

Owner

rnewson commented Apr 25, 2012

Ooh, glad you mentioned that. The include_storedfields change should go in a separate pull request.

Contributor

nesteffe commented Apr 25, 2012

OK, that is updated. I will create a new PR for that option.

rnewson pushed a commit that referenced this pull request Apr 26, 2012

Robert Newson Merge pull request #158 from nesteffe/termvectors
Termvectors
0a03bd3

rnewson merged commit 0a03bd3 into rnewson:master Apr 26, 2012
