Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute term vectors on the fly if not stored in index #6567

Merged

Conversation

Projects
None yet
5 participants
@alexksikes
Copy link
Contributor

alexksikes commented Jun 19, 2014

...index.

Adds an option called generate to the term vectors and multi-termvectors API,
which generates the term vectors for some chosen fields, if they have not been
explicitely stored in the index.

Relates to #5184

@jpountz

View changes

docs/reference/docs/termvectors.asciidoc Outdated
=== Example with `generate` coming[1.3.0]

Additionally, for term vectors not stored in the index, they can be computed with
`generate` parameter set to `true`.

This comment has been minimized.

Copy link
@jpountz

jpountz Jun 19, 2014

Contributor

Why do we need an option? Can't we just generate them on all fields that don't have term vectors stored?

@jpountz

View changes

src/main/java/org/elasticsearch/index/termvectors/ShardTermVectorService.java Outdated
List<String> validFields = new ArrayList<>();
for (String field : request.selectedFields()) {
FieldMapper fieldMapper = indexShard.mapperService().smartNameFieldMapper(field);
if (fieldMapper == null || fieldMapper.fieldDataType().getType() != "string") {

This comment has been minimized.

Copy link
@jpountz

jpountz Jun 19, 2014

Contributor

Beware that strings need to be compared with .equals, != might return false even if both strings contain the same bytes

This comment has been minimized.

Copy link
@jpountz

jpountz Jun 19, 2014

Contributor

Maybe this should actually check for fieldMapper instanceof StringFieldMapper. I don't think term vectors make sense on other mappers?

This comment has been minimized.

Copy link
@jpountz

jpountz Jun 19, 2014

Contributor

Should we also check that the field is indexed and tokenized?

@jpountz

View changes

src/main/java/org/elasticsearch/index/termvectors/ShardTermVectorService.java Outdated
// TODO: support for fetchSourceContext?
GetResult getResult = indexShard.getService().get(
get, request.id(), request.type(), validFields.toArray(Strings.EMPTY_ARRAY), null);
get.release();

This comment has been minimized.

Copy link
@jpountz

jpountz Jun 19, 2014

Contributor

I think this should be called after the get is not used anymore, and in a finally block?

@jpountz

View changes

src/main/java/org/elasticsearch/index/termvectors/ShardTermVectorService.java Outdated
String field = getField.getName();
Analyzer analyzer = indexShard.mapperService().smartNameFieldMapper(field).indexAnalyzer();
if (analyzer == null) {
analyzer = indexShard.mapperService().searchAnalyzer();

This comment has been minimized.

Copy link
@jpountz

jpountz Jun 19, 2014

Contributor

It feels wrong to me to use the default search analyzer here. Under what circumstances can the analyzer be null at this point?

@jpountz

View changes

src/main/java/org/elasticsearch/index/termvectors/ShardTermVectorService.java Outdated
}

// Poached from Lucene ParallelAtomicReader
private final class ParallelFields extends Fields {

This comment has been minimized.

Copy link
@jpountz

jpountz Jun 19, 2014

Contributor

can it be static?

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Jun 19, 2014

This looks good. I'm wondering if we actually need a generate option? Should we rather just generate when vectors are not stored without providing any option?

@alexksikes

This comment has been minimized.

Copy link
Contributor Author

alexksikes commented Jun 19, 2014

Thanks for the throughout comments. I was wondering that too, perhaps because there could be a cost in performance in computing term vectors?

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Jun 19, 2014

There would indeed. But I tend to think that the fewer options an API has, the better it is. :-)

@alexksikes alexksikes added review and removed review labels Jun 20, 2014

@areek areek assigned rmuir and unassigned rmuir Jun 23, 2014

@s1monw

View changes

src/main/java/org/elasticsearch/index/termvectors/ShardTermVectorService.java Outdated
}
}
// and read vectors from it
return ((AtomicReader) index.createSearcher().getIndexReader()).fields();

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 26, 2014

Contributor

you can just use MultiFileds.getFields(index.createSearcher().getIndexReader());

@s1monw

View changes

src/main/java/org/elasticsearch/index/termvectors/ShardTermVectorService.java Outdated
private Fields mergeFields(String[] fieldNames, Fields... fieldsObject) throws IOException {
ParallelFields parallelFields = new ParallelFields();
for (Fields fieldObject : fieldsObject) {
if (fieldObject == null) {

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 26, 2014

Contributor

if this is null something is wrong here no? I mean should we rather assert on non null?

@s1monw

This comment has been minimized.

Copy link
Contributor

s1monw commented Jun 26, 2014

I left some small comments - looks good otherwise 👍

@s1monw s1monw removed the review label Jun 26, 2014

@alexksikes alexksikes added the review label Jun 30, 2014

@s1monw

This comment has been minimized.

Copy link
Contributor

s1monw commented Jul 2, 2014

I really like the feature a lot but I think we need way more tests. I'd love to see boundary & error cases where TV do not exists but the field doesn't either etc. Some weird combinations and we shoudl make sure the right errors are coming back. I also like duel test. For instance you create two indices and enable TV on one but not on the other index random documents and fetch the TV API from both. They should be identical.

@s1monw s1monw removed the review label Jul 2, 2014

@s1monw s1monw added v1.4.0 and removed v1.3.0 labels Jul 14, 2014

@alexksikes alexksikes added the review label Jul 16, 2014

@s1monw

This comment has been minimized.

Copy link
Contributor

s1monw commented Jul 17, 2014

LGTM

@s1monw s1monw removed the review label Jul 17, 2014

Term Vectors API: Computes term vectors on the fly if not stored in t…
…he index.

Adds the ability to the Term Vector API to generate term vectors for some
chosen fields, even though they haven't been explicitely stored in the index.

Relates to #5184
Closes #6567

@alexksikes alexksikes merged commit f22f3db into elastic:master Jul 17, 2014

alexksikes added a commit that referenced this pull request Jul 17, 2014

Term Vectors API: Computes term vectors on the fly if not stored in t…
…he index.

Adds the ability to the Term Vector API to generate term vectors for some
chosen fields, even though they haven't been explicitely stored in the index.

Relates to #5184
Closes #6567

@clintongormley clintongormley changed the title Term Vectors API: Computes term vectors on the fly if not stored in the ... Term Vectors: Computes term vectors on the fly if not stored in index Sep 8, 2014

@clintongormley clintongormley changed the title Term Vectors: Computes term vectors on the fly if not stored in index Compute term vectors on the fly if not stored in index Jun 6, 2015

@clintongormley clintongormley removed the >feature label Jun 7, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.