Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability to specify the analyzer used for each Field #6329

Closed

Conversation

Projects
None yet
3 participants
@alexksikes
Copy link
Contributor

commented May 28, 2014

When using multiple items, the user may want to specify which analyzer to use
for each field. Previously, either the analyzer specified by 'analyzer' would
be used for all the fields, or if not set, the analyzer associated with the
field would be chosen. This commit provides the ability to fine grain which
analyzer should be used for each field by providing a new 'fields_analyzer'
parameter to the More Like This Query.

More Like This Query: Adds the ability to specify the analyzer used f…
…or each field.

When using multiple items, the user may want to specify which analyzer to use
for each field. Previously, either the analyzer specified by 'analyzer' would
be used for all the fields, or if not set, the analyzer associated with the
field would be chosen. This commit provides the ability to fine grain which
analyzer should be used for each field by providing a new 'fields_analyzer'
parameter to the More Like This Query.
fieldsAnalyzer = Maps.newHashMap();
while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
String field = parseContext.indexName(parser.text());
parser.nextToken();

This comment has been minimized.

Copy link
@s1monw

s1monw May 28, 2014

Contributor

can we assert here that the the next token is actually a value?

if (fieldsAnalyzer.containsKey(fieldName)) {
mlt.setAnalyzer(fieldsAnalyzer.get(fieldName));
} else {
mlt.setAnalyzer(fieldsAnalyzer.get("__default__"));

This comment has been minimized.

Copy link
@s1monw

s1monw May 28, 2014

Contributor

if the field is not in that map should we just go and get the actual search analyzer for that field from the mapper?

This comment has been minimized.

Copy link
@s1monw

s1monw May 28, 2014

Contributor

oh I see I think we can just use the mltQuery.getAnalyzer() instead since it will resolve the fieldname. so you can just do

Analyzer a = fieldsAnalyzer.get(fieldName);
if (a == null) {
  a = mltQuery.getAnalyzer();
}
mlt.setAnalyzer(a);

This comment has been minimized.

Copy link
@alexksikes

alexksikes May 28, 2014

Author Contributor

Yes unless mltQuery.analyzer has actually been set. The question is what is the default analyzer, the one set at 'analyzer' or the default for the field?

}
mltQuery.setAnalyzer(analyzer);
if (fieldsAnalyzer != null) {

This comment has been minimized.

Copy link
@s1monw

s1monw May 28, 2014

Contributor

I don't think we should do __default__ we can pass the default separately...

alexksikes added some commits May 28, 2014

@alexksikes alexksikes added the review label May 28, 2014

alexksikes added some commits May 28, 2014

MoreLikeThisQuery mlt = new MoreLikeThisQuery();
mlt.setMoreLikeFields(new String[] {fieldName});
mlt.setLikeText(likeText);
mlt.setAnalyzer(mltQuery.getAnalyzer());
if (fieldAnalyzers != null) {

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2014

Contributor

so I only have a style problem here. IMO this is harder than it needs to be though... IMO we should require that fieldAnalyzers is non-null ie. use Collections.emptyMap() as a default. then we can just do this:

Analyzer analyzer = fieldAnalyzer.get(fieldName);
if (analyzer == null) {
  analyzer = defaultAnalyzer;
}
mlt.setAnalyzer(analyzer);

I think the mlt.setAnalyzer(mltQuery.getAnalyzer()); is obsolete then?

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jun 12, 2014

Author Contributor

This makes the analyzer of the fields for each item always default to the one associated with the field, regardless of the value of analyzer. Essentially this makes analyzer the analyzer of the like_text only. Previously we would be using analyzer for all fields if specified. So what should be the desired behavior?

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2014

Contributor

wait, today we can specify an analyzer that overrides everything. If it is set it can only be beaten by a fieldAnalyzer associated with a field. If the analyzer is not set we use the default analyzer. IMO what I suggested is equivalent with what you had? do I miss something?

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jun 12, 2014

Author Contributor

if we pass the var analyzer to addMoreLikeThis instead of DefaultAnalyzer then it is almost right, only that I wanted to have that if field_analyzers is specified (even if empty) then the default analyzer is always overridden by the one associated with the field.

This comment has been minimized.

Copy link
@s1monw

s1monw Jun 12, 2014

Contributor

I don't get it sorry :)

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jun 12, 2014

Author Contributor

Suppose the user specifies analyzer and some fields in field_analyzers. What should be the default analyzer associated to the unspecified fields in field_analyzers? Should it be the default for the field or the one specified by analyzer? Maybe @clintongormley has some ideas.

@s1monw s1monw removed the review label Jun 12, 2014

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Jun 12, 2014

left a small comment - it's close

@alexksikes alexksikes added the review label Jun 12, 2014

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Jun 18, 2014

LGTM

@s1monw s1monw removed the review label Jun 18, 2014

@s1monw s1monw added v1.4.0 and removed v1.3.0 labels Jul 14, 2014

@alexksikes

This comment has been minimized.

Copy link
Contributor Author

commented Aug 23, 2014

This should be integrated to the term vector APIs? Closing for now.

@alexksikes alexksikes closed this Aug 23, 2014

@clintongormley clintongormley changed the title More Like This Query: Adds the ability to specify the analyzer used for each Field Add the ability to specify the analyzer used for each Field Jun 6, 2015

@clintongormley clintongormley removed the >feature label Jun 7, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.