Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use analyzer set on the field #8943

Closed
wants to merge 1 commit into from
Closed

Conversation

masaruh
Copy link
Contributor

@masaruh masaruh commented Dec 14, 2014

Make highlighter use analyzer set on the field.

Closes #8757

alexksikes pushed a commit to alexksikes/elasticsearch that referenced this pull request Dec 15, 2014
This change adds a systemd service configuration file, and adds systemd logic
to installation and de-installation scripts. The upcoming Debian 8 "Jessie"
release will use systemd.

fixes elastic#8943

Signed-off-by: Thilo Fromm <github@thilo-fromm.de>
@clintongormley clintongormley added review :Search/Highlighting How a query matched a document labels Dec 15, 2014
vkroz pushed a commit to loggly/elasticsearch that referenced this pull request Dec 18, 2014
This change adds a systemd service configuration file, and adds systemd logic
to installation and de-installation scripts. The upcoming Debian 8 "Jessie"
release will use systemd.

fixes elastic#8943

Signed-off-by: Thilo Fromm <github@thilo-fromm.de>
@rjernst
Copy link
Member

rjernst commented Dec 22, 2014

I don't think we should do this. _analyzer should be removed altogether with 2.0. Setting a different analyzer per document doesn't really work except in extreme cases, and that edge case can be worked around by having multiple fields and setting a query over those multiple fields.

@javanna
Copy link
Member

javanna commented Dec 23, 2014

I agree with @rjernst.

Also, I wonder if this PR relates to #5497 and #6267 somehow. What confuses me is that the issue is around highlighting but the fix is not isolated to highlighting, but in generic mappings code.

@masaruh
Copy link
Contributor Author

masaruh commented Dec 24, 2014

Fair enough. If _analyzer is going away, we wouldn't need to worry about it.
(I agree that using multiple fields is usually better than analyzer per document)

@masaruh masaruh closed this Dec 24, 2014
@dakrone
Copy link
Member

dakrone commented Dec 24, 2014

Re-opening, _analyzer is still used by many people that need to support more than a few languages.

An example is a company that runs language analysis and would like to use all of the 33 language-specific analyzers that ES provides. Duplicating the entire type 33 times is not a good solution, neither is re-analyzing each field 32 additional times for a multi-field with each language. Using _analyzer with path is helpful in this narrow case (when the language being searched is known in advance also).

While _analyzer is not the best choice for this (ngrams would be much better) it's still used and should not be removed.

@dakrone dakrone reopened this Dec 24, 2014
@rjernst
Copy link
Member

rjernst commented Dec 29, 2014

I do not think because a feature is used by very few that it should be precluded from the possibility of removal. In this case, I think this feature is trappy. With your example, this would mean these 33 languages are indexed into the same field, but with different analyzers (ie the purpose of _analyzer). But how is the field then queried? It would have to either choose a specific language's analyzer (in which case it is no different than having a separate field for each language) or have some frankenstein analyzer that is "good" for most languages, in which case there could be terms which might never be found.

Having 33 types, one for each language, may seem like a lot, but this is an extreme edge case, and I don't see any problem with edge cases needing to do more work in configuration. However, I also think this can be accomplished by having different fields for each language, and selecting all these fields to query over like in the simple query parser's fields list.

@javanna
Copy link
Member

javanna commented May 1, 2015

We removed the per document _analyzer on master, so this commit won't go there anymore. In 1.x, we still support it though, and we did add support for it to highlighting with #6267. Seems that the highlighting support was incomplete though given this other PR. As far as I understand we only support it when specified through a path in the mapping? @masaruh can you confirm?

If we do support this in 1.x but it's buggy I think we should fix it by getting this in 1.x. @clintongormley thoughts?

@clintongormley
Copy link

@javanna it seems like a small enough fix

@masaruh
Copy link
Contributor Author

masaruh commented May 4, 2015

@javanna I should have put more descriptive comment...
The issue is that highlight doesn't work when mapping has a field with an analyzer specified and _analyzer. In this case, we should use analyzer specified on the field for highlighting. But currently, we use analyzer specified in a field of _analyzer path.
So, yes, highlighting support was incomplete (it works when analyzer isn't set on a field).

With this fix, it first look for analyzer set on the field. If not found, try _analyzer's path. If not found or _analyzer's path is null, try analyzer set on type level.

.get();
assertHighlight(response, 0, "text", 0, 1, equalTo("Look at me, I'm eating <1>cars</1>."));
assertHighlight(response, 0, "field2", 0, 1, equalTo("Look at me, I'm eating <1>cars</1>."));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure that this test fails without your fix? Seems like it still use the analyzer path, which used to work before too. I think we should write a different test for _analyzer without path defined in the mapping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it fails.

Since field2 uses standard analyzer, cars is analyzed as cars. Without this fix, query string "cars" is analyzed by standard analyzer (set on the field) and field's value "Look at me, I'm eating cars." is analyzed by "language_analyzer" (specified by _analzyer) during highlighting. The former becomes cars and the latter becomes car. So, highlight doesn't work.

Yes, this test is a bit confusing... but it may be because of it's nature. We see this issue when _analyzer path is set AND analyzer is set on the field. But you are right. better to write separate test for it. Will push another commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, thanks for explaining.

@javanna javanna self-assigned this May 4, 2015
@javanna
Copy link
Member

javanna commented May 4, 2015

Thanks for the explanation @masaruh . I reviewed this and left one comment around testing.

Make highlighter use analyzer set on the field.

Highlight with PlainHighlighter doesn't work on a field with an analyzer specified
when a document has document level analyzer specified by _analyzer.

With this fix, it first look for analyzer set on the field. If not found, try _analyzer's path.
If not found or _analyzer's path is null, try analyzer set on type level.

Closes elastic#8757
@masaruh
Copy link
Contributor Author

masaruh commented May 4, 2015

@javanna thanks for the review. Made a separate test with comment. Hopefully, it's less confusing.

@javanna
Copy link
Member

javanna commented May 5, 2015

LGTM thanks @masaruh

@masaruh
Copy link
Contributor Author

masaruh commented May 7, 2015

Thanks @javanna. Pushed to 1.5 and 1.x branch.

@masaruh masaruh closed this May 7, 2015
@kevinkluge kevinkluge removed the review label May 7, 2015
@masaruh masaruh deleted the fix/8757 branch May 14, 2015 00:23
@clintongormley clintongormley changed the title Highlight: Use analyzer set on the field Use analyzer set on the field May 29, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue with higlighting and analyzed tokens
6 participants