token_count type : add an option to count tokens (and not positions) #23227
Comments
This limitation is documented:
@nik9000 how feasible would it be to change this behaviour?
Maybe we could add a
I'm not sure! It has been years since I looked at the APIs this is built on.
Is it possible to have an option so that :
That way, we can be sure that token_count equals the number of tokens the analyzer produces.
@fbaligand that's the current behavior except that we count the number of positions created by the analyzer chain. The
@jimczi
Yeah, like
That would be perfect!
Hi @clintongormley, @jpountz, @nik9000, @jimczi, I just submitted a PR (#24175) which adds the new boolean option "enable_position_increments", as described by @nik9000.
Add option "enable_position_increments" with default value true. If option is set to false, indexed value is the number of tokens (not position increments count)
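For readers who want to see where the option would sit, here is a minimal sketch of a mapping fragment, built as a Python dict and printed as JSON. The option name "enable_position_increments" and its default come from this thread; the field name `title`, the sub-field name `length`, and the analyzer name `my_stop_analyzer` are hypothetical, and the exact envelope around `properties` depends on the Elasticsearch version.

```python
import json

# Mapping fragment for a text field with a token_count sub-field.
# "title", "length" and "my_stop_analyzer" are hypothetical names
# chosen for illustration only.
mapping = {
    "properties": {
        "title": {
            "type": "text",
            "fields": {
                "length": {
                    "type": "token_count",
                    "analyzer": "my_stop_analyzer",
                    # false: count emitted tokens only, so the gaps
                    # left by removed stop words are not counted
                    "enable_position_increments": False,
                }
            },
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Leaving the option out keeps the default (true), which is the position-counting behavior described in this issue.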
* master: (61 commits)
  Build: Move plugin cli and tests to distribution tool (elastic#24220)
  Peer Recovery: remove maxUnsafeAutoIdTimestamp hand off (elastic#24243)
  Adds version 5.3.2 and backwards compatibility indices for 5.3.1
  Add utility method to parse named XContent objects with typed prefix (elastic#24240)
  MultiBucketsAggregation.Bucket should not extend Writeable (elastic#24216)
  Don't expose cleaned-up tasks as pending in PrioritizedEsThreadPoolExecutor (elastic#24237)
  Adds declareNamedObjects methods to ConstructingObjectParser (elastic#24219)
  ESIntegTestCase.indexRandom should not introduce types. (elastic#24202)
  Tests: Extend InternalStatsTests (elastic#24212)
  IndicesQueryCache should delegate the scorerSupplier method. (elastic#24209)
  Speed up parsing of large `terms` queries. (elastic#24210)
  [TEST] make sure that the random query_string query generator defines a default_field or a list of fields
  token_count type : add an option to count tokens (fix elastic#23227) (elastic#24175)
  Query string default field (elastic#24214)
  Make Aggregations an abstract class rather than an interface (elastic#24184)
  [TEST] ensure expected sequence no and version are set when index/delete engine operation has a document failure
  Extract batch executor out of cluster service (elastic#24102)
  Add 5.3.1 to bwc versions
  Added "release-state" support to plugin docs
  Added examples to cross cluster search of using cluster settings
  ...
Add option "enable_position_increments" with default value true. If option is set to false, indexed value is the number of positions (position increments are not counted)
Elasticsearch version: 2.4.2 and 5.2.1
Description of the problem including expected versus actual behavior:
Currently, if I have a "token_count" field, based on an analyzer containing a stop filter, the indexed "token_count" field counts stop words.
It would be great to have an option to only count analysis result tokens.
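The difference between counting positions and counting tokens can be illustrated with a small, simplified model. This is not Lucene or Elasticsearch code: the whitespace tokenizer, the stop list, and the function names `analyze` and `token_count` are assumptions made for the sketch. It assumes a stop filter that drops a word but leaves a gap, which shows up as a larger position increment on the next emitted token.

```python
from typing import Iterable, List, Tuple

# Hypothetical stop list, for illustration only.
STOPWORDS = {"the", "a", "of"}


def analyze(text: str, stopwords: Iterable[str] = STOPWORDS) -> Tuple[List[Tuple[str, int]], int]:
    """Simplified analyzer: lowercase, split on whitespace, drop stop words.

    Returns the emitted (token, position_increment) pairs plus the gap
    left by any stop words that trail the last emitted token.
    """
    stop = set(stopwords)
    tokens: List[Tuple[str, int]] = []
    gap = 0  # stop words skipped since the last emitted token
    for word in text.lower().split():
        if word in stop:
            gap += 1
            continue
        tokens.append((word, gap + 1))
        gap = 0
    return tokens, gap


def token_count(text: str, enable_position_increments: bool = True) -> int:
    """Count positions (default) or only the tokens that were emitted."""
    tokens, trailing_gap = analyze(text)
    if enable_position_increments:
        # Summing increments also counts the holes left by stop words.
        return sum(inc for _, inc in tokens) + trailing_gap
    return len(tokens)


text = "the quick fox of the forest"
print(token_count(text))                                    # positions
print(token_count(text, enable_position_increments=False))  # tokens only
```

In this model the sentence has six positions but only three emitted tokens, which is the gap between the current behavior and the requested one.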
Steps to reproduce: