TokenStream expanded to 512 finite strings. Only <= 256 finite strings are supported #10192

Closed
maximTarleckiy opened this issue Mar 20, 2015 · 3 comments

@maximTarleckiy

I get this exception while indexing documents for suggest completion. My index configuration is:

    suggest_top_products:
        client: default
        index_name: my_index_suggest_top_products
        use_alias: true
        settings:
            number_of_shards:   6
            number_of_replicas: 0
            index:
                analysis:
                    analyzer:
                        completion:
                            type: custom
                            tokenizer: whitespace
                            filter: [short_word_delimiter, lowercase, asciifolding, apostrophe, op_synonyms, snow_english]
                            char_filter: html_strip

                    filter:
                        snow_english:
                            type: snowball
                            language: English

                        short_word_delimiter:
                            type: word_delimiter
                            generate_word_parts: true
                            generate_number_parts: true
                            catenate_words: true
                            catenate_numbers: true
                            catenate_all: true
                            split_on_case_change: true
                            preserve_original: true

                        op_synonyms:
                            type: synonym
                            ignore_case: true
                            expand: true
                            synonyms:
                              - "riflescopes, rifle scopes"
                              - "riflescope, rifle scope"

        types:
            auto_suggest:
                mappings:
                    text: {type: string, index: not_analyzed}
                    top_products:
                        type: completion
                        index_analyzer: completion
                        search_analyzer: completion
                        payloads: true
                        preserve_position_increments: false
                        preserve_separators: false

Document is:
{"text":"Sure-Fire Z59 Click-On Lock-Out Tailcap for C2 / C3 / D2 / D3 / G3 / M2 / M3 / M3T / M4 Flashlights","top_products":{"input":["Sure-Fire Z59 Click-On Lock-Out Tailcap for C2 C3 D2 D3 G3 M2 M3 M3T M4 Flashlights","C2 C3 D2 D3 G3 M2 M3 M3T M4 Flashlights"],"output":"Sure-Fire Z59 Click-On Lock-Out Tailcap for C2 / C3 / D2 / D3 / G3 / M2 / M3 / M3T / M4 Flashlights","weight":"144","payload":{"type":"top_product","product_id":"32378","weight":"144","primary_image":"opplanet-surefire-tailcaps-switches-z59","rating":"0.00","review_count":"0","url":"surefire-tailcaps-switches-z59"}}}

@clintongormley
Contributor

Duplicate of #9466. Closing in favour of #8909

@missinglink
Contributor

Sorry, late to the party on this one. I think you'll find that this is not a duplicate of #9466, but is instead related to how your TokenStream expands synonyms.

For example, if you have a string with ~26 tokens and they are expanded 10x by synonym expansion or another method, you reach a point where you have over 256 finite strings and 💥

We have also encountered this issue in pelias/pelias#33, and you should be able to mask/fix the error by increasing the max_token_length setting for the relevant analyzer.
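
To make the arithmetic concrete, here is a back-of-the-envelope sketch (illustrative numbers, not Elasticsearch code): every position that carries several stacked tokens (original plus word_delimiter splits, catenations, synonyms, ...) multiplies the number of finite strings the suggester has to enumerate, so nine positions with two alternatives each already reach the 512 from the title.

    # Rough model of finite-string growth: the suggester enumerates every path
    # through the token graph, so the count is the product of the number of
    # stacked tokens at each position. The counts below are made up.
    alternatives_per_position = [2] * 9  # 9 positions, 2 stacked tokens each
    finite_strings = 1
    for k in alternatives_per_position:
        finite_strings *= k
    print(finite_strings)  # 512 -- past the hard limit of 256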

@maximTarleckiy
Author

@clintongormley
How can this be related to #9466? I have no context here.
I also get this error with a context, but I index fewer than 256 context values per document (111 at most)!
That context has two keys:
    context:
        category_id:
            type: category
            default: "default"
        department_id:
            type: category
            default: "default"
category_id has 111 values
department_id has 10 values

@missinglink
Even if I remove 'op_synonyms' from the token filter list, this error still appears.
