Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Putting an empty separatorTokens in the settings shouldn't make Meilisearch crash #4574

Closed
ManyTheFish opened this issue Apr 16, 2024 · 1 comment
Assignees
Labels
bug Something isn't working as expected v1.8.0 PRs/issues solved in v1.8.0 released on 2024-05-06
Milestone

Comments

@ManyTheFish
Copy link
Member

Describe the bug
Putting an empty separatorTokens in the settings shouldn't make Meilisearch crash

To Reproduce
Steps to reproduce the behavior:

  1. Create an index with documents containing Unicode characters like [{"id": 1, "title": "台北市中正區"}]
  2. Change the settings by putting an empty separatorTokens like {"separatorTokens": [""]}
  3. Meilisearch crashes when trying to reindex the documents
2024-04-16T08:37:41.936003Z ERROR meilisearch: info=panicked at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/charabia-0.8.8/src/segmenter/mod.rs:228:29: byte index 1 is not a char boundary; it is inside '台' (bytes 0..3) of `台北市中正區`

Expected behavior
Meilisearch should not crash and at least ignore the invalid separator tokens by filtering them when updating the settings.

Meilisearch version:
v1.7.- and v1.8.0-rc.0

Additional context
related to https://github.com/meilisearch/meilisearch-support/issues/156

@ManyTheFish ManyTheFish added the bug Something isn't working as expected label Apr 16, 2024
@ManyTheFish ManyTheFish added this to the v1.8.0 milestone Apr 16, 2024
@ManyTheFish ManyTheFish self-assigned this Apr 16, 2024
meili-bors bot added a commit to meilisearch/charabia that referenced this issue Apr 17, 2024
281: Fix char boundary panic r=Kerollmops a=ManyTheFish

Filter empty tokens before inserting them in the AhoCorasick automaton, avoiding a char boundary panic

## Related issue
Fixes partially meilisearch/meilisearch#4574


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
meili-bors bot added a commit that referenced this issue Apr 18, 2024
4583: Update charabia v0.8.9 r=irevoire a=ManyTheFish

# Pull Request
- Update Charabia v0.8.9
- Add the optional feature flag activating pinyin normalization

## Related issue
Fixes  #4574


Co-authored-by: ManyTheFish <many@meilisearch.com>
@curquiza
Copy link
Member

Fixed by #4583

@meili-bot meili-bot added the v1.8.0 PRs/issues solved in v1.8.0 released on 2024-05-06 label May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected v1.8.0 PRs/issues solved in v1.8.0 released on 2024-05-06
Projects
None yet
Development

No branches or pull requests

3 participants