Force Japanese v1.1.0 #3588

Closed: ManyTheFish wants to merge 20 commits into release-v1.1.0 from force-japanese-v1.1.0

ManyTheFish (Member) commented Mar 13, 2023

Pull Request

⚠️ This PR is not meant to be merged.

This PR deactivates Chinese tokenization, forcing Meilisearch to use Japanese tokenization instead.
It is a hotfix that lets Japanese users run Meilisearch without language-detection issues until a final fix is released.
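To illustrate why Kanji-only text causes detection issues, here is a minimal Rust sketch of a purely script-based guess. It assumes that kana always means Japanese, that Han-only text falls back to Chinese, and that a `force_japanese` switch mimics this prototype. `Language`, `detect`, and the fallback rule are hypothetical; this is not Meilisearch's actual detector.

```rust
/// Coarse language guess for CJK text (hypothetical, illustration only).
#[derive(Debug, PartialEq)]
enum Language {
    Japanese,
    Chinese,
    Other,
}

/// Hiragana (U+3040..=U+309F) or katakana (U+30A0..=U+30FF).
fn is_kana(c: char) -> bool {
    matches!(c, '\u{3040}'..='\u{309F}' | '\u{30A0}'..='\u{30FF}')
}

/// Main CJK Unified Ideographs block (U+4E00..=U+9FFF), shared by Chinese and Japanese.
fn is_han(c: char) -> bool {
    matches!(c, '\u{4E00}'..='\u{9FFF}')
}

/// Kana can only be Japanese, but Han-only text is ambiguous. This sketch ASSUMES
/// such text falls back to Chinese unless `force_japanese` is set.
fn detect(text: &str, force_japanese: bool) -> Language {
    if text.chars().any(is_kana) {
        Language::Japanese
    } else if text.chars().any(is_han) {
        if force_japanese {
            Language::Japanese
        } else {
            Language::Chinese
        }
    } else {
        Language::Other
    }
}
```

With this rule, "東京" (Tokyo, no kana) is classified as Chinese unless `force_japanese` is set, which is the kind of misclassification the prototype sidesteps by disabling Chinese altogether.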

How to use this prototype?

Docker images

Meilisearch v1.1.1 (latest):

$ docker pull getmeili/meilisearch:prototype-japanese-2

Meilisearch v1.1.0-rc.3:

$ docker pull getmeili/meilisearch:prototype-japanese-1

Meilisearch v1.1.0-rc.1:

$ docker pull getmeili/meilisearch:prototype-japanese-0

Related issues and discussions

ManyTheFish changed the base branch from main to release-v1.1.0 on March 13, 2023 14:09
github-actions bot commented Mar 13, 2023

Uffizzi Ephemeral Environment deployment-20826

☁️ https://app.uffizzi.com/github.com/meilisearch/meilisearch/pull/3588

📄 View Application Logs etc.

The Meilisearch preview environment contains a web terminal from which you can run the
meilisearch command. You can access the Meilisearch instance running in the preview
through the Meilisearch Endpoint link below.

Web Terminal Endpoint:
Meilisearch Endpoint: /meilisearch

ManyTheFish force-pushed the force-japanese-v1.1.0 branch 3 times, most recently from cdf1a01 to 8bcfbee on March 13, 2023 16:25
dureuill and others added 15 commits April 12, 2023 10:53
3667: Disable autobatching of additions and deletions r=irevoire a=dureuill

# Pull Request

## Related issue
Fixes #3664

## What does this PR do?
- Modifies the autobatcher so that it does not batch document additions and deletions, as a workaround for the DB corruption in #3664.
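A minimal Rust sketch of what "not batching additions and deletions" could look like, assuming the workaround means the two kinds never end up in the same batch. `TaskKind` and `next_batch` are hypothetical; the real autobatcher handles many more task kinds and rules.

```rust
/// Kinds of document tasks relevant to this workaround (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TaskKind {
    DocumentAddition,
    DocumentDeletion,
}

/// Returns the prefix of `queue` that would form the next batch.
/// Consecutive tasks of the same kind are grouped; the batch stops at the
/// first task of the other kind, so additions and deletions are never mixed.
fn next_batch(queue: &[TaskKind]) -> &[TaskKind] {
    let Some(first) = queue.first() else {
        return &[];
    };
    let len = queue.iter().take_while(|kind| *kind == first).count();
    &queue[..len]
}
```

For a queue of [addition, addition, deletion, addition], the first batch holds the two additions and the deletion waits for a batch of its own.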



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3659: stops receiving tasks once the task queue is full r=Kerollmops a=irevoire

Gives 20 GiB to the task queue. Once 50% of the task queue is used, the queue blocks itself and only accepts task deletion requests, ensuring we never end up in a state where we can’t do anything.

Also creates a new error message for this case:
```
Meilisearch cannot receive write operations because the size limit of the tasks database has been reached. Please delete tasks to continue performing write operations.
```
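The admission rule described above can be sketched in Rust as follows, assuming the size check compares the bytes used by the task database against half of the 20 GiB limit. `Task`, `admit`, and the constants are hypothetical names for illustration; only the limit, the 50% threshold, and the error text come from this PR.

```rust
const GIB: u64 = 1024 * 1024 * 1024;

/// Task database size limit described in this PR (20 GiB).
const TASK_DB_MAX_BYTES: u64 = 20 * GIB;

const QUEUE_FULL_MESSAGE: &str = "Meilisearch cannot receive write operations because the size limit of the tasks database has been reached. Please delete tasks to continue performing write operations.";

/// Hypothetical subset of task kinds.
#[derive(Debug, PartialEq)]
enum Task {
    DocumentAddition,
    TaskDeletion,
}

/// Once at least half of the task database is used, only task deletions are
/// accepted, so users can always free space and unblock the queue.
fn admit(task: &Task, used_bytes: u64) -> Result<(), &'static str> {
    let half_full = used_bytes >= TASK_DB_MAX_BYTES / 2;
    if half_full && !matches!(task, Task::TaskDeletion) {
        Err(QUEUE_FULL_MESSAGE)
    } else {
        Ok(())
    }
}
```

Keeping the other half of the space in reserve is what lets the deletion request itself still be recorded when the queue is "full".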

Co-authored-by: Tamo <tamo@meilisearch.com>
3672: Update version for the next release (v1.1.1) in Cargo.toml r=dureuill a=meili-bot

⚠️ This PR is automatically generated. Check that the new version is the expected one and that Cargo.lock has been updated before merging.

Co-authored-by: dureuill <dureuill@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
3673: Handle the task queue being full r=irevoire a=dureuill

# Pull Request

## Related issue
Fixes a remaining issue with #3659 where it was not always possible to send tasks again, even after deleting some tasks when prompted.

## Tests

- See the integration test.
- Also tested manually with a 1 MiB task queue: it was not possible to become unblocked before this PR; it is now possible.

## What does this PR do?
- Uses the `non_free_pages_size` method to compute the space occupied by the task DB instead of `real_disk_size`, which is not always affected by task deletion.
- Expands the test so that it adds a task after the deletion. The expanded test fails without this PR and passes with it.
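Why the file size is the wrong metric can be shown with a toy Rust model, assuming an LMDB-style store where deleted pages go to a free list for reuse and the file never shrinks. `ToyStore` and its methods are hypothetical analogues of `real_disk_size` and `non_free_pages_size`, not the real API.

```rust
/// Toy page-based store (hypothetical): freed pages are kept for reuse and the
/// file never shrinks, similar to how LMDB manages its data file.
struct ToyStore {
    page_size: u64,
    file_pages: u64, // pages the file occupies on disk
    free_pages: u64, // pages inside the file that are free for reuse
}

impl ToyStore {
    fn new(page_size: u64) -> Self {
        ToyStore { page_size, file_pages: 0, free_pages: 0 }
    }

    /// Writes `pages` pages of data, reusing free pages before growing the file.
    fn write(&mut self, pages: u64) {
        let reused = pages.min(self.free_pages);
        self.free_pages -= reused;
        self.file_pages += pages - reused;
    }

    /// Deletes `pages` pages of data (capped at the live pages): they return to
    /// the free list, but the file keeps its size.
    fn delete(&mut self, pages: u64) {
        let live = self.file_pages - self.free_pages;
        self.free_pages += pages.min(live);
    }

    /// Analogue of `real_disk_size`: the size of the whole file.
    fn disk_size(&self) -> u64 {
        self.file_pages * self.page_size
    }

    /// Analogue of `non_free_pages_size`: only pages that hold live data.
    fn non_free_size(&self) -> u64 {
        (self.file_pages - self.free_pages) * self.page_size
    }
}
```

After writing 100 pages and deleting 60, `disk_size` still reports 100 pages while `non_free_size` reports 40, so a limit checked against the file size would never release the block.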

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
miiton commented Apr 30, 2023

@ManyTheFish

It seems that a panic occurs when indexing documents containing "ッー". This issue occurs in both prototype-japanese-0 and prototype-japanese-1. Since it did not happen in v1.0.2 or v1.1.1, I decided to report it here.


panic message

thread 'indexing-thread:1' panicked at 'could not find kana 'っ' in TO_ROMAJI map', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/wana_kana-2.1.2/src/utils/katakana_to_hiragana.rs:62:36

Related: PSeitz/wana_kana_rust#13 (Reported by @kounoike)
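The panic message suggests that the kana preceding "ー" is looked up in a table and the conversion panics when that kana (here "っ", which carries no vowel) is missing. The following Rust sketch shows the defensive alternative, assuming the input is already in hiragana: look up the vowel with `HashMap::get` and keep "ー" unchanged when there is no entry. This is not wana_kana's code, and the vowel table is a hypothetical, deliberately tiny subset.

```rust
use std::collections::HashMap;

/// Hypothetical, deliberately tiny table: hiragana -> the vowel it ends with.
/// 'っ' is intentionally absent, as in the reported panic.
fn vowel_table() -> HashMap<char, char> {
    [('ぱ', 'あ'), ('な', 'あ'), ('い', 'い'), ('ぷ', 'う'), ('る', 'う')]
        .into_iter()
        .collect()
}

/// Expands the prolonged sound mark 'ー' into the vowel of the preceding kana.
/// An unknown or missing predecessor keeps the 'ー' instead of panicking.
fn expand_long_vowels(input: &str) -> String {
    let table = vowel_table();
    let mut out = String::with_capacity(input.len());
    let mut prev: Option<char> = None;
    for c in input.chars() {
        let mapped = if c == 'ー' {
            prev.and_then(|p| table.get(&p).copied()).unwrap_or('ー')
        } else {
            c
        };
        out.push(mapped);
        prev = Some(c);
    }
    out
}
```

With this approach, "ぱー" becomes "ぱあ", while "っー" is returned unchanged rather than aborting the indexing thread.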

ManyTheFish (Member, Author)

@miiton, thank you for your report. I'll fix it ASAP!

ManyTheFish (Member, Author)

Hello @miiton, a new prototype, up to date with v1.1.1, has been released with a hotfix for the issue you pointed out:

  • prototype-japanese-2

I updated the PR description accordingly.

miiton commented May 4, 2023

I thought being able to search for katakana words in hiragana (and vice versa) was a nice feature, but this fix disables it entirely, correct?
Is this change planned to be reverted once wana_kana is fixed?

prototype-japanese-1

curl ... -X POST localhost:7700/indexes/hogehoge/documents -d '{"id":1,"name":"パイナップル"}'
# {"taskUid":0,"indexUid":"hogehoge","status":"enqueued","type":"documentAdditionOrUpdate","enqueuedAt":"2023-05-04T23:28:15.037778125Z"}

curl ... -X POST localhost:7700/indexes/hogehoge/search -d '{"q":"パイナップル"}'
# {"hits":[{"id":1,"name":"パイナップル"}],"query":"パイナップル","processingTimeMs":2,"limit":20,"offset":0,"estimatedTotalHits":1}

curl ... -X POST localhost:7700/indexes/hogehoge/search -d '{"q":"ぱいなっぷる"}'
# {"hits":[{"id":1,"name":"パイナップル"}],"query":"ぱいなっぷる","processingTimeMs":0,"limit":20,"offset":0,"estimatedTotalHits":1} -- hit

prototype-japanese-2

curl ... -X POST localhost:7700/indexes/hogehoge/documents -d '{"id":1,"name":"パイナップル"}'
# {"taskUid":0,"indexUid":"hogehoge","status":"enqueued","type":"documentAdditionOrUpdate","enqueuedAt":"2023-05-04T23:28:15.037778125Z"}

curl ... -X POST localhost:7700/indexes/hogehoge/search -d '{"q":"ぱいなっぷる"}'
# {"hits":[],"query":"ぱいなっぷる","processingTimeMs":0,"limit":20,"offset":0,"estimatedTotalHits":0}  -- not hit
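The behavior prototype-japanese-1 showed can be approximated by folding both the indexed word and the query onto hiragana before comparing them. A minimal Rust sketch, assuming simple code-point folding: full-width katakana U+30A1..=U+30F6 sit exactly 0x60 above hiragana U+3041..=U+3096. This swapped-in technique is not necessarily how the prototype did it, and `kana_insensitive_eq` is a hypothetical helper.

```rust
/// Folds full-width katakana (U+30A1..=U+30F6) onto hiragana (U+3041..=U+3096)
/// by subtracting the fixed 0x60 offset between the two Unicode blocks.
/// Other characters, including the prolonged sound mark 'ー', are unchanged.
fn katakana_to_hiragana(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '\u{30A1}'..='\u{30F6}' => char::from_u32(c as u32 - 0x60).unwrap_or(c),
            _ => c,
        })
        .collect()
}

/// Hypothetical matcher: a document word matches a query word when both fold
/// to the same hiragana string.
fn kana_insensitive_eq(document_word: &str, query_word: &str) -> bool {
    katakana_to_hiragana(document_word) == katakana_to_hiragana(query_word)
}
```

Because the mapping is a fixed offset rather than a table lookup, it cannot fail on inputs such as "ッー", which simply fold to "っー".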

ManyTheFish (Member, Author) commented May 11, 2023

Hello @miiton,
I'm glad you liked the kana conversion tried in this prototype. And yes, if you feel it's a good feature for Japanese language support, I will reintegrate it after the fix!
Moreover, once we find a convenient way to handle Kanji text properly, we will consider integrating this feature into official Meilisearch releases.

4 participants