Initial weights of characters sharing the same code in the dictionary do not work as expected #561

Closed
2 of 6 tasks
Eliot00 opened this issue Jul 3, 2022 · 5 comments

Eliot00 commented Jul 3, 2022

Describe the bug

How does rime initialize a character's weight at deployment? For example, my schema contains:

# Rime dict
# encoding: utf-8

---
name: holo.abbreviation
version: "0.3.1"
use_preset_vocabulary: false
sort: by_weight
columns:
  - text
  - code
  - weight
...

# Abbreviated codes
向	pa	99999
	pa	0
	pa	0

# Full codes
向	pao	999

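After deployment, the two characters whose weight I set to 0 always come first. I have also enabled the user dictionary, yet typing pa -> 向 many times still does not raise the priority of 向.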

To Reproduce
Steps to reproduce the bug:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See bug

Expected behavior

The default order follows the weights I set, with 向 ranked ahead of the rare characters.

Log
If applicable, add crash log to quick focus your problem.

Screenshots
If applicable, add screenshots to help explain your problem.

Flavor(please complete the following information):
Select your flavor:

  • ibus-rime
  • fcitx-rime
  • fcitx5-rime (latest librime, 1.7.3)
  • Squirrel (latest version, 0.15.2)
  • Trime
  • Weasel

Package:

  • OS: [e.g. openSUSE Tumbleweed]
  • Version: [e.g. 5.0.7]
  • URI: [e.g. https://software.opensuse.org/package/fcitx5-rime?search_term=fcitx5-rime]

Additional context
Add any other context about the problem here.

@Eliot00 Eliot00 added the bug label Jul 3, 2022
@lotem lotem added irreproducible and removed bug labels Jul 5, 2022
lotem (Member) commented Jul 5, 2022

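I cannot reproduce the problem with the code the original poster provided.
The frequencies in *.dict.yaml determine only the initial order. The user dictionary is tracked separately: once you have entered characters or words for a given code, candidates are ordered according to the input habits recorded in the user dictionary.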

@lotem lotem closed this as completed Jul 5, 2022
Eliot00 (Author) commented Jul 5, 2022

@lotem I have now found another example:

定	AL_宀疋	999
𠕸	AL_冖夂	0

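In my dictionary, only these two characters have the code al. Because rime currently cannot define a comment column in the dictionary, I process the dictionary codes like this: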

speller:
  alphabet: zyxwvutsrqponmlkjihgfedcba
  delimiter: " '"
  algebra:
    - xform/\.//
    - derive/^([a-zA-Z]+)_.+$/$1/
    - xlit/QWERTYOPASDFGHJKLZXCVBNM/qwertyopasdfghjklzxcvbnm/

translator:
  dictionary: holo
  prism: holo
  enable_completion: true
  enable_encoder: true
  enable_user_dict: true
  comment_format:
    - xform/^.+_(.+)$/〔$1〕/

Now 𠕸 always comes before 定 (weight 999) in the initial order, and the order never changes no matter how many times I type it.
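To make the effect of the algebra rules above concrete, here is a minimal Python sketch. It is only a simplified model of the three `speller/algebra` rules and the `comment_format` rule from the schema, assuming `xform` rewrites a spelling, `derive` keeps the original and adds the rewritten form, and `xlit` maps characters one to one. The function names `spellings` and `comment` are hypothetical and are not part of librime.

```python
import re

# Character mapping copied from the xlit rule in the schema above.
UPPER = "QWERTYOPASDFGHJKLZXCVBNM"
LOWER = "qwertyopasdfghjklzxcvbnm"
XLIT = str.maketrans(UPPER, LOWER)


def spellings(code):
    """Model: the set of spellings through which a dictionary code can be typed."""
    # xform/\.//  -> strip "." from the code
    forms = {re.sub(r"\.", "", code)}
    # derive/^([a-zA-Z]+)_.+$/$1/  -> keep each spelling, also add the letters-only form
    forms |= {re.sub(r"^([a-zA-Z]+)_.+$", r"\1", s) for s in forms}
    # xlit/.../.../  -> map the listed uppercase letters to lowercase
    return {s.translate(XLIT) for s in forms}


def comment(code):
    """Model of comment_format: show the part after "_" in brackets."""
    return re.sub(r"^.+_(.+)$", r"〔\1〕", code)


# Both entries stay distinct dictionary codes, yet both are reachable by typing "al".
print(spellings("AL_宀疋"))
print(spellings("AL_冖夂"))
print(comment("AL_宀疋"))
```

Under this model, `定` and `𠕸` share the typed spelling `al` while their dictionary codes (`AL_宀疋` and `AL_冖夂`) remain different, which is relevant if weights are only compared among entries with an identical code.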

Eliot00 (Author) commented Jul 5, 2022

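The schema file is here. Part of my dictionary consists of abbreviated codes without decomposition annotations, and part is a full-code table with entries like 定 AL_宀疋 999, from which the schema configuration extracts the comment with a regular expression. So far, every problem I have found is in the full-code part.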

Ace-Who (Contributor) commented Jul 11, 2022

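The actual dictionary does not match the information given here: its codes mix uppercase and lowercase letters, so they are presumably treated as different codes, and weights only take effect among entries with the same code. You could change all abbreviated codes to uppercase. Uppercase letters come before lowercase letters in lexicographic order.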
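The point about lexicographic order can be illustrated with a toy model in Python. This is not librime's actual ranking algorithm; it only assumes that entries are ordered by their code first (by code point, so uppercase precedes lowercase) and that weight breaks ties only between identical codes. The codes `PA_x` and `PA_y` and the names `rare1` and `rare2` are hypothetical stand-ins, since the real dictionary entries are not shown in this thread.

```python
def toy_order(entries):
    """Toy model: sort by code (code-point order), then by descending weight."""
    ranked = sorted(entries, key=lambda e: (e[1], -e[2]))
    return [text for text, _code, _weight in ranked]


# Mixed case: the lowercase abbreviated code "pa" sorts after the uppercase codes,
# so the large weight of 向 never gets compared with the others.
entries = [
    ("向", "pa", 99999),
    ("rare1", "PA_x", 0),  # hypothetical full code
    ("rare2", "PA_y", 0),  # hypothetical full code
]

# Suggested fix: write the abbreviated code in uppercase as well.
# "PA" is a prefix of "PA_x", so it sorts first.
fixed = [
    ("向", "PA", 99999),
    ("rare1", "PA_x", 0),
    ("rare2", "PA_y", 0),
]

print(toy_order(entries))
print(toy_order(fixed))
```

In this model the weight of 99999 has no influence in the mixed-case table because no other entry shares the code `pa`, which matches the behaviour described in the comment above.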

Eliot00 (Author) commented Jul 17, 2022

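> The actual dictionary does not match the information given here: its codes mix uppercase and lowercase letters, so they are presumably treated as different codes, and weights only take effect among entries with the same code. You could change all abbreviated codes to uppercase. Uppercase letters come before lowercase letters in lexicographic order.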

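I tested this: after changing the abbreviated codes to uppercase, the default order follows the weights. Thanks.

PS: The uppercase letters in the dictionary actually represent the component (radical) codes and are essentially annotations; users still type in lowercase. If issue #538 is implemented, the whole dictionary could be changed to lowercase, with the remaining information moved into a comment column.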
