Fix variant characters #19

Merged: 2 commits into master from variants, Mar 1, 2020
@sgalal sgalal commented Mar 1, 2020

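A user reported that some phrases in the dictionary use the variant character 「説」 (standard form: 「說」). This update changes such characters to the OpenCC standard forms.

The 《香港小學學習字詞表》 (the Hong Kong primary-school character and word list) uses 「説」. To use the Hong Kong standard, switch to 「香港繁體」 (Hong Kong Traditional) in the menu.

This update checked the following variant characters (in each pair, the variant is on the left and the standard form on the right):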

偽僞,兑兌,勛勳,卧臥,吿告,啟啓,囱囪,媪媼,媯嬀,嫻嫺,宂冗,悦悅,愠慍,户戶,抬擡,捝挩,揾搵,敍敘,敚敓,枱檯,枴柺,棁梲,榅榲,氲氳,涚涗,温溫,溈潙,潀潨,濕溼,灶竈,為爲,煴熅,痴癡,痺痹,皂皁,眾衆,税稅,稜棱,粧妝,粽糉,緼縕,缽鉢,脱脫,腽膃,葱蔥,蒀蒕,蒍蔿,藴蘊,蜕蛻,説說,贋贗,輼轀,醖醞,鈎鉤,鋭銳,閲閱,韁繮,鰛鰮,鼈鱉

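At present, most single-character entries in the dictionary that are variant forms are already marked 0% (excluded from phrase formation), so this update touches only phrases, not single characters.

In addition, some phrases were previously listed in both their standard and variant forms, so besides correcting the variants this update also removes the resulting duplicates. To make review easier, the two steps were done separately, in two commits.

Because the mappings for 「台」 are more complicated, this update does not yet fix 「台」.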
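The variant-correction step described above could be sketched roughly as follows. This is an illustration, not the script actually used for the first commit: the function name `fix_entry` is hypothetical, the pair list is abbreviated to a few entries, and it assumes each dictionary entry is a tab-separated line whose first column is the phrase. Header handling is omitted.

```python
# Abbreviated, illustrative subset of the variant/standard pairs listed above.
PAIRS = '偽僞,兑兌,温溫,為爲,説說'

# Translation table: variant (left character) -> standard form (right character).
VARIANT_TO_STANDARD = str.maketrans({p[0]: p[1] for p in PAIRS.split(',')})


def fix_entry(line):
    """Replace variant characters in the phrase column of one entry line."""
    if line == '\n' or line.startswith('#'):
        return line  # blank lines and comments pass through unchanged
    text, sep, rest = line.partition('\t')
    if len(text) < 2:
        return line  # single characters are deliberately left alone
    # Only the phrase column is translated; the other columns are kept as-is.
    return text.translate(VARIANT_TO_STANDARD) + sep + rest
```

Applying `fix_entry` to every entry line can leave two identical entries wherever the dictionary previously held both the variant and the standard spelling of a phrase, which is why a separate deduplication pass is needed afterwards.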

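# Remove duplicate entries from the phrase dictionary, keeping the first occurrence.
with open('jyut6ping3.phrase.dict.yaml', encoding='utf-8') as f, \
        open('jyut6ping3.phrase.dict.yaml2', 'w', encoding='utf-8') as f2:
    # Copy the YAML header verbatim, up to and including the '...' terminator.
    for line in f:
        f2.write(line)
        if line == '...\n':
            break

    seen = set()  # keys (first two tab-separated columns) already written

    # Deduplicate the entries that follow the header.
    for line in f:
        if line == '\n' or line.startswith('#'):
            f2.write(line)  # keep blank lines and comments as they are
        else:
            key = tuple(line.rstrip().split('\t')[:2])
            if key not in seen:
                f2.write(line)
                seen.add(key)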
@sgalal sgalal merged commit ff7970f into master Mar 1, 2020
@sgalal sgalal deleted the variants branch March 1, 2020 08:33
@sgalal sgalal mentioned this pull request Apr 15, 2020