This repository has been archived by the owner on Jul 31, 2023. It is now read-only.
I added a test at the C++ level in the issue_22 branch (SHA: 39f08dd) that shows CLD correctly detecting Japanese for that text. The output in this issue shows code[3] as 11, which should reflect the number of bytes read. That would imply it's only processing "1/15 HR Div", i.e. everything up to the first ".". My Python is really rusty, but the test I added at the Python level fails (SHA: fbab0a8). I'll have to spend some more time with it to figure out what's up at the Python level, unless some Python user can lend a hand.
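For anyone who wants to check the byte arithmetic, here is a minimal sketch. It assumes the text is UTF-8 encoded and uses only the first line of the reported text; the variable names are just for illustration.

```python
# First line of the text from this issue.
text = "1/15 HR Div.Q&CS Dept.全体MTG 開催"

# Everything before the first "." in the text.
prefix = text.split(".", 1)[0]

# Byte lengths under UTF-8 (assumed encoding).
prefix_bytes = len(prefix.encode("utf-8"))
total_bytes = len(text.encode("utf-8"))

print(repr(prefix), prefix_bytes, total_bytes)
```

The prefix is 11 bytes long, which matches the code[3] value in the reported output, while the full line is considerably longer.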
I changed the Python test (SHA: a31a815) to call cld.detect with isPlainText=True and it passes. Can you try passing that argument and see if it works as expected?
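A rough sketch of the suggested call, in case it helps. The helper name `detect_plain_text` is hypothetical and not part of cld; it only assumes that the function passed in accepts bytes plus an `isPlainText` keyword and returns the 5-tuple shown in this issue's output.

```python
def detect_plain_text(text, detect):
    """Call a CLD-style detect function with isPlainText=True.

    `detect` is assumed to behave like cld.detect as used in this issue:
    it takes bytes and returns (name, code, is_reliable, bytes_found, details).
    """
    # Encode str input to UTF-8 bytes; pass bytes through unchanged.
    data = text.encode("utf-8") if isinstance(text, str) else text
    name, code, is_reliable, bytes_found, details = detect(data, isPlainText=True)
    return {
        "name": name,
        "code": code,
        "reliable": is_reliable,
        "bytes_found": bytes_found,
        "details": details,
    }
```

With the binding installed, this would be called as `detect_plain_text(text, cld.detect)`. Comparing `bytes_found` against the UTF-8 length of the input is a quick way to see whether the whole text was processed.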
CLD is unable to detect Japanese for the following text:
text = '1/15 HR Div.Q&CS Dept.全体MTG 開催
1月15日(水)、赤溜オーディトリアムにてHR Div.Q&CS Dept.の全体MTGが開催されました。
アジェンダは以下のとおりです。
・Q&CSってそもそも何のための組織だっけ?:夏目通伸さん
・竹市さんより:竹市栄治さん
・製品顧客横断的な動きについて:伊藤秀也さん
・@SUPPORT案件管理について:渡部裕さん
その中から、今回は夏目通伸さんからのお話についてご紹介します。
2014年初めての全体MTGにて、「Q&CSってそもそも何のための組織だっけ?」というタイトルのもと、Q&CSが組織としてやろうとしていること、やるべきことを話されました。 '
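Steps to reproduce:
import cld
from django.utils.encoding import smart_str  # assumption: smart_str is Django's helper; the original report does not show its import
code = cld.detect(smart_str(text), pickSummaryLanguage=True, removeWeakMatches=False)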
Output
code = ('ENGLISH', 'en', True, 11, [('ENGLISH', 'en', 100, 0.8103727714748784)])
The text also contains Japanese, but it is not detected.