This repository has been archived by the owner on Jul 31, 2023. It is now read-only.

CLD unable to detect japanese language #22

Closed
leovivek opened this issue Jan 24, 2014 · 3 comments

Comments

@leovivek

CLD is unable to detect Japanese for the following text:

text = '1/15 HR Div.Q&CS Dept.全体MTG 開催

1月15日(水)、赤溜オーディトリアムにてHR Div.Q&CS Dept.の全体MTGが開催されました。
アジェンダは以下のとおりです。
・Q&CSってそもそも何のための組織だっけ?:夏目通伸さん
・竹市さんより:竹市栄治さん
・製品顧客横断的な動きについて:伊藤秀也さん
・@SUPPORT案件管理について:渡部裕さん

その中から、今回は夏目通伸さんからのお話についてご紹介します。
2014年初めての全体MTGにて、「Q&CSってそもそも何のための組織だっけ?」というタイトルのもと、Q&CSが組織としてやろうとしていること、やるべきことを話されました。 '

Steps to reproduce:

  1. import cld

  2. code = cld.detect(smart_str(text), pickSummaryLanguage=True, removeWeakMatches=False)

  3. Output
    code = ('ENGLISH', 'en', True, 11, [('ENGLISH', 'en', 100, 0.8103727714748784)])

The text also contains Japanese, but Japanese is not detected.

mzsanford added a commit that referenced this issue Jan 27, 2014
@mzsanford
Owner

I added a test at the C++ level in the issue_22 branch (SHA: 39f08dd) that shows it correctly detecting Japanese given that text. The output in this issue shows code[3] as 11, which should reflect the number of bytes read. That would imply it's only processing "1/15 HR Div", up to the first ".". My Python is really rusty, but adding a test at the Python level fails (SHA: fbab0a8). I'll have to spend some more time with it to figure out what's up at the Python level, unless some Python user can lend a hand.
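
The byte count can be checked without CLD itself. The following is a minimal sketch, assuming the text was passed to cld.detect as UTF-8 bytes (which is what smart_str is assumed to produce here) and using only the first line of the reported text. The variable names are illustrative and not part of the cld API.

```python
# First line of the text from the report; the rest is not needed for this check.
text = u"1/15 HR Div.Q&CS Dept.全体MTG 開催"

# Assumption: the input reached CLD as UTF-8 encoded bytes.
encoded = text.encode("utf-8")

# code[3] from the reported output.
bytes_read = 11

# What those bytes cover, and the character that follows them.
prefix = encoded[:bytes_read].decode("utf-8")
next_char = encoded[bytes_read:bytes_read + 1].decode("utf-8")

print(repr(prefix))
print(repr(next_char))
```

The first 11 bytes are all ASCII, so slicing at that offset does not split a multi-byte character.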

@mzsanford
Owner

I changed the Python test (SHA: a31a815) to call cld.detect with isPlainText=True and it passes. Can you try passing that argument and see if it works as expected?
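
For reference, the suggested call could be wrapped roughly as follows. This is only a sketch: detect_plain_text and its detect_fn parameter are hypothetical names introduced here, the keyword arguments are the ones that appear in this thread, and the result is assumed to be a 5-tuple shaped like the output shown in the report. The detector is passed in as an argument so the snippet does not depend on cld being installed.

```python
def detect_plain_text(text, detect_fn):
    """Call a cld.detect-style function with isPlainText=True.

    detect_fn is assumed to accept the keyword arguments used in this
    thread and to return a 5-tuple like the one shown in the report.
    """
    # The report passes an encoded byte string, so encode unicode input first.
    data = text.encode("utf-8") if isinstance(text, str) else text

    name, code, reliable, bytes_found, details = detect_fn(
        data,
        isPlainText=True,           # the argument suggested in this thread
        pickSummaryLanguage=True,   # as in the original report
        removeWeakMatches=False,    # as in the original report
    )

    return {
        "name": name,
        "code": code,
        "reliable": reliable,
        "bytes_found": bytes_found,  # compare with bytes_total to spot truncation
        "bytes_total": len(data),
        "details": details,
    }

# Hypothetical usage with the binding installed:
#   import cld
#   result = detect_plain_text(text, cld.detect)
```

Comparing bytes_found with bytes_total gives a quick hint when only a small part of the input was processed, as with the 11 bytes in the report.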

@leovivek
Author

After passing the argument isPlainText=True, it worked.
Thanks, mzsanford.
