Add Bengali to text-to-lexemes #48

Merged 1 commit on Jul 31, 2019

Conversation

bodhisattwawiki
Contributor

No description provided.

@bodhisattwawiki bodhisattwawiki changed the title Add support for Bengali language Add Bengali to text-to-lexemes Jul 31, 2019
@fnielsen
Owner

Did you test it? There is sanitization in views.py too.

@fnielsen
Owner

There may also be some issues with word tokenization.

@fnielsen fnielsen merged commit bef51b6 into fnielsen:master Jul 31, 2019
@fnielsen
Owner

Here is a test: https://tools.wmflabs.org/ordia/text-to-lexemes?text=%E0%A6%AD%E0%A6%BE%E0%A6%B0%E0%A6%A4+%E0%A6%9C%E0%A6%BE%E0%A6%A8%E0%A7%81%E0%A6%AF%E0%A6%BC%E0%A6%BE%E0%A6%B0%E0%A6%BF&text-language=bn

You should tell me: is the word tokenization OK? I think there might be a problem. For an Indian language it did not work initially; apparently a more complex regular expression was needed.
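To make the tokenization concern concrete, here is a minimal sketch, not Ordia's actual tokenizer. It decodes the query string from the test URL above and compares a plain `\w+` pattern from Python's standard `re` module with a tokenizer that also treats Unicode combining marks as word characters. The helper names `is_word_char` and `tokenize` are hypothetical and exist only for this illustration.

```python
import re
import unicodedata
from urllib.parse import unquote_plus

# Value of the "text" parameter, copied from the test URL in this thread.
ENCODED_TEXT = (
    "%E0%A6%AD%E0%A6%BE%E0%A6%B0%E0%A6%A4+"
    "%E0%A6%9C%E0%A6%BE%E0%A6%A8%E0%A7%81%E0%A6%AF%E0%A6%BC"
    "%E0%A6%BE%E0%A6%B0%E0%A6%BF"
)


def is_word_char(ch):
    """Hypothetical helper: letters (L*), marks (M*) and numbers (N*) belong to a word."""
    return unicodedata.category(ch)[0] in ("L", "M", "N")


def tokenize(text):
    """Split text into runs of word characters (a sketch, not Ordia's code)."""
    tokens, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


text = unquote_plus(ENCODED_TEXT)  # "+" becomes a space, bytes decode as UTF-8

# The standard-library \w does not match combining marks such as Bengali
# vowel signs or the nukta, so words get cut at those characters.
naive_tokens = re.findall(r"\w+", text)

# Treating marks as word characters keeps each Bengali word in one piece.
mark_aware_tokens = tokenize(text)

print(len(naive_tokens), len(mark_aware_tokens))
```

Whether this is what causes the split seen in the tool is an assumption; it only shows why a simple regular expression can be too weak for Indic scripts.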

@fnielsen
Owner

Thanks for your contribution!

@bodhisattwawiki bodhisattwawiki deleted the patch-1 branch July 31, 2019 16:05
@bodhisattwawiki
Contributor Author

bodhisattwawiki commented Jul 31, 2019

Yes, there is a problem with the word জানুয়ারি in your link: it has been split in two. Here it looks OK though, with the same text.

@bodhisattwawiki bodhisattwawiki restored the patch-1 branch July 31, 2019 16:06
@fnielsen
Owner

Your second link also splits the last word. There is something wrong with the way that I construct the URL.

@bodhisattwawiki
Contributor Author

Strange, the second link was all right before I posted the message. Here is the screenshot, where the text was not split.
Screenshot from 2019-07-31 21-45-06

@fnielsen
Owner

I think it is a problem with my handling of POST and GET requests.

@fnielsen
Owner

fnielsen commented Jul 31, 2019

You can try to copy-paste the text from the edit field into this URL: https://tools.wmflabs.org/ordia/text-to-lexemes?text=<copy-paste-here>&text-language=bn
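Since the suspicion is about how the URL is constructed, here is a small sketch of building such a GET URL with Python's standard library so that the Bengali text is percent-encoded consistently and survives a round trip. The variable names are illustrative and this is not how Ordia builds its links; the base URL and parameter names are taken from the links in this thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://tools.wmflabs.org/ordia/text-to-lexemes"

# The two test words from this thread, written as code points so the exact
# character sequence (including the nukta U+09BC) is unambiguous.
text = (
    "\u09ad\u09be\u09b0\u09a4 "
    "\u099c\u09be\u09a8\u09c1\u09af\u09bc\u09be\u09b0\u09bf"
)

# urlencode percent-encodes the UTF-8 bytes and turns the space into "+".
query = urlencode({"text": text, "text-language": "bn"})
url = BASE_URL + "?" + query

# Decoding the query string again should give back exactly the same text.
decoded = parse_qs(urlsplit(url).query)

print(url)
print(decoded["text"][0] == text)
```

If the decoded text differs from the input at this step, the problem lies in URL handling rather than in the tokenizer.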
