Vertical Japanese doesn't translate well #20

Open
Meerkov opened this issue Apr 18, 2021 · 3 comments
Comments

@Meerkov
Meerkov commented Apr 18, 2021

Though uncommon, some games display Japanese written top-to-bottom (with columns running right-to-left). I found that the system struggled to identify a block of text that was only one character wide.

Say there is a block of text that is 3 characters wide by 10 characters tall. It gets translated as if it were 10 lines of 3 characters each, resulting in a garbled mess.

Furthermore, the translation then tries to fit this garbled English into a vertical layout that it doesn't really fit in...

I'm not sure how this should be fixed, as it's likely a failure on the Cloud side... but maybe there is a way to throw in a hack whenever a block of Japanese text is much taller than it is wide? Probably between the OCR step and the translate step?
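
To make the "much taller than wide" idea concrete, here is a minimal Python sketch of such a check. The function name `looks_vertical` and the `min_ratio` threshold are made up for illustration and are not part of UGT; the inputs are assumed to be the pixel width and height of a text block's bounding box as reported by the OCR step.

```python
def looks_vertical(width: float, height: float, min_ratio: float = 2.0) -> bool:
    """Guess whether a text block is laid out top-to-bottom.

    Hypothetical heuristic: treat the block as vertical when its bounding
    box is at least `min_ratio` times taller than it is wide.
    """
    if width <= 0 or height <= 0:
        return False  # degenerate box, nothing to decide
    return height / width >= min_ratio


# A block 3 characters wide by 10 tall, at roughly 32 px per character.
print(looks_vertical(3 * 32, 10 * 32))
# A normal horizontal line of 20 characters.
print(looks_vertical(20 * 32, 1 * 32))
```

A ratio test like this would misfire on a narrow text box holding many short horizontal lines, so it is probably only useful as a hint alongside other signals such as the detected language.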

@SethRobinson
Owner

Yeah, Google's stuff can't handle this yet.

It should be possible to manually piece characters together and send that for translation (we do have the position of each character on the screen, not just each "word" or whatever), but I currently don't have plans to add this.
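
As a rough sketch of what piecing characters together from their positions could look like (in Python, not UGT's actual code): group characters into columns by x position, read the columns right-to-left, and read each column top-to-bottom. The `Glyph` type and its fields are hypothetical stand-ins for whatever per-character position data the OCR result provides, and the half-width column tolerance is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    text: str     # the recognized character
    x: float      # horizontal center of the character on screen
    y: float      # vertical center (larger y is lower on screen)
    width: float  # character width, used as the column tolerance


def join_vertical(glyphs: list[Glyph]) -> str:
    """Rebuild reading order for top-to-bottom, right-to-left text."""
    columns: list[list[Glyph]] = []
    # Walk the characters from the rightmost to the leftmost.
    for g in sorted(glyphs, key=lambda g: -g.x):
        # Same column if the center is within half a character width
        # of the first character placed in the current column.
        if columns and abs(columns[-1][0].x - g.x) <= g.width / 2:
            columns[-1].append(g)
        else:
            columns.append([g])
    # Within each column, read from top to bottom.
    return "".join(
        g.text for col in columns for g in sorted(col, key=lambda g: g.y)
    )


# Two columns of three characters, listed in the row-by-row order a
# horizontal reader would produce.
sample = [
    Glyph("ち", 60, 0, 30), Glyph("こ", 100, 0, 30),
    Glyph("は", 60, 40, 30), Glyph("ん", 100, 40, 30),
    Glyph("。", 60, 80, 30), Glyph("に", 100, 80, 30),
]
print(join_vertical(sample))
```

The joined string could then be sent to the translate step as a single line instead of many short ones.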

@SethRobinson
Owner

SethRobinson commented Jul 21, 2021

Note, this has changed! Google does properly handle vertical text now. When testing with the examples on https://w3c.github.io/i18n-drafts/articles/vertical-text/index.en, the OCR does fine.

And while it's possible to click and hear the correct Japanese (or English translation) being spoken, the text is formatted horizontally (by UGT), so it's difficult to read.

Will have to give the formatting some thought, but good to know the Google side can do it now.

@Meerkov
Author

Meerkov commented Jul 22, 2021

I found out that you can apparently also set the model to "builtin/latest", which gives the newest features. According to a blog post I saw, vertical text detection has been available in that model for 2 years. It might be worth trying that setting to see if it makes a difference in the quality of the detection.
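
For reference, a minimal Python sketch of a request body with that setting, assuming the Cloud Vision REST `images:annotate` format where each entry in `features` can carry a `model` string. The helper name `build_annotate_request` is made up for illustration, and how UGT actually builds its request is not shown in this thread.

```python
import base64
import json


def build_annotate_request(image_bytes: bytes, model: str = "builtin/latest") -> dict:
    """Build a JSON-serializable body for a text detection request.

    Assumption: the feature's "model" field selects the OCR model,
    e.g. "builtin/stable" or "builtin/latest".
    """
    return {
        "requests": [
            {
                # The image is sent inline as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION", "model": model}],
            }
        ]
    }


body = build_annotate_request(b"abc")
print(json.dumps(body, indent=2))
```

Switching the default back to "builtin/stable" in the same helper would make it easy to compare detection quality between the two models on the same screenshot.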
