Support for Chinese Language #14

Open
lorisgir opened this issue Jan 14, 2021 · 1 comment
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@lorisgir

Hi, thanks for your hard work on this project. It's really cool!
I've seen issue #7 but I still have some doubts. I would like to try replacing English text with its corresponding Chinese translation, but how can I do so if characters are stored in jpg files named after their ASCII codes? Chinese is not included in ASCII.
Another question regarding Chinese: do I also need to generate new images, one for each character, in the colornet directory?
Your help would be much appreciated!

@prasunroy added the enhancement and help wanted labels on Jan 20, 2021
@prasunroy
Owner

Hi, thanks for your interest in our work.

About FANnet:

At this point it is difficult to replace text written in one language with text in another. The reason is that we assume a character-to-character translation, not a word-to-word one, so the model expects the source and target texts to be equal in length (i.e. the same number of characters). Translating a word into another language may not yield a word with the same character count. This is a major limitation of the current approach, as discussed in the paper. You can still experiment with the numerals 0-9 in different languages, where a one-to-one mapping is possible.
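The equal-length constraint can be illustrated with a small check. This is only a sketch of the idea, not code from the repository; the helper name char_pairs is hypothetical.

```python
from typing import List, Tuple


def char_pairs(source: str, target: str) -> List[Tuple[str, str]]:
    """Pair each source character with the target character at the same position."""
    if len(source) != len(target):
        raise ValueError(
            f"length mismatch: {len(source)} source vs {len(target)} target characters"
        )
    return list(zip(source, target))


# Three characters on each side, so a character-to-character pairing exists.
print(char_pairs("123", "789"))

# A translated word usually has a different character count (three vs one here),
# so no character-to-character pairing exists.
try:
    char_pairs("cat", "猫")
except ValueError as err:
    print(err)
```

A word-level translation fails this check as soon as the two words differ in length, which is why digits are the easy case to experiment with.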

However, the current code needs some modifications before you can use such a scheme. For example, assume the following translations from English to Chinese numerals 0-9:

Note that we are ignoring the fact that Chinese numbers can extend beyond 9.

0 -> 零
1 -> 一
2 -> 二
3 -> 三
4 -> 四
5 -> 五
6 -> 六
7 -> 七
8 -> 八
9 -> 九
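As a rough sketch, the one-to-one digit mapping could be written as a lookup table. The specific characters are an assumption (the standard numerals 零 to 九, with 零 for zero; 〇 is also common), and the names EN_TO_CN and translate_digits are hypothetical, not part of the repository.

```python
# Assumed mapping for the digits 0-9; each digit maps to exactly one character.
EN_TO_CN = dict(zip("0123456789", "零一二三四五六七八九"))


def translate_digits(text: str) -> str:
    """Replace every digit with its Chinese numeral, character by character.

    Raises KeyError if the text contains anything other than the digits 0-9.
    """
    return "".join(EN_TO_CN[ch] for ch in text)


source = "2021"
target = translate_digits(source)
# The mapping is one-to-one, so the character count is preserved.
assert len(source) == len(target)
print(source, "->", target)
```

Because every digit maps to a single character, the target always has the same length as the source, which is the property FANnet relies on.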

If we cannot use ASCII values as filenames, we have to use some kind of indexing. Assume the English numeral images are named en0.jpg, en1.jpg, ... , en9.jpg and the Chinese numeral images cn0.jpg, cn1.jpg, ... , cn9.jpg. Also assume the test pairs are named 00_en0_cn0.jpg, 01_en1_cn9.jpg, etc. Now make the following changes in fannet.py:

Lines 48 and 49:

SOURCE_CHARS = [f'en{i}' for i in range(10)]
TARGET_CHARS = [f'cn{i}' for i in range(10)]

Lines 106 and 107:

ch_src = str(perm[0])
ch_dst = str(perm[1])

Line 221:

idx_ch = self._charset.index(dst_ch)  # charset is now a list, which has no .find()

Line 361:

charset=TARGET_CHARS,
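To make the naming scheme concrete, here is a minimal, untested-against-the-repo sketch of how the index-based labels and test-pair filenames might be handled. The helper parse_pair is hypothetical and does not exist in fannet.py. Note that Python lists have no find method, so the lookup below uses list.index, which raises ValueError for an unknown label.

```python
from pathlib import Path

# Index-based labels replacing the ASCII-code filenames.
SOURCE_CHARS = [f'en{i}' for i in range(10)]
TARGET_CHARS = [f'cn{i}' for i in range(10)]


def parse_pair(filename):
    """Split a test-pair name like '01_en1_cn9.jpg' into (source, target) labels."""
    _, src, dst = Path(filename).stem.split('_')
    return src, dst


src, dst = parse_pair('01_en1_cn9.jpg')

# Position of the target label in the charset, e.g. for building a one-hot vector.
idx_ch = TARGET_CHARS.index(dst)
print(src, dst, idx_ch)
```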

PLEASE NOTE: I haven't checked this personally, so some other minor issues may appear during training. I would like to provide a fully working notebook in the future, but unfortunately for the next couple of months I won't be able to do so and I might be slow to respond.

About Colornet:

Colornet doesn't depend on the structure of the characters involved, so you might be able to use the provided pretrained weights without retraining! ;) But if you still want to train with new data, you should prepare it in a format similar to the given dataset.
