truecasing #12

MaksymDel · 2019-02-21T14:59:59Z

Hi,

Did you do truecasing/lowercasing in your MT experiments? I can't find any sign of it in the code.

Is there any specific reason to do / not do it?

Thanks

glample · 2019-02-21T15:02:26Z

Hi,

No, we never used truecasing / lowercasing. This was quite popular in PBSMT models, but for NMT the best approach is to use BPE. BPE is typically applied to sentences with regular casing.

glample · 2019-02-21T15:03:17Z

We used lowercasing + BPE for XNLI though, as in this case the task is sentence classification, and the casing is not very useful. But in MT, where you need to generate text, it's good to directly generate the correct casing, and BPE does this very well.

MaksymDel · 2019-02-21T15:10:22Z

@glample Moses truecasing only modifies the case of the 1st word in a sentence (it does not modify things like Named Entities though). This is to reduce the sparsity of the vocabulary (why have both "Starting" and "starting" in the vocab?).

With BPE you have the same issue with the 1st wordpiece of the 1st word (e.g. start ing vs Start ing).
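
To make the idea concrete, here is a minimal sketch of frequency-based truecasing of the sentence-initial token. It is a simplified illustration, not the Moses implementation: the function names `train_truecaser` and `truecase`, the whitespace tokenization, and the toy corpus are all assumptions made up for this example.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Learn the most frequent surface form of each word (hypothetical helper)."""
    forms = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        # Skip the first token: its casing is dictated by position, not by the word.
        for token in tokens[1:]:
            forms[token.lower()][token] += 1
    # Keep the most frequent surface form for each lowercased word.
    return {word: counts.most_common(1)[0][0] for word, counts in forms.items()}


def truecase(sentence, model):
    """Replace only the sentence-initial token with its learned form, if known."""
    tokens = sentence.split()
    if not tokens:
        return sentence
    first = tokens[0]
    tokens[0] = model.get(first.lower(), first)
    return " ".join(tokens)


# Toy corpus, invented for illustration only.
corpus = [
    "We are starting the training run now .",
    "They visited Paris before starting work .",
    "The flight to Paris was late .",
]
model = train_truecaser(corpus)

print(truecase("Starting from scratch is slow .", model))  # starting from scratch is slow .
print(truecase("Paris is large .", model))                 # Paris is large .
```

After this step "Starting" and "starting" share one vocabulary entry, while a named entity such as "Paris" keeps its capital because it is capitalized mid-sentence in the training data.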

glample · 2019-02-21T15:13:06Z

Yes, you could use truecasing in combination with BPE, but it probably wouldn't make a big difference. Also, in practice it's nice to limit the number of preprocessing steps. I guess this is also why people don't use truecasing anymore. But it certainly wouldn't hurt to use it.

MaksymDel · 2019-02-21T15:15:26Z

@glample I hadn't realized that people don't use truecasing anymore, so I will look more into it. Thanks for pointing that out!

MaksymDel closed this as completed Feb 21, 2019

JxuHenry mentioned this issue Oct 28, 2019

I train UNMT with multi-GPU got the following errors! #224

Closed
