This repository has been archived by the owner on Sep 4, 2023. It is now read-only.

Use HTML translation feature including inline tag movement #52

Closed
kpu opened this issue Jan 26, 2022 · 4 comments

@kpu
Contributor

kpu commented Jan 26, 2022

Currently the code sends text snippets for translation:

submitTranslation(node, key) {
    if (this.messagesSent.has(key)) {
        // if we already sent this message, we just skip it
        return;
    }
    const text = node.textContent;
    if (text.trim().length) {
        /*
         * send the content back to mediator in order to have the translation
         * requested by it
         */
        const payload = {
            text,
            type: "inpage",
            attrId: [
                this.processingNodeMap,
                key
            ],
        };
        this.notifyMediator("translate", payload);
        this.messagesSent.add(key);
    }
}

This leads to very poor translation quality because the system does not have sentence context. Even with sentence context, translating each text span in place prevents reordering, which makes the problem impossible to solve in general. For example, chien translates to dog. In this HTML, what is the translation of h?
<span id="0">c</span><span id="1">h</span><span id="2">i</span><span id="3">e</span><span id="4">n</span>

Since block elements are sentence-breaking, individual block elements can be sent for translation using their innerHTML. The HTML parser also knows to break sentences at block boundaries so larger elements can also be sent in. It does assume well-formed HTML though; Firefox is better at fixing HTML and this ensures consistency between rendering and how the engine perceives tags. Well-formed implies tags that open also close inside the same block of text; #23 is a blocker.
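As a rough sketch of the idea above, the payload could carry a block element's innerHTML instead of a text node's textContent. The buildHtmlPayload name and the isHTML field are assumptions made for illustration and are not part of the extension's existing code; the remaining fields mirror the current submitTranslation payload.

```javascript
// Hypothetical helper: builds a translation payload from a block element's
// innerHTML so the engine sees the whole sentence, inline tags included.
// `isHTML` is an assumed field name, not an existing part of the payload.
function buildHtmlPayload(element, key, processingNodeMap) {
  const html = element.innerHTML;
  // Skip elements that contain only whitespace, as the current code does.
  if (!html.trim().length) {
    return null;
  }
  return {
    text: html,
    type: "inpage",
    isHTML: true,
    attrId: [processingNodeMap, key],
  };
}

// A plain object stands in for a DOM element so the sketch runs outside a
// browser. The five spans from the example travel together as one unit.
const block = {
  innerHTML:
    '<span id="0">c</span><span id="1">h</span><span id="2">i</span>' +
    '<span id="3">e</span><span id="4">n</span>',
};
const payload = buildHtmlPayload(block, 0, 0);
```

With this shape the engine receives all five spans in one request, so it is free to decide where the inline tags land in the translated text.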

#51 is a partial blocker. Specifically this part needs to be fixed first:

Even if HTML were being submitted, it would not be used properly (and would cause an abort()) because the model doesn't produce alignment information. In the model configuration YAML, the line alignment: soft is missing.

Once that is fixed, HTML processing coming out of the engine should be consistent with https://translate.ikhoefgeen.nl/ .

Quality issues with HTML processing should be raised on https://github.com/browsermt/bergamot-translator

@andrenatal
Contributor

andrenatal commented Jan 26, 2022

The team has very little confidence in the current bergamot-translator's embedded HTML translation capabilities, due to the very well-documented issues, the long time it took to have it implemented, and the constant belittling from your team of the way we were trying to solve this problem while your team was still unable to provide the right tools (again, also well documented).

Still, we decided to give it another try in a couple of weeks and submit it to QA. If we still see issues such as page defacing or stripped tags, we will abandon this altogether and stay with textNodes, which is the approach that will be used in the user test we are mandated to run internally.

@andrenatal andrenatal added the enhancement New feature or request label Jan 26, 2022
@andrenatal andrenatal added bug Something isn't working and removed enhancement New feature or request labels Feb 4, 2022
@andrenatal andrenatal added this to the W3 milestone Feb 10, 2022
@abhi-agg
Collaborator

#111 imported the latest API changes in bergamot-translator which will enable the extension to use HTML translation feature.

Now, the extension needs to parse the content to be translated and send a boolean flag with each batch item to indicate whether that item's content is HTML.

@jelmervdl
Contributor

Now, the extension needs to parse the content to be translated

The extension (developer) knows whether it is translating Node.textContent, Node.value or Node.innerHTML if I'm correct. I don't think there is ever a need for parsing the content to determine this flag's value.
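A minimal sketch of that point: the flag can follow directly from which DOM property the caller reads, with no content sniffing. The makeBatchItem name and the { text, html } item shape are hypothetical and are not taken from the bergamot-translator API.

```javascript
// Hypothetical mapping from the DOM property being read to the per-item
// HTML flag. Only innerHTML yields markup; the other two yield plain text.
const SOURCE_IS_HTML = new Map([
  ["innerHTML", true],
  ["textContent", false],
  ["value", false],
]);

// Builds one batch item. The caller states which property it is translating,
// so the flag is known up front and the content is never parsed.
function makeBatchItem(node, property) {
  if (!SOURCE_IS_HTML.has(property)) {
    throw new Error(`Unsupported source property: ${property}`);
  }
  return {
    text: node[property],
    html: SOURCE_IS_HTML.get(property),
  };
}
```

For example, makeBatchItem(paragraph, "innerHTML") would mark the item as HTML, while makeBatchItem(input, "value") would mark it as plain text.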

@andrenatal
Contributor

Fixed by #127
