Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[localize-tools] Validation of existing translation should not fail on changed expression #4502

Closed
augustjk opened this issue Jan 18, 2024 · 4 comments · Fixed by #4530
Closed

Comments

@augustjk
Copy link
Member

augustjk commented Jan 18, 2024

Which package(s) are affected?

Localize (@lit/localize)

Description

Reported by @kensternberg-authentik in #4489

https://lit.dev/docs/localization/overview/#message-ids states the rules for message id generation and how they're used to deduplicate translation targets. This can be interpreted that changing the code inside expressions in your source code should not fail on future extract and build calls but there seems to exist some validation that compares the value of equiv-text attribute.

Reproduction

See linked discussion above for sample and error message.

Workaround

Manually update the xliff to match the equiv-text

Is this a regression?

No or unsure. This never worked, or I haven't tried before.

Affected versions

@lit/localize-tools@0.7.1

Browser/OS/Node environment

n/a

@augustjk
Copy link
Member Author

According to the comment here

/**
* Check that for every localized message, the set of placeholders in the
* localized version is equal to the set of placeholders in the source version
* (no more, no less, no changes, but order can change).
*
* It is important to validate this condition because placeholders can contain
* arbitrary HTML and JavaScript template literal placeholder expressions, will
* be substituted back into generated executable source code. A well behaving
* localization process/tool would not allow any modification of these
* placeholders, but we can't assume that to be the case, so it is a potential
* source of bugs and attacks and must be validated.
*/
placeholders don't always contain just the expression code and can contain html too so we ignoring it might not be the best idea.

But perhaps this assumption doesn't hold when we consider multiple source msg() calls can map to a single trans-unit.

Need to investigate and consider this more carefully.

@augustjk
Copy link
Member Author

Found some more context here: #2405 (comment)

The churn in extraction was anticipated but looks like we didn't really handle that at the time, only addressing the more pressing problem with making sure the correct expressions are used in transform mode.

In the PR above, we decided to treat same id with the same description as canonically the same translation unit.

Firstly, this should be made clear in the docs in lit.dev.

Secondly, I think we need to update the validation function

export function validateLocalizedPlaceholders(

to match this behavior so we just check for the placeholder count. Might we need to also check for description in a <note from="lit-localize">? If so, changing the description, or adding a new one, might cause error and churn.

Another concern with this is we would no longer check equiv-text for discrepancy.. I think the comment above might be outdated with regards to it being "substituted back into generated executable source code", so perhaps there's no need to really check that. Need to confirm this.

If we make these pass validation, I think the extract will end up overwriting the previous equiv-text value with whatever the analysis first encountered. Need to check if that is the case and whether it does it for both <source> and <target>.

@augustjk augustjk changed the title [localize-tools] extract validation of existing translation should not fail on changed expression [localize-tools] Validation of existing translation should not fail on changed expression Jan 26, 2024
@augustjk
Copy link
Member Author

Got around to trying out some things and (re)discovering how lit-localize actually works. Thought dump here so I don't forget it all later.

When running extract, it'll always overwrite <source> of the translation unit with analysis from source code.

The validation and potential error happens when build is run, as it validates the placeholder in the translated <target> against the source code.

For runtime mode, it looks like localize currently uses the placeholder text from the <target> tag as it generates code. This is problematic if <target> tags have been modified and not verified against source.

For transform mode, localize only uses the index of the placeholder from <target> for positioning, but the actual placeholder content comes from source code.

Potential forward path here, we can make runtime mode behave like transform in that it'll always source placeholder text from source code. This removes the concern from the original comment above of arbitrary code injection from the translation targets. The only useful validation we'd really need then is just the number of placeholders so we know where to place them.

But without content validation, we won't detect any drift from source code to translation target for cases were the id was manually made. Would that actually matter? (This kind of drift wouldn't exactly happen with auto generated ID as it would be treated as a new translation unit then)

We could make the validation specifically ignore the expression (everything inside ${}) from the placeholder text too but still check for other things like html tags to catch cases with fixed custom ids.

@augustjk
Copy link
Member Author

Opted to implement

We could make the validation specifically ignore the expression (everything inside ${}) from the placeholder text too but still check for other things like html tags to catch cases with fixed custom ids.

This required the smallest change with the correct behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant