Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pluralisation and templating support #20

Closed
benjamin-hodgson opened this issue Oct 20, 2021 · 11 comments · Fixed by #31
Closed

Pluralisation and templating support #20

benjamin-hodgson opened this issue Oct 20, 2021 · 11 comments · Fixed by #31
Labels
enhancement New feature or request pr welcome

Comments

@benjamin-hodgson
Copy link
Contributor

benjamin-hodgson commented Oct 20, 2021

I'm interested in using webpack-localize-assets-plugin to replace our (Stack Overflow's) homemade localisation system. Our existing system has built-in support for string templating - you write code like

__("$username$ sent you a message", { username: sender.name })

and at runtime the localisation system will return something like Benjamin sent you a message. This is important to us because the target language's grammar might require that the words in the sentence be moved around.

Our localisation system also supports pluralisation through a magic $count$ template variable:

__("$count$ widgets", { count: widgets.length })

will return 1 widget or 12 widgets, or an equivalent one/many pluralisation in the target language. (Again, the target language's grammar may require a different structure for plural sentences.)

Does webpack-localize-assets-plugin have a clean way to do these out of the box? I've considered an API like

formatString(__("$username$ sent you a message"), { username: sender.name })

for the first case - the syntax is a bit messy but it should work. I can't think of a clean way to do pluralisation though - not without changing the plugin itself or building a (messy) code rewriting shim.

Would you accept a pull request adding an API for either of these cases?

@privatenumber
Copy link
Owner

Thanks for your issue @benjamin-hodgson!

RE: templating

In my use-cases, I also swapped out the variables post-localization as you considered with:

formatString(__("$username$ sent you a message"), { username: sender.name })

To support an API like this:

__("$username$ sent you a message", { username: sender.name })

the plugin would have to compile it to something like this because sender.name can only be resolved at run-time.

"$username$ sent you a message".replace('$username$', sender.name)

RE: pluralisation

I think this would depend on what your source of pluralisation data looks like.

Assuming your localization system does not automate pluralisation, I'll guess the i18n system outputs something like:

{
	singular: "$count$ widget", // 1
	plural: "$count$ widgets", // 0, or 2 & more
}

Similar to the examples here.


To address both, I think it makes sense to add a localizationCompiler option that takes in a function that compiles __(string, data) to run-time code.

This function can use any micro-template compiler to convert the string to run-time code:

import { compile } from 'eta'

function localizationCompiler(localizedString, data) {
	if (!data) {
		return JSON.stringify(localizedString)
	}

	return compile(localizedString) + `(${JSON.stringify(data)})`
}

If pluralization data looks like the above:

localizedData = {
	singular: "$count$ widget", // 1
	plural: "$count$ widgets", // 0, or 2 & more
}

function localizationCompiler(localizedData, data) {
	if (!data) {
		return JSON.stringify(localizedData.singular)
	}
	
	const count = Object.entries(data)[0]

	let localizedString
	if (count === 0 || count > 1) {
		localizedString = localizedData.plural
	} else {
		localizedString = localizedData.singular
	}

	return compile(localizedString) + `(${JSON.stringify(data)})`
}

Template engines I looked at:

What do you think?

@privatenumber privatenumber added the enhancement New feature or request label Oct 21, 2021
@benjamin-hodgson
Copy link
Contributor Author

Thanks for the quick reply! That API is more or less exactly what I had in mind. (NB in your second example the code which examines the count would need to be emitted into the runtime bundle, not run in the compiler as you have it.)

Would you like me to send you a pull request for further API discussion?

@privatenumber
Copy link
Owner

Oh yes, you're right! Sorry about that. 😅

PR welcome!

@benjamin-hodgson
Copy link
Contributor Author

benjamin-hodgson commented Oct 23, 2021

I've done a little bit of work on this and it's shaping up to be a fairly substantial rewrite. How amenable are you to a largeish change? Would you prefer if I just forked the lib?

Concretely: locales aren't necessarily a Record<string, string> any more, since the values may contain pluralisation data. I could change the locale to be Record<string, string | { singular: string, plural: string }> or similar, but I'd actually prefer to use a type parameter so that clients can pass arbitrary data to their localizeCompiler function. So I'm ending up with something like

interface Options<LocalizedValue = string> {
    locales: Record<string, string | Record<string, LocalizedValue>>;
    localizeCompiler(value: LocalizedValue, arg: unknown): string;  // also considering passing in an `estree` node representing the `__` call
}
class LocalizeAssetsPlugin<LocalizedValue = string> {
    constructor(options: Options<LocalizedValue>) { ... }
}

zod is getting in my way a bit here, by the way - it doesn't seem to have very good support for type params.

@privatenumber
Copy link
Owner

I think I'm open to it as long as there are no breaking API changes. Worst case you can use your fork, but I'm still happy to review and try to work as much of it into this repo.

I'm OK with removing zod in favor of manually enforcing validation (TS and run-time).

Thanks for your contributions! Looking forward to the PR.

@benjamin-hodgson
Copy link
Contributor Author

Good to hear! I should have something for you tomorrow.

@benjamin-hodgson
Copy link
Contributor Author

benjamin-hodgson commented Oct 27, 2021

So, my idea from last night consists of two components:

  • The ability to inject arbitrary non-string (JSON) data into the bundle from the locale
  • a config option to leave the __ calls in the output

If the locale data looks like

{
    "widgets-key": {
        "singular": "A widget",
        "plural": "$__count$ widgets"
    }
}

and the JS looks like

__("widgets-key", { __count: widgets.length })

then we can generate code like

__({ "singular": "A widget", "plural": "$__count$ widgets" }, { __count: widgets.length })

(note that the __() call was not erased) - and everything else can happen at runtime. If locale-specific pluralisation needs to happen then __ can take care of it by inspecting the current locale or through dynamic dispatch.

We’d need to make sure there’s no way to inject code through the locale data, off the top of my head I think JSON.stringify doesn’t serialise functions but would need to test. Also, JSON injection would need to happen pre-minification I think.

What do you think?

@ethanius
Copy link

ethanius commented Nov 5, 2021

Hello, I am just trying to find out how to do this with this plugin at all and have to tell you - singular/plural is not enough for some languages. For example in czech it is different for 0, 1, 2-4, 5+ (where 0 and 5+ is usually the same but not necessarily). We use a custom solution now, but AFAIK it is based on messageformat.

For example:

{
"statsHelped": "Pomohl {total} {total, plural, one{fotkou} few{fotkami} other{fotkami}}",
}

Calling that in code:

_('statsHelped', { total: 1253 });

That translates to 1, 2-4 (few), and other covers 0 and 5+, because it is not 1 or 2-4. The system also enables writing something like =value, eg. =0 or =42 to cover any specific exception from the other rules. Pretty robust. Singular/plural is highly insufficient to be really used in production.

@privatenumber
Copy link
Owner

Thanks for your insight @ethanius .

We're moving forward with an API where you can pass in your own logic to handle pluralization so it would be up to the user to handle that.

@benjamin-hodgson
Copy link
Contributor Author

benjamin-hodgson commented Dec 8, 2021

@privatenumber I thought you might be pleased to hear that I shipped this to production today! We're now using your plugin across our localised Stack Overflow sites (eg japanese, russian, etc). Thank you very much for all your hard work!

@privatenumber
Copy link
Owner

Wow, that just made my day! 😄
Thanks for using the plugin and contributing your hard work! It was a pleasure collaborating with you @benjamin-hodgson

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request pr welcome
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants