Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: externalized messages support #4311

Closed
wants to merge 60 commits into from
Closed

Conversation

jasnell
Copy link
Member

@jasnell jasnell commented Dec 16, 2015

@nodejs/ctc @nodejs/collaborators ...

externalizing the error and output messages in lib

This introduces a new internal/messages module that exports a
template string tag function. The externalized messages are in
internal/messages_en.js for now but can be moved to a different
location.

Within the lib src, we can use this mechanism using tagged template
strings:

const I18N = require('internal/messages');

const foo = 'testing';
throw new Error(I18N`Example error: ${foo}`);

In the messages_en.js file, we'd have something like:

{
  ...,
  "Example error: {0}": {
    "key": "ERR000009"
  },
  ...
}

The template string would expand into:

ERR000009: Example Error: testing

Later on, when we get into actually translating messages, we
would have other message_{locale}.js files (or however else we'd
like to separate it out) that would look something like:

{
  ...,
  "Example error: {0}": {
    "key": "ERR000009",
    "value": "Ejemplo de error: {0}"
}

I'm not yet convinced this approach is ideal, but it's functional.
There are a few gotchas with template strings currently that are less
than ideal (e.g. line wrapping issues, lookup is a bit inefficient).

Getting this out now, tho, to solicit input on the approach.

This is still a work in progress

@jasnell jasnell added wip Issues and PRs that are still a work in progress. semver-major PRs that contain breaking changes and should be released in the next major version. lib / src Issues and PRs related to general changes in the lib or src directory. labels Dec 16, 2015
@jasnell jasnell self-assigned this Dec 16, 2015
"Incorrect value for stdio stream: {0}": {
"key": "ERR000010"
},

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the spaces for?

This isn't JSON, so we could add comments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just visual separation for now. As I said, it's a work in progress ;-) I'm not in love with the idea of stuffing all these into a single js file

@Qard
Copy link
Member

Qard commented Dec 16, 2015

It might be neat if the mechanisms behind this could be usable in userland. Just a thought.

@indutny
Copy link
Member

indutny commented Dec 16, 2015

@jasnell what do you think about precompiling this as we currently do for debug(). Do our users need to swap locales when using the same node.js binary?

What I don't like about this approach is that it adds runtime cost to the most (if not all) textual reporting in core. Ideally we may use some kind of macro in C++ or whenever to replace this I18N... with their translation either during startup, or during compilation of node.js binary.

'use strict';

// adapted from the approach documented at:
// http://jaysoo.ca/2014/03/20/i18n-with-es6-template-strings/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe note why we are doing it like this a bit more here?

Also since this won;t be in the API docs it may be useful to have a quick API-doc thing here. (i.e. what to put in, what comes out)

@Fishrock123
Copy link
Contributor

I think the approach is pretty nice, but I also agree with @indutny.

@jasnell
Copy link
Member Author

jasnell commented Dec 16, 2015

just to be certain, I'm definitely not sold on this approach yet.

@Fishrock123 @indutny ... we could certainly use compiled variants to improve performance. That does get a bit difficult to manage (one binary for each supported language) but it's certainly not out of the question (it's just a bit more work for the build and release team).

I'd be favorable of that approach but we'd need to be clear about the impact and we'd still need to have the clear ID to message mapping. A pre-processor that runs during make wouldn't be too difficult to build.

@domenic
Copy link
Contributor

domenic commented Dec 16, 2015

Are we really worried about runtime costs in error situations anyway?

@jasnell
Copy link
Member Author

jasnell commented Dec 16, 2015

likely not, but there's a trade off. I'd rather not beef up the binary size to hold a bunch of translated strings if we can avoid it.

@mcollina
Copy link
Member

That's great seeing progress in this area, as a non-native English speaker I rejoice. Even though I like my errors to be in English (mainly because they are easier to search, and I am lazy), translated errors were really helpful when my English was poor.

A couple of things: packagers (deb, rpm) usually package lang files separately. There are reasons for doing this: lots of languages, not requiring a release for adding a locale, ecc. However, if these are not in the main bundle, it is unlikely that a newbie would install them in the first place. On Windows, we can possibly show a nice dialog. But on Mac and Linux? We need to have a clear story for packagers.

Regarding performance: what is the impact? Most platforms were performance is important (e.g. iOS, Android) have externalized errors/strings.

Really really dumb question: why can't we just have functions? Basically we would have a i18n_en.js that is looked up and exposed by i18n at startup.

const ERR000009 = I18N.ERR000009

throw new Error(ERR000009('a', 'b'))

Last thing to consider: a lot of userland modules might want/would benefit of similar functionality.

@Fishrock123
Copy link
Contributor

Last thing to consider: a lot of userland modules might want/would benefit of similar functionality.

At very least, it would probably be a good standard to have.

@jasnell Maybe consider prefixing with NODE_?

@jasnell
Copy link
Member Author

jasnell commented Dec 17, 2015

@mcollina .. the main issue with the function form (which I considered also) is that we'd have to create the function objects for each message -- which isn't out of the question but that ends up being a lot of additional objects kept around. Not sure if it's enough of an impact to worry about tho.

The main challenge with this is that there are several approaches we can take to the API and it's easy to bike shed. For the most part, we simply need to come up with one approach that everyone can at least work with. Once that's settled I can get it implemented and run performance benchmarks.

@Fishrock123 ... the other question I have is whether there should be a single generic naming pattern or if we should tie it to the module... for instance instead of NODE_ERR0001, NODE_ERR0002 it would be like NODE_CHILDPROC_ERR1, NODE_STREAMS_ERR1 that kind of thing.

@Fishrock123
Copy link
Contributor

@jasnell ooh, that could be even better.

@jasnell
Copy link
Member Author

jasnell commented Dec 17, 2015

@mcollina ... btw, as part of this effort, once the approach is settled on, I'm planning to implement a command line tool that can accept one of the message identifiers and output the explanation for it, preferably with localization as well. In other words, if you get an ERR_CHILDPROC_ERR1 you would be able to do $ node-explain ERR_CHILDPROC_ERR1 and it would output a description of the error and other relevant information. Obviously that's a much larger chunk of work tho.

@srl295 srl295 added the i18n-api Issues and PRs related to the i18n implementation. label Dec 17, 2015
@YurySolovyov
Copy link

@jasnell
I don't quite get why

we'd have to create the function objects for each message

Can you explain?

@mcollina
Copy link
Member

@mcollina .. the main issue with the function form (which I considered also) is that we'd have to create the function objects for each message -- which isn't out of the question but that ends up being a lot of additional objects kept around. Not sure if it's enough of an impact to worry about tho.

I am ok for trading a bit a of memory for speed. It would probably be negligible, given all the functions return strings that would also need to be maintained in memory. However, it is something to worry about for low-memory devices (IoT?).

The main challenge with this is that there are several approaches we can take to the API and it's easy to bike shed. For the most part, we simply need to come up with one approach that everyone can at least work with. Once that's settled I can get it implemented and run performance benchmarks.

I agree. We should possibly iterate on it between releases.

@Fishrock123 ... the other question I have is whether there should be a single generic naming pattern or if we should tie it to the module... for instance instead of NODE_ERR0001, NODE_ERR0002 it would be like NODE_CHILDPROC_ERR1, NODE_STREAMS_ERR1 that kind of thing.

👍, much better.

@mcollina ... btw, as part of this effort, once the approach is settled on, I'm planning to implement a command line tool that can accept one of the message identifiers and output the explanation for it, preferably with localization as well.

Yes, but it is really needed. There are some incomplete/bad explanations around the net.

@jasnell
Copy link
Member Author

jasnell commented Jan 7, 2016

@Fishrock123 @mcollina @indutny @Qard @nodejs/ctc @nodejs/collaborators ... Ok, I've taken a step back and did a major refactoring on this and force pushed and update. The new approach is documented in the new commit log in 68792d5.

The changes are:

  1. Messages are now moved into the compiled node binary under src/messages/en/messages.h (previously they were in a *.js file)
  2. A process.binding('messages') object and an internal/messages module are provided to give access to these messages from within lib.
  3. Message keys are defined as integer constants exposed by process.binding('messages')
  4. This does not use the template string approach due to that simply being too inefficient of a model. The new approach significantly outperforms the use of the template string model.

Please take a look and let me know if this is heading in the right direction.

@jasnell
Copy link
Member Author

jasnell commented Jan 7, 2016

I would note that this is a work in progress! The list of messages is not complete. Fortunately, we do not have a large number of messages that we output.

@Fishrock123
Copy link
Contributor

Interesting, will think about it more tomorrow.

@mcollina
Copy link
Member

mcollina commented Jan 8, 2016

I am generally 👍 regarding this, the API is simple enough and we can possibly change the implementation later.

return this.error('Breakpoint not found on line ' + line);
}
if (breakpoint === undefined)
return this.error(I18N(I18N.NODE_DEBUGGER_BREAKPOINT_NOT_FOUND,line));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space after the comma.

// I18N(key[, ...args])
// Returns: string
function message() {
const key = arguments[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the function always has one argument it's better to declare it as function message(key) {, that avoids an ArgumentsAdaptorTrampoline stack frame.

@jasnell
Copy link
Member Author

jasnell commented Mar 22, 2016

This is likely to go a different direction, currently pending @srl295's work to make ICU always available as a dependency. Closing this for now, will reopen when it's closer to a full solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
i18n-api Issues and PRs related to the i18n implementation. lib / src Issues and PRs related to general changes in the lib or src directory. semver-major PRs that contain breaking changes and should be released in the next major version. wip Issues and PRs that are still a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.