feat: Add i18n, global config and improve error messages #397

Merged: fabian-hiller merged 54 commits into main from feat-i18n on Feb 5, 2024

Conversation

fabian-hiller
Owner

@fabian-hiller fabian-hiller commented Jan 28, 2024

Overview

This PR is a first draft of our i18n feature discussed in #36. It also improves the built-in error messages and adds global configuration settings.

Global config

When calling parse or safeParse to validate unknown data with a schema, configurations can be passed as the third argument.

import * as v from 'valibot';

const output = v.parse(LoginSchema, input, {
  lang: 'de', // This property is new and part of the i18n feature
  abortEarly: false,
  abortPipeEarly: true,
  skipPipe: false,
});

These configurations can now also be defined globally with setConfig. The configurations passed directly to parse and safeParse have a higher priority and are merged with the global configurations.

import * as v from 'valibot';

v.setConfig({
  lang: 'de',
  abortEarly: false,
  abortPipeEarly: true,
  skipPipe: false,
});
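
The merge behavior described above can be sketched roughly as follows. This is a self-contained illustration, not Valibot's implementation: the `Config` interface, the module-level store, and `resolveConfig` are hypothetical stand-ins, and object spread is used only for brevity.

```typescript
// Hypothetical config shape mirroring the options shown above.
interface Config {
  lang?: string;
  abortEarly?: boolean;
  abortPipeEarly?: boolean;
  skipPipe?: boolean;
}

// Module-level store standing in for the library's global configuration.
let globalConfig: Config = {};

// Stand-in for setConfig: replaces the global configuration.
function setConfig(config: Config): void {
  globalConfig = config;
}

// Merges global and per-call configuration; keys passed per call win.
function resolveConfig(local?: Config): Config {
  return { ...globalConfig, ...local };
}

setConfig({ lang: 'de', abortEarly: false });

// The per-call value of abortEarly overrides the global one, lang is inherited.
const merged = resolveConfig({ abortEarly: true });
console.log(merged);
```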

Error messages

Previously, the error messages were extremely minimalistic. For example, if the data type was incorrect, Invalid type was always returned. With the increasing popularity of Valibot, I see it as a benefit to DX and UX if the default messages are more informative.

Inspired by Zod, I added the expected and received properties to each issue. These are language-neutral strings that follow a defined format. They are language-neutral because they use symbols like "!" and "&" instead of words like "not" or "and".

import * as v from 'valibot';

const Schema = v.string([v.minLength(5)]);

v.parse(Schema, 123); // Invalid type: Expected string but received 123
v.parse(Schema, new Date()); // Invalid type: Expected string but received Date
v.parse(Schema, 'abc'); // Invalid length: Expected >=5 but received 3
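
To illustrate how language-neutral expected and received strings can feed a message, here is a minimal sketch. The helpers `stringifyReceived` and `formatMessage` are hypothetical and only approximate the output shown in the comments above; quoting string inputs is an assumption of this sketch.

```typescript
// Hypothetical helper: turns an unknown input into a short, language-neutral label.
function stringifyReceived(input: unknown): string {
  if (typeof input === 'string') return `"${input}"`; // assumption: strings are quoted
  if (typeof input === 'number' || typeof input === 'boolean') return String(input);
  if (input === null) return 'null';
  if (typeof input === 'object') {
    // Use the constructor name (e.g. "Date") instead of the full value.
    const proto = Object.getPrototypeOf(input);
    return proto?.constructor?.name ?? 'Object';
  }
  return typeof input;
}

// Hypothetical helper: assembles the message from its language-neutral parts.
function formatMessage(label: string, expected: string, received: string): string {
  return `Invalid ${label}: Expected ${expected} but received ${received}`;
}

const typeMessage = formatMessage('type', 'string', stringifyReceived(123));
const dateMessage = formatMessage('type', 'string', stringifyReceived(new Date()));
const lengthMessage = formatMessage('length', '>=5', String('abc'.length));

console.log(typeMessage);   // Invalid type: Expected string but received 123
console.log(dateMessage);   // Invalid type: Expected string but received Date
console.log(lengthMessage); // Invalid length: Expected >=5 but received 3
```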

The special thing about our implementation is that it only requires about 150 additional bytes after minification and compression.

i18n feature

Issue #36 has resulted in the following three requirements:

  1. The default error messages need to be available in more languages
  2. The default error messages must be globally overridable
  3. Specific error messages must be dynamically customizable

More languages

When implementing i18n for Valibot, it was important to keep the bundle size to a minimum. For example, if only string and email are needed, it should be possible that the final bundle contains only these translations. This is done by providing each translation as a separate submodule and importing them as needed, as proposed by @gmaxlev.

import 'valibot/i18n';           // Import any translation
import 'valibot/i18n/email';     // Import specific message
import 'valibot/i18n/de';        // Import specific language
import 'valibot/i18n/de/email';  // Import specific language and message

Globally overridable

The submodules write the translations to a global storage. The API used for this is public and can also be used by developers to globally customize error messages and add their own translations. setLocalMessage sets the message of a local function like minLength and has a higher priority than setGlobalMessage, which can be used to override the default messages globally.

import * as v from 'valibot';

v.setGlobalMessage(
  (issue) => `Invalid input: Expected ${issue.expected} but received ${issue.received}`,
  'en'
);

v.setLocalMessage(
  'min_length',
  (issue) => `Invalid length: Expected ${issue.expected} but received ${issue.received}`,
  'en'
);
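
The priority rule (local before global, then the built-in default) could look roughly like this. It is a sketch under assumptions: the two stores, the `Issue` shape, and `resolveMessage` are hypothetical, and the functions named like the API above are local stand-ins, not imports from valibot.

```typescript
// Hypothetical issue shape with the fields used by the messages above.
type Issue = { type: string; expected: string; received: string; lang?: string };
type MessageFn = (issue: Issue) => string;

// Global messages keyed by language; local messages keyed by validation, then language.
const globalMessages = new Map<string | undefined, MessageFn>();
const localMessages = new Map<string, Map<string | undefined, MessageFn>>();

function setGlobalMessage(message: MessageFn, lang?: string): void {
  globalMessages.set(lang, message);
}

function setLocalMessage(key: string, message: MessageFn, lang?: string): void {
  const byLang = localMessages.get(key) ?? new Map<string | undefined, MessageFn>();
  byLang.set(lang, message);
  localMessages.set(key, byLang);
}

// Local message first, then global message, then the built-in default.
function resolveMessage(issue: Issue, defaultMessage: string): string {
  const message =
    localMessages.get(issue.type)?.get(issue.lang) ?? globalMessages.get(issue.lang);
  return message ? message(issue) : defaultMessage;
}

setGlobalMessage(
  (issue) => `Invalid input: Expected ${issue.expected} but received ${issue.received}`,
  'en'
);
setLocalMessage(
  'min_length',
  (issue) => `Invalid length: Expected ${issue.expected} but received ${issue.received}`,
  'en'
);

const lengthIssue: Issue = { type: 'min_length', expected: '>=5', received: '3', lang: 'en' };
const typeIssue: Issue = { type: 'string', expected: 'string', received: '123', lang: 'en' };

console.log(resolveMessage(lengthIssue, 'Invalid length')); // uses the local message
console.log(resolveMessage(typeIssue, 'Invalid type'));     // falls back to the global message
```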

Dynamically customizable

Instead of a string, you can also pass a function to setLocalMessage, as well as to schema and validation functions. The function has access to all issue information, such as expected, and to configurations, such as lang, so it can generate error messages dynamically. This enables integration with i18n libraries such as Paraglide JS.

import * as v from 'valibot';
import * as m from "./paraglide/messages.js"

const Schema = v.string([
  v.email((issue) => m.invalidEmail(issue)), // Can also be written as `v.email(m.invalidEmail)`
]);

The language setting can be defined globally with setConfig or locally when calling parse and safeParse as described above.
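
As a rough sketch of how a message that is either a string or a function might be resolved, consider the following. The `Issue` shape, `resolve`, and the small `catalog` standing in for an i18n library are all hypothetical; only the idea that the function receives issue information and lang comes from the description above.

```typescript
// Hypothetical issue shape; lang is assumed to come from the active configuration.
type Issue = { expected: string; received: string; lang?: string };
type ErrorMessage = string | ((issue: Issue) => string);

// A static string is returned as-is, a function is called with the issue.
function resolve(message: ErrorMessage, issue: Issue): string {
  return typeof message === 'function' ? message(issue) : message;
}

// Tiny catalog standing in for an external i18n library.
const en = (issue: Issue) => `Invalid email: Received ${issue.received}`;
const de = (issue: Issue) => `Ungültige E-Mail: ${issue.received} erhalten`;
const catalog: Record<string, (issue: Issue) => string> = { en, de };

// Dynamic message: picks the translation by language and falls back to English.
const invalidEmail: ErrorMessage = (issue) => (catalog[issue.lang ?? 'en'] ?? en)(issue);

const german = resolve(invalidEmail, { expected: 'email', received: '"foo"', lang: 'de' });
const fallback = resolve(invalidEmail, { expected: 'email', received: '"foo"', lang: 'uk' });
const fixed = resolve('Please enter a valid email', { expected: 'email', received: '"foo"' });

console.log(german);
console.log(fallback);
console.log(fixed);
```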

Feedback

I think it is possible that Valibot will be downloaded more than 1 million times a month by the end of the year, making it one of the most important schema libraries. I would like to lay the foundation for this together with you, the early adopters. I look forward to receiving feedback on naming and implementation of this PR draft.

Due to the refactoring and the addition of the i18n submodules, the PR is a bit chaotic. If you have any questions, please post them in the comments.

If you are interested, you can clone the repo with the feat-i18n branch and then run pnpm i && cd library && pnpm build to build the library. Then you can either use the playground in library/playground.ts with pnpm playground or pack the library with pnpm pack to install the bundle into another project with pnpm i ./valibot-0.26.0.tgz.

@fabian-hiller fabian-hiller self-assigned this Jan 28, 2024
@fabian-hiller fabian-hiller added enhancement New feature or request priority This has priority labels Jan 28, 2024
This was referenced Jan 28, 2024
@LorisSigrist

First off there is a bunch of stuff I really like:

  • Small, pay-as-you-go bundle size
  • Standard properties for use in messages (received / expected). This makes providing message functions way easier!

The global message store + side-effect modules for loading work quite well & seem like the right design choice.

Questions:

What would be the impact of switching to references for message-keys?

import * as v from "valibot"

v.setLocalMessage(
  v.minLength, //instead of "min_length"
  (issue) => `Invalid length: Expected ${issue.expected} but received ${issue.received}`,
  'en'
);

Potential Pros

  • Better Minification. Using references instead of string keys made ~30% difference for paraglide, you might experience similar benefits.
  • More obvious than string keys

Cons

  • Requires a Map instead of an Object for the message registry
  • Must use the same reference in all places a message is accessed
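
A reference-keyed registry along these lines might look like the sketch below. The validation stand-ins, the `Reference` type, and `getLocalMessage` are hypothetical and only illustrate why a Map is needed and why the same reference must be used for writing and reading.

```typescript
type Issue = { expected: string; received: string };
type MessageFn = (issue: Issue) => string;
// Any function can serve as a registry key.
type Reference = (...args: never[]) => unknown;

// Stand-ins for validation functions; only their identity matters here.
function minLength(requirement: number) {
  return (input: string) => input.length >= requirement;
}
function maxLength(requirement: number) {
  return (input: string) => input.length <= requirement;
}

// A Map is required because plain objects cannot use functions as keys.
const registry = new Map<Reference, Map<string | undefined, MessageFn>>();

function setLocalMessage(reference: Reference, message: MessageFn, lang?: string): void {
  let byLang = registry.get(reference);
  if (!byLang) {
    byLang = new Map();
    registry.set(reference, byLang);
  }
  byLang.set(lang, message);
}

// The lookup only succeeds with the exact same function reference.
function getLocalMessage(reference: Reference, lang?: string): MessageFn | undefined {
  return registry.get(reference)?.get(lang);
}

setLocalMessage(
  minLength,
  (issue) => `Invalid length: Expected ${issue.expected} but received ${issue.received}`,
  'en'
);

const found = getLocalMessage(minLength, 'en');
const missing = getLocalMessage(maxLength, 'en');
console.log(found?.({ expected: '>=5', received: '3' }));
console.log(missing); // undefined, nothing was registered for maxLength
```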

Are "global" and "local" messages the best names?

If I understand correctly, "global" messages are generic messages that can be used when no more specific message is available.

"local" messages are very specific messages that apply to only one validator.

I find this terminology a bit confusing. "local" seems really close to "locale", which could lead to misunderstandings.

Perhaps setGenericMessage and setMessage could work better?

Default message values could be compressed

Looking at the default messages, most seem to follow this pattern:

(issue) => `Ungültiger 48 bit MAC: ${issue.expected} erwartet aber ${issue.received} erhalten`

"48 bit MAC" is the expected format for this validator. Most messages differ only in that part of the string. The rest is the same, so re-declaring it is wasteful. With a bit of currying this could be compressed:

// i18n/de/genericMessage
const createInvalidFormatMsg = (expectedFormat: string) => (issue) => `Ungültiger ${expectedFormat}: ${issue.expected} erwartet aber ${issue.received} erhalten`


// i18n/de/minValue
import { setLocalMessage } from 'valibot';

setLocalMessage(
  'min_value',
  createInvalidFormatMsg("Wert"),
  'de'
);

This would minify way better. Since this would be a non-breaking change it could always be done later, so no rush.

@fabian-hiller
Owner Author

fabian-hiller commented Jan 29, 2024

Thank you @LorisSigrist for your detailed feedback! I really appreciate it.

What would be the impact of switching to references for message-keys?

Great idea! I will investigate the difference and share the results with you in the comments of this PR.

Use map instead of Object for message registry

I assume that the memory usage is higher this way if only a few translations are imported. However, I have not tested this and the difference is probably negligible.

Are "global" and "local" messages the best names?

I am not 100% happy with the names. I tried to map both with setMessage. However, this does not work as long as we accept a string as the message argument besides a function. If we only accept a function, it should work.

Perhaps setGenericMessage and setMessage could work better?

Since one is just called setMessage, I find it unclear what the difference is without documentation. Maybe we can come up with better names or only allow functions as arguments for error message generation in setMessage. Depending on whether a function is passed as the first or second argument, we can then tell if it is a general or specific error message.

Important: If we change the message keys to references, this procedure no longer works. That's why it may be better to stick with the current approach and separate them using two different functions.

Default message values could be compressed

Technically this works. However, I would first want to check whether it actually reduces the bundle size after gzip compression.

@paoloricciuti

Ok this may sound controversial but I think i18n should not be the job of a validation library.

Don't get me wrong, if you want the burden of keeping the translations up to date and the extra maintenance cost, I will use valibot even more happily. But I think if an app needs to internationalize error messages, it will probably need to internationalize even more, and it will have a system for that outside of the validation library. So imho as long as you provide a way to override error messages (which btw I like the API for) I think it will be fine.

What do you think about this?

@fabian-hiller
Owner Author

I think i18n should not be the job of a validation library

Thank you @paoloricciuti for your feedback! I agree that integrated translations do not have to be part of Valibot. I included it in this draft because several users requested it. I would be happy to get more feedback on this from other users.

As long as you provide a way to override error messages (which btw I like the API for) I think it will be fine

What do you like about the API? What do you think about the names setGlobalMessage and setLocalMessage?

@fabian-hiller
Owner Author

fabian-hiller commented Jan 30, 2024

It might make sense to move the internal translations (if we decide to provide them) into a separate package, e.g. @valibot/i18n. This would simplify the source code of the validation library. Also, the translations would not depend on the library's release cycle, so we could add or improve them more flexibly.

@paoloricciuti

I think i18n should not be the job of a validation library

Thank you @paoloricciuti for your feedback! I agree that integrated translations do not have to be part of Valibot. I included it in this draft because several users requested it. I would be happy to get more feedback on this from other users.

As long as you provide a way to override error messages (which btw I like the API for) I think it will be fine

What do you like about the API? What do you think about the names setGlobalMessage and setLocalMessage?

I refer to this API

import * as v from 'valibot';
import * as m from "./paraglide/messages.js"

const Schema = v.string([
  v.email((issue) => m.invalidEmail(issue)), // Can also be written as `v.email(m.invalidEmail)`
]);

Regarding setLocalMessage and setGlobalMessage, I agree it is not the best naming. I would probably go the route of setGlobalErrorMessage and setFieldErrorMessage, but I don't know if those are right.

@fabian-hiller
Owner Author

fabian-hiller commented Feb 1, 2024

Thanks for your feedback! Based on it I changed a few things.

I renamed setConfig to setGlobalConfig and setLocalMessage to setSpecificMessage. Also, as recommended by @LorisSigrist, I switched to references for the message keys. In my eyes this improves the DX and makes the API safer to use. The biggest consequence of these changes is that you have to add the translations of the ...Async functions separately.

import * as v from 'valibot';

v.setSpecificMessage(
  v.minLength,
  (issue) => `Invalid length: Expected ${issue.expected} but received ${issue.received}`,
  'en'
);

I also added setSchemaMessage since the dynamic message of schema functions is usually identical. So you can choose between setGlobalMessage, setSchemaMessage and setSpecificMessage.

import * as v from 'valibot';

v.setSchemaMessage(
  (issue) => `Invalid type: Expected ${issue.expected} but received ${issue.received}`,
  'en'
);

The last change was that I moved the official translations into a separate package called @valibot/i18n. I also wrote a custom build step that is much faster than the previous one and allows us to keep the translations in a type-safe way in a single TypeScript file.

import { Language } from './types';

// prettier-ignore
const language: Language = {
  code:             'en',
  schema:           (issue) => `Invalid type: Expected ${issue.expected} but received ${issue.received}`,
  specific: {
    bic:            (issue) => `Invalid BIC: Received ${issue.received}`,
    bytes:          (issue) => `Invalid bytes: Expected ${issue.expected} but received ${issue.received}`,
    creditCard:     (issue) => `Invalid credit card: Received ${issue.received}`,
    // ...
  }
};

export default language;

import '@valibot/i18n';           // Import any translation
import '@valibot/i18n/de';        // Import specific language
import '@valibot/i18n/de/email';  // Import specific language message

@fabian-hiller
Owner Author

I plan to merge my changes in the next few days and release a new version next week.

@gmaxlev
Contributor

gmaxlev commented Feb 3, 2024

I like how Valibot follows modular principles and separates i18n into its own package, along with the code generation that can make things simpler.

There are a couple of things I'd like to point out.

1. Default English Language

Despite there being an en.ts file in the i18n package, I noticed the following comment in the file:

// Create languages array
// Note: The language file `en` does not need to be added as the default
// messages of Valibot are already in English

Am I correct in understanding that Valibot will come with all English translations right away? The thing is, not all applications use English, so including it by default could increase the bundle size unnecessarily. Why not leave the language choice to the user? Moreover, this will help reduce Valibot bundle size and maintain the modular design.

2. Single Language

Currently, we can only specify one language:

const output = v.parse(LoginSchema, input, {
  lang: 'de',
});

However, it seems to me that as Valibot evolves, the number of supported languages will only increase, and the community might not always keep up with translating all languages immediately after a new feature is released. This means there might be situations where some languages lack complete translations, especially for languages with fewer speakers.

To handle such situations and make the user experience as convenient as possible, why not allow specifying multiple languages in a prioritized order?

const output = v.parse(LoginSchema, input, {
  lang: ['de', 'fr', 'en'],
});

Now, if a translation is not found for the de locale, it will search in fr, and so on.
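
The proposed lookup through a prioritized language list could be sketched like this. It is an illustration of the suggestion only: `resolveWithFallback`, the per-validation `store`, and the built-in default are hypothetical, and the source does not say how such a feature would actually be implemented.

```typescript
type Issue = { expected: string; received: string };
type MessageFn = (issue: Issue) => string;

// Walks the languages in priority order and returns the first available translation.
function resolveWithFallback(
  langs: string[],
  store: Map<string, MessageFn>,
  defaultMessage: MessageFn
): MessageFn {
  for (const lang of langs) {
    const message = store.get(lang);
    if (message) return message;
  }
  // No listed language has a translation: use the built-in default.
  return defaultMessage;
}

// Hypothetical store for one validation where only French has been translated so far.
const store = new Map<string, MessageFn>([
  ['fr', (issue) => `E-mail invalide : ${issue.received} reçu`],
]);
const builtIn: MessageFn = (issue) => `Invalid email: Received ${issue.received}`;

const issue: Issue = { expected: 'email', received: '"foo"' };
const viaFallback = resolveWithFallback(['de', 'fr', 'en'], store, builtIn)(issue);
const viaDefault = resolveWithFallback(['de', 'uk'], store, builtIn)(issue);

console.log(viaFallback); // German is missing, so the French message is used
console.log(viaDefault);  // neither language is available, so the default is used
```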

This could be beneficial for many countries.

3. Expected/Received in Every Translation

Currently, each translation includes information about the received and expected values. I think this information can often be excessive and make it challenging to use. Why not make the display of this information optional? I believe it won't significantly impact the bundle size.

(issue) => `Invalid email${issue.inDetail ? `: Expected ${issue.expected} but received ${issue.received}`: ``}`

Moreover, with utility functions, you can further reduce the amount of code:

function util(message: string, issue: SchemaIssue) {
  return `${message}${issue.inDetail ? `: Expected ${issue.expected} but received ${issue.received}` : ``}`;
}

4. Documentation

Similarly to how a bundle is generated for the i18n package, we can generate such a table for the documentation page to help the community maintain complete translations:

language email mac ipv4
en
de
uk

This way, it becomes easier for contributors to see which languages might need attention in terms of translations.

This can also inspire users to contribute translations in their language 🙂

@fabian-hiller
Owner Author

fabian-hiller commented Feb 3, 2024

Thank you very much for your feedback and contribution @gmaxlev!

Am I correct in understanding that Valibot will come with all English translations right away?

Yes and no. Valibot's default messages are in English, but they are not included as a translation pack. Instead, they are baked into the source code. This allows us to implement the default messages with better performance and smaller bundle size. The default messages add about 60 bytes initially to the core bundle, while an entire language pack adds about 700 bytes (you can also reduce the bytes of a language pack by importing only the messages you need).

We could outsource all the messages to @valibot/i18n and have no messages in the core valibot package by default. We could also try to outsource a large part of the i18n code to @valibot/i18n and only provide the functionality to inject i18n functionality into the core package. This could further reduce the initial bundle size of valibot by up to 100 bytes.

If we outsource all the messages and part of the implementation to @valibot/i18n, this would have the advantage that the initial bundle size could be about 200 bytes smaller if the default messages and i18n are not needed. On the other hand, this would mean more work and a slightly larger total bundle size if you want to use the default messages.

In the end, it is always a trade-off between bundle size, performance, and developer experience. For example, in some places in the library, we deliberately do not use the spread operator because it is a performance bottleneck. This decision increases the initial bundle size by about 100 to 200 bytes.

I have not made a final decision on this yet, so I am looking forward to your feedback.

To handle such situations and make the user experience as convenient as possible, why not allow specifying multiple languages in a prioritized order?

With the current implementation, the default messages in English are always the fallback if a translation is missing. But I understand your point and see the benefit of this implementation. Again, it's a tradeoff between bundle size, performance, and developer experience. If we decide to outsource a large part of the i18n implementation to @valibot/i18n, I am much more open to this change.

Currently, each translation includes information about the received and expected values. I think this information can often be excessive and make it challenging to use.

I like the idea of this being configurable. Especially in combination with your util function.

Similarly to how a bundle is generated for the i18n package, we can generate such a table for the documentation page to help the community maintain complete translations

This is also a really good idea! I plan to implement it.

@fabian-hiller
Owner Author

The initial additional bundle size due to the 3 new features (global config, detailed error messages, and i18n functionality) is approximately 350 bytes. However, by improving the existing code, the additional bundle size of this PR is expected to be about 310 bytes. The global config is about 50 bytes, the detailed error messages are about 200 bytes, and the i18n functionality is about 120 bytes. Due to compression, the total bundle size is slightly less than that.

@fabian-hiller
Owner Author

fabian-hiller commented Feb 4, 2024

3. Expected/Received in Every Translation

Currently, each translation includes information about the received and expected values. I think this information can often be excessive and make it challenging to use. Why not make the display of this information optional? I believe it won't significantly impact the bundle size.

(issue) => `Invalid email${issue.inDetail ? `: Expected ${issue.expected} but received ${issue.received}`: ``}`

Moreover, with utility functions, you can further reduce the amount of code:

function util(message: string, issue: SchemaIssue) {
  return `${message}${issue.inDetail ? `: Expected ${issue.expected} but received ${issue.received}` : ``}`;
}

I checked the implementation and the bundle size. This feature would cost an additional 40 bytes. Because of gzip compression, it makes no difference if we implement it with or without a util function. I am not sure if we should add this feature now or wait until multiple users request it and then decide. We could also research if users of Zod have ever requested this feature.

@fabian-hiller
Owner Author

I also want to mention that it is really easy to overwrite the error message and remove the details using setGlobalMessage:

import * as v from 'valibot';

v.setGlobalMessage((issue) =>
  issue.message.slice(0, issue.message.indexOf(':'))
);
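
One edge case worth keeping in mind with this approach: if a message contains no colon, indexOf returns -1 and slice(0, -1) drops the last character. A guarded variant might look like the sketch below; `stripDetails` is a hypothetical helper, and the assumption is that some custom messages may lack the detail part.

```typescript
// Hypothetical helper: removes the detail part after the first colon, if there is one.
function stripDetails(message: string): string {
  const index = message.indexOf(':');
  // Without a colon there is nothing to strip, so the message is returned unchanged.
  return index === -1 ? message : message.slice(0, index);
}

const withDetails = stripDetails('Invalid type: Expected string but received 123');
const withoutDetails = stripDetails('Invalid email');

console.log(withDetails);    // Invalid type
console.log(withoutDetails); // Invalid email
```

The helper could then be used as the body of the function passed to setGlobalMessage, for example `(issue) => stripDetails(issue.message)`.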

@fabian-hiller
Owner Author

I expect to merge this PR tomorrow. However, I am still happy to receive feedback. All feedback is heard and influences the long-term development of the library. Major changes are still possible until v1.

@fabian-hiller fabian-hiller marked this pull request as ready for review February 5, 2024 18:45
@mxdvl
Contributor

mxdvl commented Feb 5, 2024

Better error messages will be very welcome!

How did you calculate the 150 bytes increase?

@fabian-hiller
Owner Author

fabian-hiller commented Feb 5, 2024

I calculated it by bundling a schema with and without the respective code. The byte amount is the minified and gzipped version.

@fabian-hiller fabian-hiller merged commit 87457e7 into main Feb 5, 2024
6 checks passed
@fabian-hiller fabian-hiller deleted the feat-i18n branch February 5, 2024 22:13
@fabian-hiller
Owner Author

fabian-hiller commented Feb 6, 2024

v0.28.0 with i18n is available: https://valibot.dev/guides/internationalization/

Labels
enhancement New feature or request priority This has priority

10 participants