Codes for Languages (and language groups) of the World are covered by the ISO-639 standard
These standards provide letter codes for each language. E.g. ISO-639-3 provides a three-letter code for all living languages.
There are too many such codes to be contained in a java-enum (e.g. https://github.com/TakahikoKawasaki/nv-i18n/blob/master/src/main/java/com/neovisionaries/i18n/LanguageAlpha3Code.java is just not complete)
This package has the tab seperated files provided by https://iso639-3.sil.org/, and java classes to read this, and provide all language codes as java objects, with getters.
import org.meeuw.i18n.languages.*;
// get a language by its code;
Optional<LanguageCode> optional = ISO_639.getByPart3("nld");
LanguageCode languageCode = LanguageCode.languageCode("nl");
// show its 'inverted' name
System.out.println(languageCode.nameRecord(Locale.US).inverted());
// get a language family
Optional<LanguageFamilyCode> family = ISO_639.getByPart5("ger");
// get by any code
Optional<ISO_639_Code> byCode = ISO_639.get("nl");
// stream by names, language may have several names (dutch, flemish), and appear multiple times
ISO_639_Code.streamByNames().forEach(e -> {
System.out.println(e.getKey() + " " + e.getValue());
});
See also the test cases
link:src/test/java/org/meeuw/i18n/languages/test/LanguageCodeTest.java[role=include]
LanguageCode#getByCode
will also support retired codes if possible. This means that the code of the returned object may be different:
// the 'krim' dialect (Sierra Leone) officially merged into 'bmf' (Bom-Kim) in 2017
assertThat(LanguageCode.getByCode("krm").get().getCode()).isEqualTo("bmf");
Sometimes we have to deal with systems which have their own versions of the standards. In these cases it is possible to register 'fall backs'.
E.g.
// Our partner uses the pseudo ISO-639-1 code 'XX' for 'no language'
// fall back to a proper Part 3 code.
try {
LanguageCode.registerFallback("XX", LanguageCode.languageCode("zxx"));
assertThat(ISO_639.iso639("XX").code()).isEqualTo("zxx");
} finally {
LanguageCode.resetFallBacks();
}
The language code is annotated with a JAXB annotation. It will serialize and deserialize to and from the code. The dependency on the annotation is optional.
The needed classes are also annotated by Jackson annotations, so they can be serialized to and from JSON.
LanguageCode
is serializable too, and ensures that on deserialization the same object for every language is returned. (And only the code is non-transient).
The default sort order of a LanguageCode
used to be on 'Inverted Name'. There may be more than one (inverted) name though (E.g. Dutch and Flemish). Since 3.0 LanguageCode is not Sortable anymore. LanguageCode#stream() is sorted by ISO-639-3 code.
The files of sil.org
only provide names of languages and language families in English. Java’s java.util.Locale
can provide the name of a lot of languages (I presume the language with a ISO-639-1 code) in a lot of other languages.
E.g.
new Locale("en").getDisplayName(new Locale("nl"));
will give the name of the language 'English' in Dutch ('Engels').
To make this available in a generic way the base interface ISO_639
just has
String getDisplayName(Locale locale)
For ISO_639_1
this will first try to use the above way with java.util.Locale
. For other codes, and as a fallback in ISO_639_1
will use the resource bundle org.meeuw.i18n.languages.DisplayNames
, where the default and english names are provided by the sil.org
files.
Noticeably, for example for sign languages this is the way to have proper names available
assertThat(LanguageCode.languageCode("dse").getDisplayName(new Locale("nl"))).isEqualTo("Nederlandse Gebarentaal");
<1 |
developing/testing |
2023 |
|
1.x |
compatible with java 8, javax.xml, module-info java 11 |
||
1.0 |
2023-11-30 |
||
2.x |
java 11, jakarta.xml |
2024-01-28 |
jakarta mostly applies to the optional jaxb support (and to some - also optional - validation annotations) |
2.1 |
support for retired codes |
2024-02-11 |
|
2.2 |
migrated support for language code validation from i18n-regions |
2024-? |
|
3.0 |
Refactoring |
2024-3 |
Added enum for ISO-639-1 codes,
Made syntax forward compabible with records. So, getters like |
3.1 |
Refactoring |
2024-3 |
Support for ISO-639-5. Dropped the -3 from the artifact id. |
3.6 |
Support for #getDisplayName |
2024-8 |