Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Hydrated stub pages with metadata and structure; first drafts of constructor and supportedLocalesOf pages
* Segmenter examples (#4)
* make spanish_segmenter more... modern?
* add example and syntax to Segmenter#resolvedOptions
* add example and syntax to Segmenter#segment
* add information about segment data objects
* Hit the 80-20 point on Intl.Segmenter
* Update files/en-us/web/javascript/reference/global_objects/intl/segmenter/constructor/index.md
  Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
* Apply suggestions from code review
  Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
* Update files/en-us/web/javascript/reference/global_objects/intl/segmenter/constructor/index.md
  Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
* Update files/en-us/web/javascript/reference/global_objects/intl/segmenter/constructor/index.md
  Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
* Fixed constructor structure
* Fixed constructor structure
* Apply suggestions from code review
  Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
* Update files/en-us/web/javascript/reference/global_objects/intl/segmenter/segments/index.md
  Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
* Update files/en-us/web/javascript/reference/global_objects/intl/segmenter/index.md
  Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
* Fixed main index link reference
* Fixed code block error
* Wrote the @@iterator page
* Apply suggestions from code review
  Co-authored-by: wbamberg <will@bootbonnet.ca>
* Rework tree structure per @Elchi3 comment
* Remove exotic whitespace/gremlin
* Add interactive examples (cf. mdn/interactive-examples#1987)
* Remove jsxref, fix links, normalize tags
* Taking review comments into account, improving examples
* Update files/en-us/web/javascript/reference/global_objects/intl/segmenter/index.md
  Co-authored-by: wbamberg <will@bootbonnet.ca>
* Update files/en-us/web/javascript/reference/global_objects/intl/segmenter/index.md
  Co-authored-by: wbamberg <will@bootbonnet.ca>
* Update files/en-us/web/javascript/reference/global_objects/intl/segmenter/index.md
  Co-authored-by: wbamberg <will@bootbonnet.ca>
* Update files/en-us/web/javascript/reference/global_objects/intl/segmenter/index.md
  Co-authored-by: wbamberg <will@bootbonnet.ca>
* Update files/en-us/web/javascript/reference/global_objects/intl/segmenter/segment/index.md
  Co-authored-by: wbamberg <will@bootbonnet.ca>
* Update files/en-us/web/javascript/reference/global_objects/intl/segmenter/segmenter/index.md
  Co-authored-by: wbamberg <will@bootbonnet.ca>
* Update files/en-us/web/javascript/reference/global_objects/intl/segments/containing/index.md
  Co-authored-by: wbamberg <will@bootbonnet.ca>
* Update files/en-us/web/javascript/reference/global_objects/intl/segments/@@iterator/index.md
  Co-authored-by: wbamberg <will@bootbonnet.ca>
* Update files/en-us/web/javascript/reference/global_objects/intl/segmenter/supportedlocalesof/index.md
  Co-authored-by: wbamberg <will@bootbonnet.ca>
* Update files/en-us/web/javascript/reference/global_objects/intl/segments/index.md
  Co-authored-by: wbamberg <will@bootbonnet.ca>
* Remove interactive example due to Fx missing impl.
* sort methods alphabetically
* Update files/en-us/web/javascript/reference/global_objects/intl/segments/@@iterator/index.md
  Co-authored-by: wbamberg <will@bootbonnet.ca>
* Favor const
* improve example while condition
* Update files/en-us/web/javascript/reference/global_objects/intl/segments/containing/index.md
  Co-authored-by: wbamberg <will@bootbonnet.ca>
* DLify localeMatcher
* this one needs to be let

Co-authored-by: Ujjwal Sharma <ryzokuken@disroot.org>
Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
Co-authored-by: Romulo Cintra <romulocintra@users.noreply.github.com>
Co-authored-by: wbamberg <will@bootbonnet.ca>
Co-authored-by: julieng <julien.gattelier@gmail.com>
Co-authored-by: SphinxKnight <SphinxKnight@users.noreply.github.com>
1 parent 8ade1ab
commit c95e770
Showing 12 changed files with 492 additions and 30 deletions.
55 changes: 48 additions & 7 deletions
files/en-us/web/javascript/reference/global_objects/intl/segmenter/index.md

@@ -1,24 +1,65 @@
---
title: Intl.Segmenter
slug: Web/JavaScript/Reference/Global_Objects/Intl/Segmenter
tags:
- Internationalization
- Intl
- JavaScript
- Localization
- Reference
browser-compat: javascript.builtins.Intl.Segmenter
---
{{JSRef}}

The **`Intl.Segmenter`** object enables locale-sensitive text segmentation, letting you split a string into meaningful items (graphemes, words, or sentences).

## Constructor

- [`Intl.Segmenter()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter/Segmenter)
  - : Creates a new `Intl.Segmenter` object.

## Static methods

- [`Intl.Segmenter.supportedLocalesOf()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter/supportedLocalesOf)
  - : Returns an array containing those of the provided locales that are supported without having to fall back to the runtime's default locale.

## Instance methods

- [`Intl.Segmenter.prototype.resolvedOptions()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter/resolvedOptions)
  - : Returns a new object with properties reflecting the locale and granularity options computed during initialization of this `Intl.Segmenter` object.
- [`Intl.Segmenter.prototype.segment()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter/segment)
  - : Returns a new iterable [`Segments`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segments) instance representing the segments of a string according to the locale and granularity of this `Intl.Segmenter` instance.

## Examples

### Basic usage and difference from String.prototype.split()

If we were to use [`String.prototype.split(" ")`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split) to segment text into words, we would not get the correct result for languages that do not use spaces between words (such as Japanese, Chinese, Thai, Lao, Khmer, or Myanmar).

```js example-bad
const str = "吾輩は猫である。名前はたぬき。";
console.table(str.split(" "));
// ['吾輩は猫である。名前はたぬき。']
// The two sentences are not correctly segmented.
```

```js example-good
const str = "吾輩は猫である。名前はたぬき。";
const segmenterJa = new Intl.Segmenter('ja-JP', { granularity: 'word' });

const segments = segmenterJa.segment(str);
console.table(Array.from(segments));
// [{segment: '吾輩', index: 0, input: '吾輩は猫である。名前はたぬき。', isWordLike: true},
// etc.
// ]
```

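### Counting user-perceived characters

The default `"grapheme"` granularity addresses a related problem: `length` counts UTF-16 code units and spreading a string counts code points, and neither always matches what a reader perceives as one character. The following is a minimal sketch; `countGraphemes` is a hypothetical helper, not part of the `Intl` API, and the exact cluster boundaries come from the runtime's Unicode data.

```js
// Hypothetical helper: counts grapheme clusters (user-perceived characters).
function countGraphemes(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return [...segmenter.segment(text)].length;
}

// "été" written with combining acute accents (U+0301) instead of precomposed "é"
const decomposed = "e\u0301te\u0301";
console.log(decomposed.length); // 5 (UTF-16 code units)
console.log([...decomposed].length); // 5 (code points)
console.log(countGraphemes(decomposed)); // 3 (grapheme clusters)
```
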
## Specifications

{{Specifications}}

## Browser compatibility

{{Compat}}
76 changes: 75 additions & 1 deletion
...web/javascript/reference/global_objects/intl/segmenter/resolvedoptions/index.md

@@ -1,7 +1,81 @@
---
title: Intl.Segmenter.prototype.resolvedOptions()
slug: Web/JavaScript/Reference/Global_Objects/Intl/Segmenter/resolvedOptions
tags:
- Internationalization
- Intl
- JavaScript
- Localization
- Reference
browser-compat: javascript.builtins.Intl.Segmenter.resolvedOptions
---
{{JSRef}}

The **`Intl.Segmenter.prototype.resolvedOptions()`** method returns a new object with properties reflecting the locale and granularity options computed during the initialization of this [`Intl.Segmenter`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter) object.

## Syntax

```js
resolvedOptions()
```

### Parameters

None.

### Return value

A new object with properties reflecting the locale and granularity options computed
during the initialization of the given [`Intl.Segmenter`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter) object.

## Description

The resulting object has the following properties:

- `locale`
  - : The BCP 47 language tag for the locale actually used. If any Unicode extension
    values were requested in the input BCP 47 language tag that led to this locale,
    the key-value pairs that were requested and are supported for this locale are
    included in `locale`.
- `granularity`
  - : The value provided for this property in the `options` argument, or the
    default value (`"grapheme"`) if none was provided.

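Because each call to `resolvedOptions()` creates a new object, modifying the returned object does not change the segmenter. The following minimal sketch checks this; the `"es"` locale and `"word"` granularity are arbitrary choices.

```js
const segmenter = new Intl.Segmenter("es", { granularity: "word" });

const first = segmenter.resolvedOptions();
const second = segmenter.resolvedOptions();
console.log(first === second); // false: each call returns a fresh object

// Changing the returned object has no effect on the segmenter itself.
first.granularity = "sentence";
console.log(segmenter.resolvedOptions().granularity); // "word"
```
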
## Examples

### Basic usage

```js
const spanishSegmenter = new Intl.Segmenter("es", {granularity: "sentence"});
const options = spanishSegmenter.resolvedOptions();
console.log(options.locale); // "es"
console.log(options.granularity); // "sentence"
```

### Default granularity

```js
const spanishSegmenter = new Intl.Segmenter("es");
const options = spanishSegmenter.resolvedOptions();
console.log(options.locale); // "es"
console.log(options.granularity); // "grapheme"
```

### Fallback locale

```js
const banSegmenter = new Intl.Segmenter("ban");
const options = banSegmenter.resolvedOptions();
console.log(options.locale);
// "fr" on a runtime where the Balinese locale
// is not supported and French is the default locale
console.log(options.granularity); // "grapheme"
```

## Specifications

{{Specifications}}

## Browser compatibility

{{Compat}}
64 changes: 63 additions & 1 deletion
...s/en-us/web/javascript/reference/global_objects/intl/segmenter/segment/index.md

@@ -1,7 +1,69 @@
---
title: Intl.Segmenter.prototype.segment()
slug: Web/JavaScript/Reference/Global_Objects/Intl/Segmenter/segment
tags:
- Internationalization
- Intl
- JavaScript
- Localization
- Reference
browser-compat: javascript.builtins.Intl.Segmenter.segment
---
{{JSRef}}

The **`Intl.Segmenter.prototype.segment()`** method segments a string according to the locale and granularity of this [`Intl.Segmenter`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter) object.

## Syntax

```js
segment(input)
```

### Parameters

- `input`
  - : The text to be segmented as a [`String`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String).

### Return value

A new iterable [`Segments`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segments) object containing the segments of the input string, using the segmenter's locale and granularity.

## Examples

```js
// Create a locale-specific word segmenter
const segmenter = new Intl.Segmenter("fr", {granularity: "word"});

// Use it to get an iterable over the segments of a string
const input = "Moi ? N'est-ce pas ?";
const segments = segmenter.segment(input);

// Iterate over the segments, destructuring each segment data object
for (const {segment, index, isWordLike} of segments) {
  console.log("segment at code units [%d, %d]: «%s»%s",
    index, index + segment.length,
    segment,
    isWordLike ? " (word-like)" : ""
  );
}
// logs
// segment at code units [0, 3]: «Moi» (word-like)
// segment at code units [3, 4]: « »
// segment at code units [4, 5]: «?»
// segment at code units [5, 6]: « »
// segment at code units [6, 11]: «N'est» (word-like)
// segment at code units [11, 12]: «-»
// segment at code units [12, 14]: «ce» (word-like)
// segment at code units [14, 15]: « »
// segment at code units [15, 18]: «pas» (word-like)
// segment at code units [18, 19]: « »
// segment at code units [19, 20]: «?»
```

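### Extracting words

Each segment's `index` is a UTF-16 code unit offset into the input string, so `input.slice(index, index + segment.length)` recovers the segment, and `isWordLike` separates words from spaces and punctuation. The following minimal sketch collects the word-like segments of a string; `wordsOf` is a hypothetical helper, and which segments count as word-like depends on the runtime's locale data.

```js
// Hypothetical helper: returns the word-like segments of `text` as strings.
function wordsOf(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const words = [];
  for (const { segment, index, isWordLike } of segmenter.segment(text)) {
    // `index` is the UTF-16 offset of this segment within `text`.
    console.assert(text.slice(index, index + segment.length) === segment);
    if (isWordLike) {
      words.push(segment);
    }
  }
  return words;
}

console.log(wordsOf("Hello, world!", "en")); // ["Hello", "world"]
```
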
## Specifications

{{Specifications}}

## Browser compatibility

{{Compat}}
5 changes: 0 additions & 5 deletions
...-us/web/javascript/reference/global_objects/intl/segmenter/segmentdata/index.md
This file was deleted.
70 changes: 70 additions & 0 deletions
...en-us/web/javascript/reference/global_objects/intl/segmenter/segmenter/index.md

@@ -0,0 +1,70 @@
---
title: Intl.Segmenter() constructor
slug: Web/JavaScript/Reference/Global_Objects/Intl/Segmenter/Segmenter
tags:
- Constructor
- Segmenter
- Internationalization
- Intl
- JavaScript
- Localization
- Reference
browser-compat: javascript.builtins.Intl.Segmenter.constructor
---

The **`Intl.Segmenter()`** constructor creates [`Intl.Segmenter`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter) objects that enable locale-sensitive text segmentation.

## Syntax

```js
new Intl.Segmenter()
new Intl.Segmenter(locales)
new Intl.Segmenter(locales, options)
```

### Parameters

- `locales` {{ optional_inline }}
  - : A string with a BCP 47 language tag, or an array of such strings. For the general form and interpretation of the `locales` argument, see the [`Intl`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl#locale_identification_and_negotiation) page.
- `options` {{ optional_inline }}
  - : An object with some or all of the following properties:
    - `granularity` {{ optional_inline }}
      - : A string. Possible values are:
        - `"grapheme"` (default)
          - : Split the input into segments at grapheme cluster (user-perceived character) boundaries, as determined by the locale.
        - `"word"`
          - : Split the input into segments at word boundaries, as determined by the locale.
        - `"sentence"`
          - : Split the input into segments at sentence boundaries, as determined by the locale.
    - `localeMatcher` {{ optional_inline }}
      - : The locale matching algorithm to use. Possible values are:
        - `"best fit"` (default)
          - : The runtime may choose a locale that is more suitable than the one the lookup algorithm would return.
        - `"lookup"`
          - : Use the [BCP 47 Lookup algorithm](https://datatracker.ietf.org/doc/html/rfc4647#section-3.4) to choose the locale from `locales`. The runtime uses the first locale in `locales` that it supports, possibly after removing restricting subtags from a locale tag to find a supported match. For example, providing `"de-CH"` as `locales` may result in using `"de"` if `"de"` is supported but `"de-CH"` is not.

### Return value

A new [`Intl.Segmenter`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter) object.

## Examples

### Basic usage

The following example shows how to count the words in a Japanese string, where splitting the string with `String` methods would give an incorrect result.

```js
const text = "吾輩は猫である。名前はたぬき。";
const japaneseSegmenter = new Intl.Segmenter("ja-JP", {granularity: "word"});
console.log([...japaneseSegmenter.segment(text)].filter(segment => segment.isWordLike).length);
// logs 8 as the text is segmented as '吾輩'|'は'|'猫'|'で'|'ある'|'。'|'名前'|'は'|'たぬき'|'。'
// and the two '。' segments are not word-like
```

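### Comparing granularities

The `granularity` option determines what counts as one segment. The following minimal sketch splits the same short English string at each granularity; `segmentsOf` is a hypothetical helper and the sample string is arbitrary. The results in the comments follow the default Unicode text segmentation rules (UAX #29), which a runtime's locale data could tailor.

```js
// Hypothetical helper: returns the segment strings for one granularity.
function segmentsOf(text, granularity, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const text = "Hi! Bye.";
console.log(segmentsOf(text, "grapheme")); // ["H", "i", "!", " ", "B", "y", "e", "."]
console.log(segmentsOf(text, "word")); // ["Hi", "!", " ", "Bye", "."]
console.log(segmentsOf(text, "sentence")); // ["Hi! ", "Bye."]
```
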
## Specifications

{{Specifications}}

## Browser compatibility

{{Compat}}
5 changes: 0 additions & 5 deletions
...javascript/reference/global_objects/intl/segmenter/segments/@@iterator/index.md
This file was deleted.
5 changes: 0 additions & 5 deletions
...javascript/reference/global_objects/intl/segmenter/segments/containing/index.md
This file was deleted.
5 changes: 0 additions & 5 deletions
.../en-us/web/javascript/reference/global_objects/intl/segmenter/segments/index.md
This file was deleted.
61 changes: 60 additions & 1 deletion
.../javascript/reference/global_objects/intl/segmenter/supportedlocalesof/index.md

@@ -1,7 +1,66 @@
---
title: Intl.Segmenter.supportedLocalesOf()
slug: Web/JavaScript/Reference/Global_Objects/Intl/Segmenter/supportedLocalesOf
tags:
- Internationalization
- Intl
- JavaScript
- Localization
- Reference
browser-compat: javascript.builtins.Intl.Segmenter.supportedLocalesOf
---
{{JSRef}}

The **`Intl.Segmenter.supportedLocalesOf()`** method returns an array containing those of the provided locales that are supported without having to fall back to the runtime's default locale.

## Syntax

```js
supportedLocalesOf(locales)
supportedLocalesOf(locales, options)
```

### Parameters

- `locales`
  - : A string with a BCP 47 language tag, or an array of such strings. For the general
    form of the `locales` argument, see the [`Intl`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl#locale_identification_and_negotiation) page.
- `options` {{optional_inline}}
  - : An object that may have the following property:
    - `localeMatcher`
      - : The locale matching algorithm to use. Possible values are
        `"lookup"` and `"best fit"`; the default is
        `"best fit"`. For information about this option, see the [`Intl`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl#locale_negotiation) page.

### Return value

An array of strings representing a subset of the given locale tags that are supported
in segmentation without having to fall back to the runtime's default locale.

## Examples

### Using supportedLocalesOf()

Assuming a runtime that supports Indonesian and German but not Balinese in
segmentation, `supportedLocalesOf` returns the Indonesian and German language
tags unchanged, even though `pinyin` collation is neither relevant to segmentation
nor used with Indonesian, and a specialized German for Indonesia is
unlikely to be supported. Note that the `"lookup"` algorithm is specified
here: a `"best fit"` matcher might decide that Indonesian is an
adequate match for Balinese, since most Balinese speakers also understand Indonesian,
and therefore return the Balinese language tag as well.

```js
const locales = ['ban', 'id-u-co-pinyin', 'de-ID'];
const options = { localeMatcher: 'lookup' };
console.log(Intl.Segmenter.supportedLocalesOf(locales, options).join(', '));
// → "id-u-co-pinyin, de-ID"
```

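### Sketch of the lookup fallback

The `"lookup"` matcher follows the fallback described in [RFC 4647, section 3.4](https://datatracker.ietf.org/doc/html/rfc4647#section-3.4): when a requested tag is not available, subtags are removed from the end until a supported tag is found. The following simplified sketch illustrates only that truncation step. `lookupLocale` and the `available` set are hypothetical, and the sketch skips the case normalization and Unicode extension processing that runtimes perform, so it returns the matched available tag (for example `"id"`) rather than the requested tag (`"id-u-co-pinyin"`) that `supportedLocalesOf()` reports.

```js
// Hypothetical simplified lookup: returns the first available tag reached by
// truncating each requested tag from the end, or undefined if none matches.
function lookupLocale(requestedLocales, available) {
  for (const tag of requestedLocales) {
    let candidate = tag;
    while (candidate !== "") {
      if (available.has(candidate)) {
        return candidate;
      }
      let pos = candidate.lastIndexOf("-");
      if (pos === -1) {
        break; // nothing left to truncate for this tag
      }
      // If cutting here would leave a trailing singleton subtag (such as "u"
      // or "x"), cut before it as well, so "de-u-co" becomes "de", not "de-u".
      if (pos >= 2 && candidate[pos - 2] === "-") {
        pos -= 2;
      }
      candidate = candidate.slice(0, pos);
    }
  }
  return undefined;
}

const available = new Set(["de", "id"]);
console.log(lookupLocale(["ban", "de-CH"], available)); // "de"
console.log(lookupLocale(["ban"], available)); // undefined
```
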
## Specifications

{{Specifications}}

## Browser compatibility

{{Compat}}