ids/42/957.txt

[1] [CITE@ja-jp[OpenType glyph processing (part 1) - Typography | Microsoft Docs]], [[alib-ms]], [TIME[2022-08-27T04:06:55.000Z]] <https://docs.microsoft.com/ja-jp/typography/develop/processing-part1>

>A run is normally a maximum of one line in length and consists of a text string formatted in a single font at a particular size, in a single direction, in a particular script and a particular language system.

[2] [[OTLS]] の処理モデルでは

- [[フォント]]変更
- [[用字系]]変更
- [[言語系]]変更
- [[書字方向]]変更
- [[行末]]

... が[[連なり]]境界になります。 [SRC[>>1]]


[3] [CITE@ja-jp[OpenType development (LEGACY INFORMATION) - Typography | [[Microsoft]] Docs]], [[nihar]], [TIME[2022-08-27T07:16:20.000Z]] <https://docs.microsoft.com/ja-jp/typography/develop/otdevinfo#unicode-script-processor>

[4] 
[[Uniscribe]] の処理モデルでは、 [[Uniscribe]] [[shaping engine]] によって[[文字列]]を

- [RUBYB[[[項目]]群][items]] - [[用字系]]と[[書字方向]]属性が同じ[[文字列]]
- [RUBYB[[[連なり]]群][runs]] - 項目群のうち continuous formatting attribute を持つ部分
- [RUBYB[[[クラスター]]群][clusters]] - [[用字系]]定義の[RUBYB[分割できない][indivisible]]文字を[RUBYB[群化][grouping]]したもの

に分割します。[SRC[>>3]]

[5] 
そして [[Uniscribe]] の[[クライアント]] ([[ライブラリー]]を呼び出す[[応用]])
は自身の持つ formatting attribute と、
[[Uniscribe]] から取得した[[項目]]の境界に基づき、
[[連なり]]を構築します。
[SRC[>>2]]


[6] 
この説明では [[Uniscribe]] 側と[[応用]]側の[[連なり]]が必ずしも同じではないとなっていて、
[[Uniscribe]] が[[連なり]]と認識していても[[応用]]がそう見なすとは限らないということになりますね。

[9] [[クラスター]]については [[shaping engine]] 参照。



[7] >>1 には

>
Obviously, in many documents, the majority of consecutive runs of text will simply equate to individual lines. In multilingual documents, however, a client may need to be able to identify and tag a number of runs within a single line of text. 

と書いてあって、著者は専門家だろうに[[用字系]]の混在がありふれてる[[日本語]]の例を[[考えていない][欧米中心主義]]ことがわかります。混在しないのが「many documents」でそれ以外が
「multilingual documents」なのですから。
この直後で [[bidi]] の実例を示してますけど、[[アラビア文字]]と[[数字]]を混在させるだけで
runs になってしまう、という[[アラビア語]]のごく普通の文書の例も[[漏れてる][欧米中心主義]]。

詳しくない一般欧米人の認識ならともかく、専門家が詳しくない人に解説するチュートリアルにこんなこと書いてるのはよくない。

[8] 
これを読んだ欧米人開発者が、ああそうなんだ多言語文書なんて滅多にないし、
a run = a line になるケースに最適化して他はパフォーマンスが落ちても仕方ないよね、
みたいな認識で設計したらどうするんか。




[10] 
[[フォント]]境界の絡みについては[[結合文字]]も参照。

[11] [[文字列座標]]

[12] 
[[Unicode]]
では[[符号点]]に[[Unicode用字系特性値]]が割り当てられていて、
[[連なり]]の検出に利用できます。
[SEE[ [[Unicode用字系特性値]] ]]


[159] 
[[shaping]] において[[連なり]]に分割してから処理されるために、
[[用字系]]が違うと
[CODE[GSUB]]
が適用されないことがあるようです。

-[13] [CITE[23073-irgn2581-fdbk.pdf]], [TIME[2023-03-23T16:06:03.000Z]], [TIME[2023-07-13T08:24:27.947Z]] <https://www.unicode.org/L2/L2023/23073-irgn2581-fdbk.pdf#page=2>

[14] 
>>13 [[IDS]] を [CODE[GSUB]] [CODE[ccmp]] で[[漢字]]として[[レンダリング]]する[[フォント]]があります。しかし
[[IDS]] が [CODE[Common]] のため、
後続の [CODE[Hani]]
と別[[用字系]]とみなされてうまくいかないことがあると言っています。

[15] >>13 は

>[SNIP[]] the feature under the DFLT script in the font
will NOT be applied to the sequences according to the specification. Of course you
can rewrite a Shaping Engine to support the features cross the different script properties,
but that actually violates the rule, and lack of standardization.

と書いてますけど、それがどの仕様書のどこに書いてある規則だとは教えてくれない...


[16] 
[[HTML]] / [[DOM]] + [[CSS]] のような[[マーク付け言語]]や[[データモデル]]の構造や[[スタイル言語]]の処理モデルと[[連なり]]の決め方の相互作用にも注意が必要です。