ids/9/248.txt


*1.  Introduction
**1.1.  Overview and Motivation
>   A Uniform Resource Identifier (URI) is defined in [RFC3986] as a
sequence of characters chosen from a limited subset of the repertoire
of US-ASCII [ASCII] characters.

統一資源識別子 ([[URI]]) は [[RFC 3986]] において [CODE(charset)[[[US-ASCII]]]]
[[文字]]の[[レパートリ]]の限られた[[部分集合]]から選ばれた[[文字]]の列として定義されています。

>   The characters in URIs are frequently used for representing words of
natural languages.  This usage has many advantages: Such URIs are
easier to memorize, easier to interpret, easier to transcribe, easier
to create, and easier to guess.  For most languages other than
English, however, the natural script uses characters other than A -
Z. For many people, handling Latin characters is as difficult as
handling the characters of other scripts is for those who use only
the Latin alphabet.  Many languages with non-Latin scripts are
transcribed with Latin letters.  These transcriptions are now often
used in URIs, but they introduce additional ambiguities.

[[URI]] の[[文字]]はよく[[自然言語]]の[[語]]を表現するために使われます。
この用法は、覚えやすい、分かりやすい、転写しやすい、
作りやすい、推定しやすいなど多くの利点があります。しかし[[英語]]以外のほとんどの[[言語]]では、
自然な表記では [CODE(char)[[[A]]]]〜[CODE(char)[[[Z]]]]
以外の[[文字]]を使います。多くの人々にとって[[ラテン文字]]を扱うのは[[ラテン字母]]だけを使う人が他の種類の[[文字]]を扱うのと同じくらい難しいことです。
多くの非[[ラテン文字]]を使う[[言語]]は[[ラテン文字]]に転写されます。
この転写はいま [[URI]] でよく使われていますが、
より曖昧さを増させています。

>   The infrastructure for the appropriate handling of characters from
local scripts is now widely deployed in local versions of operating
system and application software.  Software that can handle a wide
variety of scripts and languages at the same time is increasingly
common.  Also, increasing numbers of protocols and formats can carry
a wide range of characters.

局所[[用字系]]の[[文字]]を適切に扱う基盤がいま局所版[[オペレーティング・システム]]や[[応用ソフトウェア]]で広く採用されています。
広範囲の[[用字系]]や[[言語]]を同時に扱えるソフトウェアも益々増えています。
また、広範囲の[[文字]]を伝播できるプロトコルや書式も増えています。

>   This document defines a new protocol element called Internationalized
Resource Identifier (IRI) by extending the syntax of URIs to a much
wider repertoire of characters.  It also defines "internationalized"
versions corresponding to other constructs from [RFC3986], such as
URI references.  The syntax of IRIs is defined in section 2, and the
relationship between IRIs and URIs in section 3.

この[[文書]]は [[URI]] の構文をより広い[[文字]][[レパートリ]]に拡張した国際化資源識別子
([[IRI]]) という新しい[[プロトコル要素]]を定義します。また、
[[RFC 3986]] の [[URI参照]]などの他の構造に対応する[Q[国際化]]版も定義します。
[[IRI]] の構文は2章で定義しており、 [[IRI]] と [[URI]]
の関係は3章で定義しています。

>   Using characters outside of A - Z in IRIs brings some difficulties.
Section 4 discusses the special case of bidirectional IRIs, section 5
various forms of equivalence between IRIs, and section 6 the use of
IRIs in different situations.  Section 7 gives additional informative
guidelines, and section 8 security considerations.

[[IRI]] で [CODE(char)[A]]〜[CODE(char)[Z]] 以外の[[文字]]を使うことによって困ったことも起きます。
4章は双方向的 [[IRI]] の特殊な場合を議論し、
5章は [[IRI]] の色々な[[等価]]形を議論し、
6章は色々な場面での [[IRI]] の使用について扱います。
7章は更に参考となる指針を示し、
8章は安全性について考えます。

**1.2.  Applicability
>   IRIs are designed to be compatible with recommendations for new URI
schemes [RFC2718].  The compatibility is provided by specifying a
well-defined and deterministic mapping from the IRI character
sequence to the functionally equivalent URI character sequence.
Practical use of IRIs (or IRI references) in place of URIs (or URI
references) depends on the following conditions being met:

[[IRI]] は新しい [[URI scheme]] に関する推奨と互換になるよう設計されています。
この互換性は [[IRI]] [[文字列]]から機能的に[[等価]]な [[URI]]
[[文字列]]への良く定義された決定的な[[写像]]を規定することによって提供されます。
実際に [[IRI]] (や [[IRI参照]]) を [[URI]] (や [[URI参照]])
の代わりに使うかどうかは次の条件が満たされるかどうかによります。

>   a.  A protocol or format element should be explicitly designated to
be able to carry IRIs.  The intent is not to introduce IRIs into
contexts that are not defined to accept them.  For example, XML
schema [XMLSchema] has an explicit type "anyURI" that includes
IRIs and IRI references. Therefore, IRIs and IRI references can
be in attributes and elements of type "anyURI".  On the other
hand, in the HTTP protocol [RFC2616], the Request URI is defined
as a URI, which means that direct use of IRIs is not allowed in HTTP requests.

[[プロトコル]]や書式は明示的に [[IRI]] を伝播できると述べるべきです。
[[IRI]] を認めると定義されていない場所に [[IRI]] を入れないということです。
例えば、 [[XML Schema]] は [CODE(XML)[[[anyURI]]]]
という [[IRI]] と [[IRI参照]]を含む型を特に持っています。
ですから、型が [CODE(XML)[[[anyURI]]]] の[[属性]]や[[要素]]では
[[IRI]] と [[IRI参照]]を使うことができます。しかし、 [[HTTP]]
では [CODE(ABNF)[[[Request-URI]]]] は [[URI]] として定義されており、
[[IRI]] は [[HTTP]] [[要求]]で認められないことを意味します。

>   b.  The protocol or format carrying the IRIs should have a mechanism
to represent the wide range of characters used in IRIs, either
natively or by some protocol- or format-specific escaping
mechanism (for example, numeric character references in [XML1]).

[[IRI]] を伝播する[[プロトコル]]や書式は [[IRI]]
で使われる広範囲の[[文字]]を生で直接的に、
または何らかの[[プロトコル]]や書式が規定する[[逃避]]の仕組み
(例えば [[XML]] の[[数値文字参照]]) で表現する仕組みを持つべきです。

>   c.  The URI corresponding to the IRI in question has to encode
original characters into octets using UTF-8.  For new URI
schemes, this is recommended in [RFC2718].  It can apply to a
whole scheme (e.g., IMAP URLs [RFC2192] and POP URLs [RFC2384],
or the URN syntax [RFC2141]).  It can apply to a specific part of
a URI, such as the fragment identifier (e.g., [XPointer]).  It
can apply to a specific URI or part(s) thereof.  For details,
please see section 6.4.

当該 [[IRI]] に対応する [[URI]] が元の[[文字]]を [[UTF-8]]
を使って[[オクテット]]に[[符号化]]していなければなりません。
新しい [[URI scheme]] ではこれが [[RFC 2718]] で推奨されています。
これは scheme 全体に適用できます (例えば [[IMAP]] [[URL]]
や [[POP]] [[URL]] や [[URN]] 構文)。 [[URI]] の特定の部分のみ、
例えば[[素片識別子]]のみに適用することもできます
(例えば [[XPointer]])。特定の [[URI]] やその部分に適用することもできます。
詳細は6.4節をご覧下さい。

**1.3.  Definitions
>   The following definitions are used in this document; they follow the
terms in [RFC2130], [RFC2277], and [ISO10646].

次の定義をこの[[文書]]で使います。 [[RFC 2130]], [[RFC 2277]],
[[ISO/IEC 10646]] の用語に従っています。

>
:   character: A member of a set of elements used for the organization,
control, or representation of data.  For example, "LATIN CAPITAL
LETTER A" names a character.

:[[文字]]:[[データ]]の組織化、[[制御]]、[[表現]]に使う[[要素]]の[[集合]]の一員。
例えば [CODE(char)[[[LATIN CAPITAL LETTER A]]]] は[[文字]]の[[名前]]です。

>
:   octet: An ordered sequence of eight bits considered as a unit.

:[[オクテット]]:[[単位]]と考える8[[ビット]]の順序列。

>
:   character repertoire: A set of characters (in the mathematical sense).

:[[文字レパートリ]]:[[文字]]の[[集合]] (数学的意味で)。

>
:   sequence of characters: A sequence of characters (one after another).

:[[文字列]]:[[文字]]の[[列]] (ある[[文字]]の後に次の[[文字]])。

>
:   sequence of octets: A sequence of octets (one after another).

:[[オクテット列]]:[[オクテット]]の[[列]] 
(ある[[オクテット]]の後に次の[[オクテット]])。

>
:   character encoding: A method of representing a sequence of characters
as a sequence of octets (maybe with variants).  Also, a method of
(unambiguously) converting a sequence of octets into a sequence of characters.

:[[文字符号化]]:[[文字列]]を[[オクテット列]]として[[表現]]する方法
(複数たり得ます)。また、[[オクテット列]]を[[文字列]]に (曖昧なく)
変換する方法。


>
:   charset: The name of a parameter or attribute used to identify a
character encoding.

:[[charset]]:[[文字符号化]]を識別するために使う[[引数]]や[[属性]]の[[名前]]。

>
:   UCS: Universal Character Set. The coded character set defined by
ISO/IEC 10646 [ISO10646] and the Unicode Standard [UNIV4].

[[普遍文字集合]]。 [[ISO/IEC 10646]] と [[Unicode規格]]で定義された[[符号化文字集合]]。

>
:   IRI reference: Denotes the common usage of an Internationalized
Resource Identifier.  An IRI reference may be absolute or
relative.  However, the "IRI" that results from such a reference
only includes absolute IRIs; any relative IRI references are
resolved to their absolute form.  Note that in [RFC2396] URIs did
not include fragment identifiers, but in [RFC3986] fragment
identifiers are part of URIs.

:[[IRI参照]]:国際化資源識別子の共通な用法を指します。
[[IRI参照]]は絶対でも相対でも構いません。しかし、 [[IRI参照]]から得られる
[Q[[[IRI]]]] は必ず[[絶対IRI]] です。[[相対IRI参照]]は絶対形に解決されます。
[[RFC 2396]] で [[URI]] は[[素片識別子]]を含みませんでしたが、
[[RFC 3986]] では[[素片識別子]]も [[URI]] の一部であることに注意して下さい。

>
:   running text: Human text (paragraphs, sentences, phrases) with syntax
according to orthographic conventions of a natural language, as
opposed to syntax defined for ease of processing by machines
(e.g., markup, programming languages).

:普通の文章:[[機械]]が処理しやすいように定義された構文
(例えば[[マーク付け言語]]や[[プログラム言語]]) に対して、
[[自然言語]]の辞書的規則に従った構文によって書かれた人間の文章
([[段落]]、[[文]]、[[語句]])。

>
:   protocol element: Any portion of a message that affects processing of
that message by the protocol in question.

:[[プロトコル要素]]:当該[[プロトコル]]の[[メッセージ]]の処理に影響を与える[[メッセージ]]の部分。

>
:   presentation element: A presentation form corresponding to a protocol
element; for example, using a wider range of characters.

:[[表現要素]]:[[プロトコル要素]]に対応する[[表現形]]。
例えば、広範囲の[[文字]]を使用します。

>
:   create (a URI or IRI): With respect to URIs and IRIs, the term is
used for the initial creation.  This may be the initial creation
of a resource with a certain identifier, or the initial exposition
of a resource under a particular identifier.

:([[URI]] や [[IRI]] の) [[作成]]:
[[URI]] や [[IRI]] に関しては、最初の[[作成]]の時にこの用語を使います。
これはある[[識別子]]の[[資源]]の最初の[[作成]]かもしれませんし、
特定の[[識別子]]に[[資源]]を最初に結び付けることかもしれません。

>
:   generate (a URI or IRI): With respect to URIs and IRIs, the term is
used when the IRI is generated by derivation from other information.

:([[URI]] や [[IRI]] の) [[生成]]:
[[URI]] や [[IRI]] に関しては、 [[IRI]]
が他の情報から派生して[[生成]]される時にこの用語を使います。

**1.4.  Notation
>   RFCs and Internet Drafts currently do not allow any characters
outside the US-ASCII repertoire.  Therefore, this document uses
various special notations to denote such characters in examples.

[[RFC]] と [[Internet Draft]] は現在 [CODE(charset)[[[US-ASCII]]]]
[[レパートリ]]の範囲外の[[文字]]を認めていません。従って、
この文書は例示で [CODE(charset)[[[US-ASCII]]]]
の範囲外の[[文字]]を示すために特別な表記法を使います。

>   In text, characters outside US-ASCII are sometimes referenced by
using a prefix of 'U+', followed by four to six hexadecimal digits.

文章中で [CODE(charset)[[[US-ASCII]]]] の範囲外の[[文字]]を接頭辞
[CODE[U+]] に続けて4〜6桁の十六進数字を使って参照することがあります。

>   To represent characters outside US-ASCII in examples, this document
uses two notations: 'XML Notation' and 'Bidi Notation'.

例示中で [CODE(charset)[[[US-ASCII]]]] の範囲外の[[文字]]を表現するためにこの文書では
[Q[[[XML]] 表記法]]と [Q[Bidi 表記法]]の2つの表記法を使います。

>   XML Notation uses a leading '&#x', a trailing ';', and the
hexadecimal number of the character in the UCS in between.  For
example, &#x44F; stands for CYRILLIC CAPITAL LETTER YA.  In this
notation, an actual '&' is denoted by '&amp;'.

[[XML]] 表記法は最初に [CODE(XML)[&#x]], 最後に [CODE(XML)[;]]
を付け、その間に [[UCS]] における[[文字]]の十六進番号を挟みます。
例えば [SAMP(XML)[&#x44F;]] は [CODE(char)[[[CYRILLIC CAPITAL LETTER YA]]]]
を表します。この表記法では実際の [CODE(char)[&]] は
[CODE(XML)[[[&amp;]]]] と示します。

>   Bidi Notation is used for bidirectional examples: Lowercase letters
stand for Latin letters or other letters that are written left to
right, whereas uppercase letters represent Arabic or Hebrew letters
that are written right to left.

Bidi 表記法は[[双方向]]的な例示で使います。
[[小文字]]は[[ラテン文字]]やその他の[[左]]から[[右]]に書く[[文字]]を表し、
[[大文字]]は[[アラビア文字]]や[[ヘブライ文字]]のように[[右]]から[[左]]に書く[[文字]]を表します。

>   To denote actual octets in examples (as opposed to percent-encoded
octets), the two hex digits denoting the octet are enclosed in "<"
and ">".  For example, the octet often denoted as 0xc9 is denoted
here as <c9>.

例示中で ([[百分率符号化]]した[[オクテット]]に対して) 
実際の[[オクテット]]を示すため2桁の[[オクテット]]を示す[[数字]]を
[CODE(char)[<]] と [CODE(char)[>]] で囲って示します。
例えば、よく [SAMP[0xc9]] と書かれる[[オクテット]]はここでは
[SAMP[<c9>]] と示します。

>   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY",
and "OPTIONAL" are to be interpreted as described in [RFC2119].

この文書では、[[鍵語]] [Q['''しなければなりません''']],
[Q['''してはなりません''']], [Q['''必須''']],
[Q['''するべきです''']], [Q['''するべきではありません''']],
[Q['''推奨します''']], [Q['''して構いません''']],
[Q['''任意選択''']] は [[RFC 2119]] で説明されているように解釈します。

* メモ