Skip to content


Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
tree: 8190f56b42
Fetching contributors…

Cannot retrieve contributors at this time

707 lines (608 sloc) 28.372 kb
<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
which is available here: -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
There has to be one entity for each item to be referenced.
An alternate method (rfc include) is described in the references. -->
<!ENTITY I-D.narten-iana-considerations-rfc2434bis SYSTEM "">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs),
please see -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
(Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space
(using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="std" docName="draft-moore-text-diff-00" ipr="trust200902">
<!-- category values: std, bcp, info, exp, and historic
ipr values: full3667, noModification3667, noDerivatives3667
you can add the attributes updates="NNNN" and obsoletes="NNNN"
they will automatically be output with "(if approved)" -->
<!-- ***** FRONT MATTER ***** -->
<!-- The abbreviated title is used in the page header - it is only necessary if the
full title is longer than 39 characters -->
<title abbrev="Abbreviated Title">The text/diff Media Type</title>
<!-- add 'role="editor"' below for the editors if appropriate -->
<!-- Another author who claims to be an editor -->
<author fullname="Jonathan Moore" initials="J.M."
<!-- Reorder these if your country does things differently -->
<!-- uri and facsimile elements may also be added -->
<date year="2012" />
<!-- If the month and year are both specified and are the current ones, xml2rfc will fill
in the current day for you. If only the current year is specified, xml2rfc will fill
in the current day and month for you. If the year is not the current one, it is
necessary to specify at least a month (xml2rfc assumes day="1" if not specified for the
purpose of calculating the expiry date). With drafts it is normally sufficient to
specify just the year. -->
<!-- Meta-data Declarations -->
<workgroup>Internet Engineering Task Force</workgroup>
<!-- WG name at the upperleft corner of the doc,
IETF is fine for individual submissions.
If this element is not present, the default is "Network Working Group",
which is used by the RFC Editor as a nod to the history of the IETF. -->
<keyword>media types, text</keyword>
<!-- Keywords will be incorporated into HTML output
files in a meta tag but they have no effect on text or nroff
output. If you submit your draft to the RFC Editor, the
keywords will be used for the search engine. -->
<t>The &quot;unified diff format&quot; is commonly used to
represent changes from one version of a file to another or to
represent a comparison of two files. This document captures
the definition of the unified diff format as the text/diff
media type for the purpose of registering this format formally
in the IANA standards tree for media types.
<section title="Introduction">
<t>The &quot;unified diff format&quot; is a commonly-used file
format typically used for exchanging patch changesets to a set
of documents--most notably, source code trees--or for comparing
the contents of two distinct files. This format is generated
and understood by a variety of open source tools including diff,
patch, svn, and git. However, at the time of this writing, no
definitive, formal definition of this file format exists,
although there are numerous informal definitions,
including <xref target="WIKIPEDIA">a Wikipedia article about
diff</xref>, a <xref target="VANROSSUM">blog post</xref> by
Guido van Rossum, and the <xref target="GNU">documentation for
GNU diffutils</xref>.</t>
<t>At the same time, the HTTP PATCH method has been standardized
as <xref target="RFC5789">RFC 5789</xref>, for which this file
format would be an ideal media type. There are additionally
other Internet-based services that trade regularly in patches or
diffs that might benefit specifically from having participants
explicitly declare that exchanged patchsets conform to a
specific file format.
<t>Therefore, the goals of this document are first, to record a
formal definition of the file format for reference purposes, and
second, to request the IANA registration of a standard media
type for this format.</t>
<section title="Requirements Language">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
document are to be interpreted as described in <xref
target="RFC2119">RFC 2119</xref>.</t>
<section title="Definition">
<t>As a member of the &quot;text/*&quot; media type family, the
text/diff format follows the requirements for textual messages laid
out in <xref target="RFC2046">RFC 2046</xref>, including the use
of a &quot;charset&quot; parameter with a default value of US-ASCII
and the use of CRLF as a linebreak.
<section title="Permitted Character Sets">
<t>The unified diff format relies on the use of certain US-ASCII
characters to delineate change hunks and context.
Therefore, the value of the &quot;charset&quot; parameter MUST
reference a character set which is a superset of US-ASCII. Permitted
character sets specifically include:
<list style="symbols">
<t>those in the ISO-8859-* family</t>
Other character sets that are supersets of US-ASCII MAY be specified
as values of the &quot;charset&quot; parameter. If the parameter
is omitted then a default value of US-ASCII MAY be assumed.
<t>Generally speaking, this format is useful for describing
changesets against files whose original and modified character
set encodings are both supersets of US-ASCII. Implementations
generating text/diff SHOULD follow the guidelines
of <xref target="RFC2046">RFC 2046</xref> and use the
&quot;lowest common denominator&quot; for the
&quot;charset&quot; parameter value. For example, it is
possible that the original and modified versions of a file
might use the ISO-8859-1 character set, but the changed
portions only use US-ASCII characters: in this case the
resulting diff should be marked as using the &quot;us-ascii&quot;
character set. An implementation MUST NOT include characters
in a text/diff file that are not compatible with the value
of the &quot;charset&quot; parameter.
<t>The requirement to have a single character set for a text/diff
file means that there are certain change operations this format
cannot express, for example the conversion of a file from
ISO-8859-1 to UTF-8 where non-US-ASCII characters are used in both
the old and new versions. In particular, this means that not all
output of the common UN*X diff utility--even when generating the
unified diff format--will be a valid text/diff file. This media
type is aimed instead at the very common case where the character
sets of the old and new file are compatible with these requirements
and can thus be treated as a text/* type.</t>
<section title="Diff Contexts">
<t>Diffs are typically generated from a specific context;
for example, from a specific directory on a file system,
or based against a particular version of a source control
repository. Diffs are likewise applied against a specific
context, and often this context is different than the
generation context. For example, the diff may be applied
against a different directory on the same machine, or
against a directory on an entirely different filesystem,
or against a different branch of a codebase. Indeed, the
main purpose of diffs are to make a portable description
of a set of document changes.
The intended application context for diffs is carried
out-of-band from the diffs themselves in common usage, and
in fact is sometimes completely implicit, as with patchsets
sent on a developer mailing list for an open source project.
As such, this media type description will reference a
particular context without specifying it explicitly; instead,
a recipient fixes the context when attempting to apply the
diff to that context. Insofar as the same diff can be
applied to different contexts--albeit to varying degrees of success and to varying results--the meaning of the diff itself is not dependent
on a particular context.
For the remainder of this specification, we will assume that
this &quot;diff context&quot; can be specified as
a <xref target="RFC3986">URI</xref>.
<section title="Grammar">
<t>Formal syntax in this document is governed by
the <xref target="RFC5234">ABNF (Augmented Backus-Naur Form)
notation</xref>, including the following core ABNF syntax
rules defined by that specification: DIGIT (decimal digits),
HTAB (horizontal tab), SP (space), CR (carriage return), and
LF (linefeed). This syntax also incorporates by reference the
&quot;URI-reference&quot; ABNF production
from <xref target="RFC3986">RFC 3986</xref>.
<t>A &quot;diff file&quot; captures a set of instructions
describing a particular set of document changes, often known
as a &quot;patchset&quot; or a &quot;changeset&quot;. It consists
of one or more &quot;document diffs&quot;, each document diff describing changes
to a single document. A document diff consists of a &quot;header&quot;
identifying the target document, usually relative to the overall diff
context, plus one or more &quot;change hunks&quot; or &quot;chunks&quot;
specifying additions, deletions, or edits of particular lines in the
target document.
<artwork type="abnf">
year = 4DIGIT
month = 2DIGIT
day = 2DIGIT
date = year "-" month "-" day
hour = 2DIGIT
minute = 2DIGIT
second = 2DIGIT
nanosecond = 9DIGIT
time = hour ":" minute ":" second [ "." nanosecond ]
timezone = ("+" / "-") hour minute
resource-modification = URI-reference HTAB date SP time SP timezone
old-indicator = "---"
new-indicator = "+++"
old-file-header = old-indicator SP resource-modification
new-file-header = new-indicator SP resource-modification
file-headers = old-file-header CRLF new-file-header CRLF
<t>After the header, a diff file contains a set of chunks. Each chunk begins with a header identifying the relevant selection of lines from the old and new files.</t>
<artwork type="abnf">
line-number = 1*DIGIT
range = line-number [ "," line-number ]
chunk-header = "@@" SP "-" range SP "+" range SP "@@"
<t>After the chunk header, the chunk includes a set of lines that are marked as being either common, only in the old file, or only in the new file.</t>
<artwork type="abnf">
common = SP
old-only = "-"
new-only = "+"
line-affinity = common / old-only / new-only
non-linebreak = %x00-09 / %x0B-0C / %x0E-FF
non-linefeed = %x00-09 / %x0B-FF
content = non-linebreak / (CR non-linefeed)
chunk-line = line-affinity 1*content CRLF
<t>Thus, a chunk consists of a chunk header followed by at least one chunk line.</t>
<artwork type="abnf">
chunk = chunk-header CRLF 1*chunk-line
<t>Therefore, a single file diff is a header followed by at least one chunk.</t>
<artwork type="abnf">
diff = headers 1*chunk
<t>And lastly, a patch consists of a non-empty set of single
file diffs. A file matching the &quot;patch&quot; production is
a valid text/diff file.</t>
<artwork type="abnf">
patch = 1*diff
<section anchor="security_considerations" title="Security Considerations">
<section title="Out-of-scope attacks">
Since this document describes a media type and not a protocol,
network-based attacks are out of scope for this specification.
In particular, eavesdropping, replay, message insertion,
deletion, modification, and man-in-the-middle attacks are the
concern of the containing or transporting protocol and not of
the media type itself. This specification makes no claims
about and does not provide facilities for authentication,
authorization, file integrity, or non-repudiation.</t>
<section title="In-scope attacks">
<t>As a subtype of the &quot;text/*&quot; media type, text/diff
shares many of the security concerns related to processing any
arbitrary text body. In particular, implementators SHOULD be
aware that this specification does not impose a maximum line
length, and that implementations that assume a fixed maximum
line length may be subject to buffer overrun attacks. Implementations
should also be aware of denial-of-service or buffer overrun attacks
involving extraordinarily large text/diff files. Again, however,
file framing would be a concern of the containing or transporting
<t>In addition, text/diff files may contain control characters
that may alter an operating environment when displayed to a
terminal (or otherwise). <xref target="RFC3023">RFC 3023</xref>
contains a list of possible attacks and recommended defenses
against these attacks in its Security Considerations section.</t>
<t>The contents of a text/diff file describe a set of changes
that may be applied to a file to alter its contents, and hence
implementations that process text/diff patches SHOULD be sure
that the sender is properly authenticated and authorized
before applying patches. To the extent that patches are
automatically applied, recipients of text/diff entities may be
at risk. In general, it may be possible to specify changesets
that perform unauthorized file changes and/or make changes to
the recipient's system that may affect subsequent processing.
<t>Some security considerations will vary by application domain.
For example, text/diff patches intended to update financial
records may have more stringent privacy, security, and auditing
requirements than a publicly-disseminated file showing the
source code differences between two releases of an open source
software project.
<section anchor="IANA" title="IANA Considerations">
<t>This memo includes a request to IANA to register the text/diff
media type.</t>
<t>Type name: text</t>
<t>Subtype name: diff</t>
<t>Required parameters: none</t>
<t>Optional parameters:
<list style="empty">
<list style="empty">
<t>This parameter has identical semantics to the &quot;charset&quot; parameter of the "text/plain" media type as specified in <xref target="RFC2046">RFC 2046</xref>. Values are restricted to charsets which are supersets of US-ASCII.</t>
<t>Encoding considerations:
<list style="empty">
<t>The syntax of the text/diff format is expressed over 8-bit
octets. The encoding of the file is as specified by the charset
parameter, which defaults to US-ASCII if not provided.</t>
<t>Security considerations: see <xref target="security_considerations"/>.</t>
<t>Interoperability considerations: TBD</t>
<t>Published specification: this document.</t>
<t>Applications that use this media type: diff, patch, svn, git.</t>
<t>Additional information:
<list style="empty">
<t>Magic number(s): none</t>
<t>File extension(s): .patch, .diff</t>
<t>Macintosh file type code(s): TEXT</t>
<t>Person &amp; email address to contact for more information: authors
of this document.</t>
<t>Intended usage: COMMON</t>
<t>Restrictions on usage: none</t>
<t>Author: authors of this document.</t>
<t>Change controller: authors of this document.</t>
<section anchor="simple_list" title="Simple List">
<t>List styles: 'empty', 'symbols', 'letters', 'numbers', 'hanging',
<t><list style="symbols">
<t>First bullet</t>
<t>Second bullet</t>
</list> You can write text here as well.</t>
<section title="Figures">
<t>Figures should not exceed 69 characters wide to allow for the indent
of sections.</t>
<figure align="center" anchor="xml_happy">
<preamble>Preamble text - can be omitted or empty.</preamble>
<artwork align="left"><![CDATA[
| Use XML, be Happy :-) |
<postamble>Cross-references allowed in pre- and postamble. <xref
target="min_ref" />.</postamble>
<t>The CDATA means you don't need to escape meta-characters (especially
&lt;&nbsp;(&amp;lt;) and &amp;&nbsp;(&amp;amp;)) but is not essential.
Figures may also have a title attribute but it won't be displayed unless
there is also an anchor. White space, both horizontal and vertical, is
significant in figures even if you don't use CDATA.</t>
<!-- This PI places the pagebreak correctly (before the section title) in the text output. -->
<!-- <?rfc needLines="8" ?> -->
<!-- <section title="Subsections and Tables"> -->
<!-- <section title="A Subsection"> -->
<!-- <t>By default 3 levels of nesting show in table of contents but that -->
<!-- can be adjusted with the value of the "tocdepth" processing -->
<!-- instruction.</t> -->
<!-- </section> -->
<!-- <section title="Tables"> -->
<!-- <t>.. are very similar to figures:</t> -->
<!-- <texttable anchor="table_example" title="A Very Simple Table"> -->
<!-- <preamble>Tables use ttcol to define column headers and widths. -->
<!-- Every cell then has a "c" element for its content.</preamble> -->
<!-- <ttcol align="center">ttcol #1</ttcol> -->
<!-- <ttcol align="center">ttcol #2</ttcol> -->
<!-- <c>c #1</c> -->
<!-- <c>c #2</c> -->
<!-- <c>c #3</c> -->
<!-- <c>c #4</c> -->
<!-- <c>c #5</c> -->
<!-- <c>c #6</c> -->
<!-- <postamble>which is a very simple example.</postamble> -->
<!-- </texttable> -->
<!-- </section> -->
<!-- </section> -->
<!-- <section anchor="nested_lists" title="More about Lists"> -->
<!-- <t>Lists with 'hanging labels': the list item is indented the amount of -->
<!-- the hangIndent: <list hangIndent="8" style="hanging"> -->
<!-- <t hangText="short">With a label shorter than the hangIndent.</t> -->
<!-- <t hangText="fantastically long label">With a label longer than the -->
<!-- hangIndent.</t> -->
<!-- <t hangText="vspace_trick"><vspace blankLines="0" />Forces the new -->
<!-- item to start on a new line.</t> -->
<!-- </list></t> -->
<!-- It would be nice to see the next piece (12 lines) all on one page. -->
<!-- <?rfc needLines="12" ?> -->
<!-- <t>Simulating more than one paragraph in a list item using -->
<!-- &lt;vspace&gt;: <list style="letters"> -->
<!-- <t>First, a short item.</t> -->
<!-- <t>Second, a longer list item.<vspace blankLines="1" /> And -->
<!-- something that looks like a separate pararaph..</t> -->
<!-- </list></t> -->
<!-- <t>Simple indented paragraph using the "empty" style: <list -->
<!-- hangIndent="10" style="empty"> -->
<!-- <t>The quick, brown fox jumped over the lazy dog and lived to fool -->
<!-- many another hunter in the great wood in the west.</t> -->
<!-- </list></t> -->
<!-- <section title="Numbering Lists across Lists and Sections"> -->
<!-- <t>Numbering items continuously although they are in separate -->
<!-- &lt;list&gt; elements, maybe in separate sections using the "format" -->
<!-- style and a "counter" variable.</t> -->
<!-- <t>First list: <list counter="reqs" hangIndent="4" style="format R%d"> -->
<!-- <t>#1</t> -->
<!-- <t>#2</t> -->
<!-- <t>#3</t> -->
<!-- </list> Specify the indent explicitly so that all the items line up -->
<!-- nicely.</t> -->
<!-- <t>Second list: <list counter="reqs" hangIndent="4" style="format R%d"> -->
<!-- <t>#4</t> -->
<!-- <t>#5</t> -->
<!-- <t>#6</t> -->
<!-- </list></t> -->
<!-- </section> -->
<!-- <section title="Where the List Numbering Continues"> -->
<!-- <t>List continues here.</t> -->
<!-- <t>Third list: <list counter="reqs" hangIndent="4" style="format R%d"> -->
<!-- <t>#7</t> -->
<!-- <t>#8</t> -->
<!-- <t>#9</t> -->
<!-- <t>#10</t> -->
<!-- </list> The end of the list.</t> -->
<!-- </section> -->
<!-- </section> -->
<!-- <section anchor="codeExample" -->
<!-- title="Example of Code or MIB Module To Be Extracted"> -->
<!-- <figure> -->
<!-- <preamble>The &lt;artwork&gt; element has a number of extra attributes -->
<!-- that can be used to substitute a more aesthetically pleasing rendition -->
<!-- into HTML output while continuing to use the ASCII art version in the -->
<!-- text and nroff outputs (see the xml2rfc README for details). It also -->
<!-- has a "type" attribute. This is currently ignored except in the case -->
<!-- 'type="abnf"'. In this case the "artwork" is expected to contain a -->
<!-- piece of valid Augmented Backus-Naur Format (ABNF) grammar. This will -->
<!-- be syntax checked by xml2rfc and any errors will cause a fatal error -->
<!-- if the "strict" processing instruction is set to "yes". The ABNF will -->
<!-- also be colorized in HTML output to highlight the syntactic -->
<!-- components. Checking of additional "types" may be provided in future -->
<!-- versions of xml2rfc.</preamble> -->
<!-- <artwork><![CDATA[ -->
<!-- /**** an example C program */ -->
<!-- #include <stdio.h> -->
<!-- void -->
<!-- main(int argc, char *argv[]) -->
<!-- { -->
<!-- int i; -->
<!-- printf("program arguments are:\n"); -->
<!-- for (i = 0; i < argc; i++) { -->
<!-- printf("%d: \"%s\"\n", i, argv[i]); -->
<!-- } -->
<!-- exit(0); -->
<!-- } /* main */ -->
<!-- /* end of file */ -->
<!-- ]]></artwork> -->
<!-- </figure> -->
<!-- </section> -->
<section anchor="Acknowledgements" title="Acknowledgements">
<t>The author wishes to thank those who have done the hard work of
developing the algorithms and tools that form the de facto standard
of patchset exchange on the Internet today. In particular, he would
like to recognize Douglas McIlroy and Larry Wall for
the initial diff and patch utilities as well as the numerous
contributors that have maintained and updated them over the years.</t>
<!-- Possibly a 'Contributors' section ... -->
<!-- *****BACK MATTER ***** -->
<!-- References split into informative and normative -->
<!-- There are 2 ways to insert reference entries from the citation libraries:
1. define an ENTITY at the top, and use "ampersand character"RFC2629; here (as shown)
2. simply use a PI "less than character"?rfc include="reference.RFC.2119.xml"?> here
(for I-Ds: include="reference.I-D.narten-iana-considerations-rfc2434bis.xml")
Both are cited textually in the same manner: by using xref elements.
If you use the PI option, xml2rfc will, by default, try to find included files in the same
directory as the including file. You can also define the XML_LIBRARY environment variable
with a value containing a set of directories to search. These can be either in the local
filing system or remote ones accessed by http (http://domain/dir/... ).-->
<references title="Normative References">
<references title="Informative References">
<!-- Here we use entities that we defined at the beginning. -->
<!-- A reference written by by an organization not a person. -->
<reference anchor="WIKIPEDIA"
<date year="2012"/>
<reference anchor="VANROSSUM"
<title>Unified Diff Format</title>
<author surname="van Rossum" fullname="Guido van Rossum"/>
<date day="14" month="June" year="2006"/>
<reference anchor="GNU"
<title>GNU diffutils manual, Detailed Description of Unified Format</title>
<organization>GNU, Free Software Foundation</organization>
<date year="2012"/>
<section anchor="app-additional" title="Additional Stuff">
<t>This becomes an Appendix.</t>
Jump to Line
Something went wrong with that request. Please try again.