Skip to content

Plain-Diff: Tool Consuming Plain Diffs, Generating Unified Diffs.

License

Notifications You must be signed in to change notification settings

stvar/plain-diff

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

44 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

                                   Plain-Diff
                                   ~~~~~~~~~~
                        Stefan Vargyas, stvar@yahoo.com

                                  Dec 12, 2017


Table of Contents
-----------------

0. Copyright
1. The Plain-Diff Program
2. The Plain-Diff File Format
3. Use Cases of Plain-Diff
   a. Obtain brief help information from 'pdiff'
   b. Verify the correctness of given plain diff text
   c. Generate an unified diff from given plain diff text
   d. Fix up diff hunk addresses of given plain diff text
4. References


0. Copyright
============

This program is GPL-licensed free software. Its author is Stefan Vargyas. You
can redistribute it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

You should have received a copy of the GNU General Public License along with
this program (look up for the file COPYING in the top directory of the source
tree). If not, see http://gnu.org/licenses/gpl.html.


1. The Plain-Diff Program
=========================

Plain-Diff's main vocation is that of an easy to use tool for consuming plain
diff text files and producing from them unified diff files suitable as input
for the 'patch' program [1].

The plain diff file format was designed such that to very easily be writable
by hand out of source text files. Writing by hand plain diff files, one is
alleviating the tedious activity of developing manually unified diff files.

An obvious question arise: why would someone choose to write plain diff files
instead of producing patch files out of source file trees? (For example, by
using the 'diff' program [2] or 'git's [3] own diffing functionality.)

One possible answer to this question is as follows: during a session of code
coverage analysis (when the source tree is closed to changes), while one is
reviewing intensively the source code itself, it comes naturally the need for
refactoring locally some implementations. If one does not want to work back
and forth with two distinct source trees, then he may choose to note down all
refactoring changes in a plain diff text file.

Upon completing the code coverage analysis session, the plain diff text file
obtained can be used safely for to patch in the source tree all refactoring
changes collected along the way.

The Plain-Diff program consists of a single Python 2.6 script 'pdiff.py' --
which is using only standard Python modules. However, this script uses a few
OS-dependent features that ties it to Unix (i.e. POSIX) platforms.


2. The Plain-Diff File Format
=============================

By design, the format of the files accepted as input by 'pdiff' is trivially
simple. Plain diff files are text files that are line oriented, incorporating
text from source files with almost zero-editing overhead.

The adverb *almost* used above should to be understood literally: that is that
(of course, modulo my limited knowledge of these things) nearly every piece of
modern-day written source text code of mainstream programming languages can be
copied from files containing it and pasted *unchanged* into a plain diff file.

There are only very few exceptions to this nice property of plain diff files:
when a source line begins with one of the strings "\\" (the string consisting
of a sole backslash character), "///", "###", "<<<" or ">>>", the associated
plain diff text line must be escaped, i.e. the source line must be prepended
with one backslash character ('\\') for to become a plain diff line.

The strings "\\", "///", "###", "<<<" and ">>>" have to be escaped if and only
if they occur at the beginning of source lines. The plain diff format requires
these strings to be escaped since each has attached special syntactical meaning.

The string "///" marks the beginning of comment text lines. Lines starting with
the string "###" contain hash digests specifications attached to source files.
The strings "<<<" and ">>>" are diff hunks delimiters.

The complete lexical and syntactical structure of plain diff files is specified
in the file 'pdiff.g'. Simply put, a plain diff file consists of a series of
entries of the following four syntactical categories:

  --------------  ------------------------------------------------------------
   empty_line      single-line containing whitespace characters only
   comment_line    single-line entry allowed to contain any text
   file_hash       single-line entry associating hash digests to source files
   diff_hunk       multi-line entry defining an insert, a delete or a replace
                   operation applied to a named source file
  --------------  ------------------------------------------------------------

The 'file_hash' lines look like the following one (its meaning is self-evident):

  $ sed -n 26p test/json-type.pdiff
  ### lib/json-type.c:b15e0661bf664f99cbfe8156916921d1df074b12

The file 'test/json-type.pdiff' contains only replace hunks, of which the first
one looks as shown (again, its meaning is self-evident):

  $ sed -n 28,36p test/json-type.pdiff
  >>> lib/json-type.c:1400
          v = node->args[0].val;
          if ((s = JSON_AST_NODE_AS_IF_CONST(v, string)) == NULL ||
              !json_type_ruler_lookup_obj_type(s->buf.ptr, &t))
  <<<
          v = node->args[0].val;
          if (!(s = JSON_AST_NODE_AS_IF_CONST(v, string)) ||
              !json_type_ruler_lookup_obj_type(s->buf.ptr, &t))
  >>>


3. Use Cases of Plain-Diff
==========================

Plain-Diff processes an input plain diff text file, either specified by its
name provided in the invoking command line, or otherwise read in from stdin.

For convenience of use of 'pdiff.py' script, let define the shell function:

  $ export PLAIN_DIFF_HOME=...

  $ pdiff() { python "$PLAIN_DIFF_HOME"/pdiff.py "$@"; }

3.a. Obtain brief help information from 'pdiff'
-----------------------------------------------

Issuing the following command would produce succinct information about the
command line options of 'pdiff'. With only a few exceptions, the output of
'pdiff' is self-explanatory:

  $ pdiff --help
  ...

A 'pdiff' command line containing only `--dump-opts' shows which are the
default values for each of the program's options:

  $ pdiff --dump-opts
  action:       unified
  input:        <stdin>
  context:      3
  directory:    None
  dry-run:      False
  no-hash:      False
  no-text:      False
  no-adjust:    False
  no-reorder:   False
  no-hunk-out:  False
  interactive:  False
  trace-parser: False
  verbose:      False

Passing `--dump-opts' along with other options in a more elaborated 'pdiff'
command line shows the actual values received by each option of the program.

The program's input plain diff text file is specified by name by the command
line options `-i|--input'. When these options are not given, either take the
input file name to be the first non-option command line argument or, when no
such argument is present in the invoking command line, take the file content
from stdin. Note that the file name `-' has a special meaning: tells 'pdiff'
to read its input from stdin.

Plain-Diff has four main modes of operation, to each corresponding separate
action options:

  ---------------------    ------------------------------------------------
   `-O|--parse-only'        only parse input and verify its correctness
   `-P|--plain-diff'        produce a normalized plain diff text on stdout
   `-U|--unified-diff'      produce an unified diff text on stdout
   `-A|--addresses-diff'    produce an addresses diff text on stdout
  -----------------------  ------------------------------------------------

The action options `-P|--plain-diff' generates on stdout a normalized version
of the input plain diff file given. Normalizing a plain diff text means doing
the following:

  * remove all empty lines and comment lines (lines which start with "///");
  * regroup all file hash entries at the top of the output text, sorted by
    file name in increasing order;
  * regroup all diff hunks entries after the file hash entries, sorted by
    file name and line numbers in increasing order.

The remaining action options -- `-O|--parse-only', `-U|--unified-diff' and
`-A|--addresses-diff' -- are presented in the next three subsections.

3.b. Verify the correctness of given plain diff text
----------------------------------------------------

Plain-Diff's source tree contains in its 'test' directory a real plain diff
file -- that is 'test/json-type.pdiff'. This plain diff file is referring to
Json-Type's [4] source tree -- it was produced out of a certain revision of
Json-Type, during a code coverage analysis session.

I'll use 'test/json-type.pdiff' below, in this subsection and in the next one,
for illustrating the functionalities of 'pdiff' program. The note attached to
reference entry [4] presents the procedure of obtaining locally Json-Type's
source tree revision to which 'test/json-type.pdiff' is referring to.

Let $JSON_TYPE_HOME be set such that to point to the local directory containing
the cloned Json-Type source tree:

  $ export JSON_TYPE_HOME=...

The command line options `-O|--parse-only' provide an useful functionality:
they forbid 'pdiff' from generating anything on stdout, thus resulting that
'pdiff' only parses the input provided and verifies its correctness.

  $ pdiff -O test/json-type.pdiff 
  pdiff: error: test/json-type.pdiff:26: \
  invalid file hash: source file 'lib/json-type.c' not found

The plain diff file 'test/json-type.pdiff' has on line 26 a file hash entry:

  $ sed -n 26p test/json-type.pdiff
  ### lib/json-type.c:b15e0661bf664f99cbfe8156916921d1df074b12

Since the source file 'lib/json-type.c' doesn't exists within the source tree
of Plain-Diff, 'pdiff' exits reporting this error. But, when passing on to it
Json-Type's home directory, things work OK:

  $ pdiff -O test/json-type.pdiff -d "$JSON_TYPE_HOME" && echo OK
  OK

That would not work if Json-Type's source tree contains a modified version of
'lib/json-type.c':

  $ pdiff -O test/json-type.pdiff -d "$JSON_TYPE_HOME" && echo OK
  pdiff: error: $PLAIN_DIFF_HOME/test/json-type.pdiff:26: \
  invalid file hash: SHA1-digest mismatch: \
  expected 'b15e0661bf664f99cbfe8156916921d1df074b12' but got '...'

One might avoid verifying hash digests -- by passing `--no-hash-check' option
to 'pdiff' --, yet the input still be unacceptable for the program:

  $ pdiff -O test/json-type.pdiff -d "$JSON_TYPE_HOME" --no-hash-check
  pdiff: error: $PLAIN_DIFF_HOME/test/json-type.pdiff:29: \
  lib/json-type.c:...: diff text not matching source text

This error message tells that the 29th plain diff text line is not the same --
as it should be -- with the corresponding source text line:

  $ sed -n 28,29p test/json-type.pdiff
  >>> lib/json-type.c:1400
          v = node->args[0].val;

There is yet more of a possibility: avoid matching text lines in the plain diff
file with the ones in the corresponding source files:

  $ pdiff -O test/json-type.pdiff --no-hash-check --no-text-check && echo OK
  OK

If one only wants to verify the textual correctness of his plain diff file --
thus overruling all checks normally done by 'pdiff' with respect to the source
files specified in the plain diff file -- `--dry-run' is what he needs using:

  $ pdiff -O test/json-type.pdiff --dry-run && echo OK
  OK

Syntactically incorrect input makes 'pdiff' complain even if invoked with
`--dry-run':

  $ sed '28s/>>>//' test/json-type.pdiff|pdiff -O --dry-run
  pdiff: error: <stdin>:28: non-empty lines not allowed in DIFF context

3.c. Generate an unified diff from given plain diff text
--------------------------------------------------------

The main use case of 'pdiff' is that of generating unified diff texts having
a number of context lines specified by its user. This is achieved using the
command line options `-U|--unified-diff', `-c|--context' and `--adjust-diff'.

Important is to remark that all the command lines `pdiff -O' in the previous
subsection that fail with error message, will fail the same exact way if `-O'
is replaced with `-U'. This should be a sensible behavior: a prerequisite of
generating an unified diff text out of a plain diff text is that the latter
text must be valid in the sense outlined by the previous subsection.

Given option `--adjust-diff', 'pdiff' is enforced to always use source text
lines read from source files to adjust unified context lines for the number
of these to be up to the number specified by `-c|--context'. The adjustment
may not always be possible at the extend specified by `-c|--context' when
the context lines are near the top lines or, respectively, the bottom lines
of the source text.

A caveat of using action options `-U|--unified-diff' along with `--dry-run':
they work OK together, but sometimes the outcome may not be as expected. That
is because `--dry-run' implies `--no-adjust-diff' -- 'pdiff' does not touch
source files when given `--dry-run' --, thus some diff hunks of the generated
unified diff may not have their context lines extended as much as prescribed
by `-c|--context'.

The unified diff hunk generated from the first plain diff hunk in the file
'test/json-type.pdiff' looks as follows (see that plain diff hunk listed at
the end of section 2):

  $ sed -n 28,36p test/json-type.pdiff|pdiff -d "$JSON_TYPE_HOME" -U
  --- lib/json-type.c
  +++ lib/json-type.c
  @@ -1398,7 +1398,7 @@ #1
       case json_type_ruler_obj_first_key_type:
           // meta-rule (2)
           v = node->args[0].val;
  -        if ((s = JSON_AST_NODE_AS_IF_CONST(v, string)) == NULL ||
  +        if (!(s = JSON_AST_NODE_AS_IF_CONST(v, string)) ||
               !json_type_ruler_lookup_obj_type(s->buf.ptr, &t))
               return JSON_TYPE_RULER_ERROR_POS(
                   v->pos, invalid_type_obj_type_not_obj_arr_list_dict);

Note that 'pdiff' added four context lines, two at the top and two at the
bottom of the unified diff hunk above. These four source lines had to be
read from the source file 'lib/json-type.c' itself, since the originating
plain diff hunk contains only three lines of source code.

Extending the series of command line examples in the previous subsection, the
command lines below patch the source tree of Json-Type according to the plain
diff file 'test/json-type.pdiff'. The following command line must be issued
within $PLAIN_DIFF_HOME directory:

  $ pdiff -d "$JSON_TYPE_HOME" -U test/json-type.pdiff|patch -d "$JSON_TYPE_HOME" -p0
  patching file lib/json-type.c

The next command needs to be invoked within $JSON_TYPE_HOME directory:

  $ pdiff -U "$PLAIN_DIFF_HOME"/test/json-type.pdiff|patch -p0
  patching file lib/json-type.c

3.d. Fix up diff hunk addresses of given plain diff text
--------------------------------------------------------

This subsection presents one more important use case of 'pdiff': that of action
option `-A|--addresses-diff'. When one has at hand a plain diff text file which
refers to source files of which content 'pdiff' finds out to be in disagreement
with hunk source lines within the given plain diff file, `-A|--addresses-diff'
may come to help fixing the offending diff file -- for to bring it in sync with
source file content.

More precisely, when source files change such that their line addresses within
the plain diff became invalid, `-A|--addresses-diff' can be employed to produce
a plain diff text which fixes up conveniently the invalidated diff addresses.

To exemplify the situation just described, let's make the valid plain diff file
'test/json-type.pdiff' be invalid by changing one of its hunk addresses (also
retaining its original content in the file 'test/json-type.pdiff~'):

  $ sed -i~ '28s/1400/1401/' test/json-type.pdiff

Now, 'pdiff' reacts as expected, rejecting the altered diff file:

  $ pdiff -d "$JSON_TYPE_HOME" -O test/json-type.pdiff
  pdiff: error: $PLAIN_DIFF_HOME/test/json-type.pdiff:29: \
  lib/json-type.c:1401: diff text not matching source text

Nice is that action option `-A|--addresses-diff' is able to handle the broken
file, finding out and producing a fix for the hunk address edited above:

  $ pdiff -d "$JSON_TYPE_HOME" -A test/json-type.pdiff
  >>> $PLAIN_DIFF_HOME/test/json-type.pdiff:28
  \>>> lib/json-type.c:1401
  <<<
  \>>> lib/json-type.c:1400
  >>>

Piping the plain diff text produced by `-A|--addresses-diff' into 'pdiff' once
more will produce an unified diff ready to be used by 'patch' for bringing the
broken file 'test/json-type.pdiff' back to its initial valid state:

  $ pdiff -d "$JSON_TYPE_HOME" -A test/json-type.pdiff|pdiff -U
  --- $PLAIN_DIFF_HOME/test/json-type.pdiff
  +++ $PLAIN_DIFF_HOME/test/json-type.pdiff
  @@ -25,7 +25,7 @@ #1
   
   ### lib/json-type.c:b15e0661bf664f99cbfe8156916921d1df074b12
   
  ->>> lib/json-type.c:1401
  +>>> lib/json-type.c:1400
           v = node->args[0].val;
           if ((s = JSON_AST_NODE_AS_IF_CONST(v, string)) == NULL ||
               !json_type_ruler_lookup_obj_type(s->buf.ptr, &t))

  $ pdiff -d "$JSON_TYPE_HOME" -A test/json-type.pdiff|pdiff -U|patch -p0
  patching file $PLAIN_DIFF_HOME/test/json-type.pdiff

  $ diff -q test/json-type.pdiff{~,} && echo OK
  OK

One careful reader may wonder whether the 'patch' pipeline command above is
actually correct, since the file 'test/json-type.pdiff' is both read from and
written into by processes belonging to the same pipe chain. Fact is that there
is nothing to worry about here because 'pdiff' functions as a "sponge": before
writing anything out, it reads in the entire content of the input file given.

For the remainder of the current subsection, let's make two text files that'll
illustrate further the behavior of action option `-A|--addresses-diff':

  $ cd /tmp

  $ cat <<EOF >foo.txt
  foo
  bar
  bar
  foo
  baz
  EOF

  $ cat <<EOF >foo.pdiff
  >>> foo.txt:5
  foo
  <<<
  faa
  >>>
  >>> foo.txt:6
  bar
  <<<
  baz
  >>>
  EOF

Now, applying 'pdiff' to 'foo.pdiff' gives an anticipated error message:

  $ pdiff -O foo.pdiff
  pdiff: error: foo.pdiff:2: foo.txt:5: diff text not matching source text

Action option `-A|--addresses-diff' has yet more of a useful behavior for the
case of a diff hunk source text occurring multiple times in its target source
text file. The output line of 'pdiff' below tells that one hunk source text in
'foo.pdiff' (that consisting of the sole text line 'foo') do appear in several
places of source file 'foo.txt':

  $ pdiff -A foo.pdiff
  pdiff: error: foo.pdiff:1: diff hunk source lines occur \
  multiple times in source text: 1, 4

In case of hunk source texts matching multiple times source text lines, 'pdiff'
can still generate the needed plain diff output. In such cases, 'pdiff' requires
the user to decide himself, interactively, for each instance of a multi-matching
diff hunk, which line addresses to choose:

  $ pdiff -A foo.pdiff --interactive
  foo.pdiff:1: foo.txt: source lines matching choice [1, 4]: 1
  >>> foo.pdiff:1
  \>>> foo.txt:5
  <<<
  \>>> foo.txt:1
  >>>
  foo.pdiff:6: foo.txt: source lines matching choice [2, 3]: 3
  >>> foo.pdiff:6
  \>>> foo.txt:6
  <<<
  \>>> foo.txt:3
  >>>

User interaction is working when 'pdiff' program's output is piped further, for
to be processed by another program (this is OK, even if the latter program is
'pdiff' itself):

  $ pdiff -A foo.pdiff --interactive|pdiff -U
  foo.pdiff:1: foo.txt: source lines matching choice [1, 4]: 1
  foo.pdiff:6: foo.txt: source lines matching choice [2, 3]: 3
  --- foo.pdiff
  +++ foo.pdiff
  @@ -1,4 +1,4 @@ #1
  ->>> foo.txt:5
  +>>> foo.txt:1
   foo
   <<<
   faa
  @@ -3,7 +3,7 @@ #2
   <<<
   faa
   >>>
  ->>> foo.txt:6
  +>>> foo.txt:3
   bar
   <<<
   baz

One pipeline command is sufficient -- as already seen above -- for correcting
the initially invalid diff file 'foo.pdiff':

  $ pdiff -A foo.pdiff --interactive|pdiff -U|patch -p0
  foo.pdiff:1: foo.txt: source lines matching choice [1, 4]: 1
  foo.pdiff:6: foo.txt: source lines matching choice [2, 3]: 3
  patching file foo.pdiff

  $ pdiff -O foo.pdiff && echo OK
  OK

By becoming valid, 'foo.pdiff' makes `-A|--addresses-diff' to produce no output:

  $ pdiff -A foo.pdiff

Invoking 'pdiff' with `-A|--addresses-diff' and `--no-hunk-output' on a valid
plain diff file -- while preventing the program from producing any plain diff
text -- instructs it to print on stdout all multi-matching hunks contained in
that given file:

  $ pdiff -A foo.pdiff --no-hunk-output
  foo.pdiff:1: diff hunk source lines occur multiple times in source text: 1, 4
  foo.pdiff:6: diff hunk source lines occur multiple times in source text: 2, 3

For an invalid plain diff file, a 'pdiff' command line like the above -- while
also not producing any plain diff text -- prints out all non-matching hunks too:

  $ sed '2s/foo/doo/' foo.pdiff|pdiff -A --no-hunk-output
  <stdin>:1: diff hunk source lines not found in source text
  <stdin>:6: diff hunk source lines occur multiple times in source text: 2, 3

  $ sed '2s/foo/baz/' foo.pdiff|pdiff -A --no-hunk-output
  <stdin>:1: diff hunk source lines occur somewhere else in source text: 5
  <stdin>:6: diff hunk source lines occur multiple times in source text: 2, 3


4. References
=============

[1] GNU Patch
    https://www.gnu.org/software/patch/

[2] GNU Diffutils
    https://www.gnu.org/software/diffutils/

[3] Git
    https://git-scm.com/

[4] Json-Type: JSON Push Parsing and Type Checking
    http://nongnu.org/json-type/

    For to use the plain diff file 'test/json-type.pdiff' on real source files,
    one needs to do two things. Firstly, clone Json-Type's remote git repo into
    a local directory of one's choice:

    $ cd ...

    $ git clone git://git.sv.nongnu.org/json-type

    Secondly, checkout the revision needed by 'test/json-type.pdiff':

    $ grep "$PLAIN_DIFF_HOME"/test/json-type.pdiff -e git -A1
    /// $ git log -1 --pretty=oneline
    /// a64f983a5fb892951bf0eaa39e7fa2e57c59f31a NEWS: new file.

    $ git checkout -q a64f983a5f && echo OK
    OK


About

Plain-Diff: Tool Consuming Plain Diffs, Generating Unified Diffs.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published