Skip to content

gitGNU/gnu_path-set

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

99 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

                                    Path-Set
                                    ~~~~~~~~
                        Stefan Vargyas, stvar@yahoo.com

                                  Sep  3, 2016


Table of Contents
-----------------

 0. Copyright
 1. The Path-Set Program
    a. The Initiating Problem of Path-Set
    b. Initial Choices Developing Path-Set
    c. The Dictionary Data Structures of Path-Set
 2. Making a 'path-set' Binary
 3. Using 'path-set-test' Shell Function
 4. Using 'path-set-test-grep' Shell Function
 5. Using 'path-set-test-graph' Shell Function
 6. Using 'path-set-test-percents' Shell Function
 7. Appendix: The GNU/Linux Programs Used
 8. Appendix: Links to Json-Type and Gnulib
 9. Appendix: Patching the 'graph' Program
10. References


0. Copyright
============

This program is GPL-licensed free software. Its author is Stefan Vargyas. You
can redistribute it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

You should have received a copy of the GNU General Public License along with
this program (look up for the file COPYING in the top directory of the source
tree). If not, see http://gnu.org/licenses/gpl.html.

The source tree of Path-Set includes source files from two other free software
projects: Json-Type [1] and Gnulib [2]. These source files were placed in two
separate directories, in 'lib/json-type' and, respectively, 'lib/gnulib'.

Each Json-Type and Gnulib source file in the 'lib' directory contains unaltered
the original copyright and license notices.

Path-Set includes also a couple of patch files of 'graph' program. This program
is part of GNU's 'plotutils' package version 2.6, which itself is licensed with
GPL v3. The file 'tools/plotutils/graph-legend.patch' is due to Daniel Llorens'
work: https://notabug.org/lloda/plotutils-graph-patches-lloda. The other two
patch files -- 'tools/plotutils/graph-{force-top,bottom}-label.patch' -- were
produced by the author of this package and are subject to the same copyright
and license provisions as the rest of Path-Set is.

Note that the set of patch files named above leave unaltered the copyright and
license notices of the source files in GNU 'plotutils' package.

The test results data -- the files 'tests/*/$SHA1NAMES' -- and the test plots
generated by the modified 'graph' program -- the files 'tests/*/*.png' -- are
subject to the same copyright and license provisions as the rest of Path-Set is. 


1. The Path-Set Program
=======================

1.a. The Initiating Problem of Path-Set
---------------------------------------

Path-Set's main vocation is that of a playground and test bed for a forest of
data structures and associated algorithms exploring parts of the solution space
of the following problem:

   File Path Names Set Compression: given a huge list of realistic
   file name paths (e.g. obtained from applying the 'find' command
   to existing file systems) implement an *online dictionary* data
   structure that stores internally its keys (i.e. the file path
   names) such a way that the total amount of memory claimed by the
   data structure from the free store to be *significantly less*
   than the amount of memory one would need to actually store all
   the keys as such.

The problem stated above deserve a few clarifications. First to say is that
the *online dictionary* data structure is to be understood as an in-memory
associative array of key-value pairs, where the keys are strings. The data
structure is providing (at least) three operations -- 'insert', 'lookup' and
'delete' -- freely to be mixed while using them. The free mixing of the basic
operations accounts for the *online* adjective used above: there is no separate
internal processing stage needed for to achieve the expected memory gain.

Second to remark is that a certain loss of speed of the 'insert' and 'lookup'
operations of this data structure is acceptable as a trade-off favoring the
desired memory consumption gain.

The third thing -- that of the expected gain in memory consumption -- is the
least clearly delineated. It is the main aim of Path-Set to experiment various
data structures obeying the other conditions set forth above for to measure the
memory consumptions implied.

Interesting to note is that a starting point for Path-Set was a conversation
thread on Coreutils's public discussion list from 2014 [5].

1.b. Initial Choices Developing Path-Set
----------------------------------------

In the literature of algorithms circumscribing the problem of online dictionaries
of strings there is well-known data structure -- the ternary search tree, TST, of
Bentley and Sedgewick [3] -- which is simple, beautiful and quite efficient (see
for example the accounts given in 'doc/trie-impl.txt' in Json-Type [1]).

Path-Set chooses the TSTs as its main theme: the dictionary data structure with
which it tackles the problem stated above is based on TSTs.

Path-Set incorporates for comparison purposes relating to the issues mentioned
by [5] and as a variant hash table implementation the Gnulib's [2] generic hash
table -- the files 'lib/hash.{h,c}'.

1.c. The Dictionary Data Structures of Path-Set
-----------------------------------------------

The online dictionaries of Path-Set come come in six base set structures, each
having specific variants depending on compile-time configuration parameters.

Path-Set defines a two-level naming scheme of its dictionaries. The inner naming
level is comprised of the 'set' type: either 'ternary-tree', 'linear-hash' or
'gnulib-hash'. The outer naming level is the 'struct' type: either 'plain-set'
or 'path-trie'. The table below shows which structure is implemented by which
source file:

  set/struct    plain-set          path-trie
  ------------  -----------------  ---------------
  ternary-tree  trie-plain-set.c   trie-path-set.c
  linear-hash   lhash-plain-set.c  lhash-path-set.c
  gnulib-hash   ghash-plain-set.c  ghash-path-set.c

The 'plain-set' structure types are plain sets -- the set themselves. The other
structure type -- the 'path-trie' type -- stands for a new kind of dictionary
data structure -- the 'Path Trie' -- and of which inner structure can be any of
the already mentioned 'set' type. Therefore there are three kinds of path tries:
ternary tree path tries, linear hash path tries and gnulib hash path tries.

The 'ternary-tree' data structure is precisely the implementation of Bentley
and Sedgewick's [3] ternary search trees (or tries). Important to note is that
professor Sedgewick made available C implementations for the iterative variants
of the 'insert' and 'lookup' operations of TSTs on his own site (see the links
attached to reference entry [3], specifically the two C source files, 'tstdemo.c'
and 'demo.c'). I used his C code as spring of inspiration for my implementation.

The 'linear-hash' data structure is an implementation of the linear probing hash
table of Knuth [4, pp. 526-528].

The 'gnulib-hash' data structure is a modified version of Gnulib's [2] generic
chained hash table implementation. The Gnulib's implementation was adapted to
suit the internal interfaces of Path-Set. Instrumentation code was added for
collecting statistics and a couple of dependencies were eliminated for to make
the code functioning outside Gnulib. The differences to the original source
files can be seen using the procedure described in section 8, Appendix: Links
to Json-Type and Gnulib, below.

The 'path-trie's are the ternary search tries of Bentley and Sedgewick [3] with
their symbols being path elements instead of characters. This is to say that the
keys of path tries are sequence of path elements and not strings (not sequence
of characters). However, the path elements are strings: e.g. the path elements
of path name '/tmp/foo/bar' are the strings '/', 'tmp/', 'foo/' and 'bar'. As a
result, the path tries are using an inner dictionary data structure mapping path
element strings to uniquely identifying pointers or 32-bit offsets. The 'insert'
and 'lookup' operations of path tries are mere variants of those of TSTs.

The six structures implemented by the source files named above are controlled by
the following '#define' configuration parameters:

  CONFIG_PATH_TRIE_ELEM_32BIT_OFFSETS
  -----------------------------------
  This parameter determines whether the linear hash path trie data
  structures uses 32-bit offsets instead of pointers for referencing
  the path elements within the structure.

  CONFIG_PATH_TRIE_NODE_32BIT_OFFSETS
  -----------------------------------
  This parameter determines whether the path trie data structures use
  32-bit offsets instead of pointers for referencing the path trie nodes
  within the structure.

  CONFIG_PATH_TRIE_UINT_HASH_IDENTOP
  ----------------------------------
  The path trie data structure uses internally a 'PATH_TRIE_UINT_HASH'
  hash function on unsigned integers. When this parameter is defined
  the hash function is made to be the identity operation.

  CONFIG_TRIE_PATH_SET_PRINT_SET
  ------------------------------
  The ternary tree path trie dictionary type (implemented in the source
  file 'trie-path-set.c') defines a 'print-set' operation if and only if
  this parameter was defined. The 'print-set' operation determines the
  structure of the inner set used by this dictionary type: when needing
  'print-set', the inner trie nodes have to use an additional pointer.

  CONFIG_VALUE_TYPE_SIZE
  ----------------------
  Each dictionary structure has a user-defined value type defined to be
  the type 'int${n}_t' where '${n}' is the value of this parameter.

All of the above 'CONFIG_*' parameters are by default not defined. The parameter
'CONFIG_VALUE_TYPE_SIZE' is an exception: it is defined as '__WORDSIZE'.


2. Making a 'path-set' Binary
=============================

The six source files -- '{trie,lhash,ghash}-{plain,path}-set.c' -- mentioned
in the previous subsection are compiled and linked into a 'path-set' binary.
By specifying to the compiler different sets of 'CONFIG_*' configuration
parameters, several kinds of 'path-set' binaries are to be obtained from it.

All source files of 'path-set' are written in modern C and were developed under
a GNU/Linux environment using the GCC C compiler v4.3.4 and an experimental
v4.8.0 build from sources (a version which is not yet supporting `-std=c11').

The header file 'config.h', located in the 'lib' directory of the source tree,
defines a few more compile-time 'CONFIG_*' parameters which determine the proper
functioning of 'path-set'. Before proceeding to build 'path-set', one must assure
himself that each of these parameters in 'config.h' have a meaningful value, in
accordance with  the target platform (machine and compiler) on which the program
is supposed to be built.

Important to note is that 'path-set' depends on it being build with GCC: it uses
a few notable extensions to the C language and to the C standard library GCC is
providing. Only two examples: the ubiquitous usage of the built-in function
'__builtin_types_compatible_p' within a wide range of preprocessor macros
and the extensively used C extension of the so called 'compound statement
enclosed in parentheses' expressions (also for the construction of a wide
array of useful C macros).

The 'path-set' binary is supposed to be build by GNU Make. I used version 3.81.
To start the build procedure one has to issue a 'make' command of the following
kind from within the 'src' directory of the source tree:

  $ cd src

  $ make ARGS...

The 'CONFIG_*' parameters are passed in to 'make' in one of the two forms below:

  'make' argument       'gcc' argument
  --------------------  ---------------------
  CONFIG+=$NAME         -DCONFIG_$NAME
  CONFIG+=$NAME=$VALUE  -DCONFIG_$NAME=$VALUE

$NAME is the name of the respective configuration parameter (stripped of the
common prefix 'CONFIG_') and $VALUE is the value assigned to the respective
parameter if it has such assignable value (the previous subsection indicates
that only 'CONFIG_VALUE_TYPE_SIZE' accepts assigning a value to it).

Alongside these 'CONFIG_*' parameters there are three more such configuration
parameters that control the way a 'path-set' binary is built:

  parameter  argument   default      'gcc' arguments
  ---------  ---------  -----------  ---------------
  32BIT      no|yes     no           -DBIT32 -m32
  DEBUG      no|yes     yes          -DDEBUG -g
  OPT        0|1|2|3|4  not defined  -DOPT=$n -O$n

The meanings of '32BIT', 'DEBUG' and 'OPT' are obvious taking into consideration
the arguments 'make' pass in to 'gcc'. Only note that having 'DEBUG=yes' on the
'make' invocation command line enables 'DEBUG' in all source files making up a
'path-set' binary. This brings in the binary quite a lot of runtime checking
code due to the many ASSERT macro calls scattered throughout the source files.

An empty invocation command line of 'make' produces the following output:

  $ make
  gcc -Wall -Wextra -std=gnu99 \
  -I. -I../lib/json-type -I../lib/gnulib \
  -DPROGRAM=path-set -DDEBUG -g \
  main.c common.c clocks.c set-intf.c pool-alloc.c \
  trie-path-set.c lhash-path-set.c ghash-path-set.c \
  trie-plain-set.c lhash-plain-set.c ghash-plain-set.c \
  ../lib/gnulib/hash.c -o path-set -lm

Upon building it, 'path-set' can be asked to produce brief help info on stdout:

  $ ./path-set --help
  usage: path-set [ACTION|OPTION]...
  where actions are specified as:
    -C|--print-config        do nothing but print out the config parameters
    -I|--print-sizes         do nothing but print out the node struct sizes
                               associated to the set structure specified by
                               options `-r|--struct-type' and `-s|--set-type'
    -N|--print-names         do nothing but print out the statistics parameter
                               names associated to the set structure specified
                               by options `-r|--struct-type' and `-s|--set-type'
    -L|--load-only           only load in the input (default)
    -P|--[print-]set         print out the path set
    -E|--[print-]elems       print out the path elements
  the options are:
    -p|--pool-size NUM[KM]   memory pool initial size (default 128K)
    -h|--hash-size NUM[KM]   by case, the linear hash table or the gnulib
                               hash table initial size (default 1K)
    -z|--rehash-size FLOAT   grow hash table size by specified factor when rehashing;
                               FLOAT must be strictly between 1 and 4; default is NAN,
                               i.e. use the table's default value
    -l|--rehash-load FLOAT   rehash hash table when its load factor reached given value;
                               FLOAT must be strictly between 0 and 1; default is NAN,
                               i.e. use the table's default value
    -r|--struct-type TYPE    use the specified structure type: any of l|plain-set
       --plain-set             or p[ath-trie]; the default is path-trie
       --path-trie
    -s|--[set-]type TYPE     use the specified set type: any of t[ernary-tree],
       --ternary-tree          l[inear]-hash or g[nulib-hash]; the default
       --linear-hash           is ternary-tree
       --gnulib-hash
    -t|--sep[arator]s STR    use the given chars as separators of path elements
                               in path tries; the default char set is '/.-'
    -S|--[no-][print-]stats  print out some statistics information or
                               otherwise do not (default not)
       --dump-options        print options and exit
    -V|--[no-]verbose        be verbose or otherwise do not (default not)
       --help-stats          print info about the statistics parameters and exit
    -v|--version             print version numbers and exit
    -?|--help                display this help info and exit

The program takes in from stdin file path names separated by newline characters.
Its output is determined by the action option specified. Only `--dump-options',
`--help-stats', `-v|--version' and `-?|--help' are ignoring any action option
given in the command line for to produce on stdout their own kind of output.

The meanings of almost all the command line options above are obvious (having in
mind the descriptions made in the previous section). Only note that the command
line option `-p|--pool-size' defines the initial amount of free store memory
claimed by the memory pool of 'path-set'. The memory pool is used for:

  (a) the nodes of the main structure, when this is of type 'plain-set';
  (b) the nodes of the main structure, when this is of type 'path-trie',
      compiled without 'CONFIG_PATH_TRIE_NODE_32BIT_OFFSETS' defined;
  (c) the path element nodes of 'path-trie's, excluding the 'linear-hash'
      ones compiled with 'CONFIG_PATH_TRIE_ELEM_32BIT_OFFSETS' defined;
  (d) the path strings in the case of '{linear,gnulib}-hash' 'plain-set's;
  (e) the path element strings of 'path-trie's, excluding the 'linear-hash'
      ones compiled with 'CONFIG_PATH_TRIE_ELEM_32BIT_OFFSETS' defined;
  (f) the path element strings in case of 'gnulib-hash' 'path-trie's.

If building 'path-set' with 'CONFIG_PATH_TRIE_NODE_32BIT_OFFSETS' defined, then
the help info would include an extra `-n|--node-size' command line option:

  $ ./path-set --help|grep -e --node-size
    -n|--node-size NUM[KM]   the path trie node table initial size (default 1K)

The option `-n|--node-size' is defining the initial size of the table used for
allocating contiguously path element nodes in the 'path-trie' structure. This
contiguous placement of path element nodes allows the usage of 32-bit offsets
instead of pointers for to refer to these nodes within the structure itself.

Given that the 'make' command line above contained no configuration parameter
specifications, the invocation of 'path-set' below displays the defaults the
configuration parameters received: 

  $ ./path-set -C
  CONFIG_PATH_TRIE_NODE_32BIT_OFFSETS: no
  CONFIG_PATH_TRIE_ELEM_32BIT_OFFSETS: no
  CONFIG_PATH_TRIE_UINT_HASH_IDENTOP:  no
  CONFIG_TRIE_PATH_SET_PRINT_SET:      no
  CONFIG_VALUE_TYPE_SIZE:              64
  32BIT:                               no
  DEBUG:                               yes
  OPT:                                 -

The parameter 'CONFIG_VALUE_TYPE_SIZE' above has the value '64' because I ran
'make' on an x86-64 GNU/Linux platform which defined '__WORDSIZE' to be 64.

A 'path-set' command line containing only `--dump-options' shows the defaults
the runtime options of the program received:

  $ ./path-set --dump-options
  action:      load-only
  struct-type: path-trie
  set-type:    ternary-tree
  pool-size:   128K
  hash-size:   1K
  rehash-size: nan
  rehash-load: nan
  sep-set:     /.-
  print-stats: no
  verbose:     no
  argc:        0

When the option `-n|--node-size' is enabled, its default value is shown to be:

  $ ./path-set --dump-options|grep -e node-size
  node-size:   1K


3. Using 'path-set-test' Shell Function
=======================================

Alongside 'path-set' binaries, Path-Set is comprised of two more programs -- two
shell functions defined in the 'bash' script 'src/commands.sh': 'path-set-test'
and 'path-set-test-grep'. Before one attempts to use these functions, the shell
script must be sourced in at the 'bash' command line prompt:

  $ shopt -s extglob

  $ cd src && . commands.sh

Important note: the 'bash' script 'src/commands.sh' needs the extended pattern
matching features to be enabled prior to sourcing it in. When 'extglob' isn't set
in the current 'bash' environment, loading 'src/commands.sh' fails with an error
message.

The main functionality of shell function 'path-set-test' is to run series of
tests on 'path-set' binaries which it builds according to the configuration
parameters it receives as input and to record in text files all the statistics
information obtained from 'path-set'.

However, the details of using 'path-set-test' are a bit more involved. The shell
function accepts the following command line options:

  $ funchelp -f commands.sh -c path-set-test --long-wrap-join=auto
  actions:
    -A|--averages[-results]=FILE   averages results
    -b|--[build-]binary            build binary
    -B|--[test-]base=NUM           test base (default)
    -D|--node-[struct-]def[=NAME]  print node struct def (`-D+' gets all defs;
                                     `-D?' prints out all names and exit)
    -N|--[print-]stat-names        print stat names
    -R|--[test-]results=FILE       test results
    -S|--stat[istic]s              collect statistics
    -t|--[run-]test=NUM            run test
    -T|--[run-]tests               run tests
    -E|--[run-]series              run series
  
  options:
    -C:|--config-NAME              define the named 'path-set' config parameter
        --config-NAME=VALUE          (`-C?' prints out all names and exit)
    -e|--[range-]expr=EXPR         tests range expression (function of 'x')
                                     (default: '10 ** x')
    -i|--input=FILE                input file (default: 'files.txt')
    -l|--[range-]limits=NUM[-NUM]  tests range limits (default: '1-6')
    -o|--output=FILE               output file when action is '-E' or '-T' ('+'
                                     means to generate a SHA1 name) (default: '-'
                                     aka stdout)
    -O:|--NAME                     define the named 'path-set' command line
        --NAME=VALUE                 option (`-O?' prints out all names and exit)
    -p:|--stat-NAME                account for the named 'path-set' stat
                                     parameter (default: 'total-node-mem') (`-p?'
                                     prints out all names and exit)
    -q|--quiet                     be quiet -- suppress normal output when action
                                     is '-E' or '-T' and output doesn't go to
                                     stdout
    -r|--repeat=NUM                number of times to repeat the 'path-set'
                                     command (default: 10)
    -s|--sleep=NUM                 microseconds to sleep between repeats
                                     (default: 100)
    -X|--exclude=NAME              exclude the named set type when action is '-E'
                                     (cumulative)

The input it expects and the output 'path-set-test' produces are controlled by
the last action option specified in the invoking command line. The `--config-*'
parameters are as follows:

  $ path-set-test -C?
  path-trie-32bit-offsets
  path-trie-node-32bit-offsets
  path-trie-elem-32bit-offsets
  path-trie-uint-hash-identop
  trie-path-set-print-set
  value-type-size=8|16|32|64
  32bit
  debug
  optimize

Note that the configuration parameter 'path-trie-32bit-offsets' is merely a
shortcut for the configuration parameters 'path-trie-{node,elem}-32bit-offsets':
specifying 'path-trie-32bit-offsets' on a command line is like having specified
both of the other two configuration parameters.

The 'path-set' command line options passed in from 'path-set-test' are:

  $ path-set-test -O?
  pool-size=NUM[KM]
  hash-size=NUM[KM]
  node-size=NUM[KM]
  rehash-size=FLOAT
  rehash-load=FLOAT
  struct-type=TYPE
  set-type=TYPE
  separators=STR
  plain-set
  path-trie
  ternary-tree
  linear-hash
  gnulib-hash

When these 'path-set' options are not specified in the 'path-set-test' command
line they receive the following default values:

  'path-set' option  default value
  -----------------  -------------
  --pool-size        128M
  --hash-size        2M
  --node-size        4M
  --rehash-size      NAN
  --rehash-load      NAN
  --struct-type      plain-set
  --set-type         gnulib-hash
  --separators       '/'

Important to note is that some of the command line options of 'path-set-test'
can be omitted from an invoking command line if defined by way of associated
environment variables:

  'path-set-test' option  environment variable
  ----------------------  -------------------------
  --input                 PATH_SET_TEST_INPUT
  --expr                  PATH_SET_TEST_EXPR
  --limits                PATH_SET_TEST_LIMITS
  --pool-size             PATH_SET_TEST_POOL_SIZE
  --hash-size             PATH_SET_TEST_HASH_SIZE
  --node-size             PATH_SET_TEST_NODE_SIZE
  --rehash-size           PATH_SET_TEST_REHASH_SIZE
  --rehash-load           PATH_SET_TEST_REHASH_LOAD
  --struct-type           PATH_SET_TEST_STRUCT_TYPE
  --set-type              PATH_SET_TEST_SET_TYPE
  --separators            PATH_SET_TEST_SEPARATORS

The action options of 'path-set-test' work as described below:

  ==========================================================================
  -A|--average-results=$FILE
  --------------------------------------------------------------------------
  * input: the file '$FILE' obtained from running the shell function with
    action option `-t|--run-test' and the name of a 'path-set' stat parameter
    given by option `-p|--stat-$NAME';
  * output: a list of the following values: the minimum 'min', the maximum
    'max', the average 'avg', the variance 'var' and the standard deviation
    'dev' of the 'avg' column of the table recorded in the file $FILE, table
     obtained from calling itself with action option `-R|--test-results' and
     option `-p|--stat-$NAME'.
  ==========================================================================
  -b|--build-binary
  --------------------------------------------------------------------------
  * input: all `--config-*' parameters;
  * output: a 'path-set' binary built according to the specified `--config-*'
    parameters; the parameters not given on the invoking command line are
    left undefined; there are two exceptions: the 'make' arguments '32BIT'
    and 'DEBUG' corresponding to `--config-32bit' and `--config-debug' get
    assigned the value 'no'.
  ==========================================================================
  -B|--test-base=$NUM
  --------------------------------------------------------------------------
  * input: a 'path-set' binary, the number '$NUM' and the options:
      `--input',
      `--pool-size',
      `--hash-size',
      `--node-size',
      `--rehash-size',
      `--rehash-load',
      `--struct-type',
      `--set-type', and
      `--separators';
  * output: the statistics info printed out by 'path-set' invoked with the
    command line option `-L|--load-only'; the input provided to 'path-set'
    consists of a random selection of '$NUM' lines from the file '$input'.
  ==========================================================================
  -D|--node-struct-def=$NAME
  --------------------------------------------------------------------------
  * input: all `--config-*' parameters and '$NAME' (which is allowed to be
    '{trie,lhash,ghash}-{plain,path}-set');
  * output: the C language declarations of the node structures defined by
    the source file '$NAME.c' (subsection 1.c above gives details about
    these source files).
  ==========================================================================
  -E|--run-series
  --------------------------------------------------------------------------
  * input: all `--config-*' parameters, and the options:
      `--input',
      `--expr',
      `--limits',
      `--pool-size',
      `--hash-size',
      `--node-size',
      `--rehash-size',
      `--rehash-load',
      `--separators',
      `--repeat', and
      `--sleep';
  * output: a specially formatted text describing the context and the output
    of running the shell function itself with action option `-T|--run-tests'
    for each combination of `--struct-type' and `--set-type'. Before calling
    itself for `-T|--run-tests' the shell function builds a 'path-set' binary
    corresponding to the `--config-*' parameters it received in the invoking
    command line. This is accomplished by calling itself with action option
    `-b|--build-binary'.
  ==========================================================================
  -N|--print-stat-names
  --------------------------------------------------------------------------
  * input: a 'path-set' binary and the options:
      `--struct-type' and
      `--set-type';
  * output: that output obtained from invoking 'path-set' with action option
    `-N|--print-names' and options `--$struct_type', and `--$set_type'.
  ==========================================================================
  -R|--test-results=$FILE
  --------------------------------------------------------------------------
  * input: the file '$FILE' obtained from running the shell function with
    action option `-t|--run-test' and the name of a 'path-set' stat parameter
    given by option `-p|--stat-$NAME';
  * output: a table aggregating the values of '$NAME' stat parameter, one
    line per each command 'path-set-test -t $NUM' found in the file '$FILE'.
  ==========================================================================
  -S|--statistics
  --------------------------------------------------------------------------
  * input: a 'path-set' binary, the options:
      `--struct-type' and
      `--set-type',
    and the output obtained from of a series of calls to itself with action
    option `-B|--test-base' and options `--$struct_type' and `--$set_type';
  * output: a formatted table collecting the set of statistics parameters and
    their values obtained from the series of calls to `-B|--test-base'.
  ==========================================================================
  -t|--run-test=$NUM
  --------------------------------------------------------------------------
  * input: a 'path-set' binary, the number '$NUM' and the options:
      `--input',
      `--pool-size',
      `--hash-size',
      `--node-size',
      `--rehash-size',
      `--rehash-load',
      `--struct-type',
      `--set-type',
      `--separators',
      `--repeat', and
      `--sleep';
  * output: the shell function calls itself '$repeat' times for action option
    `-B|--test-base=$NUM' (it sleeps '$sleep' microseconds between repeats);
    the output of the series of `-B|--test-base=$NUM' is piped in to a call
    to itself with action option `-S|--statistics'. The output obtained from
    this latter call to itself is that provided to the caller.    
  ==========================================================================
  -T|--run-tests
  --------------------------------------------------------------------------
  * input: a 'path-set' binary and the options:
      `--input',
      `--expr',
      `--limits',
      `--pool-size',
      `--hash-size',
      `--node-size',
      `--rehash-size',
      `--rehash-load',
      `--struct-type',
      `--set-type', and
      `--separators';
  * output: a specially formatted text describing the context and the output
    of running the shell function itself with action option `-t|--run-test=$n'
    for each number '$n' generated by expression '$expr' on the range defined
    by '$limits'. The text produced contains also the output of running the
    shell function with action option `-R|--test-results=$FILE' and option
    `-p|--stat-$NAME', for each stat parameter '$NAME' 'path-set' provides
    for the set structure defined by `--$struct_type' and `--$set_type'.
    '$FILE' is a temporary text file recording the context, the command line
    and the output of each of the calls to itself in the `-t|--run-test'
    series just issued.
  ==========================================================================

If in doubt about what a given 'path-set-test' command line does, or, otherwise,
when wanting to hack some behavior of the shell function, one has to only append
`-d' to the respective command line for to obtain the text of the 'bash' script
which gets generated and executed internally:

  $ path-set-test ... -d|less


4. Using 'path-set-test-grep' Shell Function
============================================

The shell function 'path-set-test-grep' processes the test result data --
the statistics information obtained from 'path-set-test' -- for to arrange
it in several convenient formats.

For using 'path-set-test-grep', one must be acquainted with the shell function
'path-set-test' first. The command line options of 'path-set-test-grep' are
as follows:

  $ funchelp -f commands.sh -c path-set-test-grep --long-wrap-join=auto
  actions:
    -E|--[grep-]series      grep series
    -T|--[grep-]tests       grep tests (default)
    -G|--raw-text           print raw grepped text
    -L|--raw-list           print raw text list
    -M|--[formatted-]list   print formatted text list
    -R|--raw-table          print raw text table
    -F|--[formatted-]table  print formatted table
  
  options:
    -e|--exclude-config     exclude all the 'config-*' parameters from output
    -H|--[with-]filename    print out the test file name for each match
    -I|--include=NAME       include the named option as column of the output
                              (cumulative) (`-I?' prints out all names and exit)
    -n|--not                negate the next 'path-set-test' command line option
    -O:|--NAME              define the named 'path-set-test' command line option
        --NAME=VALUE          (`-O?' prints out all names and exit)
    -p:|--stat-NAME         account for the named 'path-set' stat parameter
        --stat-NAME=VALUE     (cumulative); the value is of form NUM[.NUM][KMG]
                              or is equal to '+' in case NAME is of form '*-mem'
                              (`-p?' prints out all names and exit)
    -t|--[ran-]test=NUM     filter for the output of commands of form
                              'path-set-test --run-test=NUM' -- when action is
                              '-[FLMR]' (default: '1000000')
    -V|--verbose            be verbose when action is '-[FLMR]'
    -X|--exclude=NAME       exclude the named option as column of the output
                              (cumulative) (`-X?' prints out all names and exit)

The input of 'path-set-test-grep' consists of the SHA1-named test files which
were produced by 'path-set-test' in the current directory. The output of the
shell function depends on the last action option given on the invoking command
line. It is either a set of SHA1-named file names (in case of `-E|--grep-series'
and `-T|--grep-tests') or text obtained out these SHA1-named test files.

The output of 'path-set-test-grep' is also controlled by the options of type
`-O|--NAME|--NAME=VALUE'. The set of this kind of options given in the invoking
command line composes the grepping criteria based on which 'path-set-test-grep'
selects the SHA1-named test files for extracting data from.

The names to be included and/or excluded from the set of table column names or,
by case, list row names produced by the shell script are the following (options
`-I?' and `-X?' have identical outputs):

  $ path-set-test-grep -I?
  input
  expr
  limits
  pool-size
  hash-size
  node-size
  rehash-size
  rehash-load
  separators

Initially, the set of these names contains 'input' and 'separators'.

The action options of 'path-set-test-grep' produce output as described below:

  ==========================================================================
  -E|--grep-series
  --------------------------------------------------------------------------
  * output: the SHA1-named series files matching the criteria defined by
    options `-O|--NAME|--NAME=VALUE' present in the invoking command line.
  ==========================================================================
  -F|--formatted-table
  --------------------------------------------------------------------------
  * output: a formatted table of the one provided by `-R|--raw-table'.
  ==========================================================================
  -G|--raw-text
  --------------------------------------------------------------------------
  * output: the raw text filtered out of the set of test files matching the
    criteria defined by the options `-O|--NAME|--NAME=VALUE' given. The text
    is made of a series of name/value pairs separated by TAB chars, for each
    'path-set' configuration parameter and for each stat parameter specified
    in the command line by an option like `-p|--stat-NAME|--stat-NAME=VALUE',
    parameters that were recorded in the matching test files. The groups of
    name/value pairs that come out from different test files are separated
    by empty lines.
  ==========================================================================
  -L|--raw-list
  --------------------------------------------------------------------------
  * output: basically the one provided by `-G|--raw-text' with the list of
    name/value pairs being reordered and including new pairs for computed
    stat parameters (those that were specified by `-p|--stat-NAME=VALUE').
  ==========================================================================
  -M|--formatted-list
  --------------------------------------------------------------------------
  * output: a formatted version of the one provided by `-L|--raw-list'.
  ==========================================================================
  -R|--raw-table
  --------------------------------------------------------------------------
  * output: basically the one provided by `-G|--raw-text' with the list of
    name/value pairs being rearranged in table format (TAB chars separating
    the columns) and including new pairs for computed stat parameters (those
    that were specified by `-p|--stat-NAME=VALUE'). The table is sorted by
    the  columns corresponding to stat parameters, if such parameters were
    specified in the invoking command line.
  ==========================================================================
  -T|--grep-tests
  --------------------------------------------------------------------------
  * output: the SHA1-named tests files matching the criteria defined by the
    options `-O|--NAME|--NAME=VALUE' present in the invoking command line.
  ==========================================================================

If 'path-set-test-grep' is grepping test files -- i.e. when its action option
is neither `-E|--grep-series' nor `-T|--grep-tests' -- and it was issued with
a command line containing statistics parameter options of form `--stat-NAME'
or `--stat-NAME=VALUE' (or the equivalents `-p NAME' or `-p NAME=VALUE'), then
the shell function will include in its output a column named 'NAME', for each
such parameter specified. The values produced on this column are those recorded
in the input test file, on column 'avg' of the output produced by the command
'path-set-test -t $NUM'. Here '$NUM' is the argument of option `-t|--ran-test'
or, otherwise, is the value '1000000', if this option was not specified.

Furthermore, for each statistics parameter options of form `--stat-NAME=VALUE',
where 'VALUE' is of form 'NUM[.NUM][KMG]', the output of 'path-set-test-grep'
will also include a column named 'VALUE' to the right of column named 'NAME'.
On each line of output, the value on the column 'VALUE' is the percentage of
*gain* relative to 'VALUE' of the corresponding value on the column 'NAME'.

For options of form `--stat-NAME=+', where 'NAME' ends with '-mem', the ouput of
shell function 'path-set-test-grep' will include two columns beside each column
named 'NAME'. The names of these two columns are of form 'NUM[.NUM][KMG]'. The
first name is the average value of the 'line-mem' stat parameter of 'path-set'
relative to the same grepping criteria that the invoking command line defined.
The second name is the average value of the 'NAME' stat parameter of 'path-set'
relative to the grepping criteria that the invoking command line defined, but
restricted to struct type `--plain-set' and set type `--gnulib-hash'. These two
extra columns attached to column 'NAME' contain percentage values as described
above for the case of statistics parameter options `--stat-NAME=VALUE', where
'VALUE' is of form 'NUM[.NUM][KMG]'. Thus a command line like:

  $ path-set-test-grep GREPPING_ACTION GREPPING_CRITERIA --stat-NAME=+

where 'NAME' is of form '*-mem', will produce the same output as the command:

  $ path-set-test-grep GREPPING_ACTION GREPPING_CRITERIA --stat-NAME={VALUE1,VALUE2}

where 'VALUE1' is the average value on the column 'line-mem' of the output of:

  $ path-set-test-grep -Re GREPPING_CRITERIA --stat-line-mem

and 'VALUE2' is the average value on the column 'NAME' of the output of:

  $ path-set-test-grep -Re GREPPING_CRITERIA --stat-NAME --plain-set --gnulib-hash

For the options of same form `--stat-NAME=+', but where NAME ends with '-time',
the output of shell function 'path-set-test-grep' will include only one column
next to the one named 'NAME'. The name of this column is of form 'NUM[.NUM]':
it is the average of the 'NAME' stat parameter values of 'path-set' relative to
the grepping criteria that the invoking commmand line defined, criteria further
restricted to struct type `--plain-set' and set type `--gnulib-hash'. Therefore,
a command line like:  

  $ path-set-test-grep GREPPING_ACTION GREPPING_CRITERIA --stat-NAME=+

where 'NAME' is of form '*-time', will produce the same output as the command:

  $ path-set-test-grep GREPPING_ACTION GREPPING_CRITERIA --stat-NAME=VALUE

where 'VALUE' is the average value on the column 'NAME' of the output of:

  $ path-set-test-grep -Re GREPPING_CRITERIA --stat-NAME --plain-set --gnulib-hash

As for the case of shell function 'path-set-test', appending the option `-d' to
a 'path-set-test-grep' command line produces the text of the 'bash' script which
gets generated and executed internally:

  $ path-set-test-grep ... -d|less


5. Using 'path-set-test-graph' Shell Function
=============================================

Path-Set contains two more programs alongside the three main ones presented so
far: the shell functions 'path-set-test-graph' and 'path-set-test-percents' --
also defined in the 'bash' file 'src/commands.sh'.

The shell function 'path-set-test-graph' is a convenient wrapper for the useful
tool GNU 'plotutils' package provides: that is the 'graph' program.

First thing to note about 'graph' is that the well-functioning of the two shell
functions 'path-set-test-{graph,percents}' requires that the 'graph' binary be
built from patched sources as described below in section 9. As a result of this
patching, 'graph' offers an extended set of features connected to the following
newly added command line options:

      option & argument      short description
  -------------------------  -----------------
  `--legend TEXT'            render out legend text
  `--place-legend POS'       set position of legend text
  `--legend-font-size SIZE'  set font size of legend text
  `--force-top-label'        force rendering out top label (i.e. title)
  `--bottom-label TEXT'      render out bottom label (i.e. note) text
  `--note-font-name NAME'    set font name of bottom label
  `--note-font-size SIZE'    set font size of bottom label

The command line options of 'path-set-test-graph' are as shown below:

  $ funchelp -f commands.sh -c path-set-test-graph --long-wrap-join=auto
  actions:
    -O|--options            print out options
    -F|--features           query optional features: `--[place-]legend',
                              `--legend-font-size', `--force-top-label',
                              `--bottom-label' and `--note-font-{name,size}'
    -G|--graph              generate graph (default)
  
  options:
    -g:|--graph-[no-]NAME   define given graph parameter (`-g?' prints out all
        --graph-NAME=VALUE    names and exit)
    -h|--[no-]header        use or don't use input table header for setting plot
                              names (default do not)

With the exception of action options `-O|--options' and `-F|--features', the
shell function reads in from 'stdin' a multi-column text table comprising a set
of datasets to be passed on to 'graph'. Note that 'path-set-test-graph' expects
that the first row of its input table to contain column names (i.e. sequences
of non-white-space characters separated by white-spaces). In case of the former
two action options, the shell function does not read anything from 'stdin'.

The action options of 'path-set-test-graph' produce output as described below:

  ==========================================================================
  -O|--options
  --------------------------------------------------------------------------
  * output: a list of 'graph' command line options along with corresponding
    arguments (if there are any) which get passed from 'path-set-test-graph'
    to 'graph'.
  ==========================================================================
  -F|--features
  --------------------------------------------------------------------------
  * output: a possibly empty subset of the set of 'graph' command line option
    names listed above by the output of 'funchelp', that are supported by the
    'graph' binary currently installed.    
  ==========================================================================
  -G|--graph
  --------------------------------------------------------------------------
  * output: a PNG file containing the plot generated by 'graph' based on the
    set of input datasets provided on 'stdin' in the form of a text table.
  ==========================================================================

Specifying any of the `-g:|--graph-[no-]NAME|--graph-NAME=VALUE' options in a
command line implies that the shell function will switch its action option to
`-G|--graph', if after the last of such options doesn't follow any other action
option.

The set of parameters 'path-set-test-graph' passes in to 'graph' is as follows:

  $ path-set-test-graph -g?
  bitmap-size
  bottom-label
  font-name
  font-size
  force-top-label
  grid-style
  height-of-plot
  legend
  legend-font-size
  line-mode
  line-width
  note-font-name
  note-font-size
  output-format
  page-size
  place-legend
  symbol
  tick-size
  title-font-size
  toggle-frame-on-top
  toggle-log-x-axis
  toggle-log-y-axis
  toggle-use-color
  toggle-x-axis-end
  toggle-y-axis-end
  top-label
  width-of-plot
  x-label
  y-label

The built-in defaults of 'path-set-test-graph' -- which are passed to 'graph',
if not set via the corresponding `--graph-[no-]NAME' or `--graph-NAME=VALUE'
command line options -- are listed below:

  $ path-set-test-graph -O
  --toggle-use-color
  --toggle-frame-on-top
  --output-format png
  --page-size a4
  --symbol 4 .02
  --font-name HersheySans-Bold
  --font-size .038
  --grid-style 4
  --tick-size -.01
  --title-font-size .05
  --bitmap-size 700x700
  --height-of-plot .7
  --width-of-plot .7
  --line-width .001
  --x-label path-names
  --y-label percents

Similarly to the other shell functions of 'src/commands.sh', the option `-d'
inserted in a 'path-set-test-graph' command line prints out the text of the
'bash' script generated and executed internally:

  $ path-set-test-graph ... -d|less


6. Using 'path-set-test-percents' Shell Function
================================================

The 'path-set-test-percents' shell function's main purpose is that of producing
comparative percents datasets of named statistics parameters between series of
runs of 'path-set' binary with varying number of input path names, as recorded
in SHA1-named test result data files.

Briefly, the command line options of 'path-set-test-percents' are the following:

  $ funchelp -f commands.sh -c path-set-test-percents --long-wrap-join=auto
  actions:
    -N:|--name=table            generate name
        --name=graph
    -S|--source[=FILE]          generate source table
    -T|--table                  generate complete table (default)
    -G|--graph                  generate graph
  
  options:
    -f|--overwrite[-output]     force overwrite existing output file when action
                                  is '-T' or '-G'
    -H|--no-header              do not output header row when action is '-T'
    -i|--input=FILE             take input SHA1-named files from given FILE when
                                  action is '-T' ('+' means to grep for them
                                  using 'path-set-test-grep') (default: '-')
    -o|--output=FILE            output file name when action is '-T' or '-G' ('+'
                                  means to generate a name for it) (default: '-')
    -l|--legend-place=STR       legend place when action is '-G':
                                  '{top,bottom}-{left,right}' (default:
                                  'top-left')
    -p:|--stat-NAME             account for the given stat parameter (default:
                                  'total-mem') (`-p?' prints out all names and
                                  exit)
    -r:|--ref-series=SHA1NAME   reference series file name or otherwise compute
        --ref-series=plain-set    series relative to 'gnulib-hash' of type
        --ref-series=set-type     'plain-set' or given by `-s|--set-type'
                                  (default: 'set-type')
    -R|--exclude-ref-series     exclude reference series column from output when
                                  action is '-T'
    -s:|--set-type=NAME         grep for named set type (default: 'path-trie')
        --plain-set
        --path-trie
    -z|--image-size=NUM         size of generated PNG image when action is '-G'
                                  (default: 500)

However, the command lines invoking 'path-set-test-percents' have the following
actual form:

  $ path-set-test-percents OPTION... [-- GREPPING_OPTION...]

where the grepping options are comprising a set of options which get passed on
to an inner call of 'path-set-test-grep' as explained further below. The set of
grepping options are required and used only when the action option directing the
operation of 'path-set-test-percents' is not `-S|--source'. 

  ==========================================================================
  -G|--graph
  --------------------------------------------------------------------------
  * input: exactly as in the case of action option `-T|--table' paired with
    the option `-i|--input=+'.
  * output: a PNG file, when given `-o|--output=-' or when `-o|--output' is
    not given at all, or a PNG file name, if provided with `-o|--output=+'.
    In this latter case, the file name produced on output is generated from
    the set of specified grepping options (see the entry `-N|--name=graph').

    The PNG file is produced by 'path-set-test-graph' to which is provided
    the text table obtained from invoking 'path-set-test-percents' with the
    action option `-T|--table' and the option `-i|--input=+' along with all
    grepping options given in the initiating command line.
  ==========================================================================
  -N|--name=$ACTION, where '$ACTION' is either 'table' or 'graph'
  --------------------------------------------------------------------------
  * input: none.
  * output: a file name of form '$BASE.txt', when '$ACTION' is 'table', or,
    '$BASE.png', when '$ACTION' is 'graph'. '$BASE' is generated from the
    set of grepping options given in the invoking command line.
  ==========================================================================
  -S|--source=FILE
  --------------------------------------------------------------------------
  * input: a test result (SHA1-named) text file.
  * output: a text table made of two columns of numerical data obtained out
    of file '$FILE'. The first column contains in a series the number of path
    names fed in to 'path-set' in the sequence of tests recorded by '$FILE'.
    The second column contains in a series (running parallel to the previous
    one) the *average* values of the stat parameter named by `-p|--stat-NAME' 
    -- which is grepped from the same sequence of tests recorded by '$FILE'.
  ==========================================================================
  -T|--table
  --------------------------------------------------------------------------
  * input: taken from a set of test result SHA1-named files of which names
    are given in the input file '$FILE', if the argument of `-i|--input' is
    '$FILE' ('-' is 'stdin'). Otherwise, if the argument of `-i|--input' is
    '+', then the SHA1-named input files are computed by a grepping command
    like:

  $ path-set-test-grep -T --$SET_TYPE GREPPING_OPTION...

    where '$SET_TYPE' is either the argument of option `-s|--set-type', or,
    is 'path-trie', when the option isn't given in the invoking command line.

  * output: a text table, when given `-o|--output=-' or when `-o|--output' is
    not given at all, or a table file name, if provided with `-o|--output=+'.
    In this latter case, the file name produced on output is generated from
    the set of specified grepping options (see the entry `-N|--name=table').

    The text table is made of several columns of numerical data obtained out
    of the input SHA1-named test result files. This aggregated text table is
    obtained by joining in the text tables resulted from invoking the shell
    function with options `-S|--source=$FILE' and `-p|--stat-$NAME', for each
    file '$FILE' in the mentioned set of SHA1-named files. The join is based
    on the first column of each of the inner text tables, 'path-names'. The
    'n+1'-th output column contains the percents of gain of the value on the
    second column of the 'n'-th SHA1-named file. The data rows produced are
    preceded by a header row containing on the first column 'path-names' and
    on the 'n+1'-th column the first 8 characters of the name of the 'n'-th
    SHA1-named file.

    The percents of gain computed on each data cell of the output table are
    taken with respect to the argument given to option `-r|--ref-series', if
    any such option was specified in the invoking command line (if not, then
    is considered given as 'set-type'). Before computing these percentages
    for each cell, the output table contains raw values only, extracted from
    SHA1-named files. The raw values are transformed to percentage values by
    the procedure explained below. 

    When given `-r|--ref-series=$SHA1NAME', the output table is adjoined with
    the second column of the table produced upon invoking the shell function
    with `-S|--source=$SHA1NAME' and `-p|--stat-$NAME'.

    When given `-r|--ref-series=plain-set', the output table is adjoined with
    the second column of the table produced upon invoking the shell function
    with `-S|--source=$SHA1NAME' and `-p|--stat-$NAME', where '$SHA1NAME' is
    the name produced by a command like:

  $ path-set-test-grep -T \
  --plain-set --gnulib-hash --rehash-size=1.4142 GREPPING_OPTION...

    When given `-r|--ref-series=set-type', the output table is adjoined with
    one column, precisely as in the case of `-r|--ref-series=plain-set', but
    replacing 'plain-set' with '$SET_TYPE', the argument of `-s|--set-type'.
    Thus the SHA1-named file used for adjoining the additional column to the
    output table is obtained from a command like:

  $ path-set-test-grep -T \
  --$SET_TYPE --gnulib-hash --rehash-size=1.4142 GREPPING_OPTION...

    In any case, upon the adjoining of this latter column -- let it be named
    '$adjoined' --, the percentages mentioned above are computed as follows:

      for (each data row $i) {
          for (each column $j) {
              // the cell <$i,$j> is set to the value of
              // the percents of gain of the cell itself
              // relative to the cell <$i,$adjoined>
              cell <$i,$j> = percents(
                  cell <$i,$adjoined>,
                  cell <$i,$j>)
          }
      }

    where the 'percents' function is simply:

      percents(t, v)
      { return t ? (t - v) / t * 100 : NAN }

    The column '$adjoined' is then removed from the output table in the cases
    of options `-r|--ref-series=$SHA1NAME' and of `-r|--ref-series=plain-set',
    only if the invoking command line specified `-R|--exclude-ref-series'.
  ==========================================================================

Similarly to the shell functions above, inserting properly the option `-d' in a
'path-set-test-percents' command line tells the shell function to print out the
text of the 'bash' script generated and executed internally:

  $ path-set-test-percents ... -d [-- GREPPING_OPTION...]|less


7. Appendix: The GNU/Linux Programs Used
========================================

The programs used by the shell functions in file 'src/commands.sh' are:

  * GNU awk 3.1.8
  * GNU bash 3.2.51
  * GNU coreutils 8.12 (cp, date, echo, env, head, mktemp, rm, sha1sum, shuf,
      sleep, sort, stdbuf, tee, tr, uniq)
  * GNU diffutils 2.8.7 (cmp, diff)
  * GNU findutils 4.4.0 (find, xargs)
  * GNU gcc 4.3.4
  * GNU grep 2.7
  * GNU make 3.81
  * GNU plotutils 2.6 (graph -- patched as shown in section 9)
  * GNU sed 4.1.5
  * git 1.6.0
  * ImageMagick 6.4.3 (convert)
  * super-sed 3.62

Note that 'diff' and 'git' are used only by the shell function 'git-repo-diff'.

The patched 'graph' is used only by shell functions 'path-set-test-graph' and
'path-set-test-percents'. The shell function 'path-set-test-percents' is using
'graph' only indirectly -- only if invoked with action option `-G|--graph' --,
via a call to 'path-set-test-graph'.


8. Appendix: Links to Json-Type and Gnulib
==========================================

The table below lists the SHA1 hashes of the original files brought in 'lib'
directory from Json-Type's and Gnulib's 'git' repositories:

  SHA1 hash                                 file name
  ----------------------------------------  -------------------------
  bd82a90d2881d2aa588ea72855f788cf8904e9be  json-type/config.h
  d12bf27027475b25022bef19c5ac2937ac66b867  json-type/int-traits.h
  53e49832ca9604f9d53774c4d92f20e638eddfaa  json-type/ptr-traits.h
  9a05940bb563a5630d4e221dfeebf463f2c9bd8d  json-type/stack-impl.h
  ----------------------------------------  -------------------------
  f9dede0a8336dc1a328c5a56f9803f22becd4502  gnulib/hash.c
  b61bee63ba11bbc59a0795c76e009d14f4cf1c63  gnulib/hash.h
  44f16441c79a1e4df74fe379042d291fc78a61a4  gnulib/xalloc-oversized.h
  ----------------------------------------  -------------------------

The commands below show which of these source files are modified versions of
the original ones. Note that $JSON_TYPE_HOME and $GNULIB_HOME are the paths
to the directories hosting Json-Type's and, respectively, Gnulib's local 'git'
repositories. (The URLs of the respective public 'git' repositories are listed
in the References section below.)

  $ shopt -s extglob

  $ . src/commands.sh

  $ git-repo-diff -g $JSON_TYPE_HOME --json-type|lsdiff -s
  ! int-traits.h

  $ git-repo-diff -g $GNULIB_HOME --gnulib|lsdiff -s
  ! hash.c
  ! hash.h

Note that each of the commands above expects to be issued from within the top
directory of Path-Set source tree. The command line options of shell function
'git-repo-diff' are as follows:

  $ funchelp -f commands.sh -c git-repo-diff --long-wrap-join=auto
    -b|--ignore-space-change  pass '-b|--ignore-space-change' to diff
    -g|--git-dir=DIR          'git' repo directory (default: '$HOME/$target')
    -s|--sha1-hashes=FILE     sha1 hashes file name ('-' means stdin, the default
                                is 'README')
    -t:|--target=NAME         target name: 'json-type' or 'gnulib' (default:
        --json-type             'json-type')
        --gnulib
    -u|--unified=NUM          pass '-u|--unified=NUM' to diff


9. Appendix: Patching the 'graph' Program
=========================================

Section 5, Using 'path-set-test-graph' Shell Function, mentioned that for the
shell functions 'path-set-test-{graph,percents}' to be able to generate plots
base on the statistics data recorded in SHA1-named text files, one must have
readily available a specifically patched version of the 'graph' program of GNU
'plotutils' package v2.6.

This section presents the procedure of making up the respective patched 'graph'. 

Since 'graph' has to be built from sources, first thing to do is to obtain the
GNU 'plotutils' tarball v2.6 in a directory of one's choice:

  $ wget ftp://ftp.gnu.org/gnu/plotutils/plotutils-2.6.tar.gz

  $ sha1sum -b plotutils-2.6.tar.gz
  7921301d9dfe8991e3df2829bd733df6b2a70838 *plotutils-2.6.tar.gz

Note that one could also verify the signature of the received package by:

  $ wget ftp://ftp.gnu.org/gnu/plotutils/plotutils-2.6.tar.gz.sig

  $ gpg --verify plotutils-2.6.tar.gz.sig 

A description of the procedure of verifying a signature can be read from GnuPG
FAQ, https://www.gnupg.org/faq/gnupg-faq.html#how_do_i_verify_signed_packages.

Next step is to apply the patches in 'tools/plotutils' on the obtained source
tree, in the order shown:

  $ tar -zxf plotutils-2.6.tar.gz

  $ cd plotutils-2.6

  $ export PATH_SET_HOME=...

  $ patch -p0 -i $PATH_SET_HOME/tools/plotutils/graph-legend.patch

  $ patch -p0 -i $PATH_SET_HOME/tools/plotutils/graph-force-top-label.patch

  $ patch -p0 -i $PATH_SET_HOME/tools/plotutils/graph-bottom-label.patch

The environment variable '$PATH_SET_HOME' is set to the path to the directory
were the source tree of Path-Set is located.

Once the GNU 'plotutils' source tree was patched, the accustomed procedure of
building it can right away be employed:

  $ ./configure ...

  $ make

  $ make check

  $ make install

  $ make distclean

A special mention has to be made with regards of GNU 'plotutils' being build
with PNG support enabled. This is so for 'graph' to be able to generate plots
in the form of PNG files. Therefore, the system on which the above procedure
takes place has to have installed the development package of 'libpng'. This
package includes not only the 'libpng.so' shared library binary, but also the
the header files matching precisely the respective binary library.

One more thing related to building 'graph': if one's 'libpng' installation is
of version 1.4.0 or higher, prior to start building GNU 'plotutils', he will
have to manually patch the source tree once more as shown by Ludovic Courtès'
note on https://lists.gnu.org/archive/html/bug-plotutils/2013-07/msg00000.html.

Upon building the package successfully, one has to make sure that referencing
the 'graph' program conducts to the freshly built one: the 'path-set-test-graph'
command below must produce exactly the output shown:

  $ path-set-test-graph -F
  bottom-label
  force-top-label
  note-font-name
  note-font-size
  legend
  place-legend
  legend-font-size

Finally, let mention the commands producing 'tools/plotutils/graph-legend.patch'
from Daniel Llorens' public git repository:

  $ git clone https://notabug.org/lloda/plotutils-graph-patches-lloda.git

  $ git --git-dir plotutils-graph-patches-lloda/.git \
    diff 798e423..4411194 --no-prefix -b|
    filterdiff -i configure.ac -i '*.[hc]' \
    > graph-legend.patch


10. References
==============

Free Software:

[1] Json-Type: JSON Push Parsing and Type Checking
    http://www.nongnu.org/json-type/

    git://git.sv.nongnu.org/json-type

[2] Gnulib - The GNU Portability Library
    http://www.gnu.org/software/gnulib/

    git://git.sv.gnu.org/gnulib

Papers & Books:

[3] Jon L. Bentley, Robert Sedgewick:
    Fast Algorithms for Sorting and Searching Strings
    8th Symposium on Discrete Algorithms, 1997, 360-369.

    http://www.cs.princeton.edu/~rs/strings/paper.pdf

    http://www.cs.princeton.edu/~rs/strings/index.html
    http://www.cs.princeton.edu/~rs/strings/tstdemo.c
    http://www.cs.princeton.edu/~rs/strings/demo.c

[4] Donald E. Knuth
    The Art of Computer Programming,
    Vol. 3, Sorting and Searching, 2nd ed.
    Addison Wesley Longman, 1998, 780 pages
    ISBN 978-0-201-89685-0

Mailing Lists:

[5] Rasmus Borup Hansen: My experience with using cp to copy a lot of files
      (432 millions, 39 TB)
    Coreutils Archives, 11 Aug 2014

    https://lists.gnu.org/archive/html/coreutils/2014-08/msg00012.html;

    Pádraig Brady: Re: My experience with using cp to copy a lot of files
      (432 millions, 39 TB)
    Coreutils Archives, 14 Sep 2014

    http://lists.gnu.org/archive/html/coreutils/2014-09/msg00020.html:

      BTW there is a lot of redundancy in all the stored paths,
      which could be improved by using a prefix compression method.


About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published