Merge pull request #12710 from milljm/fix-csvdiff-12687

Fix csvdiff script
idaholab · Jan 25, 2019 · 7f694b2
2 parents 0892347 + 9ed7488
commit 7f694b2
Showing 38 changed files with 576 additions and 128 deletions.
diff --git a/modules/doc/content/python/CSVDiff.md b/modules/doc/content/python/CSVDiff.md
@@ -0,0 +1,203 @@
+# CSVDiff Tool
+
+The supplied CSVDiff tool (csvdiff.py) gives the TestHarness the ability to compare (diff) comma-separated value (CSV) files.
+
+## Basic Usage
+
+In its simplest form, comparing two CSV files (a and b) requires the following syntax:
+
+```
+csvdiff.py a b
+```
+
+If the two files are the same, the program says so and exits with return code 0. Example:
+
+```
+> echo -e "x\n0" | tee a b
+> moose/scripts/csvdiff.py a b
+Files are the same
+
+> echo $?
+0
+```
+
+When a difference is detected, the program reports it and exits with a non-zero return code. Example:
+
+```
+> echo -e "x\n0.000001" > a
+> echo -e "x\n0.000001000005501" > b
+
+> moose/scripts/csvdiff.py a b
+In file b: The values in column "x" don't match @ t0
+    relative diff:   0.000001 ~ 0.000001 = 0.000006 (5.501e-06)
+
+> echo $?
+1
+```
+
+## Extended Usage
+
+The CSVDiff tool can test specific fields with specific error tolerances. It can also skip the comparison
+when the value being tested is below a certain threshold (the floor, or zero). These features are available as direct arguments to csvdiff.py, or through
+a comparison file.
+
+## Syntax
+
+```
+./csvdiff.py CSV_FILE CSV_FILE [additional arguments]
+```
+
+!alert note
+Always specify the two files you wish to compare before any other options.
+
+| Arguments | Value | Help |
+| :- | :- | :- |
+| `--summary` or `-s` | *csv file* | Create a comparison file based on *csv file* |
+| `--comparison-file` or `-c` | *comparison file* | Use the specified comparison file when comparing |
+| `--ignore-fields` | *str* | A space-separated list of fields to ignore when comparing |
+| `--diff-fields` | *str* | A space-separated list of fields to include when comparing |
+| `--abs-zero` | *str float* | A scientific notation or float value representing zero (the floor). Any value lower than this amount will be considered zero. (default: 1e-11) |
+| `--relative-tolerance` | *str float* | A float or scientific notation value representing the acceptable tolerance between two compared values. Any float comparison which falls within this tolerance will be considered the same number. (default: 5.5e-6) |
+| `--custom-columns` | *str* | Space-separated list of custom field IDs to compare |
+| `--custom-abs-zero` | *str float* | Space-separated list of scientific notation or float values for absolute zero, corresponding to the values in `--custom-columns` |
+| `--custom-rel-err` | *str float* | Space-separated list of scientific notation or float values for relative tolerance, corresponding to the values in `--custom-columns` |
+
+## Comparison File
+
+Using a comparison file is ideal when you need to adjust a complex set of fields and tolerances, which would otherwise make for a very long and confusing command line. The CSVDiff tool can generate this comparison
+file, which can then be used to set the above arguments quickly.
+
+To generate a comparison file, run the CSVDiff tool with the `--summary csv_file` argument. In the following example, we use `echo` to create a simple CSV file. We then instruct csvdiff.py to create
+a comparison file from our CSV file, and redirect the output to a file named `a.cmp`:
+
+```
+> echo -e "x\n0" > a
+> moose/scripts/csvdiff.py --summary a > a.cmp
+> cat a.cmp
+TIME STEPS relative 1 floor 0  # min: 0 @ t0  max: 0 @ t0
+
+GLOBAL VARIABLES relative 5.5e-06 floor 1e-11
+    x                    # min: 0.000e+00 @ t0          max: 0.000e+00 @ t0
+```
+
+You can then edit this file and modify key sections to control tolerances, or instruct csvdiff to ignore an entire field altogether.
+
+The 'TIME STEPS' field is a special header. It is currently unused and is reserved for future capabilities of the CSVDiff tool.
+
+The 'GLOBAL VARIABLES' field allows you to change the tolerance for every field present in the CSV file. There are two key parameters: relative and floor. You can modify one or both of the
+values immediately following each parameter to suit your needs. You can also modify the tolerances for each individual field. In the case of our example, 'x' is the only field in our CSV
+file. To adjust only that field's tolerance values, we can add a parameter directly after the 'x' label:
+
+```
+TIME STEPS relative 1 floor 0  # min: 0 @ t0  max: 0 @ t0
+
+GLOBAL VARIABLES relative 5.5e-06 floor 1e-11
+    x relative 5.5e-06   # min: 0.000e+00 @ t0          max: 0.000e+00 @ t0
+```
+
+The above change does nothing, as the relative error value we added is the same as the global relative error value. You can also add both relative and floor tolerances to this line, as well as comments and other logical statements:
+
+```
+GLOBAL VARIABLES relative 5.5e-06 floor 1e-11
+    # loosen tolerances for x
+    x floor 1e-8 relative 2e-04
+
+    # do not run differential tests for field y
+    !y
+
+    z
+```
+
+Here we added comments and loosened both the floor and relative tolerances for field 'x'. Field 'y' will be ignored entirely. We left the 'z' field alone, so it uses the global values set by the GLOBAL VARIABLES header line.
+
+## A Real Example
+
+Consider the following two CSV files:
+
+File a:
+
+```
+x,y,z
+0,0,100
+1,0.000001,1
+```
+
+File b:
+
+```
+y,z,x
+0,100,0
+0.000001000005501,1,1
+```
+
+!alert note
+We purposely reordered the field headers to demonstrate CSVDiff's ability to correctly map field labels between two files.
+
+If we run csvdiff.py on a and b, we see a small relative difference of 5.501e-06 for field 'y' at time step 1 (simply put, row 1), just a bit more than the default global tolerance of 5.5e-06 allows:
+
+```
+> moose/scripts/csvdiff.py a b
+In file b: The values in column "y" don't match @ t1
+    relative diff:   0.000001 ~ 0.000001 = 0.000006 (5.501e-06)
+```
+
+We can create a comparison file to set forth new tolerances which will allow the two files to be considered identical. Start off by creating a comparison file using file 'a':
+
+```
+> moose/scripts/csvdiff.py --summary a > a.cmp
+```
+
+The following example changes would allow both files to be considered identical:
+
+Loosen the error tolerances for 'y':
+
+```
+TIME STEPS relative 1 floor 0  # min: 0 @ t0  max: 1 @ t1
+
+GLOBAL VARIABLES relative 5.5e-06 floor 1e-11
+    y relative 5.501e-06 # min: 0.000e+00 @ t0          max: 1.000e-06 @ t1
+    x                    # min: 0.000e+00 @ t0          max: 1.000e+00 @ t1
+    z                    # min: 1.000e+00 @ t1          max: 1.000e+02 @ t0
+```
+
+Raise the floor tolerance:
+
+```
+TIME STEPS relative 1 floor 0  # min: 0 @ t0  max: 1 @ t1
+
+GLOBAL VARIABLES relative 5.5e-06 floor 1e-11
+    y floor 1e-5         # min: 0.000e+00 @ t0          max: 1.000e-06 @ t1
+    x                    # min: 0.000e+00 @ t0          max: 1.000e+00 @ t1
+    z                    # min: 1.000e+00 @ t1          max: 1.000e+02 @ t0
+```
+
+Ignore field 'y' by including a not '!' statement:
+
+```
+TIME STEPS relative 1 floor 0  # min: 0 @ t0  max: 1 @ t1
+
+GLOBAL VARIABLES relative 5.5e-06 floor 1e-11
+    !y                   # min: 0.000e+00 @ t0          max: 1.000e-06 @ t1
+    x                    # min: 0.000e+00 @ t0          max: 1.000e+00 @ t1
+    z                    # min: 1.000e+00 @ t1          max: 1.000e+02 @ t0
+```
+
+Remove the offending field from the comparison file:
+
+```
+TIME STEPS relative 1 floor 0  # min: 0 @ t0  max: 1 @ t1
+
+GLOBAL VARIABLES relative 5.5e-06 floor 1e-11
+    x                    # min: 0.000e+00 @ t0          max: 1.000e+00 @ t1
+    z                    # min: 1.000e+00 @ t1          max: 1.000e+02 @ t0
+```
+
+Any one of the above example comparison files would allow a and b to be considered identical:
+
+```
+> moose/scripts/csvdiff.py a b --comparison-file a.cmp
+Files are the same
+```
+
+!alert note title=Exodiff-like summary report
+The summary report follows the same output style as another popular tool: `exodiff -summary`. By design, the two summary reports are interchangeable.
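
The documentation above describes three ideas: fields are matched by header name rather than column position, values below the floor are treated as zero, and remaining values are compared using a relative tolerance. The following is a minimal sketch of those ideas and is not the code in csvdiff.py. The helper names `values_match` and `diff_csv_text` are hypothetical, and scaling the difference by the larger magnitude is an assumption about how the relative difference is computed.

```python
import csv
import io


def values_match(a, b, rel_tol=5.5e-6, floor=1e-11):
    """Hypothetical helper: compare two floats using a floor and a relative tolerance."""
    # Values whose magnitude is below the floor are treated as zero.
    a = 0.0 if abs(a) < floor else a
    b = 0.0 if abs(b) < floor else b
    if a == b:
        return True
    # Assumption: the difference is scaled by the larger of the two magnitudes.
    rel_diff = abs(a - b) / max(abs(a), abs(b))
    return rel_diff <= rel_tol


def diff_csv_text(text_a, text_b, **tolerances):
    """Hypothetical helper: return (field, row) pairs whose values do not match.

    Assumes both inputs have the same field names and the same number of rows.
    """
    # DictReader keys each row by header name, so column order does not matter.
    rows_a = list(csv.DictReader(io.StringIO(text_a)))
    rows_b = list(csv.DictReader(io.StringIO(text_b)))
    mismatches = []
    for step, (row_a, row_b) in enumerate(zip(rows_a, rows_b)):
        for field in row_a:
            if not values_match(float(row_a[field]), float(row_b[field]), **tolerances):
                mismatches.append((field, step))
    return mismatches


# Files a and b from the "A Real Example" section.
file_a = "x,y,z\n0,0,100\n1,0.000001,1\n"
file_b = "y,z,x\n0,100,0\n0.000001000005501,1,1\n"

print(diff_csv_text(file_a, file_b))                     # [('y', 1)]
print(diff_csv_text(file_a, file_b, rel_tol=5.501e-06))  # []
print(diff_csv_text(file_a, file_b, floor=1e-5))         # []
```

With the default tolerances only field 'y' at row 1 is flagged, and either loosening the relative tolerance or raising the floor makes the two inputs compare equal, mirroring the comparison file examples above.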
diff --git a/modules/doc/content/python/TestHarness.md b/modules/doc/content/python/TestHarness.md
@@ -81,7 +81,7 @@ PYTHONPATH functions just as PATH does (semi-colon separate list of paths, for w
 Set the `METHOD` environment variable to one of the following to control which type of application binary to use:
 
 | Variable Name | Argument | Usage |
-| - | - | - |
+| :- | :- | :- |
 | METHOD | opt | TestHarness will use the binary built with optimizations while running tests: `your_appname-opt` (the default) |
 | METHOD | dbg | TestHarness will use the binary built with debugging symbols while running tests: `your_appname-dbg` |
 | METHOD | oprof | TestHarness will use the binary built with code profiling while running tests: `your_appname-oprof` |
@@ -96,7 +96,7 @@ The methods described here can also be controlled via command line arguments. Se
 Set `MOOSE_TERM_FORMAT` to any or all of the following, as well as in a particular order and case (restricted) to control where, what, and how the TestHarness prints that specific item:
 
 | Variable Name | Argument | Usage |
-| - | - | - |
+| :- | :- | :- |
 | MOOSE_TERM_FORMAT | c | Print caveats |
 | MOOSE_TERM_FORMAT | j | Print justification filler |
 | MOOSE_TERM_FORMAT | p | Print pre-formated status (10 character buffer fill) |
@@ -119,7 +119,7 @@ Caveats with the... caveats of MOOSE_TERM_FORMAT; When caveats are requested to
 Set `MOOSE_TERM_COLS` to a positive integer, to set the available terminal column count to this amount:
 
 | Variable Name | Argument | Usage |
-| - | - | - |
+| :- | :- | :- |
 | MOOSE_TERM_COLS | (int) | Allow for this many columns when printing output |
 
 Example, if we set MOOSE_TERM_COLS to 50, we will restrict the default amount of columns the TestHarness normally uses while printing output:

diff --git a/modules/doc/content/python/index.md b/modules/doc/content/python/index.md
@@ -8,3 +8,4 @@ MOOSE.  Click on each one for further information
 | :- | :- |
 | [TestHarness.md] | Tool testing that applications work correctly as code is developed. |
 | [memory_logger.md] | Tool for gathering memory usage of a running process. |
+| [CSVDiff.md] | Tool for comparing (diffing) comma-separated value (CSV) files. |
diff --git a/python/TestHarness/testers/CSVDiff.py b/python/TestHarness/testers/CSVDiff.py
@@ -20,7 +20,6 @@ def validParams():
         params.addParam('override_columns',   [], "A list of variable names to customize the CSVDiff tolerances.")
         params.addParam('override_rel_err',   [], "A list of customized relative error tolerances.")
         params.addParam('override_abs_zero',   [], "A list of customized absolute zero tolerances.")
-        params.addParam('only_compare_custom', False, "Only compare (and require) the listed custom columns.")
         params.addParam('comparison_file', "Use supplied custom comparison config file.")
         params.addParam('rel_err', "A customized relative error tolerances.")
         params.addParam('abs_zero', "A customized relative error tolerances.")
@@ -45,7 +44,7 @@ def processResultsCommand(self, moose_dir, options):
         commands = []
 
         for file in self.specs['csvdiff']:
-            csvdiff = [os.path.join(moose_dir, 'python', 'csvdiff', 'csvdiff.py')]
+            csvdiff = [os.path.join(moose_dir, 'scripts', 'csvdiff.py')]
 
             # Due to required positional nargs with the ability to support custom positional args (--argument), we need to specify the required ones first
             csvdiff.append(os.path.join(self.specs['test_dir'], self.specs['gold_dir'], file) + ' ' + os.path.join(self.specs['test_dir'], file))
@@ -57,7 +56,12 @@ def processResultsCommand(self, moose_dir, options):
                 csvdiff.append('--abs-zero %s' % (self.specs['abs_zero']))
 
             if self.specs.isValid('comparison_file'):
-                csvdiff.append('--comparison-file %s' % (self.specs['comparison_file']))
+                comparison_file = os.path.join(self.specs['test_dir'], self.specs['comparison_file'])
+                if os.path.exists(comparison_file):
+                    csvdiff.append('--comparison-file %s' % (comparison_file))
+                else:
+                    self.setStatus(self.fail, 'MISSING COMPARISON FILE')
+                    return commands
 
             if self.specs.isValid('override_columns'):
                 csvdiff.append('--custom-columns %s' % (' '.join(self.specs['override_columns'])))
@@ -68,9 +72,6 @@ def processResultsCommand(self, moose_dir, options):
             if self.specs.isValid('override_abs_zero'):
                 csvdiff.append('--custom-abs-zero %s' % (' '.join(self.specs['override_abs_zero'])))
 
-            if self.specs.isValid('only_compare_custom') and self.specs['only_compare_custom']:
-                csvdiff.append('--only-compare-custom')
-
             commands.append(' '.join(csvdiff))
 
         return commands
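
The `comparison_file` change above resolves the file relative to the test directory and only passes it to csvdiff when it exists. A standalone sketch of that check follows, assuming a hypothetical helper named `comparison_file_argument` that returns `None` where the tester instead sets a 'MISSING COMPARISON FILE' failure status.

```python
import os


def comparison_file_argument(test_dir, comparison_file):
    """Hypothetical helper: build the --comparison-file argument, or return None.

    Mirrors the logic in the hunk above: the path is joined to the test
    directory first, and a missing file is reported instead of being passed on.
    """
    path = os.path.join(test_dir, comparison_file)
    if os.path.exists(path):
        return '--comparison-file %s' % path
    # The tester in the diff sets a failure status here; this sketch returns None.
    return None
```

Checking for the file before building the command lets the harness report a specific failure rather than relying on whatever error the csvdiff script would print.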

diff --git a/python/TestHarness/tests/test_CSVDiffer.py b/python/TestHarness/tests/test_CSVDiffer.py
diff --git a/python/TestHarness/tests/test_CSVDiffs.py b/python/TestHarness/tests/test_CSVDiffs.py
@@ -22,3 +22,24 @@ def testDiffs(self):
         self.assertRegexpMatches(e.output, r'test_harness\.test_csvdiff.*?FAILED \(Override inputs not the same length\)')
         self.assertRegexpMatches(e.output, r'test_harness\.test_badfile.*?FAILED \(MISSING GOLD FILE\)')
         self.checkStatus(e.output, failed=2)
+
+    def testMissingComparison(self):
+        """
+        Verify the TestHarness will detect and report a missing comparison file error
+        """
+        with self.assertRaises(subprocess.CalledProcessError) as cm:
+            self.runTests('-i', 'csvdiff_missing_comparison_file')
+
+        e = cm.exception
+        self.assertRegexpMatches(e.output, r'test_harness\.test_csvdiff_comparison_file_missing.*?FAILED \(MISSING COMPARISON FILE\)')
+        self.checkStatus(e.output, failed=1)
+
+    def testCSVDiffScript(self):
+        """
+        Test features of the csvdiff.py script via the TestHarness specification test file.
+
+        Because we cannot pass csvdiff arguments via the ./run_tests script, and because these tests create actual
+        CSV output files, the tests compare against intentionally faulty gold files. These tests fail if
+        the options being passed to csvdiff do not work.
+        """
+        self.runTests('-i', 'csvdiff_tests')
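
The assertion in `testMissingComparison` relies on a lazy `.*?` to skip whatever the TestHarness prints between the test name and the status. The sketch below applies the same pattern with the standard `re` module. The sample output line is hypothetical, and the real spacing and filler characters may differ.

```python
import re

# Hypothetical TestHarness output line, for illustration only.
sample_output = (
    "test_harness.test_csvdiff_comparison_file_missing ........ "
    "FAILED (MISSING COMPARISON FILE)"
)

# Same pattern as used in testMissingComparison above.
pattern = (
    r'test_harness\.test_csvdiff_comparison_file_missing'
    r'.*?FAILED \(MISSING COMPARISON FILE\)'
)

match = re.search(pattern, sample_output)
print(bool(match))  # True
```

Note that `assertRegexpMatches` is the older name for `assertRegex`. The alias was deprecated in Python 3.2 and removed in Python 3.12.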