DM-34355: Improve analysis_drp scatterPlot unit test #32
Conversation
Force-pushed from 8f25d08 to 5411692.
I really like a lot of this - what separates it from most of our other (disliked) hard-coded-numbers tests is that it has an easy-to-use flag that regenerates the reference data, and there are some really good ideas here for testing plot-generation that I never would have come up with on my own. It will still require frequent updates to the test data, but for the most part that looks easy enough to not be a burden on developers, who really should be looking at the plots they've changed before merging anything to main anyway. I've made one comment on the text comparisons that I hope could make them less fragile without much more work.
That said, I think the PNG-image comparison has to go - turning it into a warning just lets the images get out of sync, which seems worse to me than an actual failing test, not an improvement. The big problem IMO is the need to add a new PNG to the repo (git never forgets, so we're not replacing the old one!) whenever the plot changes; that just doesn't scale.
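The regenerate-on-flag pattern praised above can be sketched as follows. This is illustrative only - `check_or_resave` and its arguments are hypothetical names, not the PR's actual API:

```python
import os
import tempfile


def check_or_resave(texts, filename, resave=False):
    # When resave=True, overwrite the reference file with the current
    # plot texts instead of comparing -- the developer then inspects the
    # diff and commits it if the change is intended.
    if resave:
        with open(filename, "w") as f:
            f.writelines(f"{t}\n" for t in texts)
    with open(filename) as f:
        reference = [line.rstrip("\n") for line in f]
    return texts == reference


# Regenerate once, then subsequent runs compare against the saved file.
ref_path = os.path.join(tempfile.mkdtemp(), "texts_ref.txt")
assert check_or_resave(["title", "x label"], ref_path, resave=True)
assert check_or_resave(["title", "x label"], ref_path)
assert not check_or_resave(["title", "y label"], ref_path)
```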
texts : `List[str]`
    A list of all text strings (title and axis/legend/tick labels).
line_xys : `List[np.ndarray]`
    A list of all line _xy attributes.
Shape information would be very helpful to document here, too.
Does `List[numpy.ndarray]`, (N, 2) make sense?
I can guess what that means, assuming the sphinx/documenteer logic doesn't mangle it, but I think it'd be just better to put it in the prose description, e.g.
- A list of all line _xy attributes.
+ A list of all line ``_xy`` attributes. Each element is an array of shape ``(N, 2)``.
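For illustration of the shape convention being documented (a standalone sketch, not the PR's code): matplotlib's `Line2D` stores its vertices as an `(N, 2)` array with x in column 0 and y in column 1, which `np.column_stack` reproduces from separate coordinate arrays:

```python
import numpy as np

# Build an (N, 2) vertex array the way matplotlib lays out line data:
# column 0 holds x coordinates, column 1 holds y coordinates.
x = np.linspace(0.0, 1.0, 5)
y = x ** 2
line_xy = np.column_stack([x, y])

assert line_xy.shape == (5, 2)
assert np.array_equal(line_xy[:, 0], x)
assert np.array_equal(line_xy[:, 1], y)
```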
tests/test_scatterPlot.py
Outdated
with open(filename, 'w') as f:
    f.writelines(f'{text}\n' for text in texts)

self.assertTrue(filecmp.cmp(filename_texts_tmp, filename_texts_ref))
I think we could make this check a bit more robust to minor changes by replacing the full-file comparisons with logic that checks that the new plot's text lines (as a set of `str` lines) are a superset of those saved in the file. That would let us save just the lines we think are particularly important (e.g. by git interactive staging after running with `resave=True`), and more importantly it would insulate us from matplotlib deciding to order the lines differently in response to e.g. figure elements being moved around or added to the figure in a different order.
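A minimal sketch of the superset comparison suggested above (`texts_match` is a hypothetical helper name, not part of the PR):

```python
def texts_match(new_texts, reference_texts):
    # Pass when every curated reference line still appears among the new
    # plot's text elements; ordering and extra labels are ignored, which
    # insulates the test from matplotlib reordering its text artists.
    return set(reference_texts) <= set(new_texts)


# Extra lines in the new plot are fine...
assert texts_match(["title", "x label", "tick 0"], ["title", "x label"])
# ...but a missing reference line still fails.
assert not texts_match(["title"], ["title", "x label"])
```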
tests/test_scatterPlot.py
Outdated
if resave:
    np.savetxt(filename, line)
arr = np.loadtxt(filename)
assert np.isclose(arr, line).flat
Are you sure this is doing what you think it's doing? I would have thought coercing a `.flat` access to a boolean would not automatically imply something like `np.all`.
You could also just use `assertFloatsAlmostEqual` (from `lsst.utils.tests.TestCase`).
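To illustrate the concern (a standalone demonstration, not the PR's test code): `.flat` returns a `flatiter` object, which is truthy for any non-empty array, so the assertion can never fail:

```python
import numpy as np

arr = np.array([1.0, 2.0])
line = np.array([1.0, 99.0])  # clearly not close to arr

# The pattern under review: np.isclose returns [True, False], but .flat
# wraps it in a flatiter, which is truthy whenever the array is
# non-empty -- so this assertion passes despite the mismatch.
assert np.isclose(arr, line).flat

# Reducing over all elements catches the mismatch as intended.
assert not np.all(np.isclose(arr, line))
assert not np.allclose(arr, line)
```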
I used `assertFloatsAlmostEqual` as suggested, but the default tolerances failed on macOS at the `1e-12` level, which is somewhat concerning (I hadn't run Jenkins before). I set `atol=1e-8, rtol=1e-10` for now, but I'm not sure how robust that's going to be.
Given that the result is normally consumed by just human eyes, I'm not particularly bothered by really any loss of precision here (who knows what shenanigans matplotlib might be up to?) and would recommend tolerances around `1e-3`.
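`assertFloatsAlmostEqual` lives in `lsst.utils`, but the same rtol/atol trade-off can be illustrated with `np.isclose`, which passes when `|a - b| <= atol + rtol * |b|` (the specific numbers below are illustrative):

```python
import numpy as np

a, b = 1.0, 1.0005  # ~5e-4 apart, e.g. sub-pixel jitter in plot coordinates

# Loose tolerances around 1e-3 absorb platform-level float differences.
assert np.isclose(a, b, rtol=1e-3, atol=1e-3)

# The tighter tolerances tried first (atol=1e-8, rtol=1e-10) reject them.
assert not np.isclose(a, b, rtol=1e-10, atol=1e-8)
```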
Force-pushed from 3ea70c2 to faca789.
Force-pushed from faca789 to e798d94.