DM-38739: add a utility function for Tony-scaling #44

hsinfang · 2023-04-18T21:29:28Z

No description provided.

erykoff · 2023-04-18T21:35:37Z

python/lsst/summit/utils/utils.py

+    histVals, binEdges = np.histogram(flatData, bins=nBins, range=(minVal, maxVal))
+
+    cmap = np.arange(nColors, dtype=np.int64)
+    cdf = cmap[np.floor((cmap.size - 1)*np.cumsum(histVals)/size).astype(np.int64)]


Those two lines can just be:

cdf = ((nColors - 1)*np.cumsum(histVals)/size).astype(np.int64)

mfisherlevine · 2023-04-24T19:01:40Z

python/lsst/summit/utils/utils.py

+    minVal = np.floor(np.nanmin(flatData)) - 1.0
+    maxVal = np.ceil(np.nanmax(flatData)) + 1.0


Given you're using floor and ceil here, do we need the -1 and +1 on top of that? Shouldn't those be enough to guarantee covering the whole range?

About adjusting the maxVal, the as-is digitizeData implementation would fail in the case that the largest data point is an integer and falls on rightmost edge, because it'd then want the non-existing (maxVal - minVal) bin. Making maxVal a little larger does the trick.

Not adjusting the minVal would work too, I think.

Ah, I see, that makes sense, thanks.

mfisherlevine · 2023-04-24T19:06:54Z

python/lsst/summit/utils/utils.py

+    Parameters
+    ----------
+    data : `np.array`
+        The input image data


I find it really hard to care about stuff like this, but if we're going by the DM style guide these should end with a period...

mfisherlevine · 2023-04-24T19:07:13Z

python/lsst/summit/utils/utils.py

+    Parameters
+    ----------
+    data : `np.array`
+        The input image data


mfisherlevine · 2023-04-24T19:08:45Z

tests/test_utils.py

+        # We understand that our algorithm gives very large rounding error
+        # compared to the generic numpy method. But still test it.
+        np.random.seed(1234)
+        data = np.random.normal(100, 10, 100)


I think it might be worth having some 2d data in here (to make sure we don't break the ravelling) and maybe something with some bigger values, to make sure things will work when we have values up to 2^18 (which is the number of bits in our ADC).

👍 about 2d data.

I hesitate about much bigger values. Two numbers are also of interest here:

(maxVal - minVal) which is then the histogram range and we fix the bin size to 1

nColors which we'd just use 256
Larger values means larger imprecision too. Sure we can adjust the test to allow that. But I'm not sure how meaningful it really is.

Thoughts? How about the updated numbers in the test?

Rather than iterating in the comments here, I just pushed a commit with the changes I was imagining - I know that's a little irregular, I hope that's OK, obviously feel free to edit it however you like!

I added: several different ranges and numbers of colours, and also set an element in the array to nan to check we're nan-safe (and therefore changed the np.quantile call to be np.nanqualite)

Let me know what you think (and if you don't like me pushing to your branch in general!)

Good ideas in testing a variety of parameters, in testing nan, and so on.

The only thing is that I cannot convince myself that data of (mean, width)= (2, 0.1) would be meaningful here. So I adjusted it to be a little bigger, yet small for possible input images I think. Also I'm adjusting the tolerance slightly depending on the inputs.

Ah, nice - it was indeed right on the edge of the possible tolerance when I left it like that, so that's a very good shout. I deliberately made it a (2, 0.1) because I thought testing something that's way way smaller than you'd ever get here might help for Eli's nanojanksy case, but perhaps it's not relevant/worth it, and we can leave that for the next ticket. I'm fine either way.

Most pixels on such image would be approximated into just 1 and 2 with the current implementation. Probably easier if we have some example images. I'll leave it to a future ticket.

Its output can be used with matplotlib.colors.BoundaryNorm to plot the data with the correct colorbar.

nBin needs to be larger than (maxVal - minVal) to work. Instead of adjusting the nBin depending on the data, just use (maxVal - minVal) bins, that is, binSize = 1. Also, nans are ignored in the histogram, hence size sacling too.

erykoff reviewed Apr 18, 2023

View reviewed changes

hsinfang force-pushed the tickets/DM-38739 branch 6 times, most recently from ca40328 to f5620ec Compare April 21, 2023 22:41

mfisherlevine approved these changes Apr 24, 2023

View reviewed changes

hsinfang added 3 commits April 24, 2023 16:35

Add Eli's implementation in CDF-scaling the image data

509512e

Add a utility getQuantiles

71fcfdb

Its output can be used with matplotlib.colors.BoundaryNorm to plot the data with the correct colorbar.

Update algorithm details in the new CDF-scaling

1264738

nBin needs to be larger than (maxVal - minVal) to work. Instead of adjusting the nBin depending on the data, just use (maxVal - minVal) bins, that is, binSize = 1. Also, nans are ignored in the histogram, hence size sacling too.

hsinfang force-pushed the tickets/DM-38739 branch from f5620ec to 1264738 Compare April 24, 2023 23:37

Add more test values, color ranges and check nans

be90181

hsinfang force-pushed the tickets/DM-38739 branch from 0f33a1d to be90181 Compare April 25, 2023 18:01

hsinfang merged commit 9d1cf9f into main Apr 25, 2023
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DM-38739: add a utility function for Tony-scaling #44

DM-38739: add a utility function for Tony-scaling #44

hsinfang commented Apr 18, 2023

erykoff Apr 18, 2023

mfisherlevine Apr 24, 2023

hsinfang Apr 24, 2023

mfisherlevine Apr 24, 2023

mfisherlevine Apr 24, 2023

mfisherlevine Apr 24, 2023

mfisherlevine Apr 24, 2023

hsinfang Apr 24, 2023

mfisherlevine Apr 25, 2023 •

edited

hsinfang Apr 25, 2023

mfisherlevine Apr 25, 2023

hsinfang Apr 25, 2023

		minVal = np.floor(np.nanmin(flatData)) - 1.0
		maxVal = np.ceil(np.nanmax(flatData)) + 1.0

DM-38739: add a utility function for Tony-scaling #44

DM-38739: add a utility function for Tony-scaling #44

Conversation

hsinfang commented Apr 18, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mfisherlevine Apr 25, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mfisherlevine Apr 25, 2023 •

edited