-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-38739: add a utility function for Tony-scaling #44
Conversation
python/lsst/summit/utils/utils.py
Outdated
histVals, binEdges = np.histogram(flatData, bins=nBins, range=(minVal, maxVal)) | ||
|
||
cmap = np.arange(nColors, dtype=np.int64) | ||
cdf = cmap[np.floor((cmap.size - 1)*np.cumsum(histVals)/size).astype(np.int64)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those two lines can just be:
cdf = ((nColors - 1)*np.cumsum(histVals)/size).astype(np.int64)
ca40328
to
f5620ec
Compare
python/lsst/summit/utils/utils.py
Outdated
minVal = np.floor(np.nanmin(flatData)) - 1.0 | ||
maxVal = np.ceil(np.nanmax(flatData)) + 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given you're using floor
and ceil
here, do we need the -1 and +1 on top of that? Shouldn't those be enough to guarantee covering the whole range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About adjusting the maxVal
, the as-is digitizeData
implementation would fail in the case that the largest data point is an integer and falls on rightmost edge, because it'd then want the non-existing (maxVal - minVal)
bin. Making maxVal
a little larger does the trick.
Not adjusting the minVal
would work too, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see, that makes sense, thanks.
python/lsst/summit/utils/utils.py
Outdated
Parameters | ||
---------- | ||
data : `np.array` | ||
The input image data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it really hard to care about stuff like this, but if we're going by the DM style guide these should end with a period...
python/lsst/summit/utils/utils.py
Outdated
Parameters | ||
---------- | ||
data : `np.array` | ||
The input image data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And again.
tests/test_utils.py
Outdated
# We understand that our algorithm gives very large rounding error | ||
# compared to the generic numpy method. But still test it. | ||
np.random.seed(1234) | ||
data = np.random.normal(100, 10, 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be worth having some 2d data in here (to make sure we don't break the ravelling) and maybe something with some bigger values, to make sure things will work when we have values up to 2^18 (which is the number of bits in our ADC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 about 2d data.
I hesitate about much bigger values. Two numbers are also of interest here:
(maxVal - minVal)
which is then the histogram range and we fix the bin size to 1nColors
which we'd just use 256
Larger values means larger imprecision too. Sure we can adjust the test to allow that. But I'm not sure how meaningful it really is.
Thoughts? How about the updated numbers in the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than iterating in the comments here, I just pushed a commit with the changes I was imagining - I know that's a little irregular, I hope that's OK, obviously feel free to edit it however you like!
I added: several different ranges and numbers of colours, and also set an element in the array to nan
to check we're nan-safe (and therefore changed the np.quantile
call to be np.nanqualite
)
Let me know what you think (and if you don't like me pushing to your branch in general!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good ideas in testing a variety of parameters, in testing nan
, and so on.
The only thing is that I cannot convince myself that data of (mean, width)= (2, 0.1) would be meaningful here. So I adjusted it to be a little bigger, yet small for possible input images I think. Also I'm adjusting the tolerance slightly depending on the inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, nice - it was indeed right on the edge of the possible tolerance when I left it like that, so that's a very good shout. I deliberately made it a (2, 0.1) because I thought testing something that's way way smaller than you'd ever get here might help for Eli's nanojanksy case, but perhaps it's not relevant/worth it, and we can leave that for the next ticket. I'm fine either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most pixels on such image would be approximated into just 1 and 2 with the current implementation. Probably easier if we have some example images. I'll leave it to a future ticket.
Its output can be used with matplotlib.colors.BoundaryNorm to plot the data with the correct colorbar.
nBin needs to be larger than (maxVal - minVal) to work. Instead of adjusting the nBin depending on the data, just use (maxVal - minVal) bins, that is, binSize = 1. Also, nans are ignored in the histogram, hence size sacling too.
f5620ec
to
1264738
Compare
0f33a1d
to
be90181
Compare
No description provided.