Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-38739: add a utility function for Tony-scaling #44

Merged
merged 4 commits into from
Apr 25, 2023
Merged

Conversation

hsinfang
Copy link
Contributor

No description provided.

histVals, binEdges = np.histogram(flatData, bins=nBins, range=(minVal, maxVal))

cmap = np.arange(nColors, dtype=np.int64)
cdf = cmap[np.floor((cmap.size - 1)*np.cumsum(histVals)/size).astype(np.int64)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those two lines can just be:

cdf = ((nColors - 1)*np.cumsum(histVals)/size).astype(np.int64)

@hsinfang hsinfang force-pushed the tickets/DM-38739 branch 6 times, most recently from ca40328 to f5620ec Compare April 21, 2023 22:41
Comment on lines 853 to 854
minVal = np.floor(np.nanmin(flatData)) - 1.0
maxVal = np.ceil(np.nanmax(flatData)) + 1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given you're using floor and ceil here, do we need the -1 and +1 on top of that? Shouldn't those be enough to guarantee covering the whole range?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About adjusting the maxVal, the as-is digitizeData implementation would fail in the case that the largest data point is an integer and falls on rightmost edge, because it'd then want the non-existing (maxVal - minVal) bin. Making maxVal a little larger does the trick.

Not adjusting the minVal would work too, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, that makes sense, thanks.

Parameters
----------
data : `np.array`
The input image data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it really hard to care about stuff like this, but if we're going by the DM style guide these should end with a period...

Parameters
----------
data : `np.array`
The input image data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And again.

# We understand that our algorithm gives very large rounding error
# compared to the generic numpy method. But still test it.
np.random.seed(1234)
data = np.random.normal(100, 10, 100)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be worth having some 2d data in here (to make sure we don't break the ravelling) and maybe something with some bigger values, to make sure things will work when we have values up to 2^18 (which is the number of bits in our ADC).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 about 2d data.

I hesitate about much bigger values. Two numbers are also of interest here:

  • (maxVal - minVal) which is then the histogram range and we fix the bin size to 1
  • nColors which we'd just use 256
    Larger values means larger imprecision too. Sure we can adjust the test to allow that. But I'm not sure how meaningful it really is.

Thoughts? How about the updated numbers in the test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than iterating in the comments here, I just pushed a commit with the changes I was imagining - I know that's a little irregular, I hope that's OK, obviously feel free to edit it however you like!

I added: several different ranges and numbers of colours, and also set an element in the array to nan to check we're nan-safe (and therefore changed the np.quantile call to be np.nanqualite)

Let me know what you think (and if you don't like me pushing to your branch in general!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good ideas in testing a variety of parameters, in testing nan, and so on.

The only thing is that I cannot convince myself that data of (mean, width)= (2, 0.1) would be meaningful here. So I adjusted it to be a little bigger, yet small for possible input images I think. Also I'm adjusting the tolerance slightly depending on the inputs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice - it was indeed right on the edge of the possible tolerance when I left it like that, so that's a very good shout. I deliberately made it a (2, 0.1) because I thought testing something that's way way smaller than you'd ever get here might help for Eli's nanojanksy case, but perhaps it's not relevant/worth it, and we can leave that for the next ticket. I'm fine either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most pixels on such image would be approximated into just 1 and 2 with the current implementation. Probably easier if we have some example images. I'll leave it to a future ticket.

Its output can be used with matplotlib.colors.BoundaryNorm to plot
the data with the correct colorbar.
nBin needs to be larger than (maxVal - minVal) to work.
Instead of adjusting the nBin depending on the data, just
use (maxVal - minVal) bins, that is, binSize = 1.

Also, nans are ignored in the histogram, hence size sacling too.
@hsinfang hsinfang merged commit 9d1cf9f into main Apr 25, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants