
Spectrum rebinning #29

Closed
bplimley opened this issue Mar 8, 2017 · 23 comments · Fixed by #78
@bplimley (Contributor) commented Mar 8, 2017

  • Rebinning according to an input bin edges vector
  • rebin_like(spec) to use the binning from another spectrum

Please expand on this; I'm not very familiar with rebinning methods or needs.

@markbandstra (Member) commented Mar 10, 2017

@thjoshi and @jccurtis might have some thoughts on this. There are many ways one could do this, off the top of my head:

  • Preserve bin edges: Require the preservation of Poisson statistics by not splitting any bins. This of course would highly restrict the possible rebinnings one could do; e.g., a spectrum of 1024 bins can be downsampled to 512 bins by combining adjacent pairs of bins.
  • Interpolation: Allow counts to be split into fractions that are moved into the new bins. The spectrum is converted from an array of ints to an array of floats.
  • Stochastic rebinning: Convert the histogram to list-mode energies by sampling energies uniformly from each bin, then bin those energies into the new binning. The statistical properties are largely preserved.

Someone else might know more about what is practically and mathematically desirable.
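The stochastic option above could be sketched like this (a minimal illustration, not proposed API; the function name and signature are made up):

```python
import numpy as np

def stochastic_rebin(counts, old_edges, new_edges, rng=None):
    """Rebin a histogram by sampling list-mode energies (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=int)
    old_edges = np.asarray(old_edges, dtype=float)
    # Draw one uniform energy per count, within that count's original bin
    lows = np.repeat(old_edges[:-1], counts)
    highs = np.repeat(old_edges[1:], counts)
    energies = rng.uniform(lows, highs)
    # Bin the list-mode energies into the new binning
    new_counts, _ = np.histogram(energies, bins=new_edges)
    return new_counts
```

If the new edges cover the full old range, the total number of counts is conserved exactly.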

This comment was moved over from #28 and expanded. Thanks @chunhochow for the references and finding a relevant discussion in Knoll, 4th Edition, 18.IV.B.

@chunhochow (Contributor) commented Mar 10, 2017

Also found this: https://github.com/jhykes/rebin, which claims to implement the (interpolation) routine described in Knoll (4th ed) 18.IV.B. It has a permissive license (attribution + copyright notice + disclaimer).

@jccurtis (Collaborator)

@thjoshi - I'll put a version of the interpolation rebin from GRPP into bq unless you have an objection.

@bplimley (Contributor, author) commented Apr 5, 2017

I will try to start using Bq for the FERMI project if we can get this feature soon. @jccurtis @thjoshi can we start putting this together? I can help integrate code if needed.

@jccurtis (Collaborator) commented Apr 6, 2017 via email

@bplimley (Contributor, author) commented Apr 10, 2017

If a spectrum is rebinned with arbitrary (unequal) bin widths, you can't assign a calibration curve based on those new bins, right? This is in contrast to bins of equal width (in channel or energy), which just rescale a variable in the calibration curve.

For arbitrary bin widths, it seems to me we would either abandon the idea of a "curve" and just keep track of the bin edges in keV, or remember the original bins that the calibration curve applies to.

Looking forward to seeing the GRPP version when you have a chance, @jccurtis; the code will help address my questions/thoughts.
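To illustrate the "keep track of the bin edges in keV" option: evaluate the calibration curve once at the original edges, then carry the keV edges alongside the rebinned counts. Everything here is a made-up example (the linear calibration, its coefficients, and the variable names are all assumptions):

```python
import numpy as np

# Hypothetical linear calibration curve: channel -> keV
def cal_kev(channels):
    gain, offset = 0.35, 1.2  # assumed coefficients
    return gain * np.asarray(channels, dtype=float) + offset

old_edges_ch = np.arange(1025) - 0.5   # 1024 channels
old_edges_kev = cal_kev(old_edges_ch)  # curve evaluated once, at the old edges
# After an arbitrary rebin, the spectrum keeps explicit keV edges
# rather than a calibration curve (64 new bins as an example):
new_edges_kev = np.linspace(old_edges_kev[0], old_edges_kev[-1], 65)
```

The original curve is only ever applied to the bins it was fit for; everything downstream works with explicit keV edges.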

@jccurtis (Collaborator) commented Apr 12, 2017

I've created a new branch (feature-rebinning) to integrate the interpolation rebinning functionality as written by @thjoshi. I added two tests (one for count summation and another for plotting). I also added a tool to spectrum.py to turn bin edges and heights (counts) into x/y for stepped plotting. I will optimize the rebin function with numba, but it is already pretty fast (and definitely fast enough for RadWatch's use).

@thjoshi - Please advise if you think usage of this code is inappropriate or poses another conflict. I think we should squash the commits before merging to remove the code which is currently a blatant rip from GRPP.
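For reference, the edges-and-heights-to-steps conversion can be done with two np.repeat calls. This is only a sketch of the idea; the function name and signature are made up and may not match the actual spectrum.py helper:

```python
import numpy as np

def bin_edges_and_heights_to_steps(edges, heights):
    """Expand N+1 bin edges and N heights into 2N x/y points for a stepped plot.
    (Hypothetical signature; the actual helper in spectrum.py may differ.)"""
    edges = np.asarray(edges)
    heights = np.asarray(heights)
    x = np.repeat(edges, 2)[1:-1]  # each interior edge appears twice
    y = np.repeat(heights, 2)      # each height spans its whole bin
    return x, y
```

For example, edges [0, 1, 2] with heights [3, 5] become x = [0, 1, 1, 2] and y = [3, 3, 5, 5].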

@bplimley (Contributor, author)

The counts aren't quite lining up. The relative difference in total counts ranges from about 2e-5 to 2e-3. I'm documenting here for reference.

# some rearranging has been done for clarity
In [1]: import becquerel as bq   # on feature-rebinning branch
In [2]: import numpy as np

In [3]: old_data = np.random.poisson(lam=10, size=8192).astype(int)
In [4]: old_edges = np.arange(8193) - 0.5
In [5]: old_edges[:3]
Out[5]: array([-0.5,  0.5,  1.5])
In [6]: old_edges[-3:]
Out[6]: array([ 8189.5,  8190.5,  8191.5])
In [7]: new_edges = np.arange(-1, 8195, 0.3)
In [8]: new_data = bq.core.rebin(old_data, old_edges, new_edges)
In [10]: np.sum(old_data)
Out[10]: 82572.0
In [11]: np.sum(new_data)
Out[11]: 82573.600000000006
In [17]: (np.sum(new_data) - np.sum(old_data)) / np.sum(old_data)
Out[17]: 1.9377028532745009e-05

In [12]: old_counts = np.random.poisson(lam=50, size=8192)
In [13]: new_counts = bq.core.rebin(old_counts, old_edges, new_edges)
In [14]: np.sum(old_counts)
Out[14]: 408267
In [15]: np.sum(new_counts)
Out[15]: 408276.59999999998
In [16]: (np.sum(new_counts) - np.sum(old_counts)) / np.sum(old_counts)
Out[16]: 2.3514023910766036e-05

In [19]: old_counts = np.random.poisson(lam=5000, size=8192)
In [20]: new_counts = bq.core.rebin(old_counts, old_edges, new_edges)
In [23]: np.sum(old_counts), np.sum(new_counts)
Out[23]: (40961581, 40962619.199999996)
In [21]: (np.sum(new_counts) - np.sum(old_counts)) / np.sum(old_counts)
Out[21]: 2.5345701377969997e-05

In [24]: old_counts = np.random.poisson(lam=5000, size=256)
In [25]: old_edges = np.arange(257) - 0.5
In [26]: new_edges = np.arange(-1, 267, 1.3)
In [27]: new_counts = bq.core.rebin(old_counts, old_edges, new_edges)
In [28]: np.sum(old_counts), np.sum(new_counts)
Out[28]: (1279119, 1281618.0)
In [29]: (np.sum(new_counts) - np.sum(old_counts)) / np.sum(old_counts)
Out[29]: 0.0019536884371196112

The test is also not passing because of this. At least on my computer - are you getting a passing test @jccurtis?

@jccurtis (Collaborator)

@bplimley The test passes for me. I'll need to look at this tomorrow.

@jccurtis (Collaborator)

Hmmm, I just ran my tests again and they failed. Some trickery is afoot!

@bplimley (Contributor, author)

I just pushed more thorough parametrized tests for rebin. Rebinning seems fine when the first and last edge values of the new bins are identical to those of the old bins, but when the new bins cover a larger range, they somehow gain counts.

@thjoshi (Collaborator) commented Apr 18, 2017

Yo, the negative values in new_edges are the problem.
This could be fixed with a small modification to the code; @jccurtis has volunteered to do it.
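For reference, a fractional-overlap rebin can be written so that new bins outside the old range simply receive zero counts, which sidesteps this class of bug. This is a minimal sketch of the general technique, not the GRPP/becquerel code:

```python
import numpy as np

def rebin_interp(counts, old_edges, new_edges):
    """Redistribute counts by fractional bin overlap (minimal sketch).
    New bins lying outside the old range receive zero counts."""
    counts = np.asarray(counts, dtype=float)
    old_edges = np.asarray(old_edges, dtype=float)
    new_edges = np.asarray(new_edges, dtype=float)
    new_counts = np.zeros(len(new_edges) - 1)
    for i in range(len(counts)):
        lo, hi = old_edges[i], old_edges[i + 1]
        # Overlap of old bin [lo, hi) with every new bin at once
        left = np.clip(new_edges[:-1], lo, hi)
        right = np.clip(new_edges[1:], lo, hi)
        new_counts += counts[i] * (right - left) / (hi - lo)
    return new_counts
```

Because each old bin's counts are distributed in proportion to overlap, the total is conserved whenever the new edges span the old range, even if they extend past it (into negative values, say).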

@jccurtis (Collaborator) commented May 5, 2017

I've updated rebin.py to resolve the bug noted above. A remaining issue is overflow/underflow: what to do when the output bins don't fully cover the input bin structure. I could add a flag that optionally handles this. Thoughts?

@jccurtis (Collaborator) commented May 5, 2017

I also have a couple more edge cases to flesh out, which I'll get to next week.

@bplimley (Contributor, author) commented May 5, 2017

The overflow/underflow bins sound useful. What would the default be? No overflow/underflow?

If we are unsure how we want to implement it, we can also make it an issue and postpone it for later.

@jccurtis do you want to make a pull request so we can take a look?

@jccurtis (Collaborator) commented May 5, 2017 via email

@jccurtis (Collaborator)

I've updated the rebin code with numba support, docs, and a 2D option for multiple spectra. I still need to flush out some bugs in the underflow/overflow edge cases.

@chunhochow (Contributor)

np.searchsorted() may be faster than

while in_edges[in_idx] < out_edges[out_idx]:
    in_idx += 1
in_idx -= 1

searchsorted is numba-compatible. I feel that since in_idx moves slowly, it would probably be faster to do it the current way (benchmarks needed), but searchsorted() should definitely be faster for the initial guess than the current in_idx = 1.

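The equivalence can be checked directly: np.searchsorted with side="left", minus one, lands on the same index as the loop, namely the last input edge strictly below the output edge (toy edge values assumed here):

```python
import numpy as np

in_edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # toy input edges
out_edge = 2.5                                   # one output edge of interest

# Loop version, as in the snippet above
in_idx = 0
while in_edges[in_idx] < out_edge:
    in_idx += 1
in_idx -= 1  # last index with in_edges[in_idx] < out_edge

# Single-call version
idx = np.searchsorted(in_edges, out_edge, side="left") - 1

assert idx == in_idx  # both are 2 here
```

So searchsorted is a drop-in for the initial guess, while the incremental loop may still win inside the hot inner iteration where the index only moves by a step or two.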

@chunhochow (Contributor) commented May 22, 2017

I rewrote @thjoshi's and @jccurtis's code. I need help merging the pytest files, though, because they've diverged too much and I have no clue what's going on in there.

======

I also looked into stochastic rebinning and the two papers referenced in Knoll. See Mark's comment above, which is actually slightly inaccurate.

There are two stochastic methods here:

  • The list-mode method (Mark's method) is the one described in his comment above.
  • The Knoll/São Paulo method (since both papers are from São Paulo, I'm christening it that) takes the exact interpolation result and samples from it, while still conserving the total counts.

The paper says that the São Paulo method should preserve the statistical properties of the bin counts (e.g., Poisson errors). The deterministic interpolation (Tenzing/Joey's method) would give a smaller chi-squared value because the statistical fluctuation is less.

I'm not sure/haven't thought out whether Mark's method preserves the statistical properties, or whether the São Paulo method solves the moiré issue when the in/out bin widths are different.
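If I understand the São Paulo method correctly, the count-conserving sampling step could look like this. This is a sketch under my reading of the description above (the published method may differ in detail), where frac_counts stands for the deterministic interpolation result:

```python
import numpy as np

def sao_paulo_sample(frac_counts, rng=None):
    """Draw integer bin counts from the fractional (interpolated) rebin
    result, conserving the total number of counts. Sketch only."""
    rng = np.random.default_rng() if rng is None else rng
    frac_counts = np.asarray(frac_counts, dtype=float)
    total = int(round(frac_counts.sum()))
    probs = frac_counts / frac_counts.sum()
    # One multinomial draw distributes exactly `total` counts over the bins
    return rng.multinomial(total, probs)
```

Each bin count is then a sum over the multinomial, so the total is conserved by construction while the per-bin values fluctuate around the interpolated expectation.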

@bplimley (Contributor, author)

Thanks @chunhochow! I haven't had time to finish absorbing what you're saying about the methods. But if you have a bunch of rebin code that you are happy with, can you make a pull request? (Despite the impending Travis build crisis, which isn't really a crisis: just run the tests yourself before pushing or merging.)

@chunhochow (Contributor)

It should probably be cleaned up a bit, and tests written, before making a pull request.

@chunhochow (Contributor) commented Sep 12, 2017

Todo/Moved:

@thjoshi (Collaborator) commented Aug 21, 2018

@chunhochow @jccurtis when is rebinning coming to Spectrum? I was trying to use Becquerel last week for a project and was quite surprised to find that no form of rebinning was in the primary branch. If additional features come later, that makes sense, but in my view basic functionality should already be there.
