
Linear_Bin: function to bin spectrum by a non-integer number of pixels in each dimension. #1306

Merged

Conversation

k8macarthur
Contributor

No description provided.

k8macarthur and others added 30 commits July 15, 2016 14:45
…Binning

Revert "Revert "Non Integer Binning""
@thomasaarholt
Contributor

I would remove "Rebin Array" in the very first line of the docstring as well. It doesn't fit with the style of the rest of the docstrings.

@k8macarthur
Contributor Author

So the only thing failing now is:

```python
def test_rebin(self):
    self.signal.estimate_poissonian_noise_variance()
    new_s = self.signal.rebin(scale=(2, 2, 1))
    var = new_s.metadata.Signal.Noise_properties.variance
    assert new_s.data.shape == (1, 2, 6)
    assert var.data.shape == (1, 2, 6)
```

```
E       assert (2, 2, 1) == (1, 2, 6)
E       At index 0 diff: 2 != 1
E       Use -v to get the full diff
```

but I'm not sure how the variance function works. Any ideas?

@thomasaarholt
Contributor

thomasaarholt commented May 24, 2017 via email

@francisco-dlp
Member

rebin should also rebin the variance in metadata if present. The variance should have the same dimensions as the signal, so the operation consists only of running rebin with the same parameters on the variance.
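That idea can be sketched as follows. This is a minimal sketch, not the merged implementation: `rebin_with_variance` is a hypothetical helper name, and the metadata path `Signal.Noise_properties.variance` is taken from the failing test above.

```python
def rebin_with_variance(signal, scale):
    # Hypothetical helper: rebin the signal, then rebin the variance
    # stored in its metadata (if any) with the same parameters, since
    # the variance has the same dimensions as the signal.
    out = signal.rebin(scale=scale)
    if signal.metadata.has_item("Signal.Noise_properties.variance"):
        var = signal.metadata.Signal.Noise_properties.variance
        out.metadata.Signal.Noise_properties.variance = var.rebin(scale=scale)
    return out
```

With this in place, the `var.data.shape == (1, 2, 6)` assertion in the failing test should hold, because the variance goes through exactly the same rebin as the data.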

@k8macarthur
Contributor Author

Ah, thanks, good spot. I missed that the rebin function calls itself...

@francisco-dlp
Member

Why did you remove the lazy version?

@k8macarthur
Contributor Author

Because, as @to266 pointed out, the function no longer runs lazily. I therefore thought it cleaner to remove it completely, so that there's no confusion later on if someone wants to add it back in.

@francisco-dlp
Member

Wasn't the idea to use the "old" rebin when applicable and the linear one otherwise? In that case, rebin could still run lazily when the problem falls within the scope of the old rebin. Being able to rebin big datasets lazily is not a feature we would like to lose.
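The "old" fast path being discussed can be sketched with plain NumPy (a hypothetical `rebin_fast`, not the merged code): when every scale factor is an integer that divides the corresponding axis length, rebinning is a block sum, which is also the kind of operation that maps cleanly onto lazy dask arrays.

```python
import numpy as np

def rebin_fast(data, scale):
    # Fast path: block-sum rebin, valid only when each integer scale
    # factor divides the corresponding axis length exactly.
    scale = [int(s) for s in scale]
    shape = []
    for size, s in zip(data.shape, scale):
        if size % s:
            raise ValueError("scale must divide the data shape exactly")
        shape.extend([size // s, s])  # split each axis into (new_size, factor)
    out = data.reshape(shape)
    # Sum over every second axis (the block axes), last first so the
    # remaining axis numbers stay valid.
    for ax in range(len(scale) - 1, -1, -1):
        out = out.sum(axis=2 * ax + 1)
    return out

a = np.arange(16).reshape(4, 4)
print(rebin_fast(a, (2, 2)))  # each output pixel is the sum of a 2x2 block
```

When the scale factors are non-integer, this reshape trick no longer applies and the slower linear interpolation method is needed, which is the case that currently breaks laziness.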

@k8macarthur
Contributor Author

Ok, try this then... I don't understand enough about the lazy stuff, but I've put it back in with the same if statements in front. It might need some testing.

@k8macarthur
Contributor Author

Tomorrow's a bank holiday weekend, but I'll try to do everything I can for final tweaks on this function if you send them tonight.

@francisco-dlp
Member

@k8macarthur, how is it going? Would you like a hand with this?

@k8macarthur
Contributor Author

k8macarthur commented May 25, 2017 via email


@to266 to266 left a comment


I left a couple of comments regarding the code structure that should be addressed before merging. The changes should be fairly straightforward, though.

determined by the user.
Rebinning of the spectral data will be carried out by two methods.
The fast method will run if the new_shape is a divisor of the original shape.
Otherwise a slower linear interpolation method will be applied. Both methods are incorporated into the :py:meth:`~.signals.eds.rebin` function.

Not groundbreaking, but it should probably not point specifically to the eds.rebin method.

>>> spectrum.data[1, 2, 9] = 5
>>> print(spectrum)
<EDSTEMSpectrum, title: , dimensions: (4, 4|10)>
>>> print ('Sum = ', sum(sum(sum(spectrum.data))))

Sum over all the elements like this: `spectrum.data.sum()`

>>> test = spectrum.rebin(scale=scale)
>>> print(test)
<EDSTEMSpectrum, title: , dimensions: (8, 8|2)>
>>> print('Sum =', sum(sum(sum(test.data))))

Sum again

>>> spectrum.data[1, 2, 9] = 5
>>> print(spectrum)
<EDSTEMSpectrum, title: , dimensions: (4, 4|10)>
>>> print ('Sum = ', sum(sum(sum(spectrum.data))))

sum

>>> test = spectrum.rebin(new_shape)
>>> print(test)
<EDSTEMSpectrum, title: , dimensions: (8, 8|2)>
>>> print('Sum =', sum(sum(sum(test.data))))

sum

#Series of if statements to check that only one of new_shape or scale
#has been given. new_shape is then converted to scale. If both or neither
#are given, the function raises an error and won't run.


Throughout the function, comparisons with None should be done using the `is` operator, like so:

```python
if new_shape is None and scale is not None:
    # do stuff
```

Not both.")
elif new_shape != None:
scale = []
for i, axis in enumerate(self.data.shape):
@to266 May 26, 2017

This does not check that the length of new_shape is appropriate.

I assume we require that the full new shape is specified? I.e. with signal shape (5, 3, 2) and new_shape=[2, 1] it fails?
Alternatively, we could assume that the untouched dimensions are kept constant, or that any surplus entries are ignored.

In the first case (fails) the code should be:

```python
# ...
elif new_shape is not None:
    if len(new_shape) != len(self.data.shape):
        raise ValueError("Not enough dimensions in the new shape")
    scale = [datashape / newshape
             for datashape, newshape in zip(self.data.shape, new_shape)]
```

If we "pad" with ones and ignore the unnecessary entries, the code should be:

```python
elif new_shape is not None:
    lns = len(new_shape)
    lds = len(self.data.shape)
    if lns < lds:
        new_shape = tuple(new_shape) + self.data.shape[lns:]
    elif lns > lds:
        new_shape = new_shape[:lds]
    scale = [datashape / newshape
             for datashape, newshape in zip(self.data.shape, new_shape)]
```

scale.append(self.data.shape[i]/new_shape[i])
else:
new_shape = new_shape
scale = scale

Actually this piece of code gets repeated at least 3-4 times, and thus should be moved to a separate function. I would move it to the array_tools module or some place similar (i.e. not within a class at all), and its signature should be something along the lines of (feel free to use a better function name):

```python
new_shape, scale = _rebin_preprocess(new_shape, scale, data_shape)
```

Here the parameters should be whatever the user entered plus the actual current data shape, and the results should be what you'd have at this point in the function where this comment is. The new function should include the necessary corrections from my previous comments.


I am working on this branch and I have already addressed some of the code issues that you mention. I'll push it soon; could you have a look? k8macarthur#12


pushed, thanks!

@francisco-dlp francisco-dlp merged commit f3fc244 into hyperspy:RELEASE_next_minor May 26, 2017
@thomasaarholt
Contributor

thomasaarholt commented May 26, 2017 via email

@k8macarthur
Contributor Author

Excellent! Sorry I wasn't around for the final push.

@k8macarthur k8macarthur deleted the Non_Integer_Binning branch May 30, 2017 11:28