Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve continuous wavelet transform #661

Merged
merged 1 commit into from Aug 23, 2016
Merged

Conversation

@OlgaVorokh
Copy link
Member

@OlgaVorokh OlgaVorokh commented Aug 3, 2016

This PR improves CWT, which was added in #25 and recently improved directly in Gammapy master.

On this PR the follow work was done:

  • quality of CWT code was improved
  • all the data images and information about using kernels was been separated from CWT class into CWTData and CWTKernels classes
  • the tests were added for all the created classes
  • docs was been modernized
  • ipython tutorial was written for CWT algorithm (see here)
R. Terrier, L. Demanet, I.A. Grenier, and J.P. Antoine. Wavelet
analysis of EGRET data. Universite Paris VII & Sap CEA Saclay.
Institut de physique theorique et mathematique, Universite
catholique de Louvain, Louvain-la-Neuve

This comment has been minimized.

@cdeil

cdeil Aug 4, 2016
Member

The most important thing to add is the link to http://adsabs.harvard.edu/abs/2001ICRC....7.2923T so that others can quickly find what publication exactly you are referencing.
The description can be shortened to only contain the key information like this:

R. Terrier et al (2001) "Wavelet analysis of EGRET data"
See http://adsabs.harvard.edu/abs/2001ICRC....7.2923T

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 4, 2016
Author Member

OK, I'll fixed it now.

Set input images.
TODO : RENAME.
Like fit methods in sklearn. I like that.

This comment has been minimized.

@cdeil

cdeil Aug 4, 2016
Member

One option to handle the data could be a separate class:

class CWTData:
    def __init__(self, counts, background, ...):
       self.counts = counts
       self.background = background

and then to use it in the algorithm class:

class CWT:
    def __init__(self, data, parameters):
        self.data = data

i.e. collect all the images in a data object.

And a user would use it like this:

from gammapy.detect import CWT, CWTData

# read input data and pass it to the data container class
data = CWTData(...)
cwt = CWT(data, parameters)
cwt.run()

You could also just use a http://docs.gammapy.org/en/latest/api/gammapy.image.SkyImageCollection.html or an OrderedDict instead of inventing a CWTData.

There's many ways to structure such code / algorithms and we don't have established patterns yet in Gammapy.

@OlgaVorokh - Please think about it and make a proposal what your preference would be for discussion before implementing it.

This comment has been minimized.

@adonath

adonath Aug 4, 2016
Member

Just wanted to support the idea of using a SkyImageCollection as data container. That's the use case it was originally made for. 👍

This comment has been minimized.

@cdeil

cdeil Aug 4, 2016
Member

I'm not sure SkyImageCollection is a good idea. The only methods it has are read, write and info. And for this application, we probably want to have our own read, write and info that's tailored to our application (e.g. how to group HDUs from the iterations in the FITS file, or in the info printout show the most useful numbers for this algorithm.
My gut feeling is that a CWTData class that doesn't sub-class SkyImageCollection would work nicely here and that would be the first thing I try.

But as I said, there's many ways to structure the code ... let's see what @OlgaVorokh proposes.

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 4, 2016
Author Member

I think it is a good idea to create a new class CWTData for all the images, because we have many of them. And also it will be easier to save blocks of images in class and to make a history of images after each iteration (than we can implement store_all_results tool in easier way).

Returns
-------
TODO: for now it's stored as `self.support`. Maybe return

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 4, 2016
Author Member

@cdeil self.support is updated on each iteration, so I think it will be better not to return it. And it will be great to save it in CWTData object.

This comment has been minimized.

@cdeil

cdeil Aug 4, 2016
Member

+1 to not return and hand around some objects between the methods.
Just have it stored in CWTData which is a data member of CWT and access as needed.

TODO: for now it's stored as `self.support`. Maybe return
(i.e. source detection).
TODO: document?

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 4, 2016
Author Member

I improve this method and think that it will be easier to understand what's going on here after my modifications. @cdeil @adonath could you review this to check the logic? And than I write the docs to this method.

self.support[idx_scale] += mask
self.support[idx_scale] = self.support[idx_scale] > 0.

def _inverse_transform(self):

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 4, 2016
Author Member

@cdeil @adonath I don't understand what's going on here. I try to understand tomorrow but maybe you have thoughts about it.

self.support[key] = self.support[key] > 0.
# Produce a list of connex structures in the support
labeled_mask, n_structures = label(mask)
for struct_label in range(1, n_structures + 1):

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 4, 2016
Author Member

@cdeil @adonath Which variant is more suitable here?

for struct_label in range(1, n_structures + 1):
       coords = np.where(labeled_mask == struct_label)

or

for struct_label in range(n_structures):
       coords = np.where(labeled_mask == struct_label + 1)

This comment has been minimized.

@cdeil

cdeil Aug 4, 2016
Member

I prefer the second option.

TODO: give references
Read more in ???.
nsigma (significance max???) : double, optional (default 3.0)

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 4, 2016
Author Member

@cdeil @adonath
I think nsigma and nsigmap should be renamed. I understand what for I need this parameters, but I don't understand how to renamed them. Look at CWT._compute_support() method for better understanding of how to use it, I left comments there.

This comment has been minimized.

@cdeil

cdeil Aug 4, 2016
Member

@OlgaVorokh - nsigma and nsigmap are measures of statistical significance, "multiples of sigma" or "multiples of standard deviation":
https://en.wikipedia.org/wiki/Statistical_significance#Stringent_significance_thresholds_in_specific_fields

In other parts of Gammapy that quantity is called "significance", e.g. here you can see how it relates to a p-value:
https://gammapy.readthedocs.io/en/latest/api/gammapy.stats.significance_to_probability_normal.html

Maybe rename to nsigma to significance_threshold and nsigmap to significance_threshold_clean?
The second threshold is only used for the cleaning of isolated pixel islands that are slightly above significance_threshold.
That's a non-standard thing to do and should be made optional.
Maybe have significance_threshold_clean=None by default and unless the user gives a float number for that, don't do the pixel island cleaning.

Does this help?

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 5, 2016
Author Member

Yes, thanks!

number of scales considered
scale_step : float
base scaling factor
n_scale : int

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 4, 2016
Author Member

Can I do parameters n_scale, min_scale and step_scale optional and if I can what default values I can setup?

@OlgaVorokh
Copy link
Member Author

@OlgaVorokh OlgaVorokh commented Aug 4, 2016

@cdeil @adonath I prefer to implement the follow style of using CWT:

cwt = CWT(... set parameters... )
data = .. some data, maybe CWTData ...
cwt.analyze(data)

So we firstly setup the parameters and than start to analyze data and after computing we can use created CWT for analyzing other data.

What are you thinking about it?

@cdeil
Copy link
Member

@cdeil cdeil commented Aug 4, 2016

No strong opinion on this, maybe small +1 to what you propose, to use the sklearn style of passing data only later to run or whatever it's called and not already to __init__.

@cdeil
Copy link
Member

@cdeil cdeil commented Aug 4, 2016

I think my preference would be to split out the scales and kernels into a separate class.
Mostly to have it nicely grouped, not necessarily for code re-use from other applications, although that might be possible later.

class CWTKernels:
    def __init__(self, scales):
        scales = np.array(scales)
        # compute kernels here, like previously done in `CWT.__init__`

    @classmethod
    def from_scale_grid(cls, n, min, step):
        scales = [min * step ** _ for _ in range(n)]
        return cls(scales)

    def __str__():
        # print useful info about the grid of scales, can be added later

We will be able to re-use that class for other multi-scale algorithms, so I think it's good to split it out now. To use it for CWT:

from gammapy.detect import CWTKernels, CWT

kernels = CWTKernels.from_scale_grid(... parameters ...)
print(scales) # for debugging
cwt = CWT(scales)
cwt.run(data)

OK to split out the scale / kernel handling this way, or do you prefer to keep the scales / kernel handling in CWT.__init__?

self.support = None

# Strange result after each iteration of CWT...
self.mysterious_result = None

@property
def max_scale_image(self):

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 5, 2016
Author Member

@cdeil I moved this method here, but as you can see it will be not work, because we don't have some information about the scales in that class. Maybe it will be better to create kernel class, but I don't know

return self.support

def inverse_transform(self):
if data.support is None:

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 5, 2016
Author Member

Maybe also it will be better to create kernel and scales class for avoiding this situation: just to put object of that class as a parameter of CWTData to initialize the data

remove_isolated : boolean, optional (default True)
If ``True``, isolated pixels will be removed.
Examples

This comment has been minimized.

@cdeil

cdeil Aug 8, 2016
Member

I'm not sure this "short example" in the docstring how to use the classes is really helpful, compared to the alternative: add a link to the IPython notebook. @OlgaVorokh - What do you think?

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 8, 2016
Author Member

Yes, I also think that it will be better to just add a link to the ipython notebook. I remove this example from the docs

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 12, 2016
Author Member

Can you tell me how to add a link in a normal way in the docs?

max_iter : int, optional (default 10)
The maximum number of iterations of the CWT algorithm.
tol : double, optional (default 1e-5)
Tolerance for stopping criterion.

This comment has been minimized.

@cdeil

cdeil Aug 8, 2016
Member

You should add an explanation what the stopping criterion is.
A sentence or formula or something like that. Maybe as part of the main docstring text, or in a Notes section below?

self.model += res * (res > 0)
return res
def __str__(self, ):
# print useful info about the grid of scales, can be added later

This comment has been minimized.

@cdeil

cdeil Aug 8, 2016
Member

Maybe for now just print the list of scales, as a start?

answer : boolean
Answer if CWT has converged.
"""
answer = False

This comment has been minimized.

@cdeil

cdeil Aug 11, 2016
Member

You can remove this line answer=False here.
It's unused and overwritten by answer = variance_ratio < self.tol below.

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 11, 2016
Author Member

I can't do that, because on the first iteration self.previous_variance = None, but I want to return some answer in the end.

This comment has been minimized.

@cdeil

cdeil Aug 11, 2016
Member

Then it would be more clear code IMO if you start the method with that simple special case:

if not self.previous_variance:
    return False

and then proceed with the other case.

But see my other comment ... I think the class becomes simpler if it's "state-less" (except history list) and the previous variance is handled just in the method that does the iteration (and still available afterwards from the history if needed).

One more thought: maybe make this variance be a property on CWTData?

else:
log.info('Convergence not formally reached at iteration {0}'.format(n_iter + 1))

def get_history(self):

This comment has been minimized.

@cdeil

cdeil Aug 11, 2016
Member

Remove get_history?

I don't really see how it's useful.
The user can just access the history attribute, and only if needed make a copy.
Usually they will not want to make a copy, but just inspect the results and avoid the duplication in required memory.

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 12, 2016
Author Member

Done.

First scale used.
step_scale : double
Base scaling factor.
old : boolean (default True)

This comment has been minimized.

@cdeil

cdeil Aug 11, 2016
Member

Did you investigate the difference between old and new way to compute the kernels?
If no -- should I do it or explain how to?

Once that's done and results understood this option "old" in CWTKernels should be removed.

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 12, 2016
Author Member

Maybe it will be better, if you explain how to do this

>>> from gammapy.detect import CWTKernels
>>> kernels = CWTKernels(n_scale=2, min_scale=6.0, step_scale=1.3)
>>> print (kernels.scales)
[ 6. 7.8]

This comment has been minimized.

@cdeil

cdeil Aug 11, 2016
Member

Can you maybe choose a slightly more interesting example where you have 3 or 4 scales,
and then show the output of kernels.info here?

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 12, 2016
Author Member

Done.

info_list.append('Minimal scale: {:.2f}'.format(self.min_scale))
info_list.append('Step scale: {:.2f}'.format(self.step_scale))
info_list.append('Scales: {0}'.format(str(self.scales)))
# TODO: add information about the kernels.

This comment has been minimized.

@cdeil

cdeil Aug 11, 2016
Member

Can you print the kernel sizes here?
I presume it's always the same size in x and y, i.e. it's enough to print the "width" of the kernel?

We always use this style in Gammapy to make info strings:

ss = 'spam\n'
ss += 'spam spam spam\n'
...
return ss

Please do that here as well.
(efficiency of this method is not a concern at all, it's very fast anyways)

This comment has been minimized.

@cdeil

cdeil Aug 11, 2016
Member

Besides scale and width, the kernel sum and max would be other useful info parameters.

Maybe you could add a property info_table to CWTKernels that fills a table with those four columns so that one can simply do

kernels.info_table.pprint()

Here's another example where I added such an info_table attribute to a class:

def info_table(self):

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 12, 2016
Author Member

Done

self.cwt = CWT(kernels=self.kernels, significance_threshold=70., keep_history=True)

def test_cwt_kernels_creation(self):
assert_allclose(self.kernels.scales, [6., 7.8])

This comment has been minimized.

@cdeil

cdeil Aug 11, 2016
Member

Before we continue with the CWT review, please debug the old / new kernels and add one test that asserts the kernels here.

Having two kernels that are almost the same scale (6, 7.8) is probably non-ideal, because CWT results obtained with them are so similar. Why not make the kernels more different in size, e.g. scale = 3 and 7.8?

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 12, 2016
Author Member

Exchange parameters for kernels. Add tests for kernels too.

assert_allclose(cwt_data.approx_bkg[10, 10], 0.605164586712)

zeros = np.zeros_like(cwt_data.support, dtype=bool)
assert_allclose(cwt_data.support, zeros)

This comment has been minimized.

@cdeil

cdeil Aug 11, 2016
Member

If you have supportandmodel` as all zeros, you're running the algorithm with parameters that aren't useful / typical. I think in the example that I wrote when we started we had some useful parameters?

This poisson test image has a single gauss peak on flat background. That source should be excluded in the support mask after running the algorithm.

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 12, 2016
Author Member

With first parameters support and model also were zeros arrays... I try to exchange parameters now..

self._execute_iteration(data=data)
if self.history is not None:
self.history.append(copy.deepcopy(data))
if self._is_converged(data=data):

This comment has been minimized.

@cdeil

cdeil Aug 11, 2016
Member

I think the class would be simpler if you don't make previous_variance an attribute on self, but instead a local variable here in analyze.

All this method does is book-keeping to run iteratively, so I think doing the stopping criterion here is a good fit.

The utility function you call should then just compute the number you use as stopping criterion.
Or, if you really think it makes sense, pass the previous_variance in from here, but don't store it as a CWT attribute.

@OlgaVorokh OlgaVorokh force-pushed the OlgaVorokh:improve_cwt branch from e224497 to 545b6de Aug 12, 2016
# Strange result after each iteration of CWT...
self.mysterious_result = np.zeros(shape_2d)
@property
def max_scale_image(self):

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 18, 2016
Author Member

If I understand correct, this method should look like this. As I understand, maximal image should be created on the same way as we create data.transform_2d in the CWT._inverse_transform.

"""Compute the maximum scale image as '~gammapy.image.SkyImage'."""
# idx_scale_max = np.argmax(self._transform_3d, axis=0)
# return kernels.scales[idx_scale_max] * (self._support.sum(0) > 0)
transform_2d_max = np.max(self._transform_3d, axis=0)

This comment has been minimized.

@OlgaVorokh

OlgaVorokh Aug 18, 2016
Author Member

I don't create property for this image, because we already have transform_2d, but this image we calculate by another way. We can make a parameter and remove property from CWTData.transform_2d(mode='max/sum'). @cdeil what do you think?

@OlgaVorokh OlgaVorokh force-pushed the OlgaVorokh:improve_cwt branch 2 times, most recently from dda5643 to 4353999 Aug 20, 2016
@OlgaVorokh OlgaVorokh force-pushed the OlgaVorokh:improve_cwt branch from ad288ca to 55471fd Aug 21, 2016
@cdeil
Copy link
Member

@cdeil cdeil commented Aug 23, 2016

Merging this no.

@OlgaVorokh - Thank you for this big contribution and your patience with review / merging on this PR!

@cdeil cdeil merged commit e55c751 into gammapy:master Aug 23, 2016
2 checks passed
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@cdeil cdeil changed the title Improve detect.CWT Improve continuous wavelet transform Aug 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants