
scipy.signal.spectral.lombscargle is unclear about normalisation of Periodogram (Trac #1637) #2162

scipy-gitbot opened this issue Apr 25, 2013 · 16 comments

@scipy-gitbot

Original ticket http://projects.scipy.org/scipy/ticket/1637 on 2012-03-30 by trac user hamogu, assigned to @cournape.

While this routine calculates the Lomb-Scargle periodogram beautifully, it does not address the normalization. This is mentioned only in passing in the 'Examples' section of the documentation, so I suggest that we either

  • automatically normalize all input arrays 'y' in such a way that y.mean()=0 and y.var()=1. That is the normalization used in the original Scargle 1982 article. The advantage of this approach is that the usual formulas for the false alarm probability for LS periodograms can be used on the output.

or

  • extend the docstring in the following way:

    Parameters

    x : array_like
        Sample times.
    y : array_like
        Measurement values. If y obeys y.mean()=0 and y.var()=1, then the periodogram returned by this function follows the normalization of [2], which allows the false alarm probability (FAP) to be calculated.
    freqs : array_like
        Angular frequencies for output periodogram.

Additionally, it might be useful to provide a function in signal.spectral to calculate the value of the periodogram for a given FAP.
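A minimal sketch of the pre-normalization suggested above (the data are illustrative, the FAP relation follows Scargle 1982, and treating the grid size as the number of independent frequencies is only a rough assumption):

    import numpy as np
    from scipy.signal import lombscargle

    # Illustrative data: an unevenly sampled sinusoid with an offset.
    rng = np.random.default_rng(0)
    t = np.sort(rng.uniform(0, 10, 200))
    y = 3.0 + np.sin(2 * np.pi * 1.3 * t) + 0.5 * rng.standard_normal(t.size)

    # Normalize as in Scargle (1982): zero mean, unit variance.
    y_norm = (y - y.mean()) / y.std()

    freqs = np.linspace(0.1, 20, 2000)  # angular frequencies
    pgram = lombscargle(t, y_norm, freqs)

    # With this normalization a peak of height z has single-frequency false
    # alarm probability exp(-z); over M independent frequencies,
    # FAP = 1 - (1 - exp(-z))**M.  Using the grid size for M is a crude
    # stand-in; the proper value depends on the sampling.
    z = pgram.max()
    fap = 1 - (1 - np.exp(-z)) ** len(freqs)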

@scipy-gitbot

@rgommers wrote on 2012-04-01

@pschella: as the author of this function, can you please comment on this issue? Thanks!

@scipy-gitbot

trac user pschella wrote on 2012-04-01

Although I like the idea of normalizing in principle, I do think we should give the user the option of turning it off for increased speed.
Perhaps adding an optional normalize keyword argument would be best (should the default be True or False?).
How do other scipy functions, such as fft, handle normalization?

@scipy-gitbot

@rgommers wrote on 2012-04-01

For fft a choice is made (fft isn't normalized, ifft is normalized by 1/N) and that choice is documented. No choice is given to the user, because multiplying the output by a constant factor is just as easy as specifying normalization with a keyword.

The normalization here seems to be more than multiplying by a constant though, so it may be good to implement. For backwards compatibility the default should be normalize=False I think.
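The fft convention is easy to check (a quick sketch using numpy.fft, which follows the same convention):

    import numpy as np

    x = np.arange(8.0)
    X = np.fft.fft(x)        # forward transform: no normalization applied
    x_back = np.fft.ifft(X)  # inverse transform: multiplies by 1/N
    assert np.allclose(x_back.real, x)

    # Rescaling after the fact is a one-liner, which is why no keyword is offered:
    X_unitary = X / np.sqrt(x.size)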

@jradavenport

Just ran into this problem, two years later.

A simple one-sentence addition to the docstring would have saved me a couple of hours of frustration. It could be as simple as:

Note: Y data must be registered so the mean is zero.

@pschella

Sorry, this dropped off my radar completely.
Perhaps this would indeed suffice:

y : array_like
    Measurement values (must be registered so the mean is zero).

But I will have to look into it in a bit more detail next week before I send a pull request upstream. Thanks!


@rgommers

@pschella if you're going to look at this, @jakevdp just happened to write a blog post about this exact same topic (and a few more): https://jakevdp.github.io/blog/2015/06/13/lomb-scargle-in-python/.

@pschella

Thanks for the info! It is definitely an interesting post. I think it would be great to (also) have an N log N implementation in scipy. Maybe they would be willing to incorporate their code, or otherwise perhaps I can find time to write one based on the paper. I'm not sure about fixing the mentioned 'quirks', though, since that would break users' existing code. Perhaps documenting the assumptions a bit better and adding an alternative N log N implementation is the best approach?

@rgommers

Quirk 1, mean has to be zero: if the results are simply incorrect when this is not the case, then this can be fixed without adding a keyword, right? Return values may change then, but only for cases where the current return value isn't right anyway.

Quirk 2, non-normalized output: this is more a convention than the current return values being incorrect, so here I'd say add a normalize=False keyword.

Quirk 3, angular frequencies: this is correctly documented and not a big deal, so leave as is?
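For readers who trip over quirk 3, the conversion is one line (variable names here are made up):

    import numpy as np

    f = np.linspace(0.1, 10, 500)  # ordinary frequencies (cycles per unit time)
    freqs = 2 * np.pi * f          # angular frequencies, as lombscargle expects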

@rgommers

Adding a See Also section to the docstring linking to AstroML and Gatspy would also be useful.

@rgommers

Adding an O(N log N) method sounds attractive and would be a nice enhancement. Note, though, that that's not the only thing Gatspy has over the SciPy implementation; see for example the period grid selection. So linking to Gatspy in the docs remains a good idea.

@jakevdp

jakevdp commented Jun 14, 2015

One place where automatically pre-centering the data would break current code: if you are passing in an array of 1s to find the power associated with your observing window, centering the data leads to garbage.
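A short sketch of that use case, with illustrative data:

    import numpy as np
    from scipy.signal import lombscargle

    rng = np.random.default_rng(1)
    t = np.sort(rng.uniform(0, 10, 100))  # irregular observation times
    freqs = np.linspace(0.1, 20, 500)     # angular frequencies

    # The periodogram of a constant signal exposes the spectral structure
    # of the sampling pattern itself (the "window power").
    window_power = lombscargle(t, np.ones_like(t), freqs)

    # Automatic pre-centering would turn the ones into zeros here,
    # producing a meaningless result.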

@pschella

Automatically subtracting the mean first adds computational overhead when the mean is already zero, so I think both operations should be optional. Attached is an (untested) proposed patch that introduces a new pure Python function lombscargle with the optional arguments "normalize" and "pre_center", which calls the Cython lombscargle routine (renamed to _lombscargle). Perhaps a more elegant solution is preferred instead. Your thoughts?


@jakevdp

jakevdp commented Jun 19, 2015

Attachments don't come through on GitHub. Can you open a pull request labeled Work In Progress to show us your changes?

@pschella

Done.


@jakevdp

jakevdp commented Jun 19, 2015

Thanks! Moving discussion there.

@modichirag
Copy link

I ran into the same normalization problem earlier this week and was later pointed to this thread by my instructor.
I would like to reiterate that even while the discussion about adding pre_center and normalize options to this function is ongoing, a simple one-line edit to the documentation mentioning that this issue exists would help a lot of people. Something like the following:
Note: If you think the result is incorrect or unexpected, try centering the y data by subtracting the mean first.
