Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Lomb-Scargle detection function #1070

Merged
merged 9 commits into from Sep 7, 2017
Merged

Add Lomb-Scargle detection function #1070

merged 9 commits into from Sep 7, 2017

Conversation

@wegenmat
Copy link
Contributor

@wegenmat wegenmat commented Jun 20, 2017

This pull request adds the Lomb-Scargle algorithm for period detection to gammapy/time. The Lomb-Scargle is from astropy. It is extended by different significance criteria for periodogram peaks. The spectral window function is introduced to investigate the influence of sampling on the periodogram.

Matthias Wegen
@cdeil cdeil self-assigned this Jun 20, 2017
@cdeil cdeil added the feature label Jun 20, 2017
@cdeil cdeil added this to the 0.7 milestone Jun 20, 2017
Copy link
Member

@cdeil cdeil left a comment

@wegenmat - Thanks!

I've done a first round of review and left inline comments. After we've iterated a bit on this and polished the code / tests / docs, I would then ask someone else to review the API / method.

# from lombscargle
from astropy.stats import LombScargle
# from PyAstronomy.pyTiming.pyPeriod import Gls
# from PyAstronomy.pyTiming import pyPeriod

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

Please remove commented out code and all print statements.
Rename lombscargle_gammapy.py to lombscargle.py


__all__ = [
'lombscargle',
'plotting',

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

How important is this plotting function?
Maybe it can be removed, or if you want to keep it, please rename to lombscargle_plot.
Otherwise people will not know what it is when it shows up in the gammapy.time namespace.

return quant_pre, quant_cvm, quant_nll, sig

def lomb_scargle(t, mag, dmag, K, N_bootstraps, FAP):
"""This function processes the lomb-scargle-algorithmus in autopower mode

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

So this is the most important function for users.
Please write the docstring here with Parameters and Returns section documenting the inputs / outputs.
And maybe, if you have additional info to share (e.g. a link to a paper with the method), add that in the docstring as well.

Here's an example for the Numpy docstring format we use:

if best_period == 0:
best_period = np.nan

return freq, PLS, best_period, quant_pre, quant_cvm, quant_nll, quant_boot, PLS_win

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

Here you return a tuple of 8 computed results.
People will never get that order right.

Suggest to return a dict instead with that info (and then document the key names and values in the docstring)


return freq, PLS, best_period, quant_pre, quant_cvm, quant_nll, quant_boot, PLS_win

def plotting(t, mag, dmag, freq, PLS, best_period, quant_pre, quant_cvm, quant_nll, quant_boot, N_bootstraps, PLS_win):

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

Same here: if this is a "public" function, i.e. something users should use, then it needs a good docstring describing the inputs, outputs and what it does.

If you change the lombscargle function above to return a results dict, then here you don't need so many parameters and can just reference the other function from the docstring?

@@ -0,0 +1,30 @@
import numpy as np
# import sys
# sys.path.insert(0, '/afs/ifh.de/group/amanda/scratch/wegenmat/time-series-analysis/time-series-analysis')

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

Again: remove commented out code here and change filename to test_lombscargle.py.

FAP = 0.05 # false alarm probability
N_bootstraps = 100 # number of bootstrap resamplings
freq, PLS, best_period, quant_pre, quant_cvm, quant_nll, quant_boot, PLS_win = lomb_scargle(t_obs, mag_obs, dmag_obs, K, N_bootstraps, FAP)
plotting(t, mag, dmag, freq, PLS, best_period, quant_pre, quant_cvm, quant_nll, quant_boot, N_bootstraps, PLS_win)

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

Plotting test should be separate from the computation test.
People should be able to use the function that does the computation and also run it's test without having matplotlib installed.
So there should be two test functions (probably called test_lombscargle and test_lombscargle_plot), and if you want to avoid code duplication, have the common code in a third function where the name doesn't start with "test" that you call from both test functions.

freq, PLS, best_period, quant_pre, quant_cvm, quant_nll, quant_boot, PLS_win = lomb_scargle(t_obs, mag_obs, dmag_obs, K, N_bootstraps, FAP)
plotting(t, mag, dmag, freq, PLS, best_period, quant_pre, quant_cvm, quant_nll, quant_boot, N_bootstraps, PLS_win)
print('Best period: ' + str(best_period))
assert np.array([freq, PLS, best_period, quant_pre, quant_cvm, quant_nll, quant_boot, PLS_win]) == True

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

This assert statement is weird. You should have some assert_allclose calls on computed results, like a significance or something, no?

assert np.array([freq, PLS, best_period, quant_pre, quant_cvm, quant_nll, quant_boot, PLS_win]) == True

if __name__ == '__main__':
test_lombscargle()

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

Remove this "if name == main" part, it's not needed.
To run tests, you can use python -m pytest -v gammapy ... or python setup.py test -V and have extra arguments to select the tests you want to run.
If you don't know how to do this locally, let's discuss on Slack.

# from PyAstronomy.pyTiming import pyPeriod
import numpy as np
# import matplotlib
import matplotlib.pyplot as plt

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

matplotlib and scipy are optional dependencies of Gammapy.
Usually we put those imports as "delayed" imports into the functions where it's used.
Having them at the module level (i.e. at the top) means that you can't even import this module if the dependency is not installed.
One should be able to use your lombscargle function to compute something even if matplotlib isn't installed.
Is it clear? Can you change these imports?

@wegenmat
Copy link
Contributor Author

@wegenmat wegenmat commented Jun 20, 2017

I did some of the recommend minor changes. I am not quite sure about the plotting function. On the one hand, it is not essential for the periodogram, but one the other hand, since at least the spectral window function needs to be inspected by eyesight, plotting comes in handy for the analysis.

@cdeil
Copy link
Member

@cdeil cdeil commented Jun 20, 2017

I did some of the recommend minor changes. I am not quite sure about the plotting function. On the one hand, it is not essential for the periodogram, but one the other hand, since at least the spectral window function needs to be inspected by eyesight, plotting comes in handy for the analysis.

OK to keep the plotting function. But then you have to write a good docstring explaining what the inputs are and what the plots show.

.. _time:
*******************************************
Period detection (`gammapy.time.lombscargle`)
*******************************************

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

There's a few formatting issues with your docs/time/period.rst file.

In order to be able to resolve those, you need to run the Sphinx docs build locally and fix the errors / warnings, and look at the generated HTML docs, using these commands:

python setup.py build_docs
open docs/_build/html/index.html

Since you're introducing a new sub-page besides index.rst in the gammapy.time docs, you have to add something to docs/time/index.rst so that the new page gets linked to and becomes part of the docs.

An example is here:
https://github.com/gammapy/gammapy/blame/master/docs/spectrum/index.rst#L75
so adding this at the end of docs/time/index.rst (before the API ref) should do the trick:

Content
=======
.. toctree::
   :maxdepth: 1

   period
Copy link
Member

@cdeil cdeil left a comment

@wegenmat - I left a few more inline comments with suggestions. Let me know if you have any questions or if I should look again at something by leaving a comment.

@@ -0,0 +1,174 @@
from astropy.stats import LombScargle

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

We have these two lines of boilerplate that should be added at the top of any new Python file in Gammapy (i.e. here lombscargle.py and test_lombscargle.py:

# Licensed under a 3-clause BSD style license - see LICENSE.rst
from __future__ import absolute_import, division, print_function, unicode_literals

Also, please move the matplotlib imports into the plotting function (or tests in continuous integration will fail, because import should work without matplotlib present).

);
# save plot
plt.tight_layout()
plt.savefig('Lomb-Scargle periodogram', bbox_inches='tight')

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

You shouldn't save the plot. Gammapy is a library and functions should be as flexible as possible. That means for plotting functions to leave it up to the caller to save the fig after calling your function if they like.

@@ -0,0 +1,24 @@
import numpy as np
from lombscargle_gammapy import lomb_scargle, plotting

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

This import is broken, it should be:

from ..lombscargle import lomb_scargle, plot_lomb_scargle

You can run the test locally and check using this command:

python -m pytest -v gammapy/time/tests/test_lombscargle.py
import numpy as np
from lombscargle_gammapy import lomb_scargle, plotting

def test_lombscargle():

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

This test is reasonably complex that it's to someone like me not obvious what it does.
So please add a comment line or two at the start or within test_lombscargle with a brief high level description.
(don't repeat what the code does, just state intent / high-level summary).

t = np.linspace(0, 100, 1000)
t_obs = np.sort(rand.choice(t, 500, replace=False))
n_outliers = 50
dmag = np.random.normal(0, 1, 1000) * -1**(rand.randint(2, size=1000))

This comment has been minimized.

@cdeil

cdeil Jun 20, 2017
Member

Here you're accessing the global np.random.normal which will lead to non-deterministic random numbers in the test, no? Generate random numbers via your RandomState object everywhere.

@cdeil cdeil changed the title Lomb-Scargle for gammapy.time Add Lomb-Scargle detection function Jun 20, 2017
Matthias Wegen and others added 2 commits Jun 23, 2017
…scargle_plot.py, doc for period detection period.rst is still to be done
@wegenmat
Copy link
Contributor Author

@wegenmat wegenmat commented Jun 26, 2017

A new idea about the plotting function: since it is the same for the Lomb-Scargle and the robust regression techniques (upcoming), it may be smart to outsource it to an independent method that is dedicated to period detection in general, not only Lomb-Scargle. Your thoughts about this, @cdeil ?

@cdeil
Copy link
Member

@cdeil cdeil commented Jun 26, 2017

Sure, if the plotting function can be applied more generally, it's a good idea to make it work for both cases!

…r plot_periodogram and lomb_scargle, added doc string to test
@wegenmat
Copy link
Contributor Author

@wegenmat wegenmat commented Jun 26, 2017

ready for reviewing

Copy link
Member

@cdeil cdeil left a comment

@wegenmat - I left a bunch of inline comments. The most important one is the request to add a working example to the high-level docs. Having that will make it much more likely that people will use this, and also easier to do the review if there's a working example one can easily start playing with.

@@ -0,0 +1,27 @@
.. _time:

This comment has been minimized.

@cdeil

cdeil Jun 26, 2017
Member

There's a few warnings from the Sphinx build, please fix:

/Users/deil/code/gammapy/docs/time/period.rst:2: WARNING: Explicit markup ends without a blank line; unexpected unindent.
/Users/deil/code/gammapy/docs/time/period.rst:7: WARNING: Explicit markup ends without a blank line; unexpected unindent.
/Users/deil/code/gammapy/docs/time/period.rst:1: WARNING: duplicate label time, other instance in /Users/deil/code/gammapy/docs/time/index.rst
@@ -0,0 +1,27 @@
.. _time:
**********************************************************************************************
Period detection (`gammapy.time.lomb_scargle`) and plotting (`gammapy.time.plot_periodogram`)

This comment has been minimized.

@cdeil

cdeil Jun 26, 2017
Member

We usually don't mention function names in the titles of sub-pages.
I.e. here I suggest to simply change to a short "Period detection" or "Lightcurve period detection" or something like this (without references to code).

------
`~gammapy.time.lomb_scargle` returns the frequency grid, the periodogram peaks of the Lomb-Scargle periodogram and the spectral window function, the percentiles of all significance criteria for a specified false alarm probability, as well as the best period if found.

Reference

This comment has been minimized.

@cdeil

cdeil Jun 26, 2017
Member

There's no need for a "Reference" section here in the high-level docs, please remove.

The functions you're writing are exposed via the gammapy.time namespace.
Mainly I think this page would become interesting / useful if you show a working example, where you actually obtain and plot a result on an example light curve (of your choosing).

- ``fgrid`` (`~numpy.ndarray`) -- Frequency grid in inverse units of ``t``
- ``psd`` (`~numpy.ndarray`) -- Power spectral density of the Lomb-Scargle periodogram at the frequencies of ``fgrid``
- ``period`` (`float`) -- Location of the highest periodogram peak
- ``significance`` (`float`) or (`~numpy.ndarray`) -- Significance of ``period`` under the specified significance criterion. If the significance criterion is not defined, all significance criteria are used and their respective significance for the period is returned

This comment has been minimized.

@cdeil

cdeil Jun 26, 2017
Member

@wegenmat - Generally, please try to limit you line length to ~ 80 characters. It's not super strict, but lines like this one are close to impossible to review, because e.g. on Github one has to scroll back and forth a lot to read it at all.

return significance


def _significance_cvm(time, freq, psd, psd_best_period):

This comment has been minimized.

@cdeil

cdeil Jun 26, 2017
Member

You're not using time in this function. Is that correct? If yes, please remove that argument from the function signature.

significance[0] = _significance_pre(time, freq, psd_best_period)
significance[1] = _significance_nll(time, freq, psd, psd_best_period)
significance[2] = _significance_cvm(time, freq, psd, psd_best_period)
significance[3] = _significance_boot(time, flux, flux_error, freq, psd_best_period, n_bootstraps=100)

This comment has been minimized.

@cdeil

cdeil Jun 26, 2017
Member

Here you don't pass n_bootstraps along, the input argument to _significance_all is unused.


def lomb_scargle(time, flux, flux_error, dt, criterion='None', n_bootstraps=100):
"""
This function computes the significance of periodogram peaks under certain significance criteria.

This comment has been minimized.

@cdeil

cdeil Jun 26, 2017
Member

Always start a function with a one-line summary, then an empty line and then more explanations.
The phrase "This function" can be removed since it doesn't contain any new info, e.g. this would be OK I think:

Compute period and significance of a light curve using Lomb-Scargle PSD.
# find period with highest periodogram peak
psd_best_period = np.max(psd_data)
best_period = 1. / freq[np.argmax(psd_data)]

This comment has been minimized.

@cdeil

cdeil Jun 26, 2017
Member

My suggestion would be to change the selection of criterion as follows:

criteria : list of str
   Select which significance methods you'd like to run (by default all are run)
   Available: {'pre', 'cvm', 'nll', 'boot'}

and then to use criteria=None as the default and do this in the function:

if criteria is None:
    criteria = ['pre', 'cvm', 'nll', 'boot']

significance = OrderedDict()
if 'pre' in criteria:
     significance['pre'] = _significance_pre(time, freq, psd_best_period)
if 'cvm' in criteria:
    significance['pre'] =  _significance_cvm(time, freq, psd_data, psd_best_period)

Then the user would have the full power to select exactly the list of criteria they want.
And the results would have names, whereas currently one has to access by index e.g. significance[2] to access "cvm"w which is hard to read code.


def plot_periodogram(time, flux, flux_error, freq, psd_data, psd_win, best_period='None', significance='None'):
"""
This function plots a light curve, its periodogram and spectral window function.

This comment has been minimized.

@cdeil

cdeil Jun 26, 2017
Member

Remove "This function", add an empty line after the summary line.

test_data['t'], test_data['y'], test_data['dy'], test_case['dt'],
test_case['sig_criterion'], test_case['n_bootstraps'],
)
assert_allclose(

This comment has been minimized.

@cdeil

cdeil Jun 26, 2017
Member

This should nicely fit on one line now, no need to put on many lines like you have it now.

wegenmat and others added 3 commits Jun 26, 2017
wegenmat
Matthias Wegen
@cdeil
cdeil approved these changes Sep 7, 2017
@cdeil
Copy link
Member

@cdeil cdeil commented Sep 7, 2017

@wegenmat - Thanks for all your work on this!

There's several more points to discuss here, but I'm merging this now into Gammapy master because it's easier to continue via commits there or in follow-up pull requests.

@cdeil cdeil merged commit dd38597 into gammapy:master Sep 7, 2017
0 of 2 checks passed
0 of 2 checks passed
continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
continuous-integration/appveyor/pr AppVeyor build failed
Details
@cdeil
Copy link
Member

@cdeil cdeil commented Sep 7, 2017

@wegenmat - I've done some code clean-up in f2cd0cc and added you to the Gammapy contributor list in 2affbe7

Please review the changes in f2cd0cc and let me know if you have any questions or disagree with something. We've found that setting plotting style in library functions is not a good pattern because styles keep changing (e.g. matplotlib 1.0 vs 2.0) and it's best to leave styling to users.

Some questions / comments:

  • Is the implementation of _cvm correct? You have a for loop and a variable sumbeta, but then you don't actually accumulate in that variable, which in effect will lead to the values from just the last iteration being used (and the for loop not being needed). Looks like a bug, no?
  • The tests are very slow. Can you choose a faster test-case? Does it really help to run two test cases for the plotting function or would you test just as much by just running a single test case to execute it once?
  • The link to the reference to Süveges is super long. Is that reference listed in ADS? If yes, could you please link there instead?

@wegenmat - Maybe you could open a new follow-up PR (starting from master) where we can do further improvements for this code?

--

The docs build is currently failing because you're referencing PNG images in gammapy-extra that aren't there yet. Please make a pull request against gammapy-extra to add them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants