Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve normalization and plotting options in LombScarglePeriodogram #618

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

barentsen
Copy link
Collaborator

@barentsen barentsen commented Oct 31, 2019

Based on discussions with @danhey and @ojhall94 at the TASOC workshop this week, the idea has emerged to add amplitude and psd properties to LombScarglePeriodogram to enable anyone to use the normalization they prefer.

Example

Screenshot from 2019-10-30 16-36-18

TODO

  • Add unit tests.
  • Verify this works when the light curve has no units.
  • Verify this is correct.
  • Decide what to do with the existing normalization parameter.
  • Figure out how pg.plot() knows which normalization to use for its Y axis.
  • Verify that the pg.amplitude values match a sine with a known amplitude.
  • Consider enforcing ppm^2 / microhertz as the default for psd values.

@barentsen
Copy link
Collaborator Author

barentsen commented Nov 2, 2019

We had a productive in-person discussion on normalization involving @ojhall94 @danhey @jsk389 and others. Here is a picture of the whiteboard:

20191101_123322

@ojhall94 has kindly volunteered to write a paragraph to capture what we learned!

Useful references to cite in the Lightkurve docs on this topic include the PhD thesii by @jsk389 & @rhandberg and a KASC document written by Bill Chaplin on 2009 Nov 23 (filename: wg1_wgmail02.pdf).

@barentsen
Copy link
Collaborator Author

/cc @keatonb

@keatonb
Copy link
Contributor

keatonb commented Nov 2, 2019

Love it! This does just what I was asking for in the discussion of #570.

Just checking: does the power property really return flux_variance / [frequency unit], or is Hz hard-coded here as the frequency unit to be returned?

@barentsen
Copy link
Collaborator Author

Add the TASOC/TDA meeting earlier this month, I learned that asteroseismologists prefer the following definitions of the words power, amplitude, and psd:

power = (4 / N)  * lomb_scargle_power
amplitude = sqrt(power)
psd = (2/N) * (lomb_scargle_power / frequency_spacing)

where lomb_scargle_power is the power returned by a lomb-scargle periodogram and N is the number of data points.

Does that sound right? @ojhall94

@barentsen barentsen force-pushed the add-amplitude branch 2 times, most recently from a996e9a to f99fb21 Compare November 20, 2019 00:00
@ojhall94
Copy link
Contributor

Does that sound right? @ojhall94

It sounds right with what we learned at T'DA, but I haven't had the time yet to deep dive into the how and why. @keatonb, does this sound right to you?

@gully
Copy link
Contributor

gully commented Nov 25, 2019

It is useful to compare the observed frequency-domain periodogram PSD with analytic power spectra from their time-domain counterparts, Gaussian Process covariance kernels. This mapping is exact via Bochner's theorem. Currently lightkurve's PSD and celerité's .get_psd() methods differ in scale by a multiplicative constant, which I derived to be the number 4 below:

image

In other words, if you have a GP kernel kernel_mat, you need to multiply its PSD by 4 to match the default PSD units of lightkurve:

from celerite import terms
kernel_mat = terms.Matern32Term(log_sigma=np.log(sigma_guess), log_rho=np.log(rho_guess))
power_lk = kernel_mat.get_psd(2*np.pi*pg.frequency.value) * 4

I think most lightkurve/celerite users can simply multiply the celerite psd by the number 4. If we were really keen on highlighting this connection, we could add some sort of pg.psd_celerite property, that simply divides pg.psd by 4. Could get confusing and narrowly focused, so maybe a demo is better.

The celerite normalization tutorial is a bit inconsistent, the correct normalization was settled on here:
dfm/celerite#140 (comment)

(Addendum: As a sanity check, I have experimentally confirmed that the lightkurve and celerite PSDs match for known inputs, only if the scalar factor of 4 shown above is applied.)

@keatonb
Copy link
Contributor

keatonb commented Nov 26, 2019

Does that sound right? @ojhall94

It sounds right with what we learned at T'DA, but I haven't had the time yet to deep dive into the how and why. @keatonb, does this sound right to you?

Looks reasonable to me. I don't know who works in power directly or what they expect. Basu & Chaplin (Section 5.1.4, though this also advocates calibrating with Parseval's theorem which I'm not convinced about) define the regular power spectrum as power per bin without the extra factor of 2 (i.e., power = (2 / N) * lomb_scargle_power), so the amplitude spectrum would be amplitude = sqrt(2 * power), but again I don't know that there's a good argument for why that would be preferred. Also the frequency_spacing should be the "natural" spacing of 1/(time[-1] - time[0]) without the oversample factor.

@@ -62,17 +70,20 @@ class Periodogram(object):
Free-form metadata associated with the Periodogram.
"""
def __init__(self, frequency, power, nyquist=None, label=None,
targetid=None, default_view='frequency', meta={}):
targetid=None, default_view='frequency',
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codacy Issue found: Trailing whitespace

@danhey
Copy link
Collaborator

danhey commented Mar 12, 2020

Hey all, I think this is an important feature we should push ASAP! Right now, if I call

pg = lc.to_periodogram(normalization='amplitude')
pg.plot()

then the plot says the y-units are in power (not amplitude!). I've had a few confused people ask me about this. The current behaviour is very confusing and we should either roll-back our current changes until this is ready or commit this soon.

@ojhall94
Copy link
Contributor

I agree with Dan!

I'm at that point in my thesis where I need to understand this stuff. I've run some tests, and as far as I can tell the following things are true:

Amplitude & Power normalisation:

power = (4 / N)  * lomb_scargle_power
amplitude = sqrt(power)

For an amplitude of A in the time-domain, this produces peaks at $A^2$ in power and $A$ in amplitude (as desired). This does not obey Parseval's theorem. This is okay because what we really want is to see the physical amplitude, so it doesn't really matter what Parseval's says. These normalisations ensure agreement with Kjeldsen+Bedding95, but are a factor of 2 off from Chaplin+Basu17, as @keatonb mentioned above.

PSD normalisation:

psd = (2/N) * (lomb_scargle_power / frequency_spacing)

This obey's Parseval's theorem (before division by the frequency spacing), and is as Chaplin+Basu17 suggest for solar-like oscillators.

Can somebody (@keatonb, @ jsk389 ?) confirm I'm not going crazy?

@keatonb
Copy link
Contributor

keatonb commented Mar 26, 2020

@ojhall94 This appears to agree with my understanding. To be clear, A there is the semiamplitude, like the light curve signal is Asin(omegat...)

@barentsen
Copy link
Collaborator Author

I pushed a new version to this branch which is ready for testing.

As explained before, Periodograms now have pg.amplitude, pg.power, and pg.psd properties so that everyone can work in their preferred space. The default units are ppm, ppm^2, and ppm^2/microhertz.

In addition, I have removed the default_view, normalization, and freq_unit keywords. These keywords were previously used to determine what the plots would show. I propose to use a much simpler method, which is to have pg.plot() accept xunit and yunit parameters.

For example:

# show frequency (microhertz) vs amplitude (ppm)
pg.plot()
# show amplitude vs period
pg.plot(xunit='day', yunit='ppm')
# show power vs frequency
pg.plot(xunit='1/day', yunit='ppm2')
 # show psd vs frequency
pg.plot(xunit='microhertz', yunit='ppm**2/microhertz')

I made pg.plot() smart enough to adjust the labels based on the units asked for. You can also pass default_xunit and default_unit arguments to lc.to_periodogram() to override the default.

@danhey @ojhall94 @keatonb Would you be willing to check out this branch and see if this works for you? The changes I made to periodogram.py are very significant. I'd be happy to add some backwards-compability magic to avoid breaking existing code, e.g. Pyriod, but I'll need your help to figure out if this is the right solution!

Something like this should work to try this PR:

git remote add geert https://github.com/barentsen/lightkurve.git
git fetch geert
git checkout add-amplitude
python setup.py develop

@danhey
Copy link
Collaborator

danhey commented Mar 28, 2020

Hi Geert, awesome work! I can confirm that this works and I have played around with a few test systems. My main concern is that pg.plot now defaults to power instead of the previously established amplitude.

Also, how can we just plot amplitude on the y-axis, i.e. not ppm or ppt?

And this is a weird bug, but it seems that when I import your branch of lightkurve then I can no longer render plots in Jupyter. Plots don't appear and calling plt.show() complains that the matplotlib AGG backend is enabled. Uninstalling lightkurve (or reverting to the main branch) removes the problem.

@barentsen
Copy link
Collaborator Author

@danhey Thanks for the feedback! I didn't pay attention to maintaining the existing defaults. I've changed it so you now get Amplitude [normalized] and Period [d] as the default plot. The amplitude is now in relative units (it used to be electron/s), but not ppm by default.

Your Jupyter problem is surprising. Does using %matplotlib inline resolve the issue?

Please keep playing with this branch! It's a big change so I'd like for all of us to use it for a week.

@ojhall94
Copy link
Contributor

ojhall94 commented Mar 30, 2020

Hey all,

I've had a play around with it, it works very smoothly! I have some feedback & questions:

  • With the default frequency is (cycles/day). This means that pg.frequency returns (cycles/day), and that pg.psd also comes in units of ppm^2/cycles/day. The old normalization kwarg in the to_periodogram call was super helpful because it allowed us to use the defaults for solar-like oscillators, and is probably baked into a lot of peoples' codes now, so this update would make periodogram a bit more tedious to use. Don't know if we want to include some kind of preset option?

  • Addendum to the previous comment: a lot of other kwargs assume input in microhertz, such as the mimum/maximum frequency kwargs in to_periodogram. The seismology package also assumes default in puts of microhertz, because its built for solar-like oscillators at the moment.

  • The plotting is intuitive and tidy!

  • The normalisations work as intended: setting minimum/maximum frequencies doesn't change the y-axis units.

  • I'm confused about @danhey's requested correction. With the normalisations we decided on earlier in this thread, a sine wave of semiamplitude A [ppm] should show a peak of height A[ppm] as well, as below:

image

I'm not sure what the [normalized] units represent in this case, because the amplitude output in ppm is already normalized. But maybe we're getting our wires crossed.

image

@danhey
Copy link
Collaborator

danhey commented Mar 30, 2020

I second @ojhall94's comment about keeping a normalization keyword. If I am completely honest, I think this PR makes the process of calculating a periodogram more complicated and less intuitive.

Can we keep the output of the periodogram in pg.power and ensure that the units are specified? This is similar to the docs for Lomb-Scargle in Astropy. This way, we can maintain the plotting x/y unit keywords but keep the division between solar-like and coherent oscillators. I can not think of a typical scenario where someone would need both the amplitude and psd of a periodogram simultaneously.

Ollie also raised a good point about breaking existing code which relies on lightkurve. We've already introduced one major (and backwards incompatible) change to how periodograms work. Whatever we end up deciding on should be agreed upon..!

@ojhall94
Copy link
Contributor

I've just been playing around with going from a periodogram to seismology--- the flatten() function no longer returns the correct output because it divides through by self.power, which is different to self.psd in the new framework. The same problem is scattered throughout, and its probably an easy fix, but I just thought I'd highlight!

@barentsen
Copy link
Collaborator Author

Thank you for the feedback both! This all seems sensible. I'm hoping most of it can be resolved using small changes, but it may take me some time to get to it.

@danhey
Copy link
Collaborator

danhey commented Apr 1, 2020

I've played around with @barentsen's changes and have implemented my thoughts. You can see it in action here: https://gist.github.com/danhey/53f1374ce87bf21cbaa5a8166292136c
I feel strongly about keeping a normalization parameter to maintain default units, but am flexible on other fronts.

There are a few options we can take here:

  1. Revert back to storing all output in .power but ensure compatibility with different plotting default units etc.
  2. Subclass LS Periodogram into an amplitude spectrum and PSD spectrum. This would be useful if we choose to implement some more advanced features in the future, like fitting Lorentzians on PSD peaks.
  3. Remove .power entirely and store things in .psd and .amplitude. Keep the normalization argument. Raise a deprecation warning when people try and access .power and return the amplitude instead to maintain compatibility (unless they've set normalization = 'psd', in which case we return the psd). Simplify the Periodogram base class by moving the bulk of the plotting logic into the LombScarglePeriodogram and calling the super function.

Anyway, I'm sure there are other options so please chime in and I'll try to implement some of these to play around with!

@barentsen
Copy link
Collaborator Author

Thanks @danhey!

I like the behavior demonstrated in the gist you posted. I do like the simplicity of just having power as a property, where the word "power" is used as a broad synonym for whatever measure of signal strength a user prefers (be it BLS power, amplitude, psd, ...)

I would be happy to merge your PR into this PR! Any objections?

(Footnote for others: this is a bit complicated, but, over on barentsen#4, @danhey posted a PR against the branch on which this PR is based, i.e. proposing to change this PR. This is actually a great way to propose changes to a PR. Dan is showing off some pro GitHub skills here!)

@barentsen barentsen changed the title Add psd and amplitude properties to LombScarglePeriodogram Improve normalization and plotting options in LombScarglePeriodogram Apr 1, 2020
@barentsen
Copy link
Collaborator Author

@ojhall94 @keatonb Do you want to give this branch another test? The normalization keyword is back and the pg.amplitude and pg.psd properties are gone, thanks to @danhey.

@keatonb
Copy link
Contributor

keatonb commented Apr 1, 2020

I just took a look and found a strange behaviors from trying to guess at an intuitive way to compute power rather than amplitude... (edited to remove an example from my original comment, that I realize comes from "normalization" changing the default oversampling, which I'm used to specifying explicitly)

per1 = lc.to_periodogram(default_xunit='uHz', default_yunit='ppt^2')
per2 = lc.to_periodogram(default_xunit='uHz', default_yunit='ppt')
print(np.allclose(per1.power.value/per2.power.value,1000))
> True

I could even generate periodograms with yunits of ppt^10, but they were just amplitude scaled way up.

Also, why not just call these xunit and yunit instead of default_...? "default" doesn't sound like something I should feel comfortable changing.

The documentation for lc.to_periodogram() needs to be updated too.

I don't mind keeping the normalization keyword if it just sets default units when they're not specified. I do think the idea that started this off of having access to both .amplitude and .psd properties is desirable and intuitive.

Overall, I think it would be best if both LightCurve and Periodogram objects could be defined and manipulated with similar-looking methods, so whatever users get used to is applicable to both (#657 (comment)). The LightCurve instantiation takes flux_unit,time_format,time_scale, while Periodogram objects here take default_xunit,default_yunit. And when calculating periodograms from light curves that have time units, those units are completely ignored (i.e., if the time units are not days, the periodogram units come out completely wrong).

@danhey
Copy link
Collaborator

danhey commented Apr 1, 2020

Thanks for taking a look @keatonb !

Also, why not just call these xunit and yunit instead of default_...? "default" doesn't sound like something I should feel comfortable changing.

That wasn't part of my code but I believe it's because .plot has xunit and yunit features. Setting default_xunit ensures that that the output of pg.frequency is always in your chosen unit, but can be changed in the plotting. @barentsen ?

I don't mind keeping the normalization keyword if it just sets default units when they're not specified. I do think the idea that started this off of having access to both .amplitude and .psd properties is desirable and intuitive.

I understand your point here -- and that is the most pythonic solution. The problem being that everyone who uses lightkurve now is used to .power being a catch-all for the output of the periodogram, regardless of normalization. Another problem I have with this is that it'd be unusual for someone to want both .psd and .amplitude simultaneously. Usually, we only look at one depending on the type of star (well, that is my experience anyway!).
If we went this route I think we should subclass Periodogram to have both AmplitudeSpectrum and PowerSpectralDensity classes. Again, this would be useful for fitting in the future if we ever decide to implement that, and it would also solve the "unexpected behaviour" part of choosing a normalization which specifies defaults.

Overall, I think it would be best if both LightCurve and Periodogram objects could be defined and manipulated with similar-looking methods, so whatever users get used to is applicable to both (#657 (comment)). The LightCurve instantiation takes flux_unit,time_format,time_scale, while Periodogram objects here take default_xunit,default_yunit. And when calculating periodograms from light curves that have time units, those units are completely ignored (i.e., if the time units are not days, the periodogram units come out completely wrong).

I agree with this. An issue I found yesterday while testing this PR is that the periodogram will completely ignore the units of the input light curve..! I am also not sure how to get the units of the input lightcurve time.. It looks like they're stripped out?

@ojhall94
Copy link
Contributor

ojhall94 commented Apr 3, 2020

Hi everyone! I've had a play around with it. This is definitely more intuitive and accessible. In it's current format, with the normalization=psd option turned on it loses the ability to plot in amplitude (as setting yunit='ppm' throws an error as ppm^2/muhz and ppm are not compatible units). I don't think this is a problem, as a user could go back and re-make the periodogram, but it loses that bit of functionality this PR was intended to include.

Thanks so much for working on this us Geert!!

Some further comments:

  • Some kwargs still take microhertz as default no matter the normalization, i.e. minimum/maximum_frequency. We should set their assumed units to be that of the chosen normalization--- the framework to do this should already be built in.

  • Normalisation behaves as expected (sine of semiamplitude A returns amplitude A in the amplitude spectrum, and a psd of A^2/2/(frequency spacing) in the psd spectrum.

  • A periodogram created with normalization='psd' will plot with microhertz on the x-axis by default. A SNRPeriodogram created from that periodogram will plot with period on the x-axis by default instead (although its frequency units are in microhertz). This doesn't seem intended.

  • Slight changes to mode heights appear when comparing a full PSD spectrum and when computing a periodogram with min/max frequency kwargs set (see below). This didn't happen in previous versions of this PR, and shouldn't happen here either. @keatonb solved this in previous PR's--- it's got to do with the oversample factor probably, because this doesn't happen for the amplitude spectrum (see below):

image

image

@keatonb
Copy link
Contributor

keatonb commented Apr 3, 2020

@ojhall94, I don't think I fixed this problem previously, but I do recall saying that I think this is the expected behavior. I would say that either one of the psds above (with or without min/max set) is equally valid and useful to analyze. You're right that we see the bigger difference in the psd rather than amplitude because of the sparser frequency sampling, but there should be differences in both if you look closely. This is one reason that I prefer to oversample when inspecting power spectra.

@barentsen
Copy link
Collaborator Author

Thank you all for the thoughtful comments! Would it be correct to say that most concerns could be addressed if we made the following changes:

  1. Make sure the light curve's flux_unit is used as the default yunit in Periodogram (including if the flux_unit is electron/second?)
  2. Rename default_xunit/yunit in Periodogram.from_lightcurve() to xunit and yunit.
  3. Fix the input units of minimum/maximum_frequency.
  4. Fix the xunit used in SNRPeriodogram plots.

Before doing so, I would love to think more about @danhey's suggestion to have separate AmplitudePeriodogram and PowerSpectralDensityPeriodogram classes. Do you want to post example pseudocode of how they would work?

@barentsen
Copy link
Collaborator Author

ping @danhey 👋

@danhey
Copy link
Collaborator

danhey commented May 19, 2020

Hi all, sorry it's taking me so long to get around to this. I've pushed some very small fixes to the current implementation which should get it mostly working. At this point, I consider new additions to be secondary to fixing the current bug which displays amplitude as power (as of v1.10).

My idea for subclassing would be having AmplitudeSpectrum and PowerSpectralDensity classes. The functionality would still be the same, but it would let us support fitting things like Lorentzians for PSDs in the future. On top of this, I would like to add a normalization option for the window function.

@barentsen
Copy link
Collaborator Author

@danhey That sounds like it may be an effective solution. Please ping me if you need help!

Base automatically changed from master to main February 10, 2021 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants