
WIP: native implementation of fftfit #777

Open
wants to merge 12 commits into base: master

Conversation

aarchiba
Contributor

This PR should introduce an implementation of fftfit that does not depend on PRESTO. There are several floating around; the goal is to come up with one that demonstrably:

  • Reliably produces correct results (e.g. good to at least 1/32 bin with noiseless shifted profiles)
  • Produces reasonably reliable uncertainties (verified with synthetic profiles of known noise and shift)
  • Is reasonably efficient (for example no bootstrapping required)

Currently the code includes two implementations and a test suite that uses hypothesis to test one of them. The hope is that we can have a robust test suite so that we know the fftfit we use is reliable.

Closes #231

@aarchiba
Contributor Author

The point of hypothesis here is to feed in a variety of profile shapes and phase shifts, looking for ones that cause problems for each algorithm.

Tolerances can be a problem for this sort of testing - we need to be realistic about how exact the answers are going to be, particularly in noiseless test cases. hypothesis also has an unfortunate habit of showing failing examples that just barely fail, so it's not always clear whether the tolerances are just a little too small or totally wrong.
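To make the shape of such a test concrete, here is a toy sketch; the template generator and the crude shift estimator are stand-ins for illustration, not the code in this PR, and an integer-bin shift is used so the expected answer is exact.

import numpy as np
from hypothesis import given, strategies as st

def vonmises_profile(kappa, n_bins, phase=0.0):
    # Stand-in template generator: a wrapped von Mises pulse profile.
    x = np.arange(n_bins) / n_bins
    return np.exp(kappa * (np.cos(2 * np.pi * (x - phase)) - 1))

def measure_shift(profile, template, oversample=16):
    # Crude stand-in for the fftfit under test: argmax of the circular
    # cross-correlation evaluated on an oversampled grid.
    n = len(template)
    cross = np.fft.rfft(profile) * np.conj(np.fft.rfft(template))
    ccf = np.fft.irfft(cross, oversample * n)
    return np.argmax(ccf) / (oversample * n)

@given(
    kappa=st.floats(1, 1000),
    n_bins=st.sampled_from([32, 64, 128, 256]),
    shift_bins=st.integers(0, 255),
)
def test_recovers_integer_shift(kappa, n_bins, shift_bins):
    template = vonmises_profile(kappa, n_bins)
    profile = np.roll(template, shift_bins % n_bins)
    measured = measure_shift(profile, template)
    # Compare phases modulo 1 so that 0.999 and 0.001 count as close.
    err = (measured - (shift_bins % n_bins) / n_bins + 0.5) % 1 - 0.5
    assert abs(err) < 1e-6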

It is made more complicated by the fact that once we start introducing noise, we have tests that we expect to fail a certain fraction of the time; the current mechanism for this is to use random seeds and repeat failing tests - if a test that's supposed to fail 1% of the time fails three times in a row, something is wrong.

@Zac-HD might be interested in this for hypothesis development and general testing of numerical code; @matteobachetti might be interested in case problems turn up in their implementation of fftfit.

@scottransom
Member

@aarchiba Just curious: Is the pure-python version better in various ways from the original Fortran Joe Taylor code that PRESTO uses? I like the idea of getting rid of ancient code, but I do think it is kind of cool that that code has had so much use over the years! (pulsar code nostalgia)

@matteobachetti
Contributor

Good stuff @aarchiba. Let's see what these tests say!

@matteobachetti
Contributor

I see: https://travis-ci.org/github/nanograv/PINT/jobs/722025712.
That brentq stuff was indeed what was giving me headaches during my tests, and only works with reasonable signal-to-noise. Good that your implementation doesn't fail there!

@aarchiba
Contributor Author

@aarchiba Just curious: Is the pure-python version better in various ways from the original Fortran Joe Taylor code that PRESTO uses? I like the idea of getting rid of ancient code, but I do think it is kind of cool that that code has had so much use over the years! (pulsar code nostalgia)

Heh. I do see the charm of keeping ancient code in use. And if it works reliably, great! In fact might it be worth firing up f2py or something and putting it in here? As it stands this is meant basically as a shootout between the various options. Ultimately I know we only need one that works, but...

@aarchiba
Contributor Author

I see: https://travis-ci.org/github/nanograv/PINT/jobs/722025712.
That brentq stuff was indeed what was giving me headaches during my tests, and only works with reasonable signal-to-noise. Good that your implementation doesn't fail there!

That particular error - f(a) and f(b) have the same sign - means that your first-pass effort at finding the peak is not actually bracketing it; brentq is being started on an interval that may or may not contain the peak. My guess is that using only 32 harmonics for a 2048-bin profile (or whatever) is not accurate enough - and also unnecessary.

@aarchiba
Contributor Author

One reason I implemented this from scratch is that for finding the phase shift there is a cheap, fast algorithm that is more or less guaranteed to succeed: FFT the profile and template, multiply, then IFFT to compute the CCF at (say) sixteen times the number of bins in the larger of profile and template. Because the CCF is band-limited, all its peaks, noise or real, are at least sixteen bins wide, so you can pick the highest point, and its neighbours are more or less guaranteed to form a three-point maximization bracket. And you can compute individual points of the IDFT for precision maximization. The problem cases are the ones that are legitimately problems - pairs of very nearly equal maxima, or completely flat CCFs. (And once you have the phase shift everything else is a linear problem.)

You do have to work out the appropriate derivatives to get approximate error bars, though, and my existing code does not do an adequate job of that (as hypothesis demonstrates).
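As a rough sketch of that scheme (my illustration of the idea described above, not the code in this PR; the names are made up):

import numpy as np
from scipy.optimize import minimize_scalar

def ccf_shift(profile, template, oversample=16):
    # Estimate the phase shift (in turns) of `profile` relative to `template`.
    n = max(len(profile), len(template))
    # Cross-power spectrum; zero-padding the inverse FFT evaluates the
    # band-limited CCF on an `oversample`-times-finer grid.
    spec = np.fft.rfft(profile, n) * np.conj(np.fft.rfft(template, n))
    m = oversample * n
    coarse = np.fft.irfft(spec, m)
    i = int(np.argmax(coarse))
    # Every peak of the band-limited CCF is at least `oversample` samples
    # wide, so the neighbours of the highest sample bracket the true maximum.
    lo, hi = (i - 1) / m, (i + 1) / m
    # Evaluate the same band-limited CCF at an arbitrary phase (IDFT terms:
    # DC counted once, every other retained harmonic twice).
    k = np.arange(len(spec))
    w = np.full(len(spec), 2.0)
    w[0] = 1.0
    def ccf(phase):
        return (w * (spec * np.exp(2j * np.pi * k * phase)).real).sum()
    res = minimize_scalar(lambda p: -ccf(p), bounds=(lo, hi), method="bounded")
    return res.x % 1.0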

@aarchiba
Contributor Author

I created a test where the counterexamples hypothesis comes up with should be more obvious - on my machine the profile that triggers bracket failure in the NuSTAR fftfit is a von Mises profile centered at 0 with kappa of about 80 and 32 bins; plot attached. I suspect that the problem arises from the exact symmetry.
[attached plot: vonmises]

@aarchiba
Contributor Author

Also - have people stopped blackening their code? The code formatting checker fails all over the place. It's easy to add a pre-commit hook that doesn't let you commit ill-formatted code.

@paulray
Member

paulray commented Aug 28, 2020

Also - have people stopped blackening their code? The code formatting checker fails all over the place. It's easy to add a pre-commit hook that doesn't let you commit ill-formatted code.

As far as I know, everything should be black. The master branch passes the black test.

@matteobachetti
Contributor

@aarchiba I definitely did not blacken my code; I follow PEP 8 as a guideline but never ran the code through black.

@aarchiba
Contributor Author

Also - have people stopped blackening their code? The code formatting checker fails all over the place. It's easy to add a pre-commit hook that doesn't let you commit ill-formatted code.

As far as I know, everything should be black. The master branch passes the black test.

https://travis-ci.org/github/nanograv/PINT/jobs/722068749

Oh no! 💥 💔 💥
61 files would be reformatted, 123 files would be left unchanged.

That job is marked as allowed to fail in Travis.

@aarchiba
Contributor Author

@aarchiba I definitely did not blacken my code, I follow pep8 as a guideline but never ran the code through black

Oh I just hit f8 before saving and all the code in files I touch gets blackened. So that's not the problem here.

@aarchiba
Contributor Author

Also - have people stopped blackening their code? The code formatting checker fails all over the place. It's easy to add a pre-commit hook that doesn't let you commit ill-formatted code.

As far as I know, everything should be black. The master branch passes the black test.

https://travis-ci.org/github/nanograv/PINT/jobs/722068749

Oh no! 💥 💔 💥
61 files would be reformatted, 123 files would be left unchanged.

I stand corrected: master does appear to pass, somehow, in spite of the 61 untouched files I have that now fail.

@aarchiba
Contributor Author

Change in black: #778

@scottransom
Member

As for the original fftfit, using f2py is exactly what PRESTO has in it (using Joe's original code). So we should be able to do a shootout just calling it. The way it is usually used is via measure_phase() in get_TOAs.py in PRESTO.

@aarchiba
Contributor Author

As for the original fftfit, using f2py is exactly what PRESTO has in it (using Joe's original code). So we should be able to do a shootout just calling it. The way it is usually used is via measure_phase() in get_TOAs.py in PRESTO.

I can't pip install PRESTO, can I? Does it work with Python 3?

@scottransom
Member

You can't pip install PRESTO, in general. You have to do a normal source install for the C code (which on Linux is super easy). Then in the top-level directory you do "pip install .". But yes, it has worked with Python 3 for almost a year now.

@Zac-HD
Contributor

Sticking to the testing part, this looks great to me.

You tend to reach for @composite when I'd go for a less powerful .map(f) or builds(f, ...), but in simple cases there's not much downside - just a personal style either way.

assert_rms_close() and assert_allclose_phase() do seem like good places to add hypothesis.target() though - I'd experiment with targeting both np.abs(a-b).max() and np.abs(a-b).sum() to drive up localised and non-localised imprecision. This should also help with the threshold problem, since I wrote some extra reporting for you.
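A sketch of what that might look like, assuming a signature for assert_allclose_phase (illustrative only, not the PR's actual helper):

import numpy as np
from hypothesis import target

def assert_allclose_phase(a, b, atol, name=""):
    # Signature assumed for illustration; intended to be called from inside
    # @given tests, and each label may be targeted only once per test case.
    diff = np.abs((np.asarray(a) - np.asarray(b) + 0.5) % 1 - 0.5)
    # Ask Hypothesis to actively search for inputs that maximize these,
    # driving up both localized and overall imprecision.
    target(float(diff.max()), label=name + " max abs phase error")
    target(float(diff.sum()), label=name + " sum abs phase error")
    assert diff.max() <= atol, "%s: max phase error %g > %g" % (name, diff.max(), atol)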

Comment on lines 37 to 39
@composite
def vonmises_templates(draw, ns=powers_of_two(), phase=floats(0, 1)):
    return fftfit.vonmises_profile(draw(floats(1, 1000)), draw(ns), draw(phase))

Because there are no dependencies between the drawn values, you could use builds() here instead.

Suggested change
-@composite
-def vonmises_templates(draw, ns=powers_of_two(), phase=floats(0, 1)):
-    return fftfit.vonmises_profile(draw(floats(1, 1000)), draw(ns), draw(phase))
+def vonmises_templates(ns=powers_of_two(), phase=floats(0, 1)):
+    return builds(fftfit.vonmises_profile, floats(1, 1000), ns, phase)

This involves slightly less indirection, but the main benefit is readability rather than performance.

Comment on lines +50 to +106
@composite
def random_templates(draw, ns=powers_of_two()):
    return np.random.randn(draw(ns))

Could return builds(np.random.randn, ns) or even return ns.map(np.random.randn).

@aarchiba
Contributor Author

Sticking to the testing part, this looks great to me.

You tend to reach for @composite when I'd go for a less powerful .map(f) or builds(f, ...), but in simple cases there's not much downside - just a personal style either way.

assert_rms_close() and assert_allclose_phase() do seem like good places to add hypothesis.target() though - I'd experiment with targeting both np.abs(a-b).max() and np.abs(a-b).sum() to drive up localised and non-localised imprecision. This should also help with the threshold problem, since I wrote some extra reporting for you.

Aha! target seems like a very good idea. I wasn't sure what the outcome of the numeric-counterexamples discussion had been.

I confess I haven't really looked into the functional tools for generating examples; composite allows me to write code to generate examples more or less the way I would generate them by hand. It was also less clear to me how to name an example generator that wasn't composite. I'll look into builds more. I also find when doing hypothesis testing I end up writing rather a lot of support code for the tests, which is fine if all the tests go in one file, but I'm not sure how best to structure testing code so that test files can share test infrastructure conveniently.

You may notice that the second half of the file doesn't use hypothesis much. The infrastructure there is to support statistical tests - if I take many instances of noise, 95% of them should fall within twice the claimed error bars. But of course there's a known probability that that will fail even if the code works. So there's some decorator jiggery-pokery to supply deterministic random seeds and retry the test - if a test is built to fail 1% of the time and fails for three different random seeds chances are literally one in a million that everything is fine. And successes you don't need to repeat. This allows one to directly test statistical claims, but it does take a fair amount of computation. It is also not clear how or if these tests can reasonably be combined with hypothesis to explore parameter space for places the code falls down.
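Roughly, the retry machinery looks something like this sketch (an illustration of the idea with made-up names, not the actual decorator in the PR):

import functools
import numpy as np

def retry_statistical(seeds=(0, 1, 2)):
    # Run a statistical test with several fixed seeds and pass if any seed
    # passes. If the test is designed to fail ~1% of the time per seed,
    # failing for all three seeds has probability ~1e-6, which we treat as a
    # real failure; a pass on the first seed returns immediately.
    def decorator(test):
        @functools.wraps(test)
        def wrapper(*args, **kwargs):
            errors = []
            for seed in seeds:
                try:
                    return test(*args, rng=np.random.default_rng(seed), **kwargs)
                except AssertionError as e:
                    errors.append(e)
            raise AssertionError("failed for all seeds %r: %r" % (seeds, errors))
        return wrapper
    return decorator

@retry_statistical()
def test_error_bars_cover_truth(rng):
    # 95% of standard-normal draws should land within two sigma; check the
    # empirical coverage against a slightly loosened threshold.
    draws = rng.normal(0.0, 1.0, size=1000)
    assert np.mean(np.abs(draws) < 2.0) > 0.93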

@aarchiba
Contributor Author

@matteobachetti
Copy link
Contributor

@aarchiba that version is more unstable even in my limited tests, so I don't think it's worth it

@Zac-HD
Contributor

Zac-HD commented Aug 30, 2020

I find when doing hypothesis testing I end up writing rather a lot of support code for the tests, which is fine if all the tests go in one file, but I'm not sure how best to structure testing code so that test files can share test infrastructure conveniently.

I tend to go with

tests/
    __init__.py
    test_foo.py
    ...
    utils.py

and then in test_foo.py I can from .utils import ... all my support code.

It is also not clear how or if these [statistical] tests can reasonably be combined with hypothesis to explore parameter space for places the code falls down.

The obvious approach of using Hypothesis to choose parameters (and maybe also random seeds) would probably work OK... I haven't tried it, but while the performance wouldn't be as good as usual I don't think there are any conceptual problems.

@aarchiba
Contributor Author

The version I wrote seems to be pretty much bulletproof but I do not have code that computes the uncertainty correctly.

I find when doing hypothesis testing I end up writing rather a lot of support code for the tests, which is fine if all the tests go in one file, but I'm not sure how best to structure testing code so that test files can share test infrastructure conveniently.

I tend to go with

tests/
    __init__.py
    test_foo.py
    ...
    utils.py

and then in test_foo.py I can from .utils import ... all my support code.

It is also not clear how or if these [statistical] tests can reasonably be combined with hypothesis to explore parameter space for places the code falls down.

The obvious approach of using Hypothesis to choose parameters (and maybe also random seeds) would probably work OK... I haven't tried it, but while the performance wouldn't be as good as usual I don't think there are any conceptual problems.

Definitely not random seeds - the point here is that there is a known (small) fraction of random seeds that will make the test fail, even if the code is correct. So searching for seeds that make the test fail and then keeping them around is exactly wrong.

Even if hypothesis is just varying the input parameters, if the fraction of expected failures is too large, hypothesis will just keep trying examples until it finds one where the seed causes a failure, then keep spamming that one. The best hope is to push the statistical failure rate down to something like one in a million, so that over the lifespan of the project and all such tests hypothesis runs far fewer than a million examples and the failure never arises.

@aarchiba
Contributor Author

aarchiba commented Sep 1, 2020

Okay, I've got error estimates that agree with the actual scatter for a simple profile plus noise; now to come up with some more challenging test cases and/or a way to use hypothesis in conjunction with statistical tests.

@luojing1211
Member

Hi @aarchiba, could you join the telecon tomorrow, Sept 2nd, 2020, at 10:00 am EDT? I would like to discuss this pull request.

@aarchiba
Contributor Author

aarchiba commented Sep 2, 2020

Hi @aarchiba, could you join the telecon tomorrow, Sept 2nd, 2020, at 10:00 am EDT? I would like to discuss this pull request.

I don't have zoom coordinates but otherwise can make the telecon, sure.

@aarchiba
Contributor Author

aarchiba commented Sep 3, 2020

Initial tests with PRESTO's fftfit produce... a segmentation fault in test_fftfit_basic_integer. Not in test_fftfit_basic_integer_vonmises, so it's not a simple can't-talk-to-Fortran problem. Probably a case it's not supposed to work in anyway. Got to do some tidying and chase that down.

@aarchiba
Contributor Author

aarchiba commented Sep 7, 2020

Resolved the segfaults (they occurred for too-long profiles). I think there is still something wrong with how I am calling PRESTO's fftfit, as the results it returns seem to be inaccurate.

@aarchiba
Contributor Author

TODO: test fftfit_classic and fftfit_cprof against PRESTO versions. event_optimize.py passes its test using non-PRESTO, though.

@paulray
Member

paulray commented Dec 9, 2020

Any chance we can get this merged before v0.8?

@matteobachetti
Contributor

We might want to merge Ann's algorithm, which seems to have the best performance among the Python implementations, start using it, and test it "on the road". I would be happy to call it from Stingray as well, frankly (we already have PINT among our optional dependencies).

@aarchiba
Contributor Author

aarchiba commented Dec 9, 2020

I've had some trouble getting my version of fftfit to work as a drop-in replacement for the PRESTO version - there are some details of the FFT conventions I was having trouble with. The code is all there, though, with an interface I'm happy with, so if I can get the cross-checking working okay this can go in. People might want to comment on the interface I've chosen.

@matteobachetti
Contributor

@aarchiba your interface is good; I had no problem trying it out.
I guess I'm asking something obvious, but could you add some docstrings explaining the input parameters, optional parameters, and return values of all "public" (user-facing) methods?

@aarchiba
Contributor Author

aarchiba commented Dec 23, 2020

@aarchiba your interface is good; I had no problem trying it out.
I guess I'm asking something obvious, but could you add some docstrings explaining the input parameters, optional parameters, and return values of all "public" (user-facing) methods?

Indeed that would be a very good idea! Sorry it's not there yet; I wanted to be more confident that this was going to work before fleshing out the docs.

In fact the intention is that the public-facing interface is all in __init__.py and I think they all do have docstrings. The individual implementations/wrappers are sort of internal, though users can get at them if they want to.

@matteobachetti
Contributor

In fact the intention is that the public-facing interface is all in __init__.py and I think they all do have docstrings. The individual implementations/wrappers are sort of internal, though users can get at them if they want to.

Ah, right! It's all good then. In my opinion, my implementation can be ditched unless there is at least one clear case where it performs better than the others.

@matteobachetti
Contributor

Now that I think of it: is there a reason to put all those functions in __init__.py? In general, that file should not contain significant implementation code (see e.g. the Astropy guidelines: https://docs.astropy.org/en/stable/development/codeguide.html).

@codecov

codecov bot commented Jan 10, 2021

Codecov Report

Merging #777 (0c64a3d) into master (e8ae8bf) will increase coverage by 0.24%.
The diff coverage is 71.11%.


@@            Coverage Diff             @@
##           master     #777      +/-   ##
==========================================
+ Coverage   55.57%   55.82%   +0.24%     
==========================================
  Files          85       89       +4     
  Lines       17725    17999     +274     
  Branches     3015     3039      +24     
==========================================
+ Hits         9851    10048     +197     
- Misses       7245     7314      +69     
- Partials      629      637       +8     
Impacted Files Coverage Δ
src/pint/scripts/event_optimize.py 63.19% <33.33%> (+0.24%) ⬆️
src/pint/profile/fftfit_presto.py 45.00% <45.00%> (ø)
src/pint/profile/__init__.py 46.98% <46.98%> (ø)
src/pint/profile/fftfit_nustar.py 82.52% <82.52%> (ø)
src/pint/profile/fftfit_aarchiba.py 92.64% <92.64%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@aarchiba
Contributor Author

Reported coverage isn't going to be great overall because we're not testing on a machine with the PRESTO version of FFTFIT.

@aarchiba
Contributor Author

This would be a good place to put some hypothesis configuration - in particular to avoid time-based failures, and also to provide faster runs for most CI jobs plus one more thorough run.
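For example, something along these lines in a conftest.py would do it (a sketch; the profile names and example counts are just placeholders):

# conftest.py
import os
from hypothesis import HealthCheck, settings

# Fast profile for routine CI: fewer examples, no wall-clock deadline, so
# slow machines don't produce spurious time-based failures.
settings.register_profile(
    "ci-fast",
    max_examples=50,
    deadline=None,
    suppress_health_check=[HealthCheck.too_slow],
)

# Thorough profile for one dedicated CI job or occasional local runs.
settings.register_profile("thorough", max_examples=1000, deadline=None)

settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "ci-fast"))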

@matteobachetti
Contributor

Hi @aarchiba, in my opinion your method works great at this point, and an incomplete/draft implementation would be much better than none (perfect is the enemy of good, as someone says). I would love to see this merged :)

@matteobachetti
Contributor

matteobachetti commented Mar 23, 2022

@aarchiba periodic ping to ask for this awesome piece of code to get merged :)

@paulray
Member

paulray commented Apr 30, 2022

Yes, @aarchiba how about resolving the conflicts and merging?

Development

Successfully merging this pull request may close these issues.

Dependency on fftfit
6 participants