Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve clock file handling #1239

Merged
merged 57 commits into from May 26, 2022
Merged

Conversation

aarchiba
Copy link
Contributor

@aarchiba aarchiba commented May 11, 2022

This is meant to be a modest change, not yet implementing any global clock-file updating scheme as suggested in #1180 but addressing as far as possible the issue of finding any local clock files you do have. It also does something to address #1235 by pulling in all observatories that are in the TEMPO/TEMPO2 official repositories.

Notable changes:

  • PINT now includes all clock correction files from TEMPO/TEMPO2 (including Jodrell corrections more recent than 1997)
  • You can now specify what to do about missing clock corrections when loading TOAs: get_TOAs("file.tim", limits="error")
  • You can now use telescopes without clock corrections; you just get warnings, the same as if you had data past the end of your clock corrections.
  • You can now ask a clock correction file or an observatory when the last MJD it covers is
  • check_for_new_clock_files_in_tempo12_repos is a first attempt at comparing PINT's clock corrections to the versions on the Web
  • Demo notebook shows the status of PINT's clock corrections (visible in PR rendered docs: https://nanograv-pint--1239.org.readthedocs.build/en/1239/examples/check_clock_corrections.html)

Goals:

  • PINT reports something useful when clock corrections are missing
  • PINT includes all clock corrections from TEMPO/TEMPO2
    - [ ] PINT pulls newer/extra clock corrections from $TEMPO/ or $TEMPO2/ if available
  • PINT handles missing clock corrections the same way as TOAs past the end of available clock corrections
  • Users have a way to request exception/warning behaviour for missing clog corrections

Closes #1241 #1235

@aarchiba
Copy link
Contributor Author

aarchiba commented May 11, 2022

Currently it simply reads all the clock files from the PINT repository, and xfails the observatories we don't currently have clock files for.

I'd like to see it pull in additional or newer clock files from $TEMPO/ or $TEMPO2/ if those are set and the clock files are found.

I think we should also handle missing corrections the same way we handle TOAs past the end of existing corrections. Which at the moment is, emit a warning but go on with uncorrected data.

Longer term we should be willing to pull in updated clock files using the astropy cache system to get them from a central server, triggered by TOAs we don't have corrections for, but I don't think that belongs in this PR.

@aarchiba aarchiba changed the title Improve clock file handling WIP: Improve clock file handling May 11, 2022
@codecov
Copy link

codecov bot commented May 11, 2022

Codecov Report

Merging #1239 (9d98345) into master (f2d26d7) will decrease coverage by 0.23%.
The diff coverage is 48.53%.

❗ Current head 9d98345 differs from pull request most recent head cd431fb. Consider uploading reports for the commit cd431fb to get more accurate results

@@            Coverage Diff             @@
##           master    #1239      +/-   ##
==========================================
- Coverage   61.26%   61.02%   -0.24%     
==========================================
  Files          88       88              
  Lines       19757    19895     +138     
  Branches     3456     3485      +29     
==========================================
+ Hits        12104    12141      +37     
- Misses       6897     6994      +97     
- Partials      756      760       +4     
Impacted Files Coverage Δ
src/pint/models/model_builder.py 92.60% <ø> (ø)
src/pint/observatory/observatories.py 100.00% <ø> (ø)
src/pint/observatory/__init__.py 37.89% <6.74%> (-14.39%) ⬇️
src/pint/observatory/special_locations.py 60.62% <63.33%> (-2.14%) ⬇️
src/pint/observatory/clock_file.py 78.66% <71.69%> (-5.08%) ⬇️
src/pint/observatory/topo_obs.py 77.64% <78.78%> (+4.51%) ⬆️
src/pint/toa.py 80.82% <100.00%> (ø)
src/pint/logging.py 65.15% <0.00%> (-7.58%) ⬇️
src/pint/models/pulsar_binary.py 75.28% <0.00%> (-0.41%) ⬇️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f2d26d7...cd431fb. Read the comment docs.

@aarchiba
Copy link
Contributor Author

Unclear what we should do when someone specifies clock_dir="TEMPO" - still look in the PINT directories? What if they specify clock_dir="PINT" or nothing - do we check $TEMPO? The obvious thing to do, as far as I'm concerned, is ignore clock_dir unless it's an actual directory and just use clock_fmt to decide where to look for newer clock files based on that.

@aarchiba
Copy link
Contributor Author

I could put in a test that would fail if any of our favourite telescopes are more than (say) 90 days out of date, but I think that would annoy people trying to improve PINT tremendously when all their PRs started failing.

@aarchiba
Copy link
Contributor Author

aarchiba commented May 11, 2022

Quick list of observatories with clock corrections and the spans they cover:

for a,o in pint.observatory.Observatory._registry.items():
    m = o.last_clock_correction_mjd()
    if np.isfinite(m):
        print(f'{a:20s}\t{Time(m, format="mjd").iso}  \t{(Time.now()-Time(o.last_clock_correction_mjd(), format="mjd")).datetime}')

The second column is the date and time of the last correction, the third is how far in the past it is. Negative numbers, yes, mean values in the future.

geocenter           	2022-10-27 00:00:00.000  	-169 days, 16:31:59.847018
stl_geo             	2022-10-27 00:00:00.000  	-169 days, 16:31:59.849765
gbt                 	2022-02-28 12:00:00.000  	72 days, 4:31:59.851853
gbt_pre_2021        	2022-02-28 12:00:00.000  	72 days, 4:31:59.854798
arecibo             	2020-08-18 00:00:00.000  	631 days, 16:31:59.858017
arecibo_pre_2021    	2020-08-18 00:00:00.000  	631 days, 16:31:59.860589
vla                 	2021-03-07 12:00:00.000  	430 days, 4:31:59.862690
meerkat             	2021-02-22 23:44:59.971  	442 days, 16:46:59.893559
parkes              	2019-07-31 09:40:00.192  	1015 days, 6:51:59.674769
jodrell             	2022-10-27 00:00:00.000  	-169 days, 16:31:59.868711
jodrell_pre_2021    	2022-10-27 00:00:00.000  	-169 days, 16:31:59.871412
nancay              	2022-10-27 00:00:00.000  	-169 days, 16:31:59.873983
ncyobs              	2022-10-27 00:00:00.000  	-169 days, 16:31:59.876449
effelsberg          	2022-10-27 00:00:00.000  	-169 days, 16:31:59.878485
effelsberg_pre_2021 	2022-10-27 00:00:00.000  	-169 days, 16:31:59.880405
wsrt                	2015-06-29 02:24:00.000  	2508 days, 14:08:01.882449
fast                	2019-09-18 00:30:14.400  	966 days, 16:01:45.484203
most                	2022-10-27 00:00:00.000  	-169 days, 16:31:59.887346
chime               	2022-10-27 00:00:00.000  	-169 days, 16:31:59.889994
gb140               	1999-07-31 12:00:00.000  	8320 days, 4:32:04.896793
gb853               	2022-10-27 00:00:00.000  	-169 days, 16:31:59.898531

@aarchiba
Copy link
Contributor Author

I think reading TEMPO/TEMPO2 directories is maybe better put off to #1180 now that we have all the files from those distributions. If people have their own custom clock files they can put them in the PINT runtime directory.

@aarchiba aarchiba added the awaiting review This PR needs someone to review it so it can be merged label May 13, 2022
@aarchiba
Copy link
Contributor Author

Suggestion in telecon: add documentation describing how PINT finds/handles clock corrections, including the new tools for checking when they end. Also explain what you need to do to use updated clock files (it involves editing PINT source).

@aarchiba
Copy link
Contributor Author

Suggestion: allow observatories to specify clock_fmt="NONE" to indicate that they do not require clock corrections (presumably the data is referenced to GPS time). Currently most such observatories just have a clock file that specifies zero correction at MJDs 0 and 99999, but this is unnecessary.

@aarchiba
Copy link
Contributor Author

Now available: convenience functions.

pint.observatory.check_for_new_clock_files_in_tempo12_repos()

gives

Clock file time_gbt.dat has changed: 7618 lines in PINT, 6915 on web
    Differences:
        709 lines omitted
    Observatory file is in /home/naa280/.astropy/cache/download/url/932b32833dce11d53c3ec3f9191cf8f3/contents
Clock file time_ao.dat has changed: 9586 lines in PINT, 9584 on web
    Differences:
        --- 
        +++ 
        @@ -969,12 +969,10 @@
          49683.40      88.861      88.704 3 f  26-NOV-94
          49684.38      89.092      88.856 3 f  27-NOV-94
          49685.39      89.309      89.039 3 f  28-NOV-94
        -# Zapped these as they are out of order and the comments suggest they
        -# are invalid. Anne Archibald, 2022 May 11
        -# 49528.1       73.492      71.487 3 f  25-JUN-94  raw readouts
        -# 49544.9       72.0        73.0   3 f  11-JUL-94  no data, rough interp.-->
        -# 49545.9       72.0        73.1   3 f  12-JUL-94  --> need more precision
        -# 49557.79      74.213      74.287 3 f  24-JUL-94  your time.dat
        + 49528.1       73.492      71.487 3 f  25-JUN-94  raw readouts
        + 49544.9       72.0        73.0   3 f  11-JUL-94  no data, rough interp.-->
        + 49545.9       72.0        73.1   3 f  12-JUL-94  --> need more precision
        + 49557.79      74.213      74.287 3 f  24-JUL-94  your time.dat
         50155.0        0.000       -0.007 3    13-Mar-1996  post-upgrade data from
         50156.0        0.000       -0.060 3    14-Mar-1996  aosun:~joe/tac/aoclk.1
         50157.0        0.000       -0.103 3    15-Mar-1996  12 October 1999
    Observatory file is in /home/naa280/.astropy/cache/download/url/0bed8b7f5240a4cc30068748e50f5320/contents
Clock file time_gb140.dat has changed: 1674 lines in PINT, 1673 on web
    Differences:
        --- 
        +++ 
        @@ -1486,9 +1486,8 @@
         51206.500       0.0        24.086 a    28-Jan-99
         51207.500       0.0        24.123 a    29-Jan-99
         51208.500       0.0        24.150 a    30-Jan-99
        -# These seem to be repeated. Anne Archibald, 2022 May 11
        -# 51209.500       0.0        24.172 a    31-Jan-99
        -# 51210.500       0.0        24.197 a     1-Feb-99
        +51209.500       0.0        24.172 a    31-Jan-99
        +51210.500       0.0        24.197 a     1-Feb-99
         51209.500       0.0        24.190 a    31-Jan-99
         51210.500       0.0        24.218 a     1-Feb-99
         51211.500       0.0        24.272 a     2-Feb-99
    Observatory file is in /home/naa280/.astropy/cache/download/url/b2a64c81f6ec9da09f7ae6a520fad558/contents

and

pint.observatory.list_last_correction_mjds()

gives

gbt                      2022-02-28 12:00:00.000
    time_gbt.dat         2022-02-28 12:00:00.000
gbt_pre_2021             2022-02-28 12:00:00.000
    time_gbt.dat         2022-02-28 12:00:00.000
arecibo                  2020-08-18 00:00:00.000
    time_ao.dat          2020-08-18 00:00:00.000
arecibo_pre_2021         2020-08-18 00:00:00.000
    time_ao.dat          2020-08-18 00:00:00.000
vla                      2021-03-07 12:00:00.000
    time_vla.dat         2021-03-07 12:00:00.000
meerkat                  2021-02-22 23:44:59.971
    mk2utc.clk           2021-02-22 23:44:59.971
parkes                   2019-07-31 09:40:00.192
    pks2gps.clk          2019-07-31 09:40:00.192
jodrell                  2019-05-13 02:24:00.000
    time_jb.dat          2019-05-13 02:24:00.000
jodrell_pre_2021         2019-05-13 03:36:00.000
    time_jb.dat          2019-05-13 03:36:00.000
nancay                   2022-10-27 00:00:00.000
    ncy2gps.clk          2036-11-03 00:00:00.000
ncyobs                   2022-10-27 00:00:00.000
    ncyobs2obspm.clk     2132-08-31 23:59:59.914
    obspm2gps.clk        2132-08-31 23:59:59.914
effelsberg               2015-06-22 12:00:00.000
    eff2gps.clk          2015-06-22 12:00:00.000
effelsberg_pre_2021      2015-06-22 12:00:00.000
    eff2gps.clk          2015-06-22 12:00:00.000
gmrt                     MISSING
    gmrt2gps.clk         MISSING
wsrt                     2015-06-29 02:24:00.000
    wsrt2gps.clk         2015-06-29 02:24:00.000
fast                     2019-09-18 00:30:14.400
    time_fast.dat        2019-09-18 00:30:14.400
most                     2018-09-06 00:00:00.173
    mo2gps.clk           2018-09-06 00:00:00.173
chime                    2022-10-27 00:00:00.000
    time_chime.dat       2132-08-31 00:00:00.000
gb140                    1999-07-31 12:00:00.000
    time_gb140.dat       1999-07-31 12:00:00.000
gb853                    1997-08-28 09:50:24.000
    time_gb853.dat       1997-08-28 09:50:24.000

@scottransom
Copy link
Member

Hi All: Sorry that I missed the discussion about this due to my travels. I wanted to bring up that I'm not in favor of copying all of the TEMPO/Tempo2 clock files into PINT. To me that is basically exacerbating the problem that we have in the pulsar community with clock files -- it now means that clock files will likely be out of date in two or three packages at a time rather than one or two. And it means that we need to update every time TEMPO/Tempo2 updates. This is something that Jing and others of us talked about in the early days of PINT and decided explicitly to avoid.

IMO, if people want to use non-NANOGrav (or non-supported observatory) clock files, where they are primary in PINT, they should download TEMPO or Tempo2 and use those clock files as they like.

I think we should be constantly moving towards single locations where the latest observatory clock files are kept, and we use a caching mechanism to handle them all. This copying of files seems like a step backwards to me, and I don't think we should hold the hands of our users who want to use "non-standard" clock files.

@aarchiba
Copy link
Contributor Author

I agree that we should be moving towards a centralized repository; as I said in #1180 I am working on a rough draft of how that might work (it's at https://github.com/aarchiba/pulsar-clock-corrections but let me emphasize that it is a proof of concept and I need to discuss what PINT support would look like - perhaps during the busy day?).

If I understand correctly, your argument is that we should restrict PINT to having clock corrections only for a short list of observatories. Can you suggest such a list?

Before this PR PINT had clock corrections for:

  • GBT, gb853
  • Arecibo
  • Jodrell (only until 1997)
  • VLA
  • FAST

This PR adds:

  • Effelsberg
  • Meerkat
  • MOST
  • Parkes
  • WSRT

It would seem that your preference is that PINT be a NANOGrav-specific tool, with users of other instruments required to install additional software to use it?

As it currently stands (barring the sort of solution I mention above and we are discussing in #1180), PINT does not have any kind of search path. Thus with the current code, PINT developers must decide for each observatory between four options:

  1. PINT includes the clock file,
  2. PINT requires the user to have a $TEMPO pointing to tempo-format clock corrections they have installed separately,
  3. PINT requires the user to have a $TEMPO2 pointing to tempo2-format clock corrections they have installed separately, or
  4. PINT does no clock corrections for the observatory.

With this PR, observatories pointed at 2 or 3 emit a message and fall back to 4. (I don't get the impression that you have any problem with this.)

This PR also moves all observatories from 2 and 3 to 1. As I understand it, this is the part you have a problem with. I will point out that PINT also has a tool to easily check and update the clock files to track those in the TEMPO/TEMPO2 repository, and that this is a temporary measure awaiting a more global solution (and probably also a fallback if getting files across the Net is inconvenient).

Users who want custom clock files that aren't in any repository are in a difficult position with PINT, before or after this change; this does not attempt to solve their problem. I have some ideas on how to address this, but it's beyond the scope of this change.

@aarchiba
Copy link
Contributor Author

@scottransom
Copy link
Member

I truly don't think that PINT should have any clock files at all. Not even the NANOGrav ones (although given that PINT development has been NANOGrav-driven, I wouldn't be super opposed to keeping GBT, VLA, CHIME, and AO ones, but don't even think that is a good idea). So I'd be much more in favor of your (full) PR if it removed the ones that we had rather than adding new ones.

You mention this as a short-term solution, but I really don't think we can count on that. And this is one of the main problems.
The IPTA (and members of the pulsar community, in general) have been pushing for a global clock file repo (or, alternatively, "permanent" links where each observatory posts their clock corrections, as we have for GBT) for literally the last 12-15 years. And it still has not happened. That's because it needs the observatories to buy-in, and that hasn't happened (I think because most don't have the amount of support person-hours available to allocate for something that is just "nice" to have).

Moving these clock files into PINT does increase the maintenance burden since we do not have access to the providers of those files, and so we need to depend on Tempo or Tempo2 commits before updating ours. I really do think it is just making the problem worse.

Note that I have no problem explicitly always using the T2 runtime directory (if available) and falling back on the $TEMPO clock dir for clock files. Those packages are where the pulsar community gets most of their clock corrections now, so it isn't outrageous to demand that our users simply download those packages (they don't even need to install them!). Also note that one of the reasons why we started including GBT and AO clock files in PINT is because it is much harder to keep them up-to-date in TEMPO/Tempo2 as the PINT developers didn't all have write access to the TEMPO/Tempo2 repos. And we were using PINT for up-to-date NANOGrav development. So including them was a "short term solution" that is still (wrongly, IMO) being used.

I'll take a look at your global clock files doc.

@scottransom
Copy link
Member

TL;DR of thoughts on this: I love basically everything else in your PR and think they are all good improvements, except for adding additional clock files (and, once again, would prefer if your PR removed some or all of the ones we have!).

@aarchiba
Copy link
Contributor Author

I have a not-quite-PR-ready version of a draft of the global clock files machinery for PINT. But I think where we differ is on what to do until it's ready. Your suggestion is that we reduce the functionality of PINT in the meantime, or at least refuse to improve it. I think we should make it as usable as possible while we try to get the global clock corrections up and running.

Speaking of which, where is the GBT static clock file location? I can automate pulling it in to the pulsar_clock_corrections repository (which will result in my not-quite-PR-ready version pulling it in immediately).

@scottransom
Copy link
Member

"at least refuse to improve it" Hahaha. Interesting choice of words. I think this would possibly improve the usability of PINT for a small subset of folks in the short-term, but sets us back in multiple other ways (including maintenance). One point which hasn't been mentioned is that doing precise tests (especially with CI) is trickier if we don't have all of our own clock files, but would be doable with a bit of work.

As for the GBT clock corrections, they are here: https://www.gb.nrao.edu/~fghigo/timer/time_gbt.dat
And, unfortunately, AO clock corrections are now permanently static...

@aarchiba
Copy link
Contributor Author

It certainly improves the usability of PINT for me! I can now just use Parkes and Meerkat TOAs (the motivating example) and have them corrected.

Issues around CI and requiring a network connection are more properly the business of the global-clock-corrections tool. The question for now is: what do we do until that is available?

@scottransom
Copy link
Member

Right. But that same improvement would happen with your PR if you just had it use your Tempo/Tempo2 clock directories. Why is that not a better solution requiring less maintenance? And isn't that even an option in your PR? i.e. options 2 and 3 in your long explanation above?

In fact, it already works now! I just loaded MeerKAT TOAs and the clock corrections were automatically grabbed from the T2 runtime directory. So I don't really think this is much of an improvement at all.

@scottransom
Copy link
Member

Similarly, I've been automatically using Parkes clock corrections from the T2 runtime directory seamlessly for the last month or two while I've been working on pintk.

@aarchiba
Copy link
Contributor Author

Totalliy minor but I stopped the tiresome ERFA warning about a bad year in gps2utc.clk - it has a bogus 99999 at the end.

@scottransom
Copy link
Member

As I've mentioned, I'd much prefer that you just remove the Jodrell corrections (especially since they are effectively useless now) and replace them with the tempo2 runtime pointer. That's what we've done with the vast majority of other telescopes.

As for the global clock repo, I don't understand why we'd need to wait on that. As an absolute worst case, we keep it up-to-date with TEMPO / Tempo2 until we get folks used to updating their telescopes in it themselves (or providing a link where those corrections can be pulled, ala GBT). And keeping some clock files up to date is already the situation we are in for PINT.

I really don't buy the argument that PINT should be usable out of the box, because it isn't already, and never will be. We will always need to pull ephemerides and Earth rotation info and leap seconds (for instance). I don't see how clock corrections are any different. And having us include yet another copy of likely out of date ones is (IMO) a step backwards.

Would love to hear what @paulray and @demorest think about this.

Also, as for Sourceforge, I read the wiki page and it seems like most of those issues have been resolved or have gone away. I've tried to get David N and the other TEMPO devs (not me) to move the main repo to github in the past without success. So I doubt that it will change, and in the meantime that is the main repo.

@aarchiba
Copy link
Contributor Author

If you want the global clock corrections to happen soon, go discuss with me on #1247. There's a lot to do. In the meantime, let's keep PINT as functional as possible. If the meantime isn't very long, hooray! But let's not remove functionality.

@scottransom
Copy link
Member

Fine. Then just forget the Jodrell additions, and change the default to Tempo2. That improves functionality and does't add a ton of very-soon-to-be-useless cruft:
image

@aarchiba
Copy link
Contributor Author

Fine. Then just forget the Jodrell additions, and change the default to Tempo2. That improves functionality and does't add a ton of very-soon-to-be-useless cruft: image

If I change the "default" to TEMPO2 users can no longer even use the Jodrell clock corrections in PINT. We don't have search paths.

Who cares how many lines are in the git repo? It was created specifically to be fast for the Linux kernel source tree (legend has it Linus Torvalds wrote git while he was waiting for the existing source control tool to do some operation on the repo). This seems like a weird point to block merging of this PR.

@scottransom
Copy link
Member

If I change the "default" to TEMPO2 users can no longer even use the Jodrell clock corrections in PINT. We don't have search paths.

That's exactly the point. The Jodrell clock corrections in PINT right now should never have been put in there. I have no idea why they are even there (unless it was for a very limited test case for observatory handling). We should treat that file as a bug and fix the issue. (And yes, your solution does "fix" the issue, but with other problems, IMO)

Who cares how many lines are in the git repo?

That's the symptom. The problem is having duplicated (from TEMPO and Tempo2) and out-of-date clock files in PINT. (However, there is substantial impact for stuff like this going into and out of a repo. It definitely accumulates over time. Just look at the NANOGrav timing repos which are way bigger than they should be because of our changes of many large text files.)

@paulray
Copy link
Member

paulray commented May 26, 2022

Please merge and I'll tag a new version.

@scottransom scottransom merged commit fa1dc4c into nanograv:master May 26, 2022
@scottransom
Copy link
Member

I resolved the conflict using the web-resolver. Feel free to tag-away, @paulray

@aarchiba
Copy link
Contributor Author

yikes. I was in the midst of resolving them locally. I'd better figure out what kind of situation the repo is in.

@scottransom
Copy link
Member

It was literally just a one-line conflict with CHANGELOG.md. So I suspect that the repo is just fine.

@paulray
Copy link
Member

paulray commented May 26, 2022

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review This PR needs someone to review it so it can be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TEMPO-format observatories must have a TEMPO site code to find clock corrections
3 participants