
[Feature Request] Single Command for Caching Remote Files for Runs on Machines with No Access to the Internet #1372

Open
JPGlaser opened this issue Aug 5, 2022 · 12 comments · May be fixed by #1374


JPGlaser commented Aug 5, 2022

Hey All,

As I said in last week's call, we should work towards adding a command that grabs the files PINT would normally fetch from the web and caches them locally. This would allow PINT to be used on compute nodes which usually have no outside access to the Internet.

Right now, I get around this by running python -c "import pint.toa", which nabs some files via astropy and caches them for about a week.

It would also be nice to be able to force PINT to tell astropy to use old cached files if the user requires it, or if there is no internet connection, instead of killing the script.

~ Joe G.

@dlakaplan
Contributor

This should accept as options a list of telescopes, or perhaps have a default list.

@abhisrkckl
Contributor

I think it will be good to download new files (including metadata) only if it is necessary. In principle, we can just look at the latest TOA value and decide if the local files are sufficient.

@dlakaplan
Contributor

https://nanograv-pint.readthedocs.io/en/latest/_autosummary/pint.observatory.topo_obs.export_all_clock_files.html#pint.observatory.topo_obs.export_all_clock_files does this if you have loaded in data pertaining to the relevant observatories. So this could be run as the last step in the timing analysis. But it should also be possible to pre-load various observatories.
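For reference, a rough sketch of how that last step might look (assuming, per the linked docs, that export_all_clock_files takes a destination directory; the .tim filename is made up):

    import pint.toa
    from pint.observatory.topo_obs import export_all_clock_files

    # Loading TOAs pulls in the clock corrections for the observatories they use.
    toas = pint.toa.get_TOAs("my_pulsar.tim")  # hypothetical input file

    # Write out every clock file PINT currently has loaded, for offline use.
    export_all_clock_files("clock_files/")  # assumed destination-directory argument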

@dlakaplan
Contributor

@JPGlaser : does #1373 do what you want? Is that the interface you want?

@dlakaplan
Contributor

Note that this also doesn't deal explicitly with whatever files astropy uses on its own; that is discussed at https://docs.astropy.org/en/stable/utils/data.html#astropy-data-and-clusters. Do you need something to do that as well?

@aarchiba
Contributor

> This should accept as options a list of telescopes, or perhaps have a default list.

I think for Joe's application the easiest thing to do is just export all the telescope clock corrections PINT knows about.

@aarchiba
Contributor

> I think it will be good to download new files (including metadata) only if it is necessary. In principle, we can just look at the latest TOA value and decide if the local files are sufficient.

The clock corrections code does this. You only get new clock corrections if you have a TOA past the end of the ones you currently have - or if the repository has flagged your current version of a clock correction file as needing updates (presumably because it contained errors).

@aarchiba
Contributor

To make this happen, I think we need:

  • A single command to download all clock corrections into the astropy cache.
  • A single command to download all ephemerides we think the user might want into the astropy cache. (This is a bit speculative, who knows what ephemerides our users might want, but we can grab a generous selection and allow people doing this to ask for extras.)
  • A single command to trigger downloading of all Astropy data into the Astropy cache.
  • A single command that calls all of the above.
  • Easy instructions on how to tell PINT/Astropy never to go looking for anything on the Internet no matter what.

Then someone setting up a cluster would call the single command to preload the cache, and then set the "never check the internet" flag on the nodes.
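
For the last bullet, a hedged sketch of the kind of Astropy configuration the docs describe for offline use (option names should be checked against the installed Astropy version):

    from astropy.utils import iers
    from astropy.utils.data import conf as data_conf

    # Forbid astropy.utils.data from fetching anything over the network.
    data_conf.allow_internet = False

    # Don't auto-download IERS tables, and never consider the cached ones "too old".
    iers.conf.auto_download = False
    iers.conf.auto_max_age = None

The same settings could also be placed in the site-wide astropy.cfg on the nodes rather than set per-script.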

@aarchiba
Contributor

> @JPGlaser : does #1373 do what you want? Is that the interface you want?

I think, from the discussion, that Joe wants everything preloaded into the Astropy cache, rather than exported anywhere else.

@aarchiba
Contributor

The way to check that this works is to use clear_download_cache, run the (as yet hypothetical) command, list what's in the cache with cache_contents, then do a bunch of pulsar timing (including running the PINT test suite), and use cache_contents again to see if anything has changed.
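
Roughly, using the public astropy cache utilities (a sketch; the preload command itself doesn't exist yet):

    from astropy.utils.data import clear_download_cache, cache_contents

    clear_download_cache()            # start from an empty cache
    # ... run the (hypothetical) preload command here ...
    before = set(cache_contents())    # cache_contents() maps URLs to local paths
    # ... do a bunch of pulsar timing, run the PINT test suite ...
    after = set(cache_contents())
    print("Downloads that happened during the run:", after - before)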

Issues that will arise:

  • Although PINT is smart enough not to ask for new clock corrections unless it needs them, Astropy is not so smart with IERS data; it'll ask for more if the data is "old" whether you need new data or not. This behaviour can be suppressed with appropriate Astropy configuration.
  • PINT allows for the possibility that I might need to change old clock correction entries, so it periodically checks the index for invalidation markers even if you're only using old TOAs. This can also be suppressed with appropriate Astropy configuration.
  • Containers or cluster nodes will need access to the Astropy cache that has been preloaded. This usually lives in ~/.astropy/cache, so that needs to somehow be shared (see the sketch after this list). (Concurrent read-only access to the cache should pose no problems.)
  • Although the Astropy documentation includes a section (https://docs.astropy.org/en/stable/utils/data.html#using-astropy-with-limited-or-no-internet-access) on dealing with exactly this problem the question keeps coming up, so it's not clear how to let people know what the solution is.
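
The first two points can be handled with settings like those sketched above (iers.conf.auto_download = False, iers.conf.auto_max_age = None). For the shared-cache point, one option is astropy's set_temp_cache helper; a sketch (the shared path and .tim filename are made up):

    from astropy.config import set_temp_cache

    # Point this process at a cache directory preloaded on shared storage.
    with set_temp_cache("/shared/astropy_cache"):  # hypothetical shared path
        import pint.toa
        toas = pint.toa.get_TOAs("my_pulsar.tim")  # hypothetical input file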

@aarchiba linked a pull request Aug 11, 2022 that will close this issue

JPGlaser commented Aug 11, 2022

For completeness, this is an example of the kind of files that need to be pulled down during an import:

Singularity> python -c "import pint.toa"
Downloading http://hpiers.obspm.fr/iers/eop/eopc04/eopc04_IAU2000.62-now
|===================================================================| 3.4M/3.4M (100.00%)         0s
Downloading https://hpiers.obspm.fr/iers/bul/bulc/Leap_Second.dat
|===================================================================| 1.3k/1.3k (100.00%)         0s

My astropy cache looks like this:

['ftp://anonymous:mail%40astropy.org@gdc.cddis.eosdis.nasa.gov/pub/products/iers/finals2000A.all',
 'http://hpiers.obspm.fr/iers/eop/eopc04/eopc04_IAU2000.62-now',
 'https://data.nanograv.org/static/data/ephem/de436.bsp',
 'https://data.nanograv.org/static/data/ephem/de440.bsp',
 'https://hpiers.obspm.fr/iers/bul/bulc/Leap_Second.dat',
 'https://naif.jpl.nasa.gov/pub/naif/generic_kernels/spk/planets/de430.bsp',
 'https://naif.jpl.nasa.gov/pub/naif/generic_kernels/spk/planets/de432s.bsp',
 'https://naif.jpl.nasa.gov/pub/naif/generic_kernels/spk/planets/de440.bsp',
 'https://naif.jpl.nasa.gov/pub/naif/generic_kernels/spk/planets/de440s.bsp']
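
For what it's worth, entries like these could be preloaded explicitly with astropy's download_file (a sketch; the URL list is just a subset of the cache listing above):

    from astropy.utils.data import download_file

    urls = [
        "https://hpiers.obspm.fr/iers/bul/bulc/Leap_Second.dat",
        "https://naif.jpl.nasa.gov/pub/naif/generic_kernels/spk/planets/de440s.bsp",
        # ... plus the other ephemeris and IERS URLs listed above ...
    ]
    for url in urls:
        download_file(url, cache=True)  # cache=True stores the file in the astropy download cache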

@aarchiba
Contributor

The IERS_Auto table in recent versions of Astropy seems to only update if you have new data: https://github.com/astropy/astropy/blob/11b3214f18b74aea5e3f8349e50ae1b09c39d30e/astropy/utils/iers/iers.py#L756
