Use a cache to store PySM3 data files#147

Merged
ziotom78 merged 11 commits into master from cache_pysm3 on Feb 17, 2022

Conversation

@ziotom78
Member

PySM3 data files are often unavailable because of network outages, so this PR uses https://github.com/actions/cache to cache them. It should be quite efficient, as cache files are compressed using Zstandard, which in my tests showed very good compression ratios for PySM3 maps.

Here I use the trick explained in the PySM3 User's Manual.

@ziotom78
Member Author

This is not relevant to the PR, but I want to keep a record of my tests and of how much storage we can save with different compression schemes. It's also a testament to Zstandard's awesomeness!

I downloaded the PySM3 data archive and created a tarball:

$ git clone https://github.com/galsci/pysm-data pysm3-data
$ tar cf archive.tar pysm3-data

and then I compressed archive.tar using gzip, bzip2, and zstd. Here are the file sizes:

| Format | Size |
|--------|------|
| Uncompressed | 754M |
| BZip2 | 641M |
| Gzip | 632M |
| Zstandard | 615M |

Zstandard is the winner here. However, the most impressive result comes from compression speed:

| Format | Compression time |
|--------|------------------|
| BZip2 | 100 s |
| Gzip | 34 s |
| Zstandard | 3 s |

Not only did Zstandard achieve the best compression ratio, but it also performed the compression in a fraction of the time required by the other two algorithms.
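
For anyone who wants to repeat this kind of comparison, here is a minimal Python sketch of a size/time benchmark. It is not the procedure I used for the numbers above (those came from the command-line tools); the function names `benchmark`, `run_benchmarks`, and `report` are hypothetical, and Zstandard is only included if the third-party `zstandard` package happens to be installed.

```python
import bz2
import gzip
import time


def benchmark(name, compress, data):
    """Return (name, compressed size in bytes, elapsed seconds)."""
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    return name, len(compressed), elapsed


def run_benchmarks(data):
    """Compress `data` with every available compressor and collect the results."""
    compressors = {"BZip2": bz2.compress, "Gzip": gzip.compress}
    try:
        # Optional third-party dependency; silently skipped if not installed.
        import zstandard

        compressors["Zstandard"] = zstandard.ZstdCompressor().compress
    except ImportError:
        pass
    return [benchmark(name, fn, data) for name, fn in compressors.items()]


def report(path):
    """Print one line per compressor for the file at `path`."""
    with open(path, "rb") as f:
        payload = f.read()  # reads the whole file into memory
    for name, size, elapsed in run_benchmarks(payload):
        print(f"{name:10s} {size / 2**20:8.1f} MiB {elapsed:8.2f} s")
```

Calling `report("archive.tar")` would print sizes and timings for the tarball created above. Expect the numbers to differ from the tables, since library defaults (compression level, threading) need not match those of the command-line programs.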

@ziotom78
Member Author

There is a problem here: the cache gets saved and restored as desired, but it is never accessed when the Mbs module runs.

After a few hours of debugging, I discovered that this happens because the environment variable PYSM_LOCAL_DATA (used to implement caching) is being overwritten in mbs/mbs.py.

@NicolettaK, if I understand correctly, you are using this feature so that we can pass our own CMB realizations to PySM3. Is this correct?

@ziotom78
Member Author

ziotom78 commented Dec 7, 2021

@NicolettaK, could you please have a look at mbs.py, line 568? I would like to remove the line where the code changes the value of the environment variable PYSM_LOCAL_DATA, but I am unsure how to do so in a way that lets Mbs keep working as expected.
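
One possible direction, sketched below, is to override the variable only for the duration of the call that needs it and then restore the previous value, so that the cache location set by the CI workflow survives. This is not what mbs.py does today; `temporary_env` is a hypothetical helper, and the sketch assumes that PySM3 reads PYSM_LOCAL_DATA at the moment the data are loaded, which I have not verified.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name, value):
    """Set os.environ[name] to `value` inside the block, then restore it."""
    previous = os.environ.get(name)
    os.environ[name] = str(value)
    try:
        yield
    finally:
        if previous is None:
            # The variable was unset before: remove our temporary value.
            os.environ.pop(name, None)
        else:
            # The variable was set (e.g. by the CI cache step): put it back.
            os.environ[name] = previous
```

With this helper, the code in Mbs could wrap only the part that needs custom CMB realizations in `with temporary_env("PYSM_LOCAL_DATA", some_dir):`, where `some_dir` stands for whatever directory Mbs currently assigns.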

@ziotom78
Member Author

Testing the use of Path.absolute() after a suggestion in PySM issue #102.
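
As a rough illustration of what this means in practice, the sketch below exports PYSM_LOCAL_DATA as an absolute path. The helper name `set_local_data_dir` is hypothetical, and the motivation is my assumption: an absolute path does not depend on which working directory the process happens to be in when the variable is read.

```python
import os
from pathlib import Path


def set_local_data_dir(directory):
    """Export PYSM_LOCAL_DATA as an absolute path and return that path."""
    # Path.absolute() prepends the current working directory to a relative
    # path; unlike Path.resolve(), it does not follow symlinks.
    absolute = Path(directory).absolute()
    os.environ["PYSM_LOCAL_DATA"] = str(absolute)
    return absolute
```

For example, `set_local_data_dir("pysm3-data")` run from the repository root would store the full path to the `pysm3-data` folder in the environment.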

@ziotom78 ziotom78 merged commit 381cde9 into master Feb 17, 2022
@ziotom78 ziotom78 deleted the cache_pysm3 branch February 17, 2022 13:21