
libsharp discontinued, alternatives for MPI smoothing? #92

Closed
zonca opened this issue Oct 21, 2021 · 10 comments
zonca commented Oct 21, 2021

PySM relies on libsharp to smooth maps over MPI.

Libsharp is not maintained anymore: https://gitlab.mpcdf.mpg.de/mtr/libsharp

Is anyone using PySM over MPI? Should we simplify the code and remove MPI support, or find an alternative to libsharp (@mreineck, suggestions?)?

@NicolettaK @giuspugl @bthorne93

zonca self-assigned this Oct 21, 2021
@mreineck

If you are currently using libsharp and it is doing what you need, I don't think there is a reason to worry; if any bugs are found in the code, I'm still happy to fix them. My main reason to archive the repo was that I don't want people to start new projects using this code.
That said, if you have a genuine need for SHTs with MPI, I'd be very interested to hear about the use case. My impression currently is that single compute nodes are strong enough to compute SHTs of any "typical" size very quickly, and that using MPI within a single node is a waste of resources (at least for SHTs). I might of course be wrong, and if so, I'd like to hear about it :-)

zonca commented Oct 21, 2021

We are doing foreground modelling with PySM at N_side 8192. In memory, a single map component in double precision is 6.5 GB; the most demanding model has 6 layers, each with 1 IQU map and 2 auxiliary maps, so the inputs alone are about 200 GB. And that is a single model: a complete simulation would have 4 galactic models, 3 extragalactic ones, and 2 or 3 CMB components.

It's not doable yet on standard compute nodes, and we might need N_side 16384.
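
As a quick sanity check of those numbers, a minimal sketch (the layer and component counts are the ones quoted above):

import numpy as np

nside = 8192
npix = 12 * nside**2                              # HEALPix pixel count
bytes_per_map = npix * np.dtype(np.float64).itemsize

n_layers = 6                                      # most demanding model
maps_per_layer = 3 + 2                            # I, Q, U plus 2 auxiliary maps

print(f"one component: {bytes_per_map / 1e9:.1f} GB")   # ~6.4 GB
print(f"model inputs: {n_layers * maps_per_layer * bytes_per_map / 1e9:.0f} GB")  # ~190 GB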

@mreineck

Thanks for the data, that is very helpful!
The approach I would suggest (for optimizing SHT performance) in this situation is to

  • distribute the components over the compute nodes in a round-robin fashion
  • let every node do the necessary SHT steps (analysis+smoothing+synthesis) for the components assigned to it, using multithreading with all threads on the node
  • communicate the results back to the appropriate MPI tasks

This should be the fastest way of carrying out this operation. Of course the additional communication makes things more complicated, so I can perfectly understand if this is not your preferred solution. However, doing an SHT, even if it is nside=8192, lmax=16384, with hybrid MPI/OpenMP (or even worse with pure MPI) parallelization over >100 threads is quite inefficient. If you have the chance to do several SHTs simultaneously on fewer threads each, this would be preferable.
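
A minimal sketch of that layout with mpi4py and healpy (the component list, the load_component helper, and the beam width are placeholders for illustration, not PySM code):

from mpi4py import MPI
import healpy as hp
import numpy as np

comm = MPI.COMM_WORLD

# placeholder component names; in practice these would be the model layers
components = ["dust", "synchrotron", "freefree", "ame", "cmb"]

# round-robin assignment: task r handles components r, r+size, r+2*size, ...
my_components = components[comm.rank::comm.size]

smoothed = {}
for name in my_components:
    m = load_component(name)  # hypothetical I/O helper returning an IQU map
    # analysis + beam smoothing + synthesis, multithreaded on this node
    smoothed[name] = hp.smoothing(m, fwhm=np.radians(0.5))

# send the results back to whatever layout the rest of the pipeline needs
all_smoothed = comm.gather(smoothed, root=0)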

mreineck commented Oct 21, 2021

PS. Out of curiosity, if you go to map space for intermediate computations (not for the final data product), have you considered using grids other than Healpix, e.g. Gauss-Legendre? Depending on the operations you have to carry out, this could be advantageous, since the SHTs are potentially significantly faster and also exact.
If you require equal-area properties, this won't work of course.

zonca commented Oct 21, 2021

@mreineck I don't know much about Gauss-Legendre, do you have a reference I could look into?

mreineck commented Oct 21, 2021

If your band limit is lmax, the minimal corresponding Gauss-Legendre grid has (lmax+1) nodes in theta (non-equidistant) times (2*lmax+1) nodes in phi (equidistant). SHTs are exact in both directions as long as the band limit holds. This grid has the fastest SHTs you can get, roughly twice as fast as using a Healpix grid with 2*nside=lmax. Basics are briefly mentioned in, e.g., https://hal-insu.archives-ouvertes.fr/file/index/docid/762867/filename/sht.pdf.
libsharp and my other SHT libraries support this out of the box.

[Edit: fixed the nside expression]
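
For illustration, a minimal sketch of these grid dimensions (the lmax value is arbitrary), using numpy's Gauss-Legendre quadrature for the ring positions:

import numpy as np

lmax = 2047                  # arbitrary band limit for illustration

# minimal exact Gauss-Legendre grid for this band limit
nlat = lmax + 1              # rings, non-equidistant in theta
nlon = 2 * lmax + 1          # pixels per ring, equidistant in phi

# Gauss-Legendre nodes on [-1, 1]; each node is cos(theta) of a ring
x, _ = np.polynomial.legendre.leggauss(nlat)
theta = np.arccos(x)         # ring colatitudes in radians

print("GL grid: ", nlat, "x", nlon, "=", nlat * nlon, "pixels")
# Healpix grid with 2*nside = lmax, for comparison
nside = (lmax + 1) // 2
print("Healpix: ", 12 * nside**2, "pixels")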

zonca commented Oct 29, 2021

Thanks @mreineck, do you have an example of using:

https://mtr.pages.mpcdf.de/ducc/sht.html#ducc0.sht.experimental.synthesis

to transform a set of IQU Alms into a map in GL pixels and then use analysis to go back to Alms?

It's really hard to understand how to use it from the API reference.

@mreineck

The good thing about the GL grid is that it is "regular", i.e. every ring has the same number of pixels, so each map component can be stored as a simple 2D (ntheta, nphi) array, which is what the *_2d functions expect.

Its usage is demonstrated, e.g., in https://gitlab.mpcdf.mpg.de/mtr/ducc/-/blob/ducc0/python/demos/sht_demo.py.
That demo covers only the unpolarised case; I'll add polarisation and post the modified code here soon!

@mreineck

Here it is. (Sorry, GitHub doesn't appear to let me attach it.)

import ducc0
import numpy as np
from time import time

rng = np.random.default_rng(48)

def nalm(lmax, mmax):
    return ((mmax+1)*(mmax+2))//2 + (mmax+1)*(lmax-mmax)

def random_alm(lmax, mmax, spin):
    spin = list(spin)
    ncomp = len(spin)
    res = rng.uniform(-1., 1., (ncomp, nalm(lmax, mmax))) \
     + 1j*rng.uniform(-1., 1., (ncomp, nalm(lmax, mmax)))
    # make a_lm with m==0 real-valued
    res[:, 0:lmax+1].imag = 0.
    # zero a few other values dependent on spin
    for i in range(ncomp):
        ofs=0
        for s in range(spin[i]):
            res[i, ofs:ofs+spin[i]-s] = 0.
            ofs += lmax+1-s
    return res

# just run on one thread
nthreads = 1

# set maximum multipole moment
lmax = 2047
# maximum m.
mmax = lmax

# Number of pixels per ring. Must be >=2*lmax+1, but I'm choosing a larger
# number for which the FFT is faster.
nlon = 2*lmax+2

alm = random_alm(lmax, mmax, [0, 2, 2])
print("testing Gauss-Legendre grid with lmax+1 rings")

# Number of iso-latitude rings required for Gauss-Legendre grid
nlat = lmax+1

# go from a_lm to map
t0 = time()
map = np.empty((3, nlat, nlon))
# unpolarised component
ducc0.sht.experimental.synthesis_2d(
    alm=alm[0:1], ntheta=nlat, nphi=nlon, lmax=lmax, mmax=mmax, spin=0,
    geometry="GL", nthreads=nthreads, map=map[0:1])
# polarised component
ducc0.sht.experimental.synthesis_2d(
    alm=alm[1:3], ntheta=nlat, nphi=nlon, lmax=lmax, mmax=mmax, spin=2,
    geometry="GL", nthreads=nthreads, map=map[1:3])
print("time for map synthesis: {}s".format(time()-t0))

# transform back to a_lm

t0 = time()
alm2 = np.empty_like(alm)
# unpolarised component
ducc0.sht.experimental.analysis_2d(
    map=map[0:1], lmax=lmax, mmax=mmax, spin=0, geometry="GL", nthreads=nthreads, alm=alm2[0:1])
# polarised component
ducc0.sht.experimental.analysis_2d(
    map=map[1:3], lmax=lmax, mmax=mmax, spin=2, geometry="GL", nthreads=nthreads, alm=alm2[1:3])
print("time for map analysis: {}s".format(time()-t0))

# make sure input was recovered accurately
print("L2 error: ", ducc0.misc.l2error(alm, alm2))

zonca commented Nov 4, 2021

The discussion about libsharp is complete; the discussion about Gauss-Legendre pixelization continues in #91.

zonca closed this as completed Nov 4, 2021