Allgather does not always get a buffer as input #250

Open
iparask opened this issue Feb 16, 2024 · 3 comments

Comments

@iparask
Member

iparask commented Feb 16, 2024

  • pixell version: 0.19.2
  • Python version: 3.10.0
  • Operating System: Princeton Tiger OS
LSB Version:	:core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID:	Springdale
Description:	Springdale Linux release 7.9 (Verona)
Release:	7.9
Codename:	Verona

Description

I'm doing some mapmaking test runs using the multipass_fix branch of sotodlib. While running under a single MPI process, I got the following error:

  File "/scratch/gpfs/ip8725/tmp/mapmaker.py", line 338, in <module>
    for step in mapmaker.solve(maxiter=passinfo.maxiter, x0=x0):
  File "/home/ip8725/.local/lib/python3.10/site-packages/sotodlib/mapmaking/ml_mapmaker.py", line 122, in solve
    solver = utils.CG(self.A, rhs, M=self.M, dot=self.dof.dot, x0=x0)
  File "/projects/SIMONSOBS/tiger_envs/soconda_3.10/lib/python3.10/site-packages/pixell/utils.py", line 3059, in __init__
    self.rz  = self.dot(self.r, z)
  File "/home/ip8725/.local/lib/python3.10/site-packages/sotodlib/mapmaking/utilities.py", line 90, in dot
    res += dof.dot(a[b1:b2],b[b1:b2])
  File "/home/ip8725/.local/lib/python3.10/site-packages/sotodlib/mapmaking/utilities.py", line 65, in dot
    return np.sum(a*b) if self.comm is None else utils.allreduce(np.sum(a*b), self.comm)
  File "/projects/SIMONSOBS/tiger_envs/soconda_3.10/lib/python3.10/site-packages/pixell/utils.py", line 1225, in allreduce
    if op is None: comm.Allreduce(a, res)
  File "mpi4py/MPI/Comm.pyx", line 876, in mpi4py.MPI.Comm.Allreduce
  File "mpi4py/MPI/msgbuffer.pxi", line 748, in mpi4py.MPI._p_msg_cco.for_allreduce
  File "mpi4py/MPI/msgbuffer.pxi", line 701, in mpi4py.MPI._p_msg_cco.for_cro_recv
  File "mpi4py/MPI/msgbuffer.pxi", line 203, in mpi4py.MPI.message_simple
  File "mpi4py/MPI/msgbuffer.pxi", line 138, in mpi4py.MPI.message_basic
  File "mpi4py/MPI/asbuffer.pxi", line 365, in mpi4py.MPI.getbuffer
  File "mpi4py/MPI/asbuffer.pxi", line 148, in mpi4py.MPI.PyMPI_GetBuffer
  File "mpi4py/MPI/asbuffer.pxi", line 140, in mpi4py.MPI.PyMPI_GetBuffer
BufferError: scalar buffer is readonly

@tskisner suggested that the arguments a and res passed to comm.Allreduce might not be numpy arrays. After some debugging I found that a was a numpy.float64, i.e. a NumPy scalar rather than an array.
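For illustration, the type finding above can be reproduced without MPI. This is a minimal sketch, not the sotodlib or pixell code: the arrays `a` and `b` are made-up inputs, and the only claim is how NumPy itself behaves when `np.sum` reduces an array to a single value.

```python
import numpy as np

# Made-up inputs standing in for the slices passed to dot() in the traceback.
a = np.ones(4)
b = np.full(4, 2.0)

s = np.sum(a * b)

# np.sum over a whole array returns a NumPy scalar, not an ndarray.
print(type(s).__name__)
print(isinstance(s, np.ndarray))
print(isinstance(s, np.generic))

# Copying a NumPy scalar yields another scalar, so a copy is not an array either.
c = s.copy()
print(type(c).__name__)

# Wrapping the scalar with asarray produces a 0-d ndarray.
arr = np.asarray(s)
print(type(arr).__name__, arr.shape)
```

A NumPy scalar is immutable, which is consistent with the "scalar buffer is readonly" message when one is used where a writable receive buffer is expected.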

Fix suggestion

Add a check in utils.allreduce for generic NumPy scalar types, convert such inputs to arrays before calling comm.Allreduce, and convert the result back so that the returned value is also a scalar.
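A rough sketch of what such a check could look like follows. It is not the actual pixell implementation: `allreduce_scalar_safe` and `SingleProcessComm` are hypothetical names introduced here, and the stand-in communicator only imitates the buffer-based `Allreduce(sendbuf, recvbuf, op)` call for a single rank so the example runs without mpi4py. With mpi4py installed, a real communicator such as `MPI.COMM_WORLD` would presumably be passed instead.

```python
import numpy as np


class SingleProcessComm:
    """Hypothetical stand-in for an MPI communicator with one rank.

    With a single rank the reduction is just a copy into the receive buffer.
    """

    def Allreduce(self, sendbuf, recvbuf, op=None):
        recvbuf[...] = sendbuf


def allreduce_scalar_safe(a, comm, op=None):
    """Allreduce wrapper that accepts scalars as well as arrays (sketch)."""
    is_scalar = np.isscalar(a)
    # asanyarray turns a scalar into a 0-d array and keeps ndarray subclasses.
    send = np.asanyarray(a)
    # empty_like gives a writable receive buffer with the same shape and dtype.
    recv = np.empty_like(send)
    if op is None:
        comm.Allreduce(send, recv)
    else:
        comm.Allreduce(send, recv, op)
    # Indexing a 0-d array with () hands a scalar back to the caller.
    return recv[()] if is_scalar else recv


comm = SingleProcessComm()
total = allreduce_scalar_safe(np.sum(np.ones(4) * 2.0), comm)
print(type(total).__name__, total)
```

The scalar check is done on the input before conversion, so array inputs (including 0-d arrays) come back as arrays and only true scalars come back as scalars.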

@amaurea
Collaborator

amaurea commented Feb 16, 2024

This is weird. I wrote this code, and it worked in all my tests, despite the argument being a scalar. I wonder if something else is wrong.

@iparask
Member Author

iparask commented Feb 16, 2024

I do not know what the exact reason is. Is there a chance the MPI implementation plays a role?

I did not know that asarray did not maintain the subclass. Also, I was looking at the code, and I'm wondering if the same needs to happen here, here, and here for example?

Unless for some reason you do not want to keep the subclass in those cases.
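The subclass point can be checked in isolation. The sketch below uses `TaggedArray`, a hypothetical do-nothing ndarray subclass made up for this example (it is not a pixell type), to compare `np.asarray` with `np.asanyarray`.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical minimal ndarray subclass, for illustration only."""


x = np.arange(3.0).view(TaggedArray)

# asarray always returns a base-class ndarray, dropping the subclass.
plain = np.asarray(x)
print(type(plain).__name__)

# asanyarray passes ndarray subclasses through unchanged.
kept = np.asanyarray(x)
print(type(kept).__name__)
```

Whether the subclass should be kept at the other call sites mentioned above depends on what the callers expect, which this sketch does not address.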

@amaurea
Collaborator

amaurea commented Feb 21, 2024

Maybe that's a good idea, yes.
