MPI error when running LGST #43

Closed
gribeill opened this issue Feb 6, 2019 · 4 comments

gribeill commented Feb 6, 2019

When running two-qubit GST using MPI, I am getting the following error on the 5th iteration of MLGST:

Traceback (most recent call last):
  File "pygsti_2q_mpi.py", line 58, in <module>
    memLimit=memLim, verbosity=3, comm=comm)
  File "/home/gribeill/GitHub/pyGSTi/packages/pygsti/drivers/longsequence.py", line 466, in do_long_sequence_gst
    output_pkl, printer)
  File "/home/gribeill/GitHub/pyGSTi/packages/pygsti/drivers/longsequence.py", line 685, in do_long_sequence_gst_base
    gs_lsgst_list = _alg.do_iterative_mlgst(**args)
  File "/home/gribeill/GitHub/pyGSTi/packages/pygsti/algorithms/core.py", line 2845, in do_iterative_mlgst
    memLimit, comm, distributeMethod, profiler, evt_cache)
  File "/home/gribeill/GitHub/pyGSTi/packages/pygsti/algorithms/core.py", line 1439, in do_mc2gst
    verbosity=printer-1, profiler=profiler)
  File "/home/gribeill/GitHub/pyGSTi/packages/pygsti/optimize/customlm.py", line 209, in custom_leastsq
    new_f = obj_fn(new_x)
  File "/home/gribeill/GitHub/pyGSTi/packages/pygsti/algorithms/core.py", line 1207, in _objective_func
    gs.bulk_fill_probs(probs, evTree, probClipInterval, check, comm)
  File "/home/gribeill/GitHub/pyGSTi/packages/pygsti/objects/gateset.py", line 2637, in bulk_fill_probs
    evalTree, clipTo, check, comm)
  File "/home/gribeill/GitHub/pyGSTi/packages/pygsti/objects/gatematrixcalc.py", line 2067, in bulk_fill_probs
    mySubTreeIndices, subTreeOwners, mySubComm = evalTree.distribute(comm)
  File "/home/gribeill/GitHub/pyGSTi/packages/pygsti/objects/evaltree.py", line 441, in distribute
    _mpit.distribute_indices(list(range(nSubtreeComms)), comm)
  File "/home/gribeill/GitHub/pyGSTi/packages/pygsti/tools/mpitools.py", line 79, in distribute_indices
    loc_comm = comm.Split(color=color, key=rank)
  File "MPI/Comm.pyx", line 199, in mpi4py.MPI.Comm.Split (src/mpi4py.MPI.c:91864)
mpi4py.MPI.Exception: Other MPI error, error stack:
PMPI_Comm_split(471)..........: MPI_Comm_split(MPI_COMM_WORLD, color=0, key=11, new_comm=0x7faa10d37178) failed
PMPI_Comm_split(453)..........:
MPIR_Comm_split_impl(222).....:
MPIR_Get_contextid_sparse(752): Too many communicators

This is with pyGSTi v0.9.5 and mpi4py v2.0.0, run with mpiexec -n 16 python3 pygsti_2q_mpi.py.

Here's the script: pygsti_2q_mpi.py

Any hints as to what is going wrong would be appreciated. I'm rerunning this with mpi4py v3.0.0 right now...
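
Roughly, the driver call in the script has this shape (a hedged sketch using the pyGSTi 0.9.x API: the data file name, fiducial/germ lists, maxLengths, and memory limit below are placeholders, and only the memLimit/verbosity/comm keywords appear in the traceback above):

from mpi4py import MPI
import pygsti
from pygsti.construction import std2Q_XYICNOT as std  # assumed standard 2-qubit model

comm = MPI.COMM_WORLD
memLim = 3 * (1024)**3  # ~3 GB per rank, placeholder value

ds = pygsti.io.load_dataset("my_2q_dataset.txt")  # placeholder data file
maxLengths = [1, 2, 4, 8]                          # placeholder max lengths

results = pygsti.do_long_sequence_gst(
    ds, std.gs_target, std.prepStrs, std.effectStrs, std.germs, maxLengths,
    memLimit=memLim, verbosity=3, comm=comm)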

enielse (Collaborator) commented Feb 7, 2019

I was able to run your script successfully on multiple systems after replacing the actual data (loaded from files that I don't have) with simulated data (using pygsti.construction.generate_fake_data). I don't think this change should make any difference in the issue you're seeing, so I think there may be an issue with your MPI installation.
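
A minimal sketch of that substitution, assuming a standard 2-qubit construction like std2Q_XYICNOT and the 0.9-era experiment-list helper (generate_fake_data is the only call named above):

import pygsti
from pygsti.construction import std2Q_XYICNOT as std  # assumed 2-qubit standard model

maxLengths = [1, 2, 4, 8]  # placeholder
listOfExperiments = pygsti.construction.make_lsgst_experiment_list(
    std.gs_target, std.prepStrs, std.effectStrs, std.germs, maxLengths)

# Simulate counts from a noisy copy of the target instead of loading data files
gs_datagen = std.gs_target.depolarize(gate_noise=0.05, spam_noise=0.01)
ds = pygsti.construction.generate_fake_data(
    gs_datagen, listOfExperiments, nSamples=1000,
    sampleError="binomial", seed=2019)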

I've seen this same error once before, and in that case it was an MPI library error: not mpi4py, but the underlying Open MPI or MPICH installation. To test this, please try running the following script, which just checks whether MPI's Split works independently of anything pyGSTi-related.

Put this in test.py:

from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
color = rank // 4  # integer division; Split's color must be an int (rank / 4 is a float under Python 3)
spltcomm = comm.Split(color=color, key=rank)
print("Rank %d: color=%d, split-comm rank=%d" % (rank, color, spltcomm.Get_rank()))

and then execute it via mpiexec -n 16 python test.py. This produces the following output on my system (line ordering is inconsequential and may be different on your machine):

Rank 3: color=0, split-comm rank=3
Rank 5: color=1, split-comm rank=1
Rank 6: color=1, split-comm rank=2
Rank 7: color=1, split-comm rank=3
Rank 8: color=2, split-comm rank=0
Rank 9: color=2, split-comm rank=1
Rank 10: color=2, split-comm rank=2
Rank 11: color=2, split-comm rank=3
Rank 12: color=3, split-comm rank=0
Rank 13: color=3, split-comm rank=1
Rank 14: color=3, split-comm rank=2
Rank 15: color=3, split-comm rank=3
Rank 0: color=0, split-comm rank=0
Rank 1: color=0, split-comm rank=1
Rank 2: color=0, split-comm rank=2
Rank 4: color=1, split-comm rank=0

If you get an error or different output, this probably means your MPI library is broken. The last time I saw this, the MPI error you cited above was reproduced by the comm.Split call in test.py.
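
As an aside on the specific message: a minimal sketch, unrelated to pyGSTi, of what usually triggers MPICH's "Too many communicators", namely exhaustion of the library's fixed pool of communicator context IDs, for example by repeated Split calls whose results are never freed.

# Run with something like: mpiexec -n 4 python split_loop.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each Split consumes a context ID; MPICH only has a few thousand available.
for i in range(5000):
    sub = comm.Split(color=rank % 2, key=rank)
    sub.Free()  # omit this Free and the loop eventually raises "Too many communicators"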

gribeill (Author) commented Feb 8, 2019

Thanks for the hints! Unfortunately the test script appears to work fine on this machine. I'll do some more investigating and try a clean install of MPI as soon as I can.

matthewware commented

@gribeill, did we decide this was due to a parser issue/malformed file?

gribeill (Author) commented

Yes, fixing the data file (no extra comments!) and a clean install of MPI seem to have fixed everything, so I will close.
