NetPyNe with Optuna batch - mpiexec not starting nrniv [Bug report] #797

Open
samnemo opened this issue Nov 20, 2023 · 2 comments
samnemo (Collaborator) commented Nov 20, 2023

Describe the bug

When I run an Optuna batch optimization with the A1 model, mpiexec fails to start the nrniv processes for the simulation. NetPyNE doesn't check the return codes of the processes it launches with subprocess.Popen, so it waits indefinitely for output that is never produced. It seems that the nrniv processes may get started but are put to sleep (suspended) immediately.
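A minimal sketch of what checking the launched process could look like, assuming the batch code starts each job with `subprocess.Popen`: wait with a timeout and raise on a non-zero exit code instead of waiting indefinitely for output. `run_job_checked` and `timeout_s` are hypothetical names, not part of NetPyNE. Redirecting stdin to `/dev/null` is an unconfirmed guess at the suspension: on POSIX systems, a background process that tries to read from its controlling terminal is stopped by `SIGTTIN`.

```python
import subprocess
import sys


def run_job_checked(cmd, timeout_s):
    """Run cmd to completion and return its combined stdout/stderr.

    Hypothetical helper, not NetPyNE's API. Raises RuntimeError when the job
    exits with a non-zero code or runs longer than timeout_s seconds, rather
    than waiting indefinitely for output that never appears.
    """
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,  # assumption: keep children from blocking on terminal input
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # signals only the direct child, not any MPI ranks it spawned
        out, _ = proc.communicate()  # reap the killed process and collect its output
        raise RuntimeError(f"timed out after {timeout_s}s: {cmd!r}\n{out}")
    if proc.returncode != 0:
        raise RuntimeError(f"exit code {proc.returncode}: {cmd!r}\n{out}")
    return out


if __name__ == "__main__":
    # Stand-in job: a Python child that fails with exit code 3.
    try:
        run_job_checked([sys.executable, "-c", "import sys; sys.exit(3)"], timeout_s=30)
    except RuntimeError as err:
        print(err)
```

Note that `kill()` only signals the direct child. If that child is `mpiexec`, its nrniv ranks may outlive it and hold the output pipe open, so `communicate()` could block until they exit; a real implementation would likely need to terminate the whole process group.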

This is using conda on Ubuntu with the following relevant packages:

$ python
Python 3.7.6 (default, Jan 8 2020, 19:59:22)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import netpyne
>>> netpyne.__version__
'1.0.5'
>>> import neuron
>>> neuron.__version__
'8.2.3'
>>> import optuna
>>> optuna.__version__
'3.4.0'
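The same environment details can be collected non-interactively. This is a sketch: `package_versions` and `mpiexec_version` are hypothetical helpers. It assumes each package exposes `__version__` (falling back to `"unknown"` when one does not) and that `mpiexec` accepts `--version`, as the launcher in this report does.

```python
import importlib
import shutil
import subprocess


def package_versions(names):
    """Map each module name to its __version__, or a placeholder string."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
            continue
        versions[name] = str(getattr(module, "__version__", "unknown"))
    return versions


def mpiexec_version():
    """First line of `mpiexec --version`, or None if mpiexec is not on PATH."""
    exe = shutil.which("mpiexec")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=30,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    for name, version in package_versions(["netpyne", "neuron", "optuna"]).items():
        print(f"{name}: {version}")
    print(f"mpiexec: {mpiexec_version()}")
```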

Reproducing the bug

Steps to reproduce the behavior:
Go to the A1 repo/branch here:
https://github.com/NathanKlineInstitute/A1/tree/samn

Then run
python batch.py

Expected behavior

I expected mpiexec to start nrniv normally. Running the mpiexec command directly runs the simulations correctly, but when the same command is launched through batch.py (NetPyNE batch with Optuna), nrniv fails to start.

System information

See above

Additional context

Check with samn or James C for more details on reproducing the bug

samnemo added the bug label Nov 20, 2023

samnemo (Collaborator, Author) commented Nov 20, 2023

And this is the MPI version (Open MPI):
$ mpiexec --version
mpiexec (OpenRTE) 4.0.2

samnemo (Collaborator, Author) commented Nov 21, 2023

In optuna_parallel.py, the nrniv jobs seemed to be going to sleep (getting suspended). Adding quit() at the end of a serial nrniv check in the job script seemed to allow the later mpiexec + nrniv processes to start properly:

jobString = f"""#!/bin/bash
echo '{paramLabels}'
echo '{candidate}'
nrniv -python -c 'from neuron import h;soma = h.Section(name="soma");h.psection();quit()'
echo $?
mpiexec -n 48 nrniv -python -c 'from neuron import h;h.nrnmpi_init();pc=h.ParallelContext();print(pc.id())'
echo $?

{command}    
"""
