Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CIs for MSMPI #590

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

CIs for MSMPI #590

wants to merge 5 commits into from

Conversation

jasmainak
Copy link
Collaborator

@jasmainak jasmainak commented Jan 20, 2023

closes #589

It looks like we can inherit the github action from mpi4py :)

@codecov-commenter
Copy link

codecov-commenter commented Jan 20, 2023

Codecov Report

Merging #590 (ef536c5) into master (2e53827) will decrease coverage by 0.08%.
The diff coverage is n/a.

❗ Current head ef536c5 differs from pull request most recent head f41bba4. Consider uploading reports for the commit f41bba4 to get more accurate results

@@            Coverage Diff             @@
##           master     #590      +/-   ##
==========================================
- Coverage   92.00%   91.93%   -0.08%     
==========================================
  Files          22       22              
  Lines        4253     4253              
==========================================
- Hits         3913     3910       -3     
- Misses        340      343       +3     
Impacted Files Coverage Δ
hnn_core/parallel_backends.py 82.22% <0.00%> (-0.84%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@jasmainak
Copy link
Collaborator Author

It's pretty hard for me to debug any further without a Windows machine. @rythorpe feel free to take over. But if you feel it's too involved, we can move on and work on it some time after the release as well.

@chenghuzi
Copy link
Collaborator

Ryan reminded me of this, I then ran it in my local fork and it works. So I guess it might be a cache problem? That's the only difference I can think of now. @jasmainak could you try to update the cache key to check?

@rythorpe
Copy link
Contributor

I'll try to get around to testing this out later this week.

@rythorpe
Copy link
Contributor

rythorpe commented Feb 9, 2023

I've tried debugging this for a few hours on Windows 10 without much luck. I'm pretty sure this is some sort of unicode encoding/decoding error on the child process's side, however, it is extremely tricky to trace given that each mpi_child process is actually a child process within a child process (i.e., Popen starts a master subprocess that performs a system call to mpiexec, which in turn instantiates multiple parallel MPI processes). It's hard to know how and where a raw string pathname is passed to the system where backslashes get interpreted as escaped characters, and I suspect that one particular pathname (mpi_child.py) gets re-interpreted on multiple occasions throughout it's trajectory before finally getting called by the python interpreter.

On the bright side, I can confirm that MSMPI get's installed with NEURON on Windows and passes our parallel backend installation test (with removal of the backslashes of course).

@rythorpe
Copy link
Contributor

rythorpe commented Feb 9, 2023

Any thoughts @jasmainak?

@jasmainak
Copy link
Collaborator Author

jasmainak commented Feb 9, 2023

Does this work or help: #506 since there is one less nesting level to worry about?

The way I would debug this is by peppering the mpi_child.py code with dumb print (e.g., print('hi1'), print('hi2') etc) statements ... then you can figure out where exactly the failure occurs ...

If this is too much of a bother, I would say that for 0.3 release, we should prioritize:

  1. The GUI ... did you figure out the issue with the titles?
  2. Release Windows without MPI support for now since it is more for cluster set ups which will likely be running Linux

@rythorpe
Copy link
Contributor

rythorpe commented Feb 10, 2023

Does this work or help: #506 since there is one less nesting level to worry about?

That's a good idea. I'll try it out.

The way I would debug this is by peppering the mpi_child.py code with dumb print (e.g., print('hi1'), print('hi2') etc) statements ... then you can figure out where exactly the failure occurs ...

That's pretty much what I've done, except that the error occurs before mpi_child.py gets called.

If this is too much of a bother, I would say that for 0.3 release, we should prioritize:

  1. The GUI ... did you figure out the issue with the titles?

You mean when running it on Windows? If so, it was just a weird dependency problem that I resolved by creating a new conda environment.

  1. Release Windows without MPI support for now since it is more for cluster set ups which will likely be running Linux

This doesn't make much sense to me since the vast majority of Windows users are new users or workshop participants who will need to run single-trial simulations in a reasonable amount of time (e.g., when running the tutorials synchronously).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MPIBackend error on Windows 10: "Unknown option: --use-hwthread-cpus"
4 participants