script hangs when using unbuffered output #269
Comments
Hi @valerio-marra, looking at your log, it seems to build okay (using OpenMP), so maybe it's an affinity issue. Does the behavior change when you specify [...]? Can you also double-check that the OMP_PROC_BIND and OMP_PLACES bash environment variables are unset? Does the parallelism work locally and fail through Slurm, or does it always run single-threaded? If the parallelism worked with [...]
The name [...]
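For concreteness, the kinds of checks being suggested might look like the following sketch (the specific SLURM flag to add is an assumption, not something stated in the comment above):

```bash
# Check the affinity-related environment variables that the job actually sees:
env | grep -E '^OMP_(PROC_BIND|PLACES|NUM_THREADS)='

# If OMP_PROC_BIND / OMP_PLACES are set, try clearing them before the run:
unset OMP_PROC_BIND OMP_PLACES

# And make sure SLURM hands the task more than one CPU, e.g. (assumption --
# this directive is not in the script shown at the bottom of the issue):
#SBATCH --cpus-per-task=48
```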
Hi, thanks! I tried what you suggested and nothing worked. Actually, I assumed the code was working in a single thread (and killed it because it was taking too long), but when I call DDtheta_mocks, it just keeps running without doing anything. I reduced the number of particles to 10**4 and it still does not produce anything, even though the same call takes only a few seconds on my laptop. When I first installed Corrfunc via pip, it was working. I tried to re-install it via pip, but it does not work anymore: when I call DDtheta_mocks, it just keeps running without producing any output. I think the problem is that the old system gcc is being loaded instead of the conda one (which I installed as you suggested).
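For reference, a minimal call of the kind being described might look like the sketch below (random points and arbitrary bin edges; these are assumptions, not the user's actual data):

```python
import numpy as np
from Corrfunc.mocks.DDtheta_mocks import DDtheta_mocks

rng = np.random.default_rng(42)
N = 10**4
ra = rng.uniform(0.0, 360.0, N)                            # degrees
dec = np.degrees(np.arcsin(rng.uniform(-1.0, 1.0, N)))     # uniform on the sphere, degrees

# theta bins in degrees; recent Corrfunc versions accept the edges directly,
# otherwise pass the path of a bin file instead
bins = np.linspace(0.1, 10.0, 21)

# autocorr=1, nthreads=1 -- vary nthreads to see whether threading changes anything
results = DDtheta_mocks(1, 1, bins, ra, dec)
print(results)
```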
It might be running, but just very, very slowly, because 48 threads are fighting for one core. If you run it with [...]. I also just realized I got the syntax wrong for the [...].
You may have realized this already if you saw Corrfunc was still building with gcc instead of that long compiler name.
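As a sketch of how to check which compiler the build picks up (the compiler name shown is conda's usual naming convention, not taken from this thread's logs, and the environment name is hypothetical):

```bash
conda activate my-env            # hypothetical environment name
echo "$CC"                       # conda compiler packages export CC, e.g. x86_64-conda-linux-gnu-cc
grep -n '^CC' common.mk          # the compiler Corrfunc's build will actually use
make clean && make CC="$CC"      # or edit CC in common.mk, then rebuild
python -m pip install . --user   # reinstall the rebuilt package
```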
Thanks @lgarrison. For a one-line solution (may be in a modern enough [...]): [...]
Hi @lgarrison, regarding the compilation: indeed it was using the system gcc, but I edited CC in common.mk. Regarding [...]. Regarding [...].
Regarding running with [...].
I think I'm running out of ideas, other than to try yet more compilers and/or Python stacks. If your cluster has other compilers (e.g. clang, icc, other versions of gcc) available via modules ([...]), it's worth trying one of those. Another thought: if your cluster's submission nodes have a different architecture than the compute nodes, make sure you build on the compute nodes. If you want to confirm that the issue is OpenMP-related, you can disable OpenMP support by commenting out the relevant line in common.mk. I wouldn't worry about the binutils bug for now; it's secondary to the code running at all. @manodeep, do you have any ideas?
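A sketch of what trying another compiler from the module system could look like (module and version names are placeholders; check what your cluster actually provides):

```bash
module avail 2>&1 | grep -iE 'gcc|clang|intel'   # module listings typically go to stderr
module load gcc/12.2                             # placeholder version
make clean && make
python -m pip install . --user
# To test the OpenMP hypothesis, comment out the OpenMP option in common.mk,
# rebuild the same way, and see whether the hang disappears.
```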
Thanks, @lgarrison, I'll try that (I already tried using a clean conda environment). Could it be that the uninstalled pip version is still being called? Otherwise, why [...]? One more thing: you said "double-check that the OMP_PROC_BIND and OMP_PLACES bash environment variables are unset", but it seems that it is [...].
Oh, that's true, I didn't read the logs carefully enough! I just assumed the C tests were passing, but it looks like the Python tests are passing too. Maybe the issue is exactly what you suggested, and you're installing in one environment and running in another. Make sure to repeat the python -m pip install . --user step from the same environment that the job actually runs in.
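One way to check for the install-in-one-environment, run-in-another problem (standard pip/Python introspection; nothing Corrfunc-specific is assumed):

```bash
# Run these both interactively and from inside the Slurm job:
which python
python -m pip show Corrfunc                                                   # where pip thinks it is installed
python -c "import Corrfunc; print(Corrfunc.__version__, Corrfunc.__file__)"  # what actually gets imported
```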
@lgarrison, @manodeep, I found the problem: if I set [...], it works.
Wow, that's pretty unusual! Glad it's working. I see you were running Python in unbuffered mode with python -u. I will note we've seen one other instance of [...].
@lgarrison, if I remove [...]. Should I fix the binutils bug to increase performance? I'll run my code on hundreds of snapshots.
Okay, I think I might understand the root cause here. It's probably this issue: minrk/wurlitzer#20 Specifically, we're filling up some buffer (or perhaps even blocking while trying to do an unbuffered write), but the code that drains the buffer (in Wurlitzer) is at the Python level. And that code can't run because we don't release the GIL when we call into Corrfunc. I'll need to think about the right way to fix this. Releasing the GIL is probably something we ought to be doing anyway, although it will need to be tested. In addition, it's possible that we're not doing the output redirection in the simplest/most robust way. On binutils/AVX-512: if you want the extra performance (usually a factor of < 2x), your best bet is to find another compiler stack to use, like clang, icc, or a more modern gcc. If one is not readily available, you can try to install one from scratch, although at that point it might not be worth your time! If you're feeling brave, here are instructions that worked at least once: #196 (comment)
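A simplified illustration of the suspected deadlock (this is not Corrfunc's actual redirection code, and long_noisy_c_call is a hypothetical stand-in for a C kernel producing verbose output):

```python
from wurlitzer import pipes

with pipes() as (out, err):
    # If the C extension writes enough output to fill the OS pipe that
    # wurlitzer uses for capture while still holding the GIL, the Python-side
    # code that drains the pipe can never run, the C-level write blocks,
    # and the whole call appears to hang.
    long_noisy_c_call()  # hypothetical C-extension call

print(out.read())
```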
Oops, actually we are releasing the GIL. In which case I'm not exactly sure what's happening. Will investigate...
Hi @valerio-marra, can you please check if PR #270 fixes your issue? Just test your same code on the branch from that PR.
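One way to test the PR from the existing checkout (a sketch; the local branch name is arbitrary):

```bash
cd Corrfunc
git fetch origin pull/270/head:pr-270   # GitHub exposes PRs under pull/<number>/head
git checkout pr-270
make clean && make
python -m pip install . --user
```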
Hi @lgarrison, it works! Now the verbose output is printed into the Slurm job's standard error. Regarding binutils/AVX-512, is it necessary to create a new environment? Also, shouldn't there be a make step before the pip install?
It probably has the best chance of success in a new environment (and it has the least chance of disrupting any of your other work that uses an existing environment). Pip runs make behind the scenes, so an explicit make is not necessary.
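A sketch of what that fresh environment might look like (the package names are the usual conda-forge ones and are assumptions, not instructions from this thread):

```bash
conda create -n corrfunc-build -c conda-forge python numpy compilers
conda activate corrfunc-build
cd Corrfunc
python -m pip install .   # pip runs make behind the scenes, so no explicit make step is needed
```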
General information
Issue description
I'm running Corrfunc on a simulation snapshot: I'm computing the angular correlation function in thin shells using DDtheta_mocks. I first installed Corrfunc via pip, then, in order to increase performance, from source:
$ git clone https://github.com/manodeep/Corrfunc/
$ make
$ make install
$ python -m pip install . --user
$ make tests
However, while the pip-installed version would parallelize over multiple threads, the source build now runs mostly on one thread. I'm submitting my SLURM job via:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --mem=480000
#SBATCH --exclusive
[…]
export OMP_NUM_THREADS=48
srun -n 1 python -u $PY_CODE > $LOGS
Expected behavior
To run on 48 threads at ~100%.
Actual behavior
To mostly run on 1 thread. I checked with htop.
What have you tried so far?
To re-install it from source.
Minimal failing example
I'm attaching the log file corrfunc-logs.txt, which includes the output of 'make tests'.