script hangs when using unbuffered output #269
Comments
Hi @valerio-marra, looking at your log, it seems to build okay (using OpenMP), so maybe it's an affinity issue. Does the behavior change when you specify [...]? Can you also double-check that the OMP_PROC_BIND and OMP_PLACES bash environment variables are unset? Does the parallelism work locally and fail through Slurm, or does it always run single-threaded? If the parallelism worked with [...]
The name [...]
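For concreteness, the kinds of checks being suggested might look like the following sketch (the specific SLURM flag to add is an assumption, not something stated in the comment above):

```bash
# Check the affinity-related environment variables that the job actually sees:
env | grep -E '^OMP_(PROC_BIND|PLACES|NUM_THREADS)='

# If OMP_PROC_BIND / OMP_PLACES are set, try clearing them before the run:
unset OMP_PROC_BIND OMP_PLACES

# And make sure SLURM hands the task more than one CPU, e.g. (assumption --
# this directive is not in the script shown at the bottom of the issue):
#SBATCH --cpus-per-task=48
```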
Hi, thanks! I tried what you suggested and nothing worked. Actually, I assumed the code was working in a single thread (and killed it because it was taking too long), but when I call DDtheta_mocks, it just keeps running without doing anything. I reduced the number of particles to 10**4 and it still does not produce anything, even though the same call takes only a few seconds on my laptop. When I first installed Corrfunc via pip, it was working. I tried to re-install it via pip, but it does not work anymore: when I call DDtheta_mocks, it just keeps running without producing any output. I think the problem is that the old system gcc is being loaded instead of the conda one (which I installed as you suggested).
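For reference, a minimal call of the kind being described might look like the sketch below (random points and arbitrary bin edges; these are assumptions, not the user's actual data):

```python
import numpy as np
from Corrfunc.mocks.DDtheta_mocks import DDtheta_mocks

rng = np.random.default_rng(42)
N = 10**4
ra = rng.uniform(0.0, 360.0, N)                            # degrees
dec = np.degrees(np.arcsin(rng.uniform(-1.0, 1.0, N)))     # uniform on the sphere, degrees

# theta bins in degrees; recent Corrfunc versions accept the edges directly,
# otherwise pass the path of a bin file instead
bins = np.linspace(0.1, 10.0, 21)

# autocorr=1, nthreads=1 -- vary nthreads to see whether threading changes anything
results = DDtheta_mocks(1, 1, bins, ra, dec)
print(results)
```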
It might be running, but just very, very slowly, because 48 threads are fighting for one core. If you run it with [...]. I also just realized I got the syntax wrong for the [...].
You may have realized this already if you saw Corrfunc was still building with gcc instead of that long compiler name.
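As a sketch of how to check which compiler the build picks up (the compiler name shown is conda's usual naming convention, not taken from this thread's logs, and the environment name is hypothetical):

```bash
conda activate my-env            # hypothetical environment name
echo "$CC"                       # conda compiler packages export CC, e.g. x86_64-conda-linux-gnu-cc
grep -n '^CC' common.mk          # the compiler Corrfunc's build will actually use
make clean && make CC="$CC"      # or edit CC in common.mk, then rebuild
python -m pip install . --user   # reinstall the rebuilt package
```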
Thanks @lgarrison. For a one-line solution (may be in a modern enough [...]): [...]
Hi @lgarrison, regarding the compilation: indeed it was using the system gcc, but I edited CC in common.mk. Regarding [...]. Regarding [...].
Regarding running with [...].
I think I'm running out of ideas, other than to try yet more compilers and/or Python stacks. If your cluster has other compilers (e.g. clang, icc, other versions of gcc) available via modules ([...]), it's worth trying one of those. Another thought: if your cluster's submission nodes have a different architecture than the compute nodes, make sure you build on the compute nodes. If you want to confirm that the issue is OpenMP-related, you can disable OpenMP support by commenting out the relevant line in common.mk. I wouldn't worry about the binutils bug for now; it's secondary to the code running at all. @manodeep, do you have any ideas?
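A sketch of what trying another compiler from the module system could look like (module and version names are placeholders; check what your cluster actually provides):

```bash
module avail 2>&1 | grep -iE 'gcc|clang|intel'   # module listings typically go to stderr
module load gcc/12.2                             # placeholder version
make clean && make
python -m pip install . --user
# To test the OpenMP hypothesis, comment out the OpenMP option in common.mk,
# rebuild the same way, and see whether the hang disappears.
```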
Thanks, @lgarrison, I'll try that (I already tried using a clean conda environment). Could it be that the uninstalled pip version is still being called? Otherwise, why [...]? One more thing: you said "double-check that the OMP_PROC_BIND and OMP_PLACES bash environment variables are unset", but it seems that it is [...].
Oh, that's true, I didn't read the logs carefully enough! I just assumed the C tests were passing, but it looks like the Python tests are passing too. Maybe the issue is exactly what you suggested, and you're installing in one environment and running in another. Make sure to repeat the python -m pip install . --user step from the same environment that the job actually runs in.
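One way to check for the install-in-one-environment, run-in-another problem (standard pip/Python introspection; nothing Corrfunc-specific is assumed):

```bash
# Run these both interactively and from inside the Slurm job:
which python
python -m pip show Corrfunc                                                   # where pip thinks it is installed
python -c "import Corrfunc; print(Corrfunc.__version__, Corrfunc.__file__)"  # what actually gets imported
```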
@lgarrison, @manodeep, I found the problem: if I set [...], it works.
Wow, that's pretty unusual! Glad it's working. I see you were running Python in unbuffered mode with python -u. I will note we've seen one other instance of [...].
@lgarrison, if I remove [...]. Should I fix the binutils bug to increase performance? I'll run my code on hundreds of snapshots.
Okay, I think I might understand the root cause here. It's probably this issue: minrk/wurlitzer#20 Specifically, we're filling up some buffer (or perhaps even blocking while trying to do an unbuffered write), but the code that drains the buffer (in Wurlitzer) is at the Python level. And that code can't run because we don't release the GIL when we call into Corrfunc. I'll need to think about the right way to fix this. Releasing the GIL is probably something we ought to be doing anyway, although it will need to be tested. In addition, it's possible that we're not doing the output redirection in the simplest/most robust way. On binutils/AVX-512: if you want the extra performance (usually a factor of < 2x), your best bet is to find another compiler stack to use, like clang, icc, or a more modern gcc. If one is not readily available, you can try to install one from scratch, although at that point it might not be worth your time! If you're feeling brave, here are instructions that worked at least once: #196 (comment)
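A simplified illustration of the suspected deadlock (this is not Corrfunc's actual redirection code, and long_noisy_c_call is a hypothetical stand-in for a C kernel producing verbose output):

```python
from wurlitzer import pipes

with pipes() as (out, err):
    # If the C extension writes enough output to fill the OS pipe that
    # wurlitzer uses for capture while still holding the GIL, the Python-side
    # code that drains the pipe can never run, the C-level write blocks,
    # and the whole call appears to hang.
    long_noisy_c_call()  # hypothetical C-extension call

print(out.read())
```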
Oops, actually we are releasing the GIL. In which case I'm not exactly sure what's happening. Will investigate...
Hi @valerio-marra, can you please check if PR #270 fixes your issue? Just test your same code on the branch from that PR.
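One way to test the PR from the existing checkout (a sketch; the local branch name is arbitrary):

```bash
cd Corrfunc
git fetch origin pull/270/head:pr-270   # GitHub exposes PRs under pull/<number>/head
git checkout pr-270
make clean && make
python -m pip install . --user
```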
Hi @lgarrison, it works! Now the verbose output is printed into the Slurm job's standard error. Regarding binutils/AVX-512, is it necessary to create a new environment? Also, shouldn't there be a make step before the pip install?
It probably has the best chance of success in a new environment (and it has the least chance of disrupting any of your other work that uses an existing environment). Pip runs make behind the scenes, so an explicit make is not necessary.
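A sketch of what that fresh environment might look like (the package names are the usual conda-forge ones and are assumptions, not instructions from this thread):

```bash
conda create -n corrfunc-build -c conda-forge python numpy compilers
conda activate corrfunc-build
cd Corrfunc
python -m pip install .   # pip runs make behind the scenes, so no explicit make step is needed
```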
General information
Issue description
I'm running Corrfunc on a simulation snapshot: I'm computing the angular correlation function in thin shells using DDtheta_mocks. I first installed Corrfunc via pip, then, in order to increase performance, from source:
$ git clone https://github.com/manodeep/Corrfunc/
$ make
$ make install
$ python -m pip install . --user
$ make tests
However, while the pip-installed version would parallelize over multiple threads, the source build now runs mostly on one thread. I'm submitting my SLURM job via:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --mem=480000
#SBATCH --exclusive
[…]
export OMP_NUM_THREADS=48
srun -n 1 python -u $PY_CODE > $LOGS
Expected behavior
To run on 48 threads at ~100%.
Actual behavior
To mostly run on 1 thread. I checked with htop.
What have you tried so far?
To re-install it from source.
Minimal failing example
I'm attaching the log file corrfunc-logs.txt, which includes the output of 'make tests'.