BrokenProcessPool issue (Debian 10, py3.9) #301

Closed
dirknbr opened this issue May 18, 2021 · 11 comments

@dirknbr

dirknbr commented May 18, 2021

I get the following error when I build the model and then sample.

I was previously able to run other Stan files on the same machine.

Exception in callback handle_create_fit.<locals>._services_call_done({'done': True, 'metadata': {'fit': {'name': 'models/xgkim...fits/wonoxjcj'}}, 'name': 'operations/wonoxjcj', 'result': {'code': 400, 'message': "Exception du...broken)\\n']`", 'status': 'Bad Request'}})(<Task finishe...ble anymore')>) at /home/dnachbar/.local/lib/python3.9/site-packages/httpstan/views.py:367
handle: <Handle handle_create_fit.<locals>._services_call_done({'done': True, 'metadata': {'fit': {'name': 'models/xgkim...fits/wonoxjcj'}}, 'name': 'operations/wonoxjcj', 'result': {'code': 400, 'message': "Exception du...broken)\\n']`", 'status': 'Bad Request'}})(<Task finishe...ble anymore')>) at /home/dnachbar/.local/lib/python3.9/site-packages/httpstan/views.py:367>
Traceback (most recent call last):
  File "/usr/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/dnachbar/.local/lib/python3.9/site-packages/httpstan/views.py", line 392, in _services_call_done
    httpstan.cache.delete_fit(operation["metadata"]["fit"]["name"])
  File "/home/dnachbar/.local/lib/python3.9/site-packages/httpstan/cache.py", line 140, in delete_fit
    path.unlink()
  File "/usr/lib/python3.9/pathlib.py", line 1343, in unlink
    self._accessor.unlink(self)
FileNotFoundError: [Errno 2] No such file or directory: '/home/dnachbar/.cache/httpstan/4.4.2/models/xgkim2uo/fits/wonoxjcj.jsonlines.lz4'
Traceback (most recent call last):
  File "/home/dnachbar/python/attribution/sim.py", line 79, in <module>
    fit = model.sample(num_samples=200, num_chains=2)
  File "/home/dnachbar/.local/lib/python3.9/site-packages/stan/model.py", line 74, in sample
    return self.hmc_nuts_diag_e_adapt(**kwargs)
  File "/home/dnachbar/.local/lib/python3.9/site-packages/stan/model.py", line 94, in hmc_nuts_diag_e_adapt
    return self._create_fit(kwargs)
  File "/home/dnachbar/.local/lib/python3.9/site-packages/stan/model.py", line 279, in _create_fit
    return asyncio.run(go())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/dnachbar/.local/lib/python3.9/site-packages/stan/model.py", line 209, in go
    raise RuntimeError(operation["result"]["message"])
RuntimeError: Exception during call to services function: `BrokenProcessPool('A process in the process pool was terminated abruptly while the future was running or pending.')`, traceback: `['  File "/home/dnachbar/.local/lib/python3.9/site-packages/httpstan/services_stub.py", line 153, in call\n    future.result()\n']`
@dirknbr dirknbr added the bug label May 18, 2021
@riddell-stan
Contributor

What operating system and computer architecture are you using?

@dirknbr
Author

dirknbr commented May 18, 2021

Unix Debian

@riddell-stan
Contributor

What architecture (arm or x86-64)? What version of Debian (buster, bullseye)?

@dirknbr
Author

dirknbr commented May 18, 2021

dnachbar@dnachbar:~$ uname -v
#1 SMP Debian 5.10.24-1rodete2 (2021-04-12)

@riddell-stan
Contributor

What version of gcc? (gcc --version)

@dirknbr
Author

dirknbr commented May 18, 2021

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 10.2.1-6+build2' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-10 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-10-i0uLfW/gcc-10-10.2.1/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-i0uLfW/gcc-10-10.2.1/debian/tmp-gcn/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-mutex
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.1 20210110 (Debian 10.2.1-6+build2) 
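
The operating system, architecture, and compiler details requested above can be gathered in one pass. The following is a minimal sketch using only the Python standard library; the helper names (`parse_os_release`, `gcc_version`, `environment_report`) are hypothetical and not part of pystan or httpstan, and the sketch assumes a Linux system that provides `/etc/os-release`.

```python
import platform
import shutil
import subprocess


def parse_os_release(text):
    """Parse KEY=value lines (the os-release format) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip("\"'")  # drop optional surrounding quotes
    return info


def gcc_version():
    """Return the first line of `gcc --version`, or None if gcc is not on PATH."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return None
    result = subprocess.run(
        [gcc, "--version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


def environment_report(os_release_path="/etc/os-release"):
    """Collect the details a maintainer typically asks for in a bug report."""
    try:
        with open(os_release_path, encoding="utf-8") as handle:
            os_release = parse_os_release(handle.read())
    except OSError:
        os_release = {}  # file missing, e.g. on a non-Linux system
    return {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "kernel": platform.release(),
        "distribution": os_release.get("PRETTY_NAME", "unknown"),
        "libc": " ".join(platform.libc_ver()).strip() or "unknown",
        "gcc": gcc_version() or "not found",
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed report into an issue answers the architecture, distribution, and gcc questions at once.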

@amas0
Contributor

amas0 commented May 18, 2021

Not sure if it applies in this situation, but I've seen this error before when I've had an httpstan subprocess die and I tried to fit a model without restarting the whole process.

For example, I started sampling from a pystan model in Jupyter, realized I had an issue so I killed the processes that were sampling from the model, and then tried to sample again immediately without restarting the parent process/Jupyter kernel. Since those processes were killed externally from httpstan/pystan, it caused issues, even though the underlying app for httpstan was still running.

Starting a fresh Jupyter kernel/process fixed the issue for me, so it might be worth checking to see if anything at the system level is killing those processes.
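
The failure mode described here can be reproduced with the standard library alone. The sketch below is an illustration, not httpstan's actual code: it assumes a Unix system (it requests the `fork` start method explicitly), and the function name `demonstrate_broken_pool` is hypothetical. It kills a worker abruptly with `os._exit`, shows that the same pool then rejects further work, and shows that a freshly created pool works again.

```python
import concurrent.futures
import multiprocessing
import os
from concurrent.futures.process import BrokenProcessPool


def demonstrate_broken_pool():
    """Return (error from the crashed task, error from reusing the pool, fresh result)."""
    # Assumption: a Unix system where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    first_error = second_error = None

    with concurrent.futures.ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        # os._exit ends the worker immediately, much like an external kill.
        crashed = pool.submit(os._exit, 1)
        try:
            crashed.result()
        except BrokenProcessPool as exc:
            first_error = exc

        # The pool is now marked broken, so new submissions are rejected.
        try:
            pool.submit(pow, 2, 5)
        except BrokenProcessPool as exc:
            second_error = exc

    # A brand-new pool (the analogue of restarting the parent process) works.
    with concurrent.futures.ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        fresh_result = pool.submit(pow, 2, 5).result()

    return first_error, second_error, fresh_result


if __name__ == "__main__":
    first, second, fresh = demonstrate_broken_pool()
    print("crashed task:", first)
    print("reused pool:", second)
    print("fresh pool result:", fresh)
```

A pool that has lost a worker this way does not recover on its own, which is consistent with a restart of the parent process clearing the error.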

@dirknbr
Author

dirknbr commented May 18, 2021

Thanks amas0. I actually run this on the command line, so my kernel should be clean. I could try a reboot.

@riddell-stan
Contributor

My guess is that this is an old Debian install and the httpstan wheel will not work due to libstdc++ issues. I certainly could be wrong.

Ubuntu 18.04 is based on Debian 10, and it also doesn't work. Ubuntu 20.04 is required.

The recommended solution here is to compile httpstan from scratch.

@riddell-stan riddell-stan changed the title from "BrokenProcessPool issue with py3.9" to "BrokenProcessPool issue (Debian 10, py3.9)" May 19, 2021
@dirknbr
Author

dirknbr commented May 21, 2021

I doubt it's an install issue, since other Stan files run just fine on the same machine.

@riddell-stan
Contributor

riddell-stan commented May 21, 2021

httpstan wheels from PyPI will not work on Ubuntu 18.04 or Debian 10, even with a more recent gcc. The recommended solution is to build httpstan wheels from scratch and install those. pystan depends on httpstan.

If people keep having this problem, we can create a FAQ item.
