segfault cv2.abi3.so #92

Open
gregbugaj opened this issue Nov 28, 2023 · 0 comments
Describe the bug

Getting a segfault while running the marie server. This is problematic because it leaves behind a defunct (zombie) process and causes the kernel to leave a task stuck in the uninterruptible "D" state. A task in that state cannot be killed, not even with kill -9.
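To confirm which tasks are stuck, the process state can be read from procfs. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper names `parse_state` and `find_d_state_tasks` are hypothetical and not part of marie. It only inspects the main thread of each process; per-thread states live under /proc/<pid>/task.

```python
from pathlib import Path


def parse_state(status_text):
    """Return (name, state_letter) parsed from the text of /proc/<pid>/status."""
    name, state = None, None
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("State:"):
            # e.g. "State:\tD (disk sleep)" -> "D"
            state = line.split(":", 1)[1].strip()[:1]
    return name, state


def find_d_state_tasks(proc_root="/proc"):
    """List (pid, name) for every process currently in uninterruptible sleep."""
    stuck = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            name, state = parse_state((entry / "status").read_text())
        except OSError:
            continue  # process exited (or is unreadable) while scanning
        if state == "D":
            stuck.append((int(entry.name), name))
    return stuck


if __name__ == "__main__":
    for pid, name in find_d_state_tasks():
        print(pid, name)
```

A process can pass through "D" briefly during normal I/O, so a single hit is not conclusive; a task that stays listed across repeated runs matches the hung-task report.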

log output from dmesg

[227392.162828] perf: interrupt took too long (2510 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[240010.209731] marie[111619]: segfault at 7f2987800b00 ip 00007f30a0f7895b sp 00007f2a655f71d0 error 4 in cv2.abi3.so[7f30a0745000+2f4f000] likely on CPU 10 (core 20, socket 0)
[240010.209740] Code: 48 63 4d 00 48 8b 7c 24 08 89 da 44 8d 43 01 49 03 7f 28 48 8b 47 18 48 8b b7 d0 00 00 00 85 c9 0f 8e a1 16 00 00 4d 8b 4f 18 <45> 8b 34 89 48 8b 4c 24 20 80 3c 19 00 0f 85 52 fe ff ff c7 45 00
[240215.295317] INFO: task marie:110689 blocked for more than 120 seconds.
[240215.295324]       Tainted: P           OE      6.2.0-37-generic #38~22.04.1-Ubuntu
[240215.295326] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[240215.295328] task:marie           state:D stack:0     pid:110689 ppid:110658 flags:0x00000002
[240215.295331] Call Trace:
[240215.295332]  <TASK>
[240215.295335]  __schedule+0x2b7/0x5f0
[240215.295339]  schedule+0x68/0x110
[240215.295341]  do_exit+0xf3/0x6c0
[240215.295343]  do_group_exit+0x35/0x90
[240215.295347]  get_signal+0x8a5/0x8d0
[240215.295349]  ? __f_unlock_pos+0x12/0x20
[240215.295352]  arch_do_signal_or_restart+0x2a/0x120
[240215.295355]  ? exit_to_user_mode_prepare+0x3b/0xd0
[240215.295357]  exit_to_user_mode_loop+0xaf/0x140
[240215.295358]  exit_to_user_mode_prepare+0xb9/0xd0
[240215.295359]  irqentry_exit_to_user_mode+0x9/0x20
[240215.295361]  irqentry_exit+0x43/0x50
[240215.295363]  sysvec_reschedule_ipi+0x7b/0x120
[240215.295365]  asm_sysvec_reschedule_ipi+0x1b/0x20
[240215.295367] RIP: 0033:0x5634e4bfe3d3
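The segfault line above carries enough information to locate the faulting instruction relative to the cv2.abi3.so mapping: the instruction pointer minus the mapping start address printed in brackets. A small sketch of that arithmetic follows; the function name `fault_offset` is hypothetical. Note the assumption that the bracketed address is the start of the mapped region containing the instruction, which is not necessarily the load base of the library, so the mapping's file offset may need to be added before passing the result to a tool such as addr2line. Prebuilt wheels may also be stripped of symbols.

```python
import re

# Matches the "ip <addr>" and "in <object>[<start>+<size>]" fields
# of a kernel segfault message.
SEGFAULT_RE = re.compile(
    r"ip (?P<ip>[0-9a-f]+) .* in (?P<obj>\S+)"
    r"\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(dmesg_line):
    """Return (object_name, offset of the faulting ip inside the mapping)."""
    m = SEGFAULT_RE.search(dmesg_line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    ip = int(m["ip"], 16)
    start = int(m["start"], 16)
    size = int(m["size"], 16)
    offset = ip - start
    if not 0 <= offset < size:
        raise ValueError("instruction pointer lies outside the reported mapping")
    return m["obj"], offset


if __name__ == "__main__":
    line = (
        "marie[111619]: segfault at 7f2987800b00 ip 00007f30a0f7895b "
        "sp 00007f2a655f71d0 error 4 in cv2.abi3.so[7f30a0745000+2f4f000] "
        "likely on CPU 10 (core 20, socket 0)"
    )
    obj, offset = fault_offset(line)
    print(obj, hex(offset))  # cv2.abi3.so 0x83395b
```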

Describe how you solve it


Environment

pip versions of opencv:

marie# pip list | grep opencv
opencv-python                                4.8.1.78
opencv-python-headless                       4.8.1.78
root@asp-gpu032:/marie# marie --version-full
UserWarning: multiprocessing start method is set to `fork` (raised from /opt/venv/lib/python3.10/site-packages/marie/__init__.py:75)
- marie 3.0.22
- docarray 0.39.1
- jcloud not-available
- jina-hubble-sdk v0.0.0
- marie-proto 0.1.27
- protobuf 3.20.2
- proto-backend cpp
- grpcio 1.47.5
- pyyaml 6.0.1
- python 3.10.12
- platform Linux
- platform-release 6.2.0-37-generic
- platform-version #38~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 2 18:01:13 UTC 2
- architecture x86_64
- processor x86_64
- uid 88974383001391
- session-id 7017dca0-8e09-11ee-a1e0-50ebf67e2f2f
- uptime 2023-11-28T16:16:08.016883
- ci-vendor (unset)
- internal False
* JINA_DEFAULT_HOST (unset)
* JINA_DEFAULT_TIMEOUT_CTRL (unset)
* JINA_DEPLOYMENT_NAME (unset)
* JINA_DISABLE_UVLOOP (unset)
* JINA_EARLY_STOP (unset)
* JINA_FULL_CLI (unset)
* JINA_GATEWAY_IMAGE (unset)
* JINA_GRPC_RECV_BYTES (unset)
* JINA_GRPC_SEND_BYTES (unset)
* JINA_HUB_NO_IMAGE_REBUILD (unset)
* JINA_LOG_CONFIG (unset)
* JINA_LOG_LEVEL DEBUG
* JINA_LOG_NO_COLOR (unset)
* JINA_MP_START_METHOD fork
* JINA_OPTOUT_TELEMETRY (unset)
* JINA_RANDOM_PORT_MAX (unset)
* JINA_RANDOM_PORT_MIN (unset)
* JINA_LOCKS_ROOT (unset)
* JINA_K8S_ACCESS_MODES (unset)
* JINA_K8S_STORAGE_CLASS_NAME (unset)
* JINA_K8S_STORAGE_CAPACITY (unset)
* JINA_STREAMER_ARGS (unset)
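The pip output above shows both opencv-python and opencv-python-headless installed. The opencv-python project documents that only one of its package variants should be installed in an environment, since they all provide the same cv2 module. Whether this has anything to do with the segfault is unverified; the sketch below only detects the situation. The function name `installed_opencv_variants` is hypothetical.

```python
from importlib import metadata

# Distribution names published by the opencv-python project;
# all of them install the same top-level "cv2" package.
OPENCV_VARIANTS = (
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
)


def installed_opencv_variants():
    """Return {distribution_name: version} for each OpenCV variant found."""
    found = {}
    for name in OPENCV_VARIANTS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this variant is not installed
    return found


if __name__ == "__main__":
    found = installed_opencv_variants()
    for name, version in found.items():
        print(name, version)
    if len(found) > 1:
        print("warning: more than one OpenCV variant is installed")
```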

This could possibly be related to an error seen in the logs:

gbugaj@asp-gpu032:~$ docker logs marieai-dev-server-corr  | grep 'Exception' -A 10 | head
Exception ignored when trying to write to the signal wakeup fd:
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 115, in _read_from_self
    data = self._ssock.recv(4096)
BlockingIOError: [Errno 11] Resource temporarily unavailable
Exception ignored when trying to write to the signal wakeup fd:
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 115, in _read_from_self
    data = self._ssock.recv(4096)
BlockingIOError: [Errno 11] Resource temporarily unavailable
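Independent of these asyncio messages, a Python-level traceback captured at the moment of the crash would help show which cv2 call was executing. One possible approach is the standard library faulthandler module, sketched below. The wrapper name `enable_crash_tracebacks` is hypothetical, and where such a call would belong in marie's startup code is an assumption, not something taken from the project.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=sys.stderr):
    """Dump the Python stack of all threads on SIGSEGV and similar fatal signals.

    `stream` must be a file object with a real file descriptor and must stay
    open for as long as the handler is enabled.
    """
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    enable_crash_tracebacks()
    print("faulthandler enabled:", faulthandler.is_enabled())
```

The same effect is available without code changes by setting the PYTHONFAULTHANDLER=1 environment variable or starting the interpreter with -X faulthandler.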
