
AttributeError: Runner object has no attribute _health_check_task while shutting down #1763

Closed
rapsealk opened this issue Dec 7, 2023 · 0 comments · Fixed by #1764
Labels
platform:general General platform issues. Most issues are general. type:bug Reports about that are not working


rapsealk (Member) commented Dec 7, 2023

What Operating System(s) are you seeing this problem on?

macOS (Apple Silicon)

Backend.AI version

9b18ef3

Describe the bug

I found the following log while shutting down a Python kernel on Backend.AI, even though the kernel itself appears to have terminated without problems.

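_health_check_task is assigned only when a model service with a health check has started (first excerpt below). In an ordinary Python session the attribute therefore never exists, and the unguarded check in _shutdown() raises AttributeError. If _health_check_task is only meant for model services, I suggest using getattr() for safety.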

if started:
    if model_service_info.get("health_check"):
        self._health_check_task = asyncio.create_task(
            self.check_model_health(model_info["name"], model_service_info)
        )
    else:
        await self.outsock.send_multipart(
            [
                b"model-service-status",
                json.dumps(
                    {"model_name": model_info["name"], "is_healthy": True}
                ).encode("utf8"),
            ]
        )

async def _shutdown(self) -> None:
    try:
        self.insock.close()
        log.debug("shutting down...")
        self._run_task.cancel()
        self._main_task.cancel()
        await self._run_task
        await self._main_task
        if self._health_check_task:
            self._health_check_task.cancel()
            await self._health_check_task
        log.debug("terminating service processes...")
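Here is a minimal sketch of the suggested getattr() guard. It uses a stripped-down, hypothetical Runner that keeps only the health-check task handling, not the real Backend.AI runner. getattr() with a None default tolerates the attribute never having been assigned. Initializing self._health_check_task = None in __init__ would also work. Awaiting a cancelled task normally raises asyncio.CancelledError in the awaiter, so the sketch suppresses that exception around the await.

```python
import asyncio
import contextlib
from typing import Optional


class Runner:
    """Hypothetical stand-in mirroring only the health-check task handling."""

    def __init__(self, with_health_check: bool) -> None:
        # Like the excerpt above, the attribute is only set when a health check runs.
        if with_health_check:
            self._health_check_task = asyncio.create_task(self._check_health())

    async def _check_health(self) -> None:
        # Placeholder for a periodic health probe.
        while True:
            await asyncio.sleep(1)

    async def _shutdown(self) -> None:
        # getattr() with a default avoids AttributeError when no model service ran.
        task: Optional[asyncio.Task] = getattr(self, "_health_check_task", None)
        if task is not None:
            task.cancel()
            # Awaiting a cancelled task raises CancelledError; swallow it here.
            with contextlib.suppress(asyncio.CancelledError):
                await task


async def main() -> list[bool]:
    results = []
    for flag in (False, True):
        runner = Runner(with_health_check=flag)
        await runner._shutdown()  # no AttributeError in either case
        task = getattr(runner, "_health_check_task", None)
        results.append(task is not None and task.cancelled())
    return results


print(asyncio.run(main()))  # [False, True]
```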

python-kernel: shutting down python kernel...
Traceback (most recent call last):
  File "/opt/backend.ai/lib/python3.11/site-packages/ai/backend/kernel/base.py", line 254, in _shutdown
    if self._health_check_task:
       ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Runner' object has no attribute '_health_check_task'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/backend.ai/lib/python3.11/site-packages/ai/backend/kernel/__main__.py", line 65, in <module>
    main(args)
  File "/opt/backend.ai/lib/python3.11/site-packages/ai/backend/kernel/__main__.py", line 55, in main
    asyncio_run_forever(
  File "/opt/backend.ai/lib/python3.11/site-packages/ai/backend/kernel/compat.py", line 93, in asyncio_run_forever
    loop.run_until_complete(shutdown_coro)
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/opt/backend.ai/lib/python3.11/site-packages/ai/backend/kernel/base.py", line 275, in _shutdown
    await self._log_task
  File "/opt/backend.ai/lib/python3.11/site-packages/ai/backend/kernel/base.py", line 984, in _handle_logs
    await self.outsock.send_multipart(rec)
  File "/opt/backend.ai/lib/python3.11/site-packages/zmq/_future.py", line 513, in _add_send_event
    r = send(msg, **nowait_kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/backend.ai/lib/python3.11/site-packages/zmq/sugar/socket.py", line 743, in send_multipart
    self.send(msg, zmq.SNDMORE | flags, copy=copy, track=track)
  File "/opt/backend.ai/lib/python3.11/site-packages/zmq/sugar/socket.py", line 688, in send
    return super().send(data, flags=flags, copy=copy, track=track)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "zmq/backend/cython/socket.pyx", line 742, in zmq.backend.cython.socket.Socket.send
  File "zmq/backend/cython/socket.pyx", line 789, in zmq.backend.cython.socket.Socket.send
  File "zmq/backend/cython/socket.pyx", line 255, in zmq.backend.cython.socket._send_copy
  File "zmq/backend/cython/socket.pyx", line 250, in zmq.backend.cython.socket._send_copy
  File "zmq/backend/cython/checkrc.pxd", line 28, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Socket operation on non-socket
python-kernel: Task was destroyed but it is pending!
task: <Task pending name='Task-16' coro=<BaseRunner._wait_service_proc() running at /opt/backend.ai/lib/python3.11/site-packages/ai/backend/kernel/base.py:872> wait_for=<Future pending cb=[Task.task_wakeup()]>>
--- Logging error ---
Traceback (most recent call last):
  File "/opt/backend.ai/lib/python3.11/logging/handlers.py", line 1498, in emit
    self.enqueue(self.prepare(record))
                 ^^^^^^^^^^^^^^^^^^^^
  File "/opt/backend.ai/lib/python3.11/logging/handlers.py", line 1482, in prepare
    record = copy.copy(record)
             ^^^^^^^^^^^^^^^^^
  File "/opt/backend.ai/lib/python3.11/copy.py", line 92, in copy
    rv = reductor(4)
         ^^^^^^^^^^^
ImportError: sys.meta_path is None, Python is likely shutting down
Call stack:
Logged from file __init__.py, line 1518
Message: "Task was destroyed but it is pending!\ntask: <Task pending name='Task-16' coro=<BaseRunner._wait_service_proc() running at /opt/backend.ai/lib/python3.11/site-packages/ai/backend/kernel/base.py:872> wait_for=<Future pending cb=[Task.task_wakeup()]>>"
Arguments: ()
python-kernel: Task was destroyed but it is pending!
task: <Task pending name='Task-18' coro=<BaseRunner._wait_service_proc() running at /opt/backend.ai/lib/python3.11/site-packages/ai/backend/kernel/base.py:872> wait_for=<Future pending cb=[Task.task_wakeup()]>>
--- Logging error ---
Traceback (most recent call last):
  File "/opt/backend.ai/lib/python3.11/logging/handlers.py", line 1498, in emit
    self.enqueue(self.prepare(record))
                 ^^^^^^^^^^^^^^^^^^^^
  File "/opt/backend.ai/lib/python3.11/logging/handlers.py", line 1482, in prepare
    record = copy.copy(record)
             ^^^^^^^^^^^^^^^^^
  File "/opt/backend.ai/lib/python3.11/copy.py", line 92, in copy
    rv = reductor(4)
         ^^^^^^^^^^^
ImportError: sys.meta_path is None, Python is likely shutting down
Call stack:
Logged from file __init__.py, line 1518
Message: "Task was destroyed but it is pending!\ntask: <Task pending name='Task-18' coro=<BaseRunner._wait_service_proc() running at /opt/backend.ai/lib/python3.11/site-packages/ai/backend/kernel/base.py:872> wait_for=<Future pending cb=[Task.task_wakeup()]>>"
Arguments: ()

To Reproduce

Start a single Python session, then terminate it.
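The same AttributeError can also be seen outside Backend.AI. The sketch below uses a hypothetical Runner class that imitates only a plain (non-model-service) session and the unguarded check in the reported _shutdown(); it is not the real kernel runner.

```python
import asyncio


class Runner:
    """Hypothetical class imitating a plain (non-model-service) session."""

    def __init__(self) -> None:
        # No model service starts, so _health_check_task is never assigned.
        self.started = True

    async def _shutdown(self) -> None:
        # Unguarded attribute access, as in the reported _shutdown().
        if self._health_check_task:
            self._health_check_task.cancel()


def reproduce() -> str:
    try:
        asyncio.run(Runner()._shutdown())
    except AttributeError as exc:
        return str(exc)
    return "no error"


print(reproduce())  # 'Runner' object has no attribute '_health_check_task'
```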

Expected Behavior

The kernel terminates without logging these errors.

Anything else?

No response

rapsealk added the type:bug and platform:general labels Dec 7, 2023
rapsealk added this to the 23.09 milestone Dec 7, 2023
rapsealk self-assigned this Dec 7, 2023