feat: initial watchdog implementation #1341

mudler · 2023-11-25T23:01:58Z

Description

This PR fixes #1339 and fixes #1202. It should also alleviate issues like #1017 and ggerganov/llama.cpp#3969 once and for all.

The WatchDog implementation (disabled by default) is designed to monitor and manage multiple backends. It keeps track of the last active time and last idle time of each backend, and can stop a backend that has been busy or idle for too long.

Key components of the WatchDog struct include:

timetable: A map that stores the last active time of each backend.
idleTime: A map that stores the last idle time of each backend.
timeout and idletimeout: Duration values that represent the maximum allowed busy and idle times for a backend, respectively.
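A minimal sketch of how such a struct might be laid out in Go, based only on the field descriptions above. The mutex, the two enable flags, the constructor name `NewWatchDog`, and the use of the backend address as the map key are assumptions for illustration and may not match the code in this PR.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// WatchDog tracks per-backend activity. The map and duration field names
// follow the PR description; everything else is an assumption.
type WatchDog struct {
	mu          sync.Mutex           // assumed: guards the maps below
	timetable   map[string]time.Time // backend address -> last active time
	idleTime    map[string]time.Time // backend address -> last idle time
	timeout     time.Duration        // maximum allowed busy time
	idletimeout time.Duration        // maximum allowed idle time
	busyCheck   bool                 // assumed flag: stop backends busy for too long
	idleCheck   bool                 // assumed flag: stop backends idle for too long
}

// NewWatchDog is a hypothetical constructor. Both checks are opt-in,
// mirroring the "disabled by default" behaviour described in the PR.
func NewWatchDog(timeout, idletimeout time.Duration, busyCheck, idleCheck bool) *WatchDog {
	return &WatchDog{
		timetable:   make(map[string]time.Time),
		idleTime:    make(map[string]time.Time),
		timeout:     timeout,
		idletimeout: idletimeout,
		busyCheck:   busyCheck,
		idleCheck:   idleCheck,
	}
}

func main() {
	wd := NewWatchDog(5*time.Minute, 15*time.Minute, true, false)
	fmt.Println(wd.timeout, wd.idletimeout, wd.busyCheck, wd.idleCheck)
}
```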

To turn on the watchdog, configure the following environment variables:

### Watchdog settings
###
# Enables watchdog to kill backends that are inactive for too much time
# WATCHDOG_IDLE=true
#
# Enables watchdog to kill backends that are busy for too much time
# WATCHDOG_BUSY=true
#
# Time in duration format (e.g. 1h30m) after which a backend is considered idle
# WATCHDOG_IDLE_TIMEOUT=5m
#
# Time in duration format (e.g. 1h30m) after which a backend is considered busy
# WATCHDOG_BUSY_TIMEOUT=5m

The equivalent CLI flags are --enable-watchdog-idle, --enable-watchdog-busy, --watchdog-busy-timeout, and --watchdog-idle-timeout.
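For illustration, here is a sketch of how these four environment variables could be read in Go. It assumes the timeouts are parsed with `time.ParseDuration`, which accepts values such as `5m` and `1h30m`. The `watchdogConfig` type and `loadWatchdogConfig` function are hypothetical names, not part of this PR, and unset timeouts are simply left at zero here rather than guessing a default.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// watchdogConfig is a hypothetical holder for the four settings.
type watchdogConfig struct {
	IdleEnabled bool
	BusyEnabled bool
	IdleTimeout time.Duration
	BusyTimeout time.Duration
}

// loadWatchdogConfig reads the settings through getenv (os.Getenv in a
// real program, a stub in tests).
func loadWatchdogConfig(getenv func(string) string) (watchdogConfig, error) {
	var cfg watchdogConfig

	// Booleans: an unset variable means the check stays disabled.
	for key, dst := range map[string]*bool{
		"WATCHDOG_IDLE": &cfg.IdleEnabled,
		"WATCHDOG_BUSY": &cfg.BusyEnabled,
	} {
		v := getenv(key)
		if v == "" {
			continue
		}
		b, err := strconv.ParseBool(v)
		if err != nil {
			return cfg, fmt.Errorf("%s: %w", key, err)
		}
		*dst = b
	}

	// Durations use Go's duration syntax, e.g. "5m" or "1h30m".
	for key, dst := range map[string]*time.Duration{
		"WATCHDOG_IDLE_TIMEOUT": &cfg.IdleTimeout,
		"WATCHDOG_BUSY_TIMEOUT": &cfg.BusyTimeout,
	} {
		v := getenv(key)
		if v == "" {
			continue
		}
		d, err := time.ParseDuration(v)
		if err != nil {
			return cfg, fmt.Errorf("%s: %w", key, err)
		}
		*dst = d
	}
	return cfg, nil
}

func main() {
	cfg, err := loadWatchdogConfig(os.Getenv)
	if err != nil {
		fmt.Println("invalid watchdog settings:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, instead of calling `os.Getenv` directly, keeps the parsing testable without touching the process environment.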

Notes for Reviewers

Signed commits

Yes, I signed my commits.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

netlify · 2023-11-25T23:02:29Z

✅ Deploy Preview for localai canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | `de1abfc` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/localai/deploys/65635d914fee5d0008fbbd2b |

dave-gray101 · 2023-11-25T23:17:06Z

Can you give a quick example of how this is supposed to interact with multiple backends? Currently, it looks like the timeout setting isn't backend specific, which is probably fine for now - but since I'd like to leverage this to improve the monitoring endpoints, I want to make sure I understand when the timer is being reset and what that interval is measuring. Thanks!

mudler · 2023-11-26T09:01:36Z

> Can you give a quick example of how this is supposed to interact with multiple backends? Currently, it looks like the timeout setting isn't backend specific, which is probably fine for now - but since I'd like to leverage this to improve the monitoring endpoints, I want to make sure I understand when the timer is being reset and what that interval is measuring. Thanks!

It currently monitors all active connections. Connections are recorded by the gRPC client: when a backend becomes busy (starts processing a request), the current time is recorded in timetable for that backend. If the backend remains busy for longer than timeout, an action (such as logging a warning or shutting down the backend) can be triggered; right now the watchdog stops the backend directly.

Similarly, when a backend becomes idle (finishes processing a request), the current time is recorded in idleTime for that backend. If the backend remains idle for longer than idletimeout, it gets killed. This was requested in #1202, and I took the occasion to implement it here, as most of the logic applies to it as well.

At the moment it is possible to define the timeout durations and to enable or disable each check (both default to disabled), keeping things very simple as a starting point.
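The bookkeeping described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the names `watchdog`, `newWatchdog`, `markBusy`, `markIdle`, and `expired` are hypothetical, the clock is passed in explicitly to keep the example deterministic, and clearing the opposite map on each transition is an assumed design choice.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type watchdog struct {
	mu          sync.Mutex
	timetable   map[string]time.Time // backend -> when it became busy
	idleTime    map[string]time.Time // backend -> when it became idle
	timeout     time.Duration        // busy limit
	idletimeout time.Duration        // idle limit
}

func newWatchdog(timeout, idletimeout time.Duration) *watchdog {
	return &watchdog{
		timetable:   make(map[string]time.Time),
		idleTime:    make(map[string]time.Time),
		timeout:     timeout,
		idletimeout: idletimeout,
	}
}

// markBusy records that a backend started processing a request.
func (w *watchdog) markBusy(addr string, now time.Time) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.timetable[addr] = now
	delete(w.idleTime, addr) // assumed: a busy backend is no longer idle
}

// markIdle records that a backend finished processing a request.
func (w *watchdog) markIdle(addr string, now time.Time) {
	w.mu.Lock()
	defer w.mu.Unlock()
	delete(w.timetable, addr)
	w.idleTime[addr] = now
}

// expired returns the backends that exceeded the busy or idle limit at
// time now, and forgets them so each one is reported only once.
func (w *watchdog) expired(now time.Time) (busy, idle []string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	for addr, since := range w.timetable {
		if now.Sub(since) > w.timeout {
			busy = append(busy, addr)
			delete(w.timetable, addr)
		}
	}
	for addr, since := range w.idleTime {
		if now.Sub(since) > w.idletimeout {
			idle = append(idle, addr)
			delete(w.idleTime, addr)
		}
	}
	return busy, idle
}

func main() {
	start := time.Date(2023, 11, 26, 9, 0, 0, 0, time.UTC)
	w := newWatchdog(5*time.Minute, 15*time.Minute)
	w.markBusy("127.0.0.1:9000", start)
	w.markIdle("127.0.0.1:9001", start)

	// Ten minutes later only the busy backend is past its limit.
	busy, idle := w.expired(start.Add(10 * time.Minute))
	fmt.Println("stop (busy):", busy, "stop (idle):", idle)
}
```

A caller would run `expired` periodically and stop whatever it returns; the timeouts apply to every backend alike, matching the point that the setting is not backend specific.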

mudler · 2023-11-26T09:45:16Z

One enhancement for later: in the current implementation, if a backend is stale, the watchdog cuts the request. It should instead be possible to keep the request alive and try again after the backend has been shut down.
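One possible shape for that enhancement, sketched purely as an idea: retry the call after restarting the backend when the failure was caused by the watchdog. The sentinel error `errBackendStopped` and the helper `withRestart` are hypothetical; nothing like them is confirmed to exist in the codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// errBackendStopped is a hypothetical sentinel meaning "the watchdog
// stopped the backend while the request was in flight".
var errBackendStopped = errors.New("backend stopped by watchdog")

// withRestart runs call up to maxAttempts times. After a watchdog-induced
// failure it restarts the backend and tries again; any other result
// (success or an unrelated error) is returned immediately.
func withRestart(maxAttempts int, call func() error, restart func() error) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = call()
		if !errors.Is(err, errBackendStopped) {
			return err
		}
		if attempt == maxAttempts {
			break // out of attempts: report the last failure
		}
		if rerr := restart(); rerr != nil {
			return fmt.Errorf("restart backend: %w", rerr)
		}
	}
	return err
}

func main() {
	calls := 0
	err := withRestart(3,
		func() error {
			calls++
			if calls == 1 {
				return errBackendStopped // first attempt hits a stale backend
			}
			return nil
		},
		func() error { return nil }, // pretend the backend reloads fine
	)
	fmt.Println("calls:", calls, "err:", err)
}
```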

dave-gray101 · 2023-11-26T14:56:33Z

Thanks for confirming that, Mudler! That's pretty close to what I thought, but it's good to check. The one feature request I have (even if it's not in the very first PR) is to expose that timetable from WatchDog, so that the monitoring endpoints can dig up data like when a backend was last used. Thanks!!
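To make the request concrete, here is a sketch of what such an accessor could look like. It assumes the timetable is a mutex-guarded `map[string]time.Time`; the `LastActive` method name and the `newWatchdog` helper are hypothetical. Returning a copy means a monitoring endpoint can read the data without holding the lock or mutating the watchdog's state.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type watchdog struct {
	mu        sync.Mutex
	timetable map[string]time.Time // backend address -> last active time
}

func newWatchdog() *watchdog {
	return &watchdog{timetable: make(map[string]time.Time)}
}

// LastActive returns a snapshot of the timetable. The caller gets its own
// map, so later changes on either side do not affect the other.
func (w *watchdog) LastActive() map[string]time.Time {
	w.mu.Lock()
	defer w.mu.Unlock()
	out := make(map[string]time.Time, len(w.timetable))
	for addr, t := range w.timetable {
		out[addr] = t
	}
	return out
}

func main() {
	w := newWatchdog()
	w.timetable["127.0.0.1:9000"] = time.Date(2023, 11, 26, 15, 0, 0, 0, time.UTC)
	for addr, t := range w.LastActive() {
		fmt.Println(addr, "last active at", t.Format(time.RFC3339))
	}
}
```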

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler · 2023-11-26T15:03:01Z

> Thanks for confirming that, Mudler! That's pretty close to what I thought, but it's good to check. The one feature request I have (even if it's not in the very first PR) is to expose that timetable from WatchDog, so that the monitoring endpoints can dig up data like when a backend was last used. Thanks!!

Makes total sense. I'm not exposing it now, as it would not be used in the code and would be confusing, but it should be easy to iterate on.

mudler · 2023-11-26T15:06:06Z

I'm not super-satisfied, but it's OK for a first stab at it. It works locally, and it's disabled by default, so it should be good to go.

feat: initial watchdog implementation

f1e4ecc

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

mudler and others added 4 commits November 26, 2023 00:04

fiuxups

f64c1fa

Add more output

6ca4be1

wip: idletime checker

c6c671b

wire idle watchdog checks

816d1be

enlarge watchdog time window

4c0c6e0

mudler added the enhancement New feature or request label Nov 26, 2023

mudler mentioned this pull request Nov 26, 2023

Allow grpc backend services to exit after idling for a while #1202

Closed

mudler added 2 commits November 26, 2023 14:50

small fixes

1d5b496

Use stopmodel

41fd3ef

mudler force-pushed the watchdog branch from 5dc7da4 to 41fd3ef Compare November 26, 2023 14:48

Always delete process

de1abfc

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler merged commit 824612f into master Nov 26, 2023
24 checks passed

mudler deleted the watchdog branch November 26, 2023 17:36

mudler mentioned this pull request Dec 1, 2023

Reload a model in VRAM #892

Closed

v3DJG6GL mentioned this pull request Feb 15, 2024

Possibility to unload/reload model from VRAM/RAM after IDLE timeout ahmetoner/whisper-asr-webservice#196

Open

thfrei mentioned this pull request Apr 14, 2024

CUDA Memory - GRPCs do not get reused or alternatively removed #1729

Open
