@victorges
Member

@victorges victorges commented Dec 17, 2024

Changes

  • Add a healthcheck instead of relying on the process exiting and on its exit code
  • Kill the process once the healthcheck fails
  • Allow the runner to be reused after the process exits (the sub-process still exits on the happy path)
  • Improve Trickle clients shutdown logic

Runner lifecycle

The high-level description of the new behavior is:

  • When starting a managed container, the worker starts the watch container routine
  • That routine does not check the Docker state; instead it calls the /health endpoint periodically (every 5s)
  • The runner API handles that endpoint and forwards the call to the running pipeline, which returns OK by default
  • The live video pipeline has custom logic instead: it proxies the call to the infer.py process (if it's running)
  • So the runner health is:
    • IDLE if no process is running
    • ERROR if the process is running but its state is OFFLINE, or
    • OK otherwise
  • On the worker watch container logic:
    • If the runner is OK, keep going
    • If the runner is IDLE, return the container to the pool so another stream can use it!
    • If the runner is ERROR (or there's any other error getting the health), kill the container after 2 errors.
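
The health states and the worker's reaction to them can be sketched as below. This is only an illustration of the rules listed above, not the code in this PR: the names `Health`, `Action`, `runner_health` and `ContainerWatcher` are hypothetical, and the sketch assumes the 2 errors are counted consecutively (the counter resets on a non-error response), which the description does not specify.

```python
from enum import Enum
from typing import Optional


class Health(Enum):
    OK = "OK"
    IDLE = "IDLE"
    ERROR = "ERROR"


class Action(Enum):
    KEEP = "keep"                      # keep watching the container
    RETURN_TO_POOL = "return_to_pool"  # container can serve another stream
    KILL = "kill"                      # container is considered unhealthy


def runner_health(process_running: bool, state: Optional[str]) -> Health:
    """Derive the runner health from the sub-process status (hypothetical helper)."""
    if not process_running:
        return Health.IDLE
    if state == "OFFLINE":
        return Health.ERROR
    return Health.OK


class ContainerWatcher:
    """Decides what the worker does after each periodic /health poll."""

    def __init__(self, max_errors: int = 2) -> None:
        self.max_errors = max_errors
        self.errors = 0

    def observe(self, health: Optional[Health]) -> Action:
        # `None` stands for any failure to fetch the health at all.
        if health is Health.OK:
            self.errors = 0  # assumption: errors are counted consecutively
            return Action.KEEP
        if health is Health.IDLE:
            self.errors = 0
            return Action.RETURN_TO_POOL
        self.errors += 1
        if self.errors >= self.max_errors:
            return Action.KILL
        return Action.KEEP
```

In this sketch the worker would call `observe` once per poll (every 5s) and act on the returned `Action`; a single failed poll is tolerated, a second one in a row kills the container.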

Note that the infer.py process terminates itself when the input stream ends or when input frames stop arriving for 60s, which means containers can now be reused. Once the process exits, the runner API starts returning IDLE, and the worker eventually puts the container back in the pool for other requests.
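
The self-termination condition described above could look roughly like this. It is a minimal sketch under stated assumptions rather than the actual infer.py logic: the `InputWatchdog` class and its method names are hypothetical, and the injectable `clock` parameter exists only to make the timeout easy to exercise.

```python
import time
from typing import Callable


class InputWatchdog:
    """Tracks input activity and reports when the process should exit."""

    def __init__(self, timeout_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_frame = clock()
        self._stream_ended = False

    def on_frame(self) -> None:
        # Any received input frame resets the idle timer.
        self._last_frame = self._clock()

    def on_stream_end(self) -> None:
        self._stream_ended = True

    def should_exit(self) -> bool:
        # Exit when the stream ended or no frame arrived within the timeout.
        idle_for = self._clock() - self._last_frame
        return self._stream_ended or idle_for >= self._timeout_s
```

A processing loop would call `on_frame()` for every input frame and stop once `should_exit()` returns true, after which the runner reports IDLE.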

@victorges victorges requested a review from rickstaa December 17, 2024 01:58
@victorges victorges force-pushed the vg/feat/healthcheck-monitoring branch from bd13ec0 to 57f34a2 Compare December 17, 2024 20:45
@victorges
Member Author

Status: Verified that the basic workflow runs successfully. Still need to test some failure scenarios to ensure things work well (and better than before).

@victorges victorges force-pushed the vg/feat/healthcheck-monitoring branch from 8eeeb1e to 9027415 Compare December 18, 2024 19:12
@victorges victorges marked this pull request as ready for review December 19, 2024 00:54
@victorges victorges merged commit c19289d into main Dec 19, 2024
12 checks passed
@victorges victorges deleted the vg/feat/healthcheck-monitoring branch December 19, 2024 14:13