Summary
Two related lifecycle gaps:
1. No atexit / signal handler
The flush timer runs on a daemon thread. When the process exits or is stopped (sys.exit(), end of __main__, an AWS Lambda invocation completing, SIGTERM in a container), the daemon thread is killed and the current aggregator bucket is lost. Cron jobs, Lambdas, batch scripts, and one-shot CLIs report nothing unless the user manually calls handle.dispose().
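A minimal sketch of an idempotent exit hook that could close this gap. ShutdownHooks and the flush callable (taking a timeout in seconds) are hypothetical names used for illustration only; they are not part of the current recost code.

```python
import atexit
import signal
import threading


class ShutdownHooks:
    """Hypothetical helper: runs a final flush at most once on exit or SIGTERM."""

    def __init__(self, flush, timeout_ms=2000):
        self._flush = flush                  # assumed signature: flush(timeout_seconds)
        self._timeout_s = timeout_ms / 1000.0
        self._once = threading.Lock()        # acquired once, never released

    def final_flush(self):
        # Non-blocking acquire: the first caller wins, later callers are no-ops.
        # This keeps the hook safe when both atexit and the signal handler fire.
        if not self._once.acquire(blocking=False):
            return False
        self._flush(self._timeout_s)
        return True

    def install(self, handle_sigterm=True):
        atexit.register(self.final_flush)
        if not handle_sigterm:
            return
        if threading.current_thread() is not threading.main_thread():
            return  # signal.signal() may only be called from the main thread
        previous = signal.getsignal(signal.SIGTERM)

        def _on_sigterm(signum, frame):
            self.final_flush()
            if callable(previous):
                previous(signum, frame)      # chain to the application's own handler
            elif previous != signal.SIG_IGN:
                raise SystemExit(128 + signum)

        signal.signal(signal.SIGTERM, _on_sigterm)
```

In init(), a call such as ShutdownHooks(flush_fn, shutdown_flush_timeout_ms).install() could be skipped when the caller opts out. The SIGTERM handler matters because atexit handlers do not run when the process is killed by an unhandled signal or exits through os._exit().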
2. Dispose during connect leaks FDs
dispose() in recost/_transport.py (lines 133-146, 199-210) sets self._running = False and queues a sentinel. If the loop is currently inside websockets.connect(url) (awaiting the TCP connect), the sentinel sits in the queue until the connect either succeeds or hits the OS TCP connect timeout (on the order of one to two minutes on Linux with default settings). thread.join(timeout=2.0) returns without joining. The daemon thread and its open socket FD leak until process exit. In long-lived processes that call init()/dispose() many times (test suites, Flask dev server with reload), FDs accumulate.
Fix
- In init(), register atexit.register(_final_flush) and (optionally) signal.signal(SIGTERM, ...). Make handlers idempotent. Respect shutdown_flush_timeout_ms. Provide auto_shutdown_handlers=False opt-out.
- On dispose, call loop.call_soon_threadsafe(loop.stop) and cancel pending tasks. Then join with a timeout derived from shutdown_flush_timeout_ms (currently hardcoded 5s) and log if the thread didn't exit.
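The dispose path in the second bullet can be sketched with plain asyncio. LoopThread is a hypothetical stand-in for the transport thread, and any long-running awaitable stands in for websockets.connect(url). This is an illustration under those assumptions, not the actual _transport.py code.

```python
import asyncio
import logging
import threading

log = logging.getLogger("transport_sketch")


class LoopThread:
    """Hypothetical stand-in: runs one coroutine on a private loop in a daemon thread."""

    def __init__(self, main_coro_factory):
        self._loop = asyncio.new_event_loop()
        self._factory = main_coro_factory
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        loop = self._loop
        loop.create_task(self._factory())
        try:
            loop.run_forever()               # returns once loop.stop() is called
        finally:
            # Cancel whatever is still pending (e.g. an in-flight connect) and
            # let the tasks run their cleanup before the loop is closed.
            pending = asyncio.all_tasks(loop)
            for task in pending:
                task.cancel()
            if pending:
                loop.run_until_complete(
                    asyncio.gather(*pending, return_exceptions=True)
                )
            loop.close()

    def dispose(self, timeout_s=2.0):
        """Stop the loop from the caller's thread; return True if the thread exited."""
        try:
            self._loop.call_soon_threadsafe(self._loop.stop)
        except RuntimeError:
            pass                             # loop already closed by an earlier dispose()
        self._thread.join(timeout=timeout_s)
        if self._thread.is_alive():
            log.warning("transport thread did not exit within %.1fs", timeout_s)
            return False
        return True
```

Because the stop request is delivered through call_soon_threadsafe, the thread does not have to wait for the pending await to finish on its own. In the real transport, timeout_s would presumably be derived from shutdown_flush_timeout_ms.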
Files
recost/_init.py
recost/_transport.py
tests/test_init.py
Priority
P1 — short-lived processes report nothing; long-lived test suites leak FDs.