Description
IOHandler._render in iohandler.py temporarily replaces sys.stderr with io.StringIO() to capture Chevron/Mustache rendering warnings. This is not thread-safe — when multiple workflows execute concurrently, one thread can restore sys.stderr to the real TextIOWrapper before another thread calls .getvalue(), causing:
'_io.TextIOWrapper' object has no attribute 'getvalue'
The code has a TODO acknowledging this:
# TODO: protect from multithreaded where another thread will print to stderr, but thats a very rare case and we shouldn't care much
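The interleaving can be forced deterministically with two threads and a few events. This is a standalone sketch that mimics the swap-and-read-back pattern described above, not the actual IOHandler code. The names unsafe_capture and demo are hypothetical, and an in-memory TextIOWrapper stands in for the real stderr so the example does not depend on how the process was started.

```python
import io
import sys
import threading


def unsafe_capture(name, swapped, proceed, errors):
    """Swap sys.stderr, pause, then read the capture back through sys.stderr."""
    original = sys.stderr
    sys.stderr = io.StringIO()
    swapped.set()    # signal that this thread has installed its buffer
    proceed.wait()   # pause at the point where rendering would run
    try:
        # Reads whatever sys.stderr points to *now*, not this thread's buffer.
        sys.stderr.getvalue()
    except AttributeError as exc:
        errors[name] = str(exc)
    finally:
        sys.stderr = original


def demo():
    """Force the order: A swaps, B swaps, A restores, B reads."""
    saved = sys.stderr
    # Stand-in for the real stderr stream (an assumption for the demo).
    sys.stderr = io.TextIOWrapper(io.BytesIO())
    errors = {}
    a_swapped, a_go = threading.Event(), threading.Event()
    b_swapped, b_go = threading.Event(), threading.Event()
    a = threading.Thread(target=unsafe_capture, args=("A", a_swapped, a_go, errors))
    b = threading.Thread(target=unsafe_capture, args=("B", b_swapped, b_go, errors))
    try:
        a.start()
        a_swapped.wait()
        b.start()
        b_swapped.wait()
        a_go.set()   # A reads B's buffer, then restores the stand-in stream
        a.join()
        b_go.set()   # B now finds the stand-in stream and fails
        b.join()
    finally:
        sys.stderr = saved
    return errors
```

Calling demo() returns a dict with a single entry for thread B containing the AttributeError message quoted above. Thread A does not fail, because it reads B's buffer instead of its own.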
How to reproduce
- Create a workflow triggered by incident.created
- Have a correlation rule that groups alerts into incidents
- Restart AlertManager (or send a batch of alerts) so multiple incidents are created simultaneously
- Multiple workflow executions start in parallel, and some fail with the error above
Impact
- Workflow execution fails entirely — no steps run
- Incidents are left without AI-generated titles/summaries
- In our case, ~25% of concurrent workflow executions fail
Suggested fix
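Stop reading the captured text back through sys.stderr. Use contextlib.redirect_stderr with a local buffer, or a thread-local buffer, so that .getvalue() is always called on the StringIO itself. Both approaches still reassign sys.stderr process-wide, so under concurrency a warning can land in another thread's buffer. They remove the crash, not the shared state: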
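import io
import sys
import threading

# One buffer slot per thread, so .getvalue() is always called on a StringIO.
_stderr_local = threading.local()

def _render(self, key, safe=False, default="", additional_context=None):
    ...
    _stderr_local.buffer = io.StringIO()
    # Note: sys.stderr is still shared by all threads while it is swapped.
    original_stderr = sys.stderr
    sys.stderr = _stderr_local.buffer
    try:
        rendered = self.render_recursively(key, context)
        rendered = rendered.replace("&quot;", '"')
        # Read from this thread's own buffer, never from sys.stderr.
        stderr_output = _stderr_local.buffer.getvalue()
    finally:
        sys.stderr = original_stderr
    ...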
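The snippet above still assigns to sys.stderr, which every thread shares, so one thread's warnings can end up in another thread's buffer. If accurate capture matters, one option is to serialize the redirect with a lock. The following is a minimal sketch under that assumption, not Keep's code: capture_stderr and _stderr_lock are hypothetical names, and func stands in for the render call.

```python
import contextlib
import io
import sys
import threading

# Hypothetical module-level lock guarding every stderr redirect.
_stderr_lock = threading.Lock()


def capture_stderr(func, *args, **kwargs):
    """Run func with sys.stderr redirected; return (result, captured text)."""
    buffer = io.StringIO()
    # Only one thread at a time may hold the redirect, so writes made by
    # func through sys.stderr always land in this call's buffer.
    with _stderr_lock, contextlib.redirect_stderr(buffer):
        result = func(*args, **kwargs)
    # The buffer is a local object, so reading it needs no lock.
    return result, buffer.getvalue()
```

The trade-off is that rendering is serialized across threads while the lock is held, and any code that writes to stderr outside capture_stderr during that window is still captured by whichever call holds the redirect.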
Environment
- Keep version: 0.48.1
- Chart version: 0.1.94
- Python: 3.13