Customize SharedMemoryManager subprocess start mode#520

Merged
fpacifici merged 1 commit into main from fpacifici/spawn_manager
Feb 18, 2026
Conversation

@fpacifici
Contributor

We have encountered scenarios where the SharedMemoryManager subprocess crashes right after starting.
The SharedMemoryManager needs a subprocess to manage the pool:
when it starts, it forks a process and waits to hear from it.

We saw this process hanging:

MainThread (135499531520896):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/src/sentry/src/sentry/__main__.py", line 4, in <module>
    main()
  File "/usr/src/sentry/src/sentry/runner/main.py", line 143, in main
    func(**kwargs)
  File "/.venv/lib/python3.13/site-packages/click/core.py", line 1442, in __call__
    return self.main(*args, **kwargs)
  File "/.venv/lib/python3.13/site-packages/click/core.py", line 1363, in main
    rv = self.invoke(ctx)
  File "/.venv/lib/python3.13/site-packages/click/core.py", line 1830, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/.venv/lib/python3.13/site-packages/click/core.py", line 1830, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/.venv/lib/python3.13/site-packages/click/core.py", line 1226, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/.venv/lib/python3.13/site-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
  File "/.venv/lib/python3.13/site-packages/click/decorators.py", line 34, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/src/sentry/src/sentry/runner/decorators.py", line 82, in inner
    return ctx.invoke(f, *args, **kwargs)
  File "/.venv/lib/python3.13/site-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
  File "/.venv/lib/python3.13/site-packages/click/decorators.py", line 34, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/src/sentry/src/sentry/runner/decorators.py", line 34, in inner
    return ctx.invoke(f, *args, **kwargs)
  File "/.venv/lib/python3.13/site-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
  File "/usr/src/sentry/src/sentry/runner/commands/run.py", line 613, in basic_consumer
    run_processor_with_signals(
  File "/usr/src/sentry/src/sentry/utils/kafka.py", line 73, in run_processor_with_signals
    processor.run()
  File "/.venv/lib/python3.13/site-packages/arroyo/processing/processor.py", line 351, in run
    self._run_once()
  File "/.venv/lib/python3.13/site-packages/arroyo/processing/processor.py", line 452, in _run_once
    self.__message = self.__consumer.poll(timeout=1.0)
  File "/.venv/lib/python3.13/site-packages/arroyo/backends/kafka/consumer.py", line 467, in poll
    message: Optional[ConfluentMessage] = self.__consumer.poll(
  File "/.venv/lib/python3.13/site-packages/arroyo/backends/kafka/consumer.py", line 374, in assignment_callback
    on_assign(offsets)
  File "/.venv/lib/python3.13/site-packages/arroyo/processing/processor.py", line 50, in wrapper
    return f(*args, **kwargs)
  File "/.venv/lib/python3.13/site-packages/arroyo/processing/processor.py", line 233, in on_partitions_assigned
    _create_strategy(current_partitions)
  File "/.venv/lib/python3.13/site-packages/arroyo/processing/processor.py", line 194, in _create_strategy
    self.__processor_factory.create_with_partitions(
  File "/usr/src/sentry/src/sentry/consumers/__init__.py", line 756, in create_with_partitions
    rv = self.inner.create_with_partitions(commit, partitions)
  File "/usr/src/sentry/src/sentry/consumers/__init__.py", line 747, in create_with_partitions
    return self.inner.create_with_partitions(commit, partitions)
  File "/usr/src/getsentry/getsentry/consumers/outcomes_consumer.py", line 512, in create_with_partitions
    flush_step = run_task_with_multiprocessing(
  File "/usr/src/sentry/src/sentry/utils/arroyo.py", line 211, in run_task_with_multiprocessing
    return ArroyoRunTaskWithMultiprocessing(pool=pool.pool, function=function, **kwargs)
  File "/.venv/lib/python3.13/site-packages/arroyo/processing/strategies/run_task_with_multiprocessing.py", line 566, in __init__
    self.__shared_memory_manager.start()
  File "/usr/local/lib/python3.13/multiprocessing/managers.py", line 569, in start
    self._address = reader.recv()
  File "/usr/local/lib/python3.13/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/local/lib/python3.13/multiprocessing/connection.py", line 430, in _recv_bytes
    buf = self._recv(4)
  File "/usr/local/lib/python3.13/multiprocessing/connection.py", line 395, in _recv
    chunk = read(handle, remaining)

This seems to be caused by the process crashing:

2026-02-17 11:06:30.012 Exception ignored in: <function post_fork at 0x7b3c69931da0>
2026-02-17 11:06:30.012 Traceback (most recent call last):
  File "/.venv/lib/python3.13/site-packages/datadog/dogstatsd/base.py", line 119, in post_fork
    for c in _instances:
  File "/usr/local/lib/python3.13/_weakrefset.py", line 65, in __iter__
    for itemref in self.data:
RuntimeError: Set changed size during iteration

There seems to be a race condition in the way the datadog agent manages instances:
if, during the post_fork call, the set of instances changes (for example, a subprocess instantiating the SDK), this error can occur.
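The failure mode can be reproduced in isolation: mutating a `weakref.WeakSet` while iterating it raises exactly the error in the traceback above. The sketch below is illustrative only (`FakeClient` and the function name are not datadog internals); it stands in for a client registering itself in the `_instances` registry while the `post_fork` hook walks it:

```python
import weakref


class FakeClient:
    """Stand-in for a statsd client object (illustrative only)."""


def iterate_while_registering() -> str:
    # A WeakSet like the `_instances` registry iterated by post_fork.
    instances = weakref.WeakSet()
    first, second = FakeClient(), FakeClient()
    instances.add(first)
    try:
        for _ in instances:
            # A concurrent registration lands mid-iteration.
            instances.add(second)
    except RuntimeError as exc:
        return str(exc)
    return "no error"


print(iterate_while_registering())  # Set changed size during iteration
```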

This PR attempts a fix by allowing SharedMemoryManager to initialize its process
by spawning a new interpreter rather than relying on fork. In this scenario the
parent's and child's datadog integrations do not interact with each other.
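As a minimal sketch of the approach (not arroyo's actual API): `SharedMemoryManager` is a `BaseManager` subclass and accepts a multiprocessing context via `ctx`, so its server process can be started from a spawn context instead of being forked:

```python
import multiprocessing
from multiprocessing.managers import SharedMemoryManager


def demo_spawned_manager() -> bytes:
    # "spawn" starts the manager's server subprocess in a fresh
    # interpreter that inherits no fork hooks or client registries
    # from the parent.
    spawn_ctx = multiprocessing.get_context("spawn")
    with SharedMemoryManager(ctx=spawn_ctx) as smm:
        block = smm.SharedMemory(size=64)
        block.buf[:5] = b"hello"
        return bytes(block.buf[:5])


if __name__ == "__main__":
    # The guard matters: spawned children re-import the main module.
    print(demo_spawned_manager())
```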

@fpacifici fpacifici requested review from a team as code owners February 17, 2026 23:40
@fpacifici fpacifici merged commit 217111e into main Feb 18, 2026
16 checks passed
@fpacifici fpacifici deleted the fpacifici/spawn_manager branch February 18, 2026 17:01
fpacifici added a commit to getsentry/sentry that referenced this pull request Feb 18, 2026
Support getsentry/arroyo#520.

We saw the single node multi process consumer hanging at times while the
SharedMemoryManager starts.
It seems the subprocess created by the SharedMemoryManager crashes due
to a race condition in the datadog integration. See the PR above for reference.

This allows us to configure consumers to use 'spawn' start mode rather
than 'fork'.
---------

Co-authored-by: getsantry[bot] <66042841+getsantry[bot]@users.noreply.github.com>
JonasBa pushed a commit to getsentry/sentry that referenced this pull request Feb 19, 2026
mchen-sentry pushed a commit to getsentry/sentry that referenced this pull request Feb 24, 2026
