# First exception not flushed on Celery+Python until second occurrence #687
How are you launching the celery workers? Is there anything special about the way you are configuring celery?

---
Thanks for the follow-up @untitaker: nothing special. I can reproduce the issue 100% of the time locally, and I launch it with

In production it runs in a Docker container.

---
I'm also seeing this issue. It appears to happen when an error is logged in the celery worker's main process before celery creates the process pool child processes. I can reproduce it with this code:

```python
# tasks.py
import logging
import time
from secrets import token_hex

from celery import Celery
import sentry_sdk

sentry_sdk.init(dsn="CHANGE_ME", debug=True)
app = Celery(backend="rpc://")

logger = logging.getLogger(__name__)
logger.error(f"SETUP {token_hex(2)}")


@app.task
def my_task():
    logger.error(f"MY_TASK {token_hex(2)}")
    time.sleep(1)
```

To reproduce:
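One way to exercise this, assuming the module above is saved as `tasks.py` (a sketch; the exact commands may differ from the original steps):

```python
# Hedged sketch: trigger my_task on a running worker.
# Assumes the module above is saved as tasks.py and a worker was started
# against it, e.g. with:  celery -A tasks worker --loglevel=info
from tasks import my_task

result = my_task.delay()  # executes my_task inside a pool worker process
print(result.id)          # the MY_TASK error is logged in that worker's process
```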
When the worker first starts, we see in the logs:

The "Sending event" line indicates the SETUP error was sent to Sentry. When the task is submitted to the worker (step 4 above), we see in the worker logs:

There is no "Sending event" line, so the MY_TASK error was not sent. When we kill the celery worker, we see in the warm shutdown logs:

This indicates the error is dropped and so never gets reported to Sentry. The fact it says "2 event(s) pending on flush" is interesting, because there is actually only one event that didn't get sent. Note that if we submit two events (and they are handled by the same pool worker process) we see:
This matches what @fellowshipofone reported: the first error is not processed until there is a 2nd occurrence. Unfortunately this could be a long time after the first error is logged, or never, if the pool worker is terminated before a 2nd error is logged. This is especially a problem when using celery's autoscale feature, as worker processes will be terminated shortly after the task is complete.

I did a little bit of debugging and found that if I put log messages before and after `self.not_empty.wait()`:

```python
while not self._qsize():
    logger.warning("before self.not_empty.wait()")
    self.not_empty.wait()
    logger.warning("after self.not_empty.wait()")
```

and put a sleep and a log before `self.not_empty.notify()`:

```python
sleep(1)
logger.warning("before self.not_empty.notify()")
self.not_empty.notify()
logger.warning("after self.not_empty.notify()")
```

we get the following log output:
Note that we see

My system configuration is:
Here is the full output of the celery worker:

- worker output before task is run
- worker output when task is run
- worker output when worker is stopped with ctrl-c
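The symptoms above are consistent with a general fork pitfall: a background consumer thread started in the parent process does not survive into forked pool children, so anything put on the inherited queue in a child is never drained. A minimal sketch of that mechanism, using only the standard library (this is not Sentry's actual transport code, and the root cause here is not confirmed):

```python
# Hedged sketch: a consumer thread started before os.fork() exists only
# in the parent; the child inherits the queue's state but not the thread.
# Unix-only, since it uses os.fork().
import os
import queue
import threading
import time

q = queue.Queue()

def consumer():
    while True:
        item = q.get()  # blocks on the queue's internal not_empty condition
        print(f"[pid {os.getpid()}] consumed {item!r}", flush=True)

threading.Thread(target=consumer, daemon=True).start()

q.put("parent event")  # drained: the consumer thread runs in this process
time.sleep(0.5)

if os.fork() == 0:        # child process: no consumer thread exists here
    q.put("child event")  # never consumed; stays pending in the child's queue
    time.sleep(0.5)
    print(f"[pid {os.getpid()}] child qsize = {q.qsize()}", flush=True)
    os._exit(0)

os.wait()
```

---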
I've confirmed that the bug can still be reproduced with the steps outlined above using the latest release of sentry-sdk (v1.3.1).

---
@untitaker Would you mind removing the

---
I've confirmed that this bug is still present in sentry-sdk 1.5.1 and master, although the behaviour has changed due to commit a6cc971. There is still the issue that the first logged error doesn't get processed immediately and could stay pending on the queue indefinitely. However:

I still consider this an issue because the Celery worker may not be terminated for quite some time, and so the error may not be reported until it is too late. The code in #687 (comment) can still be used to replicate the issue. By default Celery autoscales down inactive workers after 30 seconds, so the error will be reported after 30 seconds. This can be changed with the

---
Hey @nicolasgrasset and @RobbieClarken, better late than never. I just tried to reproduce this with Sentry SDK version 1.35.0, Python 3.9.15, celery 5.3.5, and redis as the Celery broker (Redis server version 7.2.1). I used this

This is how I run celery:

And this is how I call the task:

This is the output of Celery I get:

So immediately after my_task finishes, the error (and one transaction event) is sent to Sentry. I am not able to reproduce this behavior. As this is a quite old issue, could you please check whether the issue has been resolved by now, or if it is still occurring?

---
@antonpirker Yes, I can still reproduce the issue, including with the dependencies you specified. However, if I include

Could you please try reproducing without the

Logs

versions

---
Thanks for the update. Yes, I will try with

---
Hi @RobbieClarken, I attempted to reproduce the issue using the instructions you provided. My reproduction is available here. It does indeed appear that there is a delay in sending the error event; however, in my testing the event still got sent eventually, either after a delay of approximately a few minutes or after interrupting the Celery worker with Ctrl-C. Does this match the behavior you are observing?

---
@szokeasaurusrex I'm seeing the same behaviour when I install the latest sentry-sdk (v2.1.1). The behaviour looks to have changed in sentry-sdk v1.4.0; prior to this version (e.g. v1.3.1), when the Celery worker was scaled down (after ~60 seconds of inactivity) we would see:

From sentry-sdk v1.4.0 onwards we see:

So it seems that the errors will now eventually be reported... it may just be after a ~60 second delay.
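That the pending error goes out when the worker process exits is consistent with the SDK's shutdown behaviour: sentry_sdk registers an atexit hook that flushes queued events, bounded by the `shutdown_timeout` option. A small sketch of tuning that option (the option is real; the value is illustrative, and this is not presented as a fix for the underlying delay):

```python
# Hedged sketch: give the atexit flush more time to drain pending events
# when a pool worker is terminated. CHANGE_ME is a placeholder DSN.
import sentry_sdk

sentry_sdk.init(
    dsn="CHANGE_ME",
    shutdown_timeout=5,  # default is 2 seconds; wait up to 5s on process exit
)
```

---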
@RobbieClarken So, if I am understanding you correctly, it sounds like this issue has now been resolved. Is that correct? Or is there still some problem we need to address before we close this issue?

---
@szokeasaurusrex IMO the fact that the error doesn't get reported until the celery worker is scaled down is an indication there is still a bug in sentry-sdk, since errors should be reported immediately. I'm not sure if there is any situation where the worker will never get scaled down (which would mean the error would never be reported), or whether the bug can present itself in other ways. That said, it may be that the current behaviour is acceptable: with the default celery autoscaling configuration, I doubt anyone would notice the delay in error reporting.

---
@RobbieClarken adding some more info here because I just tried this out. In general, we've been telling celery users to initialize the SDK in the `celeryd_init` signal handler rather than at module level:

```python
@signals.celeryd_init.connect
# @signals.worker_init.connect
def init_worker(**_kwargs):
    init_sentry()
```

With the signal init, I can see the error events sent immediately, while with the module-level init there does seem to be a consistent delay. I don't know the root cause (would need to dig into the celery worker / threading condition mess), but maybe this helps other people looking at this issue. We've tried to clarify this in the docs, but maybe we should make this really clear.
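Spelled out as a self-contained module, the suggested pattern might look like the sketch below; the broker URL and DSN are placeholders, not from this thread:

```python
# Hedged sketch of the signal-based init suggested above.
import sentry_sdk
from celery import Celery, signals

app = Celery(broker="redis://localhost:6379/0")  # placeholder broker URL

def init_sentry():
    sentry_sdk.init(dsn="CHANGE_ME", debug=True)  # placeholder DSN

@signals.celeryd_init.connect  # fires in the worker's main process at startup
def init_worker(**_kwargs):
    init_sentry()
```

---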
Here is an issue that's a little hard to understand. We run Sentry on Django+Celery. We run many packages on that project, so I suspect it's a conflict with another one, but I do not know where to start.

Details

- `Hub.current.client.flush()` doesn't change any of it

Celery task

Sentry init
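For orientation, a typical Sentry init for a Django+Celery stack looks like the following sketch; these details are assumptions, not necessarily the reporter's actual configuration:

```python
# Hedged sketch of a typical init for this stack; the issue's actual
# "Sentry init" block was collapsed, so these specifics are assumptions.
import sentry_sdk
from sentry_sdk.integrations.celery import CeleryIntegration
from sentry_sdk.integrations.django import DjangoIntegration

sentry_sdk.init(
    dsn="CHANGE_ME",  # placeholder
    integrations=[DjangoIntegration(), CeleryIntegration()],
)
```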