First exception not flushed on Celery+Python until second occurrence #687

Open
nicolasgrasset opened this issue May 13, 2020 · 15 comments
Labels: Integration: Celery, triaged, Type: Bug, Waiting for: Community

@nicolasgrasset

Here is an issue that's a little hard to understand. We run Sentry on Django+Celery.

Django==2.2.12
celery==4.4.2
sentry-sdk==0.14.3

We run many packages in that project, so I suspect a conflict with another package, but I do not know where to start.

Details

  1. Exceptions are reported to Sentry as expected from the Django WSGI app
  2. Exceptions from code running on Celery are not reported immediately
  3. Triggering the same exception (same fingerprint, different message) a second time will "flush" both, and they will both appear in Sentry
  4. Triggering 2 different exceptions in a single Celery task will not report either of them to Sentry
  5. Calling Hub.current.client.flush() doesn't change any of it (see the sketch below)
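
For reference, here is a hypothetical sketch (using the sentry-sdk 0.x Hub API, not the project's actual code) of the kind of explicit flush attempted in item 5:

# Hypothetical sketch of the flush attempt from item 5 (sentry-sdk 0.x Hub API)
from sentry_sdk import Hub

def flush_sentry(timeout=2.0):
    client = Hub.current.client
    if client is not None:
        # Block for up to `timeout` seconds while queued events are sent
        client.flush(timeout=timeout)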

Celery task

@app.task
def sentry_logging_task(what):
    try:
        raise Exception("Sentry say what? {}".format(what))
    except Exception as e:
        logger.exception(str(e))
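
Presumably the task is triggered from Django or a shell along these lines (illustrative call, not taken from the report):

sentry_logging_task.delay("what")  # runs on the Celery worker, where the exception is raised and logged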

Sentry init

import sentry_sdk
from django.conf import settings
from sentry_sdk.integrations.celery import CeleryIntegration
from sentry_sdk.integrations.django import DjangoIntegration
from sentry_sdk.integrations.logging import ignore_logger

sentry_sdk.init(
    dsn=settings.DNS,
    integrations=[DjangoIntegration(), CeleryIntegration()],
    environment=settings.ENV,
)
ignore_logger('django.security.DisallowedHost')
@untitaker (Member)

How are you launching the Celery workers? Is there anything special about the way you are configuring Celery?

@nicolasgrasset (Author)

Thanks for the follow-up @untitaker:

Nothing special. I can reproduce the issue 100% of the time locally. I launch it with pipenv, so it loads a .env file as environment variables:

pipenv run celery worker -A app -n app

In production it runs in a Docker container.

@RobbieClarken commented May 11, 2021

I'm also seeing this issue. It appears to happen when an error is logged in the Celery worker's main process before Celery forks the pool's child processes. I can reproduce it with this code:

# tasks.py
import logging
import time
from secrets import token_hex

from celery import Celery
import sentry_sdk

sentry_sdk.init(dsn="CHANGE_ME", debug=True)
app = Celery(backend="rpc://")
logger = logging.getLogger(__name__)

logger.error(f"SETUP {token_hex(2)}")

@app.task
def my_task():
    logger.error(f"MY_TASK {token_hex(2)}")
    time.sleep(1)

To reproduce:

  1. Start RabbitMQ in Docker: docker run -d --rm -p 5672:5672 rabbitmq:3.8.2

  2. Wait 30 seconds to give RabbitMQ time to start.

  3. Start the celery worker:

    python3 -m venv .venv
    .venv/bin/pip install sentry-sdk==1.1.0 celery==5.0.5
    .venv/bin/celery --app tasks worker -l INFO --autoscale 4,0

    (I use --autoscale 4,0 here because it reproduces the issue 100% of the time. The issue also occurs without --autoscale about 90% of the time.)

  4. In a second terminal, trigger the worker to run my_task:

    .venv/bin/python -c 'import tasks; tasks.my_task.delay()'

When the worker first starts we see in the logs:

SETUP 7eac
 [sentry] DEBUG: Sending event, type:null level:error event_id:bd247509b0ee42d88fd0f4dc90ac9cc3 project:5334031 host:o419569.ingest.sentry.io

The "Sending event" line indicates sentry_sdk successfully processed the error that was logged at module-level scope in tasks.py. This event is seen at sentry.io as expected.

When the task is submitted to the worker (step 4 above) we see in the worker logs:

[2021-05-11 12:13:50,786: INFO/MainProcess] Received task: tasks.my_task[7ebd1b39-1479-450c-9f83-0da8f774a0b3]
[2021-05-11 12:13:50,787: INFO/MainProcess] Scaling up 1 processes.
[2021-05-11 12:13:50,845: ERROR/ForkPoolWorker-1] MY_TASK 644f
[2021-05-11 12:13:51,872: INFO/ForkPoolWorker-1] Task tasks.my_task[7ebd1b39-1479-450c-9f83-0da8f774a0b3] succeeded in 1.0273930020000002s: None
[2021-05-11 12:13:51,873: DEBUG/ForkPoolWorker-1] Discarding transaction because sampled = False

There is no "Sending event" line so sentry_sdk has not processed this error for some reason. The event never appears at sentry.io.

When we kill the celery worker we see in the warm shutdown logs:

[2021-05-11 12:14:00,754: DEBUG/ForkPoolWorker-1] background worker got flush request
[2021-05-11 12:14:00,855: DEBUG/ForkPoolWorker-1] 2 event(s) pending on flush
[2021-05-11 12:14:02,757: ERROR/ForkPoolWorker-1] flush timed out, dropped 2 events
[2021-05-11 12:14:02,758: DEBUG/ForkPoolWorker-1] background worker flushed

This indicates the error is dropped and so never gets reported to Sentry. The fact that it says "2 event(s) pending on flush" is interesting, because only one event actually failed to send.

Note that if we trigger two events (and they are handled by the same pool worker process) we see:

[2021-05-11 12:17:41,899: INFO/MainProcess] Received task: tasks.my_task[06305752-a686-4404-b473-2533f71e1c65]
[2021-05-11 12:17:41,899: INFO/MainProcess] Scaling up 1 processes.
[2021-05-11 12:17:41,954: ERROR/ForkPoolWorker-1] MY_TASK cb7a
[2021-05-11 12:17:42,980: INFO/ForkPoolWorker-1] Task tasks.my_task[06305752-a686-4404-b473-2533f71e1c65] succeeded in 1.0271145849999996s: None
[2021-05-11 12:17:42,981: DEBUG/ForkPoolWorker-1] Discarding transaction because sampled = False

[2021-05-11 12:17:46,377: INFO/MainProcess] Received task: tasks.my_task[8802bffe-a4fd-42f8-b78c-7ede8544ff65]
[2021-05-11 12:17:46,379: ERROR/ForkPoolWorker-1] MY_TASK a313
[2021-05-11 12:17:46,382: DEBUG/ForkPoolWorker-1] Sending event, type:null level:error event_id:e2ffcdc31ea74188a2cff7d888524da8 project:5334031 host:o419569.ingest.sentry.io
[2021-05-11 12:17:46,427: DEBUG/ForkPoolWorker-1] Sending event, type:null level:error event_id:2c2bef1f28784ed586e2bb2b94315124 project:5334031 host:o419569.ingest.sentry.io
[2021-05-11 12:17:47,383: INFO/ForkPoolWorker-1] Task tasks.my_task[8802bffe-a4fd-42f8-b78c-7ede8544ff65] succeeded in 1.004195029s: None
[2021-05-11 12:17:47,384: DEBUG/ForkPoolWorker-1] Discarding transaction because sampled = False

This matches what @fellowshipofone reported: the first error is not processed until there is a second occurrence. Unfortunately this could be a long time after the first error is logged, or never if the pool worker is terminated before a second error is logged. This is especially a problem when using Celery's autoscale feature, as worker processes are terminated shortly after their task completes.

I did a little debugging and found that if I put log messages before and after self.not_empty.wait() in sentry_sdk._queue.Queue.get():

while not self._qsize():
    logger.warning("before self.not_empty.wait()")
    self.not_empty.wait()
    logger.warning("after self.not_empty.wait()")

and put a sleep and log messages around self.not_empty.notify() in sentry_sdk._queue.Queue.put():

sleep(1)
logger.warning("before self.not_empty.notify()")
self.not_empty.notify()
logger.warning("after self.not_empty.notify()")

we get the following log output:

[2021-05-11 12:25:50,812: INFO/MainProcess] Received task: tasks.my_task[56e9b4a6-dfa3-4956-bb38-8a042dabde20]
[2021-05-11 12:25:50,812: INFO/MainProcess] Scaling up 1 processes.
[2021-05-11 12:25:50,874: ERROR/ForkPoolWorker-1] MY_TASK 25ea
[2021-05-11 12:25:50,877: WARNING/ForkPoolWorker-1] before self.not_empty.wait()
[2021-05-11 12:25:51,879: WARNING/ForkPoolWorker-1] before self.not_empty.notify()
[2021-05-11 12:25:51,880: WARNING/ForkPoolWorker-1] after self.not_empty.notify()
[2021-05-11 12:25:52,904: INFO/ForkPoolWorker-1] Task tasks.my_task[56e9b4a6-dfa3-4956-bb38-8a042dabde20] succeeded in 2.0308109759999997s: None

Note that we see before self.not_empty.wait() but not after self.not_empty.wait() despite self.not_empty.notify() being called. It appears the not_empty threading.Condition is in some kind of invalid state.
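
For illustration, here is a minimal, self-contained sketch (not sentry-sdk code, and assuming a Unix platform with os.fork) of how forking while another thread holds a Condition's lock leaves that Condition unusable in the child process, which is consistent with the stuck wait() above:

# Standalone sketch: fork while another thread holds a Condition's lock.
# The child inherits the lock in the "held" state with no thread to release
# it, so the child can never acquire the lock (or return from wait()).
import os
import threading
import time

cond = threading.Condition()

def holder():
    with cond:           # background thread acquires the underlying lock...
        time.sleep(5)    # ...and is still holding it when we fork below

threading.Thread(target=holder, daemon=True).start()
time.sleep(0.5)          # give the holder time to grab the lock

pid = os.fork()
if pid == 0:
    # Child: the holder thread does not exist here, but the lock state was
    # copied as "locked", so this blocks, times out, and prints False.
    print("child acquired:", cond.acquire(timeout=2))
    os._exit(0)
else:
    os.waitpid(pid, 0)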

My system configuration is:

  • OS: macOS 10.14.6 but also seen in docker using the python:3.9.2-buster base image
  • Python: 3.9.4
  • sentry-sdk: 1.1.0
  • celery: 5.0.5

Here is the full output of the celery worker:

worker output before task is run
$ .venv/bin/celery --app tasks worker -l INFO --autoscale 4,0
 [sentry] DEBUG: Setting up integrations (with default = True)
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.django.DjangoIntegration: Django not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.flask.FlaskIntegration: Flask is not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.bottle.BottleIntegration: Bottle not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.falcon.FalconIntegration: Falcon not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.sanic.SanicIntegration: Sanic not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.rq.RqIntegration: RQ not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.aiohttp.AioHttpIntegration: AIOHTTP not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.tornado.TornadoIntegration: Tornado not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.sqlalchemy.SqlalchemyIntegration: SQLAlchemy not installed.
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.boto3.Boto3Integration: botocore is not installed
 [sentry] DEBUG: Setting up previously not enabled integration logging
 [sentry] DEBUG: Setting up previously not enabled integration stdlib
 [sentry] DEBUG: Setting up previously not enabled integration excepthook
 [sentry] DEBUG: Setting up previously not enabled integration dedupe
 [sentry] DEBUG: Setting up previously not enabled integration atexit
 [sentry] DEBUG: Setting up previously not enabled integration modules
 [sentry] DEBUG: Setting up previously not enabled integration argv
 [sentry] DEBUG: Setting up previously not enabled integration threading
 [sentry] DEBUG: Setting up previously not enabled integration celery
 [sentry] DEBUG: Enabling integration logging
 [sentry] DEBUG: Enabling integration stdlib
 [sentry] DEBUG: Enabling integration excepthook
 [sentry] DEBUG: Enabling integration dedupe
 [sentry] DEBUG: Enabling integration atexit
 [sentry] DEBUG: Enabling integration modules
 [sentry] DEBUG: Enabling integration argv
 [sentry] DEBUG: Enabling integration threading
 [sentry] DEBUG: Enabling integration celery
SETUP 7eac
 [sentry] DEBUG: Sending event, type:null level:error event_id:bd247509b0ee42d88fd0f4dc90ac9cc3 project:5334031 host:o419569.ingest.sentry.io

celery@mbp.local v5.0.5 (singularity)

macOS-10.14.6-x86_64-i386-64bit 2021-05-11 12:13:24

[config]
.> app:         __main__:0x11193a520
.> transport:   amqp://guest:**@localhost:5672//
.> results:     rpc://
.> concurrency: {min=0, max=4} (prefork)
.> task events: OFF (enable -E to monitor tasks in this worker)

[queues]
.> celery           exchange=celery(direct) key=celery


[tasks]
  . tasks.my_task

[2021-05-11 12:13:24,638: INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[2021-05-11 12:13:24,654: INFO/MainProcess] mingle: searching for neighbors
[2021-05-11 12:13:25,703: INFO/MainProcess] mingle: all alone
[2021-05-11 12:13:25,741: INFO/MainProcess] celery@mbp.local ready.
worker output when task is run
[2021-05-11 12:13:50,786: INFO/MainProcess] Received task: tasks.my_task[7ebd1b39-1479-450c-9f83-0da8f774a0b3]
[2021-05-11 12:13:50,787: INFO/MainProcess] Scaling up 1 processes.
[2021-05-11 12:13:50,845: ERROR/ForkPoolWorker-1] MY_TASK 644f
[2021-05-11 12:13:51,872: INFO/ForkPoolWorker-1] Task tasks.my_task[7ebd1b39-1479-450c-9f83-0da8f774a0b3] succeeded in 1.0273930020000002s: None
 [sentry] DEBUG: Discarding transaction because sampled = False
[2021-05-11 12:13:51,873: DEBUG/ForkPoolWorker-1] Discarding transaction because sampled = False
worker output when worker is stopped with ctrl-c
^C
worker: Hitting Ctrl+C again will terminate all running tasks!

worker: Warm shutdown (MainProcess)
 [sentry] DEBUG: Flushing HTTP transport
[2021-05-11 12:14:00,753: DEBUG/ForkPoolWorker-1] Flushing HTTP transport
 [sentry] DEBUG: background worker got flush request
[2021-05-11 12:14:00,754: DEBUG/ForkPoolWorker-1] background worker got flush request
 [sentry] DEBUG: 2 event(s) pending on flush
[2021-05-11 12:14:00,855: DEBUG/ForkPoolWorker-1] 2 event(s) pending on flush
 [sentry] ERROR: flush timed out, dropped 2 events
[2021-05-11 12:14:02,757: ERROR/ForkPoolWorker-1] flush timed out, dropped 2 events
 [sentry] DEBUG: background worker flushed
[2021-05-11 12:14:02,758: DEBUG/ForkPoolWorker-1] background worker flushed
 [sentry] DEBUG: atexit: got shutdown signal
[2021-05-11 12:14:04,275: DEBUG/MainProcess] atexit: got shutdown signal
 [sentry] DEBUG: atexit: shutting down client
[2021-05-11 12:14:04,276: DEBUG/MainProcess] atexit: shutting down client
 [sentry] DEBUG: Flushing HTTP transport
[2021-05-11 12:14:04,276: DEBUG/MainProcess] Flushing HTTP transport
 [sentry] DEBUG: background worker got flush request
[2021-05-11 12:14:04,276: DEBUG/MainProcess] background worker got flush request
 [sentry] DEBUG: background worker flushed
[2021-05-11 12:14:04,277: DEBUG/MainProcess] background worker flushed
 [sentry] DEBUG: Killing HTTP transport
[2021-05-11 12:14:04,277: DEBUG/MainProcess] Killing HTTP transport
 [sentry] DEBUG: background worker got kill request
[2021-05-11 12:14:04,277: DEBUG/MainProcess] background worker got kill request

@RobbieClarken

I've confirmed that the bug can still be reproduced with the steps outlined above using the latest release of sentry-sdk (v1.3.1).

logs
[2021-08-04 18:48:04,389: INFO/MainProcess] Scaling down 1 processes.
[2021-08-04 18:48:04,391: DEBUG/ForkPoolWorker-1] Flushing HTTP transport
[2021-08-04 18:48:04,392: DEBUG/ForkPoolWorker-1] background worker got flush request
[2021-08-04 18:48:04,494: DEBUG/ForkPoolWorker-1] 2 event(s) pending on flush
[2021-08-04 18:48:06,395: ERROR/ForkPoolWorker-1] flush timed out, dropped 2 events
[2021-08-04 18:48:06,396: DEBUG/ForkPoolWorker-1] background worker flushed

@untitaker Would you mind removing the needs-information and question labels from this issue?

@untitaker added the "Type: Bug" label and removed the "needs-information" label Aug 4, 2021
@github-actions

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

@RobbieClarken

I've confirmed that this bug is still present in sentry-sdk 1.5.1 and on master, although the behaviour has changed due to commit a6cc971.

There is still the issue that the first logged error doesn't get processed immediately and could stay pending on the queue indefinitely. However:

  • Prior to a6cc971 when the Celery worker was terminated the pending event would be dropped and never reported.
  • From a6cc971 onwards when the Celery worker is terminated the pending event does get reported to Sentry.

I still consider this an issue because the Celery worker may not be terminated for quite some time and so the error may not be reported until it is too late.

The code in #687 (comment) can still be used to replicate the issue. By default, Celery scales down inactive pool workers after 30 seconds, so the error will be reported after roughly 30 seconds. This can be changed with the AUTOSCALE_KEEPALIVE environment variable; for example, setting AUTOSCALE_KEEPALIVE=600 will show that the error doesn't get reported for 10 minutes.

@antonpirker added this to the Celery Update milestone Mar 21, 2022
@sentrivana added the "triaged" label Aug 14, 2023
@antonpirker self-assigned this Nov 14, 2023
@antonpirker (Member)

Hey @nicolasgrasset and @RobbieClarken,

Better late than never. I just tried to reproduce this with Sentry SDK 1.35.0, Python 3.9.15, Celery 5.3.5, and Redis (server version 7.2.1) as the Celery broker.

I used this tasks.py file:

# tasks.py
import os 
import logging
import time
from secrets import token_hex

from celery import Celery
import sentry_sdk


sentry_sdk.init(
    dsn=os.environ.get("SENTRY_DSN"),
    enable_tracing=True,
    debug=True,
    send_default_pii=True,
)

app = Celery("tasks", broker='redis://localhost:6379/0')
logger = logging.getLogger(__name__)

logger.error(f"SETUP {token_hex(2)}")

@app.task
def my_task():
    logger.error(f"MY_TASK2 {token_hex(2)}")
    time.sleep(1)

This is how I run Celery:

celery -A tasks.app worker \
    --loglevel=INFO \
    --autoscale 4,0

And this is how I call the task:

python -c 'import tasks; tasks.my_task.delay()'

This is the output of Celery I get:

[2023-11-14 15:03:24,218: INFO/MainProcess] Task tasks.my_task[c3877e25-ce4f-44ae-9917-ab4477ceb57a] received
[2023-11-14 15:03:24,218: INFO/MainProcess] Scaling up 1 processes.
[2023-11-14 15:03:24,241: DEBUG/ForkPoolWorker-1] [Tracing] Extracted propagation context from incoming data: {'trace_id': '60298be3580046bcb81e55784f1a87c2', 'parent_span_id': 'be2335eb4aead2ff', 'parent_sampled': None, 'span_id': '9c00cc87ad85d6f9'}
[2023-11-14 15:03:24,242: DEBUG/ForkPoolWorker-1] [Tracing] Starting <queue.task.celery> transaction <tasks.my_task>
[2023-11-14 15:03:24,242: DEBUG/ForkPoolWorker-1] [Profiling] Discarding profile because profiler was not started.
[2023-11-14 15:03:24,242: ERROR/ForkPoolWorker-1] MY_TASK2 709d
[2023-11-14 15:03:25,247: INFO/ForkPoolWorker-1] Task tasks.my_task[c3877e25-ce4f-44ae-9917-ab4477ceb57a] succeeded in 1.0041934169999998s: None
[2023-11-14 15:03:25,252: DEBUG/ForkPoolWorker-1] Sending envelope [envelope with 1 items (error)] project:5461230 host:o447951.ingest.sentry.io
[2023-11-14 15:03:25,287: DEBUG/ForkPoolWorker-1] Sending envelope [envelope with 1 items (transaction)] project:5461230 host:o447951.ingest.sentry.io
[2023-11-14 15:04:00,809: INFO/MainProcess] Scaling down 1 processes.
[2023-11-14 15:04:00,819: DEBUG/ForkPoolWorker-1] Flushing HTTP transport
[2023-11-14 15:04:00,822: DEBUG/ForkPoolWorker-1] background worker got flush request

So immediately after my_task finishes, the error (and one transaction event) is sent to Sentry.

I am not able to reproduce this behavior.

As this is quite an old issue, could you please check whether it has been resolved by now, or whether it is still occurring?
Many thanks, and sorry again for the late answer.

@RobbieClarken

@antonpirker Yes, I can still reproduce the issue, including with the dependencies you specified. However, if I include enable_tracing=True in sentry_sdk.init(), I don't get the problem.

Could you please try reproducing without the enable_tracing=True line in tasks.py?

Logs

versions
$ .venv/bin/python --version
Python 3.9.15

$ .venv/bin/pip freeze
amqp==5.2.0
async-timeout==4.0.3
billiard==4.2.0
celery==5.3.5
certifi==2023.7.22
click==8.1.7
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.3.0
kombu==5.3.3
prompt-toolkit==3.0.41
python-dateutil==2.8.2
redis==5.0.1
sentry-sdk==1.35.0
six==1.16.0
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.1.0
vine==5.1.0
wcwidth==0.2.10

$ redis-server --version
Redis server v=7.2.3 sha=00000000:0 malloc=libc bits=64 build=96f23777753ded6c
tasks.py
import os
import logging
import time
from secrets import token_hex

from celery import Celery
import sentry_sdk


sentry_sdk.init(
    dsn=os.environ.get("SENTRY_DSN"),
    enable_tracing=True,  # this changes the behaviour
    debug=True,
    send_default_pii=True,
)

app = Celery("tasks", broker='redis://localhost:6379/0')
logger = logging.getLogger(__name__)

logger.error(f"SETUP {token_hex(2)}")

@app.task
def my_task():
    logger.error(f"MY_TASK2 {token_hex(2)}")
    time.sleep(1)
log without enable_tracing=True
$ .venv/bin/celery -A tasks.app worker --loglevel=INFO --autoscale 4,0
 [sentry] DEBUG: Setting up integrations (with default = True)
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.aiohttp.AioHttpIntegration: AIOHTTP not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.boto3.Boto3Integration: botocore is not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.bottle.BottleIntegration: Bottle not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.django.DjangoIntegration: Django not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.falcon.FalconIntegration: Falcon not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.fastapi.FastApiIntegration: Starlette is not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.flask.FlaskIntegration: Flask is not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.httpx.HttpxIntegration: httpx is not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.pyramid.PyramidIntegration: Pyramid not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.rq.RqIntegration: RQ not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.sanic.SanicIntegration: Sanic not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.sqlalchemy.SqlalchemyIntegration: SQLAlchemy not installed.
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.starlette.StarletteIntegration: Starlette is not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.tornado.TornadoIntegration: Tornado not installed
 [sentry] DEBUG: Setting up previously not enabled integration argv
 [sentry] DEBUG: Setting up previously not enabled integration atexit
 [sentry] DEBUG: Setting up previously not enabled integration dedupe
 [sentry] DEBUG: Setting up previously not enabled integration excepthook
 [sentry] DEBUG: Setting up previously not enabled integration logging
 [sentry] DEBUG: Setting up previously not enabled integration modules
 [sentry] DEBUG: Setting up previously not enabled integration stdlib
 [sentry] DEBUG: Setting up previously not enabled integration threading
 [sentry] DEBUG: Setting up previously not enabled integration celery
 [sentry] DEBUG: Setting up previously not enabled integration redis
 [sentry] DEBUG: Enabling integration argv
 [sentry] DEBUG: Enabling integration atexit
 [sentry] DEBUG: Enabling integration dedupe
 [sentry] DEBUG: Enabling integration excepthook
 [sentry] DEBUG: Enabling integration logging
 [sentry] DEBUG: Enabling integration modules
 [sentry] DEBUG: Enabling integration stdlib
 [sentry] DEBUG: Enabling integration threading
 [sentry] DEBUG: Enabling integration celery
 [sentry] DEBUG: Enabling integration redis
 [sentry] DEBUG: Setting SDK name to 'sentry.python'
SETUP 7642
 [sentry] DEBUG: Sending event, type:null level:error event_id:1f40a7ab9b9348c98eacaf0b7d4d037a project:4506226358157312 host:o1412735.ingest.sentry.io

celery@noether v5.3.5 (emerald-rush)

macOS-12.5.1-arm64-64bit 2023-11-15 09:12:20

[config]
.> app:         tasks:0x10458f580
.> transport:   redis://localhost:6379/0
.> results:     disabled://
.> concurrency: {min=0, max=4} (prefork)
.> task events: OFF (enable -E to monitor tasks in this worker)

[queues]
.> celery           exchange=celery(direct) key=celery

[tasks]
  . tasks.my_task

[2023-11-15 09:12:20,956: WARNING/MainProcess] /Users/robbie/Developer/scratch/sentry-issue/.venv/lib/python3.9/site-packages/celery/worker/consumer/consumer.py:507: CPendingDeprecationWarning: The broker_connection_retry configuration setting will no longer determine
whether broker connection retries are made during startup in Celery 6.0 and above.
If you wish to retain the existing behavior for retrying connections on startup,
you should set broker_connection_retry_on_startup to True.
  warnings.warn(

[2023-11-15 09:12:20,959: INFO/MainProcess] Connected to redis://localhost:6379/0
[2023-11-15 09:12:20,960: WARNING/MainProcess] /Users/robbie/Developer/scratch/sentry-issue/.venv/lib/python3.9/site-packages/celery/worker/consumer/consumer.py:507: CPendingDeprecationWarning: The broker_connection_retry configuration setting will no longer determine
whether broker connection retries are made during startup in Celery 6.0 and above.
If you wish to retain the existing behavior for retrying connections on startup,
you should set broker_connection_retry_on_startup to True.
  warnings.warn(

[2023-11-15 09:12:20,961: INFO/MainProcess] mingle: searching for neighbors
[2023-11-15 09:12:21,970: INFO/MainProcess] mingle: all alone
[2023-11-15 09:12:21,986: INFO/MainProcess] celery@noether ready.
[2023-11-15 09:12:27,422: INFO/MainProcess] Task tasks.my_task[66843bc3-8186-42e2-8d03-c9e5febf60cb] received
[2023-11-15 09:12:27,422: INFO/MainProcess] Scaling up 1 processes.
 [sentry] DEBUG: [Tracing] Extracted propagation context from incoming data: {'dynamic_sampling_context': {'trace_id': 'bf59073d09ee4cbf928adcaddef40d03', 'environment': 'production', 'release': '0e22824e4e5ed5855a12e155a0398a1abe38f107', 'public_key': '183bf1a5cbdb1045cacdf2251e1d0ddc'}, 'trace_id': 'bf59073d09ee4cbf928adcaddef40d03', 'parent_span_id': 'bf9dee129daa6582', 'parent_sampled': None, 'span_id': '901b482e718b2401'}
[2023-11-15 09:12:27,458: DEBUG/ForkPoolWorker-1] [Tracing] Extracted propagation context from incoming data: {'dynamic_sampling_context': {'trace_id': 'bf59073d09ee4cbf928adcaddef40d03', 'environment': 'production', 'release': '0e22824e4e5ed5855a12e155a0398a1abe38f107', 'public_key': '183bf1a5cbdb1045cacdf2251e1d0ddc'}, 'trace_id': 'bf59073d09ee4cbf928adcaddef40d03', 'parent_span_id': 'bf9dee129daa6582', 'parent_sampled': None, 'span_id': '901b482e718b2401'}
 [sentry] DEBUG: [Profiling] Discarding profile because transaction is discarded.
[2023-11-15 09:12:27,459: DEBUG/ForkPoolWorker-1] [Profiling] Discarding profile because transaction is discarded.
[2023-11-15 09:12:27,459: ERROR/ForkPoolWorker-1] MY_TASK2 f01d
[2023-11-15 09:12:28,462: INFO/ForkPoolWorker-1] Task tasks.my_task[66843bc3-8186-42e2-8d03-c9e5febf60cb] succeeded in 1.0028278329999996s: None
 [sentry] DEBUG: Discarding transaction because sampled = False
[2023-11-15 09:12:28,464: DEBUG/ForkPoolWorker-1] Discarding transaction because sampled = False



^C
worker: Hitting Ctrl+C again will terminate all running tasks!

worker: Warm shutdown (MainProcess)
 [sentry] DEBUG: Flushing HTTP transport
[2023-11-15 09:12:38,671: DEBUG/ForkPoolWorker-1] Flushing HTTP transport
 [sentry] DEBUG: background worker got flush request
[2023-11-15 09:12:38,672: DEBUG/ForkPoolWorker-1] background worker got flush request
 [sentry] DEBUG: Sending event, type:null level:error event_id:ac881e2b06c142509c948712e0352186 project:4506226358157312 host:o1412735.ingest.sentry.io
[2023-11-15 09:12:38,672: DEBUG/ForkPoolWorker-1] Sending event, type:null level:error event_id:ac881e2b06c142509c948712e0352186 project:4506226358157312 host:o1412735.ingest.sentry.io
 [sentry] DEBUG: background worker flushed
[2023-11-15 09:12:38,706: DEBUG/ForkPoolWorker-1] background worker flushed
 [sentry] DEBUG: atexit: got shutdown signal
[2023-11-15 09:12:40,220: DEBUG/MainProcess] atexit: got shutdown signal
 [sentry] DEBUG: atexit: shutting down client
[2023-11-15 09:12:40,221: DEBUG/MainProcess] atexit: shutting down client
 [sentry] DEBUG: Flushing HTTP transport
[2023-11-15 09:12:40,221: DEBUG/MainProcess] Flushing HTTP transport
 [sentry] DEBUG: background worker got flush request
[2023-11-15 09:12:40,221: DEBUG/MainProcess] background worker got flush request
 [sentry] DEBUG: background worker flushed
[2023-11-15 09:12:40,222: DEBUG/MainProcess] background worker flushed
 [sentry] DEBUG: Killing HTTP transport
[2023-11-15 09:12:40,222: DEBUG/MainProcess] Killing HTTP transport
 [sentry] DEBUG: background worker got kill request
[2023-11-15 09:12:40,222: DEBUG/MainProcess] background worker got kill request
log with enable_tracing=True
09:14 (1) $ .venv/bin/celery -A tasks.app worker --loglevel=INFO --autoscale 4,0
 [sentry] DEBUG: Setting up integrations (with default = True)
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.aiohttp.AioHttpIntegration: AIOHTTP not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.boto3.Boto3Integration: botocore is not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.bottle.BottleIntegration: Bottle not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.django.DjangoIntegration: Django not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.falcon.FalconIntegration: Falcon not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.fastapi.FastApiIntegration: Starlette is not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.flask.FlaskIntegration: Flask is not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.httpx.HttpxIntegration: httpx is not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.pyramid.PyramidIntegration: Pyramid not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.rq.RqIntegration: RQ not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.sanic.SanicIntegration: Sanic not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.sqlalchemy.SqlalchemyIntegration: SQLAlchemy not installed.
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.starlette.StarletteIntegration: Starlette is not installed
 [sentry] DEBUG: Did not import default integration sentry_sdk.integrations.tornado.TornadoIntegration: Tornado not installed
 [sentry] DEBUG: Setting up previously not enabled integration argv
 [sentry] DEBUG: Setting up previously not enabled integration atexit
 [sentry] DEBUG: Setting up previously not enabled integration dedupe
 [sentry] DEBUG: Setting up previously not enabled integration excepthook
 [sentry] DEBUG: Setting up previously not enabled integration logging
 [sentry] DEBUG: Setting up previously not enabled integration modules
 [sentry] DEBUG: Setting up previously not enabled integration stdlib
 [sentry] DEBUG: Setting up previously not enabled integration threading
 [sentry] DEBUG: Setting up previously not enabled integration celery
 [sentry] DEBUG: Setting up previously not enabled integration redis
 [sentry] DEBUG: Enabling integration argv
 [sentry] DEBUG: Enabling integration atexit
 [sentry] DEBUG: Enabling integration dedupe
 [sentry] DEBUG: Enabling integration excepthook
 [sentry] DEBUG: Enabling integration logging
 [sentry] DEBUG: Enabling integration modules
 [sentry] DEBUG: Enabling integration stdlib
 [sentry] DEBUG: Enabling integration threading
 [sentry] DEBUG: Enabling integration celery
 [sentry] DEBUG: Enabling integration redis
 [sentry] DEBUG: Setting SDK name to 'sentry.python'
SETUP 38b1
 [sentry] DEBUG: Sending envelope [envelope with 1 items (error)] project:4506226358157312 host:o1412735.ingest.sentry.io

celery@noether v5.3.5 (emerald-rush)

macOS-12.5.1-arm64-64bit 2023-11-15 09:14:32

[config]
.> app:         tasks:0x1062e3580
.> transport:   redis://localhost:6379/0
.> results:     disabled://
.> concurrency: {min=0, max=4} (prefork)
.> task events: OFF (enable -E to monitor tasks in this worker)

[queues]
.> celery           exchange=celery(direct) key=celery

[tasks]
  . tasks.my_task

[2023-11-15 09:14:32,987: WARNING/MainProcess] /Users/robbie/Developer/scratch/sentry-issue/.venv/lib/python3.9/site-packages/celery/worker/consumer/consumer.py:507: CPendingDeprecationWarning: The broker_connection_retry configuration setting will no longer determine
whether broker connection retries are made during startup in Celery 6.0 and above.
If you wish to retain the existing behavior for retrying connections on startup,
you should set broker_connection_retry_on_startup to True.
  warnings.warn(

[2023-11-15 09:14:32,992: INFO/MainProcess] Connected to redis://localhost:6379/0
[2023-11-15 09:14:32,992: WARNING/MainProcess] /Users/robbie/Developer/scratch/sentry-issue/.venv/lib/python3.9/site-packages/celery/worker/consumer/consumer.py:507: CPendingDeprecationWarning: The broker_connection_retry configuration setting will no longer determine
whether broker connection retries are made during startup in Celery 6.0 and above.
If you wish to retain the existing behavior for retrying connections on startup,
you should set broker_connection_retry_on_startup to True.
  warnings.warn(

[2023-11-15 09:14:32,995: INFO/MainProcess] mingle: searching for neighbors
[2023-11-15 09:14:34,004: INFO/MainProcess] mingle: all alone
[2023-11-15 09:14:34,021: INFO/MainProcess] celery@noether ready.
[2023-11-15 09:14:38,430: INFO/MainProcess] Task tasks.my_task[fc248b87-4b32-446b-96ce-ff9b073eb67d] received
[2023-11-15 09:14:38,430: INFO/MainProcess] Scaling up 1 processes.
 [sentry] DEBUG: [Tracing] Extracted propagation context from incoming data: {'trace_id': 'a59631e7c4d44897a71f93547aa23fb0', 'parent_span_id': '8665693329723f90', 'parent_sampled': None, 'span_id': 'bc1aca9d0bba7130'}
[2023-11-15 09:14:38,470: DEBUG/ForkPoolWorker-1] [Tracing] Extracted propagation context from incoming data: {'trace_id': 'a59631e7c4d44897a71f93547aa23fb0', 'parent_span_id': '8665693329723f90', 'parent_sampled': None, 'span_id': 'bc1aca9d0bba7130'}
 [sentry] DEBUG: [Tracing] Starting <queue.task.celery> transaction <tasks.my_task>
[2023-11-15 09:14:38,470: DEBUG/ForkPoolWorker-1] [Tracing] Starting <queue.task.celery> transaction <tasks.my_task>
 [sentry] DEBUG: [Profiling] Discarding profile because profiler was not started.
[2023-11-15 09:14:38,471: DEBUG/ForkPoolWorker-1] [Profiling] Discarding profile because profiler was not started.
[2023-11-15 09:14:38,471: ERROR/ForkPoolWorker-1] MY_TASK2 1469
[2023-11-15 09:14:39,473: INFO/ForkPoolWorker-1] Task tasks.my_task[fc248b87-4b32-446b-96ce-ff9b073eb67d] succeeded in 1.0021337079999997s: None
 [sentry] DEBUG: Sending envelope [envelope with 1 items (error)] project:4506226358157312 host:o1412735.ingest.sentry.io
[2023-11-15 09:14:39,479: DEBUG/ForkPoolWorker-1] Sending envelope [envelope with 1 items (error)] project:4506226358157312 host:o1412735.ingest.sentry.io
 [sentry] DEBUG: Sending envelope [envelope with 1 items (transaction)] project:4506226358157312 host:o1412735.ingest.sentry.io
[2023-11-15 09:14:39,519: DEBUG/ForkPoolWorker-1] Sending envelope [envelope with 1 items (transaction)] project:4506226358157312 host:o1412735.ingest.sentry.io





^C
worker: Hitting Ctrl+C again will terminate all running tasks!

worker: Warm shutdown (MainProcess)
 [sentry] DEBUG: Flushing HTTP transport
[2023-11-15 09:14:47,642: DEBUG/ForkPoolWorker-1] Flushing HTTP transport
 [sentry] DEBUG: background worker got flush request
[2023-11-15 09:14:47,642: DEBUG/ForkPoolWorker-1] background worker got flush request
 [sentry] DEBUG: background worker flushed
[2023-11-15 09:14:47,643: DEBUG/ForkPoolWorker-1] background worker flushed
 [sentry] DEBUG: atexit: got shutdown signal
[2023-11-15 09:14:49,159: DEBUG/MainProcess] atexit: got shutdown signal
 [sentry] DEBUG: atexit: shutting down client
[2023-11-15 09:14:49,159: DEBUG/MainProcess] atexit: shutting down client
 [sentry] DEBUG: Flushing HTTP transport
[2023-11-15 09:14:49,159: DEBUG/MainProcess] Flushing HTTP transport
 [sentry] DEBUG: background worker got flush request
[2023-11-15 09:14:49,160: DEBUG/MainProcess] background worker got flush request
 [sentry] DEBUG: background worker flushed
[2023-11-15 09:14:49,160: DEBUG/MainProcess] background worker flushed
 [sentry] DEBUG: Killing HTTP transport
[2023-11-15 09:14:49,160: DEBUG/MainProcess] Killing HTTP transport
 [sentry] DEBUG: background worker got kill request
[2023-11-15 09:14:49,160: DEBUG/MainProcess] background worker got kill request

@antonpirker (Member)

Thanks for the update. Yes, I will try with enable_tracing=False (which is the same as omitting it).

@szokeasaurusrex (Member)

Hi @RobbieClarken, I attempted to reproduce the issue using the instructions you provided. My reproduction is available here.

It indeed appears that there is a delay in sending the error event; however, in my testing the event still got sent eventually, either after a delay of a few minutes or after interrupting the Celery worker with Ctrl-C.

Does this match the behavior you are observing?

@getsantry bot commented May 7, 2024

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

@getsantry bot added the Stale label May 7, 2024
@RobbieClarken

It indeed appears that there is a delay in sending the error event; however, in my testing the event still got sent eventually, either after a delay of a few minutes or after interrupting the Celery worker with Ctrl-C.

@szokeasaurusrex I'm seeing the same behaviour when I install the latest sentry-sdk (v2.1.1).

The behaviour looks to have changed in sentry-sdk v1.4.0; prior to this version (e.g. v1.3.1), when the Celery worker was scaled down (after ~60 seconds of inactivity) we would see:

[2024-05-08 10:25:54,242: INFO/MainProcess] Scaling down 1 processes.
[2024-05-08 10:25:54,247: DEBUG/ForkPoolWorker-1] Flushing HTTP transport
[2024-05-08 10:25:54,249: DEBUG/ForkPoolWorker-1] background worker got flush request
[2024-05-08 10:25:54,352: DEBUG/ForkPoolWorker-1] 2 event(s) pending on flush
[2024-05-08 10:25:56,254: ERROR/ForkPoolWorker-1] flush timed out, dropped 2 events
[2024-05-08 10:25:56,256: DEBUG/ForkPoolWorker-1] background worker flushed

From sentry-sdk v1.4.0 onwards we see:

[2024-05-08 10:24:30,125: INFO/MainProcess] Scaling down 1 processes.
[2024-05-08 10:24:30,141: DEBUG/ForkPoolWorker-1] Flushing HTTP transport
[2024-05-08 10:24:30,142: DEBUG/ForkPoolWorker-1] background worker got flush request
[2024-05-08 10:24:30,143: DEBUG/ForkPoolWorker-1] Sending event, type:null level:error event_id:389743b4dd5b415b81f6241aa55bf471 project:4506505340846080 host:o1412735.ingest.sentry.io
[2024-05-08 10:24:30,144: DEBUG/ForkPoolWorker-1] [Tracing] Adding `sentry-trace` header 8b973f65f8c446a0a7a5c899926a295a-a1e9de2c17787839- to outgoing request to https://o1412735.ingest.sentry.io/api/4506505340846080/store/.
[2024-05-08 10:24:30,180: DEBUG/ForkPoolWorker-1] background worker flushed

So it seems that the errors will now eventually be reported... it may just be after a ~60-second delay.

@szokeasaurusrex (Member)

So it seems that the errors will now eventually be reported... it may just be after a ~60-second delay.

@RobbieClarken So, if I am understanding you correctly, it sounds like this issue has now been resolved. Is that correct? Or is there still some problem we need to address before we close this issue?

@RobbieClarken

@szokeasaurusrex IMO the fact that the error doesn't get reported until the Celery worker is scaled down indicates there is still a bug in sentry-sdk, since errors should be reported immediately. I'm not sure whether there is any situation in which the worker would never be scaled down (which would mean the error is never reported), or whether the bug can present itself in other ways.

That said, the current behaviour could be considered acceptable. With the default Celery autoscaling configuration, I doubt anyone would notice the delay in error reporting.

@sl0thentr0py (Member) commented May 8, 2024

@RobbieClarken adding some more info here because I just tried this out.

In general, we've been telling Celery users to initialize Sentry in the celeryd_init or worker_init signals instead of at module level as in your sample.

from celery import signals

@signals.celeryd_init.connect
# @signals.worker_init.connect  (alternative signal)
def init_worker(**_kwargs):
    init_sentry()  # wraps sentry_sdk.init(...)

With the signal-based init I can see the error events sent immediately, while with the module-level init there does seem to be a consistent delay.

I don't know the root cause (it would require digging into the Celery worker / threading condition interaction), but maybe this helps other people looking at this issue.

We've tried to clarify this in the docs, but maybe we should make it more prominent:
https://docs.sentry.io/platforms/python/integrations/celery/#standalone-setup
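
For completeness, a minimal sketch of the signal-based setup described above (the broker URL, DSN, and handler name are placeholders, not taken from this thread):

# Sketch: initialize the SDK when the worker starts, instead of at module import time.
import sentry_sdk
from celery import Celery, signals
from sentry_sdk.integrations.celery import CeleryIntegration

app = Celery("tasks", broker="redis://localhost:6379/0")

@signals.celeryd_init.connect        # or: @signals.worker_init.connect
def setup_sentry(**_kwargs):
    sentry_sdk.init(
        dsn="YOUR_DSN_HERE",         # placeholder DSN
        integrations=[CeleryIntegration()],
    )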

sl0thentr0py added a commit to sl0thentr0py/sentry_sample_apps that referenced this issue May 8, 2024