send 400 response to Cinder for some webhook validation errors by eviljeff · Pull Request #23175 · mozilla/addons-server

eviljeff · 2025-03-14T13:54:18Z

Description

Changes what http response code we return for validation errors - i.e. when we can't/won't process the payload.

Context

settings.CINDER_UNIQUE_IDS is because our stage Cinder instance is enabled for addons-dev, addons stage, and also any local testing integration (e.g. pull request testing) so it's quite possible that a given cinder job, decision, or entity id is not valid in a payload simply because it doesn't originate from the same environment that the webhook is registered for. There is no way of filtering webhook responses at the Cinder level either - you either listen for all decisions originating from all environments or none, for example. So we don't want to return a 400 because that would pollute the logs with too many false positives. In Cinder prod the only registered webhook (right now...) is for AMO prod, so those same errors are returned as 400.

Doesn't change any functionality - neither 200 or 400 are retried - but will mean that genuine errors will show up as 400 in the Cinder webhook logs (/settings/webhooks), so we can identify actual errors that may be bugs more easily.

This patch doesn't change how we log the errors, or change any of the error handling (e.g. so we re-raise to Sentry), but that could be a follow-up.

Testing

To test you need to replay (and fake) some webhook requests - it's easier if the custom auth is commented out. There are 10 cases where errors are returned - and 4 of those act differently depending on whether it's a prod or non-prod environment - and its probably not practical to test every case for the PR.

A case where a 200 would always be expected:

(Enable all DSA/Cinder related waffle switches)
report an add-on for abuse, with a policy violation within the add-on
process the job for that abuse report in the reviewer tools (any resolution is fine)
In Cinder, copy the webhook decision payload for the decision you just made (there should be payloads for dev and stage that are the same)
replay that payload locally via curl, etc
a 200 response should be returned with the explanation "Decision already handled via reviewer tools" (as now)

A case where a 400 would always be expected:

Take the payload from the previous steps and change the enforcement actions so the list only contains strings for enforcement actions we don't support on AMO (e.g. ['not-an-amo-action'])
a 400 response should be returned with the explanation "Payload invalid: no supported enforcement_actions" (text as now)

A case where a 200 would be expected in a dev env, and 400 in a prod env:

Take the payload from the first steps (with the enforcement actions changes reverted) and change the job id source->job->id to some other id.
a 200 response should be returned with the explanation "No matching job id found" (as now)
And repeat, but with CINDER_UNIQUE_IDS set to True in local settings
- same explanation, but this time as a 400.

Checklist

Add #ISSUENUM at the top of your PR to an existing open issue in the mozilla/addons repository.
Successfully verified the change locally.
The change is covered by automated tests, or otherwise indicated why doing so is unnecessary/impossible.
Add before and after screenshots (Only for changes that impact the UI).
Add or update relevant docs reflecting the changes made.

src/olympia/lib/settings_base.py

diox · 2025-03-17T16:06:16Z

A case where a 400 would always be expected:

Take the payload from the previous steps and change the enforcement actions so the list only contains strings for enforcement actions we don't support on AMO (e.g. ['not-an-amo-action'])

I did that, but still received a 200 with {"amo":{"received":true,"handled":false,"not_handled_reason":"Decision already handled via reviewer tools"}}.

I guess I should have used a non-reviewer tools job this time around ? The condition for that is above the one validating the enforcement actions. Edit: yes, that was it.

eviljeff · 2025-03-17T16:10:34Z

A case where a 400 would always be expected:

Take the payload from the previous steps and change the enforcement actions so the list only contains strings for enforcement actions we don't support on AMO (e.g. ['not-an-amo-action'])

I did that, but still received a 200 with {"amo":{"received":true,"handled":false,"not_handled_reason":"Decision already handled via reviewer tools"}}.

I guess I should have used a non-reviewer tools job this time around ? The condition for that is above the one validating the enforcement actions.

yes - I forgot that check comes before the enforcement actions validation. You can just change the resolvable_in_reviewer_tools property of the job

diox · 2025-03-17T16:16:39Z

A case where a 200 would be expected in a dev env, and 400 in a prod env:

Take the payload from the first steps (with the enforcement actions changes reverted) and change the job id source->job->id to some other id.
a 200 response should be returned with the explanation "No matching job id found" (as now)

I got a 201 here, {"amo":{"received":true,"handled":true}} - edit AFAICT no action was processed though...

And repeat, but with CINDER_UNIQUE_IDS set to True in local settings same explanation, but this time as a 400.

I got a 500:

Traceback (most recent call last):
(...)
  File "/data/olympia/src/olympia/abuse/views.py", line 305, in cinder_webhook
    process_webhook_payload_decision(payload)
  File "/data/olympia/src/olympia/abuse/views.py", line 255, in process_webhook_payload_decision
    cinder_job.process_decision(
  File "/data/olympia/src/olympia/abuse/models.py", line 293, in process_decision
    decision = ContentDecision.objects.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(...)
  File "/data/olympia/deps/lib/python3.12/site-packages/MySQLdb/connections.py", line 280, in query
    _mysql.connection.query(self, query)
django.db.utils.IntegrityError: (1062, "Duplicate entry '1234b0cf-d6de-4e6e-9c9c-70981014b2bc' for key 'abuse_cinderdecision.cinder_id'"

diox

The logic seems fine, just slightly concerned about the disconnect between your STR and my results

eviljeff · 2025-03-18T11:19:06Z

The logic seems fine, just slightly concerned about the disconnect between your STR and my results

well, I didn't test it in that sequence, I just thought up how to test in the least steps... and missed some, apparently! 😬 The 500 would be because the previous request succeeded (201) when it shouldn't have, so actually created the decision. It is a little concerning, but as this patch doesn't change any of the logic around when exceptions are raised - just if they're returned as 200 or 400 - it shouldn't regress anything.

send 400 response to Cinder for some webhook validation errors

496c581

eviljeff marked this pull request as ready for review March 14, 2025 16:03

eviljeff requested review from a team and diox and removed request for a team March 14, 2025 16:15

diox reviewed Mar 14, 2025

View reviewed changes

src/olympia/lib/settings_base.py Show resolved Hide resolved

diox self-requested a review March 14, 2025 17:03

Add comments in settings about CINDER_UNIQUE_IDS

346ff13

diox approved these changes Mar 17, 2025

View reviewed changes

eviljeff merged commit 89fc3de into mozilla:master Mar 18, 2025
41 checks passed

ioanarusiczki mentioned this pull request Mar 18, 2025

[Task]: change cinder_webhook to report non-success as 4xx mozilla/addons#14825

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

send 400 response to Cinder for some webhook validation errors#23175

send 400 response to Cinder for some webhook validation errors#23175
eviljeff merged 2 commits intomozilla:masterfrom
eviljeff:14825-cinder-webhook-non-success-as-4xx

eviljeff commented Mar 14, 2025 •

edited

Loading

Uh oh!

Uh oh!

diox commented Mar 17, 2025 •

edited

Loading

Uh oh!

eviljeff commented Mar 17, 2025

Uh oh!

diox commented Mar 17, 2025 •

edited

Loading

Uh oh!

diox left a comment

Uh oh!

eviljeff commented Mar 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

eviljeff commented Mar 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Context

Testing

Checklist

Uh oh!

Uh oh!

diox commented Mar 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eviljeff commented Mar 17, 2025

Uh oh!

diox commented Mar 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

diox left a comment

Choose a reason for hiding this comment

Uh oh!

eviljeff commented Mar 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

eviljeff commented Mar 14, 2025 •

edited

Loading

diox commented Mar 17, 2025 •

edited

Loading

diox commented Mar 17, 2025 •

edited

Loading