Skip to content

send 400 response to Cinder for some webhook validation errors#23175

Merged
eviljeff merged 2 commits intomozilla:masterfrom
eviljeff:14825-cinder-webhook-non-success-as-4xx
Mar 18, 2025
Merged

send 400 response to Cinder for some webhook validation errors#23175
eviljeff merged 2 commits intomozilla:masterfrom
eviljeff:14825-cinder-webhook-non-success-as-4xx

Conversation

@eviljeff
Copy link
Copy Markdown
Member

@eviljeff eviljeff commented Mar 14, 2025

Fixes: mozilla/addons#14825

Description

Changes what http response code we return for validation errors - i.e. when we can't/won't process the payload.

Context

settings.CINDER_UNIQUE_IDS is because our stage Cinder instance is enabled for addons-dev, addons stage, and also any local testing integration (e.g. pull request testing) so it's quite possible that a given cinder job, decision, or entity id is not valid in a payload simply because it doesn't originate from the same environment that the webhook is registered for. There is no way of filtering webhook responses at the Cinder level either - you either listen for all decisions originating from all environments or none, for example. So we don't want to return a 400 because that would pollute the logs with too many false positives. In Cinder prod the only registered webhook (right now...) is for AMO prod, so those same errors are returned as 400.

Doesn't change any functionality - neither 200 or 400 are retried - but will mean that genuine errors will show up as 400 in the Cinder webhook logs (/settings/webhooks), so we can identify actual errors that may be bugs more easily.

This patch doesn't change how we log the errors, or change any of the error handling (e.g. so we re-raise to Sentry), but that could be a follow-up.

Testing

To test you need to replay (and fake) some webhook requests - it's easier if the custom auth is commented out. There are 10 cases where errors are returned - and 4 of those act differently depending on whether it's a prod or non-prod environment - and its probably not practical to test every case for the PR.

A case where a 200 would always be expected:

  • (Enable all DSA/Cinder related waffle switches)
  • report an add-on for abuse, with a policy violation within the add-on
  • process the job for that abuse report in the reviewer tools (any resolution is fine)
  • In Cinder, copy the webhook decision payload for the decision you just made (there should be payloads for dev and stage that are the same)
  • replay that payload locally via curl, etc
  • a 200 response should be returned with the explanation "Decision already handled via reviewer tools" (as now)

A case where a 400 would always be expected:

  • Take the payload from the previous steps and change the enforcement actions so the list only contains strings for enforcement actions we don't support on AMO (e.g. ['not-an-amo-action'])
  • a 400 response should be returned with the explanation "Payload invalid: no supported enforcement_actions" (text as now)

A case where a 200 would be expected in a dev env, and 400 in a prod env:

  • Take the payload from the first steps (with the enforcement actions changes reverted) and change the job id source->job->id to some other id.
  • a 200 response should be returned with the explanation "No matching job id found" (as now)
  • And repeat, but with CINDER_UNIQUE_IDS set to True in local settings
    • same explanation, but this time as a 400.

Checklist

  • Add #ISSUENUM at the top of your PR to an existing open issue in the mozilla/addons repository.
  • Successfully verified the change locally.
  • The change is covered by automated tests, or otherwise indicated why doing so is unnecessary/impossible.
  • Add before and after screenshots (Only for changes that impact the UI).
  • Add or update relevant docs reflecting the changes made.

@eviljeff eviljeff marked this pull request as ready for review March 14, 2025 16:03
@eviljeff eviljeff requested review from a team and diox and removed request for a team March 14, 2025 16:15
@diox diox self-requested a review March 14, 2025 17:03
@diox
Copy link
Copy Markdown
Member

diox commented Mar 17, 2025

A case where a 400 would always be expected:

Take the payload from the previous steps and change the enforcement actions so the list only contains strings for enforcement actions we don't support on AMO (e.g. ['not-an-amo-action'])

I did that, but still received a 200 with {"amo":{"received":true,"handled":false,"not_handled_reason":"Decision already handled via reviewer tools"}}.

I guess I should have used a non-reviewer tools job this time around ? The condition for that is above the one validating the enforcement actions. Edit: yes, that was it.

@eviljeff
Copy link
Copy Markdown
Member Author

A case where a 400 would always be expected:

Take the payload from the previous steps and change the enforcement actions so the list only contains strings for enforcement actions we don't support on AMO (e.g. ['not-an-amo-action'])

I did that, but still received a 200 with {"amo":{"received":true,"handled":false,"not_handled_reason":"Decision already handled via reviewer tools"}}.

I guess I should have used a non-reviewer tools job this time around ? The condition for that is above the one validating the enforcement actions.

yes - I forgot that check comes before the enforcement actions validation. You can just change the resolvable_in_reviewer_tools property of the job

@diox
Copy link
Copy Markdown
Member

diox commented Mar 17, 2025

A case where a 200 would be expected in a dev env, and 400 in a prod env:

Take the payload from the first steps (with the enforcement actions changes reverted) and change the job id source->job->id to some other id.
a 200 response should be returned with the explanation "No matching job id found" (as now)

I got a 201 here, {"amo":{"received":true,"handled":true}} - edit AFAICT no action was processed though...

And repeat, but with CINDER_UNIQUE_IDS set to True in local settings same explanation, but this time as a 400.

I got a 500:

Traceback (most recent call last):
(...)
  File "/data/olympia/src/olympia/abuse/views.py", line 305, in cinder_webhook
    process_webhook_payload_decision(payload)
  File "/data/olympia/src/olympia/abuse/views.py", line 255, in process_webhook_payload_decision
    cinder_job.process_decision(
  File "/data/olympia/src/olympia/abuse/models.py", line 293, in process_decision
    decision = ContentDecision.objects.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(...)
  File "/data/olympia/deps/lib/python3.12/site-packages/MySQLdb/connections.py", line 280, in query
    _mysql.connection.query(self, query)
django.db.utils.IntegrityError: (1062, "Duplicate entry '1234b0cf-d6de-4e6e-9c9c-70981014b2bc' for key 'abuse_cinderdecision.cinder_id'"

Copy link
Copy Markdown
Member

@diox diox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic seems fine, just slightly concerned about the disconnect between your STR and my results

@eviljeff
Copy link
Copy Markdown
Member Author

The logic seems fine, just slightly concerned about the disconnect between your STR and my results

well, I didn't test it in that sequence, I just thought up how to test in the least steps... and missed some, apparently! 😬 The 500 would be because the previous request succeeded (201) when it shouldn't have, so actually created the decision. It is a little concerning, but as this patch doesn't change any of the logic around when exceptions are raised - just if they're returned as 200 or 400 - it shouldn't regress anything.

@eviljeff eviljeff merged commit 89fc3de into mozilla:master Mar 18, 2025
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Task]: change cinder_webhook to report non-success as 4xx

2 participants