fix: redact pending primary email before retirement deletion #38426
ktyagiapphelix2u wants to merge 3 commits into openedx:master
Conversation
| original_new_email = self.email_change.new_email | ||
| original_activation_key = self.email_change.activation_key | ||
| record_was_redacted = PendingEmailChange.redact_pending_email_by_user_value(self.user2, field='user') | ||
| assert not record_was_redacted |
I don't really understand this test. Should it just have lines 617 and 618, where you ask to redact on a user that isn't in the table, and it returns that it didn't redact?
All the other details about the user 1 email change seem irrelevant and confusing. If you think it is important, I'd need better comments.
There was a problem hiding this comment.
The test now:
- Has a clear docstring explaining it verifies redacting a user with no pending email change returns False
- Only tests the relevant behavior - calling redact on user2 (who has no email change record) returns False
- Removes the confusing assertions about user1's email change data remaining unchanged
| Redact pending email change fields for records matching ``field=value``. | ||
| This method is intended for retirement flows where downstream systems | ||
| may keep soft-deleted snapshots of these rows. | ||
| """ |
- Docstrings should have a one-line summary, then an optional blank line and a longer description, like a commit message.
- Also, maybe add something like:
Returns True if redacted, and False if no matching records found.
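For illustration, the docstring layout suggested here could look roughly like the sketch below. The function name, the dict-based records, and the placeholder address are hypothetical stand-ins, not the PR's actual code.

```python
# Hypothetical placeholder value, used only for this sketch.
PLACEHOLDER_EMAIL = "redacted@example.invalid"


def redact_pending_email(records):
    """Redact the pending email on each record and report whether any matched.

    Intended for retirement flows where downstream systems may keep
    soft-deleted snapshots of these rows.

    Returns True if at least one record was redacted, and False if no
    matching records were found.
    """
    redacted = False
    for record in records:
        record["new_email"] = PLACEHOLDER_EMAIL
        redacted = True
    return redacted
```

The first docstring line stands alone as the summary, and the return contract is stated explicitly so callers do not have to read the body.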
| record.new_email = get_retired_email_by_email(record.new_email) | ||
| record.save(update_fields=['new_email']) |
The PR description explicitly identifies activation_key as sensitive data that can still persist indirectly in logs, backups, or downstream systems. However, the implementation only redacts new_email; activation_key is left as-is. If downstream systems snapshot these rows, the activation key still leaks. It should be cleared before deletion, e.g.:
| - record.new_email = get_retired_email_by_email(record.new_email) | |
| - record.save(update_fields=['new_email']) | |
| + record.new_email = get_retired_email_by_email(record.new_email) | |
| + record.activation_key = ''  # or a redacted value | |
| + record.save(update_fields=['new_email', 'activation_key']) | |
There was a problem hiding this comment.
@Akanshu-2u The activation_key field has a unique=True database constraint. If we attempt to redact it to a fixed value (empty string or redacted placeholder), we'll violate this constraint when processing multiple users, causing database integrity errors.
Additionally, the activation key is a random UUID with no PII; it's just a token.
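To make the unique-constraint concern concrete, here is a minimal sketch using the standard-library sqlite3 module rather than the Django model. The table name and columns are assumptions for illustration, and the collision only appears if two rows hold the fixed redacted value at the same time.

```python
import sqlite3
import uuid

# Hypothetical table that mimics a unique activation_key column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pending_email_change ("
    " user_id INTEGER PRIMARY KEY,"
    " new_email TEXT NOT NULL,"
    " activation_key TEXT NOT NULL UNIQUE)"
)
conn.executemany(
    "INSERT INTO pending_email_change VALUES (?, ?, ?)",
    [
        (1, "a@example.com", uuid.uuid4().hex),
        (2, "b@example.com", uuid.uuid4().hex),
    ],
)


def redact_key_fixed(user_id):
    """Overwrite the key with a fixed value (empty string)."""
    conn.execute(
        "UPDATE pending_email_change SET activation_key = '' WHERE user_id = ?",
        (user_id,),
    )


def redact_key_unique(user_id):
    """Overwrite the key with a fresh random value, which stays unique."""
    conn.execute(
        "UPDATE pending_email_change SET activation_key = ? WHERE user_id = ?",
        (uuid.uuid4().hex, user_id),
    )


redact_key_fixed(1)
try:
    redact_key_fixed(2)  # a second row with '' violates the UNIQUE constraint
    collided = False
except sqlite3.IntegrityError:
    collided = True

redact_key_unique(2)  # a per-row random replacement does not collide
```

If key redaction were ever wanted, a per-row random replacement like the second helper would sidestep the constraint; that is an alternative for discussion, not something this PR does.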
There was a problem hiding this comment.
@ktyagiapphelix2u: The PR description mentions the activation key. I agree that this PR should not touch it, but can you update the PR description to remove any mention of it? Thanks.
| assert record_was_redacted | ||
| self.email_change.refresh_from_db() | ||
| assert self.email_change.new_email == expected_retired_email | ||
| assert self.email_change.activation_key == original_activation_key |
If activation_key redaction is added, this assertion must be updated to verify that the key is also cleared or replaced. As-is, this test will need to change once the issue above is fixed.
There was a problem hiding this comment.
This test assertion is correct as-is and does not need to change.
As explained in the previous comment thread, we are not adding activation_key redaction.
| records_matching_user_value = cls.objects.filter(**filter_kwargs) | ||
| if not records_matching_user_value.exists(): | ||
| return False | ||
| for record in records_matching_user_value: |
Both queries fetch the same data. Since the queryset is lazy, .exists() and the for loop each trigger a separate SQL query. Change to:
| - records_matching_user_value = cls.objects.filter(**filter_kwargs) | |
| - if not records_matching_user_value.exists(): | |
| - return False | |
| - for record in records_matching_user_value: | |
| + records = list(cls.objects.filter(**filter_kwargs)) | |
| + if not records: | |
| + return False | |
| + for record in records: | |
This is a single DB hit, which matters more if the field filter ever doesn't use the OneToOneField on user.
Note: The change is optional.
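The difference in round-trips can be sketched with a small stand-in class. FakeQuerySet below is hypothetical and only counts simulated database hits; it does not model Django's result cache, only the fact that .exists() on an unevaluated queryset and a later iteration each issue a query.

```python
class FakeQuerySet:
    """Hypothetical stand-in for a lazy queryset; counts simulated DB hits."""

    def __init__(self, rows):
        self._rows = rows
        self.queries = 0

    def exists(self):
        self.queries += 1  # one simulated query for the existence check
        return bool(self._rows)

    def __iter__(self):
        self.queries += 1  # one simulated query to fetch the rows
        return iter(list(self._rows))


def redact_with_exists(queryset):
    """Original shape: existence check, then a second fetch in the loop."""
    if not queryset.exists():
        return False
    for record in queryset:
        record["new_email"] = "redacted"
    return True


def redact_with_list(queryset):
    """Suggested shape: materialize once, then reuse the list."""
    records = list(queryset)
    if not records:
        return False
    for record in records:
        record["new_email"] = "redacted"
    return True


two_hits = FakeQuerySet([{"new_email": "a@example.com"}])
one_hit = FakeQuerySet([{"new_email": "a@example.com"}])
redact_with_exists(two_hits)
redact_with_list(one_hit)
```

In this sketch the first variant records two simulated hits and the second records one, which is the saving the suggestion is after.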
There was a problem hiding this comment.
Since PendingEmailChange has a OneToOneField on user, this will only ever return 0 or 1 records, so the memory impact is negligible while reducing database round-trips.
Thanks, updated.
| PendingEmailChange.redact_pending_email_by_user_value(user, field="user") | ||
| PendingEmailChange.delete_by_user_value(user, field="user") |
Don't we just want to redact in delete_by_user_value before the delete happens? We'd use the same simple value from other PRs, like redact-before-delete@redacted.com. And this would resolve for any flow that is deleting this record, including retirement flow or completion of pending email changes, etc.
There was a problem hiding this comment.
Additionally, I am alluding to two bugs in my above comment. Are these accurate? The PR description does not yet mention both of these. Also, the ticket has an AC that mentions three bugs. Is there another, or is that a copy/paste issue in the ticket?
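As a rough sketch of the "redact inside delete" idea, the in-memory store below overwrites the pending email before removing the row, so that a downstream snapshot only ever retains the redacted value. The PendingEmailStore class and its snapshot dict are hypothetical stand-ins for the model and for a downstream system that keeps soft-deleted copies; only the placeholder address comes from the comment above.

```python
REDACTED_EMAIL = "redact-before-delete@redacted.com"  # value named in the review


class PendingEmailStore:
    """Hypothetical in-memory stand-in for the pending email change table."""

    def __init__(self):
        self.rows = {}       # user_id -> new_email (live table)
        self.snapshots = {}  # user_id -> last value seen downstream

    def _write(self, user_id, new_email):
        self.rows[user_id] = new_email
        self.snapshots[user_id] = new_email  # downstream copies every write

    def add(self, user_id, new_email):
        self._write(user_id, new_email)

    def delete_by_user(self, user_id):
        """Redact, then delete. Returns False if no row matched."""
        if user_id not in self.rows:
            return False
        self._write(user_id, REDACTED_EMAIL)  # redaction reaches the snapshot
        del self.rows[user_id]                # snapshot is kept (soft delete)
        return True


store = PendingEmailStore()
store.add(1, "new-address@example.com")
deleted = store.delete_by_user(1)
missing = store.delete_by_user(2)
```

Putting the redaction inside the delete helper means every caller gets the same ordering without having to remember a separate redaction call.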
Summary
This change addresses a privacy issue in the retirement flow for users who have a pending primary email change.
Problem
When a user retires with an active row in student_pendingemailchange, the LMS deletes that row. However, sensitive data such as the pending email and activation key could still persist indirectly (e.g., in logs, backups, or downstream systems), creating a privacy risk.
Root Cause
The retirement flow deleted PendingEmailChange records directly without redacting sensitive fields first.
What Changed
- Added a model helper to redact PendingEmailChange fields for a user before deletion
- Updated the retirement flow to call redaction before deleting records
- Added tests to verify redaction behavior and correct ordering
- Updated inline comments and PII annotations to explicitly document the “redact then delete” approach
Behavior Before
- User retires with a pending primary email
- LMS deletes the pending email row
- Sensitive values (e.g., pending email) may still persist indirectly
Behavior After
- User retires with a pending primary email
- LMS first redacts sensitive fields in the pending email record
- LMS deletes the record
- Any persisted traces contain only redacted values
Ticket & Reference
https://2u-internal.atlassian.net/browse/BOMS-498