Skip to content

ENG-3298: Add parent_id column to Comment model for threading#7864

Open
JadeCara wants to merge 17 commits intomainfrom
ENG-3298/comment-threading-migration
Open

ENG-3298: Add parent_id column to Comment model for threading#7864
JadeCara wants to merge 17 commits intomainfrom
ENG-3298/comment-threading-migration

Conversation

@JadeCara
Copy link
Copy Markdown
Contributor

@JadeCara JadeCara commented Apr 8, 2026

Ticket ENG-3298

Description Of Changes

Adds self-referential threading support to the Comment model as a prerequisite for the data subject correspondence feature. This is a purely additive, backward-compatible migration — no existing queries or behavior change.

The CommentType.reply enum value already exists but was unused. A follow-up ticket (ENG-3299) will add message_to_subject / reply_from_subject types and a delete guard to preserve correspondence threads.

Code Changes

  • Added nullable parent_id column (VARCHAR(255), FK to comment.id, ON DELETE SET NULL) to the Comment model
  • Added parent (many-to-one) and replies (one-to-many) self-referential relationships with overlaps and passive_deletes configured correctly
  • Added ix_comment_parent_id index
  • Created Alembic migration 1724da7ee981 (autogenerated, cleaned up)
  • Added TODO (ENG-3299) for delete guard when correspondence types are introduced

Steps to Confirm

  1. Run alembic upgrade head — migration applies cleanly
  2. Run alembic downgrade -1 then alembic upgrade head — reversible
  3. Verify parent_id column, ix_comment_parent_id index, and comment_parent_id_fkey FK exist in the comment table
  4. Run pytest tests/ctl/models/test_comment.py — all 14 tests pass

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration This indicates that a change includes a database migration label to the entry if your change includes a DB migration
    • Add a high-risk This issue suggests changes that have a high-probability of breaking existing code label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

JadeCara and others added 2 commits April 8, 2026 15:09
Add self-referential parent_id FK to enable comment threading.
Includes alembic migration, index, and parent/replies relationships.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown
Contributor

vercel bot commented Apr 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
fides-plus-nightly Ready Ready Preview, Comment Apr 9, 2026 10:46pm
1 Skipped Deployment
Project Deployment Actions Updated (UTC)
fides-privacy-center Ignored Ignored Apr 9, 2026 10:46pm

Request Review

JadeCara and others added 2 commits April 8, 2026 15:18
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 86.66667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.08%. Comparing base (e07701b) to head (b8609a8).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/fides/api/models/comment.py 86.66% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7864   +/-   ##
=======================================
  Coverage   85.08%   85.08%           
=======================================
  Files         627      627           
  Lines       40794    40809   +15     
  Branches     4742     4745    +3     
=======================================
+ Hits        34708    34722   +14     
- Misses       5017     5018    +1     
  Partials     1069     1069           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@JadeCara JadeCara marked this pull request as ready for review April 8, 2026 21:48
@JadeCara JadeCara requested a review from a team as a code owner April 8, 2026 21:48
@JadeCara JadeCara requested review from thabofletcher and removed request for a team April 8, 2026 21:48
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR 7864: Add parent_id to Comment model

The migration and model changes are clean and straightforward. One issue needs to be resolved before merging: a mismatch between the FK delete rule and the SQLAlchemy relationship configuration that causes reply comments to be orphaned when a parent is deleted.

Critical

Reply orphaning on parent delete

The FK uses ondelete="SET NULL" but the replies relationship uses passive_deletes=True. These two settings are in conflict:

  • passive_deletes=True tells SQLAlchemy "the database will handle child rows — don't load or touch them before deleting the parent." This is correct for ON DELETE CASCADE (DB deletes child rows), not for ON DELETE SET NULL (DB only nullifies a column on child rows).
  • Comment.delete() only cleans up CommentReference rows and attachments for the parent, then calls db.delete(self). Replies are never touched.
  • Result: when a parent is deleted (e.g., via delete_comments_for_reference_and_type during a privacy-request purge), each reply is left in the DB with parent_id = NULL and no CommentReference row. The reference-based deletion query cannot find them, so they persist indefinitely — leaking user-authored content after the owning request is purged.

Two valid fixes:

  1. Change ondelete="SET NULL"ondelete="CASCADE" in both the migration and the model FK, and keep passive_deletes=True. The DB will delete replies when the parent is deleted.
  2. Remove passive_deletes=True, and in Comment.delete() explicitly iterate self.replies and call reply.delete(db) on each before deleting the parent.

See inline comments for specific line callouts.

Suggestions

  • No tests for the new relationship. tests/ctl/models/test_comment.py has no cases for parent_id. At minimum: create-reply round-trip, parent-delete cascade behaviour, and self-referential parent guard.
  • passive_deletes / FK semantic mismatch is called out inline on both the model and migration files.

JadeCara and others added 2 commits April 8, 2026 16:05
…d tests

- Remove passive_deletes=True from replies relationship (wrong for SET NULL)
- Explicitly delete replies in Comment.delete() before attachment/reference cleanup
- Add test_reply_relationship and test_delete_parent_comment_deletes_replies

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JadeCara
Copy link
Copy Markdown
Contributor Author

JadeCara commented Apr 9, 2026

/code-review

@JadeCara
Copy link
Copy Markdown
Contributor Author

JadeCara commented Apr 9, 2026

/code-review

1 similar comment
@JadeCara
Copy link
Copy Markdown
Contributor Author

JadeCara commented Apr 9, 2026

/code-review

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Add parent_id to Comment model for threading

The migration itself is well-structured (nullable column, named FK constraint and index, correct downgrade order), and the test coverage for the happy path is solid. There are two blocking issues to resolve before merge.


Blocking

1. ON DELETE SET NULL vs. manual cascade loop — pick one.
The FK is declared with ondelete="SET NULL" (preserving orphaned replies), but Comment.delete() then manually deletes all replies anyway. These behaviours are mutually exclusive. The current code produces inconsistent results depending on call path: comment.delete(db) cascades deletions, while db.delete(comment) leaves replies as orphans with parent_id = NULL. See the inline comment on delete() for the two clean options.

2. Missing overlaps on the self-referential relationship.
SQLAlchemy 1.4+ will emit SAWarning (and may error) for bidirectional self-referential relationships sharing the same FK column without overlaps declared. This will surface in CI. Both parent and replies need overlaps= added. See inline comment for the corrected definition.


Suggestions

  • remote_side="Comment.id" (string) — prefer remote_side=[id] (column object) for consistency with the rest of the codebase and eager resolution at mapper init.
  • replies relationship has no lazy= setting — lazy="selectin" would match the user relationship pattern and avoid N+1 queries when listing comments.
  • uselist=True on replies is the default for one-to-many — can be removed.
  • No test exercises the delete_comments_for_reference_and_type path (called by PrivacyRequest.delete()) with a reply present. This is the path most likely to leak orphaned reply rows in production.
  • CommentType.reply's docstring still says "reserved for future use and not currently supported" — worth updating now that it's being wired up.

JadeCara and others added 2 commits April 9, 2026 12:39
Expanded the delete() docstring to explain why the FK uses SET NULL as a
DB-level safety net while the method handles application-level cascade
cleanup of replies, attachments, and references.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JadeCara
Copy link
Copy Markdown
Contributor Author

JadeCara commented Apr 9, 2026

/code-review

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: ENG-3298 — Comment Threading Migration

This PR introduces a parent_id self-referential FK on Comment to support threading. The migration, model changes, and tests are well-structured overall. Two correctness issues need to be addressed before merging.


Critical (Must Fix)

1. Recursive delete() is unbounded and risks session corruption (line 167)
reply.delete(db) calls delete() recursively. The schema permits arbitrary nesting depth — nothing prevents a reply from having its own replies. This can recurse indefinitely. Even at two levels, SQLAlchemy's identity map can yield DetachedInstanceError or double-delete once the inner recursive call has touched the same session objects the outer loop still holds. The existing test only covers one level of depth. See the inline comment for an iterative fix.

2. delete_comments_for_reference_and_type double-deletes replies (line 206)
The join query returns every Comment linked to the given reference_id — including replies that have their own CommentReference row. If a parent and its reply both point at the same reference, the loop calls comment.delete() on the parent (which recursively deletes the reply), then calls comment.delete() again on the already-deleted reply. Fix: filter the query to parent_id IS NULL and let delete() handle descendants. A test for this case is also needed.


Suggestions

3. remote_side uses a string literal (line 114)
remote_side="Comment.id" is inconsistent with the adjacent foreign_keys=[parent_id] which uses the column object. Prefer remote_side=[id].

4. No invariant enforced between comment_type and parent_id (line 108)
A reply can be created without a parent_id, and a note can be created with one. If the reply ↔ parent_id invariant is meaningful, enforce it at the model or DB level now rather than relying on callers.


Minor

  • The migration filename uses an xx_ prefix — confirm this is intentional and consistent with the project's migration naming convention.
  • The TODO on line 163 references future CommentType values; wording it as "additional" types would make it clear that reply already exists and is not the subject of the TODO.

💡 Write /code-review in a comment to re-run this review.

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: ENG-3298 — Add parent_id to Comment model

The overall approach is clean and well-scoped. The migration is correct (additive, nullable, properly reversible), the docstring on delete() is clear about the SET NULL safety-net rationale, and the new tests cover the happy path. A few issues worth addressing before merge:

Blocking / correctness

Missing overlaps and passive_deletes on the self-referential relationships (two separate comments inline)
The PR description explicitly states both are "configured correctly," but neither is present in the code. Without overlaps, SQLAlchemy 1.4+ will emit SAWarning at every startup about the conflicting FK mapping between parent and replies. Without passive_deletes=True on replies, SQLAlchemy may issue a redundant UPDATE comment SET parent_id = NULL before the db.delete(self) call even though replies were already deleted in the loop above.

Non-blocking but worth fixing

Unbounded recursion in delete() — if a reply has replies, the recursive call walks the full depth. A guard or iterative approach would be safer, especially before the nesting depth is constrained by the API layer.

Missing bulk-deletion testdelete_comments_for_reference_and_type is the production code path (privacy request teardown, etc.) and has no test covering the case where a found comment has replies. The logic is correct, but a test would catch regressions when that path is touched again.

No invariant on replyparent_id — minor and clearly deferred to ENG-3299, but a TODO comment in __table_args__ would make that intent explicit for future reviewers.


💡 Write /code-review in a comment to re-run this review.

@JadeCara
Copy link
Copy Markdown
Contributor Author

JadeCara commented Apr 9, 2026

/code-review

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: ENG-3298 — Add parent_id to Comment model

The migration and model changes are well-structured: the column definition, FK (ON DELETE SET NULL), index, and back_populates wiring are all correct. The explicit recursive delete() rationale is clearly documented. A few things worth addressing before merge:

Issues

1. Missing test: reply attachment cleanup (medium)
The explicit justification for recursive deletion over CASCADE is that reply attachments must be cleaned up. There's no test verifying that AttachmentReference rows on a reply are removed when the parent is deleted. This is the highest-value gap since it's the core design contract.

2. Potential double-delete in delete() under lazy reload (medium)
self.replies is lazy-loaded. If a reply has already been marked db.delete(reply) in the current session (e.g., from the test_reply_relationship test calling reply.delete(db) before fixture teardown), a subsequent access of comment.replies could reload the same object from the DB and attempt a second db.delete() on it. SQLAlchemy's behavior in this edge case can vary by version and flush mode.

3. test_reply_relationship cleanup fragility (medium)
The test calls reply.delete(db) without committing, then relies on the fixture teardown to clean up the parent — but the teardown's comment.delete(db) iterates self.replies, which may re-fetch the pending-delete reply. The test should either not call reply.delete(db) (let teardown handle it) or call db.commit() immediately after.

4. No coverage for delete_comments_for_reference_and_type + threading (low)
The bulk delete method now has more complex behavior (recursive reply cleanup) but the threaded scenario isn't exercised in tests.

5. No depth constraint on threading (low / design note)
The model permits replies to have sub-replies (infinite nesting). The recursive delete() handles this correctly. If single-level threading is the product intent, worth documenting or enforcing.

6. replies uses default lazy loading (low)
Unlike user (which uses lazy="selectin"), replies uses the implicit select strategy. Fine for deletes, but an N+1 risk if replies are ever accessed in a bulk listing context.


Overall this is a clean, minimal, backward-compatible migration. The main ask before merging is a test covering reply attachment cleanup on parent deletion, and a fix or at least a review of the test_reply_relationship teardown ordering.


💡 Write /code-review in a comment to re-run this review.

JadeCara and others added 2 commits April 9, 2026 14:24
- Refactor Comment.delete() from recursive to iterative traversal to
  avoid unbounded recursion and session state hazards
- Add passive_deletes=True to replies relationship
- Remove fragile explicit reply.delete(db) in test_reply_relationship
- Add test_delete_parent_deletes_reply_attachments
- Add test_delete_comments_for_reference_deletes_replies

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant