Add seen tables and update seen status based on downloads in JI #5505

Merged
merged 6 commits into from
Sep 24, 2020

Conversation

sssoleileraaa
Contributor

@sssoleileraaa sssoleileraaa commented Sep 16, 2020

Description of Changes

Fixes #5474
Also fixes a bug found while working on this PR, see "Change 3"

Change 1

Adds association tables so we know which journalists have seen a file, message, or reply:

  • seen_files
  • seen_replies
  • seen_messages

There has been significant consideration around journalist account deletion. While research is ongoing, we are going to continue to support keeping submissions as downloaded/seen even if the journalist who downloaded/saw the submission was deleted. And this will apply to replies as well.

Note about unique constraints in sqlite

In sqlite, null is the absence of a value, so it is never equal to other null values and not considered a duplicate value. This means that it's possible to insert rows that appear to be duplicates if one of the values is null. So if we have:

id  message_id  journalist_id
---------------------------------
1   12                  7
2   12                  8
3   13                  8
4   13                  5

and then journalist_id=8 is deleted, we would then have:

id  message_id  journalist_id
---------------------------------
1   12                  7
2   12                  NULL
3   13                  NULL
4   13                  5

and then journalist_id=5 is deleted, we would then have:

id  message_id  journalist_id
---------------------------------
1   12                  7
2   12                  NULL
3   13                  NULL
4   13                  NULL

Once/if we create a global "deleted" account in the db, we will have to do a data migration to aggregate these duplicate rows and update journalist_id to the journalist_id of the global deleted account.
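The NULL behavior described above can be reproduced with Python's standard `sqlite3` module. This is a minimal sketch, not the migration from this PR: the table mirrors 'seen_messages', and `DELETED_JOURNALIST_ID` is a hypothetical id for a global "deleted" account that does not exist yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE seen_messages (
        id INTEGER PRIMARY KEY,
        message_id INTEGER NOT NULL,
        journalist_id INTEGER,
        UNIQUE (message_id, journalist_id)
    )
    """
)

# Two rows with the same message_id and a NULL journalist_id are both accepted,
# because SQLite treats NULLs as distinct in UNIQUE constraints.
conn.execute("INSERT INTO seen_messages (message_id, journalist_id) VALUES (13, NULL)")
conn.execute("INSERT INTO seen_messages (message_id, journalist_id) VALUES (13, NULL)")
null_rows = conn.execute(
    "SELECT COUNT(*) FROM seen_messages "
    "WHERE message_id = 13 AND journalist_id IS NULL"
).fetchone()[0]

# A duplicate with a non-NULL journalist_id is still rejected.
conn.execute("INSERT INTO seen_messages (message_id, journalist_id) VALUES (12, 7)")
try:
    conn.execute("INSERT INTO seen_messages (message_id, journalist_id) VALUES (12, 7)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Hypothetical aggregation step: keep one NULL row per message, then
# re-attribute it to the (assumed) global "deleted" account.
DELETED_JOURNALIST_ID = 999  # assumption: no such account exists today
conn.execute(
    """
    DELETE FROM seen_messages
    WHERE journalist_id IS NULL
      AND id NOT IN (
          SELECT MIN(id) FROM seen_messages
          WHERE journalist_id IS NULL
          GROUP BY message_id
      )
    """
)
conn.execute(
    "UPDATE seen_messages SET journalist_id = ? WHERE journalist_id IS NULL",
    (DELETED_JOURNALIST_ID,),
)
rows_after = conn.execute(
    "SELECT message_id, journalist_id FROM seen_messages ORDER BY id"
).fetchall()
```

Deleting the extra NULL rows before the update matters: once the rows share a non-NULL 'journalist_id', the unique constraint would reject the duplicates.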

More info about deletion

Currently, SecureDrop provides a global inbox, so we preserve journalist replies even when journalist accounts are deleted. The replies are no longer attributed to the journalist, but other journalists will still be able to read the source conversation (which may have replies from multiple journalists) until the conversation is deleted with the source. Currently, we set 'journalist_id' to null in the replies table if the journalist is deleted, which should violate the foreign key constraint on the 'replies' table for 'journalist_id', but due to this issue the constraint is not currently enforced: #5503. Also, we lose information about how many unique individuals sent the replies that we are persisting without their accounts.

We're also considering full deletion of journalists, meaning deleting the account and the replies made by that journalist. The issue I see with this is that there are other people sharing the same global inbox and viewing the same source conversations those replies are part of. Removing a journalist's replies when their account is deleted means altering the conversations that other people see. Deleting all the historical data connected to a journalist would also make any global features more challenging. Take global read/unread (aka seen/unseen, the current feature we're working on implementing) for instance. If one journalist was the primary person checking SecureDrop for a long time and their account was then deleted, potentially many old, outdated sources would get bumped up to the top of the list as "unread" if that journalist was the only one who had read those messages.

As mentioned already, see #5503 for referential integrity.

Another issue for debate is switching to using uuids as our primary keys. This would remove redundancy and hide the sequential ordering of ids tied to account creation. If using integer ids has yielded significant performance improvements, I'd be interested in seeing the results. Otherwise, I'd be interested in benchmarking so that we know the tradeoffs.

Here are a couple of ideas that were considered when designing the seen tables.

Idea 1:

class SeenFile(db.Model):
    __tablename__ = "seen_files"
    file_id = Column(Integer, ForeignKey("submissions.id"), primary_key=True)
    journalist_id = Column(Integer, ForeignKey("journalists.id"), primary_key=True)
    file = relationship("Submission", backref="seen_files", cascade="delete")
    journalist = relationship("Journalist", backref="seen_files")

The association tables declare the foreign keys as primary keys so that join operations are fast and duplicate records of (journalist_id, message_id), (journalist_id, reply_id), and (journalist_id, file_id) are prevented. This would require us to implement a global "deleted" journalist account, which brings three pieces of work. First, we would have to decide when the "deleted" account should be created (upon first account deletion? by default?). Second, to be complete and organized about it, we would also want to do a data migration for replies with null journalist_ids. Third, we would need to catch exceptions when journalists are deleted and make sure we re-attribute replies and seen records to the global "deleted" user. This third piece would not be necessary if we implement #5467 (which I am in favor of doing in the future).

Idea 2 (the winner):

class SeenFile(db.Model):
    __tablename__ = "seen_files"
    __table_args__ = (
        db.UniqueConstraint('file_id', 'journalist_id'),
    )
    id = Column(Integer, primary_key=True)
    file_id = Column(Integer, ForeignKey("submissions.id"), nullable=False)
    journalist_id = Column(Integer, ForeignKey("journalists.id"), nullable=True)
    file = relationship("Submission", backref="seen_files", cascade="delete")
    journalist = relationship("Journalist", backref="seen_files")

We still have the same foreign keys but there is a new sequential 'id' column that is the primary key and a unique constraint on (file_id, journalist_id) to prevent duplicate records. Also, journalist_id can now be null. This is a workaround solution for allowing null 'journalist_id' (once we turn on foreign key support we will have to update this table along with replies) to support deleted accounts.

Change 2

The journalist interface no longer marks submissions as 'downloaded' and instead makes entries into the seen tables when:

  • files, messages, or replies are downloaded via the "Download Selected" button, by clicking on the filename link, and by clicking on the " unread" link from the "Sources" page
  • files are downloaded from the client

The submissions table now has a seen property that is set if 'downloaded' is set or there is a corresponding 'seen' entry in the database. This is for backwards compatibility.
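The backwards-compatible rule for the seen property can be illustrated with plain dataclasses. This is only a sketch of the logic as described here, not the SQLAlchemy model from this PR; `SeenRecord` and the `seen_records` field are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SeenRecord:
    # None once the journalist account has been deleted.
    journalist_id: Optional[int]


@dataclass
class Submission:
    # Legacy flag: set for historical data where the viewer is unknown.
    downloaded: bool = False
    # Hypothetical stand-in for rows in seen_files / seen_messages.
    seen_records: List[SeenRecord] = field(default_factory=list)

    @property
    def seen(self) -> bool:
        # Seen if the legacy flag is set or any seen record exists,
        # even one whose journalist has since been deleted.
        return self.downloaded or len(self.seen_records) > 0


legacy = Submission(downloaded=True)
new_style = Submission(seen_records=[SeenRecord(journalist_id=7)])
orphaned = Submission(seen_records=[SeenRecord(journalist_id=None)])
unread = Submission()
```

Under these assumptions, `legacy`, `new_style`, and `orphaned` all report as seen, and only `unread` does not.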

Note about design choice

With a preference for avoiding fake/special numbers and accounts, it made the most sense to keep the 'downloaded' column in the 'submissions' table and to leave the data there. 'downloaded' can now be used for historical data where we don’t know which journalist saw a submission. Other options considered were:

  • data migration using a fake 'journalist_id' of a very large number in the 'seen_files' table and removal of 'downloaded'
  • data migration using a special journalist account to represent an "unknown" user, as distinct from "deleted", and removal of 'downloaded'

Change 3

Fixed a bug where we were marking all files and messages as downloaded whenever the client requested the JI endpoint /sources/<source_uuid>/submissions/<submission_uuid>/download. The client uses this endpoint both in the background, just to get the latest messages, and to download files per user request. We no longer mark files and messages as 'downloaded' there.

Once we start using the new seen endpoints in the client we will be able to mark source messages as seen when a user clicks on the source and files as seen when a user clicks to open a file.

Testing

New tables and account deletion [Change 1]

  1. Make sure alembic upgrade and downgrade work as expected (you can just check that tests pass as well)
  2. Check out this branch
  3. make dev
  4. docker exec -it securedrop-dev-0 bash
  5. sqlite3 /var/lib/securedrop/db.sqlite
  6. Check existence of tables
    • verify 'seen_replies', 'seen_files', and 'seen_messages' exist
  7. Check deletion of a journalist
    • Log in from JI as dellsberg and download a message
    • verify record exists in seen_messages table
    • Log in from JI as test-user and download the same message
    • verify new record exists in seen_messages table
    • Log in as journalist and delete dellsberg and test-user
    • verify journalist_id is now null for both records
  8. Check unique constraints for non-nulls
    • insert into seen_files (file_id, journalist_id) values (1, 1); twice
    • verify constraint does not allow this
    • repeat for 'seen_messages' and 'seen_replies'
  9. Check deletion of a file/message/reply
    • Send a file attachment from the source interface
    • Download it from the JI
    • verify seen record exists for this file
    • Now delete the file from the JI
    • verify record is deleted in the seen_files table
    • repeat for replies and messages
  10. Check deletion of a source
    • Delete a source with seen records for replies, files, and messages
    • verify seen records no longer exist
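The expected outcomes of the deletion checks above (steps 7, 9, and 10) can be sketched against an in-memory SQLite database. The helpers `delete_journalist` and `delete_file` are hypothetical and use explicit SQL; the PR itself relies on SQLAlchemy relationships, and the filenames are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE journalists (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE submissions (id INTEGER PRIMARY KEY, filename TEXT);
    CREATE TABLE seen_files (
        id INTEGER PRIMARY KEY,
        file_id INTEGER NOT NULL REFERENCES submissions (id),
        journalist_id INTEGER REFERENCES journalists (id),
        UNIQUE (file_id, journalist_id)
    );
    INSERT INTO journalists (id, username) VALUES (1, 'dellsberg'), (2, 'test-user');
    INSERT INTO submissions (id, filename) VALUES (10, 'file-a'), (11, 'file-b');
    INSERT INTO seen_files (file_id, journalist_id) VALUES (10, 1), (10, 2), (11, 1);
    """
)


def delete_journalist(conn, journalist_id):
    """Hypothetical helper: keep seen records but drop their attribution."""
    conn.execute(
        "UPDATE seen_files SET journalist_id = NULL WHERE journalist_id = ?",
        (journalist_id,),
    )
    conn.execute("DELETE FROM journalists WHERE id = ?", (journalist_id,))


def delete_file(conn, file_id):
    """Hypothetical helper: deleting a file also removes its seen records."""
    conn.execute("DELETE FROM seen_files WHERE file_id = ?", (file_id,))
    conn.execute("DELETE FROM submissions WHERE id = ?", (file_id,))


# Deleting both journalists leaves every seen record in place with a NULL
# journalist_id; the unique constraint tolerates the repeated NULLs.
delete_journalist(conn, 1)
delete_journalist(conn, 2)
after_journalists = conn.execute(
    "SELECT file_id, journalist_id FROM seen_files ORDER BY id"
).fetchall()

# Deleting a file removes its seen records and leaves the others untouched.
delete_file(conn, 10)
after_file = conn.execute(
    "SELECT file_id, journalist_id FROM seen_files ORDER BY id"
).fetchall()
```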

Behavior has not changed in the JI [Change 2]

  1. Check out this branch
  2. make dev
  3. docker exec -it securedrop-dev-0 bash
  4. sqlite3 /var/lib/securedrop/db.sqlite
  5. Check x-unread link
    • Click on the " unread" link for the source at the top of the list
    • verify JI works the same as before
    • verify the correct entries were added to the 'seen_messages' table for submissions
    • verify 'downloaded' was not set in the db
    • Repeat for files
  6. Check "Download Unread" button
    • Send a file and message
    • Select the source with the unread file and message and click "Download Unread"
    • verify JI works the same as before
    • Select the source again with no unread files and messages and click "Download Unread"
    • verify that you get error message "No unread submissions in selected collections."
    • Send a couple files
    • Select all sources in the source list and click "Download Unread"
    • verify JI works the same as before
  7. Check "Download Selected" button
    • Send a file, message, and reply
    • Select all conversation items for this source and click "Download Selected"
    • verify JI works the same as before
    • verify the correct entries were added to the 'seen_*' table
    • verify 'downloaded' was not set in the db
  8. Check filename-link
    • Send a file
    • Click on the filename of that file from the JI
    • verify JI works the same as before
    • verify the correct entry was added to the 'seen_files' table
    • verify 'downloaded' was not set in the db
    • Repeat for message
  9. Check unique constraint exception handling
    • verify that downloading the same file, message, or reply again and again works and that there is only one seen entry for each
  10. Check that sending a reply as a journalist creates seen record
    • Send a reply to a source
    • Verify seen record exists in the seen_replies table
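Step 9 above expects repeated downloads to succeed while leaving a single seen entry. One way to get that behavior is to treat the unique constraint violation as "already seen". The sketch below uses the standard `sqlite3` module and a hypothetical `mark_file_seen` helper; it is not the code from this PR, which goes through SQLAlchemy.

```python
import sqlite3


def mark_file_seen(conn, file_id, journalist_id):
    """Hypothetical helper: record a seen entry; repeat calls are a no-op.

    Returns True if a new row was inserted, False if it already existed.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO seen_files (file_id, journalist_id) VALUES (?, ?)",
                (file_id, journalist_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The (file_id, journalist_id) pair is already recorded.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE seen_files (
        id INTEGER PRIMARY KEY,
        file_id INTEGER NOT NULL,
        journalist_id INTEGER,
        UNIQUE (file_id, journalist_id)
    )
    """
)

first = mark_file_seen(conn, 1, 1)
second = mark_file_seen(conn, 1, 1)
count = conn.execute("SELECT COUNT(*) FROM seen_files").fetchone()[0]
```

Catching the error at the point of insertion keeps the download path itself unaffected when the record already exists.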

New behavior in the JI [Change 3]

  1. Restart make dev
  2. Send 2 file attachments from the source interface as a new source
  3. Visit JI to confirm that there are unread files
  4. Log into the client
    • verify no files or messages are marked as read in the JI due to logging into the client
  5. Download one file from the client
    • verify the file is not marked as read in the JI
    • verify 'downloaded' was not set in the db
  6. Check that you can download a file that was seen via the JI
    • Download the unread file from the JI
    • Download the same file from the client
    • verify download was successful

Migration testing

Note: the migration is in the postinst of the package

(test on staging or dev)

  1. On your server running the latest securedrop, download some messages and files from the JI
  2. Checkout this branch, rebuild, and restart the server (for staging: build the debs and make staging, for dev: run make dev)
  3. Run manage.py init-db to apply the migration (Either ssh to your staging server or docker exec -it securedrop-dev-0 bash if you're testing on a dev server)
  4. Revisit the JI and see that the same files and messages are still downloaded
  5. Download some new messages, files, and replies (these will be marked as seen in the new seen tables)
  6. From the latest release branch, either a) build the debs and make staging, or b) make dev
  7. Checkout the latest release branch, rebuild, and restart the server (for staging: build the debs and make staging, for dev: run make dev)
  8. Revisit the JI and see that only the messages and files you downloaded from step 2 are still downloaded
  9. Download some more stuff to make sure nothing went horribly wrong during the downgrade

Or, instead of all the steps above, you can just do: https://docs.securedrop.org/en/stable/development/upgrade_testing.html?highlight=upgrade%20testing#upgrade-testing-using-molecule

QA Testing

Remember to follow https://docs.securedrop.org/en/stable/development/database_migrations.html?highlight=alembic#release-testing-migrations during QA testing

Upgrading existing production instances

N/A since database migrations are applied postinst

@sssoleileraaa
Contributor Author

while I start working on tests and scripts for development/QA, someone could start looking at my test plan and taking it for a spin (it's long, so I might have missed something, but it should be ready for a first pass)

@eloquence eloquence added this to In Development in SecureDrop Team Board Sep 16, 2020
@sssoleileraaa sssoleileraaa changed the title add seen tables Add seen tables and update seenstatus based on downloads in JI Sep 16, 2020
@eloquence eloquence changed the title Add seen tables and update seenstatus based on downloads in JI Add seen tables and update seen status based on downloads in JI Sep 17, 2020
@lgtm-com

lgtm-com bot commented Sep 18, 2020

This pull request introduces 1 alert when merging a0b9b74 into 79e322f - view on LGTM.com

new alerts:

  • 1 for Unused import

@eloquence eloquence moved this from In Development to Under Review in SecureDrop Team Board Sep 18, 2020
@lgtm-com

lgtm-com bot commented Sep 18, 2020

This pull request introduces 1 alert when merging da669bd into 79e322f - view on LGTM.com

new alerts:

  • 1 for Unused import

@lgtm-com

lgtm-com bot commented Sep 18, 2020

This pull request introduces 2 alerts when merging 6ac3765 into 79e322f - view on LGTM.com

new alerts:

  • 2 for Unused import

@eloquence
Member

  • verify 'seen_replies', 'seen_files', and 'seen_messages' exist
sqlite> .schema seen_files
CREATE TABLE seen_files (
	id INTEGER NOT NULL, 
	file_id INTEGER NOT NULL, 
	journalist_id INTEGER, 
	PRIMARY KEY (id), 
	UNIQUE (file_id, journalist_id), 
	FOREIGN KEY(file_id) REFERENCES submissions (id), 
	FOREIGN KEY(journalist_id) REFERENCES journalists (id)
);
sqlite> .schema seen_replies
CREATE TABLE seen_replies (
	id INTEGER NOT NULL, 
	reply_id INTEGER NOT NULL, 
	journalist_id INTEGER, 
	PRIMARY KEY (id), 
	UNIQUE (reply_id, journalist_id), 
	FOREIGN KEY(reply_id) REFERENCES replies (id), 
	FOREIGN KEY(journalist_id) REFERENCES journalists (id)
);
sqlite> .schema seen_messages
CREATE TABLE seen_messages (
	id INTEGER NOT NULL, 
	message_id INTEGER NOT NULL, 
	journalist_id INTEGER, 
	PRIMARY KEY (id), 
	UNIQUE (message_id, journalist_id), 
	FOREIGN KEY(message_id) REFERENCES submissions (id), 
	FOREIGN KEY(journalist_id) REFERENCES journalists (id)
);
  • Log in from JI as dellsberg and download a message
  • verify record exists in seen_messages table
  • Log in from JI as test-user and download the same message
  • verify new record exists in seen_messages table
  • Log in as journalist and delete dellsberg and test-user
  • verify journalist_id is now null for both records
  1. Check unique constraints for non-nulls
  • insert into seen_files (file_id, journalist_id) values (1, 1); twice
  • verify constraint does not allow this
  • repeat for 'seen_messages' and 'seen_replies'
  • Send a file attachment from the source interface
  • Download it from the JI
  • verify seen record exists for this file
  • Now delete the file from the JI
  • verify record is deleted in the seen_files table

@lgtm-com

lgtm-com bot commented Sep 18, 2020

This pull request introduces 3 alerts when merging 387d4c8 into 79e322f - view on LGTM.com

new alerts:

  • 3 for Unused import

@lgtm-com

lgtm-com bot commented Sep 18, 2020

This pull request introduces 3 alerts when merging 0bcee2e into 79e322f - view on LGTM.com

new alerts:

  • 3 for Unused import

@eloquence
Member

  • Click on the " unread" link for the source at the top of the list
  • verify JI works the same as before
  • verify the correct entries were added to the 'seen_messages' table for submissions
  • verify 'downloaded' was not set in the db
  • Repeat for files
  • Send a file, message, and reply
  • Select all conversation items for this source and click "Download Selected"
  • verify JI works the same as before
  • verify the correct entries were added to the 'seen_*' table
  • verify seen-* table entries added for each file, reply, and message.
  • verify 'downloaded' was not set in the db
  1. Check filename-link
  • Send a file, message, and reply
  • Select all conversation items for this source and click "Download Selected"
  • verify JI works the same as before
  • verify the correct entries were added to the 'seen_*' table
  • verify seen-* table entries added for each file, reply, and message.
  • verify 'downloaded' was not set in the db
  1. Check unique constraint exception handling
  • verify that downloading the same file, message, or reply again and again works and that there is only one seen entry for each
  1. Check that sending a reply as a journalist creates seen record
  • Send a reply to a source
  • Verify seen record exists in the seen_replies table

New behavior in the JI [Change 3]

  1. Restart make dev
  2. Send a file attachment from the source interface as a new source
  3. Visit JI to confirm that there are unread files and messages
  4. Log into the client
    • verify JI no longer marks everything as read by refreshing the JI index page
  5. Download the file from the client
    • verify the file is marked as read in the JI
    • verify entry exists in the seen_files table
    • verify 'downloaded' was not set in the db
    • ❌ Downloading a file from the client that's already seen causes a server-side exception due to unique constraint violation, and an error in the client

@sssoleileraaa
Contributor Author

sssoleileraaa commented Sep 19, 2020

Thanks for the review! I added more tests to the test plan based on your feedback:

Check that you can download a file that was seen via the JI
- [ ] Download the unread file from the JI
- [ ] Download the same file from the client
- [ ] verify download was successful

Check deletion of a source
- Delete a source with seen records for replies, files, and messages
- [ ] verify seen records no longer exist

Check "Download Unread" button
- Send a file and message
- Select the source with the unread file and message and click "Download Unread"
- [ ] verify JI works the same as before
- Select the source again with no unread files and messages and click "Download Unread"
- [ ] verify that you get error message "No unread submissions in selected collections."
- Send a couple files
- Select all sources in the source list and click "Download Unread"
- [ ] verify download contains expected content and JI works the same as before

I'll mark this as "Ready" after making the fix and adding a test section around doing a migration as well. While it's ready I might continue to add more unit tests.

@sssoleileraaa
Contributor Author

one more rebase since the base develop branch changed...

Contributor

@rmol rmol left a comment


First pass looks good. Had a few suggestions, no blockers though.

@sssoleileraaa
Contributor Author

sssoleileraaa commented Sep 22, 2020

one of the challenges here is that unit testing migrations seems to always pass locally so i've been pushing changes to this pr in order to see where the tests fail (not the most efficient method, so i'll look into updating https://docs.securedrop.org/en/stable/development/database_migrations.html?highlight=alembic#unit-testing-migrations once i figure out what steps are required to get migration tests to work locally)

@sssoleileraaa sssoleileraaa force-pushed the 5474-seen-tables branch 2 times, most recently from 1c3e733 to ee10a74 Compare September 22, 2020 23:17
@sssoleileraaa
Contributor Author

First pass looks good. Had a few suggestions, no blockers though.

Your comments have been addressed so this is ready for your 👁️ 👁️ again. Switching over to do an early review of #5513

Contributor

@rmol rmol left a comment


New tables and account deletion [Change 1]

  1. Make sure alembic upgrade and downgrade work as expected (you can just check that tests pass as well)
  2. Check out this branch
  3. make dev
  4. docker exec -it securedrop-dev-0 bash
  5. sqlite3 /var/lib/securedrop/db.sqlite
  6. Check existence of tables
    • verify 'seen_replies', 'seen_files', and 'seen_messages' exist
  7. Check deletion of a journalist
    • Log in from JI as dellsberg and download a message
    • verify record exists in seen_messages table
    • Log in from JI as test-user and download the same message
    • verify new record exists in seen_messages table
    • Log in as journalist and delete dellsberg and test-user
    • verify journalist_id is now null for both records
  8. Check unique constraints for non-nulls
    • insert into seen_files (file_id, journalist_id) values (1, 1); twice
    • verify constraint does not allow this
    • repeat for 'seen_messages' and 'seen_replies'
  9. Check deletion of a file/message/reply
    • Send a file attachment from the source interface
    • Download it from the JI
    • verify seen record exists for this file
    • Now delete the file from the JI
    • verify record is deleted in the seen_files table
    • repeat for replies and messages
  10. Check deletion of a source
    • Delete a source with seen records for replies, files, and messages
    • verify seen records no longer exist

Behavior has not changed in the JI [Change 2]

  1. Check out this branch
  2. make dev
  3. docker exec -it securedrop-dev-0 bash
  4. sqlite3 /var/lib/securedrop/db.sqlite
  5. Check x-unread link
    • Click on the " unread" link for the source at the top of the list
    • verify JI works the same as before
    • verify the correct entries were added to the 'seen_messages' table for submissions
    • verify 'downloaded' was not set in the db
    • Repeat for files
  6. Check "Download Unread" button
    • Send a file and message
    • Select the source with the unread file and message and click "Download Unread"
    • verify JI works the same as before
    • Select the source again with no unread files and messages and click "Download Unread"
    • verify that you get error message "No unread submissions in selected collections."
    • Send a couple files
    • Select all sources in the source list and click "Download Unread"
    • verify JI works the same as before
  7. Check "Download Selected" button
    • Send a file, message, and reply
    • Select all conversation items for this source and click "Download Selected"
    • verify JI works the same as before
    • verify the correct entries were added to the 'seen_*' table
    • verify 'downloaded' was not set in the db
  8. Check filename-link
    • Send a file
    • Click on the filename of that file from the JI
    • verify JI works the same as before
    • verify the correct entry was added to the 'seen_files' table
    • verify 'downloaded' was not set in the db
    • Repeat for message
  9. Check unique constraint exception handling
    • verify that downloading the same file, message, or reply again and again works and that there is only one seen entry for each
  10. Check that sending a reply as a journalist creates seen record
    • Send a reply to a source
    • Verify seen record exists in the seen_replies table

New behavior in the JI [Change 3]

  1. Restart make dev
  2. Send 2 file attachments from the source interface as a new source
  3. Visit JI to confirm that there are unread files
  4. Log into the client
    • verify no files or messages are marked as read in the JI due to logging into the client
  5. Download one file from the client
    • verify the file is not marked as read in the JI
    • verify 'downloaded' was not set in the db
  6. Check that you can download a file that was seen via the JI
    • Download the unread file from the JI
    • Download the same file from the client
    • verify download was successful

I did not complete the migration testing as written; @zenmonkeykstop has volunteered to run through our upgrade testing scenario to validate the migrations instead.

@conorsch conorsch mentioned this pull request Sep 23, 2020
5 tasks
@zenmonkeykstop
Contributor

upgrade testing scenario passed successfully:

  • ran make build-debs && make upgrade-start
  • logged into SI and created a few sources and submissions
  • logged into JI and downloaded a subset of submissions
  • updated molecule/upgrade/side-effect.yml to stop the upgrade resetting the db /shrug
  • ran make upgrade-test-local
  • verified that read/unread states were preserved in JI
  • verified that seen_files and seen_messages tables were created on app server db and are empty
  • verified that submissions can still be downloaded
  • verified that seen_* tables are updated when files downloaded from JI

@zenmonkeykstop zenmonkeykstop merged commit a1c9cbd into develop Sep 24, 2020
SecureDrop Team Board automation moved this from Under Review to Done Sep 24, 2020
@emkll emkll mentioned this pull request Sep 28, 2020
22 tasks
@sssoleileraaa sssoleileraaa deleted the 5474-seen-tables branch October 15, 2020 22:04
Successfully merging this pull request may close these issues.

Migrate messages, files, and replies to seen_by architecture
4 participants