-
Notifications
You must be signed in to change notification settings - Fork 684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add seen tables and update seen status based on downloads in JI #5505
Conversation
while I start working on tests and scripts for development/ qa, someone could start looking at my test plan and taking it for a spin (it's long, so I might have missed something, but it should be ready for a first pass) |
f6970a6
to
fa958e9
Compare
This pull request introduces 1 alert when merging a0b9b74 into 79e322f - view on LGTM.com new alerts:
|
a0b9b74
to
da669bd
Compare
This pull request introduces 1 alert when merging da669bd into 79e322f - view on LGTM.com new alerts:
|
da669bd
to
6ac3765
Compare
This pull request introduces 2 alerts when merging 6ac3765 into 79e322f - view on LGTM.com new alerts:
|
|
This pull request introduces 3 alerts when merging 387d4c8 into 79e322f - view on LGTM.com new alerts:
|
387d4c8
to
0bcee2e
Compare
This pull request introduces 3 alerts when merging 0bcee2e into 79e322f - view on LGTM.com new alerts:
|
New behavior in the JI [Change 3]
|
Thanks for the review! I add more tests to the test plan based on your feedback: Check that you can download a file that was seen via the JI Check deletion of a source Check "Download Unread" button I'll mark this as "Ready" after making the fix and adding a test section around doing a migration as well. While it's ready I might continue to add more unit tests. |
0bcee2e
to
4478aa6
Compare
one more rebase since the base |
6f77b32
to
d6be8a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass looks good. Had a few suggestions, no blockers though.
643ff46
to
0581db9
Compare
67f882d
to
0130428
Compare
one of the challenges here is that unit testing migrations seems to always pass locally so i've been pushing changes to this pr in order to see where the tests fail (not the most efficient method, so i'll look into updating https://docs.securedrop.org/en/stable/development/database_migrations.html?highlight=alembic#unit-testing-migrations once i figure out what steps are required to get migration tests to work locally) |
0130428
to
4bef63f
Compare
4bef63f
to
a79701f
Compare
1c3e733
to
ee10a74
Compare
Your comments have been addressed so this is ready for your 👁️ 👁️ again. Switching over to do an early review of #5513 |
ee10a74
to
3d13faf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New tables and account deletion [Change 1]
- Make sure alembic upgrade and downgrade work as expected (you can just check that tests pass as well
- Check out this branch
make dev
docker exec -it securedrop-dev-0 bash
sqlite3 /var/lib/securedrop/db.sqlite
- Check existence of tables
- verify 'seen_replies', 'seen_files', and 'seen_messages' exist
- Check deletion of a journalist
- Log in from JI as dellsberg and download a message
- verify record exists in seen_messages table
- Log in from JI as test-user and download the same message
- verify new record exists in seen_messages table
- Log in as journalist and delete dellsberg and test-user
- verify journalist_id is now null for both records
- Check unique constraints for non-nulls
insert into seen_files (file_id, journalist_id) values (1, 1);
twice- verify constraint does not allow this
- repeat for 'seen_messages' and 'seen_replies'
- Check deletion of a file/message/reply
- Send a file attachment from the source interface
- Download it from the JI
- verify seen record exists for this file
- Now delete the file from the JI
- verify record is deleted in the seen_files table
- repeat for replies and messages
- Check deletion of a source
- Delete a source with seen records for replies, files, and messages
- verify seen records no longer exist
Behavior has not changed in the JI [Change 2]
- Check out this branch
make dev
docker exec -it securedrop-dev-0 bash
sqlite3 /var/lib/securedrop/db.sqlite
- Check x-unread link
- Click on the " unread" link for the source at the top of the list
- verify JI works the same as before
- verify the correct entries were added to the 'seen_messages' table for submissions
- verify 'downloaded' was not set in the db
- Repeat for files
- Check "Download Unread" button
- Send a file and message
- Select the source with the unread file and message and click "Download Unread"
- verify JI works the same as before
- Select the source again with no unread files and messages and click "Download Unread"
- verify that you get error message "No unread submissions in selected collections."
- Send a couple files
- Select all sources in the source list and click "Download Unread"
- verify JI works the same as before
- Check "Download Selected" button
- Send a file, message, and reply
- Select all conversation items for this source and click "Download Selected"
- verify JI works the same as before
- verify the correct entries were added to the 'seen_*' table
- verify 'downloaded' was not set in the db
- Check filename-link
- Send a file
- Click on the filename of that file from the JI
- verify JI works the same as before
- verify the correct entry was added to the 'seen_files' table
- verify 'downloaded' was not set in the db
- Repeat for message
- Check unique constraint exception handling
- verify that downloading the same file, message, or reply again and again works and that there is only one seen entry for each
- Check that sending a reply as a journalist creates seen record
- Send a reply to a source
- Verify seen record exists in the seen_replies table
New behavior in the JI [Change 3]
- Restart
make dev
- Send 2 file attachments from the source interface as a new source
- Visit JI to confirm that there are unread files
- Log into the client
- verify no files or messages are marked as read in the JI due to logging into the client
- Download one file from the client
- verify the file is not marked as read in the JI
- verify 'downloaded' was not set in the db
- Check that you can download a file that was seen via the JI
- Download the unread file from the JI
- Download the same file from the client
- verify download was successful
I did not complete the migration testing as written; @zenmonkeykstop has volunteered to run through our upgrade testing scenario to validate the migrations instead.
upgrade testing scenario passed successfully:
|
Description of Changes
Fixes #5474
Also fixes a bug found while working on this PR, see "Change 3"
Change 1
Adds association tables so we know which journalists have seen a file, message, or reply:
There has been significant consideration around journalist account deletion. While research is ongoing, we are going to continue to support keeping submissions as downloaded/seen even if the journalist who downloaded/saw the submission was deleted. And this will apply to replies as well.
Note about unique constraints in sqlite
In sqlite, null is the absence of a value, so it is never equal to other null values and not considered a duplicate value. This means that it's possible to insert rows that appear to be duplicates if one of the values is null. So if we have:
and then
journalist_id=8
is deleted, we would then have:and then
journalist_id=5
is deleted, we would then have:Once/if we create a global "deleted" account in the db, we will have to do a data migration to aggregate these duplicate rows and update journalist_id to the journalist_id of the global deleted account.
More info about deletion
Currently, SecureDrop provides a global inbox, so we preserve journalist replies even when journalist accounts are deleted. The replies are no longer attributed to the journalist, but other journalists will still be able to read the source conversation (which may have replies from multiple journalists) until the conversation is deleted with the source. Currently, we set 'journalist_id' to null in the replies table if the journalist is deleted, which should violate the foreign key constraint on the 'replies' table for 'journalist_id', but due to this issue the constraint is not currently enforced: #5503. Also, we lose information about how unique individuals sent those replies that we are persisting without their accounts.
We're also considering full-deletion of journalists, meaning deleting the account and the replies made from that journalist. The issue I see with this is that there are other people sharing the same global inbox and viewing the same source conversations those replies are part of. Removing a journalist's replies when their account is deleted means altering the conversations that other people see. Deleting all the historical data connected to a journalist would also make any global features more challenging. Take global read/unread (aka seen/unseen, the current feature we're working on implementing) for instance. If a journalist was the primary person checking securedrop for a long time and then their account was deleted, potentially many old outdated sources would get bumped up to the top of the list as "unread" if they were the only journalists who read these messages.
As mentioned already, #5503 for referential integrity.
Another issue for debate is switching to using uuids as our primary keys. This would remove redundancy and hide sequential ordering of ids tied to account creation. If using ids has made significant performance improvements, I'd be interested in seeing the results. Otherwise, I'd be interested in benchmarking so that we know the tradeoffs.
So here were a couple ideas that were considered when designing the seen tables.
Idea 1:
The association tables declare the foreign keys as primary keys so that the join operations are fast and prevent duplicate records of (journalist_id, message_id), (journalist_id, reply_id), and (journalist_id, file_id). This would require us to implement a global "deleted" journalist account, but there would be work involved in deciding when the "deleted" account should be created (upon first account deletion? by default?) and to be complete and organized about it, we would also want to do a data migration for replies with null journalist_ids. Also, there is some work around catching exceptions when journalists are deleted and making sure we re-attribute replies and seen records to the global "deleted" user. This third point would not be necessary if we implement #5467 (which I am in favor of doing in the future).
Idea 2 (the winner):
We still have the same foreign keys but there is a new sequential 'id' column that is the primary key and a unique constraint on (file_id, journalist_id) to prevent duplicate records. Also, journalist_id can now be null. This is a workaround solution for allowing null 'journalist_id' (once we turn on foreign key support we will have to update this table along with replies) to support deleted accounts.
Change 2
The journalist interface no longer marks submissions as 'downloaded' and instead makes entries into the seen tables when:
The submissions table now has a
seen
property that is set if 'downloaded' is set or there is a corresponding 'seen' entry in the database. This is for backwards compatibility.Note about design choice
With a preference for avoiding fake/ special numbers and accounts, it made the most sense to keep the 'downloaded' column in the 'submissions' table and to leave the data there. 'downloaded' can now be used for historical data where we don’t know which journalist saw a submission. Other options considered were: * data migration using a fake 'journalist_id' of a very large number in the 'seen_files' table and removal of 'downloaded' * data migration using a special journalist account to represent an "unknown" user, as distinct from "deleted", and removal of 'downloaded'
Change 3
Fixed a bug where we were marking all files and messages as downloaded when the client opens from the JI endpoint
/sources/<source_uuid>/submissions/<submission_uuid>/download
, which is used in the background by the client just to get the latest messages and used to download files per user request. We now no longer mark files and messages as 'downloaded'.Once we start using the new seen endpoints in the client we will be able to mark source messages as seen when a user clicks on the source and files as seen when a user clicks to open a file.
Testing
New tables and account deletion [Change 1]
make dev
docker exec -it securedrop-dev-0 bash
sqlite3 /var/lib/securedrop/db.sqlite
insert into seen_files (file_id, journalist_id) values (1, 1);
twiceBehavior has not changed in the JI [Change 2]
make dev
docker exec -it securedrop-dev-0 bash
sqlite3 /var/lib/securedrop/db.sqlite
New behavior in the JI [Change 3]
make dev
Migration testing
Note: the migration is in the postinst of the package
(test on staging or dev)
manage.py init-db
to apply the migration (Either ssh to your staging server ordocker exec -it securedrop-dev-0 bash
if you're testing on a dev server)Or, instead of all the steps above, you can just do: https://docs.securedrop.org/en/stable/development/upgrade_testing.html?highlight=upgrade%20testing#upgrade-testing-using-molecule
QA Testing
Remember to follow https://docs.securedrop.org/en/stable/development/database_migrations.html?highlight=alembic#release-testing-migrations during QA testing
Upgrading existing production instances
N/A since database migrations are applied postinst