Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Break submissions table into files and messages #229

Closed
redshiftzero opened this issue Jan 23, 2019 · 6 comments
Closed

Break submissions table into files and messages #229

redshiftzero opened this issue Jan 23, 2019 · 6 comments
Assignees

Comments

@redshiftzero
Copy link
Contributor

redshiftzero commented Jan 23, 2019

Part of #231.

This ticket is for breaking up the submissions table into two tables: files and messages without other schema changes (other non-schema code changes will need to occur elsewhere in the application code). Note for the purpose of rapid development, let's not write alembic migrations since we'll do #48 before the next release.

Motivation: Ticket #217 describes adding a content column to the submission table. However, this column would only be populated for the messages submission type (not files). @joshuathayer and I were chatting yesterday, and were thinking it would make sense for us to first break up the submissions table into files and messages. This is because then in #217 we'd only add the content column to the messages table, and it would resolve the common confusion for people new to the client code between the distinction between submissions vs messages vs files.

@eloquence eloquence added this to Current Sprint - 1/23-2/6 in SecureDrop Team Board Jan 23, 2019
@eloquence eloquence moved this from Current Sprint - 2/6-2/20 to Near Term Backlog in SecureDrop Team Board Feb 6, 2019
@eloquence eloquence moved this from Near Term Backlog to Current Sprint - 2/20-3/6 in SecureDrop Team Board Feb 20, 2019
@heartsucker heartsucker moved this from Current Sprint - 2/20-3/6 to In Development in SecureDrop Team Board Feb 28, 2019
@heartsucker
Copy link
Contributor

I started implementing this and realized we are opening ourselves up to both security vulnerabilities and usability shortcomings. There is nothing in the backend that prevents large messages from being sent. There is a 500mb cap on the size of the request, but we typically think of this as files. Someone could repeatedly send 500mb messages. These would be loaded into the database which could degrade performance (especially with migrations). Additionally, this could break the UI by causing excessive memory usage or extreme sluggishness as massive messages are read from disk and loaded into widgets.

My proposal would be to keep them on disk and read the fist X unicode characters and store them in a "preview" column. If there are additional characters, there would we a "see full message" button that would open the message in a disposable VM like all other documents.

@ninavizz
Copy link
Member

ninavizz commented Mar 5, 2019

Hmm. In user testing journalists consistently asserted that "legit" Sources send long messages. Of course, there's always exceptions to the rule and we don't want to tweak the product in such a way that prevents a journalist from seeing a very important leak.

WRT "See full message" button... I feel that'd be a lot of work for a very marginal likelihood use-case. I also don't see it being evident to a user why they couldn't just view the full message in the message pane, even if it is multiple window-length scrolls long.

My primary concern for right now, though, would be to set-up thing on the backend to enable multiple file uploads for Sources. I'm not sure if the submissions table config would have anything to do with that, but only allowing Sources 1 file at a time to upload feels tedious and like a poor experience.

I also question if it encourages Source users to .zip multiple files into a single upload, which would keep Journalists from being able to preview/triage individual files as we're now enabling them to do in the Workstation—while also putting the whole thing at risk to not upload via being too big.

@heartsucker
Copy link
Contributor

If it is possible to crash the app by sending say, 20 500mb messages (10GB of text), then some trolls might do it for teh lulz. Or a government that is worried about an impeding leak might DOS all SecureDrops as a disruptive measure. This isn't something a source would do under normal conditions. This is specifically a guard against malicious content.

@redshiftzero
Copy link
Contributor Author

Good thoughts @heartsucker, I think we should (unordered list follows because both can be done independently):

  • Just enforce the reasonable limit server side since it's relatively easy to do and also improves UX: Enforce reasonable character limit to source message size securedrop#4230

  • I was considering the preview and "click here to open full file" button, but to @ninavizz's point this is work for what will become much harder once we do the above (i.e. the risk will be mitigated except for any historical files that were not cleaned up). For files that have already been uploaded, sqlite should be able to handle it (since it can definitely handle GBs of data), so my main concern is the UI sluggishness. This is something that we can research when we're working with news orgs during the pilot beta and we can implement this preview/click for full file behavior then if it's needed.

What do you think?

@ninavizz
Copy link
Member

ninavizz commented Mar 6, 2019

I love the character-limit suggestion, and feel it's a great way to mitigate potential necessity of CAPTCHAs. My concern with it is around user messaging in execution, however, and already there are a lot of things that are either potentially confusing or not clearly explained in the Source experience.

If it were to not be exposed as an up-front caveat and rather only presented as an error returned on a submission in excess of 10,000 characters, that to me seems like a good interim solution.

Tangential consideration is that some languages have much longer common words than English. Tagalog and Nordic languages come to mind. 10,000 characters is a lot, tho. Before studying this issue, I'd rather study how the Source interface currently looks in other alphabets—and how Journalists might receive those communications in the Workstation (like: if a Source submits in Hindi but the Workstation language is set to English, will the message show-up to journalists in the Workstation in Hindi characters, or as English alphabet garble matching ISO keys?).

@heartsucker
Copy link
Contributor

@redshiftzero Should we still consider only storing a partial amount of the files in the sqlite db? My reasoning being that if a SD was compromised, someone could put arbitrarily large messages into it so that all clients download them and break. Once the clients are clogged up with large messages, there might not be an easy way for a user to go in and delete them. This might require an admin to execute sqlite commands or blow away the whole database.

@heartsucker heartsucker moved this from In Development to Ready for review in SecureDrop Team Board Mar 6, 2019
@eloquence eloquence removed this from Ready for review in SecureDrop Team Board Mar 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants