Etag header set by Journalist API is not sha256sum of file #4032

Closed
emkll opened this issue Jan 14, 2019 · 1 comment · Fixed by #4314
emkll commented Jan 14, 2019

Description

The Etag header set on file downloads by the Journalist API (https://github.com/freedomofpress/securedrop/blob/develop/securedrop/journalist_app/utils.py#L337) is always sha256sum:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855. This is the sha256sum of an empty string. The cause is response.get_data() returning an empty object.

Steps to Reproduce

  1. Upload a file or send a message to the source interface
  2. Set up an admin account and use the Journalist API to retrieve files:

a. Ensure you aren't using a staging environment, or remove the Header unset etag directive from /etc/apache2/sites-available/journalist.conf and restart Apache2
b. Run curl -I <download_url_of_file> and retain the value of the header Etag: sha256sum:<SHA256sum goes here>
c. Run curl -O <download_url_of_file>, then run sha256sum on the downloaded file. Observe that the hash differs from the one in the previous step
d. Run echo -ne "" | sha256sum and observe that the hash is identical to the Etag value from step b
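
Step d can also be checked without a shell. The following is a minimal sketch using only the Python standard library; the helper name etag_is_empty_hash is hypothetical and not part of SecureDrop. It compares an Etag value against the SHA-256 of zero bytes:

```python
import hashlib

# SHA-256 of zero bytes, i.e. what hashing an empty response body yields.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()

# The Etag value reported in this issue.
REPORTED_ETAG = (
    "sha256sum:"
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)


def etag_is_empty_hash(etag: str) -> bool:
    """Return True if the Etag carries the SHA-256 of an empty byte string.

    Hypothetical helper for illustration; surrounding whitespace and
    optional double quotes around the header value are ignored.
    """
    value = etag.strip().strip('"')
    return value == "sha256sum:" + EMPTY_SHA256


if __name__ == "__main__":
    print(etag_is_empty_hash(REPORTED_ETAG))
```

If this returns True for the header captured in step b, the server hashed an empty body rather than the file contents.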

Expected Behavior

  1. Etag value should be sha256 sum of file

Actual Behavior

  1. Etag value is sha256sum of an empty string

Comments

It makes sense that the response is empty because the file is sent as an attachment: https://github.com/freedomofpress/securedrop/blob/develop/securedrop/journalist_app/utils.py#L333

Since the hash is computed every time a file is downloaded, it might use a significant amount of server-side resources if large files are downloaded at the same time. We should consider hashing the files at creation time and storing the hash values in the database. This would also allow us to verify file integrity (e.g. when restoring backups).
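
As a rough illustration of hashing at creation time, here is a minimal sketch that reads a file in fixed-size chunks so large submissions are never loaded into memory at once. The function names, the chunk size, and the idea of storing the returned string in a database column are assumptions for illustration, not the actual SecureDrop implementation:

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA-256 of the file at `path`, read in chunks.

    Hypothetical helper; chunk_size is an arbitrary illustrative default.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def etag_for_file(path: str) -> str:
    """Format the hash the way the Etag header in this issue is formatted."""
    return "sha256sum:" + sha256_of_file(path)
```

The value could be computed once when the file is written and then served from storage on each download, and recomputed later to check integrity after a restore.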
