Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of queries for unseen submissions #5872

Merged
merged 1 commit into from
Mar 23, 2021

Conversation

rmol
Copy link
Contributor

@rmol rmol commented Mar 16, 2021

Status

Ready for review

Description of Changes

With large numbers of sources or submissions, the journalist interface index page queries could exceed the 60-second Apache timeout. This uses joins instead of the n+1 checks for submission seen status, both there and when downloading unread submissions.

Testing

In a SecureDrop 1.8.0 environment with Apache (staging, hardware, prod VMs), create a large number of sources and submissions. Running loaddata.py --source-count 1000 is enough to demonstrate the performance change. If you want to run into the Apache timeout, increase the number of sources, or add more messages and files to each source.

Then log in to the journalist interface and load the index page several times, watching the network timing in Tor Browser's developer tools. With 1000 sources, the page should probably take around 20-30 seconds to load.

To confirm the cause:

  • edit config.py to add SQLALCHEMY_RECORD_QUERIES = True to the FlaskConfig.
  • edit journalist_app/main.py to make the end of index look like this:
        response = render_template("index.html", unstarred=unstarred, starred=starred)
        from flask_sqlalchemy import get_debug_queries
        for query in get_debug_queries():
            current_app.logger.critical("QUERY TOOK %f: %s", query.duration, query.statement)
        return response

Restarting Apache and reloading the index page should spam /var/log/apache2/journalist-error.log with a lot of this:

[Tue Mar 16 15:21:39.601133 2021] [wsgi:error] [pid 172290:tid 117089880090368] [remote 127.0.0.1:56342] CRITICAL:flask.app:QUERY TOOK 0.000117: SELECT submissions.id AS submissions_id, submissions.uuid AS submissions_uuid, submissions.source_id AS submi
ssions_source_id, submissions.filename AS submissions_filename, submissions.size AS submissions_size, submissions.downloaded AS submissions_downloaded, submissions.checksum AS submissions_checksum 
[Tue Mar 16 15:21:39.601143 2021] [wsgi:error] [pid 172290:tid 117089880090368] [remote 127.0.0.1:56342] FROM submissions 
[Tue Mar 16 15:21:39.601148 2021] [wsgi:error] [pid 172290:tid 117089880090368] [remote 127.0.0.1:56342] WHERE ? = submissions.source_id ORDER BY submissions.id

Now, check out this branch and either build debs and install securedrop-app-code on the app server, or just copy the two files changed in this PR to /var/www/securedrop and restart Apache. Again, reload the journalist index page several times. It should load in much less time, around 2-4 seconds in my testing. If you change the end of index again, /var/log/apache2/journalist-error.log should only contain a few queries for each load of the page.

The same query strategy change was made to the download of unread items. Ensure that you have a few sources with unread items, and confirm that you can properly download unread items for a single source via the {num} unread button in its row, or multiple selected sources and the Download Unread button at the top of the page. Make sure to also try downloading unread items for a source or sources with no unread items.

Deployment

No special considerations.

Checklist

If you made changes to the server application code:

  • Linting (make lint) and tests (make test) pass in the development container

If you made non-trivial code changes:

  • I have written a test plan and validated it for this PR

Choose one of the following:

  • I have opened a PR in the docs repo for these changes, or will do so later
  • I would appreciate help with the documentation
  • These changes do not require documentation

@lgtm-com
Copy link

lgtm-com bot commented Mar 16, 2021

This pull request introduces 6 alerts when merging d1e5951 into f81b937 - view on LGTM.com

new alerts:

  • 6 for Testing equality to None

With large numbers of sources or submissions, the journalist interface
index page queries could exceed the 60-second Apache timeout. This
uses joins instead of the n+1 checks for submission seen status, both
there and when downloading unread submissions.
@rmol rmol self-assigned this Mar 17, 2021
@kushaldas kushaldas self-assigned this Mar 23, 2021
Copy link
Contributor

@kushaldas kushaldas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Verified the page load speed improvement (amazing)
  • Can download unread submissions from one source
  • Can download unread submissions from multiple sources
  • Can try to download but fails to download from sources with no unread submission (via download unread button).
    The changes looks okay to me. Approved.

@kushaldas kushaldas merged commit 011a50a into develop Mar 23, 2021
@kushaldas kushaldas deleted the speed-up-ji-index branch March 23, 2021 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants