Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download messages in the background, update DB, update UI #99

Merged
merged 28 commits into from Nov 6, 2018

Conversation

joshuathayer
Copy link
Contributor

@joshuathayer joshuathayer commented Nov 2, 2018

Adds a background thread to download messages from the SecureDrop server, decrypt those messages, store decrypted messages in the data directory, and indicate in the DB that the file was downloaded.

Also adds messages to the UI.

This also adds the beginnings of an abstraction for the data directory: it's represented by a singleton that's instantiated initially with the data directory's path. This class should be extended to include a set function, and if we like this abstraction we should use it elsewhere in the client code.

The message download thread runs in the background for as long as the client is running. It periodically queries the database for messages which have not yet been downloaded, serially downloads and processes those messages, then updates the database to indicate they've been downloaded.

Requires this securedrop-sdk change, too: freedomofpress/securedrop-sdk#39 (which requires some changes to get tests to pass, apparently. Also I don't seem to have write privs on that repo!)

There's still some cleanup to do in this code, but I wanted to get a PR up sooner rather than later. Feel free to pick it apart!

Closes #40

@@ -147,6 +147,9 @@ def start_app(args, qt_args) -> None:
configure_logging(args.sdc_home)
logging.info('Starting SecureDrop Client {}'.format(__version__))

# init data singleton
Data(args.sdc_home)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly think we should avoid singleton / global patterns as this will complicate testing. I would much rather have us use injection from the top level for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also rather use injection, but it's a little complicated. To enable accessor patterns on sqlalchemy objects, like submission.content where .content points to the content of the downloaded submission (which is held in <home>/data/<submission.filename>, not in the database), the object instantiated by sqlalchemy needs some knowledge about the runtime state of the system: namely, the value of home. By knowing home, the object knows where to look for downloaded files, and can find the downloaded content of itself.

The data singleton is just a further abstraction of that idea, of course- it's essentially a small wrapper around a single piece of global state: the value of home.

Since we're not instantiating our own objects in these cases (ie, the Submission object is instantiated by sqlalchemy, not by our own code), we don't really have the option to inject that bit of runtime state at object-creation time from the callsite.

To avoid a global/singleton, we can manually set that runtime state via <instance>.data = <data object> (or similar) in all cases where we want to set or access .content in message, reply, or file objects. Since we're only using .content in one place so far, that would be pretty easy- I'll give it a shot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh we might be able to be clever with sqlalchemy's "ORM Events" https://docs.sqlalchemy.org/en/latest/orm/events.html. It's an exciting morning over here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut says that FS thnigs should be handled by an injected Storage class and you pass it a filename or something and it looks it up. It's not as simple as the accessor pattern, but it would de-spaghetti the code some.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, my gut says that, too! The hard part is that content is accessed (via accessors) in the GUI classes, which shouldn't have to know anything about the storage layer. So you have to get the content "into" the objects before they're accessed in the GUI code... so to a certain extent, we're stuck with the accessor pattern, like it or not. Unless we break our data abstractions and let the GUI call Storage.get(submission.filename)) we're sorta stuck, as far as I can tell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GUI classes do have a self.controller reference, so we could add a method to the Client class like get_object_content(self, thing-that-could-have-content) which could be executed by GUI, but that also feels like a data abstraction violation to me. Bah, no great answers :/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey folks... this is a really interesting discussion. :-)

I'm with @heartsucker in outlook about singletons.

Also, why the long-running thread rather than using a QTimer based fire-and-forget sync call (i.e. just re-using the code we already have)..? This "feels" (note the quotes) simpler to me, and I'm anxious to retain simplicity and re-use what we already have.

I'd strongly recommend keeping the logic and UI separate. (For instance, the logic layer uses a QTimer to refresh on new message state and call the UI layer if there's a change.)

Also, I'm painfully aware that the above comments are based on my limited understanding of the aim of the ticket... it is very likely that I have the wrong end of the stick. ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. The singleton's gone in favor of the sqlalchemy ORM Event approach! Hooray.

The long-running (which is really an always-running, separate thread) has to do with my preference to separate concerns, and for react/redux/re-frame -style functional reactive programming (particularly, triggering actions based on changes in application state, not as a side-effect of other actions). In our current data model we can't really subscribe to DB changes, so this simulates it by periodically polling the DB and looking for changes that the downloader would be interested in: the existence of not-downloaded submissions.

In this case, if we can guarantee that at any time there will be not more than one of those threads running, then there's not much practical difference between the approach that's implemented here vs periodically running a fire-and-forget thread which would do the exact same thing (examine the database for not-download submissions, then download them), aside from the decoupling of concerns: as-implemented, the "main" thread has nothing to do with coordinating message download.

That patterns can also extend to Reply and File downloads: unlike get_remote_data, which (absent a two-way channel to the server) must be triggered by a client-internal event (a timer expiring, the user clicking the refresh button), the various download actions can all be independent and reactive to the state of the database changing, and do not have to subscribe to events in the main client thread. In my eyes, that's conceptually simpler than asking the main thread to coordinate download threads (aside from their initial spawning at client init time).

All that said, I agree that our priority should be clarity and simplicity for all involved, so I'd be happy to port this to a QTimer-based approach (or port other actions to this one!)

@joshuathayer
Copy link
Contributor Author

ok @heartsucker that latest commit removes the data singleton, and instead uses sqlalchemy's event system to hook in to code in logic.py to add self.data (which closes over the runtime value for home) at both manual- and sqlalchemy-triggered object instantiation time of the Submission object. The same pattern can be extended to Reply and any other sqlalchemy-mediated objects which need access to the data directory.

no tests yet, i'll wait to see what others think of this pattern.

@heartsucker
Copy link
Contributor

That's an interesting pattern. It's a bit too late here for me to dissect it, so I'm going to dismiss my review to unblock this for today in case anyone wants to review.

@heartsucker heartsucker dismissed their stale review November 2, 2018 17:39

see previous comment

@eloquence eloquence added this to In Development in SecureDrop Team Board Nov 5, 2018
@eloquence eloquence moved this from In Development to Ready for review in SecureDrop Team Board Nov 5, 2018
@@ -756,6 +772,9 @@ def test_Client_on_download_timeout(safe_tmpdir):
"The connection to the SecureDrop server timed out. Please try again.")


# This can be unfailed when this SDK change is merged and released:
# https://github.com/freedomofpress/securedrop-sdk/pull/32
@pytest.mark.xfail(reason="needs SDK change merged")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this can be removed as that change was merged and released

shasum, filepath = self.api.download_submission_from_url(m.download_url)
out = tempfile.NamedTemporaryFile(suffix=".message")
err = tempfile.NamedTemporaryFile(suffix=".message-error", delete=False)
cmd = ["qubes-gpg-client", "--decrypt", filepath]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should make a followup targeted alpha-stretch for trying to use python-gnupg for these decryption actions (see approach in this branch) as that will do some of this logic for us - but let's leave this as is for now

@redshiftzero
Copy link
Contributor

this is working super well for me in Qubes, here are the following issues I think we can handle in followups:

  • Add Replies to this (alpha)
  • Refresh conversation currently viewed after messages download (alpha)
  • Fancier decryption-in-progress UX for users, i.e. message instead of saying "Decryption in progress" has some kind of UI feedback action so that a user knows Things are Happening (beta)
  • Refactor message thread to use new concurrent API request handling (alpha)
  • Consider adding a Storage class (a bit of experimentation needed here) (beta)
  • Use python-gnupg for gpg operations (alpha stretch)

one blocker is to make this platform independent to enable development on non-Qubes, which can be done with the following diff (I know you're working on the tests here so I will wait to push this):

diff --git a/securedrop_client/logic.py b/securedrop_client/logic.py
index 136a8c6..a25068c 100644
--- a/securedrop_client/logic.py
+++ b/securedrop_client/logic.py
@@ -227,7 +227,7 @@ class Client(QObject):
         """
         if not self.message_thread:
             self.message_thread = QThread()
-            self.message_sync = MessageSync(self.api, self.home)
+            self.message_sync = MessageSync(self.api, self.home, self.proxy)
             self.message_sync.moveToThread(self.message_thread)
             self.message_thread.started.connect(self.message_sync.run)
             self.message_thread.start()
diff --git a/securedrop_client/message_sync.py b/securedrop_client/message_sync.py
index d0096c2..2d220f2 100644
--- a/securedrop_client/message_sync.py
+++ b/securedrop_client/message_sync.py
@@ -41,7 +41,7 @@ class MessageSync(QObject):
     Runs in the background, finding messages to download and downloading them.
     """

-    def __init__(self, api, home):
+    def __init__(self, api, home, is_qubes):
         super().__init__()

         engine = make_engine(home)
@@ -49,6 +49,7 @@ class MessageSync(QObject):
         self.session = Session() # Reference to the SqlAlchemy session.
         self.api = api
         self.home = home
+        self.is_qubes = is_qubes

     def run(self, loop=True):
         while True:
@@ -62,10 +63,15 @@ class MessageSync(QObject):
                     uuid=m.uuid
                 )
                 sdk_submission.source_uuid = m.source.uuid
+                sdk_submission.filename = m.filename  # Needed for non-Qubes platforms
                 shasum, filepath = self.api.download_submission(sdk_submission)
                 out = tempfile.NamedTemporaryFile(suffix=".message")
                 err = tempfile.NamedTemporaryFile(suffix=".message-error", delete=False)
-                cmd = ["qubes-gpg-client", "--decrypt", filepath]
+                if self.is_qubes:
+                    gpg_binary = "qubes-gpg-client"
+                else:
+                    gpg_binary = "gpg"
+                cmd = [gpg_binary, "--decrypt", filepath]
                 res = subprocess.call(cmd, stdout=out, stderr=err)

                 os.unlink(filepath) # original file

@redshiftzero redshiftzero moved this from Ready for review to Under Review in SecureDrop Team Board Nov 5, 2018
@zenmonkeykstop
Copy link
Contributor

Tested this branch in Qubes with master of securedrop-workstation and securedrop-sdk. After the initial sync, new messages are downloaded and decrypted only on clicking Refresh, once for the download and then once again for the refresh (as far as I can tell).

It's not clear to me from the PR description whether or not the client GUI should update periodically as well, or if the DB is updated in the background and the Refresh button needs to be clicked?

@redshiftzero
Copy link
Contributor

@zenmonkeykstop thanks for QAing this - good flag, right now the GUI will not automatically update as messages complete decryption, will file a followup for this

We need to support developers on both Qubes and non-Qubes
platforms. This commit makes minimal changes to enable the
message functionality to work in both environments.
@joshuathayer
Copy link
Contributor Author

Hi @zenmonkeykstop! Maybe I can help clear up your questions.

On client startup, we do an initial synchronization call. This fetches source and submission metadata, and generally seems to take about 10 seconds. When that call finishes, you should see source names appear in the left column. If you were to click a source name immediately, you'd see "bubbles" for all parts of the conversation, but not actual message content (instead, you should see placeholder text for both sides of the conversation).

Within 5 seconds, the client will start downloading source messages that appear in its database (those new messages whose metadata was returned from the sync call). The UI will not update automatically as source messages are downloaded and processed. To show any newly processed messages, you can click a different source name, then click back to the source you were looking at. Messages that had been newly downloaded and processed since you first clicked that source will now appear. (Clearly this isn't a great UX. It shouldn't be too hard to make messages appear dynamically as they're downloaded).

Currently, we run a remote sync every 5 minutes. Remember, the sync call only checks for metadata about submissions on the server, and it updates the client-local DB with new messages it's found. The message download process is in a separate thread which is always running, serially downloading messages it finds in the client database that have not yet been downloaded and processed. The distinction between the long-running download thread and the timer-based sync process is part of what @ntoll and I were discussing above.

In any case, the "refresh" button only triggers an immediate (metadata) sync event- it doesn't do anything directly to the message download thread, and doesn't do anything directly to the UI. It's useful if you've added messages to the server via the source interface and want to see those messages appear in the client without waiting for the 5 minute timer to expire, but it doesn't refresh the UI on its own.

Copy link
Contributor

@redshiftzero redshiftzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @joshuathayer!

I'll file a few followups for the items that were discussed during this review and we can tackle those next, but let's get these changes in so we can build on this 🚀

@redshiftzero redshiftzero merged commit 970168e into master Nov 6, 2018
SecureDrop Team Board automation moved this from Under Review to Done Nov 6, 2018
@redshiftzero redshiftzero deleted the download-messages branch November 6, 2018 06:55
@joshuathayer joshuathayer mentioned this pull request Nov 6, 2018
legoktm pushed a commit that referenced this pull request Dec 11, 2023
…0.3.0

Bump version number and add changelog entry for 0.3.0
legoktm pushed a commit that referenced this pull request Dec 15, 2023
ci: increase build timeout to 20 minutes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants