-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Download messages in the background, update DB, update UI #99
Conversation
…into download-messages
…s/securedrop-client into download-messages
…s/securedrop-client into download-messages
…s/securedrop-client into download-messages
…into download-messages
…into download-messages
@@ -147,6 +147,9 @@ def start_app(args, qt_args) -> None: | |||
configure_logging(args.sdc_home) | |||
logging.info('Starting SecureDrop Client {}'.format(__version__)) | |||
|
|||
# init data singleton | |||
Data(args.sdc_home) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly think we should avoid singleton / global patterns as this will complicate testing. I would much rather have us use injection from the top level for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also rather use injection, but it's a little complicated. To enable accessor patterns on sqlalchemy objects, like submission.content
where .content
points to the content of the downloaded submission (which is held in <home>/data/<submission.filename>
, not in the database), the object instantiated by sqlalchemy needs some knowledge about the runtime state of the system: namely, the value of home
. By knowing home
, the object knows where to look for downloaded files, and can find the downloaded content of itself.
The data
singleton is just a further abstraction of that idea, of course- it's essentially a small wrapper around a single piece of global state: the value of home
.
Since we're not instantiating our own objects in these cases (ie, the Submission
object is instantiated by sqlalchemy, not by our own code), we don't really have the option to inject that bit of runtime state at object-creation time from the callsite.
To avoid a global/singleton, we can manually set that runtime state via <instance>.data = <data object>
(or similar) in all cases where we want to set or access .content
in message, reply, or file objects. Since we're only using .content
in one place so far, that would be pretty easy- I'll give it a shot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh we might be able to be clever with sqlalchemy's "ORM Events" https://docs.sqlalchemy.org/en/latest/orm/events.html. It's an exciting morning over here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My gut says that FS thnigs should be handled by an injected Storage
class and you pass it a filename or something and it looks it up. It's not as simple as the accessor pattern, but it would de-spaghetti the code some.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, my gut says that, too! The hard part is that content is accessed (via accessors) in the GUI classes, which shouldn't have to know anything about the storage layer. So you have to get the content "into" the objects before they're accessed in the GUI code... so to a certain extent, we're stuck with the accessor pattern, like it or not. Unless we break our data abstractions and let the GUI call Storage.get(submission.filename))
we're sorta stuck, as far as I can tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GUI classes do have a self.controller
reference, so we could add a method to the Client
class like get_object_content(self, thing-that-could-have-content)
which could be executed by GUI, but that also feels like a data abstraction violation to me. Bah, no great answers :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey folks... this is a really interesting discussion. :-)
I'm with @heartsucker in outlook about singletons.
Also, why the long-running thread rather than using a QTimer based fire-and-forget sync call (i.e. just re-using the code we already have)..? This "feels" (note the quotes) simpler to me, and I'm anxious to retain simplicity and re-use what we already have.
I'd strongly recommend keeping the logic and UI separate. (For instance, the logic layer uses a QTimer to refresh on new message state and call the UI layer if there's a change.)
Also, I'm painfully aware that the above comments are based on my limited understanding of the aim of the ticket... it is very likely that I have the wrong end of the stick. ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. The singleton's gone in favor of the sqlalchemy ORM Event approach! Hooray.
The long-running (which is really an always-running, separate thread) has to do with my preference to separate concerns, and for react/redux/re-frame -style functional reactive programming (particularly, triggering actions based on changes in application state, not as a side-effect of other actions). In our current data model we can't really subscribe to DB changes, so this simulates it by periodically polling the DB and looking for changes that the downloader would be interested in: the existence of not-downloaded submissions.
In this case, if we can guarantee that at any time there will be not more than one of those threads running, then there's not much practical difference between the approach that's implemented here vs periodically running a fire-and-forget thread which would do the exact same thing (examine the database for not-download submissions, then download them), aside from the decoupling of concerns: as-implemented, the "main" thread has nothing to do with coordinating message download.
That patterns can also extend to Reply and File downloads: unlike get_remote_data
, which (absent a two-way channel to the server) must be triggered by a client-internal event (a timer expiring, the user clicking the refresh button), the various download actions can all be independent and reactive to the state of the database changing, and do not have to subscribe to events in the main client thread. In my eyes, that's conceptually simpler than asking the main thread to coordinate download threads (aside from their initial spawning at client init time).
All that said, I agree that our priority should be clarity and simplicity for all involved, so I'd be happy to port this to a QTimer-based approach (or port other actions to this one!)
ok @heartsucker that latest commit removes the data singleton, and instead uses sqlalchemy's event system to hook in to code in no tests yet, i'll wait to see what others think of this pattern. |
That's an interesting pattern. It's a bit too late here for me to dissect it, so I'm going to dismiss my review to unblock this for today in case anyone wants to review. |
tests/test_logic.py
Outdated
@@ -756,6 +772,9 @@ def test_Client_on_download_timeout(safe_tmpdir): | |||
"The connection to the SecureDrop server timed out. Please try again.") | |||
|
|||
|
|||
# This can be unfailed when this SDK change is merged and released: | |||
# https://github.com/freedomofpress/securedrop-sdk/pull/32 | |||
@pytest.mark.xfail(reason="needs SDK change merged") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: this can be removed as that change was merged and released
securedrop_client/message_sync.py
Outdated
shasum, filepath = self.api.download_submission_from_url(m.download_url) | ||
out = tempfile.NamedTemporaryFile(suffix=".message") | ||
err = tempfile.NamedTemporaryFile(suffix=".message-error", delete=False) | ||
cmd = ["qubes-gpg-client", "--decrypt", filepath] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should make a followup targeted alpha-stretch for trying to use python-gnupg
for these decryption actions (see approach in this branch) as that will do some of this logic for us - but let's leave this as is for now
this is working super well for me in Qubes, here are the following issues I think we can handle in followups:
one blocker is to make this platform independent to enable development on non-Qubes, which can be done with the following diff (I know you're working on the tests here so I will wait to push this): diff --git a/securedrop_client/logic.py b/securedrop_client/logic.py
index 136a8c6..a25068c 100644
--- a/securedrop_client/logic.py
+++ b/securedrop_client/logic.py
@@ -227,7 +227,7 @@ class Client(QObject):
"""
if not self.message_thread:
self.message_thread = QThread()
- self.message_sync = MessageSync(self.api, self.home)
+ self.message_sync = MessageSync(self.api, self.home, self.proxy)
self.message_sync.moveToThread(self.message_thread)
self.message_thread.started.connect(self.message_sync.run)
self.message_thread.start()
diff --git a/securedrop_client/message_sync.py b/securedrop_client/message_sync.py
index d0096c2..2d220f2 100644
--- a/securedrop_client/message_sync.py
+++ b/securedrop_client/message_sync.py
@@ -41,7 +41,7 @@ class MessageSync(QObject):
Runs in the background, finding messages to download and downloading them.
"""
- def __init__(self, api, home):
+ def __init__(self, api, home, is_qubes):
super().__init__()
engine = make_engine(home)
@@ -49,6 +49,7 @@ class MessageSync(QObject):
self.session = Session() # Reference to the SqlAlchemy session.
self.api = api
self.home = home
+ self.is_qubes = is_qubes
def run(self, loop=True):
while True:
@@ -62,10 +63,15 @@ class MessageSync(QObject):
uuid=m.uuid
)
sdk_submission.source_uuid = m.source.uuid
+ sdk_submission.filename = m.filename # Needed for non-Qubes platforms
shasum, filepath = self.api.download_submission(sdk_submission)
out = tempfile.NamedTemporaryFile(suffix=".message")
err = tempfile.NamedTemporaryFile(suffix=".message-error", delete=False)
- cmd = ["qubes-gpg-client", "--decrypt", filepath]
+ if self.is_qubes:
+ gpg_binary = "qubes-gpg-client"
+ else:
+ gpg_binary = "gpg"
+ cmd = [gpg_binary, "--decrypt", filepath]
res = subprocess.call(cmd, stdout=out, stderr=err)
os.unlink(filepath) # original file |
Tested this branch in Qubes with master of securedrop-workstation and securedrop-sdk. After the initial sync, new messages are downloaded and decrypted only on clicking Refresh, once for the download and then once again for the refresh (as far as I can tell). It's not clear to me from the PR description whether or not the client GUI should update periodically as well, or if the DB is updated in the background and the Refresh button needs to be clicked? |
@zenmonkeykstop thanks for QAing this - good flag, right now the GUI will not automatically update as messages complete decryption, will file a followup for this |
We need to support developers on both Qubes and non-Qubes platforms. This commit makes minimal changes to enable the message functionality to work in both environments.
Hi @zenmonkeykstop! Maybe I can help clear up your questions. On client startup, we do an initial synchronization call. This fetches source and submission metadata, and generally seems to take about 10 seconds. When that call finishes, you should see source names appear in the left column. If you were to click a source name immediately, you'd see "bubbles" for all parts of the conversation, but not actual message content (instead, you should see placeholder text for both sides of the conversation). Within 5 seconds, the client will start downloading source messages that appear in its database (those new messages whose metadata was returned from the sync call). The UI will not update automatically as source messages are downloaded and processed. To show any newly processed messages, you can click a different source name, then click back to the source you were looking at. Messages that had been newly downloaded and processed since you first clicked that source will now appear. (Clearly this isn't a great UX. It shouldn't be too hard to make messages appear dynamically as they're downloaded). Currently, we run a remote sync every 5 minutes. Remember, the sync call only checks for metadata about submissions on the server, and it updates the client-local DB with new messages it's found. The message download process is in a separate thread which is always running, serially downloading messages it finds in the client database that have not yet been downloaded and processed. The distinction between the long-running download thread and the timer-based sync process is part of what @ntoll and I were discussing above. In any case, the "refresh" button only triggers an immediate (metadata) sync event- it doesn't do anything directly to the message download thread, and doesn't do anything directly to the UI. It's useful if you've added messages to the server via the source interface and want to see those messages appear in the client without waiting for the 5 minute timer to expire, but it doesn't refresh the UI on its own. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @joshuathayer!
I'll file a few followups for the items that were discussed during this review and we can tackle those next, but let's get these changes in so we can build on this 🚀
…0.3.0 Bump version number and add changelog entry for 0.3.0
ci: increase build timeout to 20 minutes
Adds a background thread to download messages from the SecureDrop server, decrypt those messages, store decrypted messages in the data directory, and indicate in the DB that the file was downloaded.
Also adds messages to the UI.
This also adds the beginnings of an abstraction for the data directory: it's represented by a singleton that's instantiated initially with the data directory's path. This class should be extended to include a
set
function, and if we like this abstraction we should use it elsewhere in the client code.The message download thread runs in the background for as long as the client is running. It periodically queries the database for messages which have not yet been downloaded, serially downloads and processes those messages, then updates the database to indicate they've been downloaded.
Requires this securedrop-sdk change, too: freedomofpress/securedrop-sdk#39 (which requires some changes to get tests to pass, apparently. Also I don't seem to have write privs on that repo!)There's still some cleanup to do in this code, but I wanted to get a PR up sooner rather than later. Feel free to pick it apart!
Closes #40