Bug 1626086: Wait for tasks to complete before shutting down #798

Merged
mdboom merged 13 commits into mozilla:master from mdboom:python-shutdown-cleanly
Apr 6, 2020

Conversation


@mdboom mdboom commented Mar 31, 2020

No description provided.

@auto-assign auto-assign bot requested a review from badboy March 31, 2020 19:42
@mdboom mdboom requested a review from wlach March 31, 2020 19:42

@wlach wlach left a comment


Looks pretty good to me. Should we add a note about this behaviour to the SDK documentation?


mdboom commented Mar 31, 2020

> Looks pretty good to me. Should we add a note about this behaviour to the SDK documentation?

Yeah. On some level, this is like a bugfix on an implementation detail that's documented here: https://mozilla.github.io/glean/book/user/adding-glean-to-your-project.html#parallelism

But it's just tricky/weird enough that Python developers probably want to know this is happening. Let me add a little bit in the Python-specific documentation.


Glean installs an [`atexit` handler](https://docs.python.org/3/library/atexit.html) so the Glean thread can attempt to cleanly shut down when your application exits.
This handler will wait up to 1 second for any pending work to complete.
If that times out, some Glean work may be lost.

@Dexterp37 Dexterp37 Apr 1, 2020


@mdboom , on other platforms I/O and networking happen on a thread that's separate from the "dispatcher" queue for API calls.

If we did something similar here, thus being consistent with Kotlin/Swift, we'd mitigate data loss: API calls and ping serialization are very likely to finish in under 1 s, while networking operations are more likely to take longer under poor network conditions.

Maybe this can be a follow-up?

Contributor Author

Yep -- I have this bug for the follow-up: https://bugzilla.mozilla.org/show_bug.cgi?id=1626403

In the case of Python, I think we'll use a separate process (rather than a thread) so that networking can continue even if the main process finishes, but it's the same concept. My only hesitation on the follow-up is whether to do it before or after the Rust networking refactor lands.
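One hedged way to let uploads outlive the main process is to hand them to a separate child process; the sketch below uses `subprocess` purely for illustration (it is not Glean's implementation, and the child's "upload loop" is a stand-in):

```python
import subprocess
import sys

def spawn_uploader(pings_dir: str) -> subprocess.Popen:
    # Launch a separate Python process that can keep uploading pending
    # pings even after the parent finishes its own work. The inline
    # child script is a placeholder for the real upload loop.
    child_code = (
        "import sys\n"
        "pings_dir = sys.argv[1]\n"
        "# ... upload pending pings from pings_dir ...\n"
        "print('uploading from', pings_dir)\n"
    )
    return subprocess.Popen(
        [sys.executable, "-c", child_code, pings_dir],
        stdout=subprocess.PIPE,
        text=True,
    )
```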

Contributor

Will the work actually be lost? Or would Glean be smart enough to resend it the next time the command / application is invoked?

Contributor Author

Oh, yeah, of course. If networking is off, say, the ping file won't be deleted and a reattempt will be made next time. So pretty low risk.
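The retry behaviour described here follows from deleting a ping file only after a successful upload; anything still on disk gets another attempt on the next run. A minimal sketch of that invariant (illustrative names, not Glean's actual code):

```python
from pathlib import Path

def process_pending_pings(pings_dir: Path, upload) -> int:
    # Try to upload each pending ping file; delete it only on success.
    # Files whose upload fails stay on disk and are retried the next
    # time the application runs, so a shutdown timeout loses nothing
    # that was already written to disk.
    uploaded = 0
    for ping_file in sorted(pings_dir.glob("*")):
        try:
            if upload(ping_file.read_text()):
                ping_file.unlink()
                uploaded += 1
        except OSError:
            continue  # leave the file for a later attempt
    return uploaded
```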

Contributor

I'd suggest mentioning that, since the current wording is a little scary.

Contributor

In general, right now, I think there's a minor chance of losing a small fraction of data: since I/O and networking happen on the same dispatcher, if any work is queued after these lengthy operations and the timeout is hit, the work queued after them is lost, I think.

If B is a lengthy task on which we hit the 1s timeout, then the following can happen:

Queued tasks: A - B (timeout) - C (lost) - D (lost) - E (lost)
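This failure mode can be demonstrated with a toy single-queue dispatcher and a shutdown timeout (all names here are illustrative, not Glean's dispatcher API):

```python
import queue
import threading
import time

def run_with_shutdown_timeout(tasks, timeout: float):
    # All work shares one dispatcher queue. If a slow task is still
    # running when the shutdown timeout expires, everything queued
    # behind it is abandoned.
    q: "queue.Queue" = queue.Queue()
    done = []
    for name, duration in tasks:
        q.put((name, duration))

    def worker():
        while True:
            try:
                name, duration = q.get_nowait()
            except queue.Empty:
                break
            time.sleep(duration)
            done.append(name)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)  # wait at most `timeout` seconds, then give up
    return done
```

With a 0.5 s timeout and B taking 2 s, only A completes; C, D, and E never run.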

@mdboom mdboom requested review from Dexterp37 and wlach April 1, 2020 12:49


@mdboom mdboom force-pushed the python-shutdown-cleanly branch from e65860a to bedd50c Compare April 2, 2020 16:28

mdboom commented Apr 2, 2020

I went ahead and made uploading happen in a separate process. I think the pending Rust refactor on that won't impact this too much after all.

That way, I don't think we need the scary warning, since it's become a really unlikely corner case.

@mdboom mdboom requested review from Dexterp37 and wlach April 2, 2020 16:29
@mdboom mdboom force-pushed the python-shutdown-cleanly branch 2 times, most recently from 4db5aa7 to 15d704a Compare April 2, 2020 16:30

@wlach wlach left a comment


As I mentioned here, I'm worried that multiprocessing might have issues with the setup I'm using for mozregression on Windows. See this very long thread on stackoverflow, for example: https://stackoverflow.com/questions/24944558/pyinstaller-built-windows-exe-fails-with-multiprocessing

Can you make it optional?


mdboom commented Apr 2, 2020

> As I mentioned here, I'm worried that multiprocessing might have issues with the setup I'm using for mozregression on Windows. See this very long thread on stackoverflow, for example: https://stackoverflow.com/questions/24944558/pyinstaller-built-windows-exe-fails-with-multiprocessing
>
> Can you make it optional?

Gah. That's horrifying, but not surprising when you think it all through. It's even failing our Windows tests here (which aren't doing any PyInstaller magic).

Assuming I can figure out what's going on with the Windows unit tests, making it optional through an `allow_multiprocessing` config flag or something could work.

If not, it's easy enough for me to take this last commit out and merge this with the multithreading, without the multiprocessing. That's at least an 80% solution.

Once the refactor to move a lot of the upload processing to Rust is complete, it will be pretty easy to write a pure-Rust ping uploader, which might avoid some of these issues (assuming we can run a binary executable inside of a PyInstaller archive).
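The `allow_multiprocessing` flag discussed above could gate the choice of worker roughly like this (a sketch; the flag name comes from the discussion, everything else is made up for illustration):

```python
import multiprocessing
import threading

def start_uploader(upload_fn, allow_multiprocessing: bool = True):
    # When multiprocessing is allowed, run uploads in a child process so
    # they can outlive the parent's shutdown handling. In environments
    # where multiprocessing is unreliable (e.g. some PyInstaller-built
    # Windows executables), fall back to a plain thread.
    if allow_multiprocessing:
        worker = multiprocessing.Process(target=upload_fn)
    else:
        worker = threading.Thread(target=upload_fn)
    worker.start()
    return worker
```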

@mdboom mdboom requested a review from wlach April 2, 2020 19:54

mdboom commented Apr 2, 2020

Fixed the basic Windows failures, and added a configuration option to make the multiprocessing optional.

@mdboom mdboom force-pushed the python-shutdown-cleanly branch from 9c27b3a to adc3d6e Compare April 3, 2020 11:57

@wlach wlach left a comment


I have some concerns about mixing the test code with the implementation, but aside from that this is looking pretty good to me.

"""
Processes a single ping file.
"""
def _do_process_async(cls) -> "multiprocessing.Process":
Contributor

Is this the right name for the function? It seems like it's more about doing things in a separate process than doing things async.

Contributor Author

Maybe _process_pings_multiprocessing?

```python
        return _process(cls.storage_directory(), Glean._configuration)

    @classmethod
    def _test_process_sync(cls) -> bool:
```
Contributor

This feels like something that should live in a test or test fixture, rather than in the implementation. In this case I'd personally just create a small helper function inside the test file itself.

Contributor Author

This needs to be here because it needs to run when the global _testing_mode flag is set to True. _testing_mode is required even for Glean's users writing their own unit tests involving Glean, so it has to be in code shipped in the library.

Contributor

Ah, I see. Seems a little odd to me, but maybe there's no other way.

Co-Authored-By: William Lachance <wrlach@gmail.com>
@mdboom mdboom merged commit fd8f771 into mozilla:master Apr 6, 2020
@mdboom mdboom deleted the python-shutdown-cleanly branch April 14, 2020 19:09