Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Fix(?) for macos test runner hang #1132

Merged
merged 1 commit into from Apr 20, 2023

Conversation

dairiki
Copy link
Contributor

@dairiki dairiki commented Apr 19, 2023

Our CI tests have been hanging recently.

The macos-latest py3.11 job sometimes hangs during running of the tox tests. When this happens, it gets stuck and just spins until the job is cancelled or times out (after three hours.)
This seems to have started somewhere around f8a4b3f.

It seems to be getting stuck in the tests in test_watcher.py (or possibly in test_videotools.py — the last output in the logs when it get stuck is from test_videotools; test_watcher is the next test file).

I've been flailing on this for a few days now, changing parameters and seeing what triggers the hang.

Currently, all of the following seem to be required for the problem to occur:

  • It only happens on the macOS runner
  • It only happens under python 3.11 (but happens under any of 3.11.1, 3.11.2, or 3.11.3).

Furthermore, diagnosing the issue is hindered by the facts:

  • If pytest is run with the -v (verbose) flag, it doesn't hang.
  • If pytest is run with the -s (don't capture stdout/err) flag, it doesn't hang.
  • If pytest-timeout is used to impose a time-limit on each test (using either the thread or signal timeout method), it doesn't hang.

It also seems heuristically true that:

  • If only one testenv is run during the tox run, it doesn't hang. It always seems to hang during the run of the second testenv that includes the ffmpeg tests.
  • If ffmpeg is not installed, it doesn't hang.
  • If the tests in test_videotools.py are skipped, it doesn't hang.
  • If the tests in test_images.py are skipped, it doesn't hang.
  • If the one test tests/test_watcher.py::TestBasicWatcher::test_creates_observer is disabled, it doesn't hang. (The other tests in test_watcher.py can be skipped, as long a test_creates_observer is still enabled, it hangs.)
  • If the tests using watchdog's fsevents-based native observer are skipped (testing only with PollingObserver) it doesn't hang.
  • If the usage of coverage (to collect coverage data) is omitted, it doesn't hang.
  • If coverage is pinned to <7.0.2, it doesn't hang. [Edit: okay, that's not true. It still hangs sometimes with coverage<7.0.2.]

At this point, I have a vague hunch that the issue has something to do with watchdog's fsevents observer. (This is the only bit of watchdog that uses an extension written in C, and it is only used on macOS.)

But really, I have no idea.

Anyhow, as things stand, pinning coverage on macOS seems to "fix" the issue, at least for now.

So that's what this PR does.

This seems to prevent test hangs that have been happening on the
"macos-latest py3.11" runner.
@dairiki dairiki force-pushed the fix.macos-py311-ci-test-hang branch from 6d6be53 to 25164ea Compare April 19, 2023 21:06
@dairiki
Copy link
Contributor Author

dairiki commented Apr 20, 2023

Well, this might not fix whatever the underlying cause of the hangs is, but I'm going to merge it now anyway.
It's innocuous enough and seems to fix the hangs.

@dairiki dairiki merged commit 7960115 into lektor:master Apr 20, 2023
15 checks passed
@dairiki dairiki deleted the fix.macos-py311-ci-test-hang branch April 22, 2023 02:24
@dairiki
Copy link
Contributor Author

dairiki commented Apr 22, 2023

Not so surprisingly, that was not it.

I've seen the hang happen once now with this pin in place.

Furthermore, now, even removing the pin, and running the tests for exactly the same commit as failed (consistently) a few days ago, it does not hang.

dairiki added a commit to dairiki/lektor that referenced this pull request Apr 22, 2023
This seems possibly no longer necessary to prevent the macOS CI run
from hanging. (See lektor#1132.)

This reverts commit 7960115.
dairiki added a commit that referenced this pull request Apr 22, 2023
* feat(imagetools): use Pillow (instead of imagemagick) for thumbnailing

* feat(imagetools): clean up remnants of imagemagick config

* test(imagetools): add color-space handling test page to test demo project

* fix: get_image_info(): return format=None for short file

Previous `format=`"unknown"` was being returned for short files, while
`None` was being returned for all other unrecognized image files.

* refactor(imagetools): omit sRGB color profile from thumbnails

After some experimentation, it turns out that the old imagemagick code
just discarded any embedded color profile from the source image.

After this commit, we attempt to transform the source image from any
embedded color profile to sRGB, but still omit the sRGB color profile
from the resulting thumbnail.

* refactor(imagetools): make Pillow a required dependency

* fix(imagetools): _combine_make - return model (not make)

If the model contains the make, we want the model as the combined
value.

* test: Test with older versions of Pillow - 6.0.x and 7.0.x

* feat!: convert warning to error when thumbnail dim missing

Since Lektor==3.2.dev0 make_image_thumbnail has been issuing a warning
and falling back to FIT mode when called without both a thumbnail
width and height when the mode other than FIT.

Now we raise ValueError in that case.

* Revert "ci: pin coverage<7.0.2 on macOS (#1132)"

This seems possibly no longer necessary to prevent the macOS CI run
from hanging. (See #1132.)

This reverts commit 7960115.
@dairiki
Copy link
Contributor Author

dairiki commented May 6, 2023

At this point, I have a vague hunch that the issue has something to do with watchdog's fsevents observer. (This is the only bit of watchdog that uses an extension written in C, and it is only used on macOS.)

It's possible that disusing watchdog in favor of watchfiles has fixed this issue.
So far, I've seen no hangs since #1136 was merged.

dairiki added a commit to dairiki/lektor that referenced this pull request Sep 11, 2023
This seems to prevent test hangs that have been happening on the
"macos-latest py3.11" runner.
dairiki added a commit to dairiki/lektor that referenced this pull request Sep 11, 2023
)

* feat(imagetools): use Pillow (instead of imagemagick) for thumbnailing

* feat(imagetools): clean up remnants of imagemagick config

* test(imagetools): add color-space handling test page to test demo project

* fix: get_image_info(): return format=None for short file

Previous `format=`"unknown"` was being returned for short files, while
`None` was being returned for all other unrecognized image files.

* refactor(imagetools): omit sRGB color profile from thumbnails

After some experimentation, it turns out that the old imagemagick code
just discarded any embedded color profile from the source image.

After this commit, we attempt to transform the source image from any
embedded color profile to sRGB, but still omit the sRGB color profile
from the resulting thumbnail.

* refactor(imagetools): make Pillow a required dependency

* fix(imagetools): _combine_make - return model (not make)

If the model contains the make, we want the model as the combined
value.

* test: Test with older versions of Pillow - 6.0.x and 7.0.x

* feat!: convert warning to error when thumbnail dim missing

Since Lektor==3.2.dev0 make_image_thumbnail has been issuing a warning
and falling back to FIT mode when called without both a thumbnail
width and height when the mode other than FIT.

Now we raise ValueError in that case.

* Revert "ci: pin coverage<7.0.2 on macOS (lektor#1132)"

This seems possibly no longer necessary to prevent the macOS CI run
from hanging. (See lektor#1132.)

This reverts commit 7960115.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant