Xcode install on macOS 13 sometimes fails and then continuously fails on following runs #138238

Closed
1 task done
vashworth opened this issue Nov 10, 2023 · 14 comments
Assignees
Labels
P1 High-priority issues at the top of the work list team-infra Owned by Infrastructure team triaged-infra Triaged by Infrastructure team

Comments

@vashworth
Contributor

vashworth commented Nov 10, 2023

Is there an existing issue for this?

Type of Request

bug

Infrastructure Environment

LUCI

What is happening?

Occasionally, Xcode fails to install, and on following runs the install step times out.

Here's an example of where the problem starts:
https://luci-milo.appspot.com/ui/p/flutter/builders/try/Mac%20Engine%20Drone/509375/overview

reason: failed to update package permissions in /Volumes/Work/s/w/ir/cache/osx_sdk/xcode_14e300c/XCode.app for ios

Then in following runs, it times out:
https://luci-milo.appspot.com/ui/p/flutter/builders/try/Mac%20Engine%20Drone/509416/overview
Stuck at

Removing the hidden cipd files if exists to be compliant with MacOS13+ codesign check...

Clearing the cache seems to fix it.

Steps to reproduce

No response

Expected results

No response

@vashworth vashworth added the team-infra Owned by Infrastructure team label Nov 10, 2023
@godofredoc
Contributor

This seems like a mac_toolchain issue trying to change the mode of hidden files and then trying to delete those files on subsequent builds. Is there a new version of mac_toolchain that we can update to?

@vashworth vashworth changed the title Xcode install of macOS 13 sometimes fails and then continuously fails on following runs Xcode install on macOS 13 sometimes fails and then continuously fails on following runs Nov 10, 2023
@vashworth
Contributor Author

vashworth commented Nov 10, 2023

> This seems like a mac_toolchain issue trying to change the mode of hidden files and then trying to delete those files on subsequent builds. Is there a new version of mac_toolchain that we can update to?

So yes, there is a newer version of mac_toolchain. However, the newer version adds ~2 minutes to every build, so we opted to use an older version for now. Also, looking at their change history, I wonder if that's what this change is meant to fix.

Maybe as a stopgap, we can update the recipes so that if the Xcode install step fails, the cache is deleted.
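
A minimal sketch of that stopgap idea in plain Python, not the actual recipe code. The function name `install_with_cache_cleanup` and its parameters are hypothetical; the real recipes use the LUCI recipe engine rather than `subprocess` directly.

```python
import shutil
import subprocess


def install_with_cache_cleanup(install_cmd, cache_dir):
    """Run the install command; on failure, delete the cache directory.

    install_cmd: argv list for the installer (hypothetical placeholder).
    cache_dir:   path of the Xcode cache to wipe if the install fails.
    """
    try:
        subprocess.run(install_cmd, check=True)
    except (subprocess.CalledProcessError, OSError):
        # Remove the possibly half-installed cache so the next build
        # starts from a clean state instead of hanging on leftovers.
        shutil.rmtree(cache_dir, ignore_errors=True)
        raise
```

The error is re-raised so the current build still reports the failure; only the following build benefits from the clean cache.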

@vashworth
Contributor Author

vashworth commented Nov 10, 2023

Another example, this one using the newest version of the toolchain:

When it failed to install: https://luci-milo.appspot.com/ui/p/flutter/builders/try/Mac%20build_tests_2_4/54714/overview
On the next run, it fails the integrity check, attempts to reinstall, and fails: https://luci-milo.appspot.com/ui/p/flutter/builders/try/Mac%20Engine%20Drone/501115/overview
On the run after that, it passes the integrity check and times out: https://luci-milo.appspot.com/ui/p/flutter/builders/try/Mac%20dart_plugin_registry_test/29812/overview

Looking at this example, I don't think a newer mac_toolchain will fix it.

@vashworth vashworth self-assigned this Nov 10, 2023
@vashworth vashworth added the P0 Critical issues such as a build break or regression label Nov 10, 2023
@vashworth
Contributor Author

vashworth commented Nov 10, 2023

We could try a workaround to hopefully prevent it from failing on following runs: https://flutter-review.googlesource.com/c/recipes/+/52421

I think we'll need to file an issue with Chromium about it failing to update package permissions.

@godofredoc
Contributor

It seems like https://chromium.googlesource.com/infra/infra/+/32d81d877ee07af07bf03b7f70ce597e323b80ce fixes the issue. I'd recommend using the latest version; a 2-minute increase in execution time is not desirable, but it could potentially fix this P0.

@vashworth
Contributor Author

> It seems like https://chromium.googlesource.com/infra/infra/+/32d81d877ee07af07bf03b7f70ce597e323b80ce fixes the issue. I'd recommend using the latest version; a 2-minute increase in execution time is not desirable, but it could potentially fix this P0.

See this comment, which shows that the newest mac_toolchain does not fix the issue: #138238 (comment)

@ricardoamador ricardoamador added the triaged-infra Triaged by Infrastructure team label Nov 10, 2023
@keyonghan
Contributor

Filed https://crbug.com/1501452.

auto-submit bot pushed a commit that referenced this issue Nov 10, 2023
There's an issue (#138238) with mac_toolchain that makes Xcode installs flaky and an issue (#138246) that makes Xcode installs more frequent on macOS 13, which is causing presubmit tests to fail frequently. In the meantime, we'll only have tests run on macOS 12.
auto-submit bot pushed a commit to flutter/engine that referenced this issue Nov 11, 2023
There's an [issue](flutter/flutter#138238) with mac_toolchain that makes Xcode installs flaky and an [issue](flutter/flutter#138246) that makes Xcode installs more frequent on macOS 13, which is causing presubmit tests to fail frequently. In the meantime, we'll only have tests run on macOS 12.
auto-submit bot pushed a commit to flutter/packages that referenced this issue Nov 11, 2023
There's an issue (flutter/flutter#138238) with mac_toolchain that makes Xcode installs flaky and an issue (flutter/flutter#138246) that makes Xcode installs more frequent on macOS 13, which is causing presubmit tests to fail frequently. In the meantime, we'll only have tests run on macOS 12.
@ricardoamador
Contributor

Lowering to P1 as there is a workaround and the required fix is being investigated by the Chromium team.

@ricardoamador ricardoamador added P1 High-priority issues at the top of the work list and removed P0 Critical issues such as a build break or regression labels Nov 11, 2023
@vashworth
Contributor Author

FYI I believe this is only an issue on chromium bots - I have not seen it happen on devicelab bots

@godofredoc godofredoc self-assigned this Nov 16, 2023
@vashworth
Contributor Author

There seems to be a pattern for when this issue happens:

  • A build is cancelled (Internal Failure is true on the Swarming Task page), which invalidates the cache
  • The next build starts with an empty cache (flutter_xcode is not listed in the caches list on the Swarming Task page) and redownloads/installs Xcode
  • During the install, a permission error occurs and the build fails
  • During the next build, the install hangs
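
For triage, the sequence above could be checked against a bot's build history with something like the following sketch. The `Build` record and its field names are hypothetical stand-ins for what is read off the Swarming Task page; nothing here queries LUCI or Swarming.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Build:
    internal_failure: bool  # build was cancelled (Internal Failure is true)
    had_xcode_cache: bool   # flutter_xcode was listed in the caches list
    install_outcome: str    # "ok", "permission_error", or "timeout"


def find_pattern(builds: List[Build]) -> List[int]:
    """Return indices where the cancel -> failed reinstall -> hang sequence starts."""
    hits = []
    for i in range(len(builds) - 2):
        cancelled, reinstall, hung = builds[i:i + 3]
        if (cancelled.internal_failure
                and not reinstall.had_xcode_cache
                and reinstall.install_outcome == "permission_error"
                and hung.install_outcome == "timeout"):
            hits.append(i)
    return hits
```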

@keyonghan
Contributor

The workaround to automatically retry the Xcode install is in. Let's monitor whether it mitigates the issue.

Markzipan pushed a commit to Markzipan/flutter that referenced this issue Nov 27, 2023
There's an issue (flutter#138238) with mac_toolchain that makes Xcode installs flaky and an issue (flutter#138246) that makes Xcode installs more frequent on macOS 13, which is causing presubmit tests to fail frequently. In the meantime, we'll only have tests run on macOS 12.
auto-submit bot pushed a commit that referenced this issue Nov 28, 2023
A workaround was added for #138238, so let's re-enable macOS 13 tests
auto-submit bot pushed a commit to flutter/packages that referenced this issue Dec 1, 2023
A workaround was added for flutter/flutter#138238, so let's re-enable macOS 13 tests
auto-submit bot pushed a commit to flutter/engine that referenced this issue Dec 1, 2023
A workaround was added for flutter/flutter#138238, so let's re-enable macOS 13 tests
@vashworth
Contributor Author

The workaround @keyonghan mentioned was https://flutter-review.googlesource.com/c/recipes/+/52421

We found that this wasn't entirely sufficient, though, because if the Xcode version was already corrupted, it would stay corrupted (see #139152). So we added checks to see if it was corrupted first and delete it if so: https://flutter-review.googlesource.com/c/recipes/+/52702
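
Roughly, the combined behavior (check for corruption first, delete if corrupted, then install with a retry) might look like the sketch below. This is not the code from the linked recipe changes: `ensure_xcode`, `is_corrupted`, `install`, and `max_attempts` are hypothetical names, and how corruption is actually detected is left to the caller.

```python
import shutil
from pathlib import Path
from typing import Callable


def ensure_xcode(cache_dir: Path,
                 is_corrupted: Callable[[Path], bool],
                 install: Callable[[Path], None],
                 max_attempts: int = 2) -> int:
    """Install Xcode into cache_dir and return the attempt that succeeded."""
    # Step 1: a cache that is already corrupted would stay corrupted,
    # so remove it before attempting any install.
    if cache_dir.exists() and is_corrupted(cache_dir):
        shutil.rmtree(cache_dir, ignore_errors=True)

    # Step 2: install, wiping the cache and retrying after a failure.
    for attempt in range(1, max_attempts + 1):
        try:
            install(cache_dir)
            return attempt
        except Exception:
            shutil.rmtree(cache_dir, ignore_errors=True)
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")
```

Passing the corruption check and the installer in as callables keeps the sketch independent of any particular toolchain command.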

HugoOlthof pushed a commit to moneybird/packages that referenced this issue Dec 13, 2023
There's an issue (flutter/flutter#138238) with mac_toolchain that makes Xcode installs flaky and an issue (flutter/flutter#138246) that makes Xcode installs more frequent on macOS 13, which is causing presubmit tests to fail frequently. In the meantime, we'll only have tests run on macOS 12.
HugoOlthof pushed a commit to moneybird/packages that referenced this issue Dec 13, 2023
A workaround was added for flutter/flutter#138238, so let's re-enable macOS 13 tests
@vashworth
Contributor Author

I've been observing the macOS 13 bots and haven't seen this issue again. I think it's safe to close.

caseycrogers pushed a commit to caseycrogers/flutter that referenced this issue Dec 29, 2023
A workaround was added for flutter#138238, so let's re-enable macOS 13 tests

github-actions bot commented Jan 3, 2024

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 3, 2024

4 participants