Xcode cache mysteriously gets cleared on macOS #138246

Open
1 task done
Tracked by #133207
vashworth opened this issue Nov 10, 2023 · 16 comments
Assignees
Labels
P2 Important issues not at the top of the work list team-infra Owned by Infrastructure team triaged-infra Triaged by Infrastructure team

Comments

@vashworth
Contributor

vashworth commented Nov 10, 2023

Is there an existing issue for this?

Type of Request

bug

Infrastructure Environment

LUCI

What is happening?

In one build, Xcode is downloaded: https://chromium-swarm.appspot.com/task?id=65cf2793c5a1a110
The next build was canceled: https://chromium-swarm.appspot.com/task?id=65cf4e5b7efc7b10
In the build after that, the cache is suddenly empty (see step "Show xcode cache"), so Xcode has to be downloaded again, and the download fails: https://chromium-swarm.appspot.com/task?id=65cf57470c8f4c10

It seems that whenever a build is canceled midway, the Xcode cache is cleared.

Here's another example:
Build canceled: https://luci-milo.appspot.com/ui/p/flutter/builders/try/Mac%20tool_integration_tests_2_4/29778/overview
In the next build, the cache is empty and Xcode fails to re-download: https://luci-milo.appspot.com/ui/p/flutter/builders/try/Mac_arm64%20macos_platform_tests%20master%20-%20packages/8150/overview

Steps to reproduce

No response

Expected results

I expect the Xcode cache not to be deleted and Xcode not to fail to install.

@vashworth vashworth added team-infra Owned by Infrastructure team P0 Critical issues such as a build break or regression labels Nov 10, 2023
@vashworth
Contributor Author

I think this is a pretty big issue, and it could use infra help.

@vashworth
Contributor Author

Related issue: #138238

@vashworth vashworth changed the title from "Xcode cache mysteriously gets cleared" to "Xcode cache mysteriously gets cleared on macOS 13" Nov 10, 2023
@keyonghan
Contributor

Considering these three tasks:

  1. first successful build with xcode installed: https://chromium-swarm.appspot.com/task?id=65cf2793c5a1a110
  2. killed/canceled build right after 1): https://chromium-swarm.appspot.com/task?id=65cf4e5b7efc7b10
  3. failed build after 2): https://chromium-swarm.appspot.com/task?id=65cf57470c8f4c10

Something went wrong with 2): after the task was canceled, the Xcode cache dir (osx_sdk) doesn't seem to have been handled properly. This is based on the bot_cache list in task 3), where osx_sdk does not show up:
[Screenshot 2023-11-10 at 12 22 06 PM]

I wonder whether our recipes module has some special logic here. If not, this is something to check with the LUCI team.
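
The check described above (looking for osx_sdk in the bot's cache list) can be sketched as a small helper. This is only an illustration: `missing_caches` and the directory layout are hypothetical and are not taken from the recipes or from LUCI.

```python
import os


def missing_caches(bot_cache_root, expected):
    """Returns the expected cache names that are absent or empty under the root.

    bot_cache_root and the cache names are assumptions for illustration;
    the real layout on a bot may differ.
    """
    missing = []
    for name in expected:
        path = os.path.join(bot_cache_root, name)
        # Treat an empty directory the same as a missing one.
        if not os.path.isdir(path) or not os.listdir(path):
            missing.append(name)
    return missing
```

Running this against the cache root at the start of a build would make it visible in the logs whether a cache such as osx_sdk disappeared between tasks.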

@ricardoamador
Contributor

ricardoamador commented Nov 10, 2023

Don't we have logic to protect against a corrupted cache? It looks like it was killed during the download phase. Wait, is this the new method of saving packages for macOS 13 that we spoke about yesterday, @vashworth?

@vashworth
Contributor Author

> Don't we have logic to protect against a corrupted cache? It looks like it was killed during the download phase. Wait, is this the new method of saving packages for macOS 13 that we spoke about yesterday, @vashworth?

I don't think it's relevant to the "download dependencies" step, if that's what you mean. I think that step downloads Dart dependencies.

We have logic to guard against a corrupted cache when runtimes are involved, but we would see cache-clearing steps if that were the case. The only cache-clearing logic I'm aware of runs through steps (which none of these builds have).

@keyonghan
Contributor

Started a thread in LUCI. Let's see what they find.

@vashworth
Contributor Author

FYI, I don't think this is specific to macOS 13. I believe it also happens on macOS 12. This may have been going on for a while...

Example on macOS 12:
Canceled build: https://chromium-swarm.appspot.com/task?id=65d2feac2ac5e710
Next build doesn't have Xcode cache: https://chromium-swarm.appspot.com/task?id=65d30d1be71f8c10

I think this issue is just exacerbated on macOS 13 because of #138238.

@vashworth vashworth changed the title from "Xcode cache mysteriously gets cleared on macOS 13" to "Xcode cache mysteriously gets cleared on macOS" Nov 10, 2023
@keyonghan
Contributor

Confirmed with the LUCI team: there is a bug in run_isolated, which mishandles (or rather does not handle at all) SIGTERM while it is doing cleanup. I will paste the bug link when it is available.
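
To illustrate the class of bug (this is not run_isolated's actual code), here is a minimal Python sketch of deferring SIGTERM until a cleanup step has finished. `DeferredTermination`, `cleanup_task`, and `save_caches` are hypothetical names, and the sketch assumes a POSIX system and that it runs on the main thread, since `signal.signal` can only be called there.

```python
import signal


class DeferredTermination:
    """Context manager that postpones SIGTERM until a critical section ends."""

    def __init__(self):
        self.received = False
        self._previous = None

    def _remember(self, signum, frame):
        # Record the request instead of dying in the middle of cleanup.
        self.received = True

    def __enter__(self):
        # Swap in our handler and keep the old one so it can be restored.
        self._previous = signal.signal(signal.SIGTERM, self._remember)
        return self

    def __exit__(self, exc_type, exc, tb):
        signal.signal(signal.SIGTERM, self._previous)
        return False  # Never swallow exceptions from the critical section.


def cleanup_task(save_caches):
    """Runs save_caches() to completion even if SIGTERM arrives meanwhile.

    Returns True if a termination request was seen during cleanup, so the
    caller can exit afterwards.
    """
    with DeferredTermination() as guard:
        save_caches()
    return guard.received
```

A caller could check the return value and exit once the caches are back in place. Note that this only helps with SIGTERM; SIGKILL cannot be caught, so a hard kill would still interrupt cleanup.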

@ricardoamador ricardoamador added the triaged-infra Triaged by Infrastructure team label Nov 10, 2023
@ricardoamador
Contributor

I wonder what effect fixing this will have. How much will it help?

@godofredoc
Contributor

Reading more carefully, this is the expected behavior for caches. If a download fails in the middle, the cache is corrupted and has to be downloaded again the next time. Rather than focusing on the cache, we may need to focus on the failures.
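
For context, one common way to keep an interrupted download from leaving a half-populated cache is to stage the download in a temporary directory and rename it into place only on success. The sketch below is an assumption-laden illustration of that general technique, not how mac_toolchain or LUCI named caches actually work; `install_into_cache` and `populate` are hypothetical names.

```python
import os
import shutil
import tempfile


def install_into_cache(cache_dir, name, populate):
    """Populates cache_dir/name via a staging directory and one rename."""
    final_path = os.path.join(cache_dir, name)
    if os.path.isdir(final_path):
        return final_path  # Cache hit: nothing to download.

    os.makedirs(cache_dir, exist_ok=True)
    # Stage next to the final location so os.rename stays on one filesystem.
    staging = tempfile.mkdtemp(prefix=name + ".partial-", dir=cache_dir)
    try:
        populate(staging)  # Hypothetical callback that downloads and unpacks.
        os.rename(staging, final_path)
    except BaseException:
        # Interrupted or failed: discard the partial copy and re-raise.
        shutil.rmtree(staging, ignore_errors=True)
        raise
    return final_path
```

With this pattern, a download that fails midway leaves no entry under the final name, so the next build sees a clean cache miss rather than a corrupted cache. If the process is killed outright, a stale `.partial-` directory may remain, but it is never mistaken for a complete cache.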

@vashworth
Contributor Author

> Reading more carefully, this is the expected behavior for caches. If a download fails in the middle, the cache is corrupted and has to be downloaded again the next time. Rather than focusing on the cache, we may need to focus on the failures.

It's not always the download that fails in the middle (see the 2nd example in the original description of this issue). But yeah, we should focus on the failure.

@ricardoamador
Contributor

ricardoamador commented Nov 10, 2023

> In one build, Xcode is downloaded: https://chromium-swarm.appspot.com/task?id=65cf2793c5a1a110
> The next build was canceled: https://chromium-swarm.appspot.com/task?id=65cf4e5b7efc7b10
> In the build after that, the cache is suddenly empty (see step "Show xcode cache"), so Xcode has to be downloaded again, and the download fails: https://chromium-swarm.appspot.com/task?id=65cf57470c8f4c10

So, based on what LUCI has said, this no longer seems like an infra issue. We have a passing test followed by two killed tests. In that case, LUCI is not handling the error code properly, but the cache is doing what it needs to do.

I also have questions about the first three tests. The first test succeeded, the second was killed in the middle of the download, and the third was killed again during the reset of Xcode. This doesn't seem like a caching issue.

Adding LUCI bug here: https://bugs.chromium.org/p/chromium/issues/detail?id=1501443

fluttermirroringbot pushed a commit that referenced this issue Nov 10, 2023
There's an issue (#138238) with mac_toolchain that makes Xcode installs flaky and an issue (#138246) that makes Xcode installs more frequent on macOS 13, which is causing presubmit tests to fail frequently. In the meantime, we'll only have tests run on macOS 12.
auto-submit bot pushed a commit to flutter/engine that referenced this issue Nov 11, 2023
There's an [issue](flutter/flutter#138238) with mac_toolchain that makes Xcode installs flaky and an [issue](flutter/flutter#138246) that makes Xcode installs more frequent on macOS 13, which is causing presubmit tests to fail frequently. In the meantime, we'll only have tests run on macOS 12.
auto-submit bot pushed a commit to flutter/packages that referenced this issue Nov 11, 2023
There's an issue (flutter/flutter#138238) with mac_toolchain that makes Xcode installs flaky and an issue (flutter/flutter#138246) that makes Xcode installs more frequent on macOS 13, which is causing presubmit tests to fail frequently. In the meantime, we'll only have tests run on macOS 12.
@ricardoamador
Contributor

Lowering to P1 as the issue is off to LUCI to investigate.

@ricardoamador ricardoamador added P1 High-priority issues at the top of the work list and removed P0 Critical issues such as a build break or regression labels Nov 11, 2023
@keyonghan keyonghan removed the P1 High-priority issues at the top of the work list label Nov 27, 2023
@keyonghan keyonghan added the P2 Important issues not at the top of the work list label Nov 27, 2023
@keyonghan
Contributor

Lowering to P2 pending the LUCI fix. If we see this issue more frequently, let's bump the priority of the LUCI bug.

Markzipan pushed a commit to Markzipan/flutter that referenced this issue Nov 27, 2023
There's an issue (flutter#138238) with mac_toolchain that makes Xcode installs flaky and an issue (flutter#138246) that makes Xcode installs more frequent on macOS 13, which is causing presubmit tests to fail frequently. In the meantime, we'll only have tests run on macOS 12.
HugoOlthof pushed a commit to moneybird/packages that referenced this issue Dec 13, 2023
There's an issue (flutter/flutter#138238) with mac_toolchain that makes Xcode installs flaky and an issue (flutter/flutter#138246) that makes Xcode installs more frequent on macOS 13, which is causing presubmit tests to fail frequently. In the meantime, we'll only have tests run on macOS 12.
@flutter-triage-bot flutter-triage-bot bot added the Bot is counting down the days until it unassigns the issue label Apr 1, 2024
@flutter-triage-bot

This issue is assigned to @ricardoamador and @keyonghan but has had no recent status updates. Please consider unassigning this issue if it is not going to be addressed in the near future. This allows people to have a clearer picture of what work is actually planned. Thanks!

@vashworth
Contributor Author

FYI, this still appears to be a problem, but it's less critical: because the simulator is still mounted, only Xcode itself has to be re-installed (which takes ~2 minutes rather than ~10).

For example:
One build was canceled: https://chromium-swarm.appspot.com/task?id=69100f54d5f5e710
The next build has to re-install Xcode (see steps 13.4 and 13.5): https://chromium-swarm.appspot.com/task?id=691029e524395f10

@flutter-triage-bot flutter-triage-bot bot removed the Bot is counting down the days until it unassigns the issue label Apr 22, 2024
Projects
None yet
Development

No branches or pull requests

4 participants