Mac_arm64 tool_host_cross_arch_tests is failing with recipe crash #113231

Closed
jmagman opened this issue Oct 10, 2022 · 17 comments
Assignees
Labels
P1 High-priority issues at the top of the work list platform-host-arm Building on an ARM-based platform team-infra Owned by Infrastructure team

Comments

@jmagman
Member

jmagman commented Oct 10, 2022

https://logs.chromium.org/logs/flutter/buildbucket/cr-buildbucket/8800893589239561057/+/u/RECIPE_CRASH__Uncaught_exception_/logging

The recipe has crashed at point 'Uncaught exception'!

Traceback (most recent call last):
  File "/Volumes/Work/s/w/ir/kitchen-checkout/recipe_engine/recipe_engine/internal/engine.py", line 585, in run_steps
    raw_result = recipe_obj.run_steps(api, engine)
  File "/Volumes/Work/s/w/ir/kitchen-checkout/recipe_engine/recipe_engine/internal/recipe_deps.py", line 936, in run_steps
    recipe_result = invoke_with_properties(
  File "/Volumes/Work/s/w/ir/kitchen-checkout/recipe_engine/recipe_engine/internal/property_invoker.py", line 90, in invoke_with_properties
    return _invoke_with_properties(callable_obj, all_props, environ, prop_defs,

This test is marked bringup:true due to #112130, but was previously successfully passing in staging.

@jmagman jmagman added team-infra Owned by Infrastructure team P1 High-priority issues at the top of the work list labels Oct 10, 2022
@jmagman jmagman added this to New in Infra Ticket Queue via automation Oct 10, 2022
@jmagman
Member Author

jmagman commented Oct 10, 2022

cc oncall @drewroengoogle

@drewroengoogle
Contributor

It seems like the bundler path isn't being recognized at all, but I'm unsure what changed between the time this recipe was passing and failing, as the bundler version seems to be the same, and the recipe hasn't been modified during that timeframe. I can try taking this on.

@drewroengoogle drewroengoogle self-assigned this Oct 13, 2022
@drewroengoogle drewroengoogle moved this from New to Triaged in Infra Ticket Queue Oct 13, 2022
@drewroengoogle drewroengoogle moved this from Triaged to In progress in Infra Ticket Queue Oct 17, 2022
@drewroengoogle drewroengoogle moved this from In progress to Triaged in Infra Ticket Queue Oct 17, 2022
@drewroengoogle
Contributor

Unassigning after discussion with @godofredoc about whether this is an infra issue.

@drewroengoogle drewroengoogle removed their assignment Oct 18, 2022
@jmagman
Member Author

jmagman commented Oct 19, 2022

@drewroengoogle do you know who is working on this? Is /Volumes/Work/s/w/ir/cache/ruby/bin/bundle missing? We want to add more Mac_arm64 tests #110126

@drewroengoogle
Contributor

drewroengoogle commented Oct 20, 2022

@jmagman Thanks for following up, I meant to yesterday but got sidetracked.
I sent a message in the LUCI chat this morning and CC'ed you, as I'm not sure who the best person to look at this is. From my search, there doesn't seem to be any difference between the last successful run and the first failing run: the recipe, recipe properties, and ruby CIPD package did not change around that time at all.

@godofredoc
Contributor

I took a quick look: removing the installation of ruby makes the gems steps succeed, but all the tests fail.

E.g. for x64, GEM_HOME is set to a generic folder where bundler and bundle are installed https://cs.opensource.google/flutter/recipes/+/main:recipe_modules/flutter_deps/api.py;l=300 and only after the gems path has been set and the gems have been installed is the path changed to the ruby-specific location https://cs.opensource.google/flutter/recipes/+/main:recipe_modules/flutter_deps/api.py;l=326

On arm64, GEM_HOME is set directly to <gem_dir>/ruby/3.1.0, while bundler and bundle are located at <gem_dir>.
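
A minimal sketch of the mismatch described above, not the recipe's actual code. It assumes the usual RubyGems layout where gem executables land in `<GEM_HOME>/bin`; the helper name `bundler_visible` and the temporary directory layout are made up for illustration.

```python
import tempfile
from pathlib import Path


def bundler_visible(gem_home: str, exe: str = "bundle") -> bool:
    """Return True if `exe` exists under <gem_home>/bin (assumed RubyGems layout)."""
    return (Path(gem_home) / "bin" / exe).is_file()


# Throwaway layout: bundle is installed at <gem_dir>, not <gem_dir>/ruby/3.1.0.
gem_dir = Path(tempfile.mkdtemp())
(gem_dir / "bin").mkdir()
(gem_dir / "bin" / "bundle").write_text("#!/usr/bin/env ruby\n")

# x64-style setting: GEM_HOME points at the generic folder.
x64_gem_home = str(gem_dir)
# arm64-style setting: GEM_HOME points straight at the ruby-specific folder.
arm64_gem_home = str(gem_dir / "ruby" / "3.1.0")

print("x64 finds bundle:", bundler_visible(x64_gem_home))
print("arm64 finds bundle:", bundler_visible(arm64_gem_home))
```

With this layout the x64-style setting finds `bundle` and the arm64-style one does not, which would match the "bundler path isn't being recognized" symptom if the assumption about the layout holds.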

Another potential problem is that the ruby binaries from cipd are added at the end of the path and are most likely never used if ruby is already installed on the system or the code is running from an Xcode context. https://cs.opensource.google/flutter/recipes/+/main:recipe_modules/flutter_deps/api.py;l=282
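
To illustrate the PATH ordering concern, here is a small sketch assuming a POSIX host. The "system" and "cipd" directories and the `make_fake_ruby` helper are stand-ins invented for the example; only `shutil.which` and its `path` argument are real standard-library API.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def make_fake_ruby(directory: Path) -> Path:
    """Create an executable placeholder named `ruby` in `directory`."""
    directory.mkdir(parents=True, exist_ok=True)
    exe = directory / "ruby"
    exe.write_text("#!/bin/sh\necho fake ruby\n")
    exe.chmod(exe.stat().st_mode | stat.S_IXUSR)
    return exe


root = Path(tempfile.mkdtemp())
system_ruby = make_fake_ruby(root / "system" / "bin")  # stand-in for a preinstalled ruby
cipd_ruby = make_fake_ruby(root / "cipd" / "bin")      # stand-in for the cipd package

# cipd directory appended to PATH: the earlier "system" entry is found first.
appended = os.pathsep.join([str(system_ruby.parent), str(cipd_ruby.parent)])
# cipd directory prepended to PATH: the cipd ruby is found first.
prepended = os.pathsep.join([str(cipd_ruby.parent), str(system_ruby.parent)])

resolved_appended = shutil.which("ruby", path=appended)
resolved_prepended = shutil.which("ruby", path=prepended)
print("appended  ->", resolved_appended)
print("prepended ->", resolved_prepended)
```

Lookup stops at the first matching directory, so a package added at the end of the path is only used when no earlier entry provides `ruby`.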

@keyonghan
Contributor

I compared the failed build with a successful one: the root of the env dir path was switched from /opt to /Volumes/Work, which caused the issues. I followed up in the chat to find out why this changed on our bots.

@keyonghan
Contributor

The Chromium team reinstalled our bots on the morning of 10/07, which lines up with our incident here. I suspect there are some hard-coded paths somewhere.

@godofredoc
Contributor

Apart from what actually caused this problem, we need to make this installation hermetic and resilient to changes in the environment.

@keyonghan
Contributor

Yeah, I agree. When creating the ruby version, a prefix path is needed to make it work. I believe that's the culprit here.

@keyonghan
Contributor

I have spent quite a few hours working on a portable version, but still no luck getting it to work.
I referenced https://github.com/stephan-nordnes-eriksen/ruby_ship, which looks promising for our purpose. However, it still depends on openssl being pre-installed on the bot, so ruby would stop working again if the root dir changes.

I am running out of ideas. Does anyone with ruby experience have a better idea of how to make this work? /cc @jmagman

One possible workaround is to build and deploy openssl and ruby at runtime, but that would add 5+ minutes to each run.

@jmagman
Member Author

jmagman commented Nov 2, 2022

@godofredoc
Contributor

Devicelab vs chromium machines?

@keyonghan
Contributor

Yeah, these are devicelab testbeds where the root dirs are not changed (still /opt/s), whereas the chromium ones have been updated to /Volumes/Work/s. The existing ruby was compiled based on /opt/s, so the devicelab ones are still valid.
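
One cheap symptom of a prefix baked in at build time is a script whose shebang points at an interpreter path that no longer exists. The sketch below checks only that symptom and is an assumption about how such a check could look, not something the bots run; `stale_shebangs` and the directory names are hypothetical.

```python
import tempfile
from pathlib import Path


def stale_shebangs(bin_dir: Path) -> dict:
    """Map script name -> interpreter for scripts whose absolute shebang target is missing."""
    stale = {}
    for script in sorted(bin_dir.iterdir()):
        if not script.is_file():
            continue
        with script.open("rb") as fh:
            first = fh.readline().decode("utf-8", errors="replace").strip()
        if not first.startswith("#!"):
            continue
        parts = first[2:].split()
        if not parts:
            continue
        interpreter = parts[0]
        # Only absolute interpreter paths can go stale when the root dir moves.
        if interpreter.startswith("/") and not Path(interpreter).exists():
            stale[script.name] = interpreter
    return stale


root = Path(tempfile.mkdtemp())
bin_dir = root / "new-root" / "bin"
bin_dir.mkdir(parents=True)

# Interpreter that exists under the new root, and one under an old root that is gone.
live_interp = bin_dir / "ruby"
live_interp.write_text("placeholder\n")
old_interp = f"{root}/old-root/bin/ruby"  # hypothetical old prefix, never created

(bin_dir / "bundle").write_text(f"#!{old_interp}\nputs 'hi'\n")
(bin_dir / "rake").write_text(f"#!{live_interp}\nputs 'hi'\n")

stale = stale_shebangs(bin_dir)
print(stale)
```

Here only `bundle` is reported, because its shebang still names the old root. A compiled binary can also embed the prefix in ways this check will not see.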

@keyonghan keyonghan removed this from Triaged in Infra Ticket Queue Nov 18, 2022
@keyonghan keyonghan added this to To do in Infra - pay technical debt via automation Nov 18, 2022
@jmagman jmagman added platform-host-arm Building on an ARM-based platform arch: m1 labels Dec 13, 2022
@godofredoc
Contributor

@yusuf-goog FYI

@godofredoc godofredoc self-assigned this Jan 10, 2023
@godofredoc
Contributor

This is fixed now with ruby hermetic packages and https://flutter-review.googlesource.com/c/recipes/+/38088

@github-actions

github-actions bot commented Mar 4, 2023

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 4, 2023