Mac_ios hot_mode_dev_cycle_ios__benchmark is 2.00% flaky, Process exited with status = -1 (0xffffffff) lost connection #120808

Closed
fluttergithubbot opened this issue Feb 15, 2023 · 105 comments
Assignees
Labels
c: flake Tests that sometimes, but not always, incorrectly pass P0 Critical issues such as a build break or regression platform-ios iOS applications specifically team-ios Owned by iOS platform team triaged-ios Triaged by iOS platform team

Comments

@fluttergithubbot
Contributor

The post-submit test builder Mac_ios hot_mode_dev_cycle_ios__benchmark had a flaky ratio 2.00% for the past (up to) 100 commits, which is above our 2.00% threshold.

One recent flaky example for a same commit: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20hot_mode_dev_cycle_ios__benchmark/4015
Commit: 00adf9a

Flaky builds:
https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20hot_mode_dev_cycle_ios__benchmark/4015
https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20hot_mode_dev_cycle_ios__benchmark/3995

Recent test runs:
https://flutter-dashboard.appspot.com/#/build?taskFilter=Mac_ios%20hot_mode_dev_cycle_ios__benchmark

Please follow https://github.com/flutter/flutter/wiki/Reducing-Test-Flakiness#fixing-flaky-tests to fix the flakiness and enable the test back after validating the fix (internal dashboard to validate: go/flutter_test_flakiness).

@fluttergithubbot fluttergithubbot added P1 c: flake Tests that sometimes, but not always, incorrectly pass tool Affects the "flutter" command-line tool. See also t: labels. labels Feb 15, 2023
@zanderso
Member

The flake in 4015 is a host-side Dart VM GC crasher. I've only seen that once, and without a reliable repro, I don't think the VM team will be able to track that one down.

However, 3995 is a VM service address discovery timeout. It is interesting because the device logs don't capture the timeframe when the discovery timed out. Instead the device logs seem to begin only slightly before the run that follows the failing run. cc @jmagman.

Downgrading to P3 since this is only just over the threshold.

@zanderso zanderso added P1 High-priority issues at the top of the work list and removed P1 labels Feb 15, 2023
@zanderso zanderso removed their assignment Feb 15, 2023
@fluttergithubbot fluttergithubbot added P1 and removed P1 High-priority issues at the top of the work list labels Feb 22, 2023
@jmagman jmagman self-assigned this Feb 28, 2023
@jmagman
Member

jmagman commented Feb 28, 2023

Hm, 4094 and 4171 lost connection to the device; not sure what that's about. They ran on different devicelab bots.

[hot_mode_dev_cycle_ios__benchmark] [STDOUT] stdout: [+21099 ms] (lldb) warning: failed to set breakpoint site at 0x1b9c400f8 for breakpoint -4.1: error sending the breakpoint request
[hot_mode_dev_cycle_ios__benchmark] [STDOUT] stdout: [+5143 ms] Process 555 exited with status = -1 (0xffffffff) lost connection

4160 is VM service discovery timeout.
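
When triaging many builds, it can help to search test output for this signature mechanically. The following is a minimal sketch, assuming the log line always has the shape quoted above; the pattern and the helper name `find_lost_connection` are illustrative and not part of the Flutter tooling. It also shows why the status is printed twice: `-1` read as an unsigned 32-bit value is `0xffffffff`.

```python
import re

# Pattern modeled on the log line quoted above; the process ID varies per run.
LOST_CONNECTION = re.compile(
    r"Process (?P<pid>\d+) exited with status = (?P<status>-?\d+) "
    r"\((?P<hex>0x[0-9a-fA-F]+)\) lost connection"
)


def find_lost_connection(log_text):
    """Return (pid, status) for each lost-connection line found in log_text."""
    return [
        (int(m.group("pid")), int(m.group("status")))
        for m in LOST_CONNECTION.finditer(log_text)
    ]


sample = (
    "[hot_mode_dev_cycle_ios__benchmark] [STDOUT] stdout: [+5143 ms] "
    "Process 555 exited with status = -1 (0xffffffff) lost connection"
)
print(find_lost_connection(sample))  # [(555, -1)]
print(hex(-1 & 0xFFFFFFFF))          # 0xffffffff
```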

@zanderso
Member

zanderso commented Mar 1, 2023

Routing to the infra ticket queue to investigate lost connection to the device.

@zanderso zanderso added team-infra Owned by Infrastructure team and removed tool Affects the "flutter" command-line tool. See also t: labels. labels Mar 1, 2023
@zanderso zanderso added this to New in Infra Ticket Queue via automation Mar 1, 2023
@zanderso zanderso removed their assignment Mar 1, 2023
@zanderso
Member

zanderso commented Mar 1, 2023

4201 and 4200 are Dart VM GC crashers on the host

4233 is an Observatory discovery timeout

4316, 4284, and 4274 are all instances of:

Process 555 exited with status = -1 (0xffffffff) lost connection

I wanted to exclude the possibility that this is caused by a real engine crash during startup, but unfortunately the device logs for these runs don't go back far enough to capture the failure.

The GC crash is already flagged in a couple of other P1s, as is the Observatory discovery timeout, so we can use those other issues to track them.

Since we haven't excluded the possibility that this is an engine crash, I'm going to remove from the infra queue, and route to platform-ios.

@zanderso zanderso removed this from New in Infra Ticket Queue Mar 1, 2023
@zanderso zanderso added platform-ios iOS applications specifically and removed team-infra Owned by Infrastructure team labels Mar 1, 2023
@zanderso
Member

zanderso commented Mar 1, 2023

I re-opened #120809 to mark this test flaky.

@fluttergithubbot fluttergithubbot added P0 Critical issues such as a build break or regression and removed P1 High-priority issues at the top of the work list labels Aug 16, 2023
@vashworth vashworth added P1 High-priority issues at the top of the work list and removed P0 Critical issues such as a build break or regression labels Aug 21, 2023
@fluttergithubbot fluttergithubbot added P0 Critical issues such as a build break or regression and removed P1 High-priority issues at the top of the work list labels Aug 23, 2023
@vashworth
Contributor

Each of the recent flakes is still the Process exited with status = -1 (0xffffffff) lost connection error. I'm deprioritizing again because it fails quickly, passes on rerun, and doesn't affect the tree, but I did have an idea for a workaround to make it stop being flaky.

The idea is: if it errors when launching, try again (without throwing an error). It seems to always work on rerun, so we can just bake retries into the launch code.
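
The retry idea can be sketched roughly as follows. This is only an illustration in Python, not the Flutter tool's code: `launch_with_retry`, the `launch` callable, and the default of three attempts are all hypothetical.

```python
def launch_with_retry(launch, max_attempts=3):
    """Call launch() until it returns True or the attempts run out.

    launch is a hypothetical zero-argument callable that returns True on a
    successful launch and False when the connection to the device is lost.
    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if launch():
            return attempt
        # Do not raise here: a lost connection is treated as retryable.
    return None


# Simulated launcher that loses the connection once, then succeeds.
outcomes = iter([False, True])
attempts_used = launch_with_retry(lambda: next(outcomes))
print(attempts_used)  # 2
```

Capping the attempts keeps a genuinely broken launch from looping forever; the caller still sees a failure (`None`) once the attempts are exhausted.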

@vashworth vashworth added P1 High-priority issues at the top of the work list and removed P0 Critical issues such as a build break or regression labels Aug 24, 2023
@fluttergithubbot fluttergithubbot added P0 Critical issues such as a build break or regression and removed P1 High-priority issues at the top of the work list labels Aug 30, 2023
auto-submit bot pushed a commit that referenced this issue Sep 5, 2023
Sometimes `ios-deploy` loses connection to the device after installing, starting debugserver, and launching. This is shown with an error message like:
```
Process 579 exited with status = -1 (0xffffffff) lost connection
```
This happens frequently in our CI system: #120808

Usually in CI, it works and passes on retry, so this is an attempt to retry without failing the test first. It's not guaranteed to fix the flake, since we're unable to recreate this error locally.
@vashworth
Contributor

I pushed a workaround that should hopefully fix this: #133769

Since we can't recreate it, though, we'll just have to wait to see if it stops flaking

@fluttergithubbot
Contributor Author

[prod pool] flaky ratio for the past (up to) 100 commits between 2023-08-30 and 2023-09-03 is 1.03%. Flaky number: 1; total number: 97.
One recent flaky example for a same commit: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20hot_mode_dev_cycle_ios__benchmark/6870
Commit: 1b1c8a1
Flaky builds:
https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20hot_mode_dev_cycle_ios__benchmark/6870

Recent test runs:
https://flutter-dashboard.appspot.com/#/build?taskFilter=Mac_ios%20hot_mode_dev_cycle_ios__benchmark
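
For reference, the ratio reported above is the flaky count divided by the total count. A small sketch of that arithmetic follows; the function name `flaky_ratio` is illustrative, and how the bot itself rounds or compares against the threshold is not shown in this thread.

```python
def flaky_ratio(flaky, total):
    """Percentage of flaky builds among all builds in the window."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * flaky / total


# Figures from the bot comment above: 1 flaky build out of 97.
ratio = flaky_ratio(1, 97)
print(f"{ratio:.2f}%")  # 1.03%
```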

@fluttergithubbot
Contributor Author

[prod pool] flaky ratio for the past (up to) 100 commits between 2023-09-07 and 2023-09-12 is 1.03%. Flaky number: 1; total number: 97.
One recent flaky example for a same commit: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20hot_mode_dev_cycle_ios__benchmark/7043
Commit: 4a3ab68
Flaky builds:
https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20hot_mode_dev_cycle_ios__benchmark/7043

Recent test runs:
https://flutter-dashboard.appspot.com/#/build?taskFilter=Mac_ios%20hot_mode_dev_cycle_ios__benchmark

auto-submit bot pushed a commit that referenced this issue Sep 18, 2023
When retrying to connect to the device during app launch, don't uninstall the app first.

Latest test flake for #120808:
https://logs.chromium.org/logs/flutter/buildbucket/cr-buildbucket/8770202475999850785/+/u/run_hot_mode_dev_cycle_ios__benchmark/test_stdout

Shows that it uninstalled and then tried debugging and failed, which would make sense since the app wasn't installed anymore.
```
[2023-09-11 18:02:24.555646] [STDOUT] stdout: [   +6 ms] Lost connection to device. Trying to connect again...
[2023-09-11 18:02:24.556949] [STDOUT] stdout: [   +1 ms] executing: /opt/s/w/ir/x/w/recipe_cleanup/tmp53fs1szo/flutter sdk/bin/cache/artifacts/libimobiledevice/idevicesyslog -u 00008030-00144DA10185402E
[2023-09-11 18:02:24.557323] [STDOUT] stdout: [        ] executing: script -t 0 /dev/null /opt/s/w/ir/x/w/recipe_cleanup/tmp53fs1szo/flutter sdk/bin/cache/artifacts/ios-deploy/ios-deploy --id 00008030-00144DA10185402E --bundle build/ios/iphoneos/Flutter Gallery.app --app_deltas build/ios/app-delta --uninstall --noinstall --debug --no-wifi --args --enable-dart-profiling --disable-vm-service-publication --enable-checked-mode --verify-entry-points
[2023-09-11 18:02:24.578010] [STDOUT] stdout: [  +20 ms] [....] Waiting for iOS device to be connected
[2023-09-11 18:02:24.712631] [STDOUT] stdout: [ +134 ms] [....] Using 00008030-00144DA10185402E (N104AP, iPhone 11, iphoneos, arm64e, 16.2, 20C65) a.k.a. 'iPhone 11'.
[2023-09-11 18:02:24.712725] [STDOUT] stdout: [        ] ------ Uninstall phase ------
[2023-09-11 18:02:24.818293] [STDOUT] stdout: [ +105 ms] [ OK ] Uninstalled package with bundle id io.flutter.examples.gallery
[2023-09-11 18:02:24.906833] [STDOUT] stdout: [  +88 ms] ------ Debug phase ------
[2023-09-11 18:02:24.906924] [STDOUT] stdout: [        ] Starting debug of 00008030-00144DA10185402E (N104AP, iPhone 11, iphoneos, arm64e, 16.2, 20C65) a.k.a. 'iPhone 11' connected through USB...
[2023-09-11 18:02:25.285252] [STDOUT] stdout: [ +378 ms] [  0%] Looking up developer disk image
[2023-09-11 18:02:25.529937] [STDOUT] stdout: [ +244 ms] [ 90%] Mounting developer disk image
[2023-09-11 18:02:25.545261] [STDOUT] stdout: [  +15 ms] [ 95%] Developer disk image already mounted
[2023-09-11 18:02:25.587923] [STDOUT] stdout: [  +42 ms] Detected path to iOS debug symbols: "Symbol Path: /Users/swarming/Library/Developer/Xcode/iOS DeviceSupport/16.2 (20C65) arm64e/Symbols"
[2023-09-11 18:02:25.857177] [STDOUT] stdout: [ +269 ms] Script started, output file is /dev/null
[2023-09-11 18:02:25.857259] [STDOUT] stdout: [        ] Script done, output file is /dev/null
[2023-09-11 18:02:25.857511] [STDOUT] stdout: [        ] ios-deploy exited with code 0
[2023-09-11 18:02:25.858066] [STDOUT] stderr: [        ] Could not run build/ios/iphoneos/Flutter Gallery.app on 00008030-00144DA10185402E.
[2023-09-11 18:02:25.858130] [STDOUT] stderr: [        ] Try launching Xcode and selecting "Product > Run" to fix the problem:
[2023-09-11 18:02:25.858214] [STDOUT] stderr: [        ]   open ios/Runner.xcworkspace
[2023-09-11 18:02:25.858537] [STDOUT] stdout: [        ] Installing and launching... (completed in 52.4s)
[2023-09-11 18:02:25.858956] [STDOUT] stderr: [        ] Error launching application on iPhone 11.
```
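
The change described in this commit message can be sketched as follows. This is an illustrative Python snippet, not the actual Flutter tool code: `ios_deploy_args` is a hypothetical helper, the flags are only those visible in the log above, and the first-attempt behavior (uninstalling before debugging) is an assumption.

```python
def ios_deploy_args(device_id, bundle_path, is_retry):
    """Build an ios-deploy argument list using flags seen in the log above."""
    args = ["--id", device_id, "--bundle", bundle_path]
    if is_retry:
        # Reuse the app already on the device: no uninstall, no reinstall.
        args.append("--noinstall")
    else:
        # Assumed first-attempt behavior: remove any stale copy first.
        args.append("--uninstall")
    args += ["--debug", "--no-wifi"]
    return args


# Placeholder device ID; the bundle path is the one from the log.
retry_args = ios_deploy_args(
    "DEVICE-ID", "build/ios/iphoneos/Flutter Gallery.app", is_retry=True
)
print("--uninstall" in retry_args)  # False
```

With the uninstall step skipped on retry, the debug phase attaches to an app that is still present on the device, which is the situation the failing log above lacked.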
@vashworth
Contributor

vashworth commented Sep 19, 2023

Updated the fix (#134542), so we need to wait again to see if it stops flaking.

@fluttergithubbot
Contributor Author

@vashworth
Contributor

This has not flaked since the workaround was added. I'm going to be cautiously optimistic and consider this fixed.

For future reference, if this error recurs, see the following comment for a breakdown of the situation:
#120808 (comment)

@github-actions

github-actions bot commented Oct 9, 2023

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 9, 2023