CI is broken and fails for trivial PR #2197

vorburger · 2023-09-25T23:19:55Z

Describe the bug
My trivial PR #2196, which only adds a .github/dependabot.yaml file, which IMHO is "impossible" to affect test runs and such things, failed to pass CI and is red instead of green.

This can discourage contributions (because you never clearly know if the change proposed in a PR breaks CI, or if it was "just already broken anyway"), and therefore perhaps should be fixed with top priority, over everything else?

IMHO any project should intentionally set itself a relatively high bar in this matter, and aim for CI on its main branch to always be green. Any PRs that break it and turn it red should be immediately be reverted by maintainers.

Glancing over https://github.com/google/android-fhir/commits/master this historically does not seem to be the case in this project.

To Reproduce
Steps to reproduce the behavior:

Raise a trivial PR e.g. adding a space to the README

Expected behavior
It can never say Some checks were not successful for such trivial PR changes.

Would you like to work on the issue?
I am happy to help "drive" this, but suspect it will need input from more people.

/Cc @fredhersch @jingtang10 @omarismail94 @williamito FYA

The text was updated successfully, but these errors were encountered:

vorburger · 2023-09-25T23:35:07Z

Looking more closely now, 2 checks are failing, and I suspect those are totally unrelated issues:

The build GitHub Action seems to fail due to Error: The operation was canceled. in the log; with the PR UI saying Failing after 25m. Can someone who has been on the project longer comment on if that has always been the case? If so, and unless we have an immediate fix, my recommendation would be to remove it ASAP, as it probably only adds confusion but no real value as-is, and then it can be separately worked on re-introducing it, but only merged if once its green. Does that sound reasonable and fair?
The Kokoro: Build and Device Tests action may "just" be red because (apparently) 1 test cases failed, 84 passed, 1 flaky in demo-debug-androidTest.apk │ datacapture-debug-androidTest.apk │ Nexus6P-24-en_US-portrait, see here and here. I'm new to the project and don't have much context yet, but as far as I can tell this is probably some flaky UI test? My recommendation would be to @Ignore (kind of thing, you know what I mean) that apparently flaky test? The value that individual test adds is likely lower than the "overhead" and "friction" it causes maintainers to have to manually consider for each PR if the proposed change breaks things, or if it was "just already broken anyway"? That test should be improved, and can then be re-enabled? (Or be de-activated again, as long as it's found to still be flaky.) Can someone please assist with how / where exactly one de-temporarily de-activates such a UI test?

This increases the size of the VM [used for CI](https://github.com/google/android-fhir/actions/workflows/build.yml) jobs runs from [GitHub's default](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources) from / to: * 2-core CPU (x86_64) => 64-cores * 7 GB of RAM => 256 GB RAM * 14 GB of SSD space => 2040 GB SSD This should fix the `Error: The operation was canceled` problem described in issue #2197. This is a worthwhile trade off of expensive engineering time VS infrastructure cost. Thank You Sundar for paying for this to enable this project to productively produce an Open Health Stack!

vorburger · 2023-09-26T00:02:54Z

could be fixed by Very significantly increase CI VM CPU Cores, RAM & disk (re. #2197) #2198, which is Successful in 7m. that's ⏩ more 🐎 how we like it!
still needs someone to help

…2198) This increases the size of the VM [used for CI](https://github.com/google/android-fhir/actions/workflows/build.yml) jobs runs from [GitHub's default](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources) from / to: * 2-core CPU (x86_64) => 64-cores * 7 GB of RAM => 256 GB RAM * 14 GB of SSD space => 2040 GB SSD This should fix the `Error: The operation was canceled` problem described in issue #2197. This is a worthwhile trade off of expensive engineering time VS infrastructure cost. Thank You Sundar for paying for this to enable this project to productively produce an Open Health Stack!

omarismail94 · 2023-09-26T08:23:07Z

@vorburger

Looking more closely now, 2 checks are failing, and I suspect those are totally unrelated issues:

The build GitHub Action seems to fail due to Error: The operation was canceled. in the log; with the PR UI saying Failing after 25m. Can someone who has been on the project longer comment on if that has always been the case? If so, and unless we have an immediate fix, my recommendation would be to remove it ASAP, as it probably only adds confusion but no real value as-is, and then it can be separately worked on re-introducing it, but only merged if once its green. Does that sound reasonable and fair?

Yes, GitHub Actions regularly fails with that error, and developers are aware that they have to look at the Kokoro logs. We do not have it blocking PRs being submitted though. I see you submitted a PR to increase the machine size, and we can see if that helps.

The Kokoro: Build and Device Tests action may "just" be red because (apparently) 1 test cases failed, 84 passed, 1 flaky in demo-debug-androidTest.apk │ datacapture-debug-androidTest.apk │ Nexus6P-24-en_US-portrait, see here and here. I'm new to the project and don't have much context yet, but as far as I can tell this is probably some flaky UI test? My recommendation would be to @Ignore (kind of thing, you know what I mean) that apparently flaky test? The value that individual test adds is likely lower than the "overhead" and "friction" it causes maintainers to have to manually consider for each PR if the proposed change breaks things, or if it was "just already broken anyway"? That test should be improved, and can then be re-enabled? (Or be de-activated again, as long as it's found to still be flaky.) Can someone please assist with how / where exactly one de-temporarily de-activates such a UI test?

This is a regularly occurring flaky test. It only is flaky for API Level 24. Outside of adding Ignore, is there any other way of ensuring this works?

vorburger · 2023-09-26T21:18:46Z

This is a regularly occurring flaky test. It only is flaky for API Level 24. Outside of adding Ignore, is there any other way of ensuring this works?

@omarismail94 the flakyness would have to be fixed... I have no idea how, unfortunately. But the way I would go about is to disable it ASAP, because (IMHO) CI is broken and fails for trivial PR due to that, but have an open bug to re-enable it - and prioritize that based on how much value doing that adds.

I spent a bit more time to learn exactly which test this is, and after a while of staring at logs found this Gradle Build Scan, from where I found the HtmlErrorReport.html here, which showed:

Nexus6P-24-en_US-portrait : com.google.android.fhir.datacapture.test.views.QuestionnaireItemDialogMultiSelectViewHolderFactoryEspressoTest#selectOther_shouldScrollDownToShowAddAnotherAnswer (1) androidx.test.espresso.NoMatchingViewException: No views in hierarchy found matching: view.getId() is <2131230791/com.google.android.fhir.datacapture.test:id/add_another>

Looks like #1482 might be the bug that's already open for this? I'll raise a PR proposing to @Igore it, until that test is fixed to be completely reliable and not flaky.

vorburger added type:bug Something isn't working help wanted Extra attention is needed P1 High priority issue effort:small Small effort - 2 days type:build Issues related to code build labels Sep 25, 2023

vorburger self-assigned this Sep 25, 2023

vorburger mentioned this issue Sep 25, 2023

Very significantly increase CI VM CPU Cores, RAM & disk (re. #2197) #2198

Merged

jingtang10 linked a pull request Sep 26, 2023 that will close this issue

Very significantly increase CI VM CPU Cores, RAM & disk (re. #2197) #2198

Merged

vorburger added a commit that referenced this issue Sep 26, 2023

Ignore flaky UI Test, for now (see #1482, fixes #2197)

ebefba4

This was referenced Sep 26, 2023

Ignore flaky UI Test, for now (see #1482, fixes #2197) #2203

Merged

Espresso Test fails if Screen Size too small #1482

Open

Spelling #2193

Merged

Create dependabot.yaml (fixes #2195) #2196

Merged

vorburger added a commit that referenced this issue Sep 29, 2023

Ignore flaky UI Test, for now (see #1482, fixes #2197)

1f8c494

vorburger added a commit to vorburger/android-fhir that referenced this issue Sep 29, 2023

Ignore flaky UI Test, for now (see google#1482, fixes google#2197)

1ed5737

vorburger added a commit that referenced this issue Sep 29, 2023

Ignore flaky UI Test, for now (see #1482, fixes #2197)

72a5eb0

vorburger added a commit that referenced this issue Oct 6, 2023

Ignore flaky UI Test, for now (see #1482, fixes #2197)

e390537

omarismail94 closed this as completed in #2203 Oct 6, 2023

omarismail94 pushed a commit that referenced this issue Oct 6, 2023

Ignore flaky UI Test, for now (see #1482, fixes #2197) (#2203)

1999e48

vorburger mentioned this issue Oct 6, 2023

FhirSyncWorkerBenchmark UI Test fails #2235

Open

vorburger mentioned this issue Oct 25, 2023

Reduce GitHub Action VM Size / Resource Consumption (Cost) #2294

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CI is broken and fails for trivial PR #2197

CI is broken and fails for trivial PR #2197

vorburger commented Sep 25, 2023

vorburger commented Sep 25, 2023 •

edited

Loading

vorburger commented Sep 26, 2023

omarismail94 commented Sep 26, 2023

vorburger commented Sep 26, 2023

CI is broken and fails for trivial PR #2197

CI is broken and fails for trivial PR #2197

Comments

vorburger commented Sep 25, 2023

vorburger commented Sep 25, 2023 • edited Loading

vorburger commented Sep 26, 2023

omarismail94 commented Sep 26, 2023

vorburger commented Sep 26, 2023

vorburger commented Sep 25, 2023 •

edited

Loading