Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI is broken and fails for trivial PR #2197

Closed
vorburger opened this issue Sep 25, 2023 · 4 comments · Fixed by #2198 or #2203
Closed

CI is broken and fails for trivial PR #2197

vorburger opened this issue Sep 25, 2023 · 4 comments · Fixed by #2198 or #2203
Assignees
Labels
effort:small Small effort - 2 days help wanted Extra attention is needed P1 High priority issue type:bug Something isn't working type:build Issues related to code build

Comments

@vorburger
Copy link
Member

Describe the bug
My trivial PR #2196, which only adds a .github/dependabot.yaml file, which IMHO is "impossible" to affect test runs and such things, failed to pass CI and is red instead of green.

This can discourage contributions (because you never clearly know if the change proposed in a PR breaks CI, or if it was "just already broken anyway"), and therefore perhaps should be fixed with top priority, over everything else?

IMHO any project should intentionally set itself a relatively high bar in this matter, and aim for CI on its main branch to always be green. Any PRs that break it and turn it red should be immediately be reverted by maintainers.

Glancing over https://github.com/google/android-fhir/commits/master this historically does not seem to be the case in this project.

To Reproduce
Steps to reproduce the behavior:

  1. Raise a trivial PR e.g. adding a space to the README

Expected behavior
It can never say Some checks were not successful for such trivial PR changes.

Would you like to work on the issue?
I am happy to help "drive" this, but suspect it will need input from more people.

/Cc @fredhersch @jingtang10 @omarismail94 @williamito FYA

@vorburger vorburger added type:bug Something isn't working help wanted Extra attention is needed P1 High priority issue effort:small Small effort - 2 days type:build Issues related to code build labels Sep 25, 2023
@vorburger vorburger self-assigned this Sep 25, 2023
@vorburger
Copy link
Member Author

vorburger commented Sep 25, 2023

Looking more closely now, 2 checks are failing, and I suspect those are totally unrelated issues:

  1. The build GitHub Action seems to fail due to Error: The operation was canceled. in the log; with the PR UI saying Failing after 25m. Can someone who has been on the project longer comment on if that has always been the case? If so, and unless we have an immediate fix, my recommendation would be to remove it ASAP, as it probably only adds confusion but no real value as-is, and then it can be separately worked on re-introducing it, but only merged if once its green. Does that sound reasonable and fair?

  2. The Kokoro: Build and Device Tests action may "just" be red because (apparently) 1 test cases failed, 84 passed, 1 flaky in demo-debug-androidTest.apk │ datacapture-debug-androidTest.apk │ Nexus6P-24-en_US-portrait, see here and here. I'm new to the project and don't have much context yet, but as far as I can tell this is probably some flaky UI test? My recommendation would be to @Ignore (kind of thing, you know what I mean) that apparently flaky test? The value that individual test adds is likely lower than the "overhead" and "friction" it causes maintainers to have to manually consider for each PR if the proposed change breaks things, or if it was "just already broken anyway"? That test should be improved, and can then be re-enabled? (Or be de-activated again, as long as it's found to still be flaky.) Can someone please assist with how / where exactly one de-temporarily de-activates such a UI test?

vorburger added a commit that referenced this issue Sep 25, 2023
This increases the size of the VM [used for CI](https://github.com/google/android-fhir/actions/workflows/build.yml) jobs runs from [GitHub's default](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources) from / to:

* 2-core CPU (x86_64) => 64-cores
* 7 GB of RAM => 256 GB RAM
* 14 GB of SSD space => 2040 GB SSD

This should fix the `Error: The operation was canceled` problem described in issue #2197.

This is a worthwhile trade off of expensive engineering time VS infrastructure cost. Thank You Sundar for paying for this to enable this project to productively produce an Open Health Stack!
@vorburger
Copy link
Member Author

  1. could be fixed by Very significantly increase CI VM CPU Cores, RAM & disk (re. #2197) #2198, which is Successful in 7m. that's ⏩ more 🐎 how we like it!
  2. still needs someone to help

omarismail94 pushed a commit that referenced this issue Sep 26, 2023
…2198)

This increases the size of the VM [used for CI](https://github.com/google/android-fhir/actions/workflows/build.yml) jobs runs from [GitHub's default](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources) from / to:

* 2-core CPU (x86_64) => 64-cores
* 7 GB of RAM => 256 GB RAM
* 14 GB of SSD space => 2040 GB SSD

This should fix the `Error: The operation was canceled` problem described in issue #2197.

This is a worthwhile trade off of expensive engineering time VS infrastructure cost. Thank You Sundar for paying for this to enable this project to productively produce an Open Health Stack!
@omarismail94
Copy link
Contributor

@vorburger

Looking more closely now, 2 checks are failing, and I suspect those are totally unrelated issues:

  1. The build GitHub Action seems to fail due to Error: The operation was canceled. in the log; with the PR UI saying Failing after 25m. Can someone who has been on the project longer comment on if that has always been the case? If so, and unless we have an immediate fix, my recommendation would be to remove it ASAP, as it probably only adds confusion but no real value as-is, and then it can be separately worked on re-introducing it, but only merged if once its green. Does that sound reasonable and fair?

Yes, GitHub Actions regularly fails with that error, and developers are aware that they have to look at the Kokoro logs. We do not have it blocking PRs being submitted though. I see you submitted a PR to increase the machine size, and we can see if that helps.

  1. The Kokoro: Build and Device Tests action may "just" be red because (apparently) 1 test cases failed, 84 passed, 1 flaky in demo-debug-androidTest.apk │ datacapture-debug-androidTest.apk │ Nexus6P-24-en_US-portrait, see here and here. I'm new to the project and don't have much context yet, but as far as I can tell this is probably some flaky UI test? My recommendation would be to @Ignore (kind of thing, you know what I mean) that apparently flaky test? The value that individual test adds is likely lower than the "overhead" and "friction" it causes maintainers to have to manually consider for each PR if the proposed change breaks things, or if it was "just already broken anyway"? That test should be improved, and can then be re-enabled? (Or be de-activated again, as long as it's found to still be flaky.) Can someone please assist with how / where exactly one de-temporarily de-activates such a UI test?

This is a regularly occurring flaky test. It only is flaky for API Level 24. Outside of adding Ignore, is there any other way of ensuring this works?

@vorburger
Copy link
Member Author

This is a regularly occurring flaky test. It only is flaky for API Level 24. Outside of adding Ignore, is there any other way of ensuring this works?

@omarismail94 the flakyness would have to be fixed... I have no idea how, unfortunately. But the way I would go about is to disable it ASAP, because (IMHO) CI is broken and fails for trivial PR due to that, but have an open bug to re-enable it - and prioritize that based on how much value doing that adds.

I spent a bit more time to learn exactly which test this is, and after a while of staring at logs found this Gradle Build Scan, from where I found the HtmlErrorReport.html here, which showed:

Nexus6P-24-en_US-portrait : com.google.android.fhir.datacapture.test.views.QuestionnaireItemDialogMultiSelectViewHolderFactoryEspressoTest#selectOther_shouldScrollDownToShowAddAnotherAnswer (1) androidx.test.espresso.NoMatchingViewException: No views in hierarchy found matching: view.getId() is <2131230791/com.google.android.fhir.datacapture.test:id/add_another>

Looks like #1482 might be the bug that's already open for this? I'll raise a PR proposing to @Igore it, until that test is fixed to be completely reliable and not flaky.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort:small Small effort - 2 days help wanted Extra attention is needed P1 High priority issue type:bug Something isn't working type:build Issues related to code build
Projects
Status: Complete
2 participants