Periodic test failures in GitHub Actions CI and CodeCov #9873
Confirmed the exact same bug blocked #9848, and it's blocking lots of dependabot PRs too, so that's a good place to look for this happening!
#9864 as well! So, what is the actual error? It seems related to this line from #9165 by @RuthNjeri (Line 430 in 78bf29f).
No worries, Ruth, it seems super obscure. But we're doing an ... Could we do this differently?
I wonder if introducing pessimistic locking will help us prevent this race condition. https://api.rubyonrails.org/classes/ActiveRecord/Locking/Pessimistic.html
Oh, interesting! So how would that look? Like this?

```ruby
Tag.transaction do
  Tag.lock
     .where(tid: tids)
     .update_all(activity_timestamp: DateTime.now, latest_activity_nid: activity_id)
end
```

@RuthNjeri @cesswairimu, what do you think? I've never done this before! 😅
@icarito, would it be OK to lock some tag records in this way while the rows are being updated together? Or would it cause some adverse performance? I'm wondering if:
We should be able to check for such errors in Sentry. I don't think I've been seeing any MySQL-related errors though, so I think this should be OK to leave as-is in production mode.
For reference, I think we're talking about locking about 6 rows in the `tags` table.
…ld to avoid db errors in testing re: #9873
I've attempted the change irrespective of mode (so for both production and testing) in #9881, so let's see what happens there.
This is looking good @jywarren 🎉
Hi @jywarren and Everyone 👋🏾 If the tag locking does not work, we could look into mocking this for the tests...
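Just to sketch what the mocking idea might look like (purely illustrative -- it assumes the test suite has a mocking library like mocha available, and the test name is made up):

```ruby
require 'test_helper'

class TagActivityStubTest < ActiveSupport::TestCase
  test 'the bulk tag update can be stubbed out in tests' do
    # Stub the relation returned by Tag.where so the bulk UPDATE never
    # reaches MySQL -- the statement that seems to be racing in CI.
    fake_relation = mock('tag relation')
    fake_relation.expects(:update_all).returns(1)
    Tag.stubs(:where).returns(fake_relation)

    Tag.where(tid: [1]).update_all(activity_timestamp: DateTime.now)
  end
end
```

Stubbing does hide the real behavior, though, so the locking approach above still seems preferable if it holds up.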
OK, we merged #9881 to try to address this. Let's keep an eye on it; I suspect (and @Manasa2850 noted something related to this too) that there's another issue sometimes causing test failures. We can close this if things have stabilized in a week or two! I did notice that CodeCov failed twice for @17sushmita in #9885 and #9879 for no apparent reason, misreading test coverage changes. So let's watch that too, and if you see that behavior please mention and link us to it here! Thanks, all!
Lots of codecov failures with dependabot: #9887
See here for an example where I asked dependabot to rebase: #9886 (comment)
…ld to avoid db errors in testing (publiclab#9881) re: publiclab#9873
So I might be shooting in the dark, but I found this issue codecov/codecov-action#330 and this link https://stackoverflow.com/questions/67861379/codecov-fails-in-github-actions which suggest moving to 1.5.2 should solve our issue, and then in our ...
Oh really nice!! @Tlazypanda, checking this out. Thanks
@jywarren I checked and couldn't find anywhere we could specify the codecov version. Any ideas? Thanks
Great digging here, @Tlazypanda @cesswairimu!!! Digging into when that ignore got added... #6290 is relevant... also #9756 cited the same CodeCov issue! #9552 is maybe where CodeCov was last updated, on April 23, 2021 (Lines 89 to 90 in 6832a43).
https://github.com/publiclab/plots2/pull/9583/files is where the newer Dependabot added the ignore? So wait, the ignore is only there to skip that specific version, not later ones. It's generated when we close a Dependabot PR on the specific version -- it "remembers" that we don't want to update to that particular version. But we got past it, so we're now on v0.5.2 (Lines 43 to 45 in 6832a43).
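For reference, that kind of auto-generated ignore lives in the Dependabot config and only skips the one version; roughly speaking it looks like this (a sketch only -- the version number and schedule here are illustrative, not copied from our file):

```yaml
# .github/dependabot.yml (sketch)
version: 2
updates:
  - package-ecosystem: "bundler"
    directory: "/"
    schedule:
      interval: "weekly"
    ignore:
      # Skip just this release; later versions are still offered as PRs.
      - dependency-name: "codecov"
        versions: ["0.5.1"]
```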
Looking at a recent fail here for exact reference: #9894
I'm going to do 2 things to see if we get past it:
Also noting some people are using a specific GitHub Actions step like this:
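I can't see the exact step referenced above, but for illustration, such a step usually looks something like this (hypothetical snippet -- the pinned version is just the 1.5.2 mentioned earlier, not necessarily what those projects use):

```yaml
# Hypothetical workflow step, not copied from any project
- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v1.5.2
  with:
    fail_ci_if_error: false
```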
Oh hmm, I see this message:
Sure! I'll try that!
OK, we had a slight hiccup where the merge-pr action was in the wrong folder. Moved it and adjusted in #9910.
OK, hopefully this will work: https://github.com/publiclab/plots2/actions/runs/1038211403 🤞
OMG, OK this took a while. I think I got it. We now have this, which triggers after a merge to main: https://github.com/publiclab/plots2/runs/3088995766?check_suite_focus=true That is then used as a trigger for another run of the regular tests workflow, which strangely ran twice here: https://github.com/publiclab/plots2/actions/runs/1038480174 https://github.com/publiclab/plots2/actions/runs/1038480867 One says OK, done: #9916
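If I understand the setup, this kind of chaining is what GitHub's `workflow_run` trigger is for; a rough sketch of the shape (the workflow names here are guesses, not copied from the repo):

```yaml
# tests.yml (sketch): also run after the post-merge workflow completes on main
on:
  push:
  pull_request:
  workflow_run:
    workflows: ["merge-pr"]
    types:
      - completed
```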
Hmm, it's still running 2x. But the sequence is now:
This is good enough for now!
Now I'll open a new PR and we'll see what it compares against!
I'm also hoping we'll see "Missing base report" on this page go away: https://app.codecov.io/gh/publiclab/plots2/pulls?page=1&state=open&order=-pullid
And #9909 (https://app.codecov.io/gh/publiclab/plots2/compare/9909) is now not showing "Missing base report" anymore! 🎉
OK, the good news is that the comparison is against the correct base commit. However, the *check* isn't passing, and is reporting 59.59% (-22.56%), which is not right. CodeCov itself reports 74.07% here: https://app.codecov.io/gh/publiclab/plots2/compare/9909/. I think the 59.59% is from one commit behind, i.e. before it was really ready to report back. The "re-run" link doesn't work.
After looking in the documentation, I finally just reported the discrepancy to CodeCov in their forums: https://community.codecov.com/t/github-status-check-reported-too-early/3043
Thanks Jeff 🚀 🚀
…ld to avoid db errors in testing (publiclab#9881) re: publiclab#9873
* add wait_for_ajax for barnstar errors (2nd attempt) copy of publiclab#9909 for codeCov debugging cc publiclab#9873 * notify: after_n_builds: 5
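For context, the `after_n_builds` tweak in that commit should make CodeCov hold its comment/status until that many coverage reports have been uploaded (one per parallel CI job, presumably), so it stops reporting on partial coverage. In codecov.yml it sits roughly like this (a sketch, not a copy of our file):

```yaml
# codecov.yml (sketch)
codecov:
  notify:
    # Wait for 5 uploaded reports before commenting or setting the status.
    after_n_builds: 5
```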
I'm seeing occasional errors that seem to self-resolve when we re-run the tests. I'm not sure of the cause. Here's one log I found:
These can sometimes happen due to timing issues -- because tests run all at once with no time passing in between. We can sometimes resolve them by adding an order clause to the SQL, or by using Timecop to manipulate time during tests. Let's collect more data to see which tests are inconsistent (paste them here before restarting them, please) and go from there. cc @publiclab/soc
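To make those two remedies concrete, here's a rough sketch (model, column, and test names are placeholders for whatever the flaky test touches, and it assumes the Timecop gem is available):

```ruby
require 'test_helper'

class DeterministicOrderingTest < ActiveSupport::TestCase
  test 'recent nodes come back in a stable order' do
    # Timecop pins Time.now (and any timestamps written) for everything in
    # this block, removing one source of run-to-run variation.
    Timecop.freeze(Time.utc(2021, 7, 20)) do
      # An explicit, unique tie-breaker (the primary key) keeps MySQL from
      # returning rows with equal timestamps in an arbitrary order.
      nodes = Node.order(created: :desc, nid: :desc).limit(5).to_a
      assert_equal nodes, nodes.sort_by { |n| [n.created, n.nid] }.reverse
    end
  end
end
```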