
Periodic test failures in GitHub Actions CI and CodeCov #9873

Closed
jywarren opened this issue Jun 29, 2021 · 49 comments · Fixed by #9905
Labels: bug, high-priority

Comments

@jywarren (Member)

I'm seeing occasional errors that seem to self-resolve when we re-run the tests. I'm not sure of the cause. Here's one log I found:

2021-06-29 20:46:08 +0000 Rack app ("GET /barnstar/givestar=basic&nid=1" - (127.0.0.1)): Mysql2::Error
=[Screenshot]: tmp/screenshots/failures_test_awarding_barnstar_functions_correctly.png
ERROR PostTest#test_awarding_barnstar_functions_correctly (873.23s)
Minitest::UnexpectedError:         Mysql2::Error: 
            app/models/tag.rb:430:in `update_tags_activity'
            app/models/node.rb:678:in `tag_activity'
            app/models/node.rb:671:in `add_comment'
            app/controllers/tag_controller.rb:284:in `barnstar'

These can sometimes happen due to timing issues -- because tests run all at once, with no time passing in between. We can sometimes resolve them by adding an order clause to the SQL, or by using Timecop to manipulate the time during tests. Let's collect more data to see which tests are inconsistent (paste them here before restarting them, please) and go from there. cc @publiclab/soc
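For reference, the Timecop approach might look roughly like this (a sketch, assuming the timecop gem in our test group; the tag names here are made up):

require 'timecop'

# Pin the clock, then advance it between records, so queries that sort
# or compare by timestamp see a real ordering instead of identical values.
Timecop.freeze do
  first = Tag.create!(name: 'sample-tag')          # created at time T
  Timecop.travel(Time.zone.now + 1.minute) do
    second = Tag.create!(name: 'sample-tag-two')   # created at T + 1 minute
  end
end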

@jywarren added the bug label Jun 29, 2021
@jywarren (Member Author)

Confirmed the exact same bug blocked #9848 - and is blocking lots of dependabot PRs too, so that's a good place to look for this happening!

@jywarren (Member Author)

#9864 as well! So, what is the actual error?

It seems related to this line from #9165 by @RuthNjeri --

Tag.where(tid: tids).update_all(activity_timestamp: DateTime.now, latest_activity_nid: activity_id)

No worries, Ruth, it seems super obscure. But we're doing an update_all -- could it be blocked by some other database lock? We are running tests in parallel, which kind of simulates how this could run in production with many users at once, you know?

Could we do this differently?
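One way that comes to mind (just a sketch, untested): update row by row, letting each row take its own lock with ActiveRecord's with_lock:

# Trades one bulk UPDATE for N single-row ones, but each row is locked,
# updated, and released inside its own small transaction.
Tag.where(tid: tids).find_each do |tag|
  tag.with_lock do
    tag.update(activity_timestamp: DateTime.now, latest_activity_nid: activity_id)
  end
end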

@daemon1024 (Member)

I wonder if introducing pessimistic locking will help us prevent this race condition.

https://api.rubyonrails.org/classes/ActiveRecord/Locking/Pessimistic.html

@jywarren (Member Author)

jywarren commented Jul 2, 2021

Oh, interesting! So how would that look? Like this?

Tag.transaction do
  Tag.lock
     .where(tid: tids)
     .update_all(activity_timestamp: DateTime.now, latest_activity_nid: activity_id)
end

@RuthNjeri @cesswairimu , what do you think? I've never done this before! 😅

@jywarren (Member Author)

jywarren commented Jul 3, 2021

@icarito, would it be OK to lock some tag records in this way while the rows are being updated together? Or would it have some adverse performance impact?

I'm wondering if:

  1. we should only lock in test mode, to avoid a performance issue in production, or
  2. we should just lock in general because we may have this kind of MySQL error happening in production?

We should be able to check for such errors in Sentry. I don't think I've been seeing any MySQL-related errors though, so I think this should be OK to leave as-is in production mode.
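For option 1, gating the lock on the environment could look roughly like this (just to illustrate -- a sketch, not necessarily what we'd ship):

Tag.transaction do
  # In test mode only: SELECT ... FOR UPDATE holds row locks on these tags until commit
  Tag.where(tid: tids).lock.to_a if Rails.env.test?
  Tag.where(tid: tids).update_all(activity_timestamp: DateTime.now, latest_activity_nid: activity_id)
end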

@jywarren (Member Author)

jywarren commented Jul 3, 2021

For reference, I think we're talking about locking about 6 rows in the term_data table (the tags table) while we update those 6 rows. So not a huge issue, and only when tags are added to a node:

See https://github.com/publiclab/plots2/pull/9165/files#diff-2845087add773fb20ddff83e51ab1ef70788ad7e522c7363e03458c6adab93b3R11

@jywarren (Member Author)

jywarren commented Jul 3, 2021

I've attempted the change irrespective of mode (so for both production and testing) in #9881 so let's see what happens there.

@cesswairimu (Collaborator)

> Oh, interesting! So how would that look? Like this?
>
> Tag.transaction do
>   Tag.lock
>      .where(tid: tids)
>      .update_all(activity_timestamp: DateTime.now, latest_activity_nid: activity_id)
> end
>
> @RuthNjeri @cesswairimu , what do you think? I've never done this before!

This is looking good @jywarren 🎉

@RuthNjeri (Contributor)

RuthNjeri commented Jul 5, 2021

Hi @jywarren and everyone 👋🏾
I'm having a look at this now -- locking the database transaction is interesting... I had thought that, since MySQL is relational, it would lock rows within a transaction on its own, per the ACID properties...

If the tag locking does not work, we could look into mocking this for the tests...
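A sketch of the mocking idea with Minitest's stub (this assumes update_tags_activity is a class method on Tag, per the stack trace above, and reuses the path from the failing test):

require 'minitest/mock'

# Swap the real UPDATE for a no-op within this block, so parallel test
# workers can't contend for the same term_data rows.
Tag.stub :update_tags_activity, true do
  post '/barnstar/givestar=basic&nid=1'
end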

jywarren added a commit that referenced this issue Jul 6, 2021
@jywarren (Member Author)

jywarren commented Jul 6, 2021

OK, we merged #9881 to try to address this. Let's keep watching -- I suspect (and @Manasa2850 noted something related to this too) that there's another issue sometimes causing test failures. We can close this if things have stabilized in a week or two!

I did notice that CodeCov failed twice for @17sushmita, in #9885 and #9879, for no apparent reason, misreading test coverage changes. So let's watch that too, and if you see that behavior please mention it and link to it here! Thanks, all!

@jywarren (Member Author)

jywarren commented Jul 6, 2021

Lots of CodeCov failures with Dependabot: #9887

@jywarren (Member Author)

jywarren commented Jul 6, 2021

See here for an example where I asked Dependabot to rebase: #9886 (comment)

17sushmita pushed a commit to 17sushmita/plots2 that referenced this issue Jul 7, 2021
@Tlazypanda (Collaborator)

So I might be shooting in the dark, but I found this issue, codecov/codecov-action#330, and this link, https://stackoverflow.com/questions/67861379/codecov-fails-in-github-actions, which suggests that moving to 1.5.2 should solve our issue. In our dependabot.yml file, the codecov dependency is under the ignore category -- so could it be that we aren't updating our codecov, and that's why we're facing this? Not sure if it gets updated in another way. As I said, throwing darts :p @jywarren @cesswairimu

@cesswairimu (Collaborator)

cesswairimu commented Jul 10, 2021

oh really nice!! @Tlazypanda checking this out. Thanks

@cesswairimu (Collaborator)

@jywarren I checked and couldn't find anywhere we could specify the codecov version. Any ideas? thanks

@jywarren (Member Author)

jywarren commented Jul 13, 2021

Great digging here, @Tlazypanda @cesswairimu !!!

Digging into when that ignore got added...

#6290 relevant... also #9756 cited the same CodeCov issue!

#9552 is maybe where CodeCov was last updated, on April 23, 2021:

plots2/Gemfile.lock, lines 89-90 at 6832a43:

    codecov (0.5.2)
      simplecov (>= 0.15, < 0.22)

This shows v0.5.2, so we are already on that version...

https://github.com/publiclab/plots2/pull/9583/files is where the newer Dependabot added the ignore?

So wait -- the ignore only ignores that specific version, not later ones. It's generated when we close a Dependabot PR for a specific version; it "remembers" that we don't want to update to that particular version. But we got past it, so we're now on v0.5.2:

  - dependency-name: codecov
    versions:
      - 0.4.2
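For contrast, a blanket ignore -- one that actually would block later versions -- would need a version range instead of an exact version. A sketch, assuming standard dependabot.yml ignore syntax (we do not have this):

ignore:
  - dependency-name: codecov
    versions:
      - ">= 0.4.2"   # a range: would skip every version from 0.4.2 onward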

@jywarren (Member Author)

Looking at a recent fail here for exact reference: #9894

@jywarren (Member Author)

So there seems to be some error for #9894 in CodeCov:

[screenshot of the CodeCov error]

@jywarren (Member Author)

I'm going to do 2 things to see if we get past it:

  1. read and implement https://docs.codecov.com/docs/error-reference#section-missing-base-commit to get past that issue
  2. get a new token, as codecov/codecov-action#330 recommends

@jywarren (Member Author)

Also noting some people are using a specific GitHub Actions step like this:

    - name: Upload to codecov
      uses: codecov/codecov-action@v1
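Combined with the new-token idea above, that step might look like this (a sketch; the secret name CODECOV_TOKEN is an assumption about our repo settings):

    - name: Upload to codecov
      uses: codecov/codecov-action@v1
      with:
        token: ${{ secrets.CODECOV_TOKEN }}  # repository secret
        fail_ci_if_error: true  # fail loudly on upload errors instead of passing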

@jywarren (Member Author)

Oh hmm, I see this message:

New! Install Codecov's GitHub App to improve application link. Comment posting will be on behalf of Codecov not a bot user. Install now

Sure! I'll try that!

@jywarren (Member Author)

OK, we had a slight hiccup where the merge-pr action was in the wrong folder. Moved it and adjusted in #9910

@jywarren (Member Author)

OK, I'm hopeful this will work: https://github.com/publiclab/plots2/actions/runs/1038211403

🤞

@jywarren (Member Author)

OMG, OK, this took a while. I think I got it. We now have this, which triggers after a merge to main:

https://github.com/publiclab/plots2/runs/3088995766?check_suite_focus=true

That is then used as a trigger for another run of the regular tests workflow, which, strangely, ran twice here:

https://github.com/publiclab/plots2/actions/runs/1038480174

https://github.com/publiclab/plots2/actions/runs/1038480867

One says "jywarren requested" and the other "jywarren completed", so maybe we should trigger it on a specific event:

OK done: #9916
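For reference, filtering the trigger to one event type looks roughly like this (a sketch; the upstream workflow name is an assumption):

# .github/workflows/tests.yml
on:
  workflow_run:
    workflows: ["merge-pr"]   # name of the triggering workflow (assumed)
    types: [completed]        # fire only on 'completed', not 'requested'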

@jywarren (Member Author)

actions/runner#950

@jywarren (Member Author)

Hmm, it's still running 2x. But the sequence is now:

  1. https://github.com/publiclab/plots2/actions/runs/1038489457 - the original PR run
  2. https://github.com/publiclab/plots2/actions/runs/1038493942 - triggers one more run after the PR is merged
  3. https://github.com/publiclab/plots2/actions/runs/1038493985 and https://github.com/publiclab/plots2/actions/runs/1038494629 - the two runs that result

This is good enough for now!

@jywarren (Member Author)

Now I'll open a new PR, and we'll see what it compares against!

@jywarren (Member Author)

I'm also hoping we'll see "Missing base report" on this page go away:

https://app.codecov.io/gh/publiclab/plots2/pulls?page=1&state=open&order=-pullid

[screenshot of the open pulls list]

@jywarren (Member Author)

Ooh! I think that's it!

[screenshot]

@jywarren (Member Author)

And #9909 | https://app.codecov.io/gh/publiclab/plots2/compare/9909 is now not showing "Missing base report" anymore! 🎉

@jywarren (Member Author)

OK -- the good news is that the comparison is against the correct base commit: "compared to 3d3de3f".

However, the check itself isn't passing, and is reporting 59.59% (-22.56%), which is not right; CodeCov itself reports 74.07% here: https://app.codecov.io/gh/publiclab/plots2/compare/9909/. I think the 59.59% is from one commit behind, i.e., from before it was really ready to report back. The "re-run" link doesn't work.

jywarren added a commit that referenced this issue Jul 16, 2021
@jywarren changed the title from "Periodic test failures in GitHub Actions CI" to "Periodic test failures in GitHub Actions CI and CodeCov" Jul 16, 2021
@jywarren (Member Author)

After looking through the documentation, I finally just reported the discrepancy to CodeCov in their forums: https://community.codecov.com/t/github-status-check-reported-too-early/3043

https://docs.codecov.com/docs/github-checks

https://docs.codecov.com/docs/commit-status

@cesswairimu (Collaborator)

thanks Jeff 🚀 🚀

jywarren added a commit that referenced this issue Jul 20, 2021
* add wait_for_ajax for barnstar errors (2nd attempt)

copy of #9909 for codeCov debugging cc #9873

* notify:         after_n_builds: 5
reginaalyssa pushed a commit to reginaalyssa/plots2 that referenced this issue Oct 16, 2021
billymoroney1 pushed a commit to billymoroney1/plots2 that referenced this issue Dec 28, 2021