Fix flaky TestEnqueueMDMCommand test #24697
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #24697 +/- ##
==========================================
- Coverage 63.58% 63.57% -0.01%
==========================================
Files 1601 1601
Lines 151666 151666
Branches 3898 3898
==========================================
- Hits 96431 96422 -9
- Misses 47571 47577 +6
- Partials 7664 7667 +3
Flags with carried forward coverage won't be shown.
require.Len(t, listCmdResp.Results, 1)
require.NotZero(t, listCmdResp.Results[0].UpdatedAt)
listCmdResp.Results[0].UpdatedAt = time.Time{}
results, err := json.Marshal(listCmdResp.Results)
tiny linting issue here (needs an error check)
jahzielv left a comment
Nice solve! LGTM (aside from the linting thing) because this does fix it, but to your point maybe some cleanup would be a better long-term solution?
We do have this cleanup in the TearDownTest method for the MDM suite. Perhaps we also need to delete from the host_mdm_apple_bootstrap_packages table? Or also the nano_commands table, seems like we're not cleaning that one out either.
That sounds promising, I can try that!
@jahzielv alas this did not fix the issue -- I tried clearing I fixed the lint issue if you're comfortable 👍 with the current approach, or have other suggestions.
@sgress454 aw nuts! Thanks for trying that anyway! I'd say let's go with this fix for now, and we can do a deeper investigation later.
This PR adds a new workflow called "Stress Test Go Test" (aka the RandoKiller) that allows running one or more tests repeatedly up to a set number of times, or until a test fails. This is useful for:

* Trying to diagnose and debug a flaky test
* Verifying that a proposed fix for a flaky test actually works.

To use:

1. Create a branch whose name ends with "-randokiller"
2. Modify the .github/workflows/config/randokiller.json file to your specifications (choosing the packages and tests to run, the mysql matrix, and the number of runs to do)
3. Push up the branch

Since the stress test is intended to run a branch that you'll never merge, you should feel free to add whatever logs to your tests or code that will help diagnose failures.

I used this to diagnose and fix #24697!
lol i accidentally closed this by saying that i used the randokiller to "fix" this issue 🤦
FYI this was diagnosed and fixed using the RandoKiller.
This PR fixes the TestEnqueueMDMCommand test, which has been failing intermittently here. Most of the time the /api/latest/fleet/mdm/apple/commands API returns one result as expected, but occasionally it returns two, for example:
It seems that the second command is related to trying to install a bootstrap package (uploaded by a previous test) to the newly-enrolled host.
The fix in this PR is to filter the API response to only the command we're verifying the presence of. It's a decent solve, but leaves open the edge case of a bug that causes multiple commands to be sent unexpectedly. The ideal solution would be to remove the interaction between the two tests, perhaps by deleting any created bootstraps before those tests complete, or by re-initializing the state in some other way. I don't currently have enough context to easily implement a solution like that (i.e. I know there's a "delete bootstrap" API, but not sure if that's enough to solve this issue).
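A rough sketch of the filtering approach described above, assuming the test knows the UUID of the command it enqueued. The `commandResult` type, its fields, and `filterByCommandUUID` are hypothetical and are not taken from the PR's diff.

```go
package main

import "fmt"

// commandResult is a hypothetical, trimmed-down view of one command in the
// list-commands API response.
type commandResult struct {
	CommandUUID string
	RequestType string
}

// filterByCommandUUID keeps only the results whose CommandUUID matches uuid,
// so unrelated commands queued for the same host do not affect assertions.
func filterByCommandUUID(results []commandResult, uuid string) []commandResult {
	var matched []commandResult
	for _, r := range results {
		if r.CommandUUID == uuid {
			matched = append(matched, r)
		}
	}
	return matched
}

func main() {
	// Simulated response: the command under test plus a stray command left
	// over from an earlier test (values are made up for illustration).
	results := []commandResult{
		{CommandUUID: "uuid-under-test", RequestType: "CommandUnderTest"},
		{CommandUUID: "uuid-stray", RequestType: "StrayCommand"},
	}

	matched := filterByCommandUUID(results, "uuid-under-test")
	fmt.Println(len(matched), matched[0].RequestType)
}
```

As the description notes, the trade-off is that a bug which sends extra, unexpected commands would no longer fail this assertion, since anything with a different UUID is ignored.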