Skip to content

Conversation

@davepacheco
Copy link
Collaborator

Fixes #2994. See #2994 (comment) for details.

I'm going to see if the instances test failure looks similar.

@davepacheco davepacheco requested a review from smklein May 3, 2023 21:45
Comment on lines 1350 to 1352
// Record the current start time. When fetching data points, we never need
// to look at times before this timestamp, since all the events we're
// interested in happen after this test starts.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in chats: This will work for "metrics on a single disk", but might not work as well for "system-wide resources", which are made during the populate step.

Should be fine for this test, but caused me issues with the instance metrics test, where the sample "cpu_count = 0" was taken before the test started.

Copy link
Collaborator

@smklein smklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing this!

@davepacheco davepacheco changed the title fix spurious metrics test failure fix spurious metrics test failures May 4, 2023
@davepacheco
Copy link
Collaborator Author

I confirmed that the test_instance_metrics test failure seen in CI looks the same:
#2994 (comment)

I updated the PR as discussed in chat so that we grab a timestamp as the very first thing when setting up a ControlPlaneTestContext. That's what we use now as the start time of our window when querying for metrics. The end time is Utc::now(), which as discussed seems like it should be fine now that we're waiting for oximeter collections to finish.

I was never able to reproduce the test_instance_metrics test failure locally so it'll be hard to be positive this fixed the problem, but it clearly fixes a problem and I will run a bunch of iterations and make sure I don't continue to see any of these failures.

@davepacheco davepacheco marked this pull request as ready for review May 4, 2023 00:02
@davepacheco
Copy link
Collaborator Author

This ran for a few hours without issue: while RUST_BACKTRACE=1 cargo test -p omicron-nexus -- integration_tests::instances integration_tests::disks; do :; done so I think this is at least an improvement.

@davepacheco davepacheco merged commit ce9ab7f into main May 4, 2023
@davepacheco davepacheco deleted the dap/test-fixes branch May 4, 2023 05:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

spurious failures from disk/instance metric test

3 participants