[uss_quafilier] Lower severity of cleanup no to have 'high' during cleanup phase#1404
Conversation
|
|
||
| ## Cleanup | ||
| ### ⚠️ Successful flight deletion check | ||
| ### ℹ️ Successful flight deletion check |
There was a problem hiding this comment.
Since the requirement DeleteFlightSuccess specifies that a USS must remove a flight when instructed, I don't think Low severity is an appropriate categorization of this failure. A requirement is violated, but that violation doesn't warrant stopping the test scenario/cleanup -- that seems to match the description of Medium severity, so I don't think we want to change the severity of this check (or any of the others where a requirement is violated but we still want to continue test scenario/cleanup execution).
There was a problem hiding this comment.
Ok, but then for #1401, for "probably shouldn't fail regardless of {NotPlanned, Closed} in any situation since this is merely cleanup.", what should we do with fails?
Because it could be for multiple reasons (either dues to USS not failling the specs or something else), should we make the test smarter to differentiate both cases?
There was a problem hiding this comment.
The documentation is not super clear on what this check is actually checking, but since it cites DeleteFlightSuccess, presumably it is checking whether the flight was deleted successfully. I would therefore expect this check to fail under a wide variety of circumstances where uss_qualifier could not verify that the flight was deleted (e.g., query was not successful, return code was anything other than the proper code for success, response was improperly formatted, server indicated extant flight was not deleted, etc).
Ok, but then for #1401, for "probably shouldn't fail regardless of {NotPlanned, Closed} in any situation since this is merely cleanup.", what should we do with fails?
Medium severity indicates that a requirement was violated, but test scenario execution should not stop. That seems like the appropriate thing to do with failure in this case. Low severity does not seem appropriate because a requirement actually was violated if the flight was not deleted. The problem addressed by "probably shouldn't fail regardless of {NotPlanned, Closed} in any situation since this is merely cleanup." is that we don't know for sure that a flight actually exists when we perform this check in the cleanup phase since the cleanup phase can be reached from many different points in test scenario execution. Since we are really only interested in making sure the flight is gone, accepting either NotPlanned or Closed satisfies that interest without needing to make sure our logic of expected response versus existing flight is correct.
Because it could be for multiple reasons (either dues to USS not failling the specs or something else), should we make the test smarter to differentiate both cases?
We could often make checks more granular. For instance, instead of just checking that a response satisfies the API format, we could have one check for field 1, a separate check for field 2, etc. Whether that is useful depends primarily on whether different requirements would be violated since the different details of the violation (e.g., field 1 was invalid versus field 2 was invalid; or format was invalid versus closure state was invalid) can be clearly communicated for the same check in the details of the failed check. In this case, I'm not sure what we would gain by splitting this check into multiple more granular checks. However, we should certainly always clearly communicate the reason the check failed.
monitoring/uss_qualifier/scenarios/astm/utm/off_nominal_planning/down_uss.md
Outdated
Show resolved
Hide resolved
There should be very few circumstances in which a check during cleanup should have High severity, as failing that check will prevent all the rest of the cleanup. |
Ok, I did a global pass and removed most of them. The only one left is "Restore locality check" in configure_locality.md, as not restoring original state seems bad, but tell me if it need to be changed as well. |
configure_locality works by attempting to set the locality of multiple mock_uss instances (or just one if just one is specified, but it is capable of handling multiple). If mock_uss 1 and mock_uss 2 have their localities changed successfully and then mock_uss 3 fails to change its locality, the idea is to try to leave the ecosystem in a consistent state. Since mock_uss 3 can't be set to the new locality (and is therefore set to the old locality), the only consistent ecosystem state is to be set to the old locality. Therefore, mock_uss 1 and mock_uss 2 should be set back to the old locality in cleanup. If there is a problem setting mock_uss 1 back to the old locality in cleanup, the question is whether we should still attempt to set mock_uss 2 back to the old locality. High severity prevents mock_uss 2 from being set back to the old locality since cleanup will stop after mock_uss 1 can't be set back to the old locality. This seems neutral since we're guaranteed to be unable to achieve a consistent ecosystem state, but I think the check should still be Medium severity since we may (at some future time) add another cleanup action after attempting to restore old localities, and that action will potentially not be attempted if we stop cleanup early with a High severity check. That is why I think High severity checks should be extremely rare in cleanup phases, including in configure_locality: we generally expect all cleanup actions to be attempted, and High severity checks potentially prevent actions after that check from occurring. Unless that is actually the intent (stop attempting to clean up if that check fails), no cleanup check severities should exceed Medium. |
BenjaminPelletier
left a comment
There was a problem hiding this comment.
This is great, but I suspect we have a large number of High-severity checks still present in cleanup phases. For instance, op_intent_ref_simple's cleanup links to a test step fragment with three High-severity checks.
This PR marks all flight deletion check during cleanup as informative, as failling the check may be considered informative, as described in #1401 .
This also fix globally cleanup checks; removing high severity.
Edit: This also contribute to #1400 to avoid stopping during flight cleanup