staticpod: improve operator messages #913

Closed
91 changes: 76 additions & 15 deletions pkg/operator/staticpod/controller/installer/installer_controller.go
@@ -422,15 +422,77 @@ func setNodeStatusFn(status *operatorv1.NodeStatus) v1helpers.UpdateStaticPodSta
}
}

type revisionDescriptionPrinter struct {
Review comment (Contributor):

Please add a doc comment.

Review comment (Contributor):

Why is this an object in the first place and not just a simple func?
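
A minimal sketch of what the plain-function form asked about here might look like. The nodeStatus type and the describeRevisions name are hypothetical stand-ins (the real code uses operatorv1.NodeStatus), and the message is simplified rather than the wording proposed in this PR:

```go
package main

import "fmt"

// nodeStatus is a hypothetical stand-in for operatorv1.NodeStatus,
// reduced to the one field the message depends on.
type nodeStatus struct {
	CurrentRevision int32
}

// describeRevisions is a hypothetical free function: every input is a
// parameter, so no struct, constructor, or receiver is needed.
func describeRevisions(nodes []nodeStatus, latestRevision int32) string {
	atLatest := 0
	for _, n := range nodes {
		if n.CurrentRevision == latestRevision {
			atLatest++
		}
	}
	return fmt.Sprintf("%d of %d nodes are at revision %d", atLatest, len(nodes), latestRevision)
}

func main() {
	nodes := []nodeStatus{{CurrentRevision: 2}, {CurrentRevision: 1}, {CurrentRevision: 2}}
	fmt.Println(describeRevisions(nodes, 2)) // prints "2 of 3 nodes are at revision 2"
}
```

Since the printer holds no state beyond its three inputs, a function like this would be called directly at the site where the conditions are set.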

nodes []operatorv1.NodeStatus
latestRevision int32
progressing bool
}

func newRevisionDescriptionPrinter(nodes []operatorv1.NodeStatus, latestRevision int32, progressing bool) *revisionDescriptionPrinter {
return &revisionDescriptionPrinter{nodes: nodes, latestRevision: latestRevision, progressing: progressing}
}

func pluralizedNode(i int) string {
if i < 2 {
return "node"
}
return "nodes"
}

func (r revisionDescriptionPrinter) String() string {
currentRevisions := sets.NewInt32()
latestCount := 0
availableCount := 0
for i := range r.nodes {
c := r.nodes[i].CurrentRevision
currentRevisions.Insert(c)
if c == r.latestRevision {
latestCount++
}
if c != 0 {
availableCount++
Review comment (Contributor):

Why does this mean anything about availability?

Review comment (Member Author):

In the existing code, "numAvailable" is determined by CurrentRevision != 0, so I guess CurrentRevision == 0 means the node is not available or not actively making progress? I'm a bit confused here as well.

Review comment (Contributor):

This has nothing to do with availability. It only says that this revision was the last one that eventually became ready.

Review comment (Member Author):

lastReadyRevision?

Review comment (Contributor):

Call it nonZeroCount.

Review comment (Contributor):

Re "lastReadyRevision": it's a count.

}
}

notAvailable := fmt.Sprintf("%d %s are not available and ", availableCount, pluralizedNode(availableCount))
Review comment (Contributor):

I cannot follow. This must be len(nodes)-availableCount, no?

And better to change the wording to: %d %s have never been available and
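
A rough sketch of the corrected count with the suggested wording, under stated assumptions: neverReadyPrefix is a hypothetical helper, the input is just each node's CurrentRevision, a value of 0 is taken to mean the node never had a ready revision, and the has/have agreement is an addition not requested in the review:

```go
package main

import "fmt"

// neverReadyPrefix is a hypothetical helper. revisions holds each node's
// CurrentRevision; 0 is assumed to mean "never had a ready revision".
func neverReadyPrefix(revisions []int32) string {
	nonZeroCount := 0
	for _, r := range revisions {
		if r != 0 {
			nonZeroCount++
		}
	}
	// The prefix counts the nodes that are NOT at a non-zero revision.
	neverReady := len(revisions) - nonZeroCount
	switch neverReady {
	case 0:
		return ""
	case 1:
		return "1 node has never been available and "
	default:
		return fmt.Sprintf("%d nodes have never been available and ", neverReady)
	}
}

func main() {
	fmt.Printf("%q\n", neverReadyPrefix([]int32{2, 0, 1})) // "1 node has never been available and "
	fmt.Printf("%q\n", neverReadyPrefix([]int32{2, 2}))    // ""
}
```

The case where every node is at revision 0 is not special-cased in this sketch; the code under review returns a separate message for it.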

if availableCount == len(r.nodes) {
notAvailable = ""
}
if availableCount == 0 {
return "none of the nodes are available"
Review comment (Contributor):

none of the nodes was ever available.

}

if !currentRevisions.Has(r.latestRevision) {
if r.progressing {
return fmt.Sprintf("%snodes progressing towards revision %d", notAvailable, r.latestRevision)
Review comment (Contributor):

I am confused about notAvailable (e.g. none of the nodes are available) being plugged into the first %s.

Review comment (Member Author):

I tried to incorporate that into the message, so we don't lose it but also get rid of the ";". I think it will read as "none of the nodes are available and nodes progressing towards revision N".

Review comment (Contributor, @sttts, Nov 30, 2020):

!currentRevisions.Has(r.latestRevision) does not match the message. Better to check that there is at least one node that is not at latestRevision.

Review comment (Contributor):

Oh, you mean "all nodes progressing ...".
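
The two checks discussed in this thread differ: the set lookup is true only when no node is at the latest revision, while the suggested check is true as soon as any one node is behind. One possible reading of that suggestion, as a sketch with a hypothetical anyNodeBehind helper operating on plain revision numbers:

```go
package main

import "fmt"

// anyNodeBehind reports whether at least one node's current revision
// differs from latestRevision. Hypothetical helper, not part of the PR.
func anyNodeBehind(revisions []int32, latestRevision int32) bool {
	for _, r := range revisions {
		if r != latestRevision {
			return true
		}
	}
	return false
}

func main() {
	// One node already reached revision 2, the other is still on 1.
	fmt.Println(anyNodeBehind([]int32{2, 1}, 2)) // true
	// Every node is at revision 2.
	fmt.Println(anyNodeBehind([]int32{2, 2}, 2)) // false
}
```

For the input 2, 1 with latest revision 2, the set lookup from the diff would report that the latest revision is present, so the "progressing" branch would be skipped even though one node is still behind.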

}
return fmt.Sprintf("%snone of the nodes reached latest available revision %d", notAvailable, r.latestRevision)
}
if currentRevisions.Equal(sets.NewInt32(r.latestRevision)) {
return fmt.Sprintf("%sall nodes reached revision %d", notAvailable, r.latestRevision)
}
if currentRevisions.Has(r.latestRevision) {
oldRevisionStringList := []string{}
for _, i := range currentRevisions.Difference(sets.NewInt32(r.latestRevision)).List() {
Review comment (Contributor):

Loop over currentRevisions and exclude latestRevision. No need for set arithmetic.
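
A sketch of the loop-based alternative, assuming the input is a plain slice of per-node revisions rather than the sets.Int32 used in the diff. The oldRevisionList name is hypothetical, and the explicit sort stands in for the ordering that the set's List() provides:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// oldRevisionList returns the distinct revisions other than latestRevision,
// sorted ascending and joined with commas. Hypothetical helper.
func oldRevisionList(revisions []int32, latestRevision int32) string {
	seen := map[int32]bool{}
	old := []int32{}
	for _, r := range revisions {
		// Skip the latest revision and anything already collected.
		if r == latestRevision || seen[r] {
			continue
		}
		seen[r] = true
		old = append(old, r)
	}
	sort.Slice(old, func(i, j int) bool { return old[i] < old[j] })

	parts := make([]string, 0, len(old))
	for _, r := range old {
		parts = append(parts, fmt.Sprintf("%d", r))
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(oldRevisionList([]int32{2, 1, 2, 1, 3}, 2)) // prints "1,3"
}
```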

oldRevisionStringList = append(oldRevisionStringList, fmt.Sprintf("%d", i))
}
return fmt.Sprintf("%ssome nodes are on old revisions (%s), %d %s reached latest available revision %d", notAvailable, strings.Join(oldRevisionStringList, ","), latestCount, pluralizedNode(latestCount), r.latestRevision)
}
return ""
}

// setAvailableProgressingConditions sets the Available and Progressing conditions
func setAvailableProgressingNodeInstallerFailingConditions(newStatus *operatorv1.StaticPodOperatorStatus) error {
// Available means that we have at least one pod at the latest level
numAvailable := 0
numAtLatestRevision := 0
numProgressing := 0
numNodes := len(newStatus.NodeStatuses)
counts := map[int32]int{}
failingCount := map[int32]int{}
failing := map[int32][]string{}
var latestRevision int32

for _, currNodeStatus := range newStatus.NodeStatuses {
counts[currNodeStatus.CurrentRevision] = counts[currNodeStatus.CurrentRevision] + 1
if currNodeStatus.CurrentRevision != 0 {
@@ -448,31 +510,26 @@ func setAvailableProgressingNodeInstallerFailingConditions(newStatus *operatorv1
} else {
numProgressing += 1
}
}

revisionStrings := []string{}
for _, currentRevision := range Int32KeySet(counts).List() {
count := counts[currentRevision]
revisionStrings = append(revisionStrings, fmt.Sprintf("%d nodes are at revision %d", count, currentRevision))
}
// if we are progressing and no nodes have achieved that level, we should indicate
if numProgressing > 0 && counts[newStatus.LatestAvailableRevision] == 0 {
revisionStrings = append(revisionStrings, fmt.Sprintf("%d nodes have achieved new revision %d", 0, newStatus.LatestAvailableRevision))
if newStatus.LatestAvailableRevision > latestRevision {
latestRevision = newStatus.LatestAvailableRevision
}
}
revisionDescription := strings.Join(revisionStrings, "; ")

revisionDescription := newRevisionDescriptionPrinter(newStatus.NodeStatuses, latestRevision, numProgressing > 0).String()

if numAvailable > 0 {
v1helpers.SetOperatorCondition(&newStatus.Conditions, operatorv1.OperatorCondition{
Type: condition.StaticPodsAvailableConditionType,
Status: operatorv1.ConditionTrue,
Message: fmt.Sprintf("%d nodes are active; %s", numAvailable, revisionDescription),
Message: revisionDescription,
})
} else {
v1helpers.SetOperatorCondition(&newStatus.Conditions, operatorv1.OperatorCondition{
Type: condition.StaticPodsAvailableConditionType,
Status: operatorv1.ConditionFalse,
Reason: "ZeroNodesActive",
Message: fmt.Sprintf("%d nodes are active; %s", numAvailable, revisionDescription),
Message: revisionDescription,
})
}

@@ -481,22 +538,26 @@ func setAvailableProgressingNodeInstallerFailingConditions(newStatus *operatorv1
v1helpers.SetOperatorCondition(&newStatus.Conditions, operatorv1.OperatorCondition{
Type: condition.NodeInstallerProgressingConditionType,
Status: operatorv1.ConditionTrue,
Message: fmt.Sprintf("%s", revisionDescription),
Message: revisionDescription,
})
} else {
v1helpers.SetOperatorCondition(&newStatus.Conditions, operatorv1.OperatorCondition{
Type: condition.NodeInstallerProgressingConditionType,
Status: operatorv1.ConditionFalse,
Reason: "AllNodesAtLatestRevision",
Message: fmt.Sprintf("%s", revisionDescription),
Message: revisionDescription,
})
}

if len(failing) > 0 {
failingStrings := []string{}
for _, failingRevision := range Int32KeySet(failing).List() {
errorStrings := failing[failingRevision]
failingStrings = append(failingStrings, fmt.Sprintf("%d nodes are failing on revision %d:\n%v", failingCount[failingRevision], failingRevision, strings.Join(errorStrings, "\n")))
if count := failingCount[failingRevision]; count == 1 {
failingStrings = append(failingStrings, fmt.Sprintf("%d/%d node is failing to achieve revision %d:\n%v", failingCount[failingRevision], numNodes, failingRevision, strings.Join(errorStrings, "\n")))
} else {
failingStrings = append(failingStrings, fmt.Sprintf("%d/%d nodes are failing to achieve revision %d:\n%v", failingCount[failingRevision], numNodes, failingRevision, strings.Join(errorStrings, "\n")))
}
}
failingDescription := strings.Join(failingStrings, "; ")

@@ -1307,7 +1307,6 @@ func TestNodeToStartRevisionWith(t *testing.T) {
}

func TestSetConditions(t *testing.T) {

type TestCase struct {
name string
latestAvailableRevision int32
@@ -1316,9 +1315,12 @@ func TestSetConditions(t *testing.T) {
expectedAvailableStatus operatorv1.ConditionStatus
expectedProgressingStatus operatorv1.ConditionStatus
expectedFailingStatus operatorv1.ConditionStatus
expectedAvailableMessage string
expectedPendingMessage string
expectedFailingMessage string
}

testCase := func(name string, available, progressing, failed bool, lastFailedRevision, latest int32, current ...int32) TestCase {
testCase := func(name string, available, progressing, failed bool, lastFailedRevision, latest int32, availableMessage, pendingMessage, failingMessage string, current ...int32) TestCase {
availableStatus := operatorv1.ConditionFalse
pendingStatus := operatorv1.ConditionFalse
expectedFailingStatus := operatorv1.ConditionFalse
@@ -1331,16 +1333,56 @@ func TestSetConditions(t *testing.T) {
if failed {
expectedFailingStatus = operatorv1.ConditionTrue
}
return TestCase{name, latest, lastFailedRevision, current, availableStatus, pendingStatus, expectedFailingStatus}
return TestCase{
name: name,
latestAvailableRevision: latest,
lastFailedRevision: lastFailedRevision,
currentRevisions: current,
expectedAvailableStatus: availableStatus,
expectedProgressingStatus: pendingStatus,
expectedFailingStatus: expectedFailingStatus,
expectedAvailableMessage: availableMessage,
expectedPendingMessage: pendingMessage,
expectedFailingMessage: failingMessage,
}
}

testCases := []TestCase{
testCase("AvailableProgressingDegraded", true, true, true, 1, 2, 2, 1, 2, 1),
testCase("AvailableProgressing", true, true, false, 0, 2, 2, 1, 2, 1),
testCase("AvailableNotProgressing", true, false, false, 0, 2, 2, 2, 2),
testCase("NotAvailableProgressing", false, true, false, 0, 2, 0, 0),
testCase("NotAvailableAtOldLevelProgressing", true, true, false, 0, 2, 1, 1),
testCase("NotAvailableNotProgressing", false, false, false, 0, 2),
testCase("AvailableProgressingDegraded", true, true, true, 1, 2,
"some nodes are on old revisions (1), 2 nodes reached latest available revision 2",
"some nodes are on old revisions (1), 2 nodes reached latest available revision 2",
"4/4 nodes are failing to achieve revision 1:\n",
2, 1, 2, 1),
testCase("AvailableProgressingSingular", true, true, true, 1, 2,
"nodes progressing towards revision 2",
"nodes progressing towards revision 2",
"4/4 nodes are failing to achieve revision 1:\n",
1, 1, 3, 1),
testCase("AvailableProgressing", true, true, false, 0, 2,
"some nodes are on old revisions (1), 2 nodes reached latest available revision 2",
"some nodes are on old revisions (1), 2 nodes reached latest available revision 2",
"",
2, 1, 2, 1),
testCase("AvailableNotProgressing", true, false, false, 0, 2,
"all nodes reached revision 2",
"all nodes reached revision 2",
"",
2, 2, 2),
testCase("NotAvailableProgressing", false, true, false, 0, 2,
"none of the nodes are available",
"none of the nodes are available",
"",
0, 0),
testCase("NotAvailableAtOldLevelProgressing", true, true, false, 0, 2,
"nodes progressing towards revision 2",
"nodes progressing towards revision 2",
"",
1, 1),
testCase("NotAvailableNotProgressing", false, false, false, 0, 2,
"none of the nodes are available",
"none of the nodes are available",
"",
),
}

for _, tc := range testCases {
@@ -1351,28 +1393,45 @@ func TestSetConditions(t *testing.T) {
for _, current := range tc.currentRevisions {
status.NodeStatuses = append(status.NodeStatuses, operatorv1.NodeStatus{CurrentRevision: current, LastFailedRevision: tc.lastFailedRevision})
}
setAvailableProgressingNodeInstallerFailingConditions(status)
if err := setAvailableProgressingNodeInstallerFailingConditions(status); err != nil {
t.Fatalf("unexpected error: %v", err)
}

availableCondition := v1helpers.FindOperatorCondition(status.Conditions, condition.StaticPodsAvailableConditionType)
if availableCondition == nil {
t.Error("Available condition: not found")
} else if availableCondition.Status != tc.expectedAvailableStatus {
return
}
if availableCondition.Status != tc.expectedAvailableStatus {
t.Errorf("Available condition: expected status %v, actual status %v", tc.expectedAvailableStatus, availableCondition.Status)
}
if availableCondition.Message != tc.expectedAvailableMessage {
t.Errorf("expected available message:\n%q\ngot:\n%q\n", tc.expectedAvailableMessage, availableCondition.Message)
}

pendingCondition := v1helpers.FindOperatorCondition(status.Conditions, condition.NodeInstallerProgressingConditionType)
if pendingCondition == nil {
t.Error("Progressing condition: not found")
} else if pendingCondition.Status != tc.expectedProgressingStatus {
return
}
if pendingCondition.Status != tc.expectedProgressingStatus {
t.Errorf("Progressing condition: expected status %v, actual status %v", tc.expectedProgressingStatus, pendingCondition.Status)
}
if pendingCondition.Message != tc.expectedPendingMessage {
t.Errorf("expected pending message:\n%q\ngot:\n%q\n", tc.expectedPendingMessage, pendingCondition.Message)
}

failingCondition := v1helpers.FindOperatorCondition(status.Conditions, condition.NodeInstallerDegradedConditionType)
if failingCondition == nil {
t.Error("Failing condition: not found")
} else if failingCondition.Status != tc.expectedFailingStatus {
return
}
if failingCondition.Status != tc.expectedFailingStatus {
t.Errorf("Failing condition: expected status %v, actual status %v", tc.expectedFailingStatus, failingCondition.Status)
}
if failingCondition.Message != tc.expectedFailingMessage {
t.Errorf("expected failing message:\n%q\ngot:\n%q\n", tc.expectedFailingMessage, failingCondition.Message)
}
})
}
