Update PV and LB to use Unmounted #1363

Sean-Holcomb · 2022-08-22T22:46:22Z

What does this PR change?

This PR fixes a few issues with PV ingestion from Prom data with the goal of achieving Cluster Equality. Cluster Equality in this context means that queries on Allocations and Assets grouped by Cluster have equal totals. Cluster Equality does depend on post-ingestion processes such as Reconciliation and shared tenancy costs; however, the way certain values were being ingested had downstream effects on those processes. The main focus here has been ensuring that PV and LB costs that were not being assigned to Allocations are now applied to an Unmounted Allocation for the cluster. This Unmounted Allocation now has its PVs and Services populated, which allows it to be properly reconciled.
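To make the invariant concrete, here is a minimal sketch of a Cluster Equality check. It is not the actual audit code: `clusterMismatches`, the input maps, the tolerance, and the cluster names are all hypothetical and only illustrate "totals grouped by cluster must match".

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// clusterMismatches is a hypothetical helper: it returns the sorted names of
// clusters whose Allocation total and Asset total differ by more than tol.
// A cluster present in only one map is compared against 0.
func clusterMismatches(allocTotals, assetTotals map[string]float64, tol float64) []string {
	seen := map[string]struct{}{}
	for c := range allocTotals {
		seen[c] = struct{}{}
	}
	for c := range assetTotals {
		seen[c] = struct{}{}
	}

	var out []string
	for c := range seen {
		if math.Abs(allocTotals[c]-assetTotals[c]) > tol {
			out = append(out, c)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Illustrative numbers only: cluster-two has 15 of Asset cost that never
	// reached any Allocation, which is the kind of gap Unmounted should close.
	allocs := map[string]float64{"cluster-one": 100, "cluster-two": 40}
	assets := map[string]float64{"cluster-one": 100, "cluster-two": 55}
	fmt.Println(clusterMismatches(allocs, assets, 0.01))
}
```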

This PR fixes a bug with the interval calculations on PV ingestion. The intervals are a great tool for determining what proportion of a PV's cost should go to the pods attached to it. This is especially true in windows where pods do not exist for the entire interval. The bug is that the time coef (the percent of time that an interval covers) was being calculated off the pod's window rather than off the PV window. For example, given:

PV:     |----|
POD1:   |--|
POD2:      |--|

In this example the PV runs for an hour, Pod1 runs for the first half hour, and Pod2 runs for the second half hour.
With the bug, these coefs are produced:

POD1: {
  Time: 1
  Proportion: 1
},
POD2: {
  Time: 1
  Proportion: 1
}

This says that for 100% of the time POD1 was running it can be attributed 100% of the PV's cost. However, each coefficient is multiplied by the total cost of the PVC, which means 200% of the cost is distributed. To address this, we calculate Time with the PVC window rather than the POD window.

This issue ties in with the larger issue of Unmounted resources because there was no handling of the case where no Pods were running at the same time as a PVC.
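A rough sketch of the unmounted case, assuming per-pod coefficients have already been computed against the PV window: whatever share of the PV's cost is not claimed by a pod goes to an unmounted bucket. `distributePVCost` and the `unmountedKey` value are hypothetical and are not the identifiers used in costmodel/allocation.go.

```go
package main

import "fmt"

// unmountedKey is a placeholder key for this sketch only.
const unmountedKey = "unmounted"

// distributePVCost splits totalCost across pods by coefficient and assigns
// any remaining share to unmountedKey. With no pods at all, the whole cost
// lands on unmountedKey.
func distributePVCost(totalCost float64, podCoefs map[string]float64) map[string]float64 {
	costs := make(map[string]float64, len(podCoefs)+1)
	covered := 0.0
	for pod, coef := range podCoefs {
		costs[pod] = totalCost * coef
		covered += coef
	}
	if remainder := 1 - covered; remainder > 0 {
		costs[unmountedKey] = totalCost * remainder
	}
	return costs
}

func main() {
	// No pod was attached during the window: everything is unmounted.
	fmt.Println(distributePVCost(10, nil))

	// One pod was attached for a quarter of the PV's window.
	fmt.Println(distributePVCost(10, map[string]float64{"pod1": 0.25}))
}
```

Either way the per-PV costs sum back to the PV's total, which is what Cluster Equality needs.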

Does this PR relate to any other PRs?

How will this PR impact users?

Does this PR address any GitHub or Zendesk issues?

Closes ...

How was this PR tested?

Unit tests for Interval have been updated, but more unit testing is needed for the other helper funcs in costmodel/allocation.go.

Does this PR require changes to documentation?

Have you labeled this PR and its corresponding Issue as "next release" if it should be part of the next Opencost release? If not, why not?

Sean-Holcomb · 2022-08-22T22:49:17Z

pkg/kubecost/allocation.go

@@ -30,7 +30,7 @@ const SharedSuffix = "__shared__"
 // UnallocatedSuffix indicates an unallocated allocation property
 const UnallocatedSuffix = "__unallocated__"

-// UnmountedSuffix indicated allocation to an unmounted PV


One of the big changes in this PR is that an Unmounted allocation can now carry LB cost for LBs whose selector matches no pod in the given window.

Sean-Holcomb · 2022-08-22T22:54:40Z

pkg/costmodel/allocation.go

@@ -2724,7 +2708,6 @@ func (p Pod) AppendContainer(container string) {
 // TODO:CLEANUP add PersistentVolumeClaims field to type Allocation?
 type PVC struct {
 	Bytes     float64   `json:"bytes"`
-	Count     int       `json:"count"`


count is replaced by the coefficient calculated with intervals and the number of allocations in a pod

Sean-Holcomb · 2022-08-22T23:24:23Z

pkg/costmodel/allocation.go

-					gib := pvc.Bytes / 1024 / 1024 / 1024
-					cost := pvc.Volume.CostPerGiBHour * gib * hrs
+				pvcPodWindowMap[thisPVCKey][thisPodKey] = kubecost.NewWindow(&s, &e)
+			}
+		}
+	}

-					// Scale PV cost by PVC sharing coefficient.
-					if coeffComponents, ok := sharedPVCCostCoefficientMap[pvcKey][podKey]; ok {
-						cost *= getCoefficientFromComponents(coeffComponents)
-					} else {
-						log.Warnf("CostModel.ComputeAllocation: allocation %s and PVC %s have relation but no coeff", alloc.Name, pvc.Name)
-					}


This is where the total PVC cost is being multiplied by Coefficients which could sum to greater than 100%

Sean-Holcomb · 2022-08-23T00:11:25Z

pkg/costmodel/allocation.go

-						Name:    pvc.Volume.Name,
-					}
-					alloc.PVs[pvKey] = &kubecost.PVAllocation{
-						ByteHours: pvc.Bytes * hrs / count,


Not multiplying by coef here can result in reconciliation giving more cost to this allocation than it should actually get.
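As a small illustration of the point (not the code in this PR), both the cost and the byte-hours of a PV allocation can be scaled by the same coefficient so that reconciliation sees consistent proportions. `pvShare` and its parameters are hypothetical; the GiB conversion mirrors the removed lines above.

```go
package main

import "fmt"

// pvShare is a hypothetical helper that returns the cost and byte-hours
// attributed to one allocation, both scaled by the same coefficient.
func pvShare(bytes, costPerGiBHour, hrs, coef float64) (cost, byteHours float64) {
	gib := bytes / 1024 / 1024 / 1024
	cost = costPerGiBHour * gib * hrs * coef
	byteHours = bytes * hrs * coef
	return cost, byteHours
}

func main() {
	// A 2 GiB volume at 0.5 per GiB-hour over 1 hour, with a coefficient of 0.5.
	cost, byteHours := pvShare(2*1024*1024*1024, 0.5, 1, 0.5)
	fmt.Println(cost, byteHours)
}
```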

nikovacevic

I think this looks good, though I will admit that this is a challenging review. Is there any way I can help with additional testing to make sure that this is all rock solid?

nikovacevic · 2022-08-31T18:04:49Z

pkg/kubecost/audit.go

+	case string(AuditClusterEquality):
+		return AuditClusterEquality


It's nice to have this to rely on -- I'm assuming that uncommenting this means that it passes now?

It passes our tests and on my cluster. Given everything that I have seen in the Costmodel in the last couple of weeks, I don't think it will pass in all situations, but I think it is a good audit to have in that it lets us know our shortcomings on this invariant.

nikovacevic · 2022-08-31T18:06:49Z

pkg/costmodel/cluster.go

@@ -155,7 +155,7 @@ func ClusterDisks(client prometheus.Client, provider cloud.Provider, start, end
 	ctx := prom.NewNamedContext(client, prom.ClusterContextName)
 	queryPVCost := fmt.Sprintf(`avg(avg_over_time(pv_hourly_cost[%s])) by (%s, persistentvolume,provider_id)`, durStr, env.GetPromClusterLabel())
 	queryPVSize := fmt.Sprintf(`avg(avg_over_time(kube_persistentvolume_capacity_bytes[%s])) by (%s, persistentvolume)`, durStr, env.GetPromClusterLabel())
-	queryActiveMins := fmt.Sprintf(`count(pv_hourly_cost) by (%s, persistentvolume)[%s:%dm]`, env.GetPromClusterLabel(), durStr, minsPerResolution)
+	queryActiveMins := fmt.Sprintf(`avg(kube_persistentvolume_capacity_bytes) by (%s, persistentvolume)[%s:%dm]`, env.GetPromClusterLabel(), durStr, minsPerResolution)


Any risks to swapping this? And what exactly are the benefits? I can't think of any, in either direction, but want to check in the interest of being super careful.

This is in regard to https://github.com/kubecost/kubecost-cost-model/pull/927. I suspect that the use of pv_hourly_cost for our active mins query is causing this log when kubecost is down and not emitting the metric but KSM still is. I might be totally off base on this, but I think this change is low risk and may address the problem.
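For reference, this is roughly what the new query string expands to. The sketch only reproduces the `fmt.Sprintf` call from the diff; `activeMinsQuery` is a hypothetical wrapper, and the `cluster_id` label, `1d` duration, and 5-minute resolution are assumed values standing in for `env.GetPromClusterLabel()`, `durStr`, and `minsPerResolution`.

```go
package main

import "fmt"

// activeMinsQuery builds the PV active-minutes subquery from the KSM
// capacity metric instead of pv_hourly_cost.
func activeMinsQuery(clusterLabel, durStr string, minsPerResolution int) string {
	return fmt.Sprintf(
		`avg(kube_persistentvolume_capacity_bytes) by (%s, persistentvolume)[%s:%dm]`,
		clusterLabel, durStr, minsPerResolution,
	)
}

func main() {
	fmt.Println(activeMinsQuery("cluster_id", "1d", 5))
}
```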

nikovacevic · 2022-08-31T18:09:49Z

pkg/costmodel/allocation.go

 	for _, pod := range podMap {
 		for _, alloc := range pod.Allocations {
 			cluster := alloc.Properties.Cluster
 			nodeName := alloc.Properties.Node
 			namespace := alloc.Properties.Namespace
-			pod := alloc.Properties.Pod
+			podName := alloc.Properties.Pod


nikovacevic · 2022-08-31T18:10:47Z

pkg/costmodel/allocation.go

-			return err
-		}
+// LB describes the start and end time of a Load Balancer along with cost
+type LB struct {


Nit, but we might not want to export this. Seems reasonable for internal use.

michaelmdresser

Okay, I think I'm on board with the idea here: allocating PV cost to Allocations based on the amount of time the Allocation attached the PV via PVC. I sort of bias towards Niko's (?) idea of allocating based on weighted cost, but ultimately I think having proper aggregation equality is more important.

I like the refactoring, too. Hopefully it'll be easier to navigate computeAllocation() logic in the future!

Sean-Holcomb · 2022-09-02T05:24:24Z

I too would have advised against the interval approach if I had known about it at the start, but it seemed like a waste to throw it out at this point rather than fix it. The accuracy-to-complexity trade-off probably isn't worth it, though.

michaelmdresser · 2022-09-02T13:25:44Z

Yep, totally makes sense!

Signed-off-by: Sean Holcomb <seanholcomb@gmail.com>

Sean-Holcomb force-pushed the sean/cost-model-alloc-pv-lb branch from 673232d to 3a63202 Compare August 22, 2022 23:48

Sean-Holcomb requested review from michaelmdresser and mbolt35 August 23, 2022 00:13

Sean-Holcomb force-pushed the sean/cost-model-alloc-pv-lb branch from 3a63202 to 6cbded8 Compare August 31, 2022 18:03

nikovacevic approved these changes Aug 31, 2022

Sean-Holcomb force-pushed the sean/cost-model-alloc-pv-lb branch from 6cbded8 to d01b648 Compare September 1, 2022 01:57

michaelmdresser approved these changes Sep 1, 2022

Fix intervals, handle unmounted PVC and LBs

0fabcbf

Signed-off-by: Sean Holcomb <seanholcomb@gmail.com>

Sean-Holcomb force-pushed the sean/cost-model-alloc-pv-lb branch from d01b648 to 0fabcbf Compare September 6, 2022 04:36

Sean-Holcomb merged commit 221141a into develop Sep 6, 2022

Sean-Holcomb added the v1.97 label Sep 6, 2022

Adam-Stack-PM added the enhancement New feature or request label Sep 19, 2022

michaelmdresser deleted the sean/cost-model-alloc-pv-lb branch June 23, 2023 19:10
