Don't add gpu pricing when no gpu #1193

Merged
merged 1 commit into from
May 6, 2022
@dramich dramich commented May 4, 2022

What does this PR change?

  • When building node pricing, no longer add GPU costs if the node does not have an associated GPU
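
A minimal sketch of the guard described above, assuming a per-node GPU count is already available. Only `gpuCostMap`, `key`, and `gpuCost` appear in this PR's diff; `nodeKey`, `buildGPUCostMap`, and the `gpuCounts` input are hypothetical names used for illustration and are not the project's actual code.

```go
package main

import "fmt"

// nodeKey is a hypothetical node identifier; the real key type is not shown in this PR.
type nodeKey struct {
	Cluster string
	Name    string
}

// buildGPUCostMap assigns the custom GPU cost only to nodes that report at
// least one GPU. Nodes with a count of zero get a GPU cost of zero.
// gpuCounts is an assumed input mapping each node to its detected GPU count.
func buildGPUCostMap(gpuCounts map[nodeKey]float64, gpuCost float64) map[nodeKey]float64 {
	gpuCostMap := make(map[nodeKey]float64, len(gpuCounts))
	for key, count := range gpuCounts {
		if count > 0 {
			gpuCostMap[key] = gpuCost
		} else {
			gpuCostMap[key] = 0
		}
	}
	return gpuCostMap
}

func main() {
	cpuNode := nodeKey{Cluster: "c1", Name: "cpu-node"}
	gpuNode := nodeKey{Cluster: "c1", Name: "gpu-node"}

	// 0.95 is an illustrative custom price, not a value from the PR.
	costs := buildGPUCostMap(map[nodeKey]float64{cpuNode: 0, gpuNode: 1}, 0.95)

	fmt.Println("cpu-node GPU cost:", costs[cpuNode])
	fmt.Println("gpu-node GPU cost:", costs[gpuNode])
}
```

With a guard like this, a node without GPUs contributes no GPU cost, so no GPU idle cost can be derived from it.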

How will this PR impact users?

  • Fix idle cost being associated with nodes that do not have a GPU when using custom pricing

Does this PR address any GitHub or Zendesk issues?

How was this PR tested?

  • Running in a custom cluster before and after the change
  • Running in an EKS cluster with two non-GPU nodes and a single GPU node. Tested before and after the fix to confirm that idle time was no longer attributed to the nodes without a GPU and that the GPU node still showed idle time.

Does this PR require changes to documentation?

  • n/a

Have you labeled this PR and its corresponding Issue as "next release" if it should be part of the next Kubecost release? If not, why not?

@dramich dramich changed the title Don't add gpu pricing when no gpu [WIP] Don't add gpu pricing when no gpu May 4, 2022
gpuCostMap[key] = gpuCost
gpuCostMap[key] = 0
Contributor

Is it possible to be in a situation with custom pricing in which:

  1. A node's GPU cost is (intentionally) set to something > 0
  2. Our GPU count query doesn't pick up the node's GPU, possibly because the node metadata is set up incorrectly or our tracking of node metadata is incomplete

I think this change would cause the node's GPU cost to be 0 when it shouldn't be.


But perhaps that's okay and we should just be setting count correctly! If that's the case, then the code change you've made is perfect.

Contributor Author

That was one of the scenarios I was considering, and I think that if we can't or didn't count the GPUs correctly, pricing would be wrong anyway. At least if the cost shows up as zero and you know the node has a GPU, that is a better indication that something is wrong than showing incorrect pricing, for example showing the cost of 1 GPU on a node that has 10.
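
The trade-off above can be illustrated with a small arithmetic sketch. It assumes a node's GPU cost scales linearly with the detected GPU count, which this thread does not confirm; `hourlyGPUCost` and the price used are hypothetical.

```go
package main

import "fmt"

// hourlyGPUCost is a hypothetical helper. It assumes the node's GPU cost is
// the per-GPU price multiplied by the number of GPUs that were detected, and
// that a node with no detected GPUs gets no GPU cost at all.
func hourlyGPUCost(pricePerGPU float64, detectedGPUs int) float64 {
	if detectedGPUs <= 0 {
		return 0
	}
	return pricePerGPU * float64(detectedGPUs)
}

func main() {
	const price = 2.0 // illustrative custom price per GPU-hour

	// The node really has 10 GPUs; compare three detection outcomes.
	fmt.Println("all 10 detected:", hourlyGPUCost(price, 10))
	fmt.Println("only 1 detected:", hourlyGPUCost(price, 1))
	fmt.Println("none detected:  ", hourlyGPUCost(price, 0))
}
```

Under these assumptions, an undercount produces a plausible but wrong figure, while a missed detection produces zero, which is easier to spot on a node known to have GPUs.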

Contributor

I'm on board with that reasoning.

Contributor

This looks good to me. Let's just make sure that @AjayTripathy is aware.

Contributor

I think this is fine, but let's not snap this into the release, since I'm not sure of all the implications and we don't have good GPU integration tests. Does that sound good to everyone? Let's take a follow-on task to add a GPU integration test for GCP and AWS; we'll have a month to get confident here.

@dramich dramich added the v1.94 label May 5, 2022
@dramich dramich changed the title [WIP] Don't add gpu pricing when no gpu Don't add gpu pricing when no gpu May 5, 2022
@dramich dramich marked this pull request as ready for review May 5, 2022 23:11
@dramich dramich merged commit e8ecd57 into develop May 6, 2022
@dramich dramich deleted the ramich-gpu branch June 21, 2022 16:48
Successfully merging this pull request may close these issues.

Allocation API has GPU costs when GPU count is 0
4 participants