
[Estimates on VMs] - Improving vHost-Ratio splitting #4

Closed
ArneTR opened this issue Jul 1, 2023 · 2 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed), question (Further information is requested)

Comments

@ArneTR
Member

ArneTR commented Jul 1, 2023

At the moment the model uses the vhost-ratio parameter to split the energy in a virtualized system.

An example:

  • A machine might have 40 cores, but we are only assigned one
  • We can then measure a CPU utilization of 100%
  • We have a vhost-ratio parameter of 1/40
  • Then the result will be whatever the power draw of the full machine is, divided by 40

This is how the model currently accounts for running in a virtualized system.
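A minimal sketch of this splitting logic, assuming an illustrative `model_power` function that maps machine-level utilization (0.0-1.0) to full-machine watts; the names are hypothetical and not the repository's actual API:

```python
from typing import Callable

def vm_power_current(model_power: Callable[[float], float],
                     vm_utilization: float,
                     vhost_ratio: float) -> float:
    """Current behaviour: feed the VM's own utilization to the model as if the
    whole machine were loaded that way, then split the result by the vhost-ratio."""
    return model_power(vm_utilization) * vhost_ratio

def dummy_model(utilization: float) -> float:
    """Placeholder linear model for illustration only: 20 W idle, 200 W at full load."""
    return 20.0 + 180.0 * utilization

# Example from above: 40-core machine, 1 core assigned, VM core at 100%
print(vm_power_current(dummy_model, vm_utilization=1.0, vhost_ratio=1 / 40))  # -> 5.0 W
```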

However, the implicit assumption that the whole machine runs at the VM's utilization is most likely wrong. The machine as a whole is most likely at 20-70% utilization, as is typical with cloud vendors.
[Image: VMs-are-utilized-only-10-percent (OpenHPI / VMware talk)]

The picture from VMware shows non-hyperscaler data centers; hyperscalers, however, report a higher utilization.

I propose introducing a new variable, the "bare-metal-utilization", and then using the vhost-ratio as a factor to shift it slightly.

An example:

  • If we have a machine where we are assigned 1 core out of 40
  • and we assume we are on Google
  • then we set the bare-metal-utilization to 0.5, which means 50%
  • When the measured utilization on our core is now 100%, we request from the model the power at a utilization of 0.5 + 1/40 = 0.525 (see the sketch below)
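A minimal sketch of the proposed blending, under the same illustrative assumptions as the sketch above (a hypothetical `model_power` callable mapping machine utilization to full-machine watts, not the repository's actual API):

```python
from typing import Callable

def vm_power_proposed(model_power: Callable[[float], float],
                      vm_utilization: float,
                      vhost_ratio: float,
                      bare_metal_utilization: float = 0.5) -> float:
    """Proposed behaviour: assume a background load on the host, shift that
    operating point by the VM's own share, then split by the vhost-ratio."""
    machine_utilization = bare_metal_utilization + vhost_ratio * vm_utilization
    return model_power(machine_utilization) * vhost_ratio

# Example from above: 1 of 40 cores, VM core at 100%, assumed 50% background load
# -> the model is queried at 0.5 + 1/40 = 0.525 instead of 1.0.
```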

The downside of that approach is that it adds yet another assumption. However, assuming that no one else is on the machine, as we did before, is most likely wrong.

On a machine with a very high core count the resulting values will then change only in very small increments, which is probably closer to reality, but it also gives users less of an incentive to reduce CPU consumption, as the effects are smaller.

This is an idea I would like to discuss, especially if there are logical errors in it ...

@ArneTR ArneTR added the enhancement, help wanted and question labels on Jul 1, 2023
@ArneTR
Member Author

ArneTR commented Jul 1, 2023

Added to this: here are some actual measurements from a 48-core machine we have:

  • Power draw in idle: 17.60 W
  • Power draw with one core on 100%: 65.5 W
  • Power draw with 24 cores on 100%: 181.57 W
  • Power draw with 25 cores on 100%: 186.18 W
  • Power draw with 48 cores on 100%: 214.62 W

As we can see, the power draw in idle is greatly reduced and the curve is strongly non-linear. However, this is not what we wanted to look at here.

If the model estimation were spot on for the bare-metal machine, but we now virtualized it and assigned ourselves one core, then we would guess 214.62 W and divide by 48, which equals 4.47 W.

If we use the method proposed here and set an operating point (bare-metal-utilization), we would guess 186.18 W and divide it by 48, which equals 3.88 W.
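As a quick arithmetic check of the two estimates above (the wattages are the measurements listed in this comment; dividing by 48 corresponds to the vhost-ratio of a single core):

```python
vhost_ratio = 1 / 48

# Current approach: model assumed spot on at 100% bare-metal load (214.62 W).
current_estimate = 214.62 * vhost_ratio    # ~4.47 W

# Proposed approach: ~50% operating point (186.18 W measured at 25 of 48 cores).
proposed_estimate = 186.18 * vhost_ratio   # ~3.88 W

print(f"{current_estimate:.2f} W vs. {proposed_estimate:.2f} W")  # 4.47 W vs. 3.88 W
```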

@ArneTR
Member Author

ArneTR commented Jul 24, 2023

Another idea on this topic:

If you assume the host machine is typically loaded at 50%, you are likely to be more correct on average, given that cloud vendors typically operate in this region.

But if you do not make that assumption and instead expose the whole spectrum, as this model currently does, then you are also more incentivized to use less, and on average you will likely see the same result as in the 50% case, because sometimes you are actually on a lightly loaded machine and sometimes you are not.

But the penalty for high CPU usage is bigger.

In general the question arises whether you want more reproducible results from this model (which is better for benchmarking and for quantifying your own improvements to the code) or results that are closer to what your code would actually consume in the cloud.

Both cases are valid and maybe both should be an option ... however, I believe that this distinction is quite complex for beginners to understand, and it might be better to be opinionated ...?

@green-coding-solutions green-coding-solutions locked and limited conversation to collaborators Aug 13, 2023
@ArneTR ArneTR converted this issue into discussion #5 Aug 13, 2023
