Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix platform power return unit #1468

Merged
merged 1 commit into from
May 24, 2024

Conversation

sunya-ch
Copy link
Collaborator

@sunya-ch sunya-ch commented May 23, 2024

This PR is to fix the issue regarding platform power reported in the slack channel as well as my direct experience in #1466.

I have found the different handling between component power and platform power in the model package.
Power model return power in joule unit in float. However, we handle the power in millijoule in Kepler.
Component power has been converted correctly by GetComponentPower in utils module. On the other hand, platform power is used as-is.

After fix:
Screenshot 2024-05-23 at 23 22 20

As seen in the above validation, there are still more points to investigate or fix.
I have stressed 70% of CPU in a single CPU (it supposes to use 700ms).
The most critical part is the BPF CPU time. As seen above, the total CPU time is much higher than stress time.
However, as seen in the top two panel which are the power meters, the power is significantly increased at the stress time.

Another point need concern is the model that are being used. The default one is 4 cores which produces maximum power at around 200W. In the above result, I changed to use 32 cores model. We need to change default power model to the largest one on the estimator case.

Signed-off-by: Sunyanan Choochotkaew sunyanan.choochotkaew1@ibm.com

Copy link

github-actions bot commented May 23, 2024

🤖 SeineSailor

Here is a concise summary of the pull request changes:

Summary: This pull request addresses issues related to platform power handling in the Kepler project, focusing on consistent energy unit handling and correction of power estimation functions.

Key Modifications:

  1. Logging and unit conversion: Added logging statements for idle and dynamic energy stats, and corrected the conversion of energy units from millijoules to joules.
  2. Constant renaming: Renamed MiliJouleToJoule to MilliJouleToJoule to maintain consistency.
  3. Power estimation function updates: Modified GetPlatformPower functions to return platform power in millijoules instead of joules, and updated return types from []float64 to []uint64.
  4. New utility function: Introduced utils.GetPlatformPower to convert power values from floats to uint64.

Impact: These changes ensure consistent energy unit handling across components and platforms, aligning with the rest of the Kepler project. The updates to power estimation functions and the introduction of a new utility function improve the accuracy and reliability of power handling in the codebase.

Observations/Suggestions:

  • It would be beneficial to include additional tests to verify the correctness of the updated power estimation functions and ensure that the changes do not introduce any regressions.
  • Consider adding documentation or comments to explain the reasoning behind the changes and the importance of consistent energy unit handling in the Kepler project.
  • The pull request could be further improved by breaking down the changes into smaller, more focused commits, making it easier to review and understand the individual modifications.

Copy link
Collaborator

@dave-tucker dave-tucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of nits, but overall LGTM

@@ -31,7 +31,7 @@ var (
var _ = Describe("Test Exponential Predictor Unit", func() {
It("Get Node Platform Power By Exponential Regression", func() {
powers := GetNodePlatformPowerFromDummyServer(dummyExponentialWeightHandler, types.ExponentialTrainer)
Expect(int(powers[0])).Should(BeEquivalentTo(4))
Expect(int(powers[0]/1000) * 1000).Should(BeEquivalentTo(4000))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the *1000 and /1000 cancel out?

Copy link
Collaborator Author

@sunya-ch sunya-ch May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are to ignore the leftover from exponential, logarithm, and logistic function but still keep the unit to millijoule.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may make sense to break this out into a test helper function with a comment on what it's doing then.

@@ -31,7 +31,7 @@ var (
var _ = Describe("Test Logarithmic Predictor Unit", func() {
It("Get Node Platform Power By Logarithmic Regression", func() {
powers := GetNodePlatformPowerFromDummyServer(dummyLogarithmicWeightHandler, types.LogarithmicTrainer)
Expect(int(powers[0])).Should(BeEquivalentTo(2))
Expect(int(powers[0]/1000) * 1000).Should(BeEquivalentTo(2000))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@@ -31,7 +31,7 @@ var (
var _ = Describe("Test Logistic Predictor Unit", func() {
It("Get Node Platform Power By Logistic Regression", func() {
powers := GetNodePlatformPowerFromDummyServer(dummyLogisticWeightHandler, types.LogisticTrainer)
Expect(int(powers[0])).Should(BeEquivalentTo(2))
Expect(int(powers[0]/1000) * 1000).Should(BeEquivalentTo(2000))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here

pkg/collector/stats/stats.go Show resolved Hide resolved
@@ -24,7 +24,7 @@ const (
MetricsNamespace = "kepler"
EnergyMetricNameSuffix = "_joules_total"
UsageMetricNameSuffix = "_total"
MiliJouleToJoule = 1000
MilliJouleToJoule = 1000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This could be used to convert in either direction depending on the operator used. So it may be better to rename something like JouleMillijouleConversionFactor 🤷 naming is hard.

@@ -104,7 +104,7 @@ func (r *RatioPowerModel) GetPlatformPower(isIdlePower bool) ([]float64, error)
} else {
processPower = r.getPowerByRatio(processIdx, int(PlatformUsageMetric), int(PlatformDynPower), numProcesses)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may make more sense here to fix the return value of r.getPowerByRatio given the float is immediately converted to uint64 here.

@@ -31,7 +31,7 @@ var (
var _ = Describe("Test Exponential Predictor Unit", func() {
It("Get Node Platform Power By Exponential Regression", func() {
powers := GetNodePlatformPowerFromDummyServer(dummyExponentialWeightHandler, types.ExponentialTrainer)
Expect(int(powers[0])).Should(BeEquivalentTo(4))
Expect(int(powers[0]/1000) * 1000).Should(BeEquivalentTo(4000))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may make sense to break this out into a test helper function with a comment on what it's doing then.

@@ -21,7 +21,7 @@ import (
)

const (
jouleToMiliJoule = 1000
jouleToMilliJoule = 1000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use the same constant as eariler?

@sunya-ch
Copy link
Collaborator Author

sunya-ch commented May 24, 2024

@dave-tucker Thank you so much for the review.
I pushed the changes regarding your review. Please confirm here.

[edit] add minor go-lint fix: here

Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>
Copy link
Collaborator

@dave-tucker dave-tucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rootfs rootfs merged commit ada7884 into sustainable-computing-io:main May 24, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants