implement model-based power estimator #104
Thank you @sunya-ch for this impressive work! I wonder how much CPU and memory the estimator will consume. Do you have any data?
pkg/model/estimate.go
func (e *Estimator) GetPower(modelName string, xCols []string, xValues [][]float32, corePower, dramPower, gpuPower, otherPower []float32) []float32 {
Does GetPower take metrics from all Pods as input and return the power consumption in a batch?
Power consumption is reported as a list with one entry per RAPL package, while xCols and xValues refer to the metric values read per pod, such as cpu_cycles (for all pods at the same time).
For example,
at ticker t,
RAPL pkg0 - core = 20, dram = 5, gpu = unknown, other = unknown
RAPL pkg1 - core = 30, dram = 1, gpu = unknown, other = unknown
There are 3 pods
Pod A: cpu_cycles = 100, cache_miss=1
Pod B: cpu_cycles = 50, cache_miss=0
Pod C: cpu_cycles = 10, cache_miss=0
xCols = [cpu_cycles, cache_miss]
xValues = [[100, 1], [50, 0], [10, 0]]
corePower = [20, 30]
dramPower = [5, 1]
gpuPower = []
otherPower = []
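The example above can be written out in Go as a rough sketch. The `PodMetrics` type and the `buildInputs` helper are hypothetical names used only for illustration and are not part of the PR; only the shapes of xCols, xValues, corePower and dramPower come from the comment.

```go
package main

import "fmt"

// PodMetrics is a hypothetical per-pod reading; it is not a Kepler type.
type PodMetrics struct {
	Name   string
	Values map[string]float32 // metric name -> value read at this tick
}

// buildInputs flattens per-pod readings into the layout from the example:
// one row per pod, one column per metric name, in xCols order.
func buildInputs(xCols []string, pods []PodMetrics) [][]float32 {
	xValues := make([][]float32, 0, len(pods))
	for _, p := range pods {
		row := make([]float32, len(xCols))
		for i, col := range xCols {
			row[i] = p.Values[col] // a metric missing from the map becomes 0
		}
		xValues = append(xValues, row)
	}
	return xValues
}

func main() {
	xCols := []string{"cpu_cycles", "cache_miss"}
	pods := []PodMetrics{
		{Name: "A", Values: map[string]float32{"cpu_cycles": 100, "cache_miss": 1}},
		{Name: "B", Values: map[string]float32{"cpu_cycles": 50, "cache_miss": 0}},
		{Name: "C", Values: map[string]float32{"cpu_cycles": 10, "cache_miss": 0}},
	}
	xValues := buildInputs(xCols, pods)

	// One entry per RAPL package; unknown components stay empty.
	corePower := []float32{20, 30}
	dramPower := []float32{5, 1}
	var gpuPower, otherPower []float32

	fmt.Println(xCols, xValues, corePower, dramPower, len(gpuPower), len(otherPower))
}
```

Treating a missing metric as 0 is a choice made for this sketch; the PR does not say how absent readings are handled.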
pkg/model/estimate.go
type PowerRequest struct {
	ModelName string `json:"model_name"`
	XCols []string `json:"x_cols"`
	XValues [][]float32 `json:"x_values"`
What are XCols and XValues?
It might be nice to have more meaningful names or some comments here.
You have it in the PR description, and it might be good to have it in the code as well.
Please see the example above.
About naming, how about:
XCols --> MetricNames
XValues --> MetricValuesOfAllPods
sgtm
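As a sketch of how the agreed names could look with the comments the reviewer asked for: the field names follow the proposal above, and the `model_name` and `metrics` tags appear in the diff excerpts of this PR, but the `values` tag and the `encodeRequest` helper are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PowerRequest is a sketch of the renamed request, not the merged code.
type PowerRequest struct {
	// ModelName selects which model the estimator should apply.
	ModelName string `json:"model_name"`
	// MetricNames lists the metric columns, e.g. cpu_cycles, cache_miss.
	MetricNames []string `json:"metrics"`
	// MetricValuesOfAllPods holds one row per pod, ordered like MetricNames.
	// The "values" tag is an assumption; the thread does not show it.
	MetricValuesOfAllPods [][]float32 `json:"values"`
}

// encodeRequest serializes a request to the JSON form sent to the estimator.
func encodeRequest(req PowerRequest) (string, error) {
	b, err := json.Marshal(req)
	return string(b), err
}

func main() {
	out, err := encodeRequest(PowerRequest{
		MetricNames:           []string{"cpu_cycles", "cache_miss"},
		MetricValuesOfAllPods: [][]float32{{100, 1}, {50, 0}, {10, 0}},
	})
	fmt.Println(out, err)
}
```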
That is great. I really want to have different Power Models, especially to bring back the Power Model based on Ratio.
In the current implementation, I treat the ratio approach the same as the trained approach and consider it a model.
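The thread does not spell out how the ratio is computed, so the following is only a minimal sketch of one possible reading: a measured power value is split across pods in proportion to a single metric. The `ratioPower` function and the choice of cpu_cycles as the metric are assumptions.

```go
package main

import "fmt"

// ratioPower splits totalPower across pods in proportion to one metric
// value per pod. This is an assumed form of a ratio model, not Kepler's.
func ratioPower(totalPower float32, metric []float32) []float32 {
	out := make([]float32, len(metric))
	var sum float32
	for _, v := range metric {
		sum += v
	}
	if sum == 0 {
		return out // nothing to attribute; every pod gets 0
	}
	for i, v := range metric {
		out[i] = totalPower * v / sum
	}
	return out
}

func main() {
	// Core power summed over both packages (20 + 30), split by the
	// cpu_cycles of pods A, B and C from the earlier example.
	fmt.Println(ratioPower(50, []float32{100, 50, 10}))
}
```

Treating this as "a model" means it can sit behind the same interface as a trained model, which is the point made in the comment above.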
I'm wondering if, instead of calling the Python code, we could have a micro-service running as a Power Model Server (which would run the Python code) that we access through an API (HTTP or gRPC)... @rootfs was creating an external server to do something like this, right? A server that receives some data, does some calculations, and responds with some information.
I have no experimental data yet. It passes the feature values through the unix domain socket and just applies the mathematical model to them for estimation. The training process is not included here.
We will also need some documentation describing how to configure the models, with details about the supported models.
The external server is for training the model, and not all data should be sent to it. This module is called for prediction on every data read. I think it might be better to use a local socket instead of going through the network to a microservice.
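To make the local-socket idea concrete, here is a minimal sketch of a JSON round trip over a unix domain socket. The message types, the socket file name, and the Go stand-in server (which just sums each row) are all assumptions so that the example runs on its own; in the PR the other end of the socket is the Python estimator.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// request and response are hypothetical wire formats for this sketch.
type request struct {
	MetricNames []string    `json:"metrics"`
	Values      [][]float32 `json:"values"`
}

type response struct {
	Powers []float32 `json:"powers"`
}

// serveOnce accepts one connection and answers with per-row sums,
// a stand-in for applying a real model.
func serveOnce(l net.Listener) {
	conn, err := l.Accept()
	if err != nil {
		return
	}
	defer conn.Close()
	var req request
	if err := json.NewDecoder(conn).Decode(&req); err != nil {
		return
	}
	resp := response{Powers: make([]float32, len(req.Values))}
	for i, row := range req.Values {
		for _, v := range row {
			resp.Powers[i] += v
		}
	}
	_ = json.NewEncoder(conn).Encode(resp)
}

// estimate sends one request over the socket and waits for the reply.
func estimate(socketPath string, req request) ([]float32, error) {
	conn, err := net.Dial("unix", socketPath)
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	if err := json.NewEncoder(conn).Encode(req); err != nil {
		return nil, err
	}
	var resp response
	if err := json.NewDecoder(conn).Decode(&resp); err != nil {
		return nil, err
	}
	return resp.Powers, nil
}

// roundTrip starts the stand-in server on a temporary socket and queries it.
func roundTrip(req request) ([]float32, error) {
	dir, err := os.MkdirTemp("", "estimator")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	socketPath := filepath.Join(dir, "estimator.sock")
	l, err := net.Listen("unix", socketPath)
	if err != nil {
		return nil, err
	}
	defer l.Close()
	go serveOnce(l)
	return estimate(socketPath, req)
}

func main() {
	powers, err := roundTrip(request{
		MetricNames: []string{"cpu_cycles", "cache_miss"},
		Values:      [][]float32{{100, 1}, {50, 0}, {10, 0}},
	})
	fmt.Println(powers, err)
}
```

Opening a new connection per request keeps the sketch short; a long-lived connection would likely be cheaper when the call happens on every read.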
If we need to use Python, I will strongly argue for having it run as a different service. Otherwise, we could consider using a machine learning library in Go.
pkg/model/estimate.go
func (e *Estimator) GetPower(modelName string, xCols []string, xValues [][]float32, corePower, dramPower, gpuPower, otherPower []float32) []float32 {
I did not find where this function is called.
Will it replace some code in pkg/collector/reader.go?
No, I haven't replaced it in reader.go yet.
I need to convert the current PodEnergy to xCols and xValues, and the package power to corePower, dramPower, and so on.
Ahhh ok, this PR is draft then!
I think at this step a unix domain socket should be fine, because it only applies the trained model (it does no training or anything fancy beyond reading the trained weights and multiplying them with the read data). I will evaluate the end-to-end power estimation time per tick.
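"Reading the trained weights and multiplying them with the read data" can be illustrated with a plain linear model. This is a sketch under the assumption of one weight per metric plus a bias; the `linearPower` function and the weight values are made up for the example and are not taken from the estimator.

```go
package main

import "fmt"

// linearPower applies hypothetical trained weights to each pod's row:
// power = bias + sum over j of weights[j] * row[j].
func linearPower(weights []float32, bias float32, xValues [][]float32) ([]float32, error) {
	out := make([]float32, len(xValues))
	for i, row := range xValues {
		if len(row) != len(weights) {
			return nil, fmt.Errorf("row %d has %d values, want %d", i, len(row), len(weights))
		}
		p := bias
		for j, w := range weights {
			p += w * row[j]
		}
		out[i] = p
	}
	return out, nil
}

func main() {
	// Illustrative weights for [cpu_cycles, cache_miss]; not trained values.
	weights := []float32{0.5, 2}
	powers, err := linearPower(weights, 1, [][]float32{{100, 1}, {50, 0}, {10, 0}})
	fmt.Println(powers, err)
}
```

The cost per tick is one multiply-add per pod per metric, which is why the inference step itself is expected to be cheap compared with the socket round trip.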
+1. The Python estimator may have its own repo and run as a sidecar, so we don't have to upgrade the Kepler container image if the estimator changes.
9610edd to 883ba9c
I amended the commit by
TO-DO:
Sounds good, I just created a repo there for your next push.
pkg/model/estimate.go
type PowerRequest struct {
	ModelName string `json:"model_name"`
	MetricNames []string `json:"metrics"`
please run gofmt
These are the results of testing with the number of pods varied from 10 to 100 (the maximum number of pods per worker node is about 100).
summary
@rootfs what do you think?
Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>
@sunya-ch thank you for this comprehensive study! The result is worth a doc of its own. Please add it to the PR as well. Looking forward to your full integration.
The work is moved to kepler-estimator, closing.
This PR introduces a dynamic way to estimate power through the Estimator class (pkg/model/estimator.go).
data/model
.h5 of Keras model, .sav of scikit-learn model, and a simple ratio model that computes metric importance by correlation to power
There are three additional dependent points for integrating this class into Kepler:
exporter.go
GetPower function in reader.go
/data/model, which contains metadata.json giving the remaining details of the model, such as the model file, feature engineering pkl files, features, error, and so on (the minimum-error model is auto-selected if it is empty, "")
data/model of the container folder (can be done by statically adding it to the Docker image or through deployment manifest volumes)
Check the example use in pkg/model/estimator_test.go.
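The auto-selection rule (pick the minimum-error model when the name is "") could look roughly like the sketch below. The layout of metadata.json is not shown in this PR, so the `ModelMeta` fields, their JSON keys, and the idea of a single JSON array of entries are all assumptions; `selectModel` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ModelMeta is an assumed shape for one metadata entry.
type ModelMeta struct {
	ModelName string  `json:"model_name"`
	ModelFile string  `json:"model_file"`
	Error     float64 `json:"error"`
}

// selectModel returns the entry named modelName, or the entry with the
// smallest error when modelName is "".
func selectModel(metadataJSON []byte, modelName string) (ModelMeta, error) {
	var models []ModelMeta
	if err := json.Unmarshal(metadataJSON, &models); err != nil {
		return ModelMeta{}, err
	}
	if len(models) == 0 {
		return ModelMeta{}, errors.New("no models in metadata")
	}
	if modelName == "" {
		best := models[0]
		for _, m := range models[1:] {
			if m.Error < best.Error {
				best = m
			}
		}
		return best, nil
	}
	for _, m := range models {
		if m.ModelName == modelName {
			return m, nil
		}
	}
	return ModelMeta{}, fmt.Errorf("model %q not found", modelName)
}

func main() {
	// Made-up entries for illustration only.
	meta := []byte(`[
		{"model_name": "keras_a", "model_file": "a.h5", "error": 0.8},
		{"model_name": "sklearn_b", "model_file": "b.sav", "error": 0.3}
	]`)
	m, err := selectModel(meta, "")
	fmt.Println(m.ModelName, m.ModelFile, err)
}
```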
If you agree with this direction, we can modify estimator.py to
Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>