The Kepler container fails immediately after it starts:
<omit>
I0529 02:05:36.588422 690634 exporter.go:175] starting to listen on 0.0.0.0:9102
I0529 02:05:36.588445 690634 exporter.go:181] Started Kepler in 2.243991957s
I0529 02:05:39.594488 690634 exporter.go:457] successfully get data with batch get and delete with 700 pids in 3.298332ms
I0529 02:05:39.914526 690634 estimate.go:139] estimator unmarshal error: json: cannot unmarshal array into Go struct field ComponentPowerResponse.powers of type map[string][]float64 ({"powers": [], "msg": "\"None of [Index(['bpf_cpu_time_us'], dtype='object')] are in the [columns]\"\n"})
I0529 02:05:39.914657 690634 process_energy.go:210] Could not estimate the Process Platform Power
panic: runtime error: index out of range [0] with length 0

goroutine 33 [running]:
github.com/sustainable-computing-io/kepler/pkg/model.addEstimatedEnergy({0xc000746400, 0x3d, 0xc00075c820?}, 0x0?, 0x1)
	/workspace/pkg/model/process_energy.go:219 +0xbf0
github.com/sustainable-computing-io/kepler/pkg/model.UpdateProcessEnergy(0xc0005d4000?, 0xc000b88660?)
	/workspace/pkg/model/process_energy.go:145 +0x145
github.com/sustainable-computing-io/kepler/pkg/collector/energy.UpdateProcessEnergy(...)
	/workspace/pkg/collector/energy/process_energy_collector.go:26
github.com/sustainable-computing-io/kepler/pkg/collector.(*Collector).UpdateProcessEnergyUtilizationMetrics(...)
	/workspace/pkg/collector/metric_collector.go:152
github.com/sustainable-computing-io/kepler/pkg/collector.(*Collector).UpdateEnergyUtilizationMetrics(0xc0005d4000)
	/workspace/pkg/collector/metric_collector.go:139 +0x2a
github.com/sustainable-computing-io/kepler/pkg/collector.(*Collector).Update(0xb2d05e00?)
	/workspace/pkg/collector/metric_collector.go:113 +0x65
github.com/sustainable-computing-io/kepler/pkg/manager.(*CollectorManager).Start.func1()
	/workspace/pkg/manager/manager.go:75 +0x7b
created by github.com/sustainable-computing-io/kepler/pkg/manager.(*CollectorManager).Start in goroutine 1
	/workspace/pkg/manager/manager.go:67 +0x65
The kepler-estimator container also reports errors:
<omit>
failed to get model from request {"metrics":["bpf_page_cache_hit","task_clock_ms","bpf_cpu_time_ms","bpf_net_tx_irq","bpf_net_rx_irq","bpf_block_irq","cpu_cycles","cpu_instructions","cache_miss"],"values":[[0,0,0,0,0,0,0,0,0]],"output_type":"DynPower","source":"acpi","system_features":["cpu_architecture"],"system_values":["Skylake"],"trainer_name":"GradientBoostingRegressorTrainer","filter":""}
get archived model
failed to get model from request {"metrics":["bpf_page_cache_hit","task_clock_ms","bpf_cpu_time_ms","bpf_net_tx_irq","bpf_net_rx_irq","bpf_block_irq","cpu_cycles","cpu_instructions","cache_miss"],"values":[[0,0,0,0,0,0,0,0,0]],"output_type":"DynPower","source":"intel_rapl","system_features":["cpu_architecture"],"system_values":["Skylake"],"trainer_name":"GradientBoostingRegressorTrainer","filter":""}
get archived model
failed to get model from request {"metrics":["bpf_page_cache_hit","task_clock_ms","bpf_cpu_time_ms","bpf_net_tx_irq","bpf_net_rx_irq","bpf_block_irq","cpu_cycles","cpu_instructions","cache_miss"],"values":[[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0]],"output_type":"DynPower","source":"intel_rapl","system_features":["cpu_architecture"],"system_values":["Skylake"],"trainer_name":"GradientBoostingRegressorTrainer","filter":""}
GradientBoostingRegressorTrainer_1 fail to predict, removed: "None of [Index(['bpf_cpu_time_us'], dtype='object')] are in the [columns]"
<omit>
The complete Kepler log can be found here: kepler.log
The complete kepler-estimator log can be found here: kepler-estimator.log
What did you expect to happen?
Kepler should run without panicking.
How can we reproduce it (as minimally and precisely as possible)?
Run Kepler with the Kepler deployment configuration below.
It seems the power model was trained with the CPU time metric exported by Kepler versions before v0.7 (bpf_cpu_time_us), whereas the estimation requests come from the newer Kepler, which exports bpf_cpu_time_ms instead. You may have to retrain the power model with the new Kepler version.
I0529 02:05:34.744887 690634 utils.go:86] Available ebpf counters: [bpf_page_cache_hit task_clock_ms bpf_cpu_time_ms bpf_net_tx_irq bpf_net_rx_irq bpf_block_irq cpu_cycles cpu_instructions cache_miss]
...
I0529 02:05:39.914526 690634 estimate.go:139] estimator unmarshal error: json: cannot unmarshal array into Go struct field ComponentPowerResponse.powers of type map[string][]float64 ({"powers": [], "msg": "\"None of [Index(['bpf_cpu_time_us'], dtype='object')] are in the [columns]\"\n"})
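The mismatch described in this comment can be checked mechanically by comparing the feature names a model was trained on with the metric names Kepler sends. A minimal sketch, assuming the model's feature list is available as a slice of strings (reading it from a model archive is not covered, and missingFeatures is a hypothetical helper): only bpf_cpu_time_us is confirmed as a model feature by the error message, so the example uses just that one.

```go
package main

import (
	"fmt"
	"sort"
)

// missingFeatures returns the features a model expects that are absent from
// the metric names sent in an estimation request, sorted for stable output.
func missingFeatures(modelFeatures, requestMetrics []string) []string {
	sent := make(map[string]bool, len(requestMetrics))
	for _, m := range requestMetrics {
		sent[m] = true
	}
	var missing []string
	for _, f := range modelFeatures {
		if !sent[f] {
			missing = append(missing, f)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Metric names Kepler reported as available (from the kepler log above).
	request := []string{"bpf_page_cache_hit", "task_clock_ms", "bpf_cpu_time_ms", "bpf_net_tx_irq", "bpf_net_rx_irq", "bpf_block_irq", "cpu_cycles", "cpu_instructions", "cache_miss"}
	// Assumed model feature list: only bpf_cpu_time_us is confirmed by the
	// estimator error; a real pre-v0.7 model likely uses more features.
	model := []string{"bpf_cpu_time_us"}
	fmt.Println(missingFeatures(model, request)) // prints [bpf_cpu_time_us]
}
```

A non-empty result means the model was trained on metric names the running Kepler no longer exports; retraining with the current Kepler version, as suggested in this comment, addresses the cause rather than the symptom.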
What happened?
When running Kepler in Kubernetes with a pretrained model to estimate process power, the Kepler pod panics right after launch.
The models were trained by following the Kepler model server Tekton training process, using the complete run.
Anything else we need to know?
No response
Kepler image tag
estimator: quay.io/sustainable_computing_io/kepler_model_server:latest
Kubernetes version
Cloud provider or bare metal
OS version
Install tools
Kepler deployment config
For on kubernetes:
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)