-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update model server support #235
Conversation
a656fb1
to
ffd2c08
Compare
Please feel free to review on the design first. |
ffd2c08
to
4f92f5f
Compare
Now fix critical issue on deployment. However, please allow me to have some issue left (need help from other to fix with other PR):
|
4f92f5f
to
eb80508
Compare
@sunya-ch Thanks a lot for adding the feature 🤗 |
cmd/manager/main.go
Outdated
keplerEstimatorImage := env("KEPLER_ESTIMATOR_IMAGE", estimator.StableImage) | ||
flag.StringVar(&estimator.Config.Image, "kepler-estimator.image", keplerEstimatorImage, "kepler estimator image") | ||
|
||
keplerModelServerImage := env("KEPLER_MODEL_SERVER_IMAGE", modelserver.StableImage) | ||
flag.StringVar(&modelserver.Config.Image, "kepler-model-server.image", keplerModelServerImage, "kepler model server image") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(suggestion): we have an opportunity to refactor this pattern of getting the env from the flag.
Lets use estimator.image
and model-server.image
in the flag please.
I have a question about the name model-server
. Do you know why we call it that? what does it really serve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model-server serves the model selection based on available features. When online-trainer is validated, it will serve the fine-tuned model online to the exporter.
pkg/api/v1alpha1/kepler_types.go
Outdated
NodeTotalEstimator *EstimatorSpec `json:"nodeTotalEstimator,omitempty"` | ||
NodeComponentsEstimator *EstimatorSpec `json:"nodeComponentsEstimator,omitempty"` | ||
ContainerTotalEstimator *EstimatorSpec `json:"containerTotalEstimator,omitempty"` | ||
ContainerComponentsEstimator *EstimatorSpec `json:"containerComponentsEstimator,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we keep all estimator
related spec under
spec:
estimator:
node:
total
components
container
total:
components:
func GetModifiedCommandAndArgs(inCmd []string) (outCmd, outArgs []string) { | ||
outCmd = bashCommand | ||
outArgs = []string{fmt.Sprintf(waitForSocketCommand, strings.Join(inCmd, " "))} | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func GetModifiedCommandAndArgs(inCmd []string) (outCmd, outArgs []string) { | |
outCmd = bashCommand | |
outArgs = []string{fmt.Sprintf(waitForSocketCommand, strings.Join(inCmd, " "))} | |
return | |
} | |
func estimatorCmdAndArgs(inCmd []string) ([]string, []string) { | |
return ( | |
bashCommand | |
[]string{fmt.Sprintf(waitForSocketCommand, strings.Join(inCmd, " "))} | |
) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, it is used by exporter module. Now, put it (together with tmp volume mount) inside newly-defined AddEstimatorDependency
function.
pkg/components/exporter/exporter.go
Outdated
k8s.VolumeFromConfigMap("cfm", ConfigmapName), | ||
} | ||
|
||
if estimator.IsEstimatorSidecarEnabled(k) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if estimator.IsEstimatorSidecarEnabled(k) { | |
containers := []corev1.Container{*exporterContainer} | |
if estimator.NeedsSidecar(k) { | |
containers = addEstimatorSidecar(exporterContainer) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it also needs changes on volume, I modify your suggestion as below.
containers, volumes = addEstimatorSidecar(exporterContainer, volumes)
pkg/api/v1alpha1/kepler_types.go
Outdated
@@ -58,16 +64,13 @@ type ModelServerTrainerSpec struct { | |||
|
|||
// +kubebuilder:default=true | |||
PromSSLDisable bool `json:"promSSLDisable,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we move all Prometheus Config to separate struct?
@sunya-ch , Thanks a lot for adding this feature 🙇 . We also need e2e tests to validate most common configuration and scenarios ... Any thoughts on having both model-server and estimator disabled by default? |
eb80508
to
ea5c124
Compare
@sthaha Thank you so much for the review.
Both should be disabled by default. |
ea5c124
to
4a70570
Compare
if ms.Port > 0 { | ||
port = ms.Port | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this shouldn't be needed since the api server should reject anything less than 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see webhook that check Kepler CR.
I need this at least for the unit test because it is not filled by kbuilder as in csv file.
Even though, in deployment, if user does not define it will be 8100, I just have this in case to make sure we set the default port if it is not properly defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need webhook if we add the right markers
// +kubebuilder:validation:Minimum
specifies the minimum numeric value that this field can have. Negative numbers are supported.
https://book.kubebuilder.io/reference/markers/crd-validation.html
see: exporter.deployment.port
This also reminds me that we should also separate out deployment
related settings to a DeploymentSpec
msConfig := k8s.StringMap{} | ||
|
||
prom := trainer.Prom | ||
if prom != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit): prefer early returns for better readability
if prom != nil { | |
if prom == nil { | |
return msConfig | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intentionally not return it at the beginning for the future case when we have more setting to the trainer not only prom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please feel free to ignore nits
.
imho, it is better to write the code for the current scenario and readjust if/when there is a change in future 👼 (YAGNI) :)
promHeaders.WriteString(h.Key) | ||
promHeaders.WriteString(":") | ||
promHeaders.WriteString(h.Value) | ||
promHeaders.WriteString(",") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is a trailing comma ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure who wrote this part. It will be used here: https://github.com/sustainable-computing-io/kepler-model-server/blob/4af262f2bb2ae2420eaddf64cdc8480eb05ff3e6/src/train/prom/prom_query.py#L26.
Could I left someone work on that later if not match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am happy to have this validated later.
Could you please add a todo .. something like // TODO: ensure trailing , is accepted
pkg/controllers/kepler.go
Outdated
if cluster == k8s.OpenShift { | ||
updater := newUpdaterForKepler(k) | ||
rs = append(rs, updater(exporter.NewSCC(components.Full, k))) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if cluster == k8s.OpenShift { | |
updater := newUpdaterForKepler(k) | |
rs = append(rs, updater(exporter.NewSCC(components.Full, k))) | |
} |
TargetPort: intstr.FromString("http"), | ||
}}, | ||
}, | ||
} | ||
} | ||
|
||
func NewConfigMap(d components.Detail, k *v1alpha1.Kepler) *corev1.ConfigMap { | ||
func GetModelServerConfigForClient(ms *v1alpha1.ModelServerSpec) map[string]string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func GetModelServerConfigForClient(ms *v1alpha1.ModelServerSpec) map[string]string { | |
func ModelServerConfigForClient(ms v1alpha1.ModelServerSpec) k8s.StringMap { |
Keeping it k8s.StringMap has the advantage that we can merge 2 or more maps.
@@ -274,3 +289,11 @@ func defaultIfEmpty(given, defaultVal string) string { | |||
} | |||
return defaultVal | |||
} | |||
|
|||
func GetDefaultModelServer(ms v1alpha1.ModelServerSpec) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func GetDefaultModelServer(ms v1alpha1.ModelServerSpec) string { | |
func serverUrl(ms v1alpha1.ModelServerSpec) string { |
@@ -61,6 +61,13 @@ func (l StringMap) ToMap() map[string]string { | |||
return l | |||
} | |||
|
|||
func (l StringMap) AddIfNotEmpty(k, v string) map[string]string { | |||
if v != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we check the key k
as well?
4a70570
to
ca4cf5c
Compare
Made an update to the review that marked the icon. Here are example deployments. minimum deploymentspec:
exporter:
deployment:
port: 9103
with estimator onlyspec:
exporter:
deployment:
port: 9103
estimator:
node:
components:
sidecar: true
initUrl: https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-db/main/models/Linux-4.15.0-213-generic-x86_64_v0.6/rapl/AbsPower/KubeletOnly/GradientBoostingRegressorTrainer_1.zip
full deploymentspec:
exporter:
deployment:
port: 9103
estimator:
node:
components:
sidecar: true
initUrl: https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-db/main/models/Linux-4.15.0-213-generic-x86_64_v0.6/rapl/AbsPower/KubeletOnly/GradientBoostingRegressorTrainer_1.zip
modelServer:
enabled: true
|
Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>
ca4cf5c
to
c15c776
Compare
if k == "" { | ||
return StringMap{} | ||
} | ||
if v != "" { | ||
l[k] = v | ||
} | ||
return l |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please feel free to ignore
if k == "" { | |
return StringMap{} | |
} | |
if v != "" { | |
l[k] = v | |
} | |
return l | |
if k != "" && v != "" { | |
l[k] = v | |
} | |
return l |
@@ -33,6 +33,7 @@ type Runner struct { | |||
Logger logr.Logger | |||
} | |||
|
|||
// TODO: make sure that model server container (deployment) is ready before creating kepler daemonset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to do that ... we just have to indicate in the status that kepler Available = false
until all resources are available. The runner should not hold any specific information about objects it is reconciling.
If there is a requirement that X should be created before Y then create a reconciler.WaitFor{ resource: X}
and add it to the list if reconcilers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you provide me a specific guide on this or can I leave this work for the other contributor who can quickly add it? I will move todo line to another place such as line 351 in kepler.go.
svc, | ||
pvc, | ||
) | ||
return rs, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to ensure that the status
of kepler
is properly updated based on the state of deployments.
@@ -1,3 +1,4 @@ | |||
# TODO: add integration test for all deployment choices (minimum, estimator-only, server-only, full) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sunya-ch IMHO as a requirement, All major features must have an e2e before merging. We can of course create follow up PRs for testing edge cases but we must have e2e tests to exercise the green and the most common red path at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
type EstimatorConfig struct { | ||
// +kubebuilder:default=false | ||
SidecarEnabled bool `json:"sidecar,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't Enabled
this better? We don't need to be specific about how
a feature is enabled. That is implementation detail.
SidecarEnabled bool `json:"sidecar,omitempty"` | |
Enabled bool `json:"enabled,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kepler estimator could mean the local estimator and sidecar estimator. The config here (e.g., initURL) is a common for both estimator. If sidecar is not enabled, the local estimator inside kepler exporter will be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we then specify that in the spec itself
estimator:
enabled: true|false
mode: sidecar|built-in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will have only with and without sidecar. I think it is okay to have only sidecar spec to be true or false.
moved to #322 |
This PR updates model server support aiming for release v0.6 as mentioned in #232.
API doc:
https://github.com/sustainable-computing-io/kepler-operator/blob/c15c77621958cc79d1921d9af378915158abc4ca/docs/api.md
The PR contains changes on :
Note that,
The holder for setting filters and model name is here on kepler:
https://github.com/sustainable-computing-io/kepler/blob/73cb11fb963f425013cf7f03f214c8f8b85c7853/pkg/config/config.go#L390.
However, it is not determined how to use it. So, it is not supported yet from end to end.
Example configmap change from full deployment on OpenShift on IBM Cloud
(kepler CR:
config/samples/kepler_full_deploy.yaml
)Resources:
exporter log
estimator log
Signed-off-by: Sunyanan Choochotkaew sunyanan.choochotkaew1@ibm.com