[ML] fix bug where certain allocation retries failed with mysterious message (#85446) (#85451)

When an allocation fails for a given model, we retry that allocation. We only do this in certain failure paths.

However, if the failed allocation got as far as setting the model config, the retry attempts to set the config again on the same task instance. This throws a Lucene `Object cannot be set twice!` exception.

This commit fixes the bug by using `trySet`, which is atomic and safe to call repeatedly. Model usage tracking now starts only when the config is actually set.
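To illustrate the difference between the two calls, here is a minimal, self-contained sketch. `SetOnceSketch` and `DeploymentTaskSketch` are hypothetical stand-ins written for this example; they are not the Lucene holder or the real task class, and the `String` config and the counter merely stand in for `InferenceConfig` and license tracking.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a set-once holder (not the Lucene class itself).
final class SetOnceSketch<T> {
    private final AtomicReference<T> ref = new AtomicReference<>();

    // Strict variant: a second call throws, which is what broke the retry.
    void set(T value) {
        if (!trySet(value)) {
            throw new IllegalStateException("Object cannot be set twice!");
        }
    }

    // Lenient variant: atomically stores the value if none is present and
    // returns true only for the call that actually stored it.
    boolean trySet(T value) {
        Objects.requireNonNull(value, "value");
        return ref.compareAndSet(null, value);
    }

    T get() {
        return ref.get();
    }
}

// Hypothetical task: init() may be called again when an allocation is retried.
final class DeploymentTaskSketch {
    private final SetOnceSketch<String> configHolder = new SetOnceSketch<>();
    private int trackingStarts = 0; // stands in for license usage tracking

    void init(String config) {
        // Only the call that sets the config starts tracking.
        if (configHolder.trySet(config)) {
            trackingStarts++;
        }
    }

    String config() {
        return configHolder.get();
    }

    int trackingStarts() {
        return trackingStarts;
    }
}

final class SetOnceDemo {
    public static void main(String[] args) {
        DeploymentTaskSketch task = new DeploymentTaskSketch();
        task.init("config-a"); // first allocation attempt
        task.init("config-b"); // retry on the same task instance: no exception
        System.out.println(task.config());
        System.out.println(task.trackingStarts());
    }
}
```

In this sketch the retry is a no-op: the first config is kept and tracking is started once. With the strict `set`, the second `init` call would throw instead.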

(cherry picked from commit 1bde592)
benwtrent committed Mar 29, 2022
1 parent 31df968 commit eb7bdb7
Showing 1 changed file with 3 additions and 2 deletions.
@@ -66,8 +66,9 @@ public TrainedModelDeploymentTask(
     }
 
     void init(InferenceConfig inferenceConfig) {
-        this.inferenceConfigHolder.set(inferenceConfig);
-        licensedFeature.startTracking(licenseState, "model-" + params.getModelId());
+        if (this.inferenceConfigHolder.trySet(inferenceConfig)) {
+            licensedFeature.startTracking(licenseState, "model-" + params.getModelId());
+        }
     }
 
     public String getModelId() {
