Closed
Description
Moto's SageMaker mock does not fully support the "AsyncInferenceConfig" parameter when creating endpoint configs.
While investigating a test failure in another project (which relies on async inference), I found that create_endpoint_config does not take "AsyncInferenceConfig" into account: describe_endpoint_config never returns it. As a result, tests that use async inference fail unexpectedly.
Steps to Reproduce
import boto3
from moto import mock_aws
import pytest


@pytest.fixture
def mock_sagemaker():
    # Every boto3 call made inside this context is served by Moto.
    with mock_aws():
        yield boto3.client("sagemaker", region_name="eu-central-1")


@pytest.mark.parametrize(
    "prefix, endpoint_cfg, expected_result",
    [
        (
            "async",
            {
                "AsyncInferenceConfig": {
                    "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 3},
                    "OutputConfig": {"S3OutputPath": "s3://output-bucket", "NotificationConfig": {}},
                }
            },
            True,
        ),
        ("", {}, False),
    ],
)
def test_is_async_endpoint(mock_sagemaker, prefix, endpoint_cfg, expected_result):
    # given
    _MODEL_NAME = f"{prefix}test"
    _ENDPOINT_NAME = f"{prefix}test_endpoint_name"
    sm = mock_sagemaker
    sm.create_model(
        ModelName=_MODEL_NAME,
        PrimaryContainer={
            "Image": "test_image",
            "ModelDataUrl": "s3://test_bucket/model.zip",
        },
    )
    sm.create_endpoint_config(
        EndpointConfigName=_ENDPOINT_NAME,
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": _MODEL_NAME,
                "InitialInstanceCount": 1,
                "InstanceType": "ml.m5.large",
            }
        ],
        **endpoint_cfg,
    )
    sm.create_endpoint(
        EndpointName=_ENDPOINT_NAME,
        EndpointConfigName=_ENDPOINT_NAME,
    )

    # when & then
    config_name = sm.describe_endpoint(EndpointName=_ENDPOINT_NAME)["EndpointConfigName"]
    endpoint_config = sm.describe_endpoint_config(EndpointConfigName=config_name)
    assert ("AsyncInferenceConfig" in endpoint_config) is expected_result

Expected Behavior
- When "AsyncInferenceConfig" is specified in create_endpoint_config, Moto should store this parameter and return it when describe_endpoint_config is called.
Actual Behavior
- Moto ignores the "AsyncInferenceConfig" parameter in create_endpoint_config and does not surface it in describe_endpoint_config, leading to test failures for async endpoint usage.
Proposed Solution
- Update FakeEndpointConfig.__init__ & SageMakerModelBackend.create_endpoint_config to include "AsyncInferenceConfig" params
- Update SageMakerResponse.create_endpoint_config to include "AsyncInferenceConfig" params
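
The proposed change could look roughly like the sketch below. This is not Moto's actual code: SketchEndpointConfig and SketchBackend are hypothetical stand-ins for FakeEndpointConfig and SageMakerModelBackend, and their method signatures are assumptions made for illustration. The idea is only that the optional parameter is stored at creation time and echoed back on describe when it was supplied.

```python
from typing import Any, Dict, List, Optional


class SketchEndpointConfig:
    """Hypothetical stand-in for a stored endpoint config (not Moto's class)."""

    def __init__(
        self,
        name: str,
        production_variants: List[Dict[str, Any]],
        async_inference_config: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.name = name
        self.production_variants = production_variants
        # Keep the optional async settings so describe can return them later.
        self.async_inference_config = async_inference_config

    def describe(self) -> Dict[str, Any]:
        response: Dict[str, Any] = {
            "EndpointConfigName": self.name,
            "ProductionVariants": self.production_variants,
        }
        # Only include the key when the caller supplied it at creation time.
        if self.async_inference_config is not None:
            response["AsyncInferenceConfig"] = self.async_inference_config
        return response


class SketchBackend:
    """Hypothetical in-memory backend keyed by endpoint config name."""

    def __init__(self) -> None:
        self.endpoint_configs: Dict[str, SketchEndpointConfig] = {}

    def create_endpoint_config(
        self,
        endpoint_config_name: str,
        production_variants: List[Dict[str, Any]],
        async_inference_config: Optional[Dict[str, Any]] = None,
    ) -> SketchEndpointConfig:
        config = SketchEndpointConfig(
            endpoint_config_name, production_variants, async_inference_config
        )
        self.endpoint_configs[endpoint_config_name] = config
        return config

    def describe_endpoint_config(self, endpoint_config_name: str) -> Dict[str, Any]:
        return self.endpoint_configs[endpoint_config_name].describe()
```

With this shape, a config created with async settings reports "AsyncInferenceConfig" on describe, while one created without them omits the key, which matches both parametrized cases in the reproduction test.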