(sagemaker): Missing support for "AsyncInferenceConfig" in create_endpoint_config and describe_endpoint_config #8783

@bweigel

Description

Moto's SageMaker mock does not fully support the "AsyncInferenceConfig" parameter when creating endpoint configs.

While investigating a test failure in another project that relies on async inference, I found that create_endpoint_config does not take "AsyncInferenceConfig" into account: a subsequent describe_endpoint_config call does not return it. Because of this gap, tests that use async inference break unexpectedly.

Steps to Reproduce

import boto3
from moto import mock_aws
import pytest

@pytest.fixture
def mock_sagemaker():
    with mock_aws():
        yield boto3.client("sagemaker", region_name="eu-central-1")

@pytest.mark.parametrize(
    "prefix, endpoint_cfg, expected_result",
    [
        (
            "async", {
                "AsyncInferenceConfig": {
                    "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 3},
                    "OutputConfig": {"S3OutputPath": "s3://output-bucket", "NotificationConfig": {}},
                }
            },
            True,
        ),
        ("", {}, False),
    ],
)
def test_is_async_endpoint(mock_sagemaker, prefix, endpoint_cfg, expected_result):
    # given
    _MODEL_NAME = f"{prefix}test"
    _ENDPOINT_NAME = f"{prefix}test_endpoint_name"
    sm = mock_sagemaker
    sm.create_model(
        ModelName=_MODEL_NAME,
        PrimaryContainer={
            "Image": "test_image",
            "ModelDataUrl": "s3://test_bucket/model.zip",
        },
    )
    sm.create_endpoint_config(
        EndpointConfigName=_ENDPOINT_NAME,
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": _MODEL_NAME,
                "InitialInstanceCount": 1,
                "InstanceType": "ml.m5.large",
            }
        ],
        **endpoint_cfg,
    )
    sm.create_endpoint(
        EndpointName=_ENDPOINT_NAME,
        EndpointConfigName=_ENDPOINT_NAME,
    )
    # when & then
    config_name = sm.describe_endpoint(EndpointName=_ENDPOINT_NAME)["EndpointConfigName"]
    endpoint_config = sm.describe_endpoint_config(EndpointConfigName=config_name)
    assert ("AsyncInferenceConfig" in endpoint_config) is expected_result

Expected Behavior

  • When specifying "AsyncInferenceConfig" in create_endpoint_config, Moto should store this parameter and return it when describe_endpoint_config is called.
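The expected round trip can be illustrated with a small in-memory stand-in. This is only a sketch of the desired behavior, not Moto's actual implementation: the class `ToyEndpointConfigStore` and its method signatures are hypothetical and exist purely for illustration. The key is stored only when the caller supplies it, so a describe call for a non-async config omits it entirely.

```python
import copy
from typing import Any, Dict, List, Optional


class ToyEndpointConfigStore:
    """Hypothetical in-memory store; an illustration, not Moto's backend."""

    def __init__(self) -> None:
        self._configs: Dict[str, Dict[str, Any]] = {}

    def create_endpoint_config(
        self,
        name: str,
        production_variants: List[Dict[str, Any]],
        async_inference_config: Optional[Dict[str, Any]] = None,
    ) -> None:
        record: Dict[str, Any] = {
            "EndpointConfigName": name,
            "ProductionVariants": copy.deepcopy(production_variants),
        }
        # Persist the async settings only when the caller supplied them,
        # so that non-async configs do not gain an empty key.
        if async_inference_config is not None:
            record["AsyncInferenceConfig"] = copy.deepcopy(async_inference_config)
        self._configs[name] = record

    def describe_endpoint_config(self, name: str) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the stored record.
        return copy.deepcopy(self._configs[name])


variants = [
    {
        "VariantName": "AllTraffic",
        "ModelName": "asynctest",
        "InitialInstanceCount": 1,
        "InstanceType": "ml.m5.large",
    }
]
async_cfg = {
    "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 3},
    "OutputConfig": {"S3OutputPath": "s3://output-bucket", "NotificationConfig": {}},
}

store = ToyEndpointConfigStore()
store.create_endpoint_config("async-config", variants, async_cfg)
store.create_endpoint_config("sync-config", variants)

described_async = store.describe_endpoint_config("async-config")
described_sync = store.describe_endpoint_config("sync-config")
```

Here `described_async` carries the same "AsyncInferenceConfig" dictionary that was passed in, while `described_sync` has no such key, which matches the two parametrized cases in the reproduction above.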

Actual Behavior

  • Moto ignores the "AsyncInferenceConfig" parameter in create_endpoint_config and does not surface this info in describe_endpoint_config, leading to test failures for async endpoint usage.

Proposed Solution
