Expected Behavior
- The auto-configuration path should retry on pre-response network failures (e.g., connection timeouts) in the same way the model defaults do, so that transient network issues don't immediately fail requests.
- Concretely, the behavior should match the default builder path, where network exceptions are treated as retryable.
Relevant code showing the default builder behavior (RetryUtils.DEFAULT_RETRY_TEMPLATE):
```java
public static final RetryTemplate DEFAULT_RETRY_TEMPLATE = RetryTemplate.builder()
    .maxAttempts(10)
    .retryOn(TransientAiException.class)
    .retryOn(ResourceAccessException.class)
    .exponentialBackoff(Duration.ofMillis(2000), 5, Duration.ofMillis(3 * 60000))
    .withListener(new RetryListener() {
        @Override
        public <T extends Object, E extends Throwable> void onError(RetryContext context,
                RetryCallback<T, E> callback, Throwable throwable) {
            logger.warn("Retry error. Retry count:" + context.getRetryCount(), throwable);
        }
    })
    .build();
```
Current Behavior
- With auto-configuration, the RetryTemplate retries only on TransientAiException (which is derived from HTTP status codes), not on network exceptions that occur before any response is available. As a result, connection timeouts and similar network errors are not retried on the auto-config path.
Relevant code showing the auto-configured behavior:
```java
public RetryTemplate retryTemplate(SpringAiRetryProperties properties) {
    return RetryTemplate.builder()
        .maxAttempts(properties.getMaxAttempts())
        .retryOn(TransientAiException.class)
        .exponentialBackoff(properties.getBackoff().getInitialInterval(), properties.getBackoff().getMultiplier(),
                properties.getBackoff().getMaxInterval())
        .withListener(new RetryListener() {
            @Override
            public <T, E extends Throwable> void onError(RetryContext context, RetryCallback<T, E> callback,
                    Throwable throwable) {
                logger.warn("Retry error. Retry count: {}, Exception: {}", context.getRetryCount(),
                        throwable.getMessage(), throwable);
            }
        })
        .build();
}
```
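To make the gap concrete, here is a minimal, stdlib-only Java model of how a retryOn-style classification behaves under each configuration. It is a sketch under stated assumptions, not Spring Retry itself: TransientAiException and ResourceAccessException are local stand-in classes, and isRetryable, callWithRetry, run, AUTO_CONFIG, and BUILDER_DEFAULT are hypothetical names. A simulated call fails twice with a network-style error and then succeeds.

```java
import java.util.List;

public class RetryClassificationDemo {

    // Hypothetical stand-ins for the real Spring AI / Spring Framework exception types.
    static class TransientAiException extends RuntimeException {
        TransientAiException(String message) { super(message); }
    }

    static class ResourceAccessException extends RuntimeException {
        ResourceAccessException(String message) { super(message); }
    }

    // Exception types each wiring path retries on, per the code quoted above.
    static final List<Class<? extends Throwable>> AUTO_CONFIG = List.of(TransientAiException.class);
    static final List<Class<? extends Throwable>> BUILDER_DEFAULT =
            List.of(TransientAiException.class, ResourceAccessException.class);

    // An exception is retryable if it is an instance of any configured type.
    static boolean isRetryable(Throwable t, List<Class<? extends Throwable>> retryOn) {
        return retryOn.stream().anyMatch(type -> type.isInstance(t));
    }

    // Runs the call up to maxAttempts times; returns the attempt number that succeeded.
    static int callWithRetry(Runnable call, int maxAttempts, List<Class<? extends Throwable>> retryOn) {
        for (int attempt = 1; ; attempt++) {
            try {
                call.run();
                return attempt;
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !isRetryable(e, retryOn)) {
                    throw e; // not retryable, or attempts exhausted
                }
            }
        }
    }

    // Simulates a model call that hits two connection timeouts, then succeeds.
    static int run(List<Class<? extends Throwable>> retryOn) {
        int[] calls = {0};
        Runnable flakyCall = () -> {
            calls[0]++;
            if (calls[0] <= 2) {
                throw new ResourceAccessException("I/O error: connect timed out");
            }
        };
        return callWithRetry(flakyCall, 10, retryOn);
    }

    public static void main(String[] args) {
        System.out.println("builder default: succeeded on attempt " + run(BUILDER_DEFAULT));
        try {
            run(AUTO_CONFIG);
        } catch (ResourceAccessException e) {
            System.out.println("auto-config: failed without retry: " + e.getMessage());
        }
    }
}
```

With the builder-default list the call succeeds on the third attempt; with the auto-config list the first network failure propagates immediately. Backoff and listeners are deliberately left out so that only the exception classification is compared.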
- The auto-configured ResponseErrorHandler maps HTTP responses to TransientAiException/NonTransientAiException, but network errors (e.g., connection refused or connection timeout) occur before any ClientHttpResponse exists, so they never reach this handler and are not retried.
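The handler cannot help because of ordering: a status-based handler only runs once a response exists. The following stdlib-only Java sketch models that ordering. All classes and methods are hypothetical stand-ins (not Spring's RestClient or Spring AI's actual handler), and the status mapping is simplified to 4xx to NonTransientAiException and 5xx to TransientAiException. Wrapping the IOException in a ResourceAccessException mirrors what Spring's RestTemplate and RestClient do for I/O errors.

```java
import java.io.IOException;
import java.net.ConnectException;

public class ErrorHandlerBypassDemo {

    // Hypothetical stand-ins for the real Spring AI / Spring Framework exception types.
    static class TransientAiException extends RuntimeException {
        TransientAiException(String message) { super(message); }
    }

    static class NonTransientAiException extends RuntimeException {
        NonTransientAiException(String message) { super(message); }
    }

    static class ResourceAccessException extends RuntimeException {
        ResourceAccessException(String message, IOException cause) { super(message, cause); }
    }

    // Hypothetical transport: yields a status code, or throws IOException when the
    // connection fails before any response arrives.
    interface Transport {
        int send() throws IOException;
    }

    // Simplified status-based handler: it can only run once a status code exists.
    static void handleResponse(int status) {
        if (status >= 500) {
            throw new TransientAiException("HTTP " + status);
        }
        if (status >= 400) {
            throw new NonTransientAiException("HTTP " + status);
        }
    }

    static void execute(Transport transport) {
        int status;
        try {
            status = transport.send();
        } catch (IOException e) {
            // Pre-response failure: wrap it, as Spring's HTTP clients do; handleResponse never runs.
            throw new ResourceAccessException("I/O error: " + e.getMessage(), e);
        }
        handleResponse(status);
    }

    public static void main(String[] args) {
        String[] labels = {"HTTP 503", "HTTP 400", "connection refused"};
        Transport[] transports = {
            () -> 503,
            () -> 400,
            () -> { throw new ConnectException("Connection refused"); }
        };
        for (int i = 0; i < transports.length; i++) {
            try {
                execute(transports[i]);
            } catch (RuntimeException e) {
                System.out.println(labels[i] + " -> " + e.getClass().getSimpleName());
            }
        }
    }
}
```

Only the first two cases pass through the handler; the connection failure surfaces as a ResourceAccessException that no status-based mapping ever sees.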
Context
- Impact: Applications using auto-configuration see immediate failures on transient network issues (e.g., connection timeouts), even though the default builder path would treat these as retryable. Resilience therefore depends on the wiring path (auto-config vs. builder default).
- Why now: We observed that RetryUtils.DEFAULT_RETRY_TEMPLATE includes ResourceAccessException, but the auto-configured RetryTemplate does not, which leads to different behavior in otherwise similar setups.
- Reproduction: Use Spring AI with auto-configuration defaults and trigger a connection timeout to the model provider (e.g., a short client timeout or an unreachable host). The request fails without retry, because the exception occurs before any response is received and never becomes a TransientAiException.
- Prior art: A related issue (#4082, "Occasionally, when Spring AI calls a llm, it reports a network error: HTTP/1.1 header parser received no bytes") was marked resolved for the builder path, but the auto-config path still does not retry network exceptions, which causes the inconsistency described above.
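The pre-response failure from the reproduction can be observed without Spring at all. This sketch uses the JDK's java.net.http.HttpClient to call a local port that nothing is listening on, assuming that connecting to a just-released ephemeral port is refused; the "/v1/chat" path and the timeout values are hypothetical placeholders, not a real provider endpoint. The call fails with an IOException, and no HttpResponse or status code ever exists.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class PreResponseFailureRepro {

    // Calls a local port with no listener and returns the IOException raised,
    // or null if (unexpectedly) a response came back.
    static IOException callUnreachableEndpoint() {
        int port;
        // Reserve a free ephemeral port, then release it so nothing listens there.
        try (ServerSocket socket = new ServerSocket(0)) {
            port = socket.getLocalPort();
        } catch (IOException e) {
            throw new IllegalStateException("could not allocate a local port", e);
        }

        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofMillis(500)) // short client timeout
                .build();
        // "/v1/chat" is a placeholder path; no server will answer it.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://127.0.0.1:" + port + "/v1/chat"))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        try {
            client.send(request, HttpResponse.BodyHandlers.discarding());
            return null;
        } catch (IOException e) {
            // The failure happens before any HttpResponse (and status code) exists.
            return e;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    public static void main(String[] args) {
        IOException failure = callUnreachableEndpoint();
        System.out.println("pre-response failure: " + failure);
    }
}
```

In a Spring application, the same kind of IOException would surface from RestClient as a ResourceAccessException, which only the builder-default RetryTemplate is configured to retry.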