Auto-config: Retries for network exceptions (ResourceAccessException, WebClientRequestException) #4567

@seungy0

Expected Behavior

  • The auto-configuration path should retry pre-response network failures (e.g., connection timeouts) the same way the model defaults do, so that transient network issues don't immediately fail requests.
  • Concretely, behavior should match the default builder path, where network exceptions are treated as retryable.

Relevant code indicating builder default behavior:

public static final RetryTemplate DEFAULT_RETRY_TEMPLATE = RetryTemplate.builder()
	.maxAttempts(10)
	.retryOn(TransientAiException.class)
	.retryOn(ResourceAccessException.class)
	.exponentialBackoff(Duration.ofMillis(2000), 5, Duration.ofMillis(3 * 60000))
	.withListener(new RetryListener() {
		@Override
		public <T extends Object, E extends Throwable> void onError(RetryContext context,
				RetryCallback<T, E> callback, Throwable throwable) {
			logger.warn("Retry error. Retry count:" + context.getRetryCount(), throwable);
		}
	})
	.build();

Current Behavior

  • With auto-configuration, the RetryTemplate only retries on TransientAiException (HTTP-status-based), not on network exceptions that occur before any response is available. As a result, connection timeouts and similar network errors are not retried in the auto-config path.

Relevant code showing the auto-configured behavior:

public RetryTemplate retryTemplate(SpringAiRetryProperties properties) {
	return RetryTemplate.builder()
		.maxAttempts(properties.getMaxAttempts())
		.retryOn(TransientAiException.class)
		.exponentialBackoff(properties.getBackoff().getInitialInterval(), properties.getBackoff().getMultiplier(),
				properties.getBackoff().getMaxInterval())
		.withListener(new RetryListener() {
			@Override
			public <T, E extends Throwable> void onError(RetryContext context, RetryCallback<T, E> callback,
					Throwable throwable) {
				logger.warn("Retry error. Retry count: {}, Exception: {}", context.getRetryCount(),
						throwable.getMessage(), throwable);
			}
		})
		.build();
}

  • The auto-configured ResponseErrorHandler maps HTTP responses to TransientAiException/NonTransientAiException, but network errors (e.g., connection refused/timeout) occur before any ClientHttpResponse exists, so they never reach this handler and are not retried.
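
To make the gap concrete, here is a minimal plain-Java model of the two retry classifications. It is only a sketch and does not use Spring Retry or Spring AI: `TransientStatusException` and `NetworkFailureException` are hypothetical stand-ins for TransientAiException and ResourceAccessException, `run` and `flakyCall` are made-up helpers, and backoff is left out entirely.

```java
import java.util.List;
import java.util.concurrent.Callable;

class RetryClassificationSketch {

    // Hypothetical stand-in for an HTTP-status-based transient error.
    static class TransientStatusException extends RuntimeException {
        TransientStatusException(String message) { super(message); }
    }

    // Hypothetical stand-in for a pre-response network error.
    static class NetworkFailureException extends RuntimeException {
        NetworkFailureException(String message) { super(message); }
    }

    // Result of running one call under a retry policy.
    static final class Outcome {
        final int attempts;
        final boolean succeeded;
        Outcome(int attempts, boolean succeeded) {
            this.attempts = attempts;
            this.succeeded = succeeded;
        }
    }

    // Retries only when the thrown exception is an instance of a listed type.
    static Outcome run(Callable<String> call, int maxAttempts,
            List<Class<? extends Throwable>> retryable) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                call.call();
                return new Outcome(attempts, true);
            } catch (Exception e) {
                boolean shouldRetry = retryable.stream().anyMatch(type -> type.isInstance(e));
                if (!shouldRetry) {
                    return new Outcome(attempts, false); // fail fast, no retry
                }
            }
        }
        return new Outcome(attempts, false); // attempts exhausted
    }

    // A call that fails with a network error twice, then succeeds.
    static Callable<String> flakyCall() {
        int[] calls = {0};
        return () -> {
            if (++calls[0] <= 2) {
                throw new NetworkFailureException("connect timed out");
            }
            return "ok";
        };
    }

    public static void main(String[] args) {
        // Mirrors the auto-config classification: only the status-based type is retryable.
        Outcome autoConfigLike = run(flakyCall(), 10, List.of(TransientStatusException.class));
        // Mirrors the builder default: the network type is retryable as well.
        Outcome builderLike = run(flakyCall(), 10,
                List.of(TransientStatusException.class, NetworkFailureException.class));
        System.out.println("auto-config-like: attempts=" + autoConfigLike.attempts
                + ", succeeded=" + autoConfigLike.succeeded);
        System.out.println("builder-default-like: attempts=" + builderLike.attempts
                + ", succeeded=" + builderLike.succeeded);
    }
}
```

In this model the auto-config-like policy gives up after the first attempt, while the builder-like policy succeeds on the third. One possible fix in the real code, assuming it is acceptable to match the builder default, would be to add `.retryOn(ResourceAccessException.class)` to the auto-configured RetryTemplate shown above.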

Context

  • Impact: Applications using auto-configuration fail immediately on transient network issues (e.g., connection timeout), even though the default builder path would treat these as retryable. Resilience therefore differs depending on the wiring path (auto-config vs. builder default).
  • Why now: We observed that RetryUtils.DEFAULT_RETRY_TEMPLATE includes ResourceAccessException, but the auto-configured RetryTemplate does not, leading to differing behavior in otherwise similar setups.
  • Reproduction: Use Spring AI with the auto-config defaults, then trigger a connection timeout to the model provider (e.g., with a short client timeout or an unreachable host). The request fails without any retry, because the exception occurs pre-response and never becomes a TransientAiException.
  • Prior art: A related discussion points to the same gap between the default builder and auto-config behavior. Issue #4082 ("Occasionally, when Spring AI calls a llm, it reports a network error:HTTP/1.1 header parser received no bytes") was marked resolved for the builder path, but the auto-config path still does not retry network exceptions, which causes the inconsistency described above.

Labels: bug, retry

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions