Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PYIC-6584: POC for async SQS messaging #2020

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Joe-Edwards-GDS
Copy link
Contributor

@Joe-Edwards-GDS Joe-Edwards-GDS commented Jun 14, 2024

Try creating an async SQS client, to potentially improve performance by parallelising AWS API calls.

The X-Ray traces show SQS requests running in parallel to others, so this appears to work. It'll be hard to tell without a proper performance test whether it makes an appreciable difference.

Other clients could also be made async, but the value will be reduced - e.g. most DynamoDB and SSM calls need to be synchronous as we have to wait for the result.

@@ -197,6 +197,8 @@ public Map<String, Object> handleRequest(JourneyRequest event, Context context)
return new JourneyErrorResponse(
JOURNEY_ERROR_PATH, e.getResponseCode(), e.getErrorResponse())
.toObjectMap();
} finally {
auditService.awaitAuditEvents();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scary thing about this approach is that if we don't call this, then the lambda runtime will suspend and (potentially) interrupt an in-progress request.

I can't think of a great solution to this beyond 'make sure it's included in all the handlers' and 'never construct a new audit service outside of a handler'.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we'd catch this in the pipeline before any mistakes hit prod?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not, which is the scary part :/

.region(EU_WEST_2)
.httpClientBuilder(UrlConnectionHttpClient.builder())
.httpClientBuilder(AwsCrtAsyncHttpClient.builder())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AWS docs seem to imply that we should use the AWS CRT client for the sync case as well (https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/http-configuration.html#http-clients-recommend) - unless it's a snapstart thing to prefer the UrlConnectionHttpClient?

Copy link

sonarcloud bot commented Jun 14, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
33.3% Coverage on New Code (required ≥ 80%)
1 New Bugs (required ≤ 0)
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

Catch issues before they fail your Quality Gate with our IDE extension SonarLint

CompletableFuture.allOf(eventsToAwait.toArray(new CompletableFuture[0])).get();
} catch (InterruptedException | ExecutionException e) {
LOGGER.error(LogHelper.buildErrorMessage("Failed to send audit event(s)", e));
throw new AuditException("Failed to send audit event(s)", e);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a RuntimeException to ease its use - this means an audit failure will fail the lambda, but this is probably a good thing!

I'm in discussion about whether any of the audit events can be marked as non-critical (i.e. we can ignore if they fail), but for now let's assume that they all are, because TICF may need them.

var eventsToAwait = events;
events = new ArrayList<>();
CompletableFuture.allOf(eventsToAwait.toArray(new CompletableFuture[0])).get();
} catch (InterruptedException | ExecutionException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we get an InterruptedException (which I can't imagine we will), do we need to call Thread.interrupted()?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants