Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Partial Transaction for AWS Lambda #753

Closed
3 tasks done
estolfo opened this issue Feb 1, 2023 · 12 comments
Closed
3 tasks done

Support Partial Transaction for AWS Lambda #753

estolfo opened this issue Feb 1, 2023 · 12 comments

Comments

@estolfo
Copy link
Contributor

estolfo commented Feb 1, 2023

Description

With elastic/apm-aws-lambda#344 the AWS Lambda extension supports registering transactions at the beginning of the transaction/lambda invocation. With that, the extension can finalize the transaction in case that there is an error and the agent fails to report the final transaction.
Implement the ability to send(register) a partial transaction (in sync) to the AWS Lambda extension with the header x-elastic-aws-request-id containing the lambda invocation's request ID. The extension will cache the partial transaction and return a 200 OK response. If the extension returns 404 then the agent should stop sending the registration request for future invocations.

Spec: #799

Agent Issues

@basepi
Copy link
Contributor

basepi commented Mar 21, 2023

@lahsivjar would you mind adding some more context? I know we have the draft PR from the go agent but I'm still having a hard time parsing the fine details here.

What is a "partial transaction"? Should I compose a barebones transaction dict, with only the id, timestamp, trace_id, and name? Or send everything I have at transaction start? We'll have most of the request context, but obviously will be missing span counts, outcome, response context, etc.

I assume we're sending this to the normal intake/v2/events endpoint? I know I need to add the x-elastic-aws-request-id header, is that the only change from a "normal" transaction json object?

I assume I send the metadata blob first as usual? Currently any time we flush our events buffer, the metadata gets sent first. Since the python agent is the only agent that doesn't stream to the APM server this might be an important point for only python -- I don't know how the APM Server would feel about two metadata blobs in a row, assuming you forward the metadata blob but not the partial transaction. It would certainly be easier for my implementation if the extension didn't complain about the second metadata blob when I report the full transaction at the end, but if needed I can modify the transport to exclude it in this specific case.

Thanks in advance.

@lahsivjar
Copy link
Contributor

lahsivjar commented Mar 27, 2023

@basepi The partial transaction flow works like below (there is a pending PR that implements some part of the following based on the received feedback regarding handling metadata):

  1. At the start of the invocation, the agent will create a partial transaction. Partial transaction is nothing but a JSON encoding of all the transaction data that can be provided at this instance of time (containing transaction ID, FAAS fields, traceparent, tracestate, and other fields as usual). Along with the partial transaction, the agent is encouraged to send the metadata as the first line in the request body. Note that, if metadata is sent then it is used by the extension for all future calls made to APM-server and it cannot be updated.
  2. Next, the agent will send(register) the optional_metadata + partial transaction (in sync) to the extension with the header x-elastic-aws-request-id containing the lambda invocation's request ID. The endpoint for this is /register/transaction and it accepts Content-Type as application/vnd.elastic.apm.transaction+ndjson. The extension will cache the optional metadata + partial transaction and return a 200 OK response. If the extension returns 404 then the agent should stop sending the registration request for future invocations.
  3. After registration, agent will continue as normal. The extension will receive the transactions, spans, etc via intake v2 protocol and inspect the request payload to check if the request contains the transaction with the ID registered by the agent.
  4. If the extension finds a transaction then it releases its cache, otherwise, on shutdown or on receiving platform.runtimeDone event for the invocation the extension will create a proxy transaction on the behalf of the agent and send it to APM-Server.
Old (out of date) questions

Based on above, I will try to answer your questions.

What is a "partial transaction"? Should I compose a barebones transaction dict, with only the id, timestamp, trace_id, and name? Or send everything I have at transaction start? We'll have most of the request context, but obviously will be missing span counts, outcome, response context, etc.

I think we should send everything that we can glean at the transaction start start of the invocation.

I assume we're sending this to the normal intake/v2/events endpoint? I know I need to add the x-elastic-aws-request-id header, is that the only change from a "normal" transaction json object?

The partial transaction should be sent to the /register/transaction endpoint with the x-elastic-aws-request-id header. The body will be encoded in the same way we encode the transaction JSON object while sending it to the intake endpoint.

I assume I send the metadata blob first as usual?

When calling the register endpoint only the encoded partial transaction is expected by the /register/transaction endpoint. This means it shouldn't have the metadata blob.

@basepi
Copy link
Contributor

basepi commented Apr 10, 2023

@lahsivjar Can you update the above with your changes in elastic/apm-aws-lambda#384 ?

Also we should probably update the spec at some point with all of this stuff.

@lahsivjar
Copy link
Contributor

@basepi I have updated the above message with the new details. Once the PR is finalized I will take a stab at updating the spec.

@JonasKunz
Copy link
Contributor

If the extension returns 404 then the agent should stop sending the registration request for future invocations.

I think we should extend that to if the extension returns 4xx. I implemented the approach explained here and this yields a 415 from the current release of the extension due to the changed Content-Type header.

@trentm
Copy link
Member

trentm commented Apr 28, 2023

I think we should extend that to if the extension returns 4xx.

Sounds good to me. In my implementation if it returns anything other than the expected 200, then I log a message and stop using the endpoint. Do you think that is overkill?

@basepi
Copy link
Contributor

basepi commented Apr 28, 2023

In my implementation if it returns anything other than the expected 200, then I log a message and stop using the endpoint. Do you think that is overkill?

Probably fine. Technically an APM server can return 429 for rate-limited. But (1) I don't think the extension does this and (2) in a back-off scenario it probably makes sense to just stop registering partial transactions anyway.

I'll update my implementation to be more broad, right now I'm only doing the explicit 404.

@lahsivjar
Copy link
Contributor

In my implementation if it returns anything other than the expected 200, then I log a message and stop using the endpoint. Do you think that is overkill?

The initial intention was to ensure backward compatibility of the extension with the agents but by looking at the bigger picture, taking a broader approach to stop registration on any error sounds good.

Thanks for raising this @JonasKunz. I will also release the new version of the extension with the latest changes this week.

@lahsivjar
Copy link
Contributor

Lambda extension v1.4.0 has been released with the support for metadata in the register transaction call.

@basepi
Copy link
Contributor

basepi commented May 11, 2023

@lahsivjar do you have the bandwidth to do the spec PR for this? If not I can do it.

@lahsivjar
Copy link
Contributor

@basepi Thanks for the offer, I am a bit swamped right now and will appreciate the help 🙇

@basepi
Copy link
Contributor

basepi commented May 23, 2023

Spec is in #799

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants