[AWS][Lambda] Fix AWS Lambda data stream aws.lambda.message handling#19250

Merged
Kavindu-Dodan merged 4 commits into elastic:main from Kavindu-Dodan:fix/fix-lambda-message-handling
Jun 3, 2026

@Kavindu-Dodan
Contributor

@Kavindu-Dodan Kavindu-Dodan commented May 27, 2026

Proposed commit message

aws.lambda.message is defined as a flattened field. However, the Lambda integration's JSON pipeline does not check that the value it writes there is an object. As a result, the JSON pipeline parses the payload successfully, but indexing fails with: expecting token of type [START_OBJECT] but found [VALUE_STRING]

See this integration's fields.yaml for the field definition (excerpt below):

- name: aws.lambda
  type: group
  fields:
  - name: message
    type: flattened

If the payload contains non-compliant content, this fix preserves the original message in the root-level message field.
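The routing described above can be sketched in Python. This is a simulation for illustration only, not the pipeline code from this PR: the helper name route_lambda_message and the dict-based document are assumptions, and the actual logic lives in the ingest pipeline YAML and may differ in detail.

```python
from typing import Any


def route_lambda_message(parsed_message: Any) -> dict:
    """Place a parsed log message where the flattened mapping can accept it.

    Hypothetical helper: a flattened field expects an object, so any
    non-object value is kept in the root-level ``message`` field instead.
    """
    doc: dict = {}
    if isinstance(parsed_message, dict):
        # Objects are compatible with the flattened mapping.
        doc["aws"] = {"lambda": {"message": parsed_message}}
    else:
        # A non-object value (such as a bare string) is preserved at the
        # root level rather than written to aws.lambda.message.
        doc["message"] = parsed_message
    return doc


print(route_lambda_message({"level": "INFO"}))
print(route_lambda_message("plain string payload"))
```

The point of the sketch is only the type check: the destination depends on whether the value is an object, not on whether the JSON parsed successfully.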

See test changes and newly added tests to understand the change better.

Additionally, the message field is now also set in plain text mode.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

@Kavindu-Dodan Kavindu-Dodan force-pushed the fix/fix-lambda-message-handling branch from a32f83b to 0277758 Compare May 27, 2026 21:16
@Kavindu-Dodan Kavindu-Dodan force-pushed the fix/fix-lambda-message-handling branch from 0277758 to d4d3bb8 Compare May 28, 2026 17:30
@github-actions
Contributor

Elastic Docs Style Checker (Vale)

Summary: 1 suggestion found

💡 Suggestions (1): Optional style improvements. Apply when helpful.
| File | Line | Rule | Message |
|---|---|---|---|
| packages/aws/changelog.yml | 1 | Elastic.Versions | Use 'later versions' instead of 'newer versions' when referring to versions. |

The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to Elastic style guide for Vale.

@Kavindu-Dodan Kavindu-Dodan force-pushed the fix/fix-lambda-message-handling branch from d4d3bb8 to 53df15e Compare May 28, 2026 17:45
@Kavindu-Dodan Kavindu-Dodan marked this pull request as ready for review May 28, 2026 17:57
@Kavindu-Dodan Kavindu-Dodan requested review from a team as code owners May 28, 2026 17:57
@Kavindu-Dodan Kavindu-Dodan changed the title from "fix AWS Lambda message handling" to "[AWS][Lambda] Fix AWS Lambda data stream aws.lambda.message handling" May 28, 2026
@elastic-vault-github-plugin-prod

elastic-vault-github-plugin-prod Bot commented May 28, 2026

🚀 Benchmarks report

Package aws 👍(11) 💚(6) 💔(5)

| Data stream | Previous EPS | New EPS | Diff (%) | Result |
|---|---|---|---|---|
| rds | 41666.67 | 32258.06 | -9408.61 (-22.58%) | 💔 |
| route53_public_logs | 23809.52 | 8695.65 | -15113.87 (-63.48%) | 💔 |
| s3access | 5291.01 | 2932.55 | -2358.46 (-44.57%) | 💔 |
| vpcflow | 8264.46 | 4504.5 | -3759.96 (-45.5%) | 💔 |
| ec2_logs | 47619.05 | 38461.54 | -9157.51 (-19.23%) | 💔 |
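For readers checking the numbers: the Diff column appears to be the new EPS minus the previous EPS, with the percentage taken relative to the previous value. This is inferred from the figures in the report, not from the benchmark tool's documentation, and the function name eps_diff is hypothetical.

```python
from typing import Tuple


def eps_diff(previous_eps: float, new_eps: float) -> Tuple[float, float]:
    """Return (absolute difference, percentage change) between two EPS values."""
    diff = new_eps - previous_eps
    return round(diff, 2), round(diff / previous_eps * 100, 2)


# The rds row from the report above.
print(eps_diff(41666.67, 32258.06))  # (-9408.61, -22.58)
```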

To see the full report comment with /test benchmark fullreport

@andrewkroh andrewkroh added the Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] label May 28, 2026
@agithomas
Contributor

@Kavindu-Dodan, is this PR a duplicate of #17398?

cc @gpop63

@Kavindu-Dodan Kavindu-Dodan force-pushed the fix/fix-lambda-message-handling branch from 53df15e to cffb4fe Compare May 29, 2026 13:48
@markjandejong

Additional Root Cause: parsed.timestamp blocks the catchall rename

I've been debugging this same issue in production and found an additional bug beyond the string message problem that both this PR and #17398 are addressing.

The deeper issue

The pipeline's final catchall rename (rename parsed → aws.lambda.message) has this condition:

if: "ctx.parsed instanceof Map && !(ctx.parsed.containsKey('message') || ctx.parsed.containsKey('record') || ctx.parsed.containsKey('_aws') || ctx.parsed.containsKey('time') || ctx.parsed.containsKey('timestamp'))"

For Powertools-style logs (which have a timestamp field), the date processor extracts the value into @timestamp but never removes parsed.timestamp from the map. This means the condition ctx.parsed.containsKey('timestamp') is always true, so the entire if evaluates to false, and the catchall rename never fires for any Powertools log.

After the string message is handled, parsed still contains all the remaining structured fields (cold_start, function_name, service, name, function_version, etc.) — but they're orphaned and never make it into aws.lambda.message.
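To make the failure mode concrete, here is a Python rendering of the quoted condition, evaluated against a Powertools-style payload. It is a sketch, not pipeline code: the function name catchall_rename_applies and the sample field values are assumptions, and a Python dict stands in for the Painless Map.

```python
def catchall_rename_applies(parsed) -> bool:
    """Python rendering of the catchall rename condition quoted above."""
    blocked_keys = ("message", "record", "_aws", "time", "timestamp")
    return isinstance(parsed, dict) and not any(k in parsed for k in blocked_keys)


# A Powertools-style event after the date processor has run: the value was
# copied into @timestamp, but parsed.timestamp is still present.
powertools_parsed = {
    "timestamp": "2026-05-27 21:16:00,000+0000",  # illustrative value
    "cold_start": False,
    "function_name": "my-function",  # illustrative value
    "service": "my-service",  # illustrative value
}

print(catchall_rename_applies(powertools_parsed))  # False: rename is skipped

# The same payload with the timestamp key removed.
without_timestamp = {k: v for k, v in powertools_parsed.items() if k != "timestamp"}
print(catchall_rename_applies(without_timestamp))  # True: rename would fire
```

The only difference between the two calls is the presence of the timestamp key, which is why removing it after extraction changes the outcome.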

The result

Instead of getting a rich flattened object like:

{"log": "Creating Allow Policy", "name": "MyLogger", "function_version": "$LATEST", "cold_start": false, ...}

You get an incomplete object (or just a string at root message), losing all the structured context that Powertools provides.

Proposed fix (3 changes)

1. Remove parsed.timestamp after the date processor extracts it:

- date:
    if: "ctx.parsed?.timestamp != null"
    field: parsed.timestamp
    target_field: "@timestamp"
    formats: ["yyyy-MM-dd HH:mm:ss,SSSZ", "ISO8601"]
    ignore_failure: true

- remove:
    field: parsed.timestamp
    ignore_missing: true
    ignore_failure: true

2. Rename parsed.message → parsed.log instead of copying to root message:

- rename:
    field: parsed.message
    target_field: parsed.log
    if: "ctx.parsed?.message != null && ctx.parsed?.message instanceof String"
    ignore_failure: true

This keeps the string value inside parsed so it flows into aws.lambda.message.log via the catchall rename — preserving all sibling fields alongside it.

3. Remove timestamp from the catchall rename condition (since parsed.timestamp is now removed earlier, the check is no longer needed):

Original:

if: "ctx.parsed instanceof Map && !(ctx.parsed.containsKey('message') || ctx.parsed.containsKey('record') || ctx.parsed.containsKey('_aws') || ctx.parsed.containsKey('time') || ctx.parsed.containsKey('timestamp'))"

Fixed:

if: "ctx.parsed instanceof Map && !(ctx.parsed.containsKey('message') || ctx.parsed.containsKey('record') || ctx.parsed.containsKey('_aws') || ctx.parsed.containsKey('time'))"
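The combined effect of the three changes can be sketched as a Python simulation over a dict-based document. This is an illustration under stated assumptions, not the ingest pipeline itself: apply_proposed_changes is a hypothetical name, date parsing is omitted (the raw timestamp string is copied as-is), and the sample values are made up.

```python
import copy


def apply_proposed_changes(doc: dict) -> dict:
    """Simulate the three proposed pipeline changes on a dict-based document."""
    doc = copy.deepcopy(doc)
    parsed = doc.get("parsed")
    if not isinstance(parsed, dict):
        return doc

    # 1. Extract the timestamp, then remove it from parsed.
    if parsed.get("timestamp") is not None:
        doc["@timestamp"] = parsed["timestamp"]  # real date parsing omitted
    parsed.pop("timestamp", None)

    # 2. Rename a string parsed.message to parsed.log.
    if isinstance(parsed.get("message"), str):
        parsed["log"] = parsed.pop("message")

    # 3. Catchall rename, with the timestamp check dropped.
    blocked = ("message", "record", "_aws", "time")
    if not any(key in parsed for key in blocked):
        doc.setdefault("aws", {}).setdefault("lambda", {})["message"] = doc.pop("parsed")
    return doc


event = {
    "parsed": {
        "timestamp": "2026-05-27 21:16:00,000+0000",  # illustrative value
        "message": "Creating Allow Policy",
        "cold_start": False,
    }
}
result = apply_proposed_changes(event)
print(result["aws"]["lambda"]["message"])
# {'log': 'Creating Allow Policy', 'cold_start': False}
```

In this sketch the string message ends up under the log key next to its sibling fields, so the value written to aws.lambda.message is always an object.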

Results

  • No data loss: All Powertools structured fields (cold_start, function_name, service, name, xray_trace_id, etc.) end up in aws.lambda.message as a proper flattened object
  • No duplication: The string message lives only at aws.lambda.message.log, not duplicated at root message AND aws.lambda.message.message (addresses @kcreddy's concern)
  • Consistent with flattened mapping: aws.lambda.message always gets a Map, never a bare string
  • Aligns with @MichaelKatsoulis's suggestion: We drop parsed.message from parsed before the fallback rename — but we preserve its content under a different key (log) rather than discarding it entirely

Happy to open a separate PR with these changes + tests if that's preferred, or contribute them here. You can see my version HERE

@Kavindu-Dodan
Contributor Author

@markjandejong really appreciate the detailed explanation and the solution for the issue.

IMO this deserves a dedicated PR with focused tests, given that this PR focuses on the Lambda fix. Happy to review once it's open.

@markjandejong

@Kavindu-Dodan A new PR would be a duplicate of this one, as they both address the exact same problem with Lambda log ingestion of JSON payloads. I figured I would share my findings in the hope of reaching a unified resolution.

@Kavindu-Dodan
Contributor Author

Thanks for insisting on this issue. I added the change you proposed in commit 67466a3.

target_field: "@timestamp"
formats: ["yyyy-MM-dd HH:mm:ss,SSSZ"]
ignore_failure: true


parsed.timestamp should be removed after extraction here.

- remove:
    field: parsed.timestamp
    ignore_missing: true
    ignore_failure: true

Contributor Author

No, I do not think we should remove this. aws.lambda.message is expected to preserve the original message. See https://github.com/elastic/integrations/pull/19250/changes#diff-47c5bd851c7038676fd767648faf7ee6b269124328340a869ba2326cf56187e6R59-R63

@Kavindu-Dodan
Contributor Author

@MichaelKatsoulis @kcreddy @markjandejong can I get another round of review 🙏

Contributor

@kcreddy kcreddy left a comment


The duplication (root message + aws.lambda.message.message) won't cause indexing failures. It's a storage/clarity trade-off, not a correctness issue.

The core issue for the string-into-flattened-field bug should be fixed by this PR. Deferring to @elastic/obs-infraobs-integrations as the integration owners to make the final call.

@MichaelKatsoulis
Contributor

Hey @Kavindu-Dodan,

While testing this, I noticed that aws-lambda-plaintext.yml does not populate the message field for any events (REPORT, END, INIT_START). Their grok patterns do extract fields like aws.lambda.event_type and aws.lambda.metrics.*, but not message. The document lands with a blank message in Discover.

aws-lambda-json.yml already handles this:

- set:
    field: message
    copy_from: event.original
    if: "ctx.message == null && ctx.aws?.lambda?.message == null"

Could we add the same fallback to the plaintext pipeline?
One small thing I noticed: the START pattern has a trailing %{GREEDYMULTILINE:message} that matches no characters on a START RequestId: ... Version: $LATEST line, so the field comes out as an empty string rather than null. So I tested with this condition to cover both:

- set:
    field: message
    copy_from: event.original
    if: "ctx.message == null || ctx.message == ''"
    ignore_failure: true

I believe it is worth adding in this PR.
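The difference between the two conditions above can be sketched in Python. This is an illustration, not pipeline code: apply_message_fallback is a hypothetical name, the request ID in the sample line is made up, and skipping a missing event.original only roughly mirrors ignore_failure.

```python
def apply_message_fallback(doc: dict) -> dict:
    """Copy event.original into message when message is missing or empty."""
    result = dict(doc)
    message = result.get("message")
    if message is None or message == "":
        original = result.get("event", {}).get("original")
        if original is not None:  # roughly mirrors ignore_failure: skip if absent
            result["message"] = original
    return result


# A START line where the trailing grok capture matched no characters,
# leaving message as an empty string rather than null.
start_event = {
    "message": "",
    "event": {"original": "START RequestId: abc Version: $LATEST"},  # illustrative ID
}
print(apply_message_fallback(start_event)["message"])
# START RequestId: abc Version: $LATEST
```

A check for null alone would leave the empty string in place for START lines, which is why the tested condition covers both cases.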

@Kavindu-Dodan
Contributor Author

Thanks @MichaelKatsoulis. Yes, I think we can add this too.

See 7c0f33c, this is done :)

If there are no further remarks by the end of the day, I will merge this PR.

@Kavindu-Dodan Kavindu-Dodan force-pushed the fix/fix-lambda-message-handling branch from 7c0f33c to 150b526 Compare June 2, 2026 19:17
@Kavindu-Dodan
Contributor Author

@agithomas could you please have a final look 🙏

@elasticmachine

💚 Build Succeeded

Contributor

@agithomas agithomas left a comment


LGTM!

@Kavindu-Dodan Kavindu-Dodan merged commit 4be736d into elastic:main Jun 3, 2026
9 checks passed
@elastic-vault-github-plugin-prod

Package aws - 6.19.1 containing this change is available at https://epr.elastic.co/package/aws/6.19.1/

Labels

Integration:aws AWS Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations]

7 participants