
gcp: add keep_json option for audit, dns, firewall and vpcflow datastreams #8299

Merged · 2 commits merged into elastic:main on Nov 12, 2023

Conversation

@efd6 (Contributor) commented Oct 26, 2023

Proposed commit message

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

@elasticmachine commented Oct 26, 2023

💚 Build Succeeded


Build stats

  • Start Time: 2023-10-31T01:43:02.919+0000

  • Duration: 19 min 15 sec

Test stats 🧪

Test Results: Failed 0 · Passed 68 · Skipped 0 · Total 68

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@P1llus (Member) commented Oct 26, 2023

I think the change itself is great! I like this approach and fully support it. However, I would have preferred the renames to stay as renames rather than using set: users are usually after the fields we delete, not the ones we have already renamed. This also keeps the flattened object much smaller, and we do not need to duplicate data.

This reopens a discussion we have had on the team a few times in the past, and something I was always a bit against. When we created integrations from JSON payloads, what I liked to do was use the JSON processor to unpack the source field directly into the correct destination field (like gcp.audit). This has some pros and cons:
Pros:

  1. We do not need to constantly cherry-pick which fields go from the original JSON to the destination; we only remove the few we do not want.
  2. If new fields are introduced, they will not be automatically removed.

Cons:

  1. New or missed fields will not have a field type mapped, which can cause issues down the line.
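The unpacking approach described above can be sketched as an ingest pipeline fragment. This is an illustrative sketch, not the integration's actual pipeline; the source field name (`message`) and the removed field (`insertId`) are assumptions:

```json
{
  "processors": [
    {
      "json": {
        "field": "message",
        "target_field": "gcp.audit",
        "description": "Unpack the raw JSON payload directly into the destination object"
      }
    },
    {
      "remove": {
        "field": "gcp.audit.insertId",
        "ignore_missing": true,
        "description": "Remove the few fields we do not want; anything new survives by default"
      }
    }
  ]
}
```

With this shape, new upstream fields flow through untouched, but any field not declared in the mappings arrives without a field type, which is the con noted above.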

I like the flattened approach here! I believe we should discuss it further at a separate time and change how we do our unpacking of JSON in general (default to a flattened destination and cherry-pick renames out of the flattened field, leaving any missed or newly added fields in flattened?).

Still, I would really like to push for using renames rather than sets wherever possible!
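For concreteness, the two options being weighed look like this in ingest pipeline terms (field names are illustrative): `rename` moves the value out of the unpacked object, while `set` with `copy_from` leaves a duplicate behind in it:

```json
[
  {
    "rename": {
      "field": "json.protoPayload.methodName",
      "target_field": "gcp.audit.method_name",
      "description": "Moves the value; the source object no longer contains it"
    }
  },
  {
    "set": {
      "field": "gcp.audit.method_name",
      "copy_from": "json.protoPayload.methodName",
      "description": "Copies the value; the source object keeps its own copy"
    }
  }
]
```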

@elasticmachine commented Oct 26, 2023

🌐 Coverage report

Name          Metrics % (covered/total)   Diff
Packages      100.0%  (6/6)               💚
Files         100.0%  (6/6)               💚
Classes       100.0%  (6/6)               💚
Methods       87.826% (101/115)           👎 -1.88
Lines         95.124% (1502/1579)         👎 -4.01
Conditionals  100.0%  (0/0)               💚

@efd6 (Contributor, Author) commented Oct 26, 2023

I am torn about rename vs. copy; as you can see from the original issue, I was using renames. The code changes are much simpler with renames, since we don't need to take so much care to ensure that everything remains in the final flattened field, and we also have fewer (~0) futile allocations. The reason I went with copies here (I intended to note all this, but forgot) is that the behaviour is simpler to document: the copies case is simple and clear, while the renames case demands a lot of understanding from the user (including reading and understanding the ingest pipeline). I'd like to fully understand the performance implications of copy vs. rename here, but memory behaviour in Java is a dark art that I know little about.

@P1llus (Member) commented Oct 27, 2023

After some further discussion I think it's fine to keep it this way. I just felt it was a bit too close to event.original; the only reason we have this field is so we do not have to run the JSON processor on event.original again, and that felt like a waste. It also puts pressure on the end user to remove this large duplicate object at the end of the pipeline to avoid incurring storage cost.

@P1llus (Member) commented Oct 27, 2023

I would like to wait for @andrewkroh's opinion on this as well.

We could also just not implement it and ask users to run the JSON processor on event.original themselves. If performance is the concern, I would like to know more about the statistics used to determine that the JSON processor is actually heavier than passing this large duplicate object through the pipeline.

@efd6 efd6 marked this pull request as ready for review October 29, 2023 23:11
@efd6 efd6 requested review from a team as code owners October 29, 2023 23:11
@elasticmachine

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@andrewkroh (Member) commented

If we copy, then downstream consumers of the field are less affected when we map a new field (i.e. the flattened field's value is more stable). The downside is increased storage cost for keep_json. You would probably even want to disable preserve_original_event if you used keep_json, since the two are close to duplicative.

If we rename, then the value in the flat field is less stable and less reflective of the original data which probably makes maintaining any follow-on custom pipelines more difficult.

As a custom pipeline author I would probably ask for "copy" because it gives me the greatest control.

@efd6 (Contributor, Author) commented Oct 30, 2023

I'd be reluctant to automatically disable preserve_original_event (users may be retaining it only temporarily, to do some processing, and will delete it at the end of the @custom pipeline), but I do think it's worth documenting that recommendation in the keep_json option. I agree that the copy is least surprising, and I will add docs noting the potential costs of retaining the data.
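The workflow described here (keep the preserved JSON only long enough for follow-on processing, then drop it to avoid the storage cost) might look like the following hypothetical `logs-gcp.audit@custom` pipeline. The `gcp.audit.flattened` field name follows the linked proposal, and `labels.method_name` is purely illustrative:

```json
{
  "processors": [
    {
      "set": {
        "field": "labels.method_name",
        "copy_from": "gcp.audit.flattened.protoPayload.methodName",
        "ignore_empty_value": true,
        "description": "Example custom processing against the preserved JSON"
      }
    },
    {
      "remove": {
        "field": ["event.original", "gcp.audit.flattened"],
        "ignore_missing": true,
        "description": "Drop the near-duplicate heavy fields once processing is done"
      }
    }
  ]
}
```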

@efd6 efd6 merged commit af38b97 into elastic:main Nov 12, 2023
4 checks passed
@elasticmachine

Package gcp - 2.31.0 containing this change is available at https://epr.elastic.co/search?package=gcp

Labels: enhancement (New feature or request), Integration:GCP

Successfully merging this pull request may close these issues.

proposal: gcp: add a gcp.<datastream>.flattened field for each of the datastreams
4 participants