Add ID processor #14524

Merged: 50 commits, Dec 11, 2019

@ycombinator (Contributor) commented Nov 14, 2019

This PR introduces a new add_id processor that generates a unique ID for each event.

The processor will take the following configuration options:

| Name | Required? | Default | Description |
|---|---|---|---|
| `target_field` | No | `@metadata.id` | Field in which the generated ID should be stored. |
| `type` | No | `elasticsearch` | Type of ID to generate. Determines the ID generation algorithm. |
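
As a rough illustration of how these two options and their defaults might be modeled, here is a minimal Go sketch. The `config` struct, `defaultConfig`, and `validate` are hypothetical names chosen for this example and are not taken from the PR's code; the only details drawn from the description are the option names, their defaults, and the fact that `elasticsearch` is the one supported type.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a hypothetical struct mirroring the two documented options.
// It is an illustration only, not the struct used in the PR.
type config struct {
	TargetField string // corresponds to target_field
	Type        string // corresponds to type
}

// defaultConfig returns the defaults listed in the options table.
func defaultConfig() config {
	return config{TargetField: "@metadata.id", Type: "elasticsearch"}
}

// validate rejects an empty target field and any ID type other than
// "elasticsearch", the only type the description says is supported.
func (c config) validate() error {
	if c.TargetField == "" {
		return errors.New("target_field must not be empty")
	}
	if c.Type != "elasticsearch" {
		return fmt.Errorf("unsupported ID type %q", c.Type)
	}
	return nil
}

func main() {
	fmt.Println(defaultConfig().validate())

	bad := defaultConfig()
	bad.Type = "uuid4" // made-up type name, used only to trigger the error
	fmt.Println(bad.validate())
}
```

Validating the type when the configuration is loaded means a typo in `type` surfaces at startup rather than when the first event is processed.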

Currently the only type of ID that can be generated using this processor is elasticsearch. IDs of this type are generated using the same algorithm that Elasticsearch uses for its auto-generated document IDs. These IDs are conceptually similar to Flake IDs in that they are roughly ordered as time progresses. However, the order of the bytes within the ID is chosen to give Elasticsearch a better chance of compressing the IDs.

Related: #14363.

Review threads (outdated, resolved):
libbeat/processors/elasticsearch_id/mac.go (2 threads)
libbeat/processors/elasticsearch_id/generator.go (2 threads)
@ycombinator changed the title from "Elasticsearch ID processor" to "UUID processor" on Nov 15, 2019
@ycombinator (Contributor Author) commented Nov 15, 2019

@urso I'm torn about the name of this processor.

Originally I had named it elasticsearch_id because that's the most accurate description of the IDs it generates. But then I decided to generalize it a bit (by giving it an optional type setting) in case we want to teach this processor to generate different types of IDs later. So I renamed it to uuid because conceptually this processor is generating universally-unique IDs. But that could be misleading because actual UUIDs are expected to be 16 bytes long whereas the ES time-based ID generation algorithm generates 15-byte IDs.

Do you have any suggestions about this?

@dedemorton (Contributor) left a comment

Suggested a couple of minor changes plus one that will fix the build.

Would you mind adding fingerprint to the link list? It'll save you having to rebase.

I need to rethink how we organize the lists because it means a different order if you organize by processor name vs topic title.

[[uuid]]
=== Generate UUID for an event

The `uuid` processor generates a random but roughly ordered UUID for an event.

Contributor:

Might be worth adding a sentence that explains what a UUID is (for novice users).

Contributor Author:

I'm going to hold off on this change until we've finalized the name of this processor. See #14524 (comment).

Contributor Author:

I think this is moot now, since we renamed the processor to Add ID (add_id) processor.

Review threads (outdated, resolved):
libbeat/processors/uuid/docs/uuid.asciidoc (2 threads)
libbeat/docs/processors-list.asciidoc (1 thread)
@dedemorton (Contributor):

Re: your question about conditional coding: we only need to add conditions if the processor isn't available to all Beats. If it is, then no extra coding is required.

@ycombinator (Contributor Author):

@urso I believe I've addressed all your feedback from the last round of review now. Please re-review when you get a chance.

In particular, I'd like your thoughts on #14524 (comment) since my change there differs from the implementation you had proposed but I believe solves the underlying problem nevertheless.

I do need to update/add tests in this PR but I will wait to do that based on your feedback, to avoid churn.

@ycombinator (Contributor Author) commented Dec 10, 2019

Chatted with @urso off PR about the monotonic timestamp issue. We came up with a simpler approach, which I've implemented in f2faf14, along with unit tests in 51f8eed.
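
The conversation does not spell out the approach taken in f2faf14, but the commit messages in this PR mention computing the last timestamp while accounting for system time going backwards. A minimal Go sketch of that general idea follows. The function name `nextTimestamp` and its signature are hypothetical, and the extra bump when the sequence number wraps to zero is an assumption about how such generators commonly avoid reusing a (timestamp, sequence) pair; neither is confirmed by this thread.

```go
package main

import "fmt"

// nextTimestamp returns the millisecond timestamp to embed in the next ID.
//   last: timestamp used for the previous ID
//   now:  current wall-clock time in milliseconds
//   seq:  sequence number assigned to the ID being generated
// Hypothetical helper for illustration; not the PR's timestamp() function.
func nextTimestamp(last, now uint64, seq uint32) uint64 {
	ts := last
	if now > ts {
		ts = now // normal case: the clock moved forward
	}
	// If the clock went backwards, ts stays at last, so IDs never
	// appear to travel back in time.
	if seq == 0 {
		// Assumed rule: when the sequence wraps, advance the timestamp
		// so the same (timestamp, sequence) pair is not reused.
		ts++
	}
	return ts
}

func main() {
	fmt.Println(nextTimestamp(100, 110, 1)) // 110: clock moved forward
	fmt.Println(nextTimestamp(100, 90, 1))  // 100: clock moved backwards
	fmt.Println(nextTimestamp(100, 90, 0))  // 101: sequence wrapped
}
```

Because the function takes both timestamps as arguments instead of reading the system clock itself, it can be unit-tested with fixed inputs.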

@urso Could you give this PR another review when you get a chance, please? Thanks.

@ycombinator (Contributor Author):

Travis CI is green. Jenkins CI failures are unrelated. Merging.

@dedemorton (Contributor) left a comment

LGTM

@ycombinator merged commit c714d0b into elastic:master on Dec 11, 2019
ycombinator added a commit to ycombinator/beats that referenced this pull request Dec 11, 2019
* WIP: Flake ID processor

* Fleshing out implementation of generator

* Rename package

* Unexport const

* Use increment operator

* Adding processor scaffolding

* Fixing default field

* Adding CHANGELOG entry

* Fixing compile errors

* WIP: unit tests

* Fixing byte copy

* Fixing up tests

* Adding test TODOs

* Adding non-default target field unit test

* Adding one more test TODO

* Adding TODO for post-benchmarking

* Introduce type

* Adding unit test for factory

* Adding unit test for mac

* Adding unit test for mac

* Fleshing out remaining mac unit tests

* Adding tests for ES ID generator

* Remove TODO after experimenting with IIFE (perf was worse)

* Moving doc

* Adding UUID processor to list in docs

* Apply suggestions from docs code review

Co-Authored-By: DeDe Morton <dede.morton@elastic.co>

* Adding godoc

* Rename generator function type

* Exporting and adding godoc

* Adding godoc

* Updating godoc

* Adding Unwrap error methods

* Moving ES ID generator into generators package + singleton construction

* Addressing Hound feedback

* Renaming processor to `add_id`

* Updating processor name in CHANGELOG entry

* More refactoring updates

* Fixing more vet errors

* Unexport config struct as it's only used within this package

* Fixing doc anchor

* Moving generator construction to processor constructor; simplifying factory

* Fixing compile error

* Validate ID generator type in config

* Finer-grained locking to reduce mutex contention

* Initialize package global variables that depend on randomness, later

* Compute last timestamp while accounting for system time going backwards

* Simpler and testable timestamp() function

* Adding unit test for timestamp function

* Re-implementing ES timestamp algorithm

* Removing unused variable
ycombinator added a commit that referenced this pull request Dec 11, 2019
@ycombinator deleted the lb-processor-es-id branch on December 25, 2019 11:09
@andresrc added the Team:Integrations label on Mar 6, 2020
@daqqad commented Apr 3, 2020

@ycombinator This may be a dumb question (and the wrong place to ask it), but with a time-based UUID generator combined with high volume and multiple Filebeat servers, what are the chances of duplicate UUIDs?

I'm only asking because the 20-character UUID that add_id generates is considerably shorter than what the Logstash fingerprint filter plugin generates (36 characters), which add_id replaced in our environment.
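
The two lengths mentioned here are consistent with the byte counts given earlier in the thread (15-byte IDs versus 16-byte UUIDs) if one assumes the 15 bytes are encoded as unpadded base64 and the 36-character value is a UUID in its canonical hyphenated hex form. Both encodings are assumptions for this sketch, and the helper names `encodedIDLen` and `canonicalUUIDLen` are hypothetical. The arithmetic only accounts for string length; it says nothing about the collision probability being asked about.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodedIDLen returns the length of n raw bytes after unpadded,
// URL-safe base64 encoding (assumed encoding for the generated IDs).
func encodedIDLen(n int) int {
	return base64.RawURLEncoding.EncodedLen(n)
}

// canonicalUUIDLen returns the length of a 16-byte UUID written as
// hyphenated hex: two hex digits per byte plus four hyphens.
func canonicalUUIDLen() int {
	const uuidBytes = 16
	return uuidBytes*2 + 4
}

func main() {
	fmt.Println(encodedIDLen(15))   // 20
	fmt.Println(canonicalUUIDLen()) // 36
}
```

Base64 carries 6 bits per character against 4 bits per hex digit, so most of the difference in length comes from the encoding rather than from the amount of underlying data (120 bits versus 128 bits).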
