Redact Ingest Processor #92951
Pinging @elastic/es-data-management (Team:Data Management)
Hi @davidkyle, I've created a changelog YAML for you.
Pinging @elastic/es-docs (Team:Docs)
This reverts commit 1bf583bc1b733705bd77dff21883ac51dd22802e.
@@ -86,7 +86,8 @@ public Map<String, Processor.Factory> getProcessors(Processor.Parameters paramet
             entry(NetworkDirectionProcessor.TYPE, new NetworkDirectionProcessor.Factory(parameters.scriptService)),
             entry(CommunityIdProcessor.TYPE, new CommunityIdProcessor.Factory()),
             entry(FingerprintProcessor.TYPE, new FingerprintProcessor.Factory()),
-            entry(RegisteredDomainProcessor.TYPE, new RegisteredDomainProcessor.Factory())
+            entry(RegisteredDomainProcessor.TYPE, new RegisteredDomainProcessor.Factory()),
+            entry(RedactProcessor.TYPE, new RedactProcessor.Factory(createGrokThreadWatchdog(parameters)))
I'm not sure whether we should be invoking `createGrokThreadWatchdog` twice, or merely creating a single value once and passing it to both the grok and redact processor factories. I'll figure out the answer and let you know.
Good catch! The `MatchWatchdog` is used by all the Grok processors on this node and by all the `Grok`s that those processors create, so it makes sense to use the same watchdog for the `Grok`s created by the redact processor.
I pushed 940c3c1
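For readers following the thread, the "create once, pass to both factories" option can be sketched as below. This is only an illustration under stated assumptions: `Watchdog`, `Factory`, and `buildFactories` are hypothetical stand-ins, not the Elasticsearch classes, and the sketch does not show what commit 940c3c1 actually changed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the Elasticsearch types.
final class SharedWatchdogSketch {

    /** Stand-in for a node-wide watchdog that guards long-running matches. */
    static final class Watchdog {}

    /** Stand-in for a processor factory that holds a reference to a watchdog. */
    static final class Factory {
        private final String type;
        private final Watchdog watchdog;

        Factory(String type, Watchdog watchdog) {
            this.type = type;
            this.watchdog = watchdog;
        }

        String type() {
            return type;
        }

        Watchdog watchdog() {
            return watchdog;
        }
    }

    /** Builds both factories around one watchdog instead of creating one per factory. */
    static Map<String, Factory> buildFactories() {
        Watchdog shared = new Watchdog(); // created exactly once
        Map<String, Factory> factories = new LinkedHashMap<>();
        factories.put("grok", new Factory("grok", shared));
        factories.put("redact", new Factory("redact", shared));
        return factories;
    }

    public static void main(String[] args) {
        Map<String, Factory> factories = buildFactories();
        // Reference equality confirms both factories see the same instance.
        System.out.println(factories.get("grok").watchdog() == factories.get("redact").watchdog());
    }
}
```

The design point is simply that a single instance is handed to every consumer, so whatever state the watchdog tracks is shared across both processor types.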
If one of the existing Grok https://github.com/elastic/elasticsearch/blob/{branch}/libs/grok/src/main/resources/patterns/ecs-v1[patterns]
does not fit your requirements extend the patterns with the `pattern_definitions` option.
New patterns can be defined with a regular expression or combine
New patterns can be defined with a regular expression or combine Grok patterns from the base definitions to build complex patterns.
This sentence seems a bit stilted to me.
👍 Updated with an example of defining a custom pattern
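As a rough illustration of how a custom definition can build on base patterns, the sketch below expands `%{NAME}` and `%{NAME:field}` references into a plain regular expression. It is a simplification under stated assumptions: the class, the `expand` method, and the `DIGITS` and `TICKET` definitions are hypothetical, and the real Grok library does considerably more (named captures, type coercion, and so on).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of resolving pattern references; not the Grok implementation.
final class PatternDefinitionsSketch {

    // Matches %{NAME} or %{NAME:field}; the optional field part is ignored here.
    private static final Pattern REFERENCE = Pattern.compile("%\\{(\\w+)(?::\\w+)?\\}");

    /** Expands every pattern reference in {@code pattern} using the given bank. */
    static String expand(String pattern, Map<String, String> bank) {
        return expand(pattern, bank, new ArrayDeque<>());
    }

    private static String expand(String pattern, Map<String, String> bank, Deque<String> inProgress) {
        Matcher matcher = REFERENCE.matcher(pattern);
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        while (matcher.find()) {
            String name = matcher.group(1);
            String definition = bank.get(name);
            if (definition == null) {
                throw new IllegalArgumentException("Unknown pattern: " + name);
            }
            if (inProgress.contains(name)) {
                throw new IllegalArgumentException("Circular reference: " + name);
            }
            inProgress.push(name);
            // Copy the literal text before the reference, then the expanded definition
            // wrapped in a non-capturing group so alternations stay contained.
            out.append(pattern, cursor, matcher.start())
                .append("(?:")
                .append(expand(definition, bank, inProgress))
                .append(")");
            inProgress.pop();
            cursor = matcher.end();
        }
        out.append(pattern, cursor, pattern.length());
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> bank = new HashMap<>();
        bank.put("DIGITS", "[0-9]+");            // stand-in for a base definition
        bank.put("TICKET", "TKT-%{DIGITS}");     // custom definition built on the base one
        System.out.println(expand("%{TICKET:id}", bank));
    }
}
```

Merging the custom definitions into the same map as the base ones is what lets a new pattern refer to existing ones by name.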
RegionTrackingMatchExtractor extractor = new RegionTrackingMatchExtractor();
for (var grok : groks) {
    String className = grok.captureConfig().get(0).name();
    extractor.setCurrentClass(className);
re: `setCurrentClass(className)` -- is `className` the best name for what this is (the variable, the setter, the field of the extractor that the setter is controlling)? Perhaps this is better as `patternName` throughout? (I noticed a comment where you referenced it as "Grok pattern name".) I'm open to other ideas here, but I don't love `className` (there's a bit of a garden path problem with 'classes' and 'class names' on the JVM having a particular meaning).
Anyway, open to ideas, or counter arguments.
`patternName` is better; I've used that consistently throughout now.
Generally I think this looks great, please take my comments as relatively minor notes along the way towards me getting to a ✅.
LGTM from a docs perspective. I've left some minor suggestions.
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
Great feature! Thanks for writing it.
The redact processor uses the Grok rules engine to redact text that matches Grok patterns. Existing patterns from the Grok pattern bank can be referenced directly, and new patterns can be added inline in the `pattern_definitions` option.
One application of the redact processor is to obscure Personal Identifying Information by configuring the processor to detect known patterns such as email or IP addresses.
In an ingest pipeline, the redact processor could be augmented by a Named Entity Recognition model to detect and remove names and places.
Given an input document with the field `to_redact`
The redact processor, as configured above, will emit
The matched text is replaced by the Grok pattern name. The `<` and `>` tokens surrounding the replaced text are configurable via the `prefix` and `suffix` options.
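The replacement behaviour described above can be approximated with a small sketch. It uses `java.util.regex` in place of the Grok engine, and the class, the `redact` method, and the simplified `EMAIL` and `IP` expressions are all assumptions made for illustration; they are not the processor's code or the definitions in the Grok pattern bank. Overlapping matches are handled naively by keeping whichever region starts first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of region-based redaction; not the RedactProcessor implementation.
final class RedactSketch {

    /** A matched span [start, end) tagged with the name of the pattern that produced it. */
    static final class Region {
        final int start;
        final int end;
        final String patternName;

        Region(int start, int end, String patternName) {
            this.start = start;
            this.end = end;
            this.patternName = patternName;
        }
    }

    /** Replaces every match with prefix + pattern name + suffix. */
    static String redact(String input, Map<String, Pattern> patterns, String prefix, String suffix) {
        List<Region> regions = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Matcher matcher = entry.getValue().matcher(input);
            while (matcher.find()) {
                if (matcher.end() > matcher.start()) { // ignore empty matches
                    regions.add(new Region(matcher.start(), matcher.end(), entry.getKey()));
                }
            }
        }
        regions.sort(Comparator.comparingInt((Region r) -> r.start));

        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Region region : regions) {
            if (region.start < cursor) {
                continue; // simplification: drop regions overlapping one already redacted
            }
            out.append(input, cursor, region.start).append(prefix).append(region.patternName).append(suffix);
            cursor = region.end;
        }
        out.append(input, cursor, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        // Simplified stand-ins, not the Grok pattern bank definitions.
        patterns.put("EMAIL", Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"));
        patterns.put("IP", Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b"));
        System.out.println(redact("user jane@example.com logged in from 10.0.0.1", patterns, "<", ">"));
    }
}
```

Passing different `prefix` and `suffix` arguments (for example `[` and `]`) changes only the tokens wrapped around the pattern name, which mirrors the configurable options described above.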