Move processors into new pipeline #4554

Merged
merged 1 commit into from Jul 3, 2017

4 participants
@urso
Collaborator

commented Jun 24, 2017

  • Update all processors to operate on *beat.Event.
  • Update the processors factory to take *common.Config instead of common.Config. Passing common.Config by value works, but the go-ucfg interfaces/API assume they operate on pointers, so passing by value might break functionality in the future.
  • Update most processor interfaces to pass the processor type/context by pointer instead of by value (because the processors are used through interfaces, they were heap-allocated anyway).
  • Install the list of processors in the new publisher pipeline. This allows the publisher pipeline to account for events dropped by filters when reporting ACKs. From the Beats' point of view, there should be no difference between events dropped by filters and events ACKed by outputs. The interface mandates that all events are properly ACKed by the publisher pipeline.
  • Tags, fields, and beat metadata settings are now implicitly converted to processors as well.
  • This change ensures that all metadata (local and global) is applied before the actual processors (installed with the pipeline) are executed, so all processors see the very same events.
  • Introduce (*processors.Processors).RunBC(common.MapStr) common.MapStr. It will be removed once the beats themselves configure the publisher pipeline to run processors (filebeat/metricbeat currently execute processors on their own).
  • Add a Private field to beat.Event for event-based ACK callbacks. Use the Private field to store additional event metadata required for post-processing in the ACK callback (as the processor pipeline can change the event at will).
  • Prepare moving beats to the new pipeline by introducing ConnectX to the BC API, so beats can connect to the shared global pipeline (Note: will be removed when the BC layer is removed).
  • Processors are still executed by the goroutine calling (beat.Client).Publish.
  • New processor execution order (C = client-based processor, P = pipeline-based processor):
    1. (P) extract EventMetadataKey fields + tags (to be removed in favor of 4)
    2. (P) generalize/normalize event
    3. (P) add beats metadata (name, hostname, version)
    4. (P) add pipeline fields + tags
    5. (C) add client fields + tags
    6. (P/C) apply EventMetadataKey fields + tags (to be removed in favor of 4)
    7. (C) client processors list
    8. (P) pipeline processors list
    9. (P) (if publish/debug enabled) log event
    10. (P) (if output disabled) dropEvent
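The ordering and ACK guarantee described above can be sketched as a single processor chain. This is a minimal, hypothetical sketch, not libbeat's code: `Event`, `ProcessorFunc`, `addField`, `dropIf` and `publish` are illustrative stand-ins; only the `Run(*Event) (*Event, error)` shape, with a nil event meaning "drop", mirrors the processor interface visible in this PR. It shows two properties the description relies on: client (C) and pipeline (P) steps run in one fixed order, so user processors see the fully decorated event, and an event dropped by a processor is ACKed just like a published one.

```go
package main

import "fmt"

// Event is a simplified, hypothetical stand-in for beat.Event.
type Event struct {
	Fields  map[string]interface{}
	Private interface{} // opaque per-event data handed back to the ACK callback
}

// Processor mirrors the Run shape used in this PR: a nil event means "dropped".
type Processor interface {
	Run(*Event) (*Event, error)
}

// ProcessorFunc adapts a plain function to the Processor interface.
type ProcessorFunc func(*Event) (*Event, error)

func (f ProcessorFunc) Run(e *Event) (*Event, error) { return f(e) }

// addField is a hypothetical internal processor, standing in for the
// beat metadata and fields/tags settings that are now converted to processors.
func addField(key, value string) Processor {
	return ProcessorFunc(func(e *Event) (*Event, error) {
		e.Fields[key] = value
		return e, nil
	})
}

// dropIf is a hypothetical user processor that drops events containing key.
func dropIf(key string) Processor {
	return ProcessorFunc(func(e *Event) (*Event, error) {
		if _, ok := e.Fields[key]; ok {
			return nil, nil
		}
		return e, nil
	})
}

// publish runs the chain in its fixed order and ACKs every event exactly once,
// whether a processor dropped it or it reached the output.
func publish(chain []Processor, e *Event, ack func(private interface{}), out func(*Event)) {
	private := e.Private // saved up front: processors may replace or drop the event
	for _, p := range chain {
		next, err := p.Run(e)
		if err != nil {
			continue // errors are ignored; the event continues unchanged
		}
		if next == nil {
			ack(private) // dropped by a processor: reported like an ACKed event
			return
		}
		e = next
	}
	out(e)
	ack(private) // simplified: a real output would ACK only after delivery
}

func main() {
	chain := []Processor{
		addField("beat.name", "demo"), // (P) step 3: beat metadata
		addField("env", "prod"),       // (P) step 4: pipeline fields
		addField("source", "client"),  // (C) step 5: client fields
		dropIf("debug"),               // (C) step 7: client processors
		// (P) step 8: pipeline processors would be appended here
	}
	acked, published := 0, 0
	ack := func(interface{}) { acked++ }
	out := func(*Event) { published++ }
	publish(chain, &Event{Fields: map[string]interface{}{"msg": "a"}}, ack, out)
	publish(chain, &Event{Fields: map[string]interface{}{"debug": true}}, ack, out)
	fmt.Println(published, acked) // prints "1 2"
}
```

Keeping `Private` outside the processed fields is what lets the ACK callback find its metadata even after a processor rewrites or drops the event.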

For docs: documenting the event processing execution order would be nice to have.

@urso urso added the in progress label Jun 24, 2017

@urso urso force-pushed the urso:enh/pipeline-processors branch from 789983d to fe75d24 Jun 26, 2017

```diff
 // return event=nil to delete the entire event
 return nil, nil
 }
 
-func (f dropEvent) String() string { return "drop_event" }
+func (*dropEvent) String() string { return "drop_event" }
```

@ruflin

ruflin Jun 27, 2017

Collaborator

Didn't know this works, thought _ is required.

@urso urso force-pushed the urso:enh/pipeline-processors branch from fe75d24 to bbe6888 Jun 30, 2017

@urso urso added review and removed in progress labels Jun 30, 2017

@urso urso changed the title [WIP] Move processors into new pipeline Move processors into new pipeline Jun 30, 2017

@urso urso added the refactoring label Jun 30, 2017

@urso urso force-pushed the urso:enh/pipeline-processors branch from 5a8b240 to 526cf2e Jul 1, 2017

@ruflin

ruflin approved these changes Jul 1, 2017

Collaborator

left a comment

LGTM. A few questions:

  • There are quite a few changes in the JSON part. I wasn't sure how they relate to the other changes.
  • If I understand it right, config-wise nothing changes for fields/tags; only the internal handling moves to processors. Also, these cannot be configured by a user as processors, because they are not registered processors.
```diff
-actual, err := p.Run(input)
-
-return actual
+actual, _ := p.Run(&beat.Event{Fields: input})
```

@ruflin

ruflin Jul 1, 2017

Collaborator

No error check?

@urso

urso Jul 2, 2017

Author Collaborator

The error has never been checked. In this PR I'm only adjusting interfaces, not fixing, improving, or changing behavior in tests or other pieces.
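For reference, a test helper that checks the error rather than discarding it with `_` might look like the sketch below. This is not what the PR does (the author deliberately kept behavior unchanged); `Event`, `failing` and `runChecked` are hypothetical stand-ins, and only the `p.Run(&beat.Event{Fields: input})` call shape comes from the diff. `%w` wrapping requires Go 1.13 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a hypothetical stand-in for beat.Event.
type Event struct{ Fields map[string]interface{} }

// Processor mirrors the Run(*Event) (*Event, error) shape from the diff.
type Processor interface {
	Run(*Event) (*Event, error)
}

// failing is a hypothetical processor that always reports an error.
type failing struct{}

func (failing) Run(e *Event) (*Event, error) { return e, errors.New("conversion failed") }

// runChecked wraps the input fields in a temporary Event, as the test helper
// does, but returns the processor error instead of ignoring it.
func runChecked(p Processor, input map[string]interface{}) (map[string]interface{}, error) {
	actual, err := p.Run(&Event{Fields: input})
	if err != nil {
		return nil, fmt.Errorf("processor failed: %w", err)
	}
	if actual == nil { // a nil event means the processor dropped it
		return nil, nil
	}
	return actual.Fields, nil
}

func main() {
	_, err := runChecked(failing{}, map[string]interface{}{"msg": "x"})
	fmt.Println(err) // prints "processor failed: conversion failed"
}
```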

```go
if err != nil {
	// XXX: We don't drop the event, but continue filtering here iff the most
```

@ruflin

ruflin Jul 1, 2017

Collaborator

Could we add an "error" field to the event when a processor fails, so the event itself shows that something went wrong?

@urso

urso Jul 2, 2017

Author Collaborator

Same here. The old/current logic ignores errors and at most prints a debug message. I don't intend to change any processing logic.

```go
//
// Pipeline (C=client, P=pipeline)
//
// 1. (P) add EventMetadataKey fields + tags (to be removed in favor of 4)
```

@ruflin

ruflin Jul 1, 2017

Collaborator

We probably should have that somewhere in the docs

@urso

urso Jul 2, 2017

Author Collaborator

Yeah. Adding tag.

@urso urso added the needs_docs label Jul 2, 2017

@urso urso force-pushed the urso:enh/pipeline-processors branch from 526cf2e to 96eca70 Jul 2, 2017

@urso

Collaborator Author

commented Jul 2, 2017

> There are quite a few changes on the JSON part. I wasn't sure how this is related to the other changes?

There's some shared JSON parsing logic that treats @timestamp and similar fields a little differently. In processors we're using beat.Event, but the reader in filebeat returns common.MapStr. As filebeat still uses the old publisher API, I had to convert to a temporary beat.Event. There is no change in logic.
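The temporary conversion described here, like the `RunBC(common.MapStr) common.MapStr` shim from the PR description, follows a wrap/run/unwrap pattern. A minimal sketch with hypothetical stand-ins (`MapStr` for `common.MapStr`, `Event` for `beat.Event`, `Processors` for `processors.Processors`, plus an illustrative `addKey` processor); the real method bodies may differ, and errors are ignored here only to match the behavior described in the review.

```go
package main

import "fmt"

// MapStr is a stand-in for common.MapStr.
type MapStr map[string]interface{}

// Event is a stand-in for beat.Event.
type Event struct{ Fields MapStr }

// Processor mirrors the Run(*Event) (*Event, error) shape from this PR.
type Processor interface {
	Run(*Event) (*Event, error)
}

// Processors is a stand-in for processors.Processors: an ordered list.
type Processors struct{ List []Processor }

// Run applies each processor in order; a nil result means "dropped".
func (ps *Processors) Run(e *Event) *Event {
	for _, p := range ps.List {
		next, err := p.Run(e)
		if err != nil {
			continue // errors are ignored, the event continues unchanged
		}
		if next == nil {
			return nil
		}
		e = next
	}
	return e
}

// RunBC wraps a legacy MapStr event in a temporary Event, runs the
// processors and unwraps the result; nil signals a dropped event.
func (ps *Processors) RunBC(m MapStr) MapStr {
	out := ps.Run(&Event{Fields: m})
	if out == nil {
		return nil
	}
	return out.Fields
}

// addKey is a hypothetical processor that marks the event with key=true.
type addKey struct{ key string }

func (a addKey) Run(e *Event) (*Event, error) { e.Fields[a.key] = true; return e, nil }

func main() {
	ps := &Processors{List: []Processor{addKey{"seen"}}}
	fmt.Println(ps.RunBC(MapStr{"msg": "hi"})) // prints "map[msg:hi seen:true]"
}
```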

> If I understand it right, config wise nothing changes for fields / tags, it's just the internal handling as processor. Also these cannot be configured by a user a processor as they are not registered processors.

Yep. The pipeline/processor.go defines some internal processors, not globally registered/configurable.
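The registered-versus-internal distinction can be sketched as a name registry that holds only user-configurable processors, while internal ones such as the fields/tags processors are constructed directly by the pipeline. Everything here is hypothetical: `registry`, `fromConfig` and `addTags` are illustrative names, not the libbeat API.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a hypothetical stand-in for beat.Event.
type Event struct{ Fields map[string]interface{} }

// Processor mirrors the Run/String shape used in this PR.
type Processor interface {
	Run(*Event) (*Event, error)
	String() string
}

// dropEvent is user-configurable, so it is registered by name.
type dropEvent struct{}

func (*dropEvent) Run(*Event) (*Event, error) { return nil, nil }
func (*dropEvent) String() string             { return "drop_event" }

// registry maps configuration names to constructors (hypothetical).
var registry = map[string]func() Processor{
	"drop_event": func() Processor { return &dropEvent{} },
}

// addTags is internal: built by the pipeline from the tags setting and never
// registered, so it cannot be named in a user's processors config.
type addTags struct{ tags []string }

func (a *addTags) Run(e *Event) (*Event, error) {
	e.Fields["tags"] = a.tags
	return e, nil
}

func (a *addTags) String() string { return "add_tags=" + strings.Join(a.tags, ",") }

// fromConfig resolves a user-supplied processor name via the registry only.
func fromConfig(name string) (Processor, error) {
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown processor %q", name)
	}
	return ctor(), nil
}

func main() {
	_, err := fromConfig("add_tags")
	fmt.Println(err) // prints: unknown processor "add_tags"
	p, _ := fromConfig("drop_event")
	internal := &addTags{tags: []string{"web"}} // created directly by the pipeline
	fmt.Println(p, internal)                    // prints "drop_event add_tags=web"
}
```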

@urso urso force-pushed the urso:enh/pipeline-processors branch from 96eca70 to aab3cb1 Jul 2, 2017

@urso urso force-pushed the urso:enh/pipeline-processors branch from aab3cb1 to 0a0bb10 Jul 2, 2017

Move processors into new publisher pipeline

client processors higher prio

@urso urso force-pushed the urso:enh/pipeline-processors branch from 0a0bb10 to 6f47a11 Jul 3, 2017

@ruflin ruflin merged commit 5ddd685 into elastic:master Jul 3, 2017

6 checks passed

  • CLA: Commit author is a member of Elasticsearch
  • beats-ci: Build finished.
  • codecov/patch: 51.75% of diff hit (within 100% threshold of 63.11%)
  • codecov/project: 63.16% (+0.04%) compared to 6699121
  • continuous-integration/appveyor/pr: AppVeyor build succeeded
  • continuous-integration/travis-ci/pr: The Travis CI build passed

@monicasarbu monicasarbu added the libbeat label Jul 3, 2017

@urso urso referenced this pull request Jul 3, 2017

Closed

Publisher Pipeline #4598

22 of 22 tasks complete

@dedemorton dedemorton referenced this pull request Jul 25, 2017

Closed

Beats doc updates in 6.0 #4540

42 of 42 tasks complete

ramon-garcia added a commit to ramon-garcia/beats that referenced this pull request Dec 5, 2017

Move processors into new publisher pipeline (elastic#4554)

@dedemorton dedemorton referenced this pull request Dec 14, 2017

Closed

Beats doc updates in 6.x #5632

37 of 37 tasks complete
@dedemorton

Contributor

commented Jan 22, 2018

I'm removing the needs_docs label because any outstanding doc work required here is tracked by #4598

@dedemorton dedemorton removed the needs_docs label Jan 22, 2018

athom added a commit to athom/beats that referenced this pull request Jan 25, 2018

Move processors into new publisher pipeline (elastic#4554)

@urso urso deleted the urso:enh/pipeline-processors branch Feb 19, 2019
