Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removing duplicate system/syslog ingest pipeline #725

Merged
merged 16 commits into from Feb 19, 2021

Conversation

ycombinator
Copy link
Contributor

@ycombinator ycombinator commented Feb 18, 2021

What does this PR do?

This PR removes the duplicate ingest pipeline in the system/syslog data stream.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.

How to test this PR locally

See testing instructions in elastic/kibana#91778.

@ycombinator
Copy link
Contributor Author

Hi @jen-huang, thanks for finding and reporting the duplicate pipeline. What would be the easiest way to test this fix?

Copy link
Contributor

@jen-huang jen-huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix! I have a PR that adds some additional guards on the Kibana side elastic/kibana#91778. The 404 errors weren't breaking either way. I think this is a pretty safe change so I'm happy to test it once this gets deployed out somewhere. Otherwise, there are some testing instructions in my PR.

@ycombinator
Copy link
Contributor Author

I think this is a pretty safe change so I'm happy to test it once this gets deployed out somewhere.

Thanks @jen-huang. I'm good with testing this change once this package is published to the snapshot registry then. WDYT @fearful-symmetry?

@ycombinator ycombinator added the Team:Integrations Label for the Integrations team label Feb 18, 2021
@elasticmachine
Copy link

Pinging @elastic/integrations (Team:Integrations)

@elasticmachine
Copy link

elasticmachine commented Feb 18, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #725 updated

  • Start Time: 2021-02-19T02:44:12.636+0000

  • Duration: 15 min 2 sec

  • Commit: a9213c8

Test stats 🧪

Test Results
Failed 0
Passed 188
Skipped 0
Total 188

Trends 🧪

Image of Build Times

Image of Tests

@fearful-symmetry
Copy link
Contributor

@ycombinator I don't know enough how how ingest pipelines are installed to say for sure, but I'm overall fine with it. I'm mostly curious how this has flown under the radar. I don't think we've touched syslog for a while?

@ycombinator
Copy link
Contributor Author

ycombinator commented Feb 18, 2021

I'm mostly curious how this has flown under the radar.

Both pipelines for the system/syslog dataset are very similar (the only material difference I could find was that the .yml pipeline had an extra processor at the end to set event.type which was absent in the .json pipeline). So my guess is that the net effects on the indexed data were the same (or close enough) that no one noticed any issues. 🤷

I think what might help is adding pipeline tests for this data stream. Looks like we have some test files that could be ported over from the corresponding Filebeat fileset.

@jen-huang what happens in Kibana if it detects a default.yml and default.json? I assume it installs both pipelines but they end up having the same names so the first one gets overwritten by the second one? Is there any determinism here, e.g. the .json would always be installed before the .yml one or vice versa?

@ycombinator
Copy link
Contributor Author

I think what might help is adding pipeline tests for this data stream. Looks like we have some test files that could be ported over from the corresponding Filebeat fileset.

Just to clarify, I'm adding these to this PR.

@jen-huang
Copy link
Contributor

jen-huang commented Feb 18, 2021

@jen-huang what happens in Kibana if it detects a default.yml and default.json? I assume it installs both pipelines but they end up having the same names so the first one gets overwritten by the second one? Is there any determinism here, e.g. the .json would always be installed before the .yml one or vice versa?

Yep, both get installed and the second overwrites the first one. We don't do any sort of logic around the order, just whatever the order happens to be in the package manifest. Looking at system-0.10.9, the .json always seems to appear before .yml, so the .yml version would be the one that is "actually" installed.

Edit: Looking closer at the code, we actually send the ES requests in parallel, so it's possible that either format gets installed.

@ycombinator
Copy link
Contributor Author

@fearful-symmetry @mtojek do you have any tips on how to debug the tests failures in this PR: https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Fintegrations/detail/PR-725/8/tests/? Initially I'm focussing on getting the pipeline test for test-darwin-syslog-sample.log to pass. I assume the other test cases have similar issues.

@mtojek
Copy link
Contributor

mtojek commented Feb 18, 2021

I think you need to write the config to mark new lines for pipeline tests and probably timezone (as the pipeline doesn't accept lack of it). Something like this:

{
  "multiline": {
    "first_line_pattern": "^Dec 13 "
  },
  "fields": {
    "event.timezone": "GMT+2"
  }
}

@ycombinator
Copy link
Contributor Author

ycombinator commented Feb 18, 2021

Thanks for the tip, @mtojek. I had already added the multiline config but looks like I needed the fields config too for the time zone field. Trying that now...

@ycombinator
Copy link
Contributor Author

@mtojek I finally gave up and ran elastic-package test pipeline --data-stream syslog --generate. After doing this and making a couple of other tweaks to the test case config files, all pipeline tests for the system/syslog data stream are now passing.

Take a look at the test-darwin-syslog-sample.log-expected.json file and compare it with the test-darwin-syslog-sample.log file. The expected file has only one event while the corresponding raw log file has three events. This looks wrong 🙂 and the tests should not be passing, but they are! I think maybe there is a bug in the pipeline test runner?

@ycombinator
Copy link
Contributor Author

@mtojek ignore my previous comment 😄. I had to tweak the multiline.first_line_pattern in the test case config. 🤦

@ycombinator
Copy link
Contributor Author

@fearful-symmetry I added test case files for the system/syslog data stream. Would you mind re-reviewing this PR? Thanks!

packages/system/changelog.yml Show resolved Hide resolved
@ycombinator ycombinator merged commit d91f960 into elastic:master Feb 19, 2021
@ycombinator ycombinator deleted the system-dedup-pipeline branch February 19, 2021 15:34
eyalkraft pushed a commit to build-security/integrations that referenced this pull request Mar 30, 2022
* Removing duplicate system/syslog ingest pipeline

* Adding changelog entry

* Adding pipeline test files

* Re-formatting

* Renaming files

* Adding expected root key

* Adding test config

* Adding event timezone field

* Updating first line pattern

* Fixing up fields

* Adding more test configs

* Formatting one event as a test

* Using --generate

* Adding tz config

* Trying a different regex

* Fixing regexp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Integrations Label for the Integrations team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants