[Zeek] Moving edge processing to ingest pipelines #895

Merged
merged 59 commits into from Apr 23, 2021

Conversation

P1llus
Member

@P1llus P1llus commented Apr 7, 2021

What does this PR do?

This PR moves all edge processing for the zeek filesets to ES ingest pipelines, adds pipeline tests to each fileset, and adds Splunk data to each fileset's golden files for testing.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.

Related issues

@P1llus
Member Author

P1llus commented Apr 8, 2021

On the topic of splitting out the splunk specific parts into a specific pipeline, we decided to move that to a separate PR, once we have the possibility to store a single pipeline that can be shared between filesets.

@leehinman
Contributor

I tried a diff and noticed a couple of things (diff src):
zeek.irc.command is missing
zeek.irc.user is different
zeek.notice.actions is new
zeek.notice.suppress_for now has decimal point
in http url.port removed
in http url.path added
in dhcp destination.port changed to 68
zeek.dhcp.domain missing
zeek.dhcp.lease_time now has decimal point
zeek.files.duration now has decimal point
zeek.files.md5 missing
zeek.files.mime_type missing
zeek.files.sha1 missing
and dns is different but I think that is expected.

@P1llus
Member Author

P1llus commented Apr 12, 2021

I tried a diff and noticed a couple of things (diff src):
zeek.irc.command is missing
zeek.irc.user is different
zeek.notice.actions is new
zeek.notice.suppress_for now has decimal point
in http url.port removed
in http url.path added
in dhcp destination.port changed to 68
zeek.dhcp.domain missing
zeek.dhcp.lease_time now has decimal point
zeek.files.duration now has decimal point
zeek.files.md5 missing
zeek.files.mime_type missing
zeek.files.sha1 missing
and dns is different but I think that is expected.

Thanks for taking the time @leehinman. I think what happened here is that I renamed a few fields that were originally "converted". I was a bit unsure why we store the same data in both zeek fields and ECS fields. I am going through all the logs now and fixing up the small bits and pieces.

@P1llus
Member Author

P1llus commented Apr 13, 2021

Manually went through all the golden files and applied some small fixes. The DNS fileset can still be ignored for now.

The only issue left is a few float values that do not want to convert, so I removed the convert processors for now. Maybe someone else has an idea. The issues are in:

zeek.dhcp.duration
zeek.dhcp.lease_time
zeek.notice.suppress_for
zeek.files.duration

Running a convert processor with type long on these would return errors like:
zeek/files test-files.log:
[0] unexpected pipeline error: For input string: \"0.0\"
[1] unexpected pipeline error: For input string: \"5.316734313964844E-5\"
zeek/notice test-notice.log:
[0] unexpected pipeline error: For input string: \"3600.0\"
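
One plausible reading of these errors: the inputs are decimal or scientific-notation strings, which a strict integer parser rejects while a float parser accepts them. The sketch below uses Python's built-in `int()` and `float()` as stand-ins to illustrate that difference. It is not Elasticsearch's implementation, and the helper names `to_long_strict` and `to_long_via_float` are hypothetical.

```python
# Illustration only: Python's int()/float() stand in for strict integer and
# floating-point string parsing. This is not Elasticsearch code.

SAMPLES = ["0.0", "5.316734313964844E-5", "3600.0"]  # values from the errors above


def to_long_strict(value: str) -> int:
    """Parse a string as an integer; rejects decimal points and exponents."""
    return int(value)


def to_long_via_float(value: str) -> int:
    """Parse as float first, then truncate toward zero to get an integer."""
    return int(float(value))


for sample in SAMPLES:
    try:
        strict = to_long_strict(sample)
    except ValueError as err:
        strict = f"error: {err}"
    print(sample, "| strict:", strict, "| via float:", to_long_via_float(sample))
```

Note that truncating to an integer discards sub-second values such as the second sample, so keeping these fields as floats may be the better fit if that precision matters.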

@P1llus
Member Author

P1llus commented Apr 13, 2021

Handling of event.original, duplicate fields, etc., as discussed earlier, will be done in a separate PR.

@andrewkroh
Member

The only issue left is a few float values that do not want to convert, so I removed the convert processors for now. Maybe someone else has an idea. The issues are in:

I did a quick test with convert and had no issues with those float values as inputs. Are the quotes part of the input value?

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_source": {
          "float_output": 0.000053167343,
          "float_in": "5.316734313964844E-5"
        },
        "_ingest": {
          "timestamp": "2021-04-13T14:06:12.744211906Z"
        }
      }
    },
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_source": {
          "float_output": 0,
          "float_in": "0.0"
        },
        "_ingest": {
          "timestamp": "2021-04-13T14:06:12.744218811Z"
        }
      }
    },
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_source": {
          "float_output": 3600,
          "float_in": "3600.0"
        },
        "_ingest": {
          "timestamp": "2021-04-13T14:06:12.744221446Z"
        }
      }
    }
  ]
}
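
The thread does not show the request behind this output. As a hedged sketch, a body for the simulate pipeline API (`POST _ingest/pipeline/_simulate`) that exercises a `convert` processor on these inputs might be built as follows. The field names `float_in` and `float_output` come from the output above; the single-processor pipeline and the choice of `type: float` are assumptions.

```python
import json

# Input strings taken from the simulate output above.
inputs = ["5.316734313964844E-5", "0.0", "3600.0"]

# Assumed pipeline: one convert processor writing to a separate target field.
simulate_body = {
    "pipeline": {
        "processors": [
            {
                "convert": {
                    "field": "float_in",
                    "target_field": "float_output",
                    "type": "float",  # assumption: the thread only shows the result
                }
            }
        ]
    },
    "docs": [{"_source": {"float_in": value}} for value in inputs],
}

# Print the JSON that could be sent as the request body.
print(json.dumps(simulate_body, indent=2))
```

Changing `type` to `long` in the same body would be one way to check whether the errors reported earlier depend on the target type rather than on quoting in the input.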

Member

@andrewkroh andrewkroh left a comment

Nice work, this is a big undertaking. Apart from DNS changes that are pending, LGTM.

@leehinman
Contributor

LGTM. awesome work.

@P1llus
Member Author

P1llus commented Apr 23, 2021

jenkins run tests

@P1llus P1llus merged commit 1c19f76 into elastic:master Apr 23, 2021
james-elastic pushed a commit to james-elastic/integrations that referenced this pull request Jun 30, 2021
* update capture_loss dataset

* update capture_loss fileset

* adding updates to connection fileset

* updating capture_loss and connection pipeline, adding updated dce_rpc pipeline

* updating dhcp fileset, renaming some test files as well

* adding updates to dnp3 fileset

* adding unfinished dns fileset, waiting for registered_domain processor

* updating dpd pipeline and removing edge processors

* updating files pipeline and removing edge processors

* updating ftp pipeline and removing edge processors

* updating http pipeline and removing edge processors

* updating intel pipeline and removing edge processors

* updating irc pipeline and removing edge processors

* updating kerberos pipeline and removing edge processors

* updating modbus pipeline and removing edge processors

* updating mysql pipeline and removing edge processors

* updating notice pipeline and removing edge processors, unsure about if .f is dot expanded or not

* updating ntlm pipeline and removing edge processors

* updating ocsp pipeline and removing edge processors

* updating pe pipeline and removing edge processors

* updating radius pipeline and removing edge processors

* updating rdp pipeline and removing edge processors

* updating rfb pipeline and removing edge processors

* updating sip pipeline and removing edge processors

* updating smb_cmd pipeline and removing edge processors

* updating smb_files pipeline and removing edge processors

* updating smb_mapping pipeline and removing edge processors

* updating smtp pipeline and removing edge processors

* updating snmp pipeline and removing edge processors

* updating socks pipeline and removing edge processors

* updating ssh pipeline and removing edge processors

* updating ssl pipeline and removing edge processors

* updating stats pipeline and removing edge processors

* updating syslog pipeline and removing edge processors

* updating traceroute pipeline and removing edge processors

* updating tunnel pipeline and removing edge processors

* updating tunnel pipeline and removing edge processors

* updating weird pipeline and removing edge processors

* updating x509 pipeline and removing edge processors

* Cleaning up all filesets to make it more consistent, fixed some typos and added more test data

* update changelog

* applying small fixes to all the filesets

* moving edge processing to ingest pipeline for dns fileset

* moving edge processing to ingest pipeline for dns fileset

* moving edge processing to ingest pipeline for dns fileset

* elastic-package format

* remove underscore from testfile names to pass elastic-package check

* updating golden files for geo changes

* updating golden files for geo changes

* update golden files

* updating dynamic fields with geo fields as well, to pass CI

* adding all geo and as fields to dynamic fields

* reverting dynamic field changes and resolving issue in CI

* merging with master and updating the golden files one last time to fix the CI issues

* updating non existent port in test logs
eyalkraft pushed a commit to build-security/integrations that referenced this pull request Mar 30, 2022
Development

Successfully merging this pull request may close these issues.

Convert Zeek's edge processing to Ingest Node pipeline
4 participants