third party logs are not parsed correctly from fluentbit -> fluentd aggregator -> Malcolm #318

Closed
mmguero opened this issue Jan 2, 2024 · 2 comments


mmguero commented Jan 2, 2024

The tested configuration for getting third party logs to Malcolm has been fluentbit -> Malcolm (see the docs).

Another configuration that should be valid, though, is for fluentbit clients to forward to a fluentd aggregator which then sends to Malcolm.

At the moment, this is not handled correctly. The messages are, I believe, being nested another level deeper which is not picked up by Malcolm's Logstash beats pipeline.

The pieces in play:

Ideally, I would detect this in the beats pipeline (probably here) and adjust accordingly. Or perhaps there's a way to handle it in the filebeat config itself, in the processors section.

@mmguero mmguero added bug Something isn't working external Depends on a bug or feature external to this project logstash Relating to Malcolm's use of Logstash labels Jan 2, 2024
@mmguero mmguero added this to the v24.01.0 milestone Jan 2, 2024
@mmguero mmguero self-assigned this Jan 5, 2024

mmguero commented Jan 8, 2024

So I've done some digging: fluentd doesn't have direct TCP output, but (at least this is what the reporter of this issue was using) the loomsystems third-party plugin can be used to forward to our filebeat TCP input listener (the same way fluent-bit could do natively).

I'm attaching a few files that indicate the difference between the output from fluentbit vs. that from fluentd with the loomsystems output plugin.

These files are exported documents from the Dashboards Discover app:

fluentd.json
fluentbit.json

Note the diff:

[Image: diff of fluentd.json against fluentbit.json]

Looks like an extra message level being added in?


mmguero commented Jan 8, 2024

Another way to look at it, without going through the OpenSearch document: this is straight from the wire.

fluent-bit

{
  "date": 1704730190.658066,
  "mem": {
    "Mem.total": 65763568,
    "Mem.used": 23649400,
    "Mem.free": 42114168,
    "Swap.total": 15625212,
    "Swap.used": 1362432,
    "Swap.free": 14262780
  },
  "module": "mem",
  "host": {
    "name": "sgrover"
  }
}

fluentd

{
  "timestamp": 1704730229,
  "host": "10.9.0.215",
  "message": {
    "date": 1704730228.658081,
    "mem": {
      "Mem.total": 65763568,
      "Mem.used": 23763928,
      "Mem.free": 41999640,
      "Swap.total": 15625212,
      "Swap.used": 1362432,
      "Swap.free": 14262780
    },
    "module": "mem",
    "host": {
      "name": "sgrover"
    }
  }
}
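
To make the difference concrete, here is a minimal Python sketch of the kind of "detect and adjust" step described earlier: if `message` is itself an object, promote its contents to the top level so the event matches the fluent-bit shape. This is only an illustration of the idea, not the fix that went into Malcolm's Logstash pipeline. The function name `unwrap_fluentd_event` and the `forwarder` key (used to keep the wrapper's own `timestamp` and `host`) are hypothetical.

```python
from typing import Any


def unwrap_fluentd_event(event: dict[str, Any]) -> dict[str, Any]:
    """Flatten an event whose payload is nested under "message".

    Events that are already flat (the fluent-bit shape) are returned unchanged.
    """
    inner = event.get("message")
    if not isinstance(inner, dict):
        # No nested object under "message": nothing to unwrap.
        return event

    # Promote the nested payload to the top level.
    flat = dict(inner)

    # Keep the wrapper's own fields (e.g. "timestamp", "host") under a
    # hypothetical "forwarder" key so they do not clobber the payload's
    # fields of the same name.
    wrapper = {k: v for k, v in event.items() if k != "message"}
    if wrapper:
        flat["forwarder"] = wrapper
    return flat


if __name__ == "__main__":
    fluentd_event = {
        "timestamp": 1704730229,
        "host": "10.9.0.215",
        "message": {
            "date": 1704730228.658081,
            "module": "mem",
            "host": {"name": "sgrover"},
        },
    }
    print(unwrap_fluentd_event(fluentd_event))
```

Note that the wrapper's `host` is a string while the payload's `host` is an object, which is why the sketch sets the wrapper fields aside rather than merging them.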

mmguero added a commit to mmguero-dev/Malcolm that referenced this issue Jan 8, 2024
mmguero added a commit to mmguero-dev/Malcolm that referenced this issue Jan 8, 2024
@mmguero mmguero closed this as completed Jan 8, 2024
@mmguero mmguero added the CISA label Jan 8, 2024
This was referenced Jan 17, 2024
mmguero added a commit that referenced this issue Jan 17, 2024
Malcolm v24.01.0 contains new features, improvements, bug fixes and component version updates.

v23.12.1...v24.0.1

* Features and enhancements
    + new Malcolm instance landing page (#252)
    + file carve download with password-protected .zip file (#288)
    + new "all files except common plain text files" option for Malcolm's file carving to match Hedgehog capability (#290)
    + allow customizing indexes for logs written to OpenSearch/Elasticsearch (#313)
    + more consistently differentiate between uploaded and live-captured traffic (#321)
    + make download extracted file context item from Arkime smarter (#330)
    + improve netbox device type library import by using "official" import script (#384)
* Component version updates
    + Alpine Linux to [v3.19](https://alpinelinux.org/posts/Alpine-3.19.0-released.html) as the base for some Docker images
    + Fluent Bit to [v2.2.2](https://github.com/fluent/fluent-bit/releases/tag/v2.2.2)
    + Beats to [v8.11.4](https://www.elastic.co/guide/en/beats/libbeat/8.11/release-notes-8.11.4.html)
    + LogStash to [v8.11.4](https://www.elastic.co/guide/en/logstash/current/logstash-8-11-4.html)
* Bug fixes
    + Suricata Alerts dashboard "Alerts - Tags" visualization is useless (#314)
    + third party logs are not parsed correctly from fluentbit -> fluentd aggregator -> Malcolm (#318)
    + update document lookup APIs to search either network or host data (#322)
    + suricata rule update is broken (#323)
    + time sync from hedgehog to Malcolm opensearch instance not working (#324)
    + fix issue specifying database mode via command-line
    + have pruning of OpenSearch indices (based on size) include "other" Malcolm indices as well (e.g., nginx logs, system resources, third-party logs, etc.)
* Configuration changes (in [environment variables](https://idaholab.github.io/Malcolm/docs/malcolm-config.html#MalcolmConfigEnvVars) in [`./config/`](https://github.com/idaholab/Malcolm/tree/v24.0.1/config))
    + added the following variables with relation to #313
        - added `ARKIME_ROTATE_INDEX` to [`arkime.env`](https://github.com/idaholab/Malcolm/tree/v24.0.1/arkime.env.example) with default value of `daily` (see [Arkime docs on rotateIndex](https://arkime.com/settings#rotateIndex))
        - added the following variables and defaults to [`opensearch.env`](https://github.com/idaholab/Malcolm/tree/v24.0.1/opensearch.env.example):
        ```
        # OpenSearch index patterns and timestamp fields
        # Index pattern for network traffic logs written via Logstash (e.g., Zeek logs, Suricata alerts)
        MALCOLM_NETWORK_INDEX_PATTERN=arkime_sessions3-*
        # Default time field to use for network traffic logs in Logstash and Dashboards
        MALCOLM_NETWORK_INDEX_TIME_FIELD=firstPacket
        # Suffix used to create index to which network traffic logs are written (supports Ruby strftime strings in %{})
        MALCOLM_NETWORK_INDEX_SUFFIX=%{%y%m%d}
        # Index pattern for other logs written via Logstash (e.g., nginx, beats, fluent-bit, etc.)
        MALCOLM_OTHER_INDEX_PATTERN=malcolm_beats_*
        # Default time field to use for other logs in Logstash and Dashboards
        MALCOLM_OTHER_INDEX_TIME_FIELD=@timestamp
        # Suffix used to create index to which other logs are written (supports Ruby strftime strings in %{})
        MALCOLM_OTHER_INDEX_SUFFIX=%{%y%m%d}
        # Index pattern used specifically by Arkime (will probably match MALCOLM_NETWORK_INDEX_PATTERN, should probably be arkime_sessions3-*)
        ARKIME_NETWORK_INDEX_PATTERN=arkime_sessions3-*
        # Default time field used for sessions in Arkime viewer
        ARKIME_NETWORK_INDEX_TIME_FIELD=firstPacket
        ```
    + changed default for `EXTRACTED_FILE_HTTP_SERVER_KEY` to `infected` in [`zeek-secret.env`](https://github.com/idaholab/Malcolm/tree/v24.0.1/zeek-secret.env.example)
    + added `EXTRACTED_FILE_HTTP_SERVER_ZIP` with default value of `false` in [`zeek.env`](https://github.com/idaholab/Malcolm/tree/v24.0.1/zeek.env.example), see (#288)
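
As a rough illustration of the index suffix variables above, the following Python sketch expands a `%{...}` strftime suffix for a given date. It assumes the expanded suffix takes the place of the trailing `*` in the index pattern, which the notes above do not state. The function name `expand_index_name` is hypothetical, and Python's `strftime` stands in for Ruby's here (the two agree for `%y%m%d`).

```python
import re
from datetime import datetime, timezone


def expand_index_name(pattern: str, suffix: str, when: datetime) -> str:
    """Build a concrete index name from an index pattern and a %{...} suffix."""
    # Assumption: the suffix replaces the trailing wildcard of the pattern.
    prefix = pattern.rstrip("*")
    # Expand each %{...} group with strftime, leaving other text untouched.
    expanded = re.sub(
        r"%\{([^}]*)\}",
        lambda match: when.strftime(match.group(1)),
        suffix,
    )
    return prefix + expanded


if __name__ == "__main__":
    when = datetime(2024, 1, 8, tzinfo=timezone.utc)
    print(expand_index_name("arkime_sessions3-*", "%{%y%m%d}", when))
    print(expand_index_name("malcolm_beats_*", "%{%y%m%d}", when))
```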
mmguero added a commit to cisagov/Malcolm that referenced this issue Jan 17, 2024
Projects
Status: Released