provide way to access extracted_files on Hedgehog from Malcolm #331

Closed
mmguero opened this issue Jan 9, 2024 · 5 comments
Assignees
Labels
carving Relating to carving (extraction) of files from traffic and the scanning of those files enhancement New feature or request sensor For issues dealing with the Hedgehog OS capture sensor UI Relating to general UI experience
Milestone

Comments

@mmguero
Collaborator

mmguero commented Jan 9, 2024

EDIT: See the comment below for the direction we decided to go for this and its progress.


I'm not sure what the best way to do this is, but wanted to get the idea down.

Malcolm provides the /extracted_files/ interface for browsing the carved files it processes.

Hedgehog also carves files, but it just stores them locally and doesn't really provide a way to get to them other than logging into the sensor and grabbing them manually.

However, other than Arkime's PCAP payload retrieval, there isn't really a well-established way to surface that kind of reachback from Malcolm to the sensor. Some careful planning and design would be needed.

Tangentially related are #330 and #329

@mmguero mmguero added carving Relating to carving (extraction) of files from traffic and the scanning of those files enhancement New feature or request sensor For issues dealing with the Hedgehog OS capture sensor labels Jan 9, 2024
@mmguero mmguero added this to the z.staging milestone Jan 17, 2024
@mmguero
Collaborator Author

mmguero commented Mar 27, 2024

Andy Wick (Arkime) suggested it would be possible to do this via an Arkime viewer plugin. As we're already using Arkime viewer to retrieve PCAP payloads, this might be a good way to do it.

@mmguero mmguero modified the milestones: z.staging, v24.04.0 Mar 27, 2024
@mmguero mmguero self-assigned this Mar 27, 2024
@mmguero
Collaborator Author

mmguero commented Mar 27, 2024

The other way to do it (maybe easier) is to reuse the existing code in Malcolm for the download interface (as it supports zipping/encrypting the file, etc.) and then somehow proxy those requests through Malcolm to that service on Hedgehog. We'd need to make sure we restrict access to the Malcolm IP via ACL the same way the existing code does:

$ rg acl hedgehog-iso/
...

hedgehog-iso/interface/sensor_ctl/control_vars.conf
15:export ARKIME_PACKET_ACL=

hedgehog-iso/interface/sensor_ctl/supervisor.init/arkime_config_populate.sh
114:  # update the firewall ACL (via ufw) to allow retrieval of packets
...

mmguero added a commit to mmguero-dev/Malcolm that referenced this issue Mar 29, 2024
…tp_server.py and the setting/creation of ACL rules on hedgehog
@mmguero
Collaborator Author

mmguero commented Apr 2, 2024

Here's how the work in progress is going:

Similar to how the arkime viewer session can retrieve PCAP payloads from port 8005/tcp over TLS on Hedgehog, the extracted files web interface is now available on port 8006/tcp over TLS as well:

image

However, in addition to being accessible directly on the Hedgehog over that port (which would not be the common way people would access it), the interface is reachable through Malcolm, which will proxy requests for /hh-extracted-files/<hedgehog IP or hostname>/ to the Hedgehog (see the address bar in this screenshot):

image
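
To make the path mapping concrete, here is a minimal Python sketch of how a request path under /hh-extracted-files/ could be translated into the URL of the service on the Hedgehog. This is not Malcolm's actual proxy configuration (which isn't shown in this thread); the function name `hedgehog_upstream_url` and the host validation pattern are assumptions for illustration, and only the path prefix and port 8006 come from the description above.

```python
import re

PROXY_PREFIX = "/hh-extracted-files/"   # prefix described in this comment
HEDGEHOG_FILES_PORT = 8006              # extracted files interface port on Hedgehog

# Assumption: only plain IPv4 addresses and hostnames are accepted (no IPv6, no ports)
_HOST_RE = re.compile(r"[A-Za-z0-9.-]+")


def hedgehog_upstream_url(request_path: str) -> str:
    """Map a Malcolm-side request path to the corresponding URL on the sensor."""
    if not request_path.startswith(PROXY_PREFIX):
        raise ValueError(f"not a Hedgehog extracted-files path: {request_path!r}")
    # The first path segment after the prefix names the sensor; the rest is passed through
    host, _, rest = request_path[len(PROXY_PREFIX):].partition("/")
    if not _HOST_RE.fullmatch(host):
        raise ValueError(f"missing or invalid Hedgehog host: {host!r}")
    return f"https://{host}:{HEDGEHOG_FILES_PORT}/{rest}"


if __name__ == "__main__":
    print(hedgehog_upstream_url("/hh-extracted-files/192.168.1.50/"))
```

Validating the host segment before building the upstream URL matters in a real proxy, since that segment is user-controlled and decides where Malcolm connects.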

Access to this port on the Hedgehog is restricted via firewall ACL to IP addresses that have been explicitly allowlisted, which in practice means the Malcolm server's IP address.
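
As a rough illustration of the allowlisting idea, the following Python sketch turns an ACL string into `ufw` rules that open only port 8006/tcp to the listed addresses. It is an assumption-laden sketch rather than the sensor's real script: the function name `ufw_allow_commands` and the comma/space-separated ACL format are hypothetical, and the commands are only built as strings, not executed.

```python
import ipaddress


def ufw_allow_commands(acl: str, port: int = 8006) -> list[str]:
    """Build one `ufw allow` command per allowlisted address or network.

    Assumption: `acl` is a comma- and/or space-separated list of IPs or CIDR
    networks; invalid entries raise ValueError instead of being passed to ufw.
    """
    commands = []
    for entry in acl.replace(",", " ").split():
        # Normalizes single hosts to /32 (or /128) and rejects malformed input
        network = ipaddress.ip_network(entry, strict=False)
        commands.append(f"ufw allow proto tcp from {network} to any port {port}")
    return commands


if __name__ == "__main__":
    for command in ufw_allow_commands("192.168.1.10"):
        print(command)
```

Parsing each entry with `ipaddress` before it reaches a shell command keeps a malformed or malicious ACL value from being interpolated into the firewall invocation.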

For comparison's sake, here's a screenshot of the UI for the Malcolm server itself:

image

I think this is the bulk of this feature; the rest of it is window dressing to make it more convenient to navigate to the appropriate place. I'm working on putting some value action links into Arkime (arkime/arkime#2731 will make this even more convenient for this feature).

We'll see what else needs to happen. But it's coming along pretty good.

@mmguero mmguero added the UI Relating to general UI experience label Apr 2, 2024
mmguero added a commit to mmguero-dev/Malcolm that referenced this issue Apr 2, 2024
@mmguero mmguero added the CISA label Apr 2, 2024
@mmguero
Collaborator Author

mmguero commented Apr 2, 2024

I'm also working on integrating these download links (whether to Malcolm or Hedgehog) into Dashboards and Arkime:

  • Arkime, via the Sessions tab when viewing zeek.files logs

image

  • Dashboards, in the Files dashboard

image

The way I'm doing this at the moment is by generating the correct download URI during Logstash enrichment based on the host.name field, for whichever Hedgehog device or for Malcolm itself. The drawback is that if the hostname of the Hedgehog changes in the future, the link will no longer be correct. The only way around this I can think of would be to store a separate persistent database mapping the node name to its host or IP address. Arkime already does this in its stats index, but that's not a perfect solution because we're talking about Zeek logs, not Arkime sessions. I think it's okay: in the worst case the user can still navigate manually to the hh-extracted-files link (as described above) and find the file there.
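
The real enrichment happens in Logstash, and its filter code isn't shown here, so the following is only a Python sketch of the decision it describes: pick the Malcolm interface when the log came from Malcolm itself, otherwise the proxied Hedgehog path. The function name `extracted_file_uri`, the `malcolm_host_name` parameter, and the flat file layout under each prefix are assumptions for illustration.

```python
from urllib.parse import quote


def extracted_file_uri(host_name: str, file_name: str, malcolm_host_name: str) -> str:
    """Build a download URI for a carved file based on the log's host.name value."""
    name = quote(file_name, safe="")
    if host_name == malcolm_host_name:
        # File was carved by Malcolm itself
        return f"/extracted_files/{name}"
    # File was carved on a sensor; Malcolm proxies this prefix to that Hedgehog
    return f"/hh-extracted-files/{quote(host_name, safe='')}/{name}"


if __name__ == "__main__":
    print(extracted_file_uri("hedgehog-1", "sample.zip", malcolm_host_name="malcolm"))
```

Because the sensor's name is baked into the URI at enrichment time, this also shows why a later hostname change breaks links in documents that were already indexed.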

mmguero added a commit to mmguero-dev/Malcolm that referenced this issue Apr 8, 2024
…tp_server.py and the setting/creation of ACL rules on hedgehog
@mmguero mmguero closed this as completed Apr 8, 2024
This was referenced Apr 30, 2024
mmguero added a commit that referenced this issue Apr 30, 2024
Malcolm v24.04.0

* Features and enhancements
    - Zeek-extracted files scanned and preserved on a [Hedgehog Linux](https://idaholab.github.io/Malcolm/docs/malcolm-hedgehog-e2e-iso-install.html#HedgehogZeekFileExtraction) sensor can now be accessed via [the extracted files download user interface](https://idaholab.github.io/Malcolm/docs/file-scanning.html#ZeekFileExtractionUI) (#331).
    - Improvements to creation of index templates, dashboards, and other saved objects on startup (#208) to ensure that saved objects get created correctly upon upgrade (see [this comment](#208 (comment)) for more details on this feature).
    - [Populating the NetBox inventory via passively-gathered network traffic metadata](https://idaholab.github.io/Malcolm/docs/asset-interaction-analysis.html#NetBoxPopPassive) now uses network traffic logs for DNS, NTLM, and DHCP to identify assets' host names when possible for use when populating device and VM names (#415). Autopopulated devices now have their *status* field set to `Active` rather than `Stage`, and use *tags* instead to indicate that they were created through autopopulation.
    - Users can now specify pruning thresholds for [carved files](https://idaholab.github.io/Malcolm/docs/file-scanning.html#ZeekFileExtraction) so that old files are deleted in order to avoid filling available storage (#453). See a new section of documentation on [Managing disk usage](https://idaholab.github.io/Malcolm/docs/malcolm-config.html#DiskUsage) for more information about this and similar settings.
    - Users can now specify a prefix that will be prepended to dashboards as they are imported into OpenSearch Dashboards or Kibana, allowing users who have dashboards from other sources to differentiate between those and Malcolm's (#455).
    - The default anomaly detectors created for the OpenSearch Anomaly Detection plugin are now created with [category fields for high cardinality](https://opensearch.org/docs/latest/observing-your-data/ad/index/#optional-set-category-fields-for-high-cardinality) to allow for better breakdown of contributing values to anomalies discovered (#464).
    - Include [JA4+ plugin in Arkime](https://arkime.com/settings#ja4plus). See #419 for status on upcoming full JA4+ support in Malcolm.
    - Hedgehog Linux sensors can now [periodically refresh](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/hedgehog-iso/interface/sensor_ctl/control_vars.conf#L75) their [Zeek intelligence files](https://idaholab.github.io/Malcolm/docs/hedgehog-config-zeek-intel.html#HedgehogZeekIntel).
        + **NOTE**: Due to an oversight, a value is missing from the default Hedgehog Linux configuration in this release, preventing the intel refresh cron job from executing. As a workaround, appending the line `export INTEL_DIR=/opt/sensor/sensor_ctl/zeek/intel` to `/opt/sensor/sensor_ctl/control_vars.conf` and restarting the sensor services will remedy the situation. This will be corrected in the next Malcolm release.
    - Assorted documentation improvements.
* Component version updates
    - Arkime to [v5.1.2](https://github.com/arkime/arkime/blob/bcd9d7e68be8e4a52a17c35211c5d5a7fdcc1a1c/CHANGELOG#L36-L41)
    - OpenSearch and OpenSearch Dashboards to [v2.13.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.13.0.md)
    - Beats to [v8.13.2](https://www.elastic.co/guide/en/beats/libbeat/current/release-notes-8.13.2.html)
    - Logstash to [v8.13.2](https://www.elastic.co/guide/en/logstash/current/logstash-8-13-2.html)
    - gunicorn to v22.0.0 to address [CVE-2024-1135](GHSA-w3h3-4rj7-4ph4).
    - elasticsearch-dsl to [v8.13.0](https://github.com/elastic/elasticsearch-dsl-py/releases/tag/v8.13.0)
    - elasticsearch-py to [v8.13.0](https://github.com/elastic/elasticsearch-py/releases/tag/v8.13.0)
    - idna to v3.7 to address [CVE-2024-3651](GHSA-jjg7-2v4v-x38h)
    - Fluent Bit to [v3.0.3](https://fluentbit.io/announcements/v3.0.3/)
* Bug fixes
    - The documentation for [Windows host system configuration](https://idaholab.github.io/Malcolm/docs/host-config-windows.html#HostSystemConfigWindows) was out of date and has been updated for the latest version of Microsoft Windows Subsystem for Linux (#421).
    - An issue was fixed in which Malcolm's list of users and their password hashes could become corrupted if the file did not initially end with a newline character (#426).
    - The manner in which Zeek intel files are generated has been changed to avoid problems found in Kubernetes deployments when scaling out the number of `zeek-live` containers (#456). See [this comment](#456 (comment)) for more details.
    - Removed the version top-level element from `docker-compose.yml` files as it is [now obsolete](https://docs.docker.com/compose/compose-file/04-version-and-name/) and caused a warning message that sometimes was not handled correctly.
    - Fix Malcolm ISO not correctly detecting if it's in a live boot ISO environment or installed mode.
    - Restart live Zeek instances with `zeekctl deploy` instead of `zeekctl restart`.
* Configuration changes (in [environment variables](https://idaholab.github.io/Malcolm/docs/malcolm-config.html#MalcolmConfigEnvVars) in [`./config/`](https://github.com/idaholab/Malcolm/blob/v24.04.0/config))
    - `ARKIME_QUERY_ALL_INDICES` in [`arkime.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/arkime.env.example#L9-L11) can be set to control the [`queryAllIndices` setting](https://arkime.com/settings#queryAllIndices) in Arkime's `config.ini`.
    - `DASHBOARDS_PREFIX` in [`dashboards-helper.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/dashboards-helper.env.example#L3C1-L4C19) has been added for #455 (see above in **Features and Enhancements**).
    - `LOGSTASH_NETBOX_ENRICHMENT_DATASETS` in [`logstash.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/logstash.env.example#L13) has been changed to include `zeek.dhcp`, `zeek.dns`, and `zeek.ntlm` to support #415 (see above in **Features and Enhancements**).
    - `LOGSTASH_ZEEK_IGNORED_LOGS` in [`logstash.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/logstash.env.example#L15) has been changed to remove `capture_loss` and `stats` so that those diagnostic Zeek logs can be parsed without the user having to manually change this variable.
    - `ZEEK_CRON` has been removed from [`zeek-live.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/zeek-live.env.example) and `ZEEK_INTEL_REFRESH_CRON_EXPRESSION` was removed from [`zeek.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/zeek.env.example) and moved to the "offline" version of the container in [`zeek-offline.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/zeek-offline.env.example#L17-L19) for #456.
    - `EXTRACTED_FILE_PRUNE_THRESHOLD_MAX_SIZE`, `EXTRACTED_FILE_PRUNE_THRESHOLD_TOTAL_DISK_USAGE_PERCENT`, and `EXTRACTED_FILE_PRUNE_INTERVAL_SECONDS` were added to [`zeek.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/zeek.env.example#L32-L37) for #453. See a new section of documentation on [Managing disk usage](https://idaholab.github.io/Malcolm/docs/malcolm-config.html#DiskUsage) for more information about these and similar settings.
mmguero added a commit to cisagov/Malcolm that referenced this issue Apr 30, 2024
Malcolm v24.04.0

* Features and enhancements
    - Zeek-extracted files scanned and preserved on a [Hedgehog Linux](https://cisagov.github.io/Malcolm/docs/malcolm-hedgehog-e2e-iso-install.html#HedgehogZeekFileExtraction) sensor can now be accessed via [the extracted files download user interface](https://cisagov.github.io/Malcolm/docs/file-scanning.html#ZeekFileExtractionUI) (idaholab#331).
    - Improvements to creation of index templates, dashboards, and other saved objects on startup (idaholab#208) to ensure that saved objects get created correctly upon upgrade (see [this comment](idaholab#208 (comment)) for more details on this feature).
    - [Populating the NetBox inventory via passively-gathered network traffic metadata](https://cisagov.github.io/Malcolm/docs/asset-interaction-analysis.html#NetBoxPopPassive) now uses network traffic logs for DNS, NTLM, and DHCP to identify assets' host names when possible for use when populating device and VM names (idaholab#415). Autopopulated devices now have their *status* field set to `Active` rather than `Stage`, and use *tags* instead to indicate that they were created through autopopulation.
    - Users can now specify pruning thresholds for [carved files](https://cisagov.github.io/Malcolm/docs/file-scanning.html#ZeekFileExtraction) so that old files are deleted in order to avoid filling available storage (idaholab#453). See a new section of documentation on [Managing disk usage](https://cisagov.github.io/Malcolm/docs/malcolm-config.html#DiskUsage) for more information about this and similar settings.
    - Users can now specify a prefix that will be prepended to dashboards as they are imported into OpenSearch Dashboards or Kibana, allowing users who have dashboards from other sources to differentiate between those and Malcolm's (idaholab#455).
    - The default anomaly detectors created for the OpenSearch Anomaly Detection plugin are now created with [category fields for high cardinality](https://opensearch.org/docs/latest/observing-your-data/ad/index/#optional-set-category-fields-for-high-cardinality) to allow for better breakdown of contributing values to anomalies discovered (idaholab#464).
    - Include [JA4+ plugin in Arkime](https://arkime.com/settings#ja4plus). See idaholab#419 for status on upcoming full JA4+ support in Malcolm.
    - Hedgehog Linux sensors can now [periodically refresh](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/hedgehog-iso/interface/sensor_ctl/control_vars.conf#L75) their [Zeek intelligence files](https://idaholab.github.io/Malcolm/docs/hedgehog-config-zeek-intel.html#HedgehogZeekIntel). **NOTE**: due to an oversight, a variable required for this to work is missing in this release. Appending the line `export INTEL_DIR=/opt/sensor/sensor_ctl/zeek/intel` to `/opt/sensor/sensor_ctl/control_vars.conf` will correct this. This will be corrected in the next Malcolm release.
    - Assorted documentation improvements.
* Component version updates
    - Arkime to [v5.1.2](https://github.com/arkime/arkime/blob/bcd9d7e68be8e4a52a17c35211c5d5a7fdcc1a1c/CHANGELOG#L36-L41)
    - OpenSearch and OpenSearch Dashboards to [v2.13.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.13.0.md)
    - Beats to [v8.13.2](https://www.elastic.co/guide/en/beats/libbeat/current/release-notes-8.13.2.html)
    - Logstash to [v8.13.2](https://www.elastic.co/guide/en/logstash/current/logstash-8-13-2.html)
    - gunicorn to v22.0.0 to address [CVE-2024-1135](GHSA-w3h3-4rj7-4ph4).
    - elasticsearch-dsl to [v8.13.0](https://github.com/elastic/elasticsearch-dsl-py/releases/tag/v8.13.0)
    - elasticsearch-py to [v8.13.0](https://github.com/elastic/elasticsearch-py/releases/tag/v8.13.0)
    - idna to v3.7 to address [CVE-2024-3651](GHSA-jjg7-2v4v-x38h)
    - Fluent Bit to [v3.0.3](https://fluentbit.io/announcements/v3.0.3/)
* Bug fixes
    - The documentation for [Windows host system configuration](https://cisagov.github.io/Malcolm/docs/host-config-windows.html#HostSystemConfigWindows) was out of date and has been updated for the latest version of Microsoft Windows Subsystem for Linux (idaholab#421).
    - An issue was fixed in which Malcolm's list of users and their password hashes could become corrupted if the file did not initially end with a newline character (idaholab#426).
    - The manner in which Zeek intel files are generated has been changed to avoid problems found in Kubernetes deployments when scaling out the number of `zeek-live` containers (idaholab#456). See [this comment](idaholab#456 (comment)) for more details.
    - Removed the version top-level element from `docker-compose.yml` files as it is [now obsolete](https://docs.docker.com/compose/compose-file/04-version-and-name/) and caused a warning message that sometimes was not handled correctly.
    - Fix Malcolm ISO not correctly detecting if it's in a live boot ISO environment or installed mode.
    - Restart live Zeek instances with `zeekctl deploy` instead of `zeekctl restart`.
* Configuration changes (in [environment variables](https://cisagov.github.io/Malcolm/docs/malcolm-config.html#MalcolmConfigEnvVars) in [`./config/`](https://github.com/idaholab/Malcolm/blob/v24.04.0/config))
    - `ARKIME_QUERY_ALL_INDICES` in [`arkime.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/arkime.env.example#L9-L11) can be set to control the [`queryAllIndices` setting](https://arkime.com/settings#queryAllIndices) in Arkime's `config.ini`.
    - `DASHBOARDS_PREFIX` in [`dashboards-helper.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/dashboards-helper.env.example#L3C1-L4C19) has been added for idaholab#455 (see above in **Features and Enhancements**).
    - `LOGSTASH_NETBOX_ENRICHMENT_DATASETS` in [`logstash.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/logstash.env.example#L13) has been changed to include `zeek.dhcp`, `zeek.dns`, and `zeek.ntlm` to support idaholab#415 (see above in **Features and Enhancements**).
    - `LOGSTASH_ZEEK_IGNORED_LOGS` in [`logstash.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/logstash.env.example#L15) has been changed to remove `capture_loss` and `stats` so that those diagnostic Zeek logs can be parsed without the user having to manually change this variable.
    - `ZEEK_CRON` has been removed from [`zeek-live.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/zeek-live.env.example) and `ZEEK_INTEL_REFRESH_CRON_EXPRESSION` was removed from [`zeek.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/zeek.env.example) and moved to the "offline" version of the container in [`zeek-offline.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/zeek-offline.env.example#L17-L19) for idaholab#456.
    - `EXTRACTED_FILE_PRUNE_THRESHOLD_MAX_SIZE`, `EXTRACTED_FILE_PRUNE_THRESHOLD_TOTAL_DISK_USAGE_PERCENT`, and `EXTRACTED_FILE_PRUNE_INTERVAL_SECONDS` were added to [`zeek.env`](https://github.com/idaholab/Malcolm/blob/bceee4616dd5676a010a3dd7b0410856257948e8/config/zeek.env.example#L32-L37) for idaholab#453. See a new section of documentation on [Managing disk usage](https://cisagov.github.io/Malcolm/docs/malcolm-config.html#DiskUsage) for more information about these and similar settings.
Projects
Status: Released