[ti_misp] Keep the same timestamp for later pages #6649

Merged
efd6 merged 4 commits into main from ti-misp-succesive-pages-use-initial-timestamp on Jun 22, 2023

@chrisberkhout chrisberkhout commented Jun 21, 2023

What does this PR do?

For a given sequence of page requests, the later requests should use the same timestamp parameter as the initial page in the sequence.

This is achieved by setting a query string parameter in the initial request that will be ignored by MISP, and using that parameter to set the correct timestamp in the response.pagination transforms. This is a workaround for the fact that response.pagination transforms aren't given direct access to the last request, but can access its URL via the last response.
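As a rough illustration of the idea (not the httpjson configuration changed in this PR), the following Python sketch shows how a timestamp carried in the previous response's URL can be copied into the body of the next page request. The function names and the fallback argument are hypothetical and exist only for this example.

```python
from urllib.parse import parse_qs, urlsplit


def timestamp_from_url(last_response_url: str, fallback: str) -> str:
    """Return the `timestamp` query parameter of the URL, or `fallback` if absent."""
    query = parse_qs(urlsplit(last_response_url).query)
    values = query.get("timestamp")
    return values[0] if values else fallback


def next_page_body(last_response_url: str, last_page: int, limit: int,
                   fallback_timestamp: str) -> dict:
    """Build the body for the next page, reusing the timestamp of the first request.

    Hypothetical helper: the real integration does this with
    response.pagination transforms, not with Python code.
    """
    return {
        "limit": str(limit),
        "page": str(last_page + 1),
        "returnFormat": "json",
        "timestamp": timestamp_from_url(last_response_url, fallback_timestamp),
    }
```

For example, calling next_page_body with a last response URL ending in ?timestamp=1687353380 and last_page=1 gives a page 2 body whose timestamp is still "1687353380", regardless of when page 2 is requested.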

Details

When fetching data from MISP, we start at an earlier point in time (120 hours ago by default, 10 minutes ago in testing) and page forward from that point. As items are received, item timestamps are recorded in the httpjson cursor data. Upon restart, that cursor data is used as the new start point.
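The choice of start point described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the integration's actual logic: the function name, the cursor argument, and the rule that a stored cursor value always takes precedence are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Default lookback taken from the description above (120 hours).
DEFAULT_LOOKBACK = timedelta(hours=120)


def start_timestamp(now: datetime,
                    cursor_timestamp: Optional[str] = None,
                    lookback: timedelta = DEFAULT_LOOKBACK) -> str:
    """Return the Unix timestamp (whole seconds, as a string) to start paging from.

    Assumption: if cursor data exists from a previous run it is used as-is,
    otherwise the start point is `now` minus the lookback interval.
    """
    if cursor_timestamp is not None:
        return cursor_timestamp
    return str(int((now - lookback).timestamp()))
```

With a 10 minute lookback and a request made at 2023-06-21T13:20:32.855Z, this yields "1687353032", which matches the page 1 timestamp in the system test logs.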

The bug was that the timestamp was being reset on every page. For example, looking at agent logs for system tests before the change we see:

cat threat.log.ndjson | jq -c 'select(.message=="HTTP request")|{"@timestamp","http.request.body.content","transaction.id"}'

{"@timestamp":"2023-06-21T13:20:32.855Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"1\",\"returnFormat\":\"json\",\"timestamp\":\"1687353032\"}","transaction.id":"I4KRQQ1GLTL1E-1"}
{"@timestamp":"2023-06-21T13:20:32.858Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"2\",\"returnFormat\":\"json\",\"timestamp\":\"1687353032\"}","transaction.id":"I4KRQQ1GLTL1E-2"}
{"@timestamp":"2023-06-21T13:20:35.695Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"3\",\"returnFormat\":\"json\",\"timestamp\":\"1687353035\"}","transaction.id":"I4KRQQ1GLTL1E-3"}

cat threat_attributes.log.ndjson | jq -c 'select(.message=="HTTP request")|{"@timestamp","http.request.body.content","transaction.id"}'

{"@timestamp":"2023-06-21T13:21:09.893Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"1\",\"returnFormat\":\"json\",\"timestamp\":\"1687353069\"}","transaction.id":"0HCM021PLTL1E-1"}
{"@timestamp":"2023-06-21T13:21:12.740Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"2\",\"returnFormat\":\"json\",\"timestamp\":\"1687353072\"}","transaction.id":"0HCM021PLTL1E-2"}

The initial timestamp in the request body is 10 mins before that request was made, as expected. However, for later pages the timestamp is reset to a new value 10 minutes before the new request.
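The check done by eye on the jq output above could also be scripted. The sketch below assumes its input is the one-object-per-line output of the jq command shown earlier; the function names are hypothetical.

```python
import json


def page_timestamps(log_lines):
    """Return (page, timestamp) pairs from jq output lines, in request order."""
    pairs = []
    for line in log_lines:
        record = json.loads(line)
        # The request body is itself a JSON document stored as a string.
        body = json.loads(record["http.request.body.content"])
        pairs.append((int(body["page"]), body["timestamp"]))
    return pairs


def timestamp_is_stable(log_lines):
    """True if every page in the sequence was requested with the same timestamp."""
    return len({timestamp for _, timestamp in page_timestamps(log_lines)}) <= 1
```

Run against the threat datastream lines above, timestamp_is_stable returns False because page 3 carries 1687353035 while pages 1 and 2 carry 1687353032.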

Aside: for the threat datastream, page 2 has the same timestamp as page 1. This is because it is requested just 3 milliseconds after page 1, within the same second. The 3rd page is requested after a 3-second delay (as is the 2nd page for the threat_attributes datastream), so we see a new timestamp value there. I'm not sure why the delays aren't all similar. Maybe a 3-second delay is triggered by a threshold that isn't reached by the 1st page for the threat datastream, which has less data. Changing the run order doesn't seem to affect this.

After the change the logs look like this:

cat threat.log.ndjson | jq -c 'select(.message=="HTTP request")|{"@timestamp","http.request.body.content","transaction.id"}'

{"@timestamp":"2023-06-21T13:26:20.952Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"1\",\"returnFormat\":\"json\",\"timestamp\":\"1687353380\"}","transaction.id":"4U9VMT41LTL1E-1"}
{"@timestamp":"2023-06-21T13:26:20.955Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"2\",\"returnFormat\":\"json\",\"timestamp\":\"1687353380\"}","transaction.id":"4U9VMT41LTL1E-2"}
{"@timestamp":"2023-06-21T13:26:23.816Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"3\",\"returnFormat\":\"json\",\"timestamp\":\"1687353380\"}","transaction.id":"4U9VMT41LTL1E-3"}
 
cat threat_attributes.log.ndjson | jq -c 'select(.message=="HTTP request")|{"@timestamp","http.request.body.content","transaction.id"}'

{"@timestamp":"2023-06-21T13:26:58.167Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"1\",\"returnFormat\":\"json\",\"timestamp\":\"1687353418\"}","transaction.id":"AUK0G7SALTL1E-1"}
{"@timestamp":"2023-06-21T13:27:01.011Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"2\",\"returnFormat\":\"json\",\"timestamp\":\"1687353418\"}","transaction.id":"AUK0G7SALTL1E-2"}

Details of verifying that MISP doesn't break with the additional query string parameter

MISP setup

  • Get an OVA virtual machine image from https://vm.misp-project.org/latest/ and open it in VirtualBox.
  • Check port forwarding settings to find the right host URL (https://localhost:8443/).
  • Log in and set a new password (required).
  • Activate some data feeds and initiate fetching.
  • Add an auth key.
  • Update baseUrl settings in the UI so that the REST API helper page will work.
  • Use the REST API page to build a query and get curl commands.

Verify that the page 1 response is the same with and without the query string parameter

curl \
 -d '{"returnFormat":"json","page":1,"limit":2,"timestamp":"1640998800"}' \
 -H "Authorization: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
 -H "Accept: application/json" \
 -H "Content-type: application/json" \
 -X POST 'https://localhost:8443/events/restSearch?timestamp=1640998800' --insecure \
 | jq 'del(.response[].Event | .Attribute, .EventReport, .Galaxy, .GalaxyCluster, .Object, .Org, .Orgc, .RelatedEvent, .Tag)'
curl \
 -d '{"returnFormat":"json","page":1,"limit":2,"timestamp":"1640998800"}' \
 -H "Authorization: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
 -H "Accept: application/json" \
 -H "Content-type: application/json" \
 -X POST https://localhost:8443/events/restSearch --insecure \
 | jq 'del(.response[].Event | .Attribute, .EventReport, .Galaxy, .GalaxyCluster, .Object, .Org, .Orgc, .RelatedEvent, .Tag)'

They do indeed yield the same response:

{
  "response": [
    {
      "Event": {
        "id": "1076",
        "orgc_id": "8",
        "org_id": "1",
        "date": "2018-08-17",
        "threat_level_id": "1",
        "info": "Turla Outlook White Paper",
        "published": true,
        "uuid": "5b773e07-e694-458b-b99c-27f30a016219",
        "attribute_count": "53",
        "analysis": "0",
        "timestamp": "1684790147",
        "distribution": "3",
        "proposal_email_lock": false,
        "locked": false,
        "publish_timestamp": "1687283359",
        "sharing_group_id": "0",
        "disable_correlation": false,
        "extends_uuid": "",
        "protected": null,
        "ShadowAttribute": [],
        "CryptographicKey": []
      }
    },
    {
      "Event": {
        "id": "1226",
        "orgc_id": "3",
        "org_id": "1",
        "date": "2022-01-13",
        "threat_level_id": "2",
        "info": "CYBERCOM_Malware_Alert -  MuddyWater has been seen using a variety of techniques to maintain access to victim networks.",
        "published": true,
        "uuid": "ed46f822-41e6-4dca-a1c5-ad768306bfe9",
        "attribute_count": "119",
        "analysis": "0",
        "timestamp": "1642082225",
        "distribution": "3",
        "proposal_email_lock": false,
        "locked": false,
        "publish_timestamp": "1687283686",
        "sharing_group_id": "0",
        "disable_correlation": false,
        "extends_uuid": "",
        "protected": null,
        "ShadowAttribute": [],
        "CryptographicKey": []
      }
    }
  ]
}
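The manual comparison above could be automated along these lines. This is a sketch only: it assumes the two responses have already been fetched and parsed into Python dictionaries, and it drops the same Event keys as the jq filter, so differences inside those dropped keys are not detected. The function names are hypothetical.

```python
# Keys removed from each Event, mirroring the jq `del(...)` filter above.
DROPPED_KEYS = (
    "Attribute", "EventReport", "Galaxy", "GalaxyCluster", "Object",
    "Org", "Orgc", "RelatedEvent", "Tag",
)


def strip_events(payload: dict) -> dict:
    """Return a copy of a restSearch response without the bulky Event keys."""
    stripped = []
    for item in payload["response"]:
        event = {k: v for k, v in item["Event"].items() if k not in DROPPED_KEYS}
        stripped.append({**item, "Event": event})
    return {**payload, "response": stripped}


def same_response(with_param: dict, without_param: dict) -> bool:
    """True if both responses match once the dropped keys are ignored."""
    return strip_events(with_param) == strip_events(without_param)
```

Passing the parsed output of the two curl commands to same_response would return True when the extra query string parameter has no effect on the remaining fields.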

Verify that the page 2 response is the same with and without the query string parameter

curl \
 -d '{"returnFormat":"json","page":2,"limit":2,"timestamp":"1640998800"}' \
 -H "Authorization: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
 -H "Accept: application/json" \
 -H "Content-type: application/json" \
 -X POST 'https://localhost:8443/events/restSearch?timestamp=1640998800' --insecure \
 | jq 'del(.response[].Event | .Attribute, .EventReport, .Galaxy, .GalaxyCluster, .Object, .Org, .Orgc, .RelatedEvent, .Tag)'
curl \
 -d '{"returnFormat":"json","page":2,"limit":2,"timestamp":"1640998800"}' \
 -H "Authorization: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
 -H "Accept: application/json" \
 -H "Content-type: application/json" \
 -X POST https://localhost:8443/events/restSearch --insecure \
 | jq 'del(.response[].Event | .Attribute, .EventReport, .Galaxy, .GalaxyCluster, .Object, .Org, .Orgc, .RelatedEvent, .Tag)'

They do indeed yield the same response:

{
  "response": [
    {
      "Event": {
        "id": "1227",
        "orgc_id": "3",
        "org_id": "1",
        "date": "2022-01-16",
        "threat_level_id": "4",
        "info": "MSFT - MSTIC - Destructive malware targeting Ukrainian organizations",
        "published": true,
        "uuid": "8cc5335e-915b-4e16-837d-49143e6987b4",
        "attribute_count": "20",
        "analysis": "2",
        "timestamp": "1642348752",
        "distribution": "3",
        "proposal_email_lock": false,
        "locked": false,
        "publish_timestamp": "1687283686",
        "sharing_group_id": "0",
        "disable_correlation": false,
        "extends_uuid": "",
        "protected": null,
        "ShadowAttribute": [],
        "CryptographicKey": []
      }
    },
    {
      "Event": {
        "id": "1228",
        "orgc_id": "3",
        "org_id": "1",
        "date": "2022-01-28",
        "threat_level_id": "4",
        "info": "Disinformation - The GRU’s galaxy of Russian-speaking websites",
        "published": true,
        "uuid": "4b825576-e9e3-4f7b-a9a7-e0ad91550ea2",
        "attribute_count": "1345",
        "analysis": "2",
        "timestamp": "1643358520",
        "distribution": "3",
        "proposal_email_lock": false,
        "locked": false,
        "publish_timestamp": "1687283694",
        "sharing_group_id": "0",
        "disable_correlation": false,
        "extends_uuid": "",
        "protected": null,
        "ShadowAttribute": [],
        "CryptographicKey": []
      }
    }
  ]
}

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • Check agent logs of system tests to verify the initial page timestamp is reused for later pages
  • Check that MISP accepts the extra query string parameter

Related issues

@elasticmachine

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@elasticmachine

elasticmachine commented Jun 21, 2023

💚 Build Succeeded


Build stats

  • Start Time: 2023-06-22T15:53:11.192+0000

  • Duration: 15 min 4 sec

Test stats 🧪

Test Results
Failed 0
Passed 15
Skipped 0
Total 15


@elasticmachine

elasticmachine commented Jun 21, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (2/2) 💚
Files 100.0% (2/2) 💚
Classes 100.0% (2/2) 💚
Methods 100.0% (30/30) 💚
Lines 86.536% (617/713) 👎 -8.004
Conditionals 100.0% (0/0) 💚

@chrisberkhout chrisberkhout force-pushed the ti-misp-succesive-pages-use-initial-timestamp branch from e41e67f to 3dcf5bf Compare June 22, 2023 15:52
@efd6 efd6 merged commit 6356cc0 into main Jun 22, 2023
4 checks passed
@efd6
Contributor

efd6 commented Jun 22, 2023

@chrisberkhout Thank you for such a wonderful change description (retained sans the <details> section in the commit message).

@elasticmachine

Package ti_misp - 1.16.1 containing this change is available at https://epr.elastic.co/search?package=ti_misp

@chrisberkhout chrisberkhout deleted the ti-misp-succesive-pages-use-initial-timestamp branch June 23, 2023 09:27
chrisberkhout added a commit to elastic/beats that referenced this pull request Feb 8, 2024
Update the HTTP JSON input configuration for the Threat Intel module's
misp fileset with pagination fixes that were done earlier in the
Agent-based MISP integration, in these PRs:

- Fix timestamp format sent to API
  elastic/integrations#6482

- Fix duplicate requests for page 1
  elastic/integrations#6495

- Keep the same timestamp for later pages
  elastic/integrations#6649

- Pagination fixes
  elastic/integrations#9073
mergify bot pushed a commit to elastic/beats that referenced this pull request Feb 8, 2024
(cherry picked from commit b7fc69a)
chrisberkhout pushed a commit to elastic/beats that referenced this pull request Feb 9, 2024
…#37923)

[filebeat][threatintel] MISP pagination fixes (#37898)
chrisberkhout pushed a commit to elastic/beats that referenced this pull request Feb 9, 2024
…#37924)

[filebeat][threatintel] MISP pagination fixes (#37898)
Successfully merging this pull request may close these issues.

  • [ti_misp] timestamp should not change while paginating
  • [TI_MISP] Issue with .cursor.timestamp