[ti_misp] Keep the same timestamp for later pages (#6649)
For a given sequence of page requests, the later requests should use the same
timestamp parameter as the initial page in the sequence.

This is achieved by setting a query string parameter in the initial request
that will be ignored by MISP, and using that to set the correct timestamp in
the `response.pagination` transforms[1]. This is a workaround for the fact that
`response.pagination` transforms aren't provided direct access to the last
request, but can access its URL via the last response.
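
The round trip can be illustrated outside `httpjson` with a minimal Python sketch. It is not the integration's code: the endpoint URL and the function names `first_request` and `next_request` are hypothetical, and the page arithmetic mirrors the "add 2" comment in the templates.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical placeholder endpoint, not taken from the integration.
BASE_URL = "https://misp.example.invalid/events/restSearch"


def first_request(start_ts):
    """Build the initial request: the timestamp goes in the body (used by
    MISP) and is duplicated in the query string (ignored by MISP)."""
    body = {"limit": "10", "page": "1", "returnFormat": "json",
            "timestamp": str(start_ts)}
    url = BASE_URL + "?" + urlencode({"timestamp": start_ts})
    return url, body


def next_request(last_url, last_page_index):
    """Build a later page using only what the last response exposes: its URL
    and the zero-based page counter."""
    ts = parse_qs(urlsplit(last_url).query)["timestamp"][0]
    body = {"limit": "10", "page": str(last_page_index + 2),
            "returnFormat": "json", "timestamp": ts}
    # Re-set the query parameter so the following page can read it too.
    url = BASE_URL + "?" + urlencode({"timestamp": ts})
    return url, body
```

Because every later request copies the value from the previous URL, the timestamp chosen for page 1 survives for the whole sequence regardless of when each page is requested.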

Details

When fetching data from MISP, we start at an earlier point (120 hours by
default, 10 mins in testing) and page forward from that point. As items are
received, item timestamps are recorded in `httpjson` cursor data[2]. Upon restart
that cursor data is used as the new start point.
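
As a rough sketch of that start-point choice (the function name `start_timestamp` and the dict-shaped cursor are assumptions for illustration, not the `httpjson` implementation):

```python
import time


def start_timestamp(cursor, initial_interval_s, now=None):
    """Return the Unix timestamp to start fetching from.

    Uses the stored cursor timestamp when there is one; otherwise falls back
    to `now - initial_interval_s`, like the `default` in the request transform.
    """
    if now is None:
        now = time.time()
    if cursor and cursor.get("timestamp") is not None:
        return int(cursor["timestamp"])
    return int(now - initial_interval_s)
```

With no cursor and a 10 minute interval, a run at Unix time 1687353632 starts from 1687353032; once a cursor exists, the interval no longer matters.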

The bug was that the timestamp was being reset on every page. For example,
looking at agent logs for system tests before the change we see:

```
cat threat.log.ndjson | jq -c 'select(.message=="HTTP request")|{"@timestamp","http.request.body.content","transaction.id"}'

{"@timestamp":"2023-06-21T13:20:32.855Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"1\",\"returnFormat\":\"json\",\"timestamp\":\"1687353032\"}","transaction.id":"I4KRQQ1GLTL1E-1"}
{"@timestamp":"2023-06-21T13:20:32.858Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"2\",\"returnFormat\":\"json\",\"timestamp\":\"1687353032\"}","transaction.id":"I4KRQQ1GLTL1E-2"}
{"@timestamp":"2023-06-21T13:20:35.695Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"3\",\"returnFormat\":\"json\",\"timestamp\":\"1687353035\"}","transaction.id":"I4KRQQ1GLTL1E-3"}

cat threat_attributes.log.ndjson | jq -c 'select(.message=="HTTP request")|{"@timestamp","http.request.body.content","transaction.id"}'

{"@timestamp":"2023-06-21T13:21:09.893Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"1\",\"returnFormat\":\"json\",\"timestamp\":\"1687353069\"}","transaction.id":"0HCM021PLTL1E-1"}
{"@timestamp":"2023-06-21T13:21:12.740Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"2\",\"returnFormat\":\"json\",\"timestamp\":\"1687353072\"}","transaction.id":"0HCM021PLTL1E-2"}
```

The initial timestamp in the request body is 10 mins before that request was
made, as expected. However, for later pages the timestamp is reset to a new
value 10 minutes before the new request.
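
The arithmetic can be checked against the log lines above with a short Python sketch. The helper name `expected_start` is made up for this check, and the 600 second interval is the testing value mentioned earlier.

```python
from datetime import datetime, timezone


def expected_start(request_time, interval_s=600):
    """Unix timestamp `interval_s` seconds before an `@timestamp` value,
    truncating fractional seconds."""
    t = datetime.strptime(request_time, "%Y-%m-%dT%H:%M:%S.%fZ")
    t = t.replace(tzinfo=timezone.utc)
    return int(t.timestamp()) - interval_s


# Page 1 of `threat`: matches the timestamp in the request body.
print(expected_start("2023-06-21T13:20:32.855Z"))  # 1687353032
# Page 3 of `threat`: also "now minus 10 minutes", which is the bug.
print(expected_start("2023-06-21T13:20:35.695Z"))  # 1687353035
```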

Aside: for the `threat` datastream, page 2 has the same timestamp as page 1.
This is because it is requested just 3 milliseconds after page 1. The 3rd
page is requested after a 3 second delay (as is the 2nd page for the
`threat_attributes` datastream) so we see a new timestamp value there. I'm
not sure why the delays aren't all similar. Maybe a 3 second delay is
triggered by a threshold that isn't reached by the 1st page for the `threat`
datastream, which does have less data. Changing the run order doesn't seem to
affect this.

After the change the logs look like this:

```
cat threat.log.ndjson | jq -c 'select(.message=="HTTP request")|{"@timestamp","http.request.body.content","transaction.id"}'

{"@timestamp":"2023-06-21T13:26:20.952Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"1\",\"returnFormat\":\"json\",\"timestamp\":\"1687353380\"}","transaction.id":"4U9VMT41LTL1E-1"}
{"@timestamp":"2023-06-21T13:26:20.955Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"2\",\"returnFormat\":\"json\",\"timestamp\":\"1687353380\"}","transaction.id":"4U9VMT41LTL1E-2"}
{"@timestamp":"2023-06-21T13:26:23.816Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"3\",\"returnFormat\":\"json\",\"timestamp\":\"1687353380\"}","transaction.id":"4U9VMT41LTL1E-3"}

cat threat_attributes.log.ndjson | jq -c 'select(.message=="HTTP request")|{"@timestamp","http.request.body.content","transaction.id"}'

{"@timestamp":"2023-06-21T13:26:58.167Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"1\",\"returnFormat\":\"json\",\"timestamp\":\"1687353418\"}","transaction.id":"AUK0G7SALTL1E-1"}
{"@timestamp":"2023-06-21T13:27:01.011Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"2\",\"returnFormat\":\"json\",\"timestamp\":\"1687353418\"}","transaction.id":"AUK0G7SALTL1E-2"}
```
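
A check like this could also be automated. The following Python sketch works on the `jq` output lines shown above and assumes that the part of `transaction.id` before the final `-` identifies one pagination sequence; both function names are hypothetical.

```python
import json
from collections import defaultdict


def timestamps_by_sequence(lines):
    """Map each sequence id to the set of body timestamps seen across its pages."""
    seen = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # The request body is itself a JSON document stored as a string.
        body = json.loads(entry["http.request.body.content"])
        # Assumption: "<sequence>-<n>" where n is the request number.
        sequence = entry["transaction.id"].rsplit("-", 1)[0]
        seen[sequence].add(body["timestamp"])
    return dict(seen)


def make_line(page, timestamp, transaction_id):
    """Build a sample line in the same shape as the jq output (test helper)."""
    body = json.dumps({"limit": "10", "page": str(page),
                       "returnFormat": "json", "timestamp": str(timestamp)})
    return json.dumps({"http.request.body.content": body,
                       "transaction.id": transaction_id})
```

A sequence is consistent when its set has exactly one element. The `threat` requests logged before the change yield two distinct timestamps, while those logged after the change yield one.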

[1]https://www.elastic.co/guide/en/beats/filebeat/8.8/filebeat-input-httpjson.html#response-pagination
[2]https://www.elastic.co/guide/en/beats/filebeat/8.8/filebeat-input-httpjson.html#cursor
chrisberkhout committed Jun 22, 2023
1 parent cee9a1f commit 6356cc0
Showing 4 changed files with 26 additions and 1 deletion.
5 changes: 5 additions & 0 deletions packages/ti_misp/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "1.16.1"
changes:
- description: Keep the same timestamp for later pages in a pagination sequence.
type: bugfix
link: https://github.com/elastic/integrations/pull/6649
- version: "1.16.0"
changes:
- description: Ensure event.kind is correctly set for pipeline errors.
10 changes: 10 additions & 0 deletions packages/ti_misp/data_stream/threat/agent/stream/httpjson.yml.hbs
@@ -39,6 +39,10 @@ request.transforms:
target: body.timestamp
value: '[[.cursor.timestamp.Unix]]'
default: '[[ (now (parseDuration "-{{initial_interval}}")).Unix ]]'
- set:
# Ignored by MISP, set as a workaround to make it available in response.pagination.
target: url.params.timestamp
value: '[[.body.timestamp]]'

response.split:
target: body.response
@@ -59,6 +63,12 @@ response.pagination:
# Add 2 because the httpjson page counter is zero-based while the MISP page parameter starts at 1.
value: '[[if (ne (len .last_response.body.response) 0)]][[add .last_response.page 2]][[end]]'
fail_on_template_error: true
- set:
target: body.timestamp
value: '[[.last_response.url.params.Get "timestamp"]]'
- set:
target: url.params.timestamp
value: '[[.last_response.url.params.Get "timestamp"]]'
cursor:
timestamp:
value: '[[.last_event.Event.timestamp]]'
@@ -39,6 +39,10 @@ request.transforms:
target: body.timestamp
value: '[[.cursor.timestamp.Unix]]'
default: '[[ (now (parseDuration "-{{initial_interval}}")).Unix ]]'
- set:
# Ignored by MISP, set as a workaround to make it available in response.pagination.
target: url.params.timestamp
value: '[[.body.timestamp]]'

response.split:
target: body.response.Attribute
@@ -51,6 +55,12 @@ response.pagination:
# Add 2 because the httpjson page counter is zero-based while the MISP page parameter starts at 1.
value: '[[if (ne (len .last_response.body.response.Attribute) 0)]][[add .last_response.page 2]][[end]]'
fail_on_template_error: true
- set:
target: body.timestamp
value: '[[.last_response.url.params.Get "timestamp"]]'
- set:
target: url.params.timestamp
value: '[[.last_response.url.params.Get "timestamp"]]'
cursor:
timestamp:
value: '[[.last_event.Attribute.timestamp]]'
2 changes: 1 addition & 1 deletion packages/ti_misp/manifest.yml
@@ -1,6 +1,6 @@
name: ti_misp
title: MISP
-version: "1.16.0"
+version: "1.16.1"
release: ga
description: Ingest threat intelligence indicators from MISP platform with Elastic Agent.
type: integration