Commit
[ti_misp] Keep the same timestamp for later pages (#6649)
For a given sequence of page requests, the later requests should use the same timestamp parameter as the initial page in the sequence. This is achieved by setting a query string parameter in the initial request that will be ignored by MISP, and using that to set the correct timestamp in the `response.pagination` transforms[1]. This is a workaround for the fact that `response.pagination` transforms aren't given direct access to the last request, but can access its URL via the last response.

Details

When fetching data from MISP, we start at an earlier point in time (120 hours back by default, 10 minutes in testing) and page forward from that point. As items are received, item timestamps are recorded in `httpjson` cursor data[2]. Upon restart, that cursor data is used as the new start point.

The bug was that the timestamp was being reset on every page. For example, looking at agent logs for system tests before the change we see:

```
cat threat.log.ndjson | jq -c 'select(.message=="HTTP request")|{"@timestamp","http.request.body.content","transaction.id"}'
{"@timestamp":"2023-06-21T13:20:32.855Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"1\",\"returnFormat\":\"json\",\"timestamp\":\"1687353032\"}","transaction.id":"I4KRQQ1GLTL1E-1"}
{"@timestamp":"2023-06-21T13:20:32.858Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"2\",\"returnFormat\":\"json\",\"timestamp\":\"1687353032\"}","transaction.id":"I4KRQQ1GLTL1E-2"}
{"@timestamp":"2023-06-21T13:20:35.695Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"3\",\"returnFormat\":\"json\",\"timestamp\":\"1687353035\"}","transaction.id":"I4KRQQ1GLTL1E-3"}

cat threat_attributes.log.ndjson | jq -c 'select(.message=="HTTP request")|{"@timestamp","http.request.body.content","transaction.id"}'
{"@timestamp":"2023-06-21T13:21:09.893Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"1\",\"returnFormat\":\"json\",\"timestamp\":\"1687353069\"}","transaction.id":"0HCM021PLTL1E-1"}
{"@timestamp":"2023-06-21T13:21:12.740Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"2\",\"returnFormat\":\"json\",\"timestamp\":\"1687353072\"}","transaction.id":"0HCM021PLTL1E-2"}
```

The initial timestamp in the request body is 10 minutes before that request was made, as expected. However, for later pages the timestamp is reset to a new value 10 minutes before the new request.

Aside: for the `threat` datastream, page 2 has the same timestamp as page 1. This is because it is requested just 3 milliseconds after page 1. The 3rd page is requested after a 3 second delay (as is the 2nd page for the `threat_attributes` datastream), so we see a new timestamp value there. I'm not sure why the delays aren't all similar. Maybe a 3 second delay is triggered by a threshold that isn't reached by the 1st page for the `threat` datastream, which does have less data. Changing the run order doesn't seem to affect this.

After the change the logs look like this:

```
cat threat.log.ndjson | jq -c 'select(.message=="HTTP request")|{"@timestamp","http.request.body.content","transaction.id"}'
{"@timestamp":"2023-06-21T13:26:20.952Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"1\",\"returnFormat\":\"json\",\"timestamp\":\"1687353380\"}","transaction.id":"4U9VMT41LTL1E-1"}
{"@timestamp":"2023-06-21T13:26:20.955Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"2\",\"returnFormat\":\"json\",\"timestamp\":\"1687353380\"}","transaction.id":"4U9VMT41LTL1E-2"}
{"@timestamp":"2023-06-21T13:26:23.816Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"3\",\"returnFormat\":\"json\",\"timestamp\":\"1687353380\"}","transaction.id":"4U9VMT41LTL1E-3"}

cat threat_attributes.log.ndjson | jq -c 'select(.message=="HTTP request")|{"@timestamp","http.request.body.content","transaction.id"}'
{"@timestamp":"2023-06-21T13:26:58.167Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"1\",\"returnFormat\":\"json\",\"timestamp\":\"1687353418\"}","transaction.id":"AUK0G7SALTL1E-1"}
{"@timestamp":"2023-06-21T13:27:01.011Z","http.request.body.content":"{\"limit\":\"10\",\"page\":\"2\",\"returnFormat\":\"json\",\"timestamp\":\"1687353418\"}","transaction.id":"AUK0G7SALTL1E-2"}
```

[1] https://www.elastic.co/guide/en/beats/filebeat/8.8/filebeat-input-httpjson.html#response-pagination
[2] https://www.elastic.co/guide/en/beats/filebeat/8.8/filebeat-input-httpjson.html#cursor
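
As a rough illustration of the workaround described in this commit message, the following Python sketch models the idea: the timestamp is computed once for page 1, stashed in a query string parameter that the server is assumed to ignore, and read back from the last response's URL when building later pages. This is not the integration's actual `httpjson` configuration. The parameter name `from`, the URL, and the helper functions are all hypothetical and exist only for this sketch.

```python
from urllib.parse import parse_qs, urlencode, urlparse

LOOKBACK_SECONDS = 600   # 10 minutes, the look-back used in testing
CARRIER_PARAM = "from"   # hypothetical name for the query parameter MISP ignores


def first_request(base_url, now):
    """Build page 1: compute the timestamp once and stash it in the URL."""
    ts = str(int(now) - LOOKBACK_SECONDS)
    url = f"{base_url}?{urlencode({CARRIER_PARAM: ts})}"
    body = {"limit": "10", "page": "1", "returnFormat": "json", "timestamp": ts}
    return url, body


def next_request(last_response_url, last_page):
    """Build a later page using only the URL seen on the last response."""
    query = parse_qs(urlparse(last_response_url).query)
    ts = query[CARRIER_PARAM][0]  # reuse the initial timestamp, never recompute it
    body = {
        "limit": "10",
        "page": str(last_page + 1),
        "returnFormat": "json",
        "timestamp": ts,
    }
    return last_response_url, body


def page_sequence(base_url, pages, now):
    """Return the request bodies for one sequence of `pages` page requests."""
    url, body = first_request(base_url, now)
    bodies = [body]
    for page in range(1, pages):
        url, body = next_request(url, page)
        bodies.append(body)
    return bodies


if __name__ == "__main__":
    # Placeholder URL; only the query string handling matters here.
    for body in page_sequence("https://misp.example/events/restSearch", 3, now=1687353980):
        print(body)
```

Because `next_request` never consults the clock, every page in a sequence carries the timestamp chosen for page 1, regardless of how long the gap between page requests is.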