Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[slack] Choose the correct value for the oldest param #8217

Merged
merged 14 commits into from Nov 7, 2023

Conversation

chrisberkhout
Copy link
Contributor

@chrisberkhout chrisberkhout commented Oct 16, 2023

Proposed commit message

[slack] Choose the correct value for the `oldest` param

Choosing the correct value means:

- Use the chosen `oldest` value for all requests in a pagination
  sequence. 
- Choose the `oldest` value for a new pagination sequence based on
  results from the first page of the previous sequence.
- Save the `oldest` value in cursor data in order to facilitate
  resumption of an interrupted pagination sequence.

The `max_date_seen_so_far` request parameter is added as a workaround
for the HTTP JSON input's `cursor:` section not being able to read
existing cursor values when generating new values. It will be ignored
by the Slack API.

Another change was to use the saved `next_cursor` value to indicate
whether a pagination sequence is in progress, rather than spreading
this information across multiple values.

API behavior and test scenario explanation

The system test scenario is updated here to match the recently observed behavior of the live API. It attempts to cover all the pagination logic required for non-interrupted operation.

Relevant behavior of the live API includes:

  • The API needs the oldest parameter in order to know where to stop, even if a cursor parameter is given. In slack: fix handling of oldest parameter when paginating #8169 the oldest parameter was included in all requests, but further changes are required to ensure that its value is correct and does not change for a given sequence of pages.
  • The .response_metadata.next_cursor field is always returned, as an empty string if there is no further page.
  • It is common for multiple entries to have the same date_create timestamp. Often this is a few entries, sometimes it is many more.
  • The oldest parameter is inclusive, so when a new sequence sets the oldest value to the maximum date_create of the previous sequence, there will be repeated results.
  • Internally, a cursor identifies a particular date_create and id. The entries returned are ordered by date_create descending, but for a given date_create, the id values are not in order.

The new scenario is summarized in the following table:

Screenshot from 2023-10-17 09-54-13

The table's time N and identifier NNN match the system test data's timestamp suffix and ID prefix. The table shows which entry a cursor points to, but cursor values in the test data are random. Green entries are new records for ingestion, red entries are repeated records that we should avoid duplicating.

Without the new fix that scenario fails, having missed entries 3-813 and 3-212.

Things needed for this test to pass:

  • Correct cursor handling (seems to work already).
  • The correct oldest value should be used for each sequence of pages.
  • Duplicates should be avoided (currently done by setting _id to the fingerprinted id).
Example request logs from successful system test execution
{
  "@timestamp": "2023-11-02T16:59:14.095Z",
  "message": "HTTP request",
  "url.query": "limit=2&max_date_seen_so_far=0&oldest=1681664354",
  "http.response.body.content": null
}
{
  "@timestamp": "2023-11-02T16:59:14.096Z",
  "message": "HTTP response",
  "url.query": null,
  "http.response.body.content": [
    {
      "id": "4433a45b-6c7d-8900-e12f-3456789gh0i1",
      "date_create": 1683836273
    },
    {
      "id": "313b13e3-28a3-41f0-9ace-a20952def3a0",
      "date_create": 1683836273
    }
  ]
}
{
  "@timestamp": "2023-11-02T16:59:16.070Z",
  "message": "HTTP request",
  "url.query": "cursor=YXNkZmFzZGZhc2Rm&limit=2&max_date_seen_so_far=1683836273&oldest=1681664354",
  "http.response.body.content": null
}
{
  "@timestamp": "2023-11-02T16:59:16.072Z",
  "message": "HTTP response",
  "url.query": null,
  "http.response.body.content": [
    {
      "id": "80928060-1659-4b27-ad55-fdba12e3a7b1",
      "date_create": 1683836272
    },
    {
      "id": "1885fb41-c67c-4cf5-a5c4-d90cb58dd5f9",
      "date_create": 1683836271
    }
  ]
}
{
  "@timestamp": "2023-11-02T16:59:26.077Z",
  "message": "HTTP request",
  "url.query": "limit=2&max_date_seen_so_far=1683836273&oldest=1683836273",
  "http.response.body.content": null
}
{
  "@timestamp": "2023-11-02T16:59:26.080Z",
  "message": "HTTP response",
  "url.query": null,
  "http.response.body.content": [
    {
      "id": "4216a45b-6c7d-8900-e12f-3456789gh0i1",
      "date_create": 1683836274
    },
    {
      "id": "412d13e3-28a3-41f0-9ace-a20952def3a0",
      "date_create": 1683836273
    }
  ]
}
{
  "@timestamp": "2023-11-02T16:59:26.081Z",
  "message": "HTTP request",
  "url.query": "cursor=GytjmKHF5hFmty&limit=2&max_date_seen_so_far=1683836274&oldest=1683836273",
  "http.response.body.content": null
}
{
  "@timestamp": "2023-11-02T16:59:26.083Z",
  "message": "HTTP response",
  "url.query": null,
  "http.response.body.content": [
    {
      "id": "81328070-1659-4b27-ad55-fdba12e3a7b1",
      "date_create": 1683836273
    },
    {
      "id": "2125fb41-c67c-4cf5-a5c4-d90cb58dd5f9",
      "date_create": 1683836273
    }
  ]
}
{
  "@timestamp": "2023-11-02T16:59:26.084Z",
  "message": "HTTP request",
  "url.query": "cursor=Hi4JrvZZnX3IGHE1&limit=2&max_date_seen_so_far=1683836274&oldest=1683836273",
  "http.response.body.content": null
}
{
  "@timestamp": "2023-11-02T16:59:26.085Z",
  "message": "HTTP response",
  "url.query": null,
  "http.response.body.content": [
    {
      "id": "4433a45b-6c7d-8900-e12f-3456789gh0i1",
      "date_create": 1683836273
    },
    {
      "id": "313b13e3-28a3-41f0-9ace-a20952def3a0",
      "date_create": 1683836273
    }
  ]
}
{
  "@timestamp": "2023-11-02T16:59:36.075Z",
  "message": "HTTP request",
  "url.query": "limit=2&max_date_seen_so_far=1683836274&oldest=1683836274",
  "http.response.body.content": null
}
{
  "@timestamp": "2023-11-02T16:59:36.077Z",
  "message": "HTTP response",
  "url.query": null,
  "http.response.body.content": [
    {
      "id": "22328080-1659-4b27-ad55-fdba12e3a7b1",
      "date_create": 1683836275
    },
    {
      "id": "4216a45b-6c7d-8900-e12f-3456789gh0i1",
      "date_create": 1683836274
    }
  ]
}
{
  "@timestamp": "2023-11-02T16:59:46.075Z",
  "message": "HTTP request",
  "url.query": "limit=2&max_date_seen_so_far=1683836275&oldest=1683836275",
  "http.response.body.content": null
}
{
  "@timestamp": "2023-11-02T16:59:46.077Z",
  "message": "HTTP response",
  "url.query": null,
  "http.response.body.content": [
    {
      "id": "6465fc41-c67c-4cf5-a5c4-d90cb58dd5f9",
      "date_create": 1683836276
    },
    {
      "id": "22328080-1659-4b27-ad55-fdba12e3a7b1",
      "date_create": 1683836275
    }
  ]
}
{
  "@timestamp": "2023-11-02T16:59:56.076Z",
  "message": "HTTP request",
  "url.query": "limit=2&max_date_seen_so_far=1683836276&oldest=1683836276",
  "http.response.body.content": null
}
{
  "@timestamp": "2023-11-02T16:59:56.077Z",
  "message": "HTTP response",
  "url.query": null,
  "http.response.body.content": [
    {
      "id": "6465fc41-c67c-4cf5-a5c4-d90cb58dd5f9",
      "date_create": 1683836276
    }
  ]
}

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

How to test this PR locally

# build and test
elastic-package build -v
elastic-package test system -v

# find agent logs and copy to current directory
docker exec elastic-package-stack-elastic-agent-1 find /usr/share/elastic-agent/state/data/logs -type f |\
  xargs -I{} docker cp elastic-package-stack-elastic-agent-1:{} .

# print request log fields relevant to pagination
cat http-*.ndjson | jq '{"@timestamp",message,"url.query","http.response.body.content": (try (."http.response.body.content" | fromjson | .entries|map({id,date_create})) catch null)}'

Related issues

@elasticmachine
Copy link

elasticmachine commented Oct 16, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-11-03T11:05:12.687+0000

  • Duration: 16 min 17 sec

Test stats 🧪

Test Results
Failed 0
Passed 5
Skipped 0
Total 5

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine
Copy link

elasticmachine commented Oct 16, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (1/1) 💚
Files 100.0% (1/1) 💚
Classes 100.0% (1/1) 💚
Methods 100.0% (14/14) 💚
Lines 98.721% (386/391)
Conditionals 100.0% (0/0) 💚

@chrisberkhout chrisberkhout marked this pull request as ready for review November 2, 2023 20:35
@chrisberkhout chrisberkhout requested a review from a team as a code owner November 2, 2023 20:35
@elasticmachine
Copy link

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@efd6
Copy link
Contributor

efd6 commented Nov 2, 2023

The request parameter max_date_seen_so_far is intended to be ignored by the API, and is added because the in the HTTP JSON input's cursor: section it is not possible to read existing cursor values.

Do we know that Slack's API will tolerate this param being present?

Copy link
Contributor

@bhapas bhapas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should test this against a slack instance to see if the API tolerates the new param

@chrisberkhout chrisberkhout changed the title [slack] Fix pagination logic [slack] Choose the correct value for the oldest param Nov 3, 2023
@chrisberkhout
Copy link
Contributor Author

@bhapas @efd6

The extra parameter will be okay.

Infosec ran this request with a real api token and confirmed that it does return results:

api_token="xoxp-1234567890"
url="https://api.slack.com/audit/v1/logs"
curl --get "$url" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer $api_token" \
  -d "cursor=" \
  -d "limit=20" \
  -d "oldest=1680000000" \
  -d "max_date_seen_so_far=1234"

@efd6
Copy link
Contributor

efd6 commented Nov 6, 2023

@chrisberkhout Thanks for checking that. Would you mind including a note in the commit message that this technique is used to stash the value?

@chrisberkhout chrisberkhout merged commit 5ccced5 into elastic:main Nov 7, 2023
4 checks passed
@chrisberkhout chrisberkhout deleted the slack-pagination-fix branch November 7, 2023 09:14
Copy link
Contributor

@bhapas bhapas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry was a bit late..!

@elasticmachine
Copy link

Package slack - 1.15.1 containing this change is available at https://epr.elastic.co/search?package=slack

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Integration:Slack
Projects
None yet
Development

Successfully merging this pull request may close these issues.

slack: cursor.pagination_finished template logic is unclear or incorrect
4 participants