[slack] Fix 'oldest' API query param handling #6958
f0fc052 to 8ae93e8
Pinging @elastic/security-external-integrations (Team:Security-External Integrations) |
If logs are returned newest-to-oldest then our method of paginating won't work even if we change to using the timestamp of the last event returned. I think we would need to apply a fix similar to what was done for Google Workspace in #4982 to fix #4796. From the Slack docs:
If we can use a cursor provided by the API that would be ideal. This issue gives a second confirmation that cursor is supported slackapi/python-slack-sdk#1147. It won't fix the underlying ordering issue but it is a more accurate way to iterate through logs without getting duplicates. |
@andrewkroh We are using this cursor in our agent template - https://github.com/elastic/integrations/pull/6958/files#diff-7b5b9582e9bfd6a478411e73c6990e8565fb41d12d645cd785d121de5412bbfcR40-R42 So, A combination of In case of Google workspace in #4982 , I think there is no specific cursor value and As per discussion here , the PR is tested and is working fine to retrieve logs based on |
@bhapas and I talked via Zoom. Summarizing here:
|
I started reviewing this before seeing the new comments. Hope this still helps: This would be much easier if it was possible to request As of ccaf565 it will paginate from now to the oldest data or Based on examples I saw in the Slack docs, I suspect the cursor will only include an item ID and no other context from the request in which it is returned. If you're working with a live endpoint you can base64 decode the cursor to see what's in there. However, I still think Here's some logic that I think would work for resuming interrupted pagination, both for the initial fetch for later:
It'd take some care to translate that into |
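The suggestion above to base64-decode a cursor can be tried with a few lines of Python. This is only a sketch: it assumes the cursor is plain base64 text, and the sample value is built locally for illustration rather than taken from a real response.

```python
import base64


def decode_cursor(cursor: str) -> str:
    """Best-effort decode of an opaque cursor, assuming it is base64 text."""
    # Restore padding in case it was stripped from the value.
    padded = cursor + "=" * (-len(cursor) % 4)
    return base64.b64decode(padded).decode("utf-8", errors="replace")


# Hypothetical cursor built locally for illustration; a real one would come
# from response_metadata.next_cursor in an API response.
sample = base64.b64encode(b"item:12345").decode("ascii")
print(decode_cursor(sample))
```

If the decoded text shows only an item ID, that would support the guess that the cursor carries no other context from the original request.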
This is more like client side filtering that we try to achieve using cursor. But I think it is still good to get only
Yes , the oldest shall still be
Please let me know if I am not clear here :-) |
Agreed. Using
Yes, that workflow looks right to me and matches my earlier suggestion but using
|
True. But the
Not 100% sure how the filtering works here, so I would like to test this scenario with the real API |
Right, that might be a good way to keep hold of it. In #6649 I ended up adding an ignored query string parameter just to get access during pagination to a value set for the original request. |
Looking through the API docs again, it seems that the This might be reconfirmed by looking at pagination docs - https://api.slack.com/docs/pagination#cursors - The |
Yeah, the docs are ambiguous. On the Audit Logs API page it does say:
I assumed they probably added cursor-based pagination but only partially updated the docs. There is some other code on Github that uses the same audit logs endpoint and uses cursors: |
@bhapas, As I mentioned in #6958 (comment), the conversation in slackapi/python-slack-sdk#1147 indicates that |
About the logic:
- Tidying up: `last_timestamp` is stored in cursor data but not used.
- Resuming a sequence: Rather than using `last_timestamp`, it'd be better to store `.last_response.body.response_metadata.next_cursor` in cursor data and use that for resuming a pagination sequence. That would mean using that cursor in `request.transforms`. With that there, `response.pagination` would still need to be defined, but it wouldn't have to set any different values.
- Endpoint of a new sequence: `next_oldest_date` will get stored after every non-empty page, but to begin a 2nd or later pagination sequence, we want the `.first_event.date_create` from the first page of the last sequence (or probably `latest` from that sequence if it contained no events).
- Params for later pages in a sequence: In `response.pagination`, if the `next_cursor` value only indicates a start point for the next page as I suspect (to be verified one way or the other), you'll need to keep `oldest` around. It may be okay to delete `latest`, or it may be more convenient to keep it around.

To review, it'd be helpful to see an example sequence of requests, maybe from logs, to see how the params get preserved / updated at each step.
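The points above can be sketched as a small state machine. This is not the agent template itself: the function names and the `pending_oldest` key are hypothetical, and keeping `oldest` on cursor requests rests on the unverified assumption that a cursor only marks a start point.

```python
def next_request_params(state: dict, now: int, initial_interval: int, limit: int) -> dict:
    """Pick query params for the next request from stored cursor state."""
    if state.get("next_cursor"):
        # Mid-sequence, or resuming after an interruption: continue from the
        # stored cursor. `oldest` is kept (assumption, to be verified).
        return {"cursor": state["next_cursor"], "oldest": state["oldest"], "limit": limit}
    # Start of a sequence: the first run uses the initial interval, later runs
    # start from the value saved by the previous sequence.
    oldest = state.get("next_oldest_date", now - initial_interval)
    return {"oldest": oldest, "latest": now, "limit": limit}


def update_state(state: dict, params: dict, events: list, next_cursor: str) -> dict:
    """Return the cursor state to store after one response."""
    new = dict(state)
    new["oldest"] = params["oldest"]
    if "cursor" not in params:
        # First page of a sequence: remember where the next sequence starts.
        # Events arrive newest first, so events[0] is the most recent one.
        new["pending_oldest"] = events[0]["date_create"] if events else params["latest"]
    if next_cursor:
        new["next_cursor"] = next_cursor
    else:
        # Sequence finished: drop the cursor and promote the saved start point.
        new.pop("next_cursor", None)
        new["next_oldest_date"] = new.pop("pending_oldest")
    return new


# Example sequence: first page, a follow-up page, then a new sequence.
p1 = next_request_params({}, now=1000, initial_interval=500, limit=2)
s1 = update_state({}, p1, [{"date_create": 990}, {"date_create": 980}], "abc")
p2 = next_request_params(s1, now=1010, initial_interval=500, limit=2)
s2 = update_state(s1, p2, [{"date_create": 970}], "")
p3 = next_request_params(s2, now=2000, initial_interval=500, limit=2)
print(p1, p2, p3, sep="\n")
```

If the process stops after `s1` is stored, the next call to `next_request_params` produces the same cursor request as `p2`, which is the resume behaviour described above.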
Done
The
Agree. For the pagination requests , Just the
As I read it in the API docs, Cursor-paginated methods accept cursor and limit parameters. So it should be safe to consider the
I would love to as well. But the
|
Initial request would be Next request in pagination would be Resuming broken pagination request would look like
If pagination is complete, then the next request after interval would be |
In this example, the values for `oldest` and `latest` should be swapped:
If pagination is complete, then the next request after interval would be
GET /audit/v1/logs?latest=<First-Event-Time-In-Last-Request>&limit=2&oldest=<now>
Functionality
In setting `url.params.oldest` and `url.params.latest`, it looks like the intention is to have a default value on the first request, another value for the start of a new sequence (for `latest` it's the same value), and an empty string for pages that aren't the first in a sequence. There are a couple of issues with that:
- Currently the empty string part will actually evaluate to `''` (two single quotes). The else branch could be removed, or the template changed to `[[- '' -]]` to get the empty string, however...
- If the `value:` part returns an empty string, it will use the default, even if it's not the initial request. I think it would actually be okay for both `oldest` and `latest`, but it doesn't match the plan. For requests with a cursor, if it's not resuming from interruption, the `oldest` and `latest` will be removed at 73-76, but that won't be done when resuming. This logic could be simpler and more correct.
There should be a value for `next_oldest_date` for when there were no events in a sequence. Ideally it'll be the `now` of the request, with some padding (e.g. 5 mins back).
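A minimal sketch of that fallback, assuming Unix-second timestamps and a hypothetical helper name; the 300-second padding mirrors the "5 mins back" suggestion:

```python
def next_oldest_date(first_page_events: list, request_now: int, padding_seconds: int = 300) -> int:
    """Choose the `oldest` value for the next pagination sequence."""
    if first_page_events:
        # Events arrive newest first, so the first event is the most recent.
        return first_page_events[0]["date_create"]
    # No events in this sequence: fall back to the request time, padded back.
    return request_now - padding_seconds


print(next_oldest_date([], request_now=10_000))
print(next_oldest_date([{"date_create": 9_500}, {"date_create": 9_000}], request_now=10_000))
```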
Potential improvements that don't change functionality
- It's probably fine to never set `url.params.latest`, or to always set it to `(now).Unix`.
- `cursor.last_cursor` could be called `cursor.next_cursor`. It came from the last request, but it was called `next_cursor` in the response data and it hasn't been used yet.
- For setting values in the `cursor:` section, `fail_on_template_error: true` isn't an option (see `cursor` docs).
- `pagination_finished` and `last_cursor` (or `next_cursor`) could be combined. If so, setting `ignore_empty_value: false` may be necessary.
- White space trimming with `-` is redundant on line 25, so I'd remove that. On lines 37, 39, 52, 54 it's not technically needed because the preceding and following lines will trim across the line boundaries, but I think it's better style to have it there.
Assumptions to validate
- An empty cursor parameter will be ignored.
- A request with a cursor will not provide a `next_cursor` if it's reached `oldest` from the initial request.
- Cursor values are unlikely to expire in minutes or hours (true if it contains an item ID, false if it's some kind of query reference or includes expiry information).
Functionality changes
Improvements
Fixed all these improvements
Test scenarios
Assumptions validated
|
e411220 to ca5e521
The last change seems good (only setting I think this is a working version, given the assumption that a cursor knows when to stop. I think that's still to be validated. The documentation you quoted makes it sound like it'll stop but consider this scenario:
Did the API decide there was "no further results to retrieve" after 200 days of data, or after the full 5 years of data? One non-functional point: if |
The API will create a
This might still be needed in case the |
As discussed, the assumption can be validated against the live API as follows:
- Set `initial_interval` to a shorter time duration than available historical data.
- Set `limit` to a smaller number of items than available historical data.
- Ensure that fetching of historical data is limited to `initial_interval`.
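Once events have been collected from such a test run, the last step could be checked with something like the sketch below. The helper name is hypothetical, and it assumes `date_create` holds Unix seconds and that `initial_interval` has been converted to seconds.

```python
def events_outside_interval(events: list, now: int, initial_interval_seconds: int) -> list:
    """Return fetched events older than the initial interval allows."""
    cutoff = now - initial_interval_seconds
    return [e for e in events if e["date_create"] < cutoff]


fetched = [{"date_create": 950}, {"date_create": 700}, {"date_create": 400}]
too_old = events_outside_interval(fetched, now=1000, initial_interval_seconds=500)
print(too_old)
```

An empty result would be consistent with the cursor stopping at `oldest`. Any entries returned would mean pagination ran past the requested window.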
Package slack - 1.10.1 containing this change is available at https://epr.elastic.co/search?package=slack |
* Fix oldest API query param handling
* Remove oldest and latest params in pagination request
* Fix the agent config
* Address pr comments
* Modify agent
* Fix agent config and added system test
* Remove default in next_oldest_date
What does this PR do?
This PR fixes the way the `oldest` query parameter is handled. According to the Slack API docs, when the `latest` and `oldest` query params are passed in the API request, the logs are returned in descending order, meaning most to least recent.

But the agent template sets `first_event.date_create` into the `oldest` param after the first request is made, which means that for the second request the `oldest` query param has the `date_create` of the most recent event. As a result, the old data is never retrieved.

This PR preserves the `oldest` parameter and passes the `date_create` into the `latest` param for the next request so that the events are pulled as per the `limit` config.

Checklist
`changelog.yml` file.
Related issues
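The ordering problem described under "What does this PR do?" can be reproduced with a toy model. This is only an illustration: `fetch` is a hypothetical stand-in for the audit logs endpoint, `latest` is treated as exclusive so the loop terminates, and the fixed variant assumes the `date_create` carried forward is that of the oldest event on the page.

```python
def fetch(logs, oldest, latest, limit):
    """Toy endpoint: events in [oldest, latest), newest first, at most `limit`."""
    window = [t for t in logs if oldest <= t < latest]
    return sorted(window, reverse=True)[:limit]


def collect_buggy(logs, oldest, latest, limit, rounds):
    """Old behaviour: move `oldest` up to the first (most recent) event."""
    seen = []
    for _ in range(rounds):
        page = fetch(logs, oldest, latest, limit)
        if not page:
            break
        seen.extend(page)
        oldest = page[0]  # the window now excludes all older events
    return seen


def collect_fixed(logs, oldest, latest, limit):
    """Fixed behaviour: keep `oldest`, move `latest` down page by page."""
    seen = []
    while True:
        page = fetch(logs, oldest, latest, limit)
        if not page:
            break
        seen.extend(page)
        latest = page[-1]  # oldest event on this page (assumption)
    return seen


logs = [10, 20, 30, 40, 50]
buggy = collect_buggy(logs, oldest=0, latest=60, limit=2, rounds=3)
fixed = collect_fixed(logs, oldest=0, latest=60, limit=2)
print(sorted(set(buggy)), fixed)
```

In the buggy variant the events at 10, 20 and 30 are never returned, which matches the "old data is never retrieved" symptom.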