@ezzamo ezzamo commented Oct 31, 2025

Proposed commit message

eset_protect: Fix 404 errors caused by stale Response-Id caching in device_task stream

The ESET Automation API sometimes replies with 202 Accepted and a Response-Id header, 
which must be sent with the next poll request. Once the task completes and the API returns 
200 OK, that Response-Id becomes invalid. 

In version 1.10.0, the integration introduced caching of this Response-Id but did not 
clear it after a successful 200 OK response. As a result, subsequent requests continued 
to send the stale header, leading to recurring 404 Not Found errors and degraded input 
health.

This patch clears the cached Response-Id after each 200 OK, ensuring the header is only 
included when valid.

WHAT

This PR updates the CEL program for the device_task data stream so that it handles the Response-Id header returned by the ESET API correctly. A 202 Accepted response means the result is still being prepared; it carries a Response-Id header that must be sent with the next request. Once the task completes and the API responds with 200 OK, that ID is no longer needed and becomes invalid. Previously, the integration kept sending the outdated ID on subsequent requests even after a successful response, which caused the API to return 404 Not Found and degraded input health. With this patch, the cached Response-Id is cleared after every 200 OK, so the header is sent only while it is valid.

WHY

In production we got this error (screenshot not preserved):

Version 1.10.0 of eset_protect introduced a bug: the integration did not clear its cached state when transitioning from a 202 Accepted response to a 200 OK response. The error lies in this snippet, added by the author of this commit:

        (
          resp.StatusCode == 202 ?
            state.with({
              "events": [{"message":"retry"}],
              "want_more": true,
              "cursor": {
                "response_id": resp.Header["Response-Id"][0]
              }
            })
          :
// rest of the code

This change added a new cursor field containing the value of the Response-Id header, which is cached between requests.
However, the cached Response-Id was never cleared after a successful 200 OK response. As a result, subsequent polls kept sending a stale Response-Id header, which the ESET API could not resolve, so it returned 404 Not Found.

TL;DR

GET /v1/device_tasks in ESET Automation sometimes replies with 202 Accepted and a Response-Id header; clients must send that header on the next poll to retrieve the cached result. Once the service returns 200 OK, the header must no longer be sent.

In our production environment, applying these changes and manually installing the patched package removed the recurring 404 Not Found errors, and the input has remained healthy for an extended period.
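
The polling contract summarized above can be modelled in a few lines of Python. This is a minimal sketch, not the integration's CEL code: FakeDeviceTasksAPI, poll, run, and the clear_on_200 flag are hypothetical names, and the fake server assumes every fresh query is answered with one 202 before the 200, whereas the real API only does so sometimes.

```python
class FakeDeviceTasksAPI:
    """Hypothetical stand-in for the endpoint: one 202 per fresh query, then 200."""

    def __init__(self):
        self.pending = None  # Response-Id of the result currently being prepared
        self.counter = 0

    def get(self, headers):
        rid = headers.get("Response-Id")
        if rid is None:
            # Fresh query: the result is not ready, hand out a new Response-Id.
            self.counter += 1
            self.pending = f"rid-{self.counter}"
            return 202, {"Response-Id": self.pending}
        if rid == self.pending:
            # Valid follow-up poll: deliver the result and invalidate the ID.
            self.pending = None
            return 200, {}
        # Unknown or already consumed Response-Id.
        return 404, {}


def poll(api, cursor, clear_on_200):
    """Perform one poll and return (status, cursor for the next poll)."""
    headers = {}
    if cursor.get("response_id") is not None:
        headers["Response-Id"] = cursor["response_id"]
    status, resp_headers = api.get(headers)
    if status == 202:
        return status, {"response_id": resp_headers["Response-Id"]}
    if status == 200 and clear_on_200:
        return status, {}  # fixed behaviour: drop the consumed ID
    return status, cursor  # buggy behaviour: the stale ID is kept


def run(clear_on_200, polls=4):
    """Run several polls against a fresh fake API and collect the status codes."""
    api = FakeDeviceTasksAPI()
    cursor = {}
    statuses = []
    for _ in range(polls):
        status, cursor = poll(api, cursor, clear_on_200)
        statuses.append(status)
    return statuses
```

With clear_on_200 set to False the model reproduces the reported pattern (a 200 followed by repeated 404s); with True, each 200 is followed by a fresh query.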

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

@ezzamo ezzamo requested a review from a team as a code owner October 31, 2025 11:09
cla-checker-service bot commented Oct 31, 2025

💚 CLA has been signed

ezzamo commented Oct 31, 2025

CLA has been signed

@andrewkroh andrewkroh added the labels "Integration:eset_protect" (ESET PROTECT) and "Team:Security-Service Integrations" (Security Service Integrations team [elastic/security-service-integrations]) Oct 31, 2025
@elasticmachine

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@brijesh-elastic brijesh-elastic added the "bugfix" label (Pull request that fixes a bug issue) Oct 31, 2025
ezzamo commented Oct 31, 2025

Changed the version from 1.12.0 to 1.11.1, as it's a bugfix.

-  "page_size": state.page_size
+  "page_size": state.page_size,
+  "cursor": {
+    "response_id": null
Contributor

This is not necessary; the cursor object will replace the existing object, blatting out the response_id field.

Contributor Author

Thanks for the review. A quick clarification on why this change is necessary.

The “blatting out” behaviour only takes effect when the CEL program returns a cursor object in that evaluation. In the current device_task code path for 200 OK the program does not return any cursor field at all, so the previous cursor is retained:

resp.StatusCode == 200 
? 
{
  "events": ...,
  "page_token": ...,
  "want_more": ...,
  "page_size": state.page_size
}
: 
...

The 202 Accepted does set a cursor with a Response-Id:

state.with({
  "cursor": { "response_id": resp.Header["Response-Id"][0] }
})

Because the 200 OK branch omitted cursor, the existing cursor object (including the cached response_id) persisted across polls. Subsequent requests kept sending that stale header and the backend returned 404 Not Found.

This PR fixes that by making the 200 OK path return:

"cursor": { "response_id": null }

which explicitly clears the stale value. This mirrors the detection data stream (see integrations/packages/eset_protect/data_stream/detection/agent/stream/cel.yml.hbs), where the 200 OK already resets the response_id. After applying this change the 404s disappeared and the input has remained healthy.
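
The retention behaviour described in this comment can be modelled as follows. This is a sketch of the rule as stated here (a returned cursor replaces the stored one, an omitted cursor leaves it in place), not the input's actual implementation; next_cursor and the sample values are hypothetical.

```python
def next_cursor(stored, result):
    """Cursor carried into the next poll, per the rule described above."""
    # A returned cursor replaces the stored one wholesale.
    if "cursor" in result:
        return result["cursor"]
    # No cursor in the result: the stored cursor is retained.
    return stored


stored = {"response_id": "abc"}  # hypothetical value cached after a 202

# 200 OK branch before the fix: no cursor key, so the stale ID survives.
before_fix = next_cursor(stored, {"events": [], "want_more": False})

# 200 OK branch with this change: an explicit cursor overwrites the stale ID.
after_fix = next_cursor(
    stored,
    {"events": [], "want_more": False, "cursor": {"response_id": None}},
)
```

In this model before_fix still carries the cached ID, while after_fix holds only a null response_id.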

Contributor

@efd6 efd6 Nov 2, 2025

I understand the reason that it's there, but the expression is more complex than is required to achieve that goal.

You have this:

-- src.cel --
state.with(
	{
		"cursor": {"response_id": null},
	}
)
-- data.json --
{
	"cursor": {
		"response_id": 42
	}
}
-- out.json --
{
	"cursor": {
		"response_id": null
	}
}

with the goal that there be no usable value in state.cursor.response_id, but this also works:

-- src.cel --
state.with(
	{
		"cursor": {},
	}
)
-- data.json --
{
	"cursor": {
		"response_id": 42
	}
}
-- out.json --
{
	"cursor": {}
}
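
For readers less familiar with the CEL extension used here, both variants can be approximated in Python. This sketch assumes state.with behaves as a shallow merge in which top-level keys of the argument replace those of the receiver, which is consistent with the two examples above; with_ and has_usable_response_id are hypothetical helpers.

```python
def with_(state, update):
    """Shallow merge: top-level keys in update replace those in state."""
    return {**state, **update}


def has_usable_response_id(state):
    """True only if the cursor holds a non-null response_id."""
    return state.get("cursor", {}).get("response_id") is not None


data = {"cursor": {"response_id": 42}}

# Variant 1: overwrite the field with an explicit null.
explicit_null = with_(data, {"cursor": {"response_id": None}})

# Variant 2: replace the whole cursor with an empty object.
empty_cursor = with_(data, {"cursor": {}})
```

Either form leaves no usable ID behind; the empty object simply avoids carrying a null field.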

Contributor Author

Thanks for the suggestion. Agreed "cursor": {} is sufficient here and simpler.

In device_task the 200 OK branch doesn’t need to preserve any other cursor fields, so I will update the code from:

"page_size": state.page_size,
"cursor": { "response_id": null }

to:

"page_size": state.page_size,
"cursor": {}

As you suggest, this should fully reset the cursor, so no stale Response-Id is sent on subsequent polls.

Contributor Author

Change applied in: 52c7d53

@ezzamo ezzamo requested a review from efd6 November 2, 2025 21:36
@ezzamo ezzamo changed the title from "ESET Protect: clear cached response-id on 200 OK to prevent 404 on /v…" to "ESET_Protect: clear cursor on 200 OK using empty cursor object" Nov 2, 2025
efd6 commented Nov 2, 2025

/test

@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@elasticmachine

💚 Build Succeeded

@efd6 efd6 merged commit af695f4 into elastic:main Nov 2, 2025
7 checks passed
@elastic-vault-github-plugin-prod

Package eset_protect - 1.11.1 containing this change is available at https://epr.elastic.co/package/eset_protect/1.11.1/

tehbooom pushed a commit to tehbooom/integrations that referenced this pull request Nov 19, 2025
…evice_task stream (elastic#15831)
