
[O365] - Prune stale listing URLs from cursor to recover from long outages #18510

Merged
ShourieG merged 6 commits into elastic:main from ShourieG:bugfix/o365_#7097 on Apr 20, 2026


@ShourieG
Contributor

Type of change

  • Bug

Proposed commit message

packages/o365: prune stale listing URLs from cursor to recover from long outages

A listing URL queued in cursor.todo_links before a prolonged outage
can age past the Management API's 7-day startTime window while parked
there, producing a permanent AF20055 retry loop: the listing-error
branch cannot evict it because its map-shaped error return is treated
by the CEL runtime as a freeze signal that discards cursor mutations.

Add a pre-flight filter that drops listing URLs whose embedded
startTime is older than now - maximum_age, tolerating both startTime
and starttime spellings (NextPageUri responses have been observed
with the lowercase form). Includes a system test that queues a
NextPageUri with a 2019 startTime against a 1h maximum_age and
asserts it is not followed.
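The pre-flight filter described above can be illustrated outside CEL. The following is a minimal Python sketch under stated assumptions, not the package's implementation (that is the CEL program in the data stream's cel.yml.hbs): the function name `prune_stale_links`, passing `now` in explicitly, and representing `maximum_age` as a `timedelta` are hypothetical choices made for the sketch, and the `TENANT` URLs in the usage below are placeholders.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse, parse_qs


def prune_stale_links(todo_links, maximum_age, now):
    """Hypothetical helper: drop listing URLs whose embedded startTime is
    older than now - maximum_age (a timedelta). Links without a startTime
    are kept so the API itself can decide."""
    oldest_allowed_start = now - maximum_age
    kept = []
    for link in todo_links:
        query = parse_qs(urlparse(link).query)
        # Generated links use "startTime", but NextPageUri values have been
        # seen with "starttime"; query keys are case-sensitive, so check both.
        values = query.get("startTime") or query.get("starttime") or []
        if not values:
            kept.append(link)
            continue
        # fromisoformat only accepts a trailing "Z" from Python 3.11 on.
        start = datetime.fromisoformat(values[0].replace("Z", "+00:00"))
        if start.tzinfo is None:
            # Assumption for this sketch: offset-less timestamps are UTC.
            start = start.replace(tzinfo=timezone.utc)
        if start >= oldest_allowed_start:
            kept.append(link)
    return kept
```

Passing `now` as an argument rather than reading the clock inside the function keeps the sketch deterministic. For example, with `now` fixed and `maximum_age=timedelta(hours=1)`, a link carrying a 2019 `startTime` is dropped under either key spelling. This mirrors the scenario the system test exercises.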

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

…'s accepted window so that prolonged periods without successful fetches (e.g. auth failures during a credential rotation, or agent downtime longer than seven days) self-heal on resume instead of looping on AF20055.
@ShourieG ShourieG self-assigned this Apr 18, 2026
@ShourieG ShourieG requested a review from a team as a code owner April 18, 2026 06:39
@ShourieG ShourieG added Integration:o365 Microsoft Office 365 bugfix Pull request that fixes a bug issue Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations] Team:Security-Cloud Services Security Data Experience - Cloud Services team [elastic/cloud-services] labels Apr 18, 2026
@elasticmachine

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@ShourieG ShourieG requested review from chrisberkhout, efd6 and kcreddy and removed request for kcreddy April 18, 2026 06:43
@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report, comment with /test benchmark fullreport.

Comment on lines +176 to +213
).as(state,
  // Drop listing links whose startTime has aged past the API's accepted
  // window. The Office 365 Management API rejects requests whose startTime
  // is more than 7 days in the past with AF20055; a link that was queued
  // before a prolonged period without successful fetches (agent downtime,
  // auth failures, credential rotation gaps, etc.) may have aged past
  // that boundary while parked in the cursor. The listing-error branch
  // cannot evict it reliably because its map-shaped error return is
  // treated by the runtime as a freeze signal that discards cursor
  // mutations, so the only way out is to prune up front. Once pruned,
  // the bottom branch generates a fresh link clamped to the valid
  // window on the next iteration.
  (now() - duration(state.base.maximum_age)).as(oldest_allowed_start,
    state.with(
      {
        "cursor": state.cursor.with(
          {
            "todo_links": state.cursor.todo_links.filter(link,
              link.parse_url().RawQuery.parse_query().as(q,
                // Listing URLs generated by this program use "startTime", but
                // NextPageUri values returned by the API have been seen with
                // the lowercase "starttime" variant (see #15325), so accept
                // either. If no startTime can be extracted, keep the link
                // and let the API itself decide.
                q.?startTime.orValue(q.?starttime.orValue([])).as(st,
                  (st.size() == 0) ?
                    true
                  :
                    st[0].parse_time(time_layout.RFC3339) >= oldest_allowed_start
                )
              )
            ),
          }
        ),
      }
    )
  )
).as(state,
Contributor


This could be folded into the previous block.

state.base.content_types.split(",").map(t, t.trim_space()).as(configured_types,
  state.with(
    {
      "cursor": state.?cursor.orValue({}).with(
        {
          // Listing links have the content type embedded in the URL query
          // string, so links for removed types would produce API errors
          // if not pruned here.
          //
          // Links whose startTime has aged past maximum_age are also
          // pruned. The Office 365 Management API rejects requests whose
          // startTime is more than 7 days in the past with AF20055; a
          // link queued before a prolonged period without successful
          // fetches (agent downtime, auth failures, credential rotation
          // gaps, etc.) may have aged past that boundary while parked in
          // the cursor. The listing-error branch cannot evict it reliably
          // because its map-shaped error return is treated by the runtime
          // as a freeze signal that discards cursor mutations. Once
          // pruned, the bottom branch generates a fresh link clamped to
          // the valid window on the next iteration.
          "todo_links": state.?cursor.todo_links.orValue([]).filter(link,
            configured_types.exists(ct, link.to_lower().contains("contenttype=" + ct.to_lower()))
            &&
            link.parse_url().RawQuery.parse_query().as(q,
              // Listing URLs generated by this program use "startTime",
              // but NextPageUri values returned by the API have been seen
              // with the lowercase "starttime" variant (see #15325), so
              // accept either. If no startTime can be extracted, keep the
              // link and let the API itself decide.
              q.?startTime.orValue(q.?starttime.orValue([])).as(st,
                (st.size() == 0) ?
                  true
                :
                  now() - st[0].parse_time(time_layout.RFC3339) <= duration(state.base.maximum_age)
              )
            )
          ),
          "todo_content": state.?cursor.todo_content.orValue([]).filter(item,
            configured_types.exists(ct, ct.to_lower() == item.?contentType.orValue("").to_lower())
          ),
        }
      ),
    }
  )
)
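The folded block applies two pruning rules to `todo_links` (configured content type and startTime age) and one to `todo_content`. A minimal Python sketch of the same logic follows, for illustration only. `prune_cursor` and `start_time_of` are hypothetical names, and the cursor is modeled as a plain dict. As in the CEL version, the content-type check on links is a case-insensitive substring match on `contenttype=<type>` rather than a full parse of the query.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse, parse_qs


def start_time_of(link):
    """Return the link's startTime (either spelling) as an aware datetime, or None."""
    query = parse_qs(urlparse(link).query)
    values = query.get("startTime") or query.get("starttime") or []
    if not values:
        return None
    start = datetime.fromisoformat(values[0].replace("Z", "+00:00"))
    # Assumption for this sketch: offset-less timestamps are UTC.
    return start if start.tzinfo else start.replace(tzinfo=timezone.utc)


def prune_cursor(cursor, content_types, maximum_age, now):
    """Hypothetical mirror of the folded CEL block: drop queued links and
    content items for unconfigured types, and links older than maximum_age."""
    configured = [t.strip().lower() for t in content_types.split(",")]

    def keep_link(link):
        lowered = link.lower()
        # Substring match on "contenttype=<type>", as in the CEL filter.
        if not any("contenttype=" + ct in lowered for ct in configured):
            return False
        start = start_time_of(link)
        # No startTime: keep the link and let the API decide.
        return start is None or now - start <= maximum_age

    return {
        **cursor,  # other cursor fields are preserved
        "todo_links": [link for link in cursor.get("todo_links", []) if keep_link(link)],
        "todo_content": [
            item
            for item in cursor.get("todo_content", [])
            if item.get("contentType", "").lower() in configured
        ],
    }
```

The age test `now - start <= maximum_age` matches the reviewer's `now() - ... <= duration(state.base.maximum_age)`. It is the same inequality as the earlier block's `start >= now() - maximum_age`, only rearranged. Copying the cursor with `{**cursor, ...}` stands in for `state.?cursor.orValue({}).with(...)`, which overlays the two pruned lists onto the existing cursor.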

Contributor Author


@efd6, made the suggested changes

Comment thread packages/o365/data_stream/audit/agent/stream/cel.yml.hbs Outdated
@elasticmachine

💚 Build Succeeded


cc @ShourieG

@ShourieG ShourieG merged commit e69c276 into elastic:main Apr 20, 2026
9 checks passed
@elastic-vault-github-plugin-prod

Package o365 - 3.8.1 containing this change is available at https://epr.elastic.co/package/o365/3.8.1/
