fix: skip broken cross-origin iframe hierarchy reads #3289

Open
schmayterling wants to merge 7 commits into mobile-dev-inc:main from schmayterling:may/fix-cdp-crash

@schmayterling

Proposed changes

fixes a Maestro web hierarchy crash when cross-origin iframe enrichment hits transient browser state or malformed JS/CDP results.

see:

```json
"status" : "FAILED",
  "error" : {
    "stackTrace" : [ {
      "methodName" : "fetchCrossOriginIframeContent",
      "fileName" : "CdpWebDriver.kt",
      "lineNumber" : 696,
      "className" : "maestro.drivers.CdpWebDriver"
    }, {
      "methodName" : "injectCrossOriginIframes",
      "fileName" : "CdpWebDriver.kt",
      "lineNumber" : 655,
      "className" : "maestro.drivers.CdpWebDriver"
    }, {
      "methodName" : "contentDescriptor",
      "fileName" : "CdpWebDriver.kt",
      "lineNumber" : 276,
      "className" : "maestro.drivers.CdpWebDriver"
    }, {
      "methodName" : "from-8JJjmZI",
      "fileName" : "ViewHierarchy.kt",
      "lineNumber" : 29,
      "className" : "maestro.ViewHierarchy$Companion"
    } ],
    "message" : "null cannot be cast to non-null type kotlin.collections.Map<kotlin.String, kotlin.Any>"
  }
```

from my understanding this usually happens after navigation, popup return, or iframe reload. the parent page is still usable, but a third-party iframe can be stale/closed/still loading/detached, or return null/malformed hierarchy data.

previously, that path used unsafe casts into Map<String, Any> and numeric fields, so one broken iframe could crash the whole hierarchy build.
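To make the failure mode concrete, here is a minimal Kotlin sketch contrasting an unchecked cast with a checked one. The names `unsafeRead` and `safeRead` are hypothetical, not Maestro code. Casting a `null` JS/CDP result to `Map<String, Any>` throws the same `null cannot be cast to non-null type ...` error shown in the stack trace above, while `as? Map<*, *>` degrades to `null` and leaves the decision to the caller.

```kotlin
// Unchecked: throws when the JS/CDP result is null (e.g. a stale or detached
// iframe) or is not a map at all.
@Suppress("UNCHECKED_CAST")
fun unsafeRead(payload: Any?): Map<String, Any> = payload as Map<String, Any>

// Checked: a null or non-map payload becomes null instead of throwing.
// Entries with non-String keys are dropped rather than trusted.
fun safeRead(payload: Any?): Map<String, Any?>? {
    val map = payload as? Map<*, *> ?: return null
    return map.entries
        .filter { it.key is String }
        .associate { (key, value) -> (key as String) to value }
}
```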

this patch keeps the hierarchy path best-effort; see the flow below:

  1. parse JS/CDP hierarchy payloads with checked maps/lists instead of unsafe casts
  2. reset Selenium to the default content before hierarchy and iframe reads
  3. preserve readable cross-origin iframe content
  4. keep the parent iframe node with empty children when iframe content is bad
  5. treat stale frame, missing frame, and closed window errors as transient during iframe enrichment
  6. keep the parent page hierarchy usable when a third-party iframe is broken
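Steps 3 to 5 can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `DomNode`, `enrichIframe`, and the `isTransient` predicate are hypothetical stand-ins, and `fetchContent` represents the cross-origin iframe read, which may return `null` or throw.

```kotlin
// Hypothetical, simplified hierarchy node.
data class DomNode(
    val attributes: Map<String, String>,
    val children: List<DomNode>,
)

// Best-effort enrichment of one iframe node.
// isTransient decides which failures are tolerated; anything else propagates.
fun enrichIframe(
    iframe: DomNode,
    isTransient: (Throwable) -> Boolean,
    fetchContent: () -> DomNode?,
): DomNode {
    val content = try {
        fetchContent()
    } catch (e: RuntimeException) {
        if (!isTransient(e)) throw e
        null // tolerated failure: treat as "no readable content"
    }
    // Readable content is injected as the iframe's only child; otherwise the
    // iframe node itself is kept with an empty children list.
    return iframe.copy(children = listOfNotNull(content))
}
```

Because non-transient exceptions are rethrown, only the failures the predicate recognizes are downgraded to an empty iframe node; anything unexpected still fails the hierarchy build.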

Testing

unit tests should cover:

  • null iframe content does not crash hierarchy construction
  • malformed iframe content does not crash hierarchy construction
  • transient iframe/window errors are swallowed and traversal continues
  • readable iframe content is still preserved and injected
  • malformed child nodes are skipped while valid siblings are preserved
  • missing iframe viewport params skip iframe content fetch
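A sketch of how the "malformed child nodes are skipped while valid siblings are preserved" case could behave, assuming the raw payload is a nested `Map`/`List` structure with `attributes` and `children` keys. The key names and the `ParsedNode`/`parseNode` names are assumptions for illustration.

```kotlin
// Hypothetical parsed node; "attributes"/"children" key names are assumptions.
data class ParsedNode(
    val attributes: Map<String, String>,
    val children: List<ParsedNode>,
)

// Returns null for anything that is not a well-formed node, so callers can
// drop it. Children are parsed recursively; malformed ones are skipped
// without discarding their valid siblings.
fun parseNode(raw: Any?): ParsedNode? {
    val map = raw as? Map<*, *> ?: return null
    val rawAttributes = map["attributes"] as? Map<*, *> ?: return null
    val attributes = rawAttributes.entries
        .mapNotNull { (key, value) ->
            if (key is String && value != null) key to value.toString() else null
        }
        .toMap()
    // A missing or non-list "children" value is treated as no children.
    val children = (map["children"] as? List<*>)
        .orEmpty()
        .mapNotNull { parseNode(it) }
    return ParsedNode(attributes, children)
}
```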

ran the following:

```shell
./gradlew :maestro-client:test --tests maestro.drivers.WebHierarchyTest -x :maestro-ios-driver:buildIosDriver
./gradlew :maestro-client:check -x :maestro-ios-driver:buildIosDriver
./gradlew detektMain
```

I don't think an e2e test is needed, because there is already one for iframes

Issues fixed

@schmayterling changed the title from "May/fix cdp crash" to "fix: skip broken cross-origin iframe hierarchy reads" on May 14, 2026
@Fishbowler
Contributor

> keep the parent iframe node with empty children when iframe content is bad

I'm concerned that this would lead to flaky tests. If I run an assertVisible and the DOM contains the parent content but skips the iframes, isn't there a chance my step will fail, but succeed if I rerun it?

Should we be handling failures in iframes with retry logic and eventual errors rather than silently eliding parts of the DOM?

@schmayterling
Author

> keep the parent iframe node with empty children when iframe content is bad

> I'm concerned that this would lead to flaky tests. If I run an assertVisible and the DOM contains the parent content but skips the iframes, isn't there a chance my step will fail, but succeed if I rerun it?

yeah, I think that can still happen for iframe-targeted assertions if the iframe is temporarily unavailable during that hierarchy snapshot. but the goal of this patch is to avoid crashing the entire hierarchy build when a third-party iframe is stale/broken/loading, and to keep the rest of the page usable instead of failing with null cannot be cast ..., which is itself directly causing flaky tests

> Should we be handling failures in iframes with retry logic and eventual errors rather than silently eliding parts of the DOM?

I think retry/wait behavior for iframe content would need to happen at the assertion/command layer rather than inside hierarchy parsing itself. would be happy to hear your thoughts on this
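One way to read the retry/wait suggestion is a command-level helper that re-takes the hierarchy snapshot until the expected content is present and otherwise fails with an explicit error, rather than passing or failing on a single snapshot. A minimal sketch, assuming a JVM target; `waitUntil` and its parameters are hypothetical, and a real implementation would hook into Maestro's existing command timeouts instead.

```kotlin
// Hypothetical command-level helper: re-take a snapshot until isReady holds,
// and fail with an explicit error instead of deciding on a single,
// possibly transient, snapshot.
fun <T> waitUntil(
    attempts: Int,
    delayMs: Long,
    snapshot: () -> T,
    isReady: (T) -> Boolean,
): T {
    require(attempts > 0) { "attempts must be positive" }
    var last = snapshot()
    repeat(attempts - 1) {
        if (isReady(last)) return last
        Thread.sleep(delayMs) // JVM-only; a real command would use its own timeout
        last = snapshot()
    }
    if (isReady(last)) return last
    throw IllegalStateException("condition not met after $attempts attempts")
}
```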

@proksh (Contributor) left a comment

Thanks @schmayterling

  • When iframe content is bad we keep the parent node but drop its children. An assertVisible targeting iframe content could fail on one snapshot and pass on rerun → flaky. We have stopped accepting pull requests that result in flaky tests, so that would stop us from approving this PR. Do you have a solution that can be consistent?
  • Rebase onto main and revert the parts that #3314 or #3315 already cover.
  • Suggest rebasing on top of #3314 + #3315 once they merge so the diff shrinks to just the parsing/enrichment fix.

```diff
 // ChromeDriver can execute scripts inside cross-origin iframes via switchTo().frame()
-driver.switchTo().frame(iframeElement)
+return try {
+    driver.switchTo().frame(iframeElement)
```

This reordering - moving switchTo().frame() inside the try so a StaleElementReferenceException degrades to null - is exactly #3314. Let's revert it here and rebase on main to avoid two copies of the same fix.
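The reordering can be illustrated with a small interface standing in for Selenium's `driver.switchTo().frame(...)`, `driver.switchTo().defaultContent()`, and `JavascriptExecutor.executeScript(...)`. `FrameDriver` and `readIframe` are hypothetical names, and for brevity the sketch catches any `RuntimeException`, whereas the comment only mentions `StaleElementReferenceException`.

```kotlin
// Hypothetical minimal surface of the Selenium calls involved.
interface FrameDriver {
    fun switchToFrame(frame: Any)           // driver.switchTo().frame(element)
    fun switchToDefaultContent()            // driver.switchTo().defaultContent()
    fun executeScript(script: String): Any? // JavascriptExecutor.executeScript
}

// Switching inside the try means a stale or detached frame degrades to null
// instead of escaping; finally always returns to the top-level document.
fun readIframe(driver: FrameDriver, frame: Any, script: String): Any? =
    try {
        driver.switchToFrame(frame)
        driver.executeScript(script)
    } catch (e: RuntimeException) {
        null
    } finally {
        driver.switchToDefaultContent()
    }
```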

```kotlin
val params = WebHierarchy.parseIframeViewportParams(paramsJson, iframeSrc) ?: return null

// ChromeDriver can execute scripts inside cross-origin iframes via switchTo().frame()
driver.switchTo().frame(iframeElement)
```

Same as above. This reordering - moving switchTo().frame() inside the try so a StaleElementReferenceException degrades to null - is exactly #3314. Let's revert it here and rebase on main to avoid two copies of the same fix.

```kotlin
return TreeNode(attributes = attributes, children = children)
}

fun isTransientBrowserContextError(e: Throwable): Boolean {
```

This classifies Selenium exceptions as transient - which is what #3315's SeleniumExceptionTranslator is for, at the Driver boundary. Once #3315 lands, callers won't see raw org.openqa.selenium.* here. Suggest removing this and the logWebHierarchyFailure / logIframeFetchFailure helpers that depend on it, and letting #3315 own the classification.
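A rough sketch of classification at the driver boundary. The actual `SeleniumExceptionTranslator` from #3315 isn't shown in this thread, so `DriverFailure`, `translate`, and the set of names treated as transient are all assumptions. Exceptions are matched by class name so the example compiles without Selenium on the classpath, and the cause chain is walked so wrapped exceptions are still recognized.

```kotlin
// Hypothetical driver-level failures; #3315's real types may differ.
sealed class DriverFailure(message: String, cause: Throwable) :
    RuntimeException(message, cause) {
    class Transient(cause: Throwable) :
        DriverFailure("transient browser context error", cause)
    class Fatal(cause: Throwable) : DriverFailure("driver failure", cause)
}

// Simple class names assumed to be transient. Matched by name so this
// compiles without Selenium; the real classes live in org.openqa.selenium.
private val transientNames = setOf(
    "StaleElementReferenceException",
    "NoSuchFrameException",
    "NoSuchWindowException",
)

// Strip the package and any enclosing-class prefix from the JVM class name.
private fun Throwable.shortName(): String =
    javaClass.name.substringAfterLast('.').substringAfterLast('$')

fun translate(e: Throwable): DriverFailure {
    var current: Throwable? = e
    var depth = 0
    // Walk the cause chain, bounded in case of a cyclic chain.
    while (current != null && depth < 16) {
        if (current.shortName() in transientNames) return DriverFailure.Transient(e)
        current = current.cause
        depth++
    }
    return DriverFailure.Fatal(e)
}
```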

```kotlin
}
}

private fun logWebHierarchyFailure(message: String, e: Exception) {
```

Tied to the classification in WebHierarchy - drop with isTransientBrowserContextError. After #3315, transient/infra failures don't reach this logging path.

```kotlin
val rawMap = contentDesc as Map<String, Any>
val enrichedMap = injectCrossOriginIframes(rawMap)
val root = parseDomAsTreeNodes(enrichedMap)
val rawMap = WebHierarchy.normalizeDomNode(contentDesc, "web content description")
```

This is the part to keep. Replacing contentDesc as Map<String, Any> (and the casts inside the old parseDomAsTreeNodes) with checked parsing is what stops the null cannot be cast to non-null type Map crash. Neither #3314 nor #3315 touches this.
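The same checked style applies to the numeric fields the PR description mentions, and to the `?: return null` on `parseIframeViewportParams` in the earlier review snippet. A hedged sketch, assuming the viewport arrives as an already-decoded map: `Viewport`, `parseViewport`, and the field names are hypothetical. Selenium's `executeScript` returns JavaScript numbers as `Long` or `Double`, so reading them through `Number` avoids a cast failure whichever one comes back.

```kotlin
// Hypothetical viewport shape; field names are assumptions.
data class Viewport(val x: Int, val y: Int, val width: Int, val height: Int)

// Read through Number instead of casting to Int; missing or non-numeric
// fields yield null.
private fun Map<*, *>.intField(key: String): Int? = (this[key] as? Number)?.toInt()

// Any missing field means "skip the iframe content fetch".
fun parseViewport(raw: Any?): Viewport? {
    val map = raw as? Map<*, *> ?: return null
    return Viewport(
        x = map.intField("x") ?: return null,
        y = map.intField("y") ?: return null,
        width = map.intField("width") ?: return null,
        height = map.intField("height") ?: return null,
    )
}
```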


```kotlin
val options = driver.capabilities.getCapability("goog:chromeOptions") as Map<String, Any>
val debuggerAddress = options["debuggerAddress"] as String
val options = driver.capabilities.getCapability("goog:chromeOptions") as? Map<*, *>
```

Nice defensive cast - keep. Independent of the iframe work.
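For reference, a sketch of the defensive read. Selenium's `Capabilities.getCapability(String)` returns an untyped `Object`, and ChromeDriver reports the DevTools address under the `debuggerAddress` key of `goog:chromeOptions`. The helper names are hypothetical; they would be called with the value of `driver.capabilities.getCapability("goog:chromeOptions")` as in the snippet above, and the second one shows turning a missing value into a descriptive error instead of a `ClassCastException`.

```kotlin
// Returns the DevTools address from a goog:chromeOptions capability value,
// or null if the value is missing or not shaped as expected.
fun debuggerAddressOrNull(chromeOptions: Any?): String? {
    val options = chromeOptions as? Map<*, *> ?: return null
    return options["debuggerAddress"] as? String
}

// Turns a missing address into an explicit, descriptive error.
fun requireDebuggerAddress(chromeOptions: Any?): String =
    debuggerAddressOrNull(chromeOptions)
        ?: error("goog:chromeOptions did not contain a string debuggerAddress")
```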

```kotlin
webScreenRecorder?.onWindowChange()
}
}
if (windowHandles.isNotEmpty()) {
```

This window-handle fallback in detectWindowChange is a meaningful behavior change beyond the iframe-crash fix. Is it required for #3271, or is it scope that could be its own PR? Easier to review/revert if split out. (Also overlaps conceptually with #3315's NoSuchSessionException / unreachable handling — worth checking they don't fight.)
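To make the scope question concrete, a window-handle fallback usually amounts to something like the sketch below. The PR's actual logic isn't visible in this hunk, so `resolveWindow` and its policy are hypothetical: keep the current window if it is still open, otherwise fall back to the first remaining handle on the assumption that it is the original main window. The WebDriver spec leaves the order of window handles arbitrary, which is one reason this kind of fallback deserves its own review.

```kotlin
// Hypothetical policy for choosing a window after a popup closes:
// keep the current window if it is still open, otherwise fall back to the
// first remaining handle (assumed, not guaranteed, to be the main window).
fun resolveWindow(current: String?, handles: List<String>): String? = when {
    handles.isEmpty() -> null // nothing left: the caller should surface an error
    current != null && current in handles -> current
    else -> handles.first()
}
```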

```kotlin
return when (query) {
is OnDeviceElementQuery.Css -> queryCss(query)
else -> super.queryOnDeviceElements(query)
if (query is OnDeviceElementQuery.Css) {
```

Why were these changes needed?

```kotlin
}
}

return mapOf(
```

This is the silent-elision behavior @Fishbowler flagged: when iframe content is bad we keep the parent node but drop its children. An assertVisible targeting iframe content could fail on one snapshot and pass on rerun → flaky.

You already noted retry/wait belongs at the command/assertion layer - agreed, but we would prefer a solution that is consistent. We have stopped merging changes that produce flaky results.


Development

Successfully merging this pull request may close these issues.

Looped flow fails when "coming back" to the main window.

3 participants