Support for custom selector engines (Querying nested shadow roots) #5405

Open
Georgegriff opened this issue Feb 10, 2020 · 8 comments

@Georgegriff Georgegriff commented Feb 10, 2020

What is this?
Other tools let you register a custom engine for selecting elements in the DOM.

(example from playwright)

  await playwright.selectors.register(selectorEngine, { name: 'shadow' })
  await page.waitForSelector('shadow=#no-downloads span', {timeout: 3000})

I've seen #382, but I'm not sure it offers a mechanism that is this simple for end users. I'm happy to be wrong on this; I couldn't find any examples.

Playwright offers this. https://github.com/microsoft/playwright/blob/master/docs/api.md#selectorsregisterenginefunction-args

In the Selenium world, it's things like this: https://chercher.tech/java/custom-locators-selenium-webdriver

Why is this useful?
Traditionally, these custom locators have been used to select elements via XPath or jQuery selectors.

Why do I want this?
I maintain https://github.com/Georgegriff/query-selector-shadow-dom, which lets users write CSS selectors that automatically pierce web component shadow roots. It was trivial to add Playwright support so that my library can be used as a selector engine.
Like so:

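const { selectorEngine } = require("query-selector-shadow-dom/plugins/playwright");
const playwright = require('playwright');

// Wrapped in an async function because top-level await is not available with require().
(async () => {
  // Register the shadow-piercing engine under the name "shadow".
  await playwright.selectors.register(selectorEngine, { name: 'shadow' });

  const browser = await playwright.chromium.launch({ headless: false });
  const context = await browser.newContext({ viewport: null });
  const page = await context.newPage();

  await page.goto('chrome://downloads');

  // The "shadow=" prefix routes the selector through the registered engine.
  await page.waitForSelector('shadow=#no-downloads span', { timeout: 3000 });
  // Pause for three seconds before closing.
  await new Promise(resolve => setTimeout(resolve, 3000));

  await page.close();
  await context.close();
  await browser.close();
})();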

Registering this engine lets users call click, waitForSelector, and anything else that accepts a selector, with my library automatically piercing shadow roots.

How is my engine implemented in playwright?

Playwright defines this interface: https://github.com/microsoft/playwright/blob/master/docs/api.md#selectorsregisterenginefunction-args which accepts a Function/String.
They take your function, pass it into the browser context, and handle the rest for you, so you can use the engine for click etc.

My library implements this interface: https://github.com/Georgegriff/query-selector-shadow-dom/blob/master/plugins/playwright/index.js (it does this a little strangely, using a string, because I need to inject my library into the function scope).

Collaborator

@paullewis paullewis commented Apr 30, 2020

Hey, so we now have an experimental API that lets you do this (on master). Roughly it looks like this:

// Custom query handler.
const doesNotHaveClass = 
    (element, className) => element.querySelectorAll(`:not(.${className})`);

// Register it.
puppeteer.__experimental_registerCustomQueryHandler('doesNotHaveClass', 
    doesNotHaveClass);

// Prepend queries with the name of the handler.
const elements = await page.$$('doesNotHaveClass/foo');

We have the following APIs:

__experimental_registerCustomQueryHandler(name: string, queryHandler: QueryHandler): void;
__experimental_unregisterCustomQueryHandler(name: string): void;
__experimental_customQueryHandlers(): Map<string, QueryHandler>;
__experimental_clearQueryHandlers(): void;

Where QueryHandler is a relatively generic term for a function of the form:

(element, selector) => Element | Element[] | NodeListOf<Element>

Other points of note:

  • It's experimental (so it might change!)
  • You can only register one function for a given name, and names can only contain [a-zA-Z]
  • You can register or invoke a function that doesn't follow the expectations of $, $$, $$eval, or waitFor{Selector}, and you will either get unexpected outcomes or an error. In short we don't check that the query handler you invoke is going to do what you expect :)
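
The two registration rules above (one function per name, names limited to [a-zA-Z]) can be illustrated with a small stand-alone registry. This is only a sketch and not Puppeteer's implementation: createRegistry and its method names are hypothetical and only loosely mirror the experimental API list above.

```javascript
// Hypothetical stand-alone registry; not Puppeteer's actual code.
function createRegistry() {
  const handlers = new Map();
  return {
    register(name, queryHandler) {
      // Rule from the thread: names can only contain [a-zA-Z].
      if (!/^[a-zA-Z]+$/.test(name)) {
        throw new Error(`Invalid handler name: "${name}"`);
      }
      // Rule from the thread: only one function per name.
      if (handlers.has(name)) {
        throw new Error(`A handler named "${name}" is already registered`);
      }
      handlers.set(name, queryHandler);
    },
    unregister(name) {
      handlers.delete(name);
    },
    // Return a copy so callers cannot mutate the registry directly.
    handlers() {
      return new Map(handlers);
    },
    clear() {
      handlers.clear();
    },
  };
}

const registry = createRegistry();
// Registering only stores the function; it is not called here.
registry.register('doesNotHaveClass', (element, className) =>
  element.querySelectorAll(`:not(.${className})`));
```

In this sketch, registering 'doesNotHaveClass' a second time, or registering a name such as 'my-handler', throws an error.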

@zewa666 zewa666 commented May 6, 2020

This does sound super interesting. For the official Aurelia i18n plugin we're making use of custom attributes, by default named t, which contain a string pointing to a resource. By default the plugin translates the textContent of the target element.

An example would be something like this:

<span t="title">Title</span>

Additionally, instead of the default textContent target, the user can override the target with this syntax:

<span t="[alt]title">Title</span>

So ideally we could forward multiple params to the custom query handler (something along these lines):

// Custom query handler.
const i18n = 
    (element, key, target) => element.querySelectorAll(`[t^='${target ? '[' + target + ']' : ''}${key}']`);

// Register it.
puppeteer.__experimental_registerCustomQueryHandler('i18n', i18n);

const elementsWithoutTarget = await page.$$('i18n/title');
const elementsWithTarget = await page.$$('i18n/title/alt');

There are many more opportunities, but essentially having multiple params available would open up many more use cases.

Member

@mathiasbynens mathiasbynens commented May 7, 2020

@zewa666 These use cases can already be addressed by splitting the selector string into parts in your custom query handler:

const myQueryHandler = (element, selector) => {
  const params = splitIntoParameters(selector);
  return doStuff(element, params);
};

The reason we want to avoid handling this for you (in this case by splitting on /) is because every use case might have different requirements, and we don't want to limit the possibilities. In XPath for example / already has special meaning, so if your custom handler uses XPath you likely wouldn't want to use / to mean anything special in your selector aside from the custom myQueryHandler/ prefix.
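
As a sketch of that approach for the i18n case discussed above, the handler can receive the whole string after the prefix and parse it itself. The name splitIntoParameters comes from the snippet above, but its body here is an assumption, as are the "[target]key" input format and the buildAttributeSelector helper.

```javascript
// Hypothetical parser for selectors such as "title" or "[alt]title".
function splitIntoParameters(selector) {
  const match = /^(?:\[([^\]]+)\])?(.+)$/.exec(selector);
  if (!match) {
    throw new Error(`Cannot parse selector: "${selector}"`);
  }
  // match[1] is undefined when no [target] prefix is present.
  return { target: match[1], key: match[2] };
}

// Builds the CSS attribute selector used in the i18n example.
function buildAttributeSelector({ target, key }) {
  const prefix = target ? `[${target}]` : '';
  return `[t^='${prefix}${key}']`;
}

// The handler itself needs a DOM element, so it only does real work in a page.
const i18n = (element, selector) =>
  element.querySelectorAll(buildAttributeSelector(splitIntoParameters(selector)));
```

The handler shared later in this thread inlines its library into the function body, which suggests helpers like these may need to be defined inside the handler when used with Puppeteer; treat that as an assumption to verify.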


@zewa666 zewa666 commented May 7, 2020

Oh ok yeah that makes sense. I thought the / was a kind of convention (like xpath) and you had to distinguish params by that. In this case my call can be simply

await page.$$('i18n/[alt]title');

Thanks for the clarification

Member

@mathiasbynens mathiasbynens commented May 7, 2020

@zewa666 Exactly! The i18n/ is the prefix that tells Puppeteer which custom query handler to use. The string '[alt]title', i.e. the rest of the selector, is then passed to your custom handler, where you can process it however you like.
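
The prefix convention can be sketched as a split on the first slash. This is an illustration only: parseCustomSelector and the knownNames set are hypothetical, and Puppeteer's real parsing may differ.

```javascript
// Hypothetical helper; splits "name/rest" only when name is a known handler.
function parseCustomSelector(selector, knownNames) {
  const index = selector.indexOf('/');
  const name = index > 0 ? selector.slice(0, index) : null;
  if (name === null || !knownNames.has(name)) {
    // No recognised prefix: treat the whole string as a plain selector.
    return { name: null, rest: selector };
  }
  // Everything after the first slash is passed through untouched.
  return { name, rest: selector.slice(index + 1) };
}

const knownNames = new Set(['i18n']);
const withTarget = parseCustomSelector('i18n/[alt]title', knownNames);
// withTarget is { name: 'i18n', rest: '[alt]title' }
const withSlashes = parseCustomSelector('i18n/title/alt', knownNames);
// withSlashes is { name: 'i18n', rest: 'title/alt' }
```

Only the first slash is treated as special here, so any later slashes reach the handler unchanged, in line with the point about XPath made earlier in the thread.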

Member

@mathiasbynens mathiasbynens commented May 7, 2020

@paullewis How would you register a custom query handler that supports both $ and $$?

Author

@Georgegriff Georgegriff commented May 8, 2020

I've just run into this problem as well. With a shadow DOM based query handler I've got $ working fine, but it falls over when I return an array of elements (my handler is attached at the bottom of this comment). Ideally I could support both, and Puppeteer could choose which to call depending on whether the user wanted $ or $$.

Playwright handles this by registering two functions, query and queryAll. What if the query handlers supported returning something like this:

const queryHandler = () => {
  return {
    query: (element, selector) => { /* return a single element */ },
    queryAll: (element, selector) => { /* return multiple elements */ },
  };
};

Then puppeteer could call the appropriate function, or alternatively allow something like this:

        puppeteer.__experimental_registerCustomQueryHandler('shadow', queryHandler, queryAllHandler);

Here the second function is intended to return arrays or NodeLists.
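
To make the proposal concrete, here is a sketch of how a caller might dispatch to query or queryAll. This is not an existing Puppeteer API: runHandler, shadowHandler, and the fake root object are hypothetical and exist only so the example runs outside a browser.

```javascript
// Handler object in the shape proposed above.
const shadowHandler = {
  query: (element, selector) => element.find(selector)[0] || null,
  queryAll: (element, selector) => element.find(selector),
};

// Hypothetical dispatcher: `all` is true for $$-style calls.
function runHandler(handler, element, selector, all) {
  return all
    ? Array.from(handler.queryAll(element, selector))
    : handler.query(element, selector);
}

// Stand-in for a DOM element so the sketch runs in plain Node.js.
const root = {
  children: [
    { tag: 'span', id: 1 },
    { tag: 'div', id: 2 },
    { tag: 'span', id: 3 },
  ],
  find(tag) {
    return this.children.filter((child) => child.tag === tag);
  },
};

const first = runHandler(shadowHandler, root, 'span', false); // one object or null
const every = runHandler(shadowHandler, root, 'span', true);  // always an array
```

With this shape the caller, not the handler author, decides which of the two functions runs.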

The handler below works for $ but not for $$.
(based on query-selector-shadow-dom)

My library has two functions:

return querySelectorShadowDom.querySelectorDeep(selector, element);
and
return querySelectorShadowDom.querySelectorAllDeep(selector, element);
They mirror querySelector/querySelectorAll, but they automatically pierce nested shadow roots.

Using the first function works, but using the second doesn't, because the second returns an array.

const queryHandler = (element, selector) => {

    // minified library guff to inject my code into the handler, scroll past
    var querySelectorShadowDom=function(e){"use strict";function o(e,a,c){var t=c.querySelector(e);return document.head.createShadowRoot||document.head.attachShadow?!a&&t?t:h(e,",").reduce(function(e,t){if(!a&&e)return e;var l,d,i,o=h(t.replace(/^\s+/g,"").replace(/\s*([>+~]+)\s*/g,"$1")," ").filter(function(e){return!!e}),r=o.length-1,n=function(t,e){void 0===t&&(t=null);var n=[],o=function e(t){for(var o,r=0;o=t[r];++r)n.push(o),o.shadowRoot&&e(o.shadowRoot.querySelectorAll("*"))};e.shadowRoot&&o(e.shadowRoot.querySelectorAll("*"));return o(e.querySelectorAll("*")),t?n.filter(function(e){return e.matches(t)}):n}(o[r],c),u=(l=o,d=r,i=c,function(e){for(var t,o,r,n=d,u=e,a=!1;u&&(r=u).nodeType!==Node.DOCUMENT_FRAGMENT_NODE&&r.nodeType!==Node.DOCUMENT_NODE;){var c=u.matches(l[n]);if(c&&0===n){a=!0;break}c&&n--,t=i,o=u.parentNode,u=o&&o.host&&11===o.nodeType?o.host:o===t?null:o}return a});return a?e=e.concat(n.filter(u)):(e=n.find(u))||null},a?[]:null):a?c.querySelectorAll(e):t}function h(e,o){return e.match(/\\?.|^$/g).reduce(function(e,t){return'"'!==t||e.sQuote?"'"!==t||e.quote?e.quote||e.sQuote||t!==o?e.a[e.a.length-1]+=t:e.a.push(""):(e.sQuote^=1,e.a[e.a.length-1]+=t):(e.quote^=1,e.a[e.a.length-1]+=t),e},{a:[""]}).a}return e.querySelectorAllDeep=function(e,t){return void 0===t&&(t=document),o(e,!0,t)},e.querySelectorDeep=function(e,t){return void 0===t&&(t=document),o(e,!1,t)},e}({});
    
    // my lib communicating with the new puppeteer api
    return querySelectorShadowDom.querySelectorDeep(selector, element);
}

Incidentally, Playwright recently made their inbuilt css selector automagically work for shadow dom: https://github.com/microsoft/playwright/releases/tag/v0.14.0

Collaborator

@paullewis paullewis commented May 11, 2020

> @paullewis How would you register a custom query handler that supports both $ and $$?

You would just register it and use it wherever you like. We don't make any distinction in the code about which function the handler is for. That said, the implementation of the handler will either be doing something that expects a single element or a collection of elements, which will naturally lend itself to either $ or $$, but that's not enforced so much as it's about what the function returns.
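
Since the distinction rests on what the function returns, one option is to make the return shape explicit inside the handler. The helpers below are a hypothetical sketch (toSingle, toList, and isCollection are not part of Puppeteer); they coerce any of the three documented return types into either one element or an array.

```javascript
// True for arrays and, in a browser, for NodeList instances.
function isCollection(value) {
  if (Array.isArray(value)) return true;
  // NodeList only exists in a browser context.
  return typeof NodeList !== 'undefined' && value instanceof NodeList;
}

// For $-style handlers: return the first match or null.
function toSingle(result) {
  if (result == null) return null;
  if (!isCollection(result)) return result;
  return result.length > 0 ? result[0] : null;
}

// For $$-style handlers: always return a real array.
function toList(result) {
  if (result == null) return [];
  return isCollection(result) ? Array.from(result) : [result];
}
```

A handler intended for $ could wrap its result in toSingle, and one intended for $$ could wrap it in toList, so that each always returns the shape its caller expects.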
