Seperate head and body in gatherer? #77

gauntface · 2016-03-24T14:05:55Z

At the moment viewport audit is running a regex over the entire html. It would be useful for this (and future audits) to seperate the body and the head as inputs. The other alternative would be to have these as gathers, but that feels like it wouldn't be as flexible.

Thoughts on having input.html + input.head + html body?

paullewis · 2016-03-24T14:44:26Z

I'm wondering if we need to change the HTML gatherer to a DOM gatherer. In the Node code we can use JSDOM / cheerio, in the extension use DOMParser. Then you have something you can query more effectively with querySelector('head meta[name="viewport"]').

I suspect many of the audits that will want to query DOM will want to do so that way rather than via RegExps.

gauntface · 2016-03-24T14:48:58Z

sounds good. Will take a look at doing this instead.

paullewis · 2016-03-24T14:51:16Z

Yeah def give it a go. I do some branching in the manifest gatherer for similar reasons (fetch in extension vs request in Node). This would likely be the same thing. The viewport is the only audit reliant on this atm, so it's impact should be minimal, and the viewport audit has tests so ... 👍

paullewis · 2016-03-24T14:52:31Z

BTW if you use DOMParser you need to know it doesn't throw if it can't parse. It will make you a doc with a <parseerror> node .... yah I know.

gauntface · 2016-03-24T18:03:55Z

I've started looking at one way to do this dom like selector in the audits and ended up doing the following - https://github.com/GoogleChrome/lighthouse/blob/html-theme-color-audit/gatherers/html.js

I essentially give the audits access to the window OR use JSDom for node environments. (In the chrome extension I have to ignore the jsdom module because babel / browserify doesn't like something in it).

@paulirish seems not too happy about JSDom being added, so simple solutions would be:

Use basic regex to extract the head and body content of the HTML string and share that with audits (Nice and simple but still requires regex for extraction, which isn't that big of a deal).
Use chrome driver to access to extract the head and body and share those strings with the audits (Slight variation on the above)
Pass the driver to the audits to request anything they want (Feels O.T.T and may not be overly reliable if the audits are async)
.....

paulirish · 2016-03-25T00:25:09Z

First, on ChromeDriver... It's doing 90% of its browser interaction by via the Chrome Debugging Protocol that we're using directly. So it's pretty much the jQuery to our DOM. And in this case, we're too cool for jQuery. (That and we're unable to do everything we want via ChromeDriver and they dont work side by side)

Overall, we should always trust the browser as source of truth and take care when serializing/reparsing over in Node that we maintain 100% fidelity with what the browser sees.

I think we have a few options :

gather/theme-color-meta.js - read DOM: We use DOM.querySelector and DOM.getAttributes etc, just like in gathers/manifest.js
gather/theme-color-meta.js ask JS: We use Runtime.evaluate and get the answers we need from clientside JS. (actually what's in the old viewport gather and audit)
gather/DOM-snapshot.js read entire DOM: We dump some representation of the full DOM tree and reuse that across tests.

Talking with @paullewis and both of us favor 1 or 2, assuming they are done as a specialist gather and definitely not in the audit. They are fairly one-off gathers and rely on a live runtime, but they should be straightforward and less brittle than... 3.. which would be very testable, but would also need ergonomics sugar to handle walking, querySelectoring, etc.

@gauntface do you have a preference of 1 or 2?

gauntface · 2016-03-25T00:43:58Z

I'm probably more inclined for option 2 but curious why it was changed from
this?

On Fri, 25 Mar 2016, 00:25 Paul Irish, notifications@github.com wrote:

I read in the opensource handbook that you should upset everyone with a
big rejection of a new contributor's first PR.

Or not. [image: 😕] Again, sorry for being terse and stuff.

I feed this thread more flowers for good luck: [image: 🌷] [image:

💐] [image: 🌹]

First, on ChromeDriver... It's doing 90% of its browser interaction by via
the Chrome Debugging Protocol
https://chromedevtools.github.io/debugger-protocol-viewer/ that we're
using directly. So it's pretty much the jQuery to our DOM. And in this
case, we're too cool for jQuery. (That and we're unable to do everything we

want via ChromeDriver and they dont work side by side)

Overall, we should always trust the browser as source of truth and take
care when serializing/reparsing over in Node that we maintain 100% fidelity
with what the browser sees.

I think we have a few options :

gather/theme-color-meta.js - read DOM: We use DOM.querySelector and
DOM.getAttributes etc, just like in gathers/manifest.js

lighthouse/gatherers/manifest.js

Lines 49 to 53 in af3ead2

return driver.sendCommand('DOM.getDocument')

.then(result => result.root.nodeId)

.then(nodeId => driver.sendCommand('DOM.querySelector', {

nodeId: nodeId,

selector: 'link[rel="manifest"]'

gather/theme-color-meta.js ask JS: We use Runtime.evaluate and get
the answers we need from clientside JS. (actually what's in the old
viewport gather

lighthouse/audits/viewport-meta-tag/gather.js

Lines 21 to 33 in 432323b

// must be defined as a standalone function expression to be stringified successfully.

function findMetaViewport() {

return document.head.querySelector('meta[name="viewport"]');

}

var ViewportHasMetaContent = {

run: function(driver, url) {

return driver.evaluateFunction(url, findMetaViewport)

.then(viewportElement => ({viewportElement}));

}

};

module.exports = ViewportHasMetaContent;

and audit
https://github.com/GoogleChrome/lighthouse/blob/432323b1035e0449bb1ebdc20f0ef53923365a3c/audits/viewport-meta-tag/audit.js
)

gather/DOM-snapshot.js read entire DOM: We dump some representation
of the full DOM tree and reuse that across tests.

Talking with @paullewis https://github.com/paullewis and both of us
favor 1 or 2, assuming they are done as a specialist gather and definitely
not in the audit. They are fairly one-off gathers and rely on a live
runtime, but they should be straightforward and less brittle than... 3..
which would be very testable, but would also need ergonomics sugar to
handle walking, querySelectoring, etc.

@gauntface https://github.com/gauntface do you have a preference of 1
or 2?

—
You are receiving this because you were mentioned.

Reply to this email directly or view it on GitHub
#77 (comment)

paulirish · 2016-03-28T21:07:20Z

I'm probably more inclined for option 2 but curious why it was changed from this?

Not sure. Lewis did it a while back.

Closing this as it seems to be mostly taken care of with the DOM based approach in the PR now. I'm adding some documentation so that there's justification for this sorta thing.

paulirish · 2016-03-28T21:08:16Z

Oh, on Chrome Driver. It might make sense to use ChromeDriver for tests, but I have a feeling like it'd break since we require the protocol in either case (although devtools eng is working on fixing that bug).

brendankenny · 2016-03-28T21:24:08Z

note that we already have the latest (continuous build) Chrome downloading and starting up (listening on port 9222) on Travis, so anyone can start writing tests against a live browser today

paullewis mentioned this issue Mar 24, 2016

Adding audit for theme-color in HTML #79

Closed

paulirish closed this as completed Mar 28, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Seperate head and body in gatherer? #77

Seperate head and body in gatherer? #77

gauntface commented Mar 24, 2016

paullewis commented Mar 24, 2016

gauntface commented Mar 24, 2016

paullewis commented Mar 24, 2016

paullewis commented Mar 24, 2016

gauntface commented Mar 24, 2016

paulirish commented Mar 25, 2016

gauntface commented Mar 25, 2016

💐] [image: 🌹]

want via ChromeDriver and they dont work side by side)

paulirish commented Mar 28, 2016

paulirish commented Mar 28, 2016

brendankenny commented Mar 28, 2016

Seperate head and body in gatherer? #77

Seperate head and body in gatherer? #77

Comments

gauntface commented Mar 24, 2016

paullewis commented Mar 24, 2016

gauntface commented Mar 24, 2016

paullewis commented Mar 24, 2016

paullewis commented Mar 24, 2016

gauntface commented Mar 24, 2016

paulirish commented Mar 25, 2016

gauntface commented Mar 25, 2016

💐] [image: 🌹]

want via ChromeDriver and they dont work side by side)

paulirish commented Mar 28, 2016

paulirish commented Mar 28, 2016

brendankenny commented Mar 28, 2016