Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Brainstorming: what would a modernized jsdom API look like? #1139

Closed
domenic opened this issue May 29, 2015 · 34 comments
Closed

Brainstorming: what would a modernized jsdom API look like? #1139

domenic opened this issue May 29, 2015 · 34 comments

Comments

@domenic
Copy link
Member

domenic commented May 29, 2015

I feel like the current API is not the most user-friendly. Here are a few of my complaints:

  • The lack of clear default export (require("jsdom") doing something useful)
  • The split between env and jsdom, plus the under-loved jQuerify cousin
  • jQuerify is a lot less convenient now that we require an explicit jQuery URL
  • The util methods (createVirtualConsole etc.) are kind of sprouting everywhere
  • env's magic is often a bit too magical, especially the auto-detection of strings
  • The document-centric interface is awkward; often you want a window
  • The different default settings between env and jsdom is confusing
  • The "features" configuration is a bit crazy

I think as a prerequisite to any redesign, we'd need to fix:

If we got those down, here are some broad ideas on what I could imagine:

  • A new type of class, Jsdom, which has stuff like:
    • A window getter
    • virtualConsole and cookieJar getters (which would just return the one you passed in, if you did)
    • A serialize() method
    • Other meta stuff? Maybe promises for when everything loads?
  • A new set of top-level APIs:
    • jsdom: takes a HTML string and returns a Jsdom
    • jsdom.fromUrl: takes a URL and returns a Promise
    • jsdom.fromFile: takes a filename and returns a Promise
    • Each should take args in the form (relevantString, commonOptions, typeSpecificOptions)
    • commonOption defaults are all the same
    • current features stuff is rolled up into commonOptions (or possibly into resourceLoader?)

Alternately I was considering a "builder" class instead of a Jsdom class, which would allow things like .insertScript() to get back that jsdom.env/jsdom.jQueryify functionality, and other nicer methods... might be worth sketching that out too. We could put a promise-returning insertScript on Jsdom.prototype though perhaps.

@Joris-van-der-Wel
Copy link
Member

What about streams:

fs.openAsync('foo.html', 'r')
.then(function(fds)
{
    return jsdom.fromStream(fd);
})
.then(function(document)
{
});
express.get('/stuff', function(request, response, next)
{
    stuffDocument().then(function(document)
    {
        response.set('Content-Type', 'text/html; charset=utf-8');
        return jsdom.serializeAsStream(document).pipe(response);
    }).catch(next);
});

@Joris-van-der-Wel
Copy link
Member

Would also have to think about what constitutes as resolving the promise when parsing html:

  1. After the document & window object become available?
  2. After the XML schema / DTD has been fetched?
  3. After the entire html has been parsed? (but: a <script src="http://slowserver.foo"> will block because it might execute document.write)
  4. After everything with <script async> has been loaded?
  5. After all scripts have been executed?
  6. images, css, iframes, etc

I see a couple of choices:

  1. Give a document as soon as possible, let the user attach event listeners
  2. The user configures what to fetch and parse, wait until all of that is done
  3. The user configures what to fetch and parse, but it also configures when the promise should be resolved

@domenic
Copy link
Member Author

domenic commented May 31, 2015

What about streams:

I was thinking of that. But, that's really an entirely new feature request: adding streaming parsing support (parse5 already supports that, but hooking it up seems difficult) and adding streaming serialization support (need parse5 to add support).

Would also have to think about what constitutes as resolving the promise when parsing html:

As soon as possible, I'd think, to match the raw jsdom(string) API. I'd ideally like to add promises to the Jsdom instance for events like parsingFinished (including <script src> and document.write) and others.

@Joris-van-der-Wel
Copy link
Member

The jQuery support should perhaps be moved to a separate package? Something like:

builder = builder.plugin(require('jquery-jsdom'));

This is also motivated by the decreasing relevance of jQuery, DOM is getting better. (And most jquery plugins are horrible ;) )

@Joris-van-der-Wel
Copy link
Member

No more default fetch & parse options would be great. I keep having to remind myself to disable this when using jsdom.jsdom()

@domenic
Copy link
Member Author

domenic commented May 31, 2015

My thought was to remove the jQuery support since

var jsdom = require("jsdom");
var window = jsdom.jsdom().defaultView;

jsdom.jQueryify(window, "http://code.jquery.com/jquery-2.1.1.js", function () {
  window.$("body").append('<div class="testing">Hello World, It works</div>');
});

is not much nicer than

var jsdom = require("jsdom");

jsdom.env(undefined, ["http://code.jquery.com/jquery-2.1.1.js"], function () {
  window.$("body").append('<div class="testing">Hello World, It works</div>');
});

@domenic
Copy link
Member Author

domenic commented May 31, 2015

Maybe it would be good to add a method like

Jsdom.prototype.insertScript = function (src) {
  const window = this.window;
  return new Promise(function (resolve, reject) {
    const script = window.document.createElement("script");
    script.src = src;
    script.addEventListener("load", function () { resolve(); });
    script.addEventListener("error", function () { reject(new Error("Failed to load script " + src)); });

    window.document.body.appendChild(script);
  });
};

to make the jQueryify use case into

const jsdom = require("jsdom");

const dom = jsdom();
dom.insertScript("http://code.jquery.com/jquery-2.1.1.js").then(function () {
  dom.window.$("body").append('<div class="testing">Hello World, It works</div>');
});

@Joris-van-der-Wel
Copy link
Member

All one-off "events" (document object is available, all scripts loaded, all html parsed, etc) could be promises on the Jsdom object:

var jsdom = require('jsdom).parseFile('foo.html', {});
jsdom.documentAvailable.then(..);
jsdom.scriptsLoaded.then(...);

Some events that occur multiple times might be interesting:

var jsdom = require('jsdom).parseFile('foo.html', {});
jsdom.on('console', function(level, message){ ... });
jsdom.on('raise', function(level, message){ ... });
jsdom.on('scriptError', function(err){ ... });
jsdom.on('fetch', function(type, url){ ... });

@domenic
Copy link
Member Author

domenic commented May 31, 2015

I'd really like to preserve a synchronous cheerio-esque way of just getting a window/document ASAP.

@Joris-van-der-Wel
Copy link
Member

Maybe it would be good to add a method like .. to make the jQueryify use case into

I can also imagine a use case where you would want to inject a script before any other script executes.

The downside about using promises for the loading phases is that they are fired async. E.g. if I want to inject a script at the correct time, a promise will be too late.

@Joris-van-der-Wel
Copy link
Member

I'd really like to preserve a synchronous cheerio-esque way of just getting a window/document ASAP.

Me too. That is actually my primary use case in jsdom.

The point at which a document is available does not have to be a promise. So iit would just be empty initially if you are loading stuff from an URL.

@domenic
Copy link
Member Author

domenic commented May 31, 2015

The point at which a document is available does not have to be a promise. So iit would just be empty initially if you are loading stuff from an URL.

I really dislike that kind of bimodal API. I'd rather Jsdom objects always represent situations where both the window and document are available.

@Joris-van-der-Wel
Copy link
Member

I really dislike that kind of bimodal API. I'd rather Jsdom objects always represent situations where both the window and document are available.

That is what i meant

@domenic
Copy link
Member Author

domenic commented May 31, 2015

Right, so in that case having parseFile return a Jsdom, instead of a Promise<Jsdom>, is not OK.

@inikulin
Copy link
Contributor

inikulin commented Jun 1, 2015

@domenic parse5 don't support streaming at the moment (related discussion: inikulin/parse5#26). But it supports parser suspension on document.write' which can be treated as a special case of streaming behavior. Stream chunks may have unfinished trailing tokens which is quite tricky to handle without performance regression.

@Joris-van-der-Wel
Copy link
Member

Not just parsing a stream is useful. Writing to a stream is also useful, this should be a lot easier to implement. parse5 simply has a bunch of this.html += ... in its serializer

@inikulin
Copy link
Contributor

inikulin commented Jun 1, 2015

@Joris-van-der-Wel You are right, streaming serializer implementation is not a big deal.

@domenic
Copy link
Member Author

domenic commented Aug 5, 2015

Here are some examples of my current thinking. More are needed, especially showcasing the options. These are kind of insertScript-focused.

jsdom.fromUrl("https://iojs.org/dist/")
  .then(function (dom) {
    return dom.insertScript("http://code.jquery.com/jquery.js");
  })
  .then(function (dom) {
    const window = dom.window;
    console.log("there have been", window.$("a").length - 4, "io.js releases!");
  });
const dom = jsdom.fromHtml(`<p><a class="the-link" href="https://github.com/tmpvar/jsdom">jsdom!</a></p>`);

dom.insertScript("http://code.jquery.com/jquery.js").then(function () {
  const window = dom.window;
  console.log("contents of a.the-link:", window.$("a.the-link").text());
});

@inikulin
Copy link
Contributor

inikulin commented Aug 5, 2015

dom object is confusing for me, since we already have jsdom

@domenic
Copy link
Member Author

domenic commented Aug 5, 2015

Yeah wasn't sure the best name for it. It's a container that has stuff like window, loaded promise, virtualConsole, cookieJar, ...

@inikulin
Copy link
Contributor

inikulin commented Aug 5, 2015

It will definetly make sense if we'll have ctor:

const dom = new JsDom(`<p><a class="the-link" href="https://github.com/tmpvar/jsdom">jsdom!</a></p>`);

dom.insertScript("http://code.jquery.com/jquery.js").then(function () {
  const window = dom.window;
  console.log("contents of a.the-link:", window.$("a.the-link").text());
});

And the static method for the URL:

JsDom.fromUrl("https://iojs.org/dist/")
  .then(function (dom) {
    return dom.insertScript("http://code.jquery.com/jquery.js");
  })
  .then(function (dom) {
    const window = dom.window;
    console.log("there have been", window.$("a").length - 4, "io.js releases!");
  });

However, it introduces inconsistency in the way we build DOM. But guys with the C#/Java background will be definetly happy 😄

@domenic
Copy link
Member Author

domenic commented Aug 5, 2015

I think the constructor is an improvement over my original proposal, which was just a set of factory functions that would return Jsdom instances. +1.

@knpwrs
Copy link
Contributor

knpwrs commented Nov 18, 2015

Alternately I was considering a "builder" class instead of a Jsdom class, which would allow things like .insertScript() to get back that jsdom.env/jsdom.jQueryify functionality, and other nicer methods... might be worth sketching that out too. We could put a promise-returning insertScript on Jsdom.prototype though perhaps.

I might be thinking of something different, but one pattern I've seen is for .then to actually act as the builder method, e.g., from Knex.js:

knex.table('users').pluck('id').then(ids => console.log(ids));

.pluck doesn't actually return a Promise. It returns an object that has a then method (a thenable, in promise terms) that builds the query, executes it, attaches its callback to a promise, and then returns that promise in order to enable chaining. The cool thing about this is that you can chain all day and then nothing happens until the first call to then. This becomes especially useful when queries become longer / more complicated:

function withUserName(queryBuilder, foreignKey) {
  queryBuilder
    .leftJoin('users', foreignKey, 'users.id')
    .select('users.name');
}

knex
  .table('articles')
  .select('title', 'body')
  .modify(withUserName, 'articles_user.id')
  .then(({name}) => console.log(name));

Another pattern I've seen, which would be more relevant to JSDom as a browser technology, is the internal promise queue à la webdriver:

driver.get('http://www.google.com/ncr');
driver.findElement(By.name('q')).sendKeys('webdriver');
driver.findElement(By.name('btnG')).click();
driver.wait(until.titleIs('webdriver - Google Search'), 1000);

Each of those methods returns a Promise (or maybe a thenable, not actually sure in this case) but each sequential promise only resolves after all of the previous promises have resolved. This allows the end user to not have to worry about Promise chaining (until ES7 makes it less verbose, anyway). In testing frameworks that support Promises, simply return the last statement:

it('should have the right title', function () {
  driver.get('http://www.google.com/ncr');
  driver.findElement(By.name('q')).sendKeys('webdriver');
  driver.findElement(By.name('btnG')).click();
  return driver.wait(until.titleIs('webdriver - Google Search'), 1000);
});

@snuggs
Copy link
Contributor

snuggs commented Nov 19, 2015

"It returns an object that has a then method (a thenable, in promise terms) that builds the query, executes it, attaches its callback to a promise, and then returns that promise in order to enable chaining. The cool thing about this is that you can chain all day and then nothing happens until the first call to then. "

The not so cool part of it is now the promise(able) also has unexpected side effects. An interface should work as the developer expects. I'd expect nothing to be executed when calling a then. Only when I call an end or finally would I expect an operation to run. I'd stay away from overriding the then interface of the the promise API. Sounds KEWL....and confusing. My $0.02 💭

@domenic
Copy link
Member Author

domenic commented May 29, 2016

First draft up at https://github.com/tmpvar/jsdom/blob/newapi/lib/newapi1.md. Upon re-reading this thread I seem to have forgotten the constructor idea, so I should probably switch to that before doing a release. But I might do a release with newapi available soon (while leaving oldapi as the default).

Comments welcome!

@arturparkhisenko
Copy link

arturparkhisenko commented May 29, 2016

@domenic Maybe it's unneeded.. but how about, something related to shadow dom, elements api js methods / and etc related to that.. i'm not an expert so.. sorry for dumb questions/ideas, sometimes i'm thinking about a problems like:

  • I have to save a references of some elements in global scope (to escape window.query(el).query(el))
  • How to access parent custom element (related to previous issue, example: customElement1 -> div -> customElement2)
  • CSP requirements / possible problems

@domenic
Copy link
Member Author

domenic commented May 29, 2016

That doesn't really seem to have anything to do with the new jsdom API, and is instead some kind of jsdom feature request that it support shadow DOM and custom elements and CSP?

@arturparkhisenko
Copy link

arturparkhisenko commented May 29, 2016

I think it's a good thing, and maybe w3c forgot to add something for that in usual dom api (yes you ​are completely right, and maybe after a year ​of working with Web Components i'm feeling like that.. and I ask myself why this transfer to the Web Components takes so much time and some features that we ​are still waiting to be implemented in shadow dom v1 and future releases). Of course i think someone ​should start it with propose / demo / polyfills.. . And thanks for this work! You're our superhero! An excellent example and a big motivation..

@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented May 30, 2016

What will be the new equivalent of the done callback? Is jsdom() synchronous for external resources loading and execution? Does it returns when all these are complete after parsing? Shoud the user set listeners for window load event?

@domenic
Copy link
Member Author

domenic commented May 30, 2016

The tentative idea is to provide a dom.loaded promise, as described at the bottom of the document. jsdom() is not synchronous for any external resources.

@domenic
Copy link
Member Author

domenic commented Jul 3, 2016

Two issues I am currently struggling with:

  • Should we do new JSOM(html, options), and JSDOM.fromURL(url, options), or jsdom(html, options) and jsdom.fromURL(url, options)? I kind of like being honest about the class, but on the other hand I value the brevity of just jsdom("some html here").

  • As per the document, my current idea for the new resource loader infrastructure (taking over features.DocumentFeatures and the current resourceLoader callback) is

    const dom = jsdom(``, {
      resources: {
        allowed: ["script#foo", "iframe", `link[rel="stylesheet"]`]
        fetch({ url, cookie, referrer, defaultFetch }) {
          // return a promise
        }
      }
    });

    However, this selector-based filtering approach does not work for XHR. I was thinking of moving to an approach based on https://fetch.spec.whatwg.org/#concept-request-destination, but that doesn't allow whitelisting or blacklisting specific elements or URLs.

    If we had all of Fetch implemented, maybe I would go with a function like allowed(request) { return true; }. But we don't. I guess the fetch() hook would look different there too.

    Maybe both allowed() and fetch() should take some kind of "jsdom request" object, which is something like { url, cookie, referrer, destination, defaultFetch }?

    Alternately we could just hack something where the specific string "xhr" allows XHRs? Meh, but that doesn't allow filtering based on URL...

@kapouer
Copy link
Contributor

kapouer commented Jul 12, 2016

@domenic you want api for thought ? https://webkit.org/blog/3476/content-blockers-first-look/

@TimothyGu
Copy link
Member

I think this should be closed after #1741?

@domenic
Copy link
Member Author

domenic commented May 11, 2017

Never did figure out what to do for resource loader, but yeah, let's close this :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants