Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make findAll non-recursive to avoid stack overflows #22

Merged
merged 4 commits into from
Apr 30, 2017

Conversation

sirwart
Copy link
Contributor

@sirwart sirwart commented Jun 17, 2016

No description provided.

@sirwart sirwart changed the title Make findAll non-recursive solution to avoid stack overflows Make findAll non-recursive to avoid stack overflows Jun 17, 2016
@fb55
Copy link
Owner

fb55 commented Jun 17, 2016

Could you provide a benchmark? It would be interesting to see which performance implications this has.

@fb55
Copy link
Owner

fb55 commented Jun 17, 2016

Also this is a switch from depth- to breadth-first search and therefore alters the result.

@dbuezas
Copy link

dbuezas commented Aug 27, 2016

I just tested 5 variations of findAll with 6000 selectors in a total of 21 pages using cheerio in node 6.3.0.
The original version is consistently around 40% slower than the others and concat alone is the culprit.

original: 5710ms (144%)
(sirwart) non recursive: 3964ms (100%)
(sirwart 2): 3720 (94%)
(1) non recursive + storing the result in an object to avoid pushes: 3942ms (99%)
(2) storing the result in an object to avoid pushes and concats: 3995ms (100%)
(3) using result.push.apply instead of concat: 4185ms (105%)

@sirwart 's solution with a tiny twist is the fastest, probably consumes the least memory and solves potential stack overflows.
Using result.push.apply is the smallest change.

Publishing this update translates in a direct 35% 1-(3964/5710) performance improvement in cheerio dom querying!

test functions:
(sirwart 2)

function findAll(test, rootElems){
    var result = [];
    var stack = [rootElems];
    while(stack.length){
        var elems = stack.pop();
        for(var i = 0, j = elems.length; i < j; i++){
            var elem = elems[i];
            if(isTag(elem)) {
                if(test(elem)) result.push(elem);
                stack.push(elem.children); // doesn't make much sense, but not checking for its length here makes it consistently a bit faster
            }
        }
    }
    return result;
}

(1)

function findAll(test, rootElems){
    var result = {};
    var k = 0;
    var stack = [rootElems];
    while(stack.length){
        var elems = stack.pop();
        for(var i = 0, j = elems.length; i < j; i++){
            if(!isTag(elems[i])) continue;
            if(test(elems[i])) result[k++] = elems[i];

            if(elems[i].children.length > 0){
                stack.push(elems[i].children);
            }
        }
    }
    var r = new Array(k)
    for(var i = 0; i < k; i++){
        r[i] = result[i]
    }
    return r
}

(2) storing the result in an object to avoid pushes and concats

function findAll(test, elems){
    var els = {};
    var count = 0;
    var inner = function(test, elems){
        for(var i = 0, j = elems.length; i < j; i++){
            if(!isTag(elems[i])) continue;
            if(test(elems[i])) els[count++] = elems[i];

            if(elems[i].children.length > 0){
                inner(test, elems[i].children);
            }
        }
    }
    inner(test, elems);
    var r = new Array(count);
    for(var i = 0; i < count; i++){
        r[i] = els[i];
    }
    return r[i];
}

(3)

function findAll(test, elems){
    var result = [];
    for(var i = 0, j = elems.length; i < j; i++){
        if(!isTag(elems[i])) continue;
        if(test(elems[i])) result.push(elems[i]);

        if(elems[i].children.length > 0){
            result.push.apply(result, findAll_3(test, elems[i].children));
        }
    }
    return result;
}

@amit777
Copy link

amit777 commented Apr 30, 2017

Hi, was wondering if this patch is planned to be merged? I'm not sure if the breadth vs depth-first change has any functional impact. I run across this same error occasionally in production and am glad I found this thread.

@fb55 fb55 merged commit f13cd75 into fb55:master Apr 30, 2017
@fb55
Copy link
Owner

fb55 commented Apr 30, 2017

@amit777 Correct ordering is pretty important here, as everything else would be pretty confusing. This PR still visited the last child first, but because of the gains outlined by @dbuezas (thanks for that!) it made sense to merge this and fix it up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants