New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

markRegExp locking up #55

Closed
Mottie opened this Issue Jul 18, 2016 · 22 comments

Comments

Projects
None yet
4 participants
@Mottie
Copy link
Collaborator

Mottie commented Jul 18, 2016

When using markRegExp, some "partial matching" regular expressions will cause the browser to lock up.

For example, using this demo for reference:

  • When a regular expression of /(lorem|ipsum)/gi is entered into the input; the result is as expected.
  • As you delete letters from the end of the second word... you'll get to this point: /(lorem|i)/gi.
  • Then, deleting the last letter i will give you this regular expression: /(lorem|)/gi which locks up the browser.

I've tested this from the console with the same result.

$(".context").unmark().markRegExp(/(lorem|)/gi);

It probably isn't in an infinite loop, it's just trying to match an enormous number of empty strings:

'Lorem ipsum dolor sit āmet'.match(/(lorem|)/gi)
// results: ["Lorem", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""]
@julmot

This comment has been minimized.

Copy link
Owner

julmot commented Jul 18, 2016

Hi @Mottie,

Thanks for submitting!

The plugin doesn't validate regular expressions or any of its option values.

  1. Why do you need to pass an empty alternative?
  2. What do you expect to happen?

Thank you in advance.

@julmot julmot self-assigned this Jul 18, 2016

@Mottie

This comment has been minimized.

Copy link
Collaborator

Mottie commented Jul 18, 2016

I have a filter that allows users to actively type in a regular expression into an input to search for results, so it's not that I am passing an empty alternative, it's the user typing.

With that particular regular expression, I would expect only the "lorem" to match and not lock up the browser.

@julmot

This comment has been minimized.

Copy link
Owner

julmot commented Jul 18, 2016

Alright, so what do you suggest? Checking if it's an empty value?
If we introduce these tests there may be others complaining that it doesn't check all kind of invalid regular expressions they encountered.

Why not adjust your testRegex to catch these case?

@Mottie

This comment has been minimized.

Copy link
Collaborator

Mottie commented Jul 18, 2016

Well, you really never know what a user is going to enter.

I've been testing alternatives and the "easiest" solution I found would be to add the following line of code

if (match[0] === "") { break; }

Inside the "wrapMatches" function while loop (updated demo):

{
    key: "wrapMatches",
    value: function wrapMatches(node, regex, custom, cb) {
        var hEl = !this.opt.element ? "mark" : this.opt.element,
            index = custom ? 0 : 2;
        var match = void 0;
        while ((match = regex.exec(node.textContent)) !== null) {
            if (match[0] === "") {
                break; // prevents browser lock-up
            }
            var pos = match.index;
            if (!custom) {
                pos += match[index - 1].length;
            }
            var startNode = node.splitText(pos);

            node = startNode.splitText(match[index].length);
            if (startNode.parentNode !== null) {
                var repl = document.createElement(hEl);
                repl.setAttribute("data-markjs", "true");
                if (this.opt.className) {
                    repl.setAttribute("class", this.opt.className);
                }
                repl.textContent = match[index];
                startNode.parentNode.replaceChild(repl, startNode);
                cb(repl);
            }
            regex.lastIndex = 0;
        }
    }
}

The result is that if the regular expression matches an empty string (e.g. /(lorem|)/gi), nothing gets marked, which I think is acceptable.

@julmot

This comment has been minimized.

Copy link
Owner

julmot commented Jul 19, 2016

@Mottie If you switch to v7.0.0+ you can use the filter option to pass this custom filter. It'll be called exactly at the position you've mentioned. Here is an example with your fiddle: https://jsfiddle.net/vpav6tL1/136/

However, as this doesn't validate the RegExp, the empty matches will still be iterated even if you ignore them. I'm surprised you've mentioned that this would solve your issue. I'm still experiencing a browser freeze. What browser/OS are you using?

From my point of view you'll have to make sure that the RegExp a user filled in doesn't match empty things. I know that this is a challenge, as there can be very much variants. If you can't determine if it's "valid" using a RegExp, you may check if it finds empty parts by executing it on a test string (e.g. yourRegExp.test("lorem")).

What do you think?

@Mottie

This comment has been minimized.

Copy link
Collaborator

Mottie commented Jul 19, 2016

I've been testing the script in Chrome on a Windows 10 64 bit system.

Yeah, I guess I'll have to see what I come up with. Thanks.

@julmot

This comment has been minimized.

Copy link
Owner

julmot commented Jul 19, 2016

@Mottie Alright. Thanks for asking, though. If you have any other issue concerning mark.js, don't hesitate to open another issue.

@julmot julmot closed this Jul 19, 2016

@julmot

This comment has been minimized.

Copy link
Owner

julmot commented Jul 19, 2016

@Mottie Could you imagine to share your solution here if you have one? I think this would be great for others trying to realizing the same.

@julmot

This comment has been minimized.

Copy link
Owner

julmot commented Jul 20, 2016

@Mottie Thanks for referencing this. If it's too uncomfortable to fix this yourself, there might be another option. We could introduce a new return value to break the loop, not just continue. Currently, if the filter returns false it will ignore the match and continue. But as mentioned, this is the reason why it doesn't solve the performance issue.

  • 1 for executing the loop
  • 0 for breaking the loop entirely
  • -1 for stopping the current loop

But of course, only if really necessary.

@Mottie

This comment has been minimized.

Copy link
Collaborator

Mottie commented Jul 20, 2016

So what I think I'll do is just use the mark function. I'll split the keyword at the vertical bar and pass in an array. I know it isn't ideal, but it's a simple solution (demo).

@Mottie

This comment has been minimized.

Copy link
Collaborator

Mottie commented Jul 20, 2016

Actually, here is a mark widget demo...

Maybe someday I'll figure out how to get it to highlight matching words with a fuzzy query (e.g. ~ge will match "Girafee" and "Google" in that demo).

@julmot

This comment has been minimized.

Copy link
Owner

julmot commented Jul 20, 2016

@Mottie

I'll split the keyword at the vertical bar and pass in an array.

Honestly, I don't think that this will be a satisfactory solution. When users fill in /test/gi this will match nothing, also with ~ge as you mentioned.

Maybe someday I'll figure out how to get it to highlight matching words with a fuzzy query (e.g. ~ge will match "Girafee" and "Google" in that demo).

Why not mark a RegExp? https://regex101.com/r/iH2eA0/1
How do you solve this internally?
What other mapping solutions did you implement in Tablesorter?

Maybe it's worth a thought fetching the matched keywords from Tablesorter, instead of implementing the matching algorithm again.

@Mottie

This comment has been minimized.

Copy link
Collaborator

Mottie commented Jul 21, 2016

Ok, I think I came up with a way to check the regular expression. What do you think?

checkRegex : function( regex ) {
    if ( regex instanceof RegExp ) {
        var result = '\u0001\u0002\u0003\u0004\u0005'.match( regex );
        return result === null || result.length < 5;
    }
    return false;
}

Here is an updated demo with presets for testing.


Update: to answer your questions...

  • The "Notes" section of this page in the docs shows all the current filter types.
  • When using an external search on all columns, some limitations are applied (i.e. no operator or range queries) - I guess my current demo doesn't account for these within each column.... yay more work!
  • Internally, I use small functions that return a boolean if there is or isn't a match, or null if the filter doesn't apply. It would make it slower and more work to save the match within each function.
  • The fuzzy match doesn't use regex. Instead it uses a modified version of this code.
@julmot

This comment has been minimized.

Copy link
Owner

julmot commented Jul 21, 2016

@Mottie Congrats! Works for me. The only thing that seems to be unnecessary, are the option checkboxes.

I'm looking forward to see this released. Would be happy to provide a link to the plugin on the website.

@LaRemond

This comment has been minimized.

Copy link

LaRemond commented Jul 21, 2016

Hello @julmot & @Mottie

Something odd I noticed in @Mottie 's updated demo : if you type "ht", "tt" or in the external search field all lines desapper. TD contents such as "http://..." are not selected.

Doesn't seem to be normal... or is it ?

@Mottie

This comment has been minimized.

Copy link
Collaborator

Mottie commented Jul 21, 2016

@LaRemond Actually, the filter is using parsed data, so in the case of the "Sites" column, the http:// is stripped out. So searching for it won't find it... I need to make a minor tweak to the parser to fix that.


Update: Fixed in Mottie/tablesorter@bd25372

@Mottie

This comment has been minimized.

Copy link
Collaborator

Mottie commented Aug 1, 2016

Finally done! Check out the tablesorter mark widget demo.

@stu-ng

This comment has been minimized.

Copy link

stu-ng commented Sep 6, 2016

Hi all,

Google takes me here :). I saw @Mottie mentioned that the /.*/ does not work for markRegExp. I experienced the same issue too. Is that related to this issue #55 or a different one?

From @Mottie demo above, .* works perfectly ( highlight everything), but still /.*/ will kill the app :(. I would appreciate if there is any workarounds.

Thank you!!!

@Mottie

This comment has been minimized.

Copy link
Collaborator

Mottie commented Sep 6, 2016

Hi @stu-nguyen!

I added this regex checker to my code. If it returns true, then it is okay to use the .markRegExp() function:

function checkRegex( regex ) {
  if ( regex instanceof RegExp ) {
    // prevent lock up of mark.js
    // (see https://github.com/julmot/mark.js/issues/55)
    var result = '\u0001\u0002\u0003\u0004\u0005'.match( regex );
    return result === null || result.length < 5;
  }
  return false;
}
@stu-ng

This comment has been minimized.

Copy link

stu-ng commented Sep 6, 2016

Thanks so much for the quick response @Mottie,

The regex checker worked for me, but is there any to highlight things when you run into this regex /.*/ ?

@Mottie

This comment has been minimized.

Copy link
Collaborator

Mottie commented Sep 6, 2016

Use /\S/ or /.+/.

@stu-ng

This comment has been minimized.

Copy link

stu-ng commented Sep 6, 2016

@Mottie Perfect!!! I did read your notes but I didn't understand that you mean to use /.+/ instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment