Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement wildcards option #114

Merged
merged 15 commits into from
Mar 12, 2017
Merged

Implement wildcards option #114

merged 15 commits into from
Mar 12, 2017

Conversation

Mottie
Copy link
Collaborator

@Mottie Mottie commented Feb 3, 2017

This PR adds the useWildcard enhancement discussed in #112.

Copy link
Owner

@julkue julkue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vote to name this option wildcards instead of useWildcards, as we have option names like diacritics too, not useDiacritics.

I'd also vote to ignore escaped wildcards, e.g. ignore them in the following keyword: lor\?m, even if wildcards is true.

Do you agree?

@julkue julkue changed the title Add useWildcards option Implement wildcards option Feb 4, 2017
@Mottie
Copy link
Collaborator Author

Mottie commented Feb 4, 2017

name this option wildcards instead of useWildcards

Sure, no problem!

ignore escaped wildcards, e.g. ignore them in the following keyword: lor\?m, even if wildcards is true.

There is a problem with that...

  • If you pass a keyword of lor\?m, the escape is processed and the resulting string becomes lor?m which is indistinguishable from an unescaped lor?m.
  • If we double escape the keyword (lor\\?m), we would then have to remove the extra backslash during keyword processing, or the search would be performed against lor\?m and not lor?m. I'm not sure if removing the backslash would be a good idea, but it shouldn't harm anything as long as we are only doing it for that specific pattern (\\? or \\*).
  • Alternatively, maybe we could "escape" it with a forward slash?... this isn't ideal and may be confusing to both the user and developer - lor/?m.

What do you think?

@julkue
Copy link
Owner

julkue commented Feb 6, 2017

Alternatively, maybe we could "escape" it with a forward slash?

I'm against this option, since it's not common.

If we double escape the keyword (lor\?m), we would then have to remove the extra backslash during keyword processing, or the search would be performed against lor?m and not lor?m. I'm not sure if removing the backslash would be a good idea, but it shouldn't harm anything as long as we are only doing it for that specific pattern (\? or \*).

If you look at the escapeStr function you'll notice that it does exactly that; adds two backslashes for sensitive characters like *. If the user passes a keyword like lor\\*m it'll be lor\*m in the plugin and thus be equivalent to the result of escapeStr. The point is, if you pass this to this function it'll still add another \, which is of course unnecessary. But from my point of view, this is all that needs to be done to allow passing escaped wildcards even if the wildcards option is true.

@Mottie
Copy link
Collaborator Author

Mottie commented Feb 6, 2017

Hmm, well the setupWildcardsRegExp function is called before the escapeStr function to catch the wildcards before they are escaped... I'll look into this more later today.

Mottie and others added 6 commits February 10, 2017 06:04
And normalize case insensitive diacritics array
One of the issues specified in #108

Squashed commit of the following:

commit 867dabe
Author: Julian Motz <me@julianmotz.com>
Date:   Mon Feb 6 10:48:58 2017 +0100

    Update patch level

commit e46fff2
Merge: 0644a91 5b2dadb
Author: Julian Motz <me@julianmotz.com>
Date:   Mon Feb 6 10:46:23 2017 +0100

    Merge branch 'master' into issue-108

    Resolved merge conflict

commit 0644a91
Author: Rob Garrison <wowmotty@gmail.com>
Date:   Mon Jan 30 15:22:56 2017 -0600

    Prevent adding empty string synonyms. See #108
@Mottie
Copy link
Collaborator Author

Mottie commented Feb 10, 2017

In the last update, the wildcards option now has three settings:

  • "disable" (default)
  • "enable" - wildcards only target non-whitespace characters.
  • "includeWhitespace" - wildcards target all characters

So to get the demo in #119 working, you have two choices (see https://jsfiddle.net/Mottie/6kqr1d88/4/):

  • Set wildcards to "enable" and use the keyword `"Hello? World" ("?" matches the comma or any punctuation, but not the whitespace).
  • Or, set wildcards to "includeWhitespace" and use the keyword "Hello*World".

But, setting the option to include whitespace does have some consequences! if for example you use the keyword lorem*ipsum, it will also mark lorem dolor ipsum; but it isn't greedy, so it won't include the second ipsum in lorem dolor ipsum ipsum. See this demo: https://jsfiddle.net/Mottie/6kqr1d88/3/

Copy link
Owner

@julkue julkue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. However, why is it necessary to call setupWildcardsRegexp() in the synonyms method, when the this.createWildcardsRegexp() method is called after this.createSynonymsRegexp()?

src/mark.js Outdated
@@ -776,6 +828,20 @@ class Mark { // eslint-disable-line no-unused-vars
* <li><i>limiters</i>: A custom array of string limiters for accuracy
* "exactly" or "complementary"</li>
* </ul>
* @property {"disable"|"enable"|"includeSpaces"}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please outsource the JSDOC as a @typedef like here. This way it's possible to reference it in other functions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, np!

"done": function () {
new Mark($ctx2[0]).mark("Lor*m", {
"synonyms": {
"Lor*m": "Ips*m"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it necessary to call setupWildcardsRegexp() in the synonyms method

This allows you to include wildcards in the synonyms. Does that sound reasonable?

@Mottie
Copy link
Collaborator Author

Mottie commented Feb 12, 2017

First off, I incorrectly named an option in my last comment... it's not "includeWhitespace", it should be "includeSpaces".... maybe I should shorten this option to "withSpaces"? Or internally only look for "space" in case the wrong term is used?

I'm also starting to think we might need to change the "?" wildcard to match zero or one characters instead of only one character. That way you could use "World?s" to mark World's and Worlds as discussed in https://github.com/julmot/mark.js/issues/119#issuecomment-279233505.

@julkue
Copy link
Owner

julkue commented Feb 12, 2017

  1. I'm for "withSpaces"
  2. I'd vote for naming it "enabled" and "disabled" instead of "enable" and "disable". What do you think?
  3. I already thought that ? is open for either one or no match. If it isn't currently, I'm absolutely for it.

"enable" -> "enabled"
"disable" -> "disabled"
"includeSpaces" -> "withSpaces"
Previously set to match only one
@julkue
Copy link
Owner

julkue commented Mar 11, 2017

shit mate, I'm very sorry for the long delay. I'm doing some tests in the next hours and will come back with some review.

Beautified code and outsourced the accuracy property into a @typedef definition to make it consistent with the wildcards property
They must be included after diacritics and synonyms tests, since they depend on them
Copy link
Owner

@julkue julkue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mottie Thanks for the great work. It looks very good overall. The only thing I'm missing is a test for the disabled value. Then I think we're safe to publish this as a new minor release! :bowtie:

@Mottie
Copy link
Collaborator Author

Mottie commented Mar 12, 2017

The disabled value was tested in the wildcards.js file (line 31); is that good enough, or should I add more?

HTML

<div>
    <p>
        Lorenzom ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
        tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
        Lorem ipsum dolor sit amet, consetetur sadipscing elitr. Loriem ipsum
        dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor
        invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
        Lorm ipsum Lorum ipsum, et Loram ipsum. Lor?m ipsum. Lor*m ipsum.
    </p>
</div>

Script

new Mark($ctx4[0]).mark(["lor?m", "Lor*m"], {
    "separateWordSearch": false,
    "wildcards": "disabled",
    "done": done
});
// ...
it("should find wildcards as plain characters when disabled", function () {
    expect($ctx4.find("mark")).toHaveLength(2);
});

@julkue julkue merged commit 0b3fd5b into master Mar 12, 2017
@julkue julkue deleted the wildcards branch March 12, 2017 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants