-
-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add caseSensitive option #65
Conversation
"yŸÿýÝ", | ||
"zŽžżŻźŹ" | ||
]; | ||
"aàáâãäåāą", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing the array this way – with a separate line for each property – is clean if it's not so large.
In this case please try to fill out the line length of 80chars for the const dct
array in order to reduce the function height
Hi @Mottie, Thanks for the re-submission! First, apologize that I haven't had the time to give your PR a feedback earlier. I'd like to merge the case sensitive option. From my point of view you've also added the case sensitive option for the synonyms option, but haven't added a test for this. Am I right? If so, it'd be great to make sure this works too. Now to the diacritics issue. I understand that this might be a problem in non-unicode countries. Setting the Thanks again. Looking forward to your reply. |
Hi @julmot! Thanks for the great feedback! I'll start working on the changes. As for the encoding, the files in the |
Thanks for reworking. Every point is fine now! 👍 I just wanted to merge this but found out that there's one behavior that doesn't seem to be as expected.
I'm expecting that the uppercase diacritics are also covered in the regular expression when |
Found another issue. When searching with the following configuration: var instance = new Mark(document.getElementById("context"));
instance.mark("lorem", {
debug: true,
diacritics: true,
caseSensitive: false,
synonyms: {
"lorem": "Lorem"
},
each: function(node){
console.log(node);
}
}); The RegExp will look like:
|
The "simplest" solution that I came up with was to make the query lower case if (sens) {
// Prevent insensitive searches from including upper case
// diacritics in the regular expression
ch = ch.toLowerCase();
} I'm not sure what you'll think of that addition. I did add a test with the code you shared above. |
Hmm, I forgot to use
|
Duh, it's because the second part is added as a synonym. I also thought about doing something like this: createDiacriticsRegExp(str) {
let dct = [
"aàáâãäåāą", "cçćč", "dđď", "eèéêëěēę", "iìíîïī", "lł", "nñňń",
"oòóôõöøō", "rř", "sšś", "tť", "uùúûüůū", "yÿý", "zžżź"
],
sens = "i";
if (this.opt.caseSensitive) {
sens = "";
dct = dct.concat(dct.slice().join(",").toUpperCase().split(","));
}
let handled = [];
str.split("").forEach(ch => {
dct.every(dct => {
// Check if the character is inside a diacritics list
if(dct.indexOf(ch) !== -1) {
// Check if the related diacritics list was not
// handled yet
if(handled.indexOf(dct) > -1) {
return false;
}
// Make sure that the character OR any other
// character in the diacritics list will be matched
str = str.replace(
new RegExp(`[${dct}]`, `gm${sens}`), `[${dct}]`
);
handled.push(dct);
}
return true;
});
});
return str;
} The diacritic array only contains lower case characters. It converts and concatinates a clone of the array in upper case. What do you think of this approach? I always second guess myself... should I use |
@Mottie Just for our understanding, the second group is indeed the synonym. If the value is the same word as the key, then of course the groups will be the same. But that wasn't what I meant above, I was talking about the I can't give you a solution for the issue on the fly and unfortunately don't have time for it today. Will check this out directly after fixing an open bug. Thanks for your patience! |
Oh, I shared two different solutions above. No worries, take your time 😁 |
Now, the bugs have been fixed and I'll have the time to come back to your PR. Once again, thank you for your patience! I've made a few tests with the current implementation and here are my results:
In summary it seems like only the first point is left open. As mentioned, the regular expression in this case seems to be wrong, but it works. I don't understand why, because the
I'm uncertain if this would be possible for all kind of diacritics. There may be diacritics only available in upper case characters. To play it safe I'd include also upper case diacritics. Don't you think so? |
} | ||
} | ||
return str; | ||
} | ||
|
||
createDiacriticsRegExp(str) { | ||
const dct = ["aÀÁÂÃÄÅàáâãäåĀāąĄ", "cÇçćĆčČ", "dđĐďĎ", "eÈÉÊËèéêëěĚĒēęĘ", "iÌÍÎÏìíîïĪī", "lłŁ", "nÑñňŇńŃ", "oÒÓÔÕÕÖØòóôõöøŌō", "rřŘ", "sŠšśŚ", "tťŤ", "uÙÚÛÜùúûüůŮŪū", "yŸÿýÝ", "zŽžżŻźŹ"]; | ||
const sens = this.opt.caseSensitive ? "" : "i", | ||
dct = this.opt.caseSensitive ? ["aàáâãäåāą", "AÀÁÂÃÄÅĀĄ", "cçćč", "CÇĆČ", "dđď", "DĐĎ", "eèéêëěēę", "EÈÉÊËĚĒĘ", "iìíîïī", "IÌÍÎÏĪ", "lł", "LŁ", "nñňń", "NÑŇŃ", "oòóôõöøō", "OÒÓÔÕÖØŌ", "rř", "RŘ", "sšś", "SŠŚ", "tť", "TŤ", "uùúûüůū", "UÙÚÛÜŮŪ", "yÿý", "YŸÝ", "zžżź", "ZŽŻŹ"] : ["aÀÁÂÃÄÅàáâãäåĀāąĄ", "cÇçćĆčČ", "dđĐďĎ", "eÈÉÊËèéêëěĚĒēęĘ", "iÌÍÎÏìíîïĪī", "lłŁ", "nÑñňŇńŃ", "oÒÓÔÕÖØòóôõöøŌō", "rřŘ", "sŠšśŚ", "tťŤ", "uÙÚÛÜùúûüůŮŪū", "yŸÿýÝ", "zŽžżŻźŹ"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect a regular expression that contains the upper and lower case diacritics
This should take care of the regex.
Thanks @Mottie, solved very clever! 👏 I'm going to merge this now.
|
I messed up merging to master instead of the 8.0.0 branch. This PR replaces #59.