Question / feature request #43

redaktor · 2015-05-31T17:21:21Z

For normalizing the input:

How about normalizing all typographic characters, like curly and other special quotes,
to their plain (straight) equivalents?

This might be useful e.g. for turning
O’Reilly into O'Reilly, etc.

See http://practicaltypography.com/straight-and-curly-quotes.html

And a note to myself:
Maybe it would be useful to write a "preprocess" test that checks whether everything in the .js and .min.js ("expanded") builds is the same.
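A rough sketch of what such a "preprocess" test might look like: run the same inputs through the function from the full build and from the minified build, and collect every input where the results differ. `findMismatches` is a hypothetical helper, and both builds are stubbed with inline functions here; a real test would load them from the .js and .min.js files instead.

```javascript
// Hypothetical helper: runs the same inputs through two builds of a function
// and returns every input whose results differ (compared via JSON.stringify).
function findMismatches(fullBuild, minBuild, inputs) {
  const mismatches = [];
  for (const input of inputs) {
    const expected = JSON.stringify(fullBuild(input));
    const actual = JSON.stringify(minBuild(input));
    if (expected !== actual) {
      mismatches.push({ input, expected, actual });
    }
  }
  return mismatches;
}

// Stand-ins for the two builds; a real test would load these from the
// .js and .min.js files rather than defining them inline.
const fullBuild = (s) => s.replace(/[\u2018\u2019]/g, "'");
const minBuild = (s) => s.replace(/\u2019/g, "'"); // deliberately misses U+2018

const inputs = ["O\u2019Reilly", "\u2018quoted\u2019", "plain"];
console.log(findMismatches(fullBuild, minBuild, inputs));
```

With these stubs only the input `‘quoted’` is reported, because the "minified" stand-in leaves the opening U+2018 quote untouched.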

spencermountain · 2015-05-31T17:57:29Z

Nice idea, will add it to tokenize.js.

You wanna do it?

redaktor · 2015-06-05T17:04:47Z

Note: I did it and will submit it with the fork, but it raised two questions about
methods/tokenization/tokenize.js.

This is what it looks like now (with the complete curly-quote replacement):

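function normalise (str) {
    if (!str) { return "" }
    str = str.toLowerCase();
    // strip punctuation; the g flag removes every match, not just the first
    str = str.replace(/[,\.!:;\?\(\)]/g, '');
    // single curly quotes (and primes) -> straight apostrophe
    str = str.replace(/[\u2018\u2019\u201A\u201B\u2032\u2035]+/g, "'");
    // double curly quotes (and double primes) -> straight double quote
    str = str.replace(/[\u201C\u201D\u201E\u201F\u2033\u2036]+/g, '"');
    // nothing alphanumeric left: treat it as an empty token
    if (!str.match(/[a-z0-9]/i)) { return '' }
    return str
}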
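For trying the function out in isolation, here is a self-contained copy with a few sample calls. It assumes the punctuation regex is meant to strip every listed mark (hence the `g` flag); the sample inputs are made up for illustration.

```javascript
// Self-contained copy of normalise(); assumes all listed punctuation should be stripped.
function normalise(str) {
  if (!str) {
    return "";
  }
  str = str.toLowerCase();
  // drop common punctuation anywhere in the token
  str = str.replace(/[,.!:;?()]/g, "");
  // single curly quotes and primes -> straight apostrophe
  str = str.replace(/[\u2018\u2019\u201A\u201B\u2032\u2035]+/g, "'");
  // double curly quotes and double primes -> straight double quote
  str = str.replace(/[\u201C\u201D\u201E\u201F\u2033\u2036]+/g, '"');
  // tokens with no letters or digits left are treated as empty
  if (!/[a-z0-9]/.test(str)) {
    return "";
  }
  return str;
}

console.log(normalise("O\u2019Reilly"));      // o'reilly
console.log(normalise("\u201CHello!?\u201D")); // "hello"
console.log(normalise("\u2018?\u2019"));       // (empty string)
```

Note that the `+` in the quote patterns collapses a run of consecutive curly quotes into a single straight one.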

• Is str already trimmed at this point?
• If str still isn't normalized at the end, i.e. when this branch is taken:

if (!str.match(/[a-z0-9]/i)) {

we could normalize it further with your ../transliteration/unicode_normalisation.js
before falling back to

return ''
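One possible shape for both suggestions, as a sketch: trim the input up front, and when no ASCII letter or digit survives, try a Unicode fallback before returning ''. The API of ../transliteration/unicode_normalisation.js isn't shown in this thread, so this swaps in the built-in `String.prototype.normalize("NFKD")` followed by removal of combining marks (U+0300 to U+036F). `normaliseWithFallback` and `stripDiacritics` are hypothetical names.

```javascript
// Stand-in for ../transliteration/unicode_normalisation.js (API not shown here):
// NFKD splits "é" into "e" + U+0301 and expands compatibility forms like "ﬁ" -> "fi",
// then the combining marks in U+0300..U+036F are removed.
function stripDiacritics(str) {
  return str.normalize("NFKD").replace(/[\u0300-\u036f]/g, "");
}

// Hypothetical variant of normalise() that trims first and tries the fallback.
function normaliseWithFallback(str) {
  if (!str) {
    return "";
  }
  str = str.trim().toLowerCase();
  str = str.replace(/[,.!:;?()]/g, "");
  str = str.replace(/[\u2018\u2019\u201A\u201B\u2032\u2035]+/g, "'");
  str = str.replace(/[\u201C\u201D\u201E\u201F\u2033\u2036]+/g, '"');
  if (!/[a-z0-9]/.test(str)) {
    // nothing ASCII left: try the Unicode fallback before giving up
    str = stripDiacritics(str);
    if (!/[a-z0-9]/.test(str)) {
      return "";
    }
  }
  return str;
}

console.log(normaliseWithFallback("  \u00C9!  ")); // e
console.log(normaliseWithFallback("\uFB01"));       // fi
```

Because the fallback only runs when the ASCII check fails, a token like "café" is returned unchanged; applying `stripDiacritics` unconditionally would be the alternative if accents should always be removed.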

spencermountain · 2015-06-05T17:20:17Z

The fix looks great; I've added a test too.

spencermountain added a commit that referenced this issue Jun 5, 2015

unicode quotes #43

d4feb70

spencermountain closed this as completed Jun 5, 2015

scagood mentioned this issue Feb 26, 2018

Incorrect quote string matching #458

Closed
