Question / feature request #43

Closed
redaktor opened this issue May 31, 2015 · 3 comments

@redaktor
Contributor

For normalizing the input:

How about normalizing typographic characters, such as curly and other special quotes,
to their plain (straight) equivalents?

This could be useful for, e.g.,
O’Reilly to O'Reilly.

see http://practicaltypography.com/straight-and-curly-quotes.html

Note to self:
It might be useful to write a "preprocess" test that checks whether everything in the .js and .min.js ("expanded") builds is the same.

@spencermountain
Owner

nice idea. will add to tokenize.js

you wanna do it?

@redaktor
Contributor Author

redaktor commented Jun 5, 2015

Note: I did it and will submit it with my fork, but it raised two questions about
methods/tokenization/tokenize.js

This is what it looks like now (with complete curly quote replacement):

function normalise (str) {
    if (!str) { return "" }
    str = str.toLowerCase();
    // strip every punctuation character, not only the first match
    str = str.replace(/[,\.!:;\?\(\)]/g, '');
    // single curly quotes
    str = str.replace(/[\u2018\u2019\u201A\u201B\u2032\u2035]+/g, "'");
    // double curly quotes
    str = str.replace(/[\u201C\u201D\u201E\u201F\u2033\u2036]+/g, '"');
    if (!str.match(/[a-z0-9]/i)) { return '' }
    return str
}
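
For a quick check outside the tokenizer, the two quote replacements can be pulled into a standalone helper. This is only a sketch: the name `normaliseQuotes` is hypothetical and not part of tokenize.js, and it leaves out the lowercasing and punctuation steps.

```javascript
// Hypothetical standalone helper mirroring the two quote replacements above.
function normaliseQuotes(str) {
  if (!str) { return ''; }
  // single curly, low-9 and reversed quotes plus primes -> straight apostrophe
  str = str.replace(/[\u2018\u2019\u201A\u201B\u2032\u2035]+/g, "'");
  // double curly, low-9 and reversed quotes plus double primes -> straight double quote
  str = str.replace(/[\u201C\u201D\u201E\u201F\u2033\u2036]+/g, '"');
  return str;
}

console.log(normaliseQuotes('O\u2019Reilly'));      // O'Reilly
console.log(normaliseQuotes('\u201Chello\u201D'));  // "hello"
```

Because of the `+` in each character class, a run of adjacent curly quotes collapses into a single straight quote.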

• Is str already trimmed at this point?
and
• If str still contains no ASCII letters or digits at the final check

if (!str.match(/[a-z0-9]/i)) {

we could normalize it further with your ../transliteration/unicode_normalisation.js
before

return ''
spencermountain added a commit that referenced this issue Jun 5, 2015
@spencermountain
Owner

the fix looks great, i've added a test too.
