split regexp too strict and console.log #1

Open
pmario opened this issue Apr 6, 2011 · 1 comment


pmario commented Apr 6, 2011

Hi,
This is a really cool library.

But the split regexp in trie.js

words = words.split(/[^a-zA-Z]+/);

is, IMO, a little too strict for languages such as German, where we have ö, ä, ü, etc.
It would be nice to be able to override your defaults with a config object that contains a regexp, e.g. /[^a-zA-ZöäüÖÄÜ]+/
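
To make the request concrete, here is a minimal sketch of such an override. The function name splitWords and the option key splitRegExp are hypothetical and only illustrate the idea; they are not part of trie.js.

```javascript
// Hypothetical helper: splitWords and options.splitRegExp are illustrative
// names, not an existing trie.js API.
var DEFAULT_SPLIT = /[^a-zA-Z]+/; // the default quoted above

function splitWords(text, options) {
  // Use the caller's regexp if one was supplied, else fall back to the default.
  var re = (options && options.splitRegExp) || DEFAULT_SPLIT;
  // Drop empty strings produced by leading or trailing separators.
  return text.split(re).filter(function (w) { return w.length > 0; });
}

var german = { splitRegExp: /[^a-zA-ZöäüÖÄÜ]+/ };

console.log(splitWords('schön über'));         // [ 'sch', 'n', 'ber' ]
console.log(splitWords('schön über', german)); // [ 'schön', 'über' ]
```

With the default, the umlauts are treated as separators and the words are cut apart; with the override they stay intact.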

There is also a console.log call that you forgot to comment out.

kind regards

Owner

mckoss commented Apr 20, 2011

Thanks. I just noticed your comment. I assume these are Unicode symbols then. Strictly speaking, I can relax the definition of a word further, as long as words do not contain the characters I've reserved for punctuation in the PackedTrie string format (currently ':', '!', and ';').

So, I could split on the traditional \s (whitespace characters) plus my reserved characters. That would also allow contractions (like can't) into the dictionary.

words = words.split(/[\s:!;]+/);
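
As a quick check of what that split would do, here is a small sketch. The wrapper function splitOnReserved and the sample string are made up for illustration; only the regexp comes from the line above.

```javascript
// Hypothetical wrapper around the proposed regexp; the name is illustrative.
function splitOnReserved(text) {
  // Split on runs of whitespace and the reserved characters ':', '!' and ';'.
  return text.split(/[\s:!;]+/);
}

var words = splitOnReserved("can't schön; über: straße!  done");
console.log(words); // [ "can't", 'schön', 'über', 'straße', 'done' ]

// Other punctuation is not a separator, so it stays attached to the word.
console.log(splitOnReserved('Hello, world.')); // [ 'Hello,', 'world.' ]
```

Umlauts and apostrophes survive, but commas, periods, and similar punctuation would now end up inside dictionary words unless the input is cleaned first.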

I'll put the console.log inside an if DEBUG statement...
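
One way such a guard might look, as a sketch only: the DEBUG flag and the debugLog helper are assumed names, not code taken from trie.js.

```javascript
// Hypothetical flag and helper; neither name is taken from trie.js.
var DEBUG = false;

function debugLog() {
  // Forward all arguments to console.log only when debugging is enabled.
  if (DEBUG) {
    console.log.apply(console, arguments);
  }
}

debugLog('inserting word', 'schön'); // prints nothing while DEBUG is false
```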
