Feature: .slashes() tokenize transform #1100
Comments
hey Emiliano, good idea.
nlp(`IEEE/WIC/ACM`).match('wic').found // true
note that the words are not actually split. we have an awkward, but safe interpretation of slashes, so they don't get bunged-up by other transformations. cheers
released in
on 14.13 I still see
showing one token with the combined words. How do I extract the separate words in 14.13?
Hey, sorry for the delay - yes, this is not possible now, but it is a good idea. It would be cool (and possible) to add a .slashes().split() method. I can try to add it in an upcoming release
I find them there, but they've been lowercased. I use the tokenizer for a sentence-casing algorithm so I need case intact.
Would the split method recreate location info? And would this slashes.split be something I would run on individual terms?
I'm tokenizing using compromise/one. Can I have 'IEEE/WIC/ACM' be recognized as 3 slash-separated words rather than one?
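Until something like .slashes().split() exists, here is a rough workaround sketch in plain JavaScript. It assumes you can already get each term's original-cased text and its start offset from your tokenizer output; how you obtain those is up to you and is not shown here. `splitSlashes` is a made-up helper name, not a compromise method. It splits one term on slashes, keeps the case intact, and recomputes a start offset and length for each piece.

```javascript
// Hypothetical helper, NOT part of compromise's API.
// text:  the original (case-preserved) text of a single term, e.g. 'IEEE/WIC/ACM'
// start: assumed character offset of that term in the source string
function splitSlashes(text, start = 0) {
  const parts = [];
  let offset = 0; // position inside `text` of the current piece
  for (const piece of text.split('/')) {
    // skip empty pieces produced by leading, trailing, or doubled slashes
    if (piece.length > 0) {
      parts.push({ text: piece, start: start + offset, length: piece.length });
    }
    offset += piece.length + 1; // +1 steps over the slash itself
  }
  return parts;
}

// Example: a term that begins at character 10 of the input
const words = splitSlashes('IEEE/WIC/ACM', 10);
console.log(words);
```

This would be run per term, after tokenizing, so terms without a slash come back as a single piece with their offset unchanged.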