Text Miner returns "empty" vocabulary item instead of first stopword #6

Closed
bnvk opened this issue Sep 11, 2015 · 1 comment


bnvk commented Sep 11, 2015

I've tested this a few times with different stopwords and configurations, and I can reproduce the result consistently.

var corpus = new TextMiner.Corpus([])

corpus.addDoc("wat cash money you go to boots and cats and dogs with me")
corpus.removeWords(TextMiner.STOPWORDS.EN)

var terms = new TextMiner.Terms(corpus)

=> [ 'wat', 'cash', 'money', '', 'boots', 'cats', 'dogs' ]

But if I move the first English stopword, "you", two words to the left, the "blank" word also shifts two positions to the left:

corpus.addDoc("wat you cash money go to boots and cats and dogs with me")

=> [ 'wat', '', 'cash', 'money', 'boots', 'cats', 'dogs' ]

Planeshifter (Owner) commented

After removing stop words, I added a step that strips any extra whitespace. Both of your examples now return the following:

=> [ 'wat', 'cash', 'money', 'boots', 'cats', 'dogs' ]

I pushed a new version of the package to npm. Let me know if you have further comments.
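
For readers who want to see why a single empty string could appear where the first stopword used to be, here is a minimal sketch of one plausible mechanism. It assumes that stopwords are replaced with empty strings, that the text is then split on single spaces, and that the vocabulary keeps unique tokens in order of first appearance. The helpers `removeWords`, `collapseWhitespace`, and `vocabulary` and the short `STOPWORDS` list are hypothetical stand-ins written for this illustration; they are not text-miner's actual internals or its real stopword list.

```javascript
// Hypothetical helpers for illustration only; NOT text-miner's real implementation.

// Replace each whole-word stopword with an empty string.
// The spaces around the removed word stay behind.
function removeWords(text, words) {
  return words.reduce(function (acc, word) {
    return acc.replace(new RegExp("\\b" + word + "\\b", "g"), "");
  }, text);
}

// Collapse runs of whitespace to one space and trim both ends.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Split on single spaces and keep unique tokens in order of first appearance.
function vocabulary(text) {
  var seen = {};
  return text.split(" ").filter(function (token) {
    if (Object.prototype.hasOwnProperty.call(seen, token)) {
      return false;
    }
    seen[token] = true;
    return true;
  });
}

// Small stand-in list (assumption), covering only the stopwords in the example.
var STOPWORDS = ["you", "go", "to", "and", "with", "me"];

var doc = "wat cash money you go to boots and cats and dogs with me";
var stripped = removeWords(doc, STOPWORDS);

// Without cleanup, leftover double spaces produce empty tokens; after
// de-duplication a single '' remains where the first stopword was.
var before = vocabulary(stripped);
console.log(before);
// [ 'wat', 'cash', 'money', '', 'boots', 'cats', 'dogs' ]

// With the whitespace cleanup, the empty token disappears.
var after = vocabulary(collapseWhitespace(stripped));
console.log(after);
// [ 'wat', 'cash', 'money', 'boots', 'cats', 'dogs' ]
```

Under these assumptions, moving "you" earlier in the sentence moves the first run of doubled spaces earlier too, which would explain why the empty entry follows the first stopword.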
