
Add tokens_sample() #1478

Closed
koheiw opened this issue Oct 29, 2018 · 1 comment
koheiw (Collaborator) commented Oct 29, 2018

We have dfm_sample() and corpus_sample(), but why not tokens_sample()?
I am tired of doing

toks <- tokens(data_corpus_big)
toks[sample(seq_along(toks), 10)]

just to check how tokenization is going.

kbenoit (Collaborator) commented Oct 29, 2018

Good idea. Let's add a

tokens_sample(x, size = ndoc(x), replace = FALSE, prob = NULL, by = NULL, ...)

that works just as corpus_sample() does, sampling from the documents.
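A minimal sketch of what such a function could look like, following the proposed signature (the `by` and `...` arguments are omitted here for brevity). This is an illustration only, not quanteda's actual implementation: a tokens object is stood in by a plain named list of character vectors so the example is self-contained.

```r
# Hypothetical sketch of tokens_sample(); names and internals are assumptions.
# x is stood in by a plain named list of character vectors (one per document).
tokens_sample <- function(x, size = length(x), replace = FALSE, prob = NULL) {
  # sample document indices, then subset the object, as corpus_sample() does
  i <- sample(seq_along(x), size = size, replace = replace, prob = prob)
  x[i]
}

toks <- list(d1 = c("a", "b"), d2 = "c", d3 = c("d", "e", "f"))
set.seed(1)
tokens_sample(toks, size = 2)
```

In quanteda itself, `length(x)` would be `ndoc(x)` and the subsetting would go through the tokens `[` method, so document names and attributes are preserved.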
