
Add tokens_sample() #1478

Closed
koheiw opened this issue Oct 29, 2018 · 1 comment
koheiw (Collaborator) commented Oct 29, 2018

We have dfm_sample() and corpus_sample(), but why not tokens_sample()?
I am tired of doing

toks <- tokens(data_corpus_big)
toks[sample(seq_along(toks), 10)]

just to check how tokenization is going.

kbenoit (Collaborator) commented Oct 29, 2018

Good idea. Let's add a

tokens_sample(x, size = ndoc(x), replace = FALSE, prob = NULL, by = NULL, ...)

that works just as corpus_sample() does, sampling from the documents.
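A minimal sketch of what such a function could look like, following the proposed signature (the `by` and `...` arguments are omitted here for brevity). This is an illustration only, not quanteda's actual implementation: a tokens object is stood in by a plain named list of character vectors so the example is self-contained.

```r
# Hypothetical sketch of tokens_sample(); names and internals are assumptions.
# x is stood in by a plain named list of character vectors (one per document).
tokens_sample <- function(x, size = length(x), replace = FALSE, prob = NULL) {
  # sample document indices, then subset the object, as corpus_sample() does
  i <- sample(seq_along(x), size = size, replace = replace, prob = prob)
  x[i]
}

toks <- list(d1 = c("a", "b"), d2 = "c", d3 = c("d", "e", "f"))
set.seed(1)
tokens_sample(toks, size = 2)
```

In quanteda itself, `length(x)` would be `ndoc(x)` and the subsetting would go through the tokens `[` method, so document names and attributes are preserved.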
