True. Somehow the what = argument is not being passed through from dfm() to tokens().
Simpler examples:
> tokens("this is a test", what="character")
tokens from 1 document.
text1 :
 [1] "t" "h" "i" "s" "i" "s" "a" "t" "e" "s" "t"

> dfm("this is a test", what="character")
Document-feature matrix of: 1 document, 4 features (0% sparse).
1 x 4 sparse Matrix of class "dfm"
       features
docs    this is a test
  text1    1  1 1    1

> tokens("This is a test. Second sentence", what="sentence")
tokens from 1 document.
text1 :
[1] "This is a test." "Second sentence"

> dfm("This is a test. Second sentence", what="sentence")
Document-feature matrix of: 1 document, 7 features (0% sparse).
1 x 7 sparse Matrix of class "dfm"
       features
docs    this is a test . second sentence
  text1    1  1 1    1 1      1        1
kbenoit changed the title from "bug for dfm and character tokenization" to "what = not passed through to tokens() by dfm()" on Dec 7, 2017
Hi,
the following code should produce a dfm with characters as features, but the resulting dfm contains words as features instead:
It works when tokens() is called in a preliminary step:
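A minimal sketch of that workaround, assuming the quanteda package is installed. The input string and the variable names (`txt`, `char_toks`, `char_dfm`) are made up for illustration and are not taken from the report:

```r
# Assumes the quanteda package is installed.
library(quanteda)

# Illustrative input; not the text from the original report.
txt <- "this is a test"

# Step 1: tokenize explicitly so that what = "character" takes effect.
char_toks <- tokens(txt, what = "character")

# Step 2: build the dfm from the tokens object rather than from raw text.
char_dfm <- dfm(char_toks)

# Each feature should now be a single character rather than a whole word.
featnames(char_dfm)
```

Building the dfm from a tokens object keeps the tokenization choice explicit, so the result does not depend on whether dfm() forwards the argument.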