Error message from unnest: Atomic vectors #73
Comments
I hope that you saw my answer over on Stack Overflow (<https://stackoverflow.com/questions/45220536/unnest-tokens-and-its-error/45244762#45244762>), but just to make sure, I'll put it here too!
If you want to convert your text data to a tidy format, you do not need to transform it to a corpus or a document term matrix or anything first. That is one of the main ideas behind using a tidy data format for text; you don't use those other formats unless you need them for modeling.
You just put the raw text into a data frame, then use unnest_tokens() to tidy it. (I am making some assumptions here about what your CSV looks like; it would be more helpful to post a reproducible example (<https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example>) next time.)
library(dplyr)
# tibble() replaces the deprecated data_frame(); dplyr re-exports it
docs <- tibble(line = 1:4,
               document = c("This is an excellent document.",
                            "Wow, what a great set of words!",
                            "Once upon a time...",
                            "Happy birthday!"))
docs
#> # A tibble: 4 x 2
#>    line document
#>   <int> <chr>
#> 1     1 This is an excellent document.
#> 2     2 Wow, what a great set of words!
#> 3     3 Once upon a time...
#> 4     4 Happy birthday!
library(tidytext)
docs %>%
  unnest_tokens(word, document)
#> # A tibble: 18 x 2
#>     line word
#>    <int> <chr>
#>  1     1 this
#>  2     1 is
#>  3     1 an
#>  4     1 excellent
#>  5     1 document
#>  6     2 wow
#>  7     2 what
#>  8     2 a
#>  9     2 great
#> 10     2 set
#> 11     2 of
#> 12     2 words
#> 13     3 once
#> 14     3 upon
#> 15     3 a
#> 16     3 time
#> 17     4 happy
#> 18     4 birthday
Good luck!
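For the CSV case described in the question, here is a minimal sketch rather than a definitive fix. The code that produced the error was not posted, so everything specific here is an assumption: the file is a temporary stand-in written by the sketch itself, and the column name `text` is hypothetical. The sketch reads the file into a plain data frame, checks that no column is a list (the condition the error message complains about), and then tokenizes.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the real file: one column, one document per row.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(text = c("This is an excellent document.", "Happy birthday!")),
  csv_path,
  row.names = FALSE
)

# Read the text as character strings rather than factors.
raw <- read.csv(csv_path, stringsAsFactors = FALSE)

# Collect the names of any list columns; unnest_tokens() wants atomic vectors.
list_cols <- names(raw)[vapply(raw, is.list, logical(1))]
stopifnot(length(list_cols) == 0)

# Add a document id, then split the text column into one word per row.
tidy_words <- raw %>%
  mutate(line = row_number()) %>%
  unnest_tokens(word, text)
```

If `list_cols` is not empty for your own data, the text was probably wrapped in another structure (for example a corpus object) somewhere between the import and the call to unnest_tokens(), and starting again from the raw character column should avoid the error.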
Hi Julia
Thank you very much for your reply. This really helps me.
Regards
Sapphasak
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Dear Contributors
First of all, your book is very good. However, I am new to R and am learning from my mistakes.
Here comes the question! Suppose I have a CSV file with one column and 115 rows. Each row represents a document. I ran the following code, and it returns "unnest_tokens expects all columns of input to be atomic vectors (not lists)".
I guess that this issue arises when I import file.csv. Please give me some guidance.
Thank you in advance