Error message from unnest: Atomic vectors #73

TheOne000 · 2017-07-21T01:12:30Z

Dear contributors,
First of all, your book is very good. I am new to R and am learning from my mistakes.
Here is my question. Suppose I have a CSV file with one column and 115 rows, where each row represents a document. When I run the code below, it returns "unnest_tokens expects all columns of input to be atomic vectors (not lists)".
I suspect the issue arises when I import the CSV file. Could you give me some guidance?
Thank you in advance.

    library(tidytext)
    library(tm)
    library(dplyr)
    # (stats and base are attached by default, so no library() call is needed)

    #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
    # Build a corpus: a collection of statements
    #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
    f <- Corpus(DirSource("C:/Users/Boon/Desktop/Dissertation/F"))
    doc_dir <- "C:/Users/Boon/Desktop/Dis/F/f.csv"
    doc <- read.csv(doc_dir, header = TRUE)       # one column, 115 rows
    docs <- Corpus(DataframeSource(doc))          # one document per row
    dtm <- DocumentTermMatrix(docs)
    text_df <- data_frame(line = 1:115, docs = docs)

    # This is the output from the code above, which is fine:
    # text_df
    # # A tibble: 115 x 2
    #     line          docs
    #    <int> <S3: VCorpus>
    #  1     1 <S3: VCorpus>
    #  2     2 <S3: VCorpus>
    #  3     3 <S3: VCorpus>
    #  4     4 <S3: VCorpus>
    #  5     5 <S3: VCorpus>
    #  6     6 <S3: VCorpus>
    #  7     7 <S3: VCorpus>
    #  8     8 <S3: VCorpus>
    #  9     9 <S3: VCorpus>
    # 10    10 <S3: VCorpus>
    # # ... with 105 more rows
    unnest_tokens(text_df, word, docs)

    unnest_tokens_(tbl, output_col, input_col, token = token, to_lower = to_lower, : unnest_tokens expects all columns of input to be atomic vectors (not lists)
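The error means that at least one column of the data frame handed to unnest_tokens() is a list rather than a plain (atomic) vector; in text_df, the docs column holds tm corpus objects instead of text. A minimal sketch of how one might check for that and pull the plain text back out of a corpus, assuming a small VectorSource corpus as a hypothetical stand-in for the one built from f.csv (the names corpus, bad_df and tidy_words are illustrative only, and newer tidytext releases may treat list columns differently, so the exact error can vary by version):

```r
library(tm)
library(dplyr)
library(tidytext)

# Hypothetical two-document corpus standing in for the corpus built from f.csv
corpus <- VCorpus(VectorSource(c("This is an excellent document.",
                                 "Happy birthday!")))

# Storing corpus objects in a column (as text_df does) creates a list column
bad_df <- tibble(line = 1:2, docs = list(corpus, corpus))
vapply(bad_df, is.atomic, logical(1))   # line is atomic, docs is not

# Extract the plain text of each document into an ordinary character column
text_df <- tibble(line = seq_along(corpus),
                  text = unname(vapply(corpus, as.character, character(1))))
vapply(text_df, is.atomic, logical(1))  # both columns are atomic

# With only atomic columns, unnest_tokens() can tokenize `text`
tidy_words <- text_df %>%
  unnest_tokens(word, text)
tidy_words
```

Checking every column with is.atomic() before tokenizing makes it easy to spot which column the error is complaining about.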


juliasilge · 2017-08-01T22:29:30Z

I hope that you saw my answer over on Stack Overflow (https://stackoverflow.com/questions/45220536/unnest-tokens-and-its-error/45244762#45244762), but just to make sure, I'll put it here too!

If you want to convert your text data to a tidy format, you do not need to transform it to a corpus or a document-term matrix or anything else first. That is one of the main ideas behind using a tidy data format for text: you don't use those other formats unless you need them for modeling.
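If a document-term matrix already exists (as dtm does in the question), tidytext's tidy() method can turn it into a table with one row per document-term pair, and cast_dtm() goes back the other way when a modeling function needs the matrix. A minimal sketch, assuming a hypothetical two-document corpus; note that tm's default DocumentTermMatrix settings (for example, dropping words shorter than three characters) mean its terms will not match unnest_tokens() output exactly:

```r
library(tm)
library(dplyr)
library(tidytext)

# Hypothetical corpus and document-term matrix, built the tm way
corpus <- VCorpus(VectorSource(c("This is an excellent document.",
                                 "Wow, what a great set of words!")))
dtm <- DocumentTermMatrix(corpus)

# tidy() turns the sparse matrix into a tibble with columns
# document, term and count (only non-zero entries are kept)
tidy_dtm <- tidy(dtm)
tidy_dtm

# cast_dtm() rebuilds a DocumentTermMatrix from the tidy table,
# e.g. when a modeling function expects one
dtm_again <- tidy_dtm %>%
  cast_dtm(document, term, count)
dtm_again
```

Keeping the data tidy and casting only at the modeling step means the tm objects never need to sit inside a data frame column.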

You just put the raw text into a data frame, then use unnest_tokens() to tidy it. (I am making some assumptions here about what your CSV looks like; it would be more helpful to post a reproducible example (https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) next time.)

library(dplyr)

docs <- data_frame(line = 1:4,
                   document = c("This is an excellent document.",
                                "Wow, what a great set of words!",
                                "Once upon a time...",
                                "Happy birthday!"))

docs
#> # A tibble: 4 x 2
#>    line                        document
#>   <int>                           <chr>
#> 1     1  This is an excellent document.
#> 2     2 Wow, what a great set of words!
#> 3     3             Once upon a time...
#> 4     4                 Happy birthday!

library(tidytext)

docs %>%
  unnest_tokens(word, document)
#> # A tibble: 18 x 2
#>     line      word
#>    <int>     <chr>
#>  1     1      this
#>  2     1        is
#>  3     1        an
#>  4     1 excellent
#>  5     1  document
#>  6     2       wow
#>  7     2      what
#>  8     2         a
#>  9     2     great
#> 10     2       set
#> 11     2        of
#> 12     2     words
#> 13     3      once
#> 14     3      upon
#> 15     3         a
#> 16     3      time
#> 17     4     happy
#> 18     4  birthday
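Applied to a CSV like the one in the question, the same idea might look like the sketch below. It assumes the file has a header and a single text column; the column name text and the temporary file are hypothetical stand-ins for f.csv. Setting stringsAsFactors = FALSE keeps the column as character, which matters on R versions before 4.0.0, where read.csv() converted strings to factors by default.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for f.csv: a header row, then one document per row
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(text = c("This is an excellent document.",
                              "Happy birthday!")),
          csv_path, row.names = FALSE)

# Read the raw text; keep it as character rather than factor
raw_docs <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# Number the documents, then split each one into one word per row
tidy_docs <- raw_docs %>%
  mutate(line = row_number()) %>%
  unnest_tokens(word, text)
tidy_docs
```

No corpus or document-term matrix is involved: the CSV goes straight into a data frame, and unnest_tokens() does the tokenizing.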

Good luck!

TheOne000 · 2017-08-03T21:41:21Z

Hi Julia, thank you very much for your reply. This really helps me.

Regards,
Sapphasak


github-actions · 2022-03-25T00:09:09Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

juliasilge closed this as completed Aug 1, 2017

github-actions bot locked and limited conversation to collaborators Mar 25, 2022
