Error message from unnest: Atomic vectors #73

Closed
TheOne000 opened this issue Jul 21, 2017 · 3 comments

TheOne000 commented Jul 21, 2017

Dear Contributors,
First of all, your book is very good. However, I am new to R and am learning from my mistakes.
Here comes the question: suppose I have a CSV file with one column and 115 rows, where each row represents a document. I ran the following code, and it returned the error "unnest_tokens expects all columns of input to be atomic vectors (not lists)".
I guess that this issue arises when I import the CSV file. Please give me some guidance.
Thank you in advance.

    library(tidytext)
    library(tm)
    library(dplyr)
    library(stats)
    library(base)
    #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
    # Build a corpus: a collection of statements
    #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
    f <- Corpus(DirSource("C:/Users/Boon/Desktop/Dissertation/F"))
    doc_dir <- "C:/Users/Boon/Desktop/Dis/F/f.csv"
    # Read the CSV from the path defined above (the original used an undefined `file_loc`)
    doc <- read.csv(doc_dir, header = TRUE)
    docs <- Corpus(DataframeSource(doc))
    dtm <- DocumentTermMatrix(docs)
    text_df <- data_frame(line = 1:115, docs = docs)

    # This is the output from the code above, which is fine:
    # text_df
    # A tibble: 115 x 2
    #     line          docs
    #    <int> <S3: VCorpus>
    #  1     1 <S3: VCorpus>
    #  2     2 <S3: VCorpus>
    #  3     3 <S3: VCorpus>
    #  4     4 <S3: VCorpus>
    #  5     5 <S3: VCorpus>
    #  6     6 <S3: VCorpus>
    #  7     7 <S3: VCorpus>
    #  8     8 <S3: VCorpus>
    #  9     9 <S3: VCorpus>
    # 10    10 <S3: VCorpus>
    # ... with 105 more rows
    unnest_tokens(text_df, word, docs)

    unnest_tokens_(tbl, output_col, input_col, token = token, to_lower = to_lower, : unnest_tokens expects all columns of input to be atomic vectors (not lists)

@juliasilge (Owner)

I hope that you saw my answer over on Stack Overflow, but just to make sure, I'll put it here too!

If you want to convert your text data to a tidy format, you do not need to transform it to a corpus or a document term matrix or anything first. That is one of the main ideas behind using a tidy data format for text; you don't use those other formats, unless you need to for modeling.
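
For context on the error message in the question: it says the input columns must be atomic vectors, not lists. The following is a minimal sketch of that distinction using base R's `is.atomic()`. The tibbles `good` and `bad` are hypothetical stand-ins, not the data from the question, and the list column here only approximates what a corpus column looks like.

```r
library(dplyr)

# Hypothetical data: `good` stores text as a character vector,
# `bad` stores the same text wrapped in a list column.
good <- tibble(line = 1:2, docs = c("first document", "second document"))
bad  <- tibble(line = 1:2, docs = list("first document", "second document"))

# Check every column: is it an atomic vector?
good_check <- vapply(good, is.atomic, logical(1))
bad_check  <- vapply(bad, is.atomic, logical(1))

good_check  # both columns are atomic
bad_check   # the `docs` list column is not atomic
```

A check like this on your own data frame shows which column would need to become a plain character vector before tokenizing.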

You just put the raw text into a data frame, then use unnest_tokens() to tidy it. (I am making some assumptions here about what your CSV looks like; it would be more helpful to post a reproducible example next time.)

library(dplyr)

docs <- data_frame(line = 1:4,
                   document = c("This is an excellent document.",
                                "Wow, what a great set of words!",
                                "Once upon a time...",
                                "Happy birthday!"))

docs
#> # A tibble: 4 x 2
#>    line                        document
#>   <int>                           <chr>
#> 1     1  This is an excellent document.
#> 2     2 Wow, what a great set of words!
#> 3     3             Once upon a time...
#> 4     4                 Happy birthday!

library(tidytext)

docs %>%
  unnest_tokens(word, document)
#> # A tibble: 18 x 2
#>     line      word
#>    <int>     <chr>
#>  1     1      this
#>  2     1        is
#>  3     1        an
#>  4     1 excellent
#>  5     1  document
#>  6     2       wow
#>  7     2      what
#>  8     2         a
#>  9     2     great
#> 10     2       set
#> 11     2        of
#> 12     2     words
#> 13     3      once
#> 14     3      upon
#> 15     3         a
#> 16     3      time
#> 17     4     happy
#> 18     4  birthday
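
Since the question starts from a CSV file, here is a rough sketch of the same workflow reading from disk. It assumes a one-column CSV with a header row; the column name `text`, the temporary file, and its two sample rows are all made up for illustration, so adjust them to match the real file.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the real file: one column named "text",
# one document per row. Fields are quoted because they may contain commas.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("text",
             "\"This is an excellent document.\"",
             "\"Wow, what a great set of words!\""),
           csv_path)

# stringsAsFactors = FALSE keeps the column as a plain character vector
# (this was not the default before R 4.0.0).
raw <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# Put the raw text straight into a data frame, with a line number per document.
docs <- tibble(line = seq_len(nrow(raw)), document = raw$text)

# One row per word, keeping the line number each word came from.
tidy_words <- docs %>%
  unnest_tokens(word, document)

tidy_words
```

No corpus or document-term matrix is created at any point; the character column goes directly into `unnest_tokens()`.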

Good luck!

TheOne000 (Author) commented Aug 3, 2017 via email

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 25, 2022