Commit

Clarification on arg order
juliasilge committed Sep 5, 2023
1 parent cb434c8 commit add0b80
Showing 5 changed files with 11 additions and 10 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -74,4 +74,4 @@ Config/testthat/edition: 3
 Encoding: UTF-8
 LazyData: TRUE
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.2.2
+RoxygenNote: 7.2.3
6 changes: 3 additions & 3 deletions R/unnest_tokens.R
@@ -70,13 +70,13 @@
 #' d
 #'
 #' d %>%
-#' unnest_tokens(word, txt)
+#' unnest_tokens(output = word, input = txt)
 #'
 #' d %>%
-#' unnest_tokens(sentence, txt, token = "sentences")
+#' unnest_tokens(output = sentence, input = txt, token = "sentences")
 #'
 #' d %>%
-#' unnest_tokens(ngram, txt, token = "ngrams", n = 2)
+#' unnest_tokens(output = ngram, input = txt, token = "ngrams", n = 2)
 #'
 #' d %>%
 #' unnest_tokens(chapter, txt, token = "regex", pattern = "Chapter [\\\\d]")
1 change: 1 addition & 0 deletions man/tidytext-package.Rd

Generated file; diff not rendered by default.

6 changes: 3 additions & 3 deletions man/unnest_tokens.Rd

Generated file; diff not rendered by default.

6 changes: 3 additions & 3 deletions vignettes/tidytext.Rmd
@@ -49,12 +49,12 @@ original_books <- austen_books() %>%
 original_books
 ```

-To work with this as a tidy dataset, we need to restructure it as **one-token-per-row** format. The `unnest_tokens` function is a way to convert a dataframe with a text column to be one-token-per-row:
+To work with this as a tidy dataset, we need to restructure it as **one-token-per-row** format. The `unnest_tokens` function is a way to convert a dataframe with a text column to be one-token-per-row. Here let's tokenize to a new `word` column from the existing `text` column:

 ```{r}
 library(tidytext)
 tidy_books <- original_books %>%
-unnest_tokens(word, text)
+unnest_tokens(output = word, input = text)
 tidy_books
 ```
@@ -188,7 +188,7 @@ is a sad sentence, not a happy one, because of negation. The [Stanford CoreNLP](

 ```{r}
 PandP_sentences <- tibble(text = prideprejudice) %>%
-unnest_tokens(sentence, text, token = "sentences")
+unnest_tokens(output = sentence, input = text, token = "sentences")
 ```

 Let's look at just one.

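The edits in this commit swap positional arguments for the named `output` and `input` arguments of `unnest_tokens()`. The following is a minimal sketch of why the two forms should be interchangeable, assuming `dplyr` and `tidytext` are installed. The two-row tibble `d` and the names `by_position` and `by_name` are made up for illustration and do not come from the package documentation.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data (an assumption for this sketch)
d <- tibble(txt = c("Because I could not stop for Death",
                    "He kindly stopped for me"))

# Positional form: after the piped-in data, the first argument is the
# new output column and the second is the existing input column
by_position <- d %>%
  unnest_tokens(word, txt)

# Named form, as used throughout this commit
by_name <- d %>%
  unnest_tokens(output = word, input = txt)

# Both calls are expected to give the same one-token-per-row tibble
identical(by_position, by_name)
```

Naming the arguments does not change behavior; it only makes it explicit which column is created and which is consumed, which is easy to mix up in the positional form.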