
Text Sampling

Author: Nicolas Pröllochs
License: MIT

The textsampler R-Package samples texts from a predefined text source. This implementation uses tidy data principles and works seamlessly with existing text mining packages such as tm, tidytext, and rvest. In addition, it supplies multiple built-in text datasets for a hassle-free sampling of words, sentences, and texts.

Installation

You can easily install the latest development version of textsampler via GitHub.

# Install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("nproellochs/textsampler")

Usage

This section shows how to sample text from a predefined text source. First, load the textsampler package.

library(textsampler)

Quick demonstration

The following example shows how to sample sentences from a built-in database of texts. The result is a data frame containing five random sentences.

# Sample five sentences
sample_text(n = 5, type = "sentences")
#> # A tibble: 5 x 3
#>      Id Text                                                         Length
#>   <int> <chr>                                                         <int>
#> 1   897 the pizza selections are good.                                    5
#> 2   264 good service, very clean, and inexpensive, to boot!               8
#> 3   368 would come back again if i had a sushi craving while in veg~     13
#> 4   569 an hour... seriously?                                             3
#> 5   904 and the drinks are weak, people!                                  6

Example: Sampling text from built-in text source

The following example shows how to sample words from a built-in text source (“english_words”). The result is a data frame containing five random words.

# Sample five words from english_words
sample_text(n = 5, type = "words", source = "english_words")
#> # A tibble: 5 x 3
#>      Id Text        Length
#>   <int> <chr>        <int>
#> 1  9440 cuisin           1
#> 2 42046 trojan           1
#> 3 44211 upper            1
#> 4 30925 prediagnost      1
#> 5 29442 peter            1

Example: Sampling text from website

The textsampler R-package works with tidy tools and can easily be combined with existing packages such as the rvest R-package. The following example shows how to sample texts from a website. Specifically, the example samples 15 famous quotes by Julius Caesar.

library(rvest)
library(tibble)  # provides enframe()

# Scrape the quote texts from the page, then sample 15 of them
read_html("https://www.brainyquote.com/authors/julius-caesar-quotes/") %>%
  html_nodes(xpath = ".//a[contains(@class, 'b-qt qt_')]") %>%
  html_text() %>% 
  enframe() %>%  # character vector -> tibble with columns `name` and `value`
  sample_text(n = 15, source = ., input = "value", min_length = 1, max_length = 40,
              shuffle = FALSE, clean = TRUE)
#> # A tibble: 15 x 3
#>       Id Text                                                        Length
#>    <int> <chr>                                                        <int>
#>  1     1 experience is the teacher of all things.                         7
#>  2     2 it is easier to find men who will volunteer to die, than t~     23
#>  3     3 it was the wont of the immortal gods sometimes to grant pr~     38
#>  4     4 cowards die many times before their actual deaths.               8
#>  5     5 if you must break the law, do it to seize power: in all ot~     17
#>  6     7 i came, i saw, i conquered.                                      6
#>  7     8 it is not these well-fed long-haired men that i fear, but ~     19
#>  8     9 i have lived long enough both in years and in accomplishme~     11
#>  9    10 i had rather be first in a village than second at rome.         12
#> 10    11 i love the name of honor, more than i fear death.               11
#> 11    12 no one is so brave that he is not disturbed by something u~     13
#> 12    13 men willingly believe what they wish.                            6
#> 13    14 i have lived long enough to satisfy both nature and glory.      11
#> 14    15 i have always reckoned the dignity of the republic of firs~     16
#> 15    16 as a rule, men worry more about what they can't see than a~     16

Example: Sampling text from vector source

The textsampler R-package can be used to sample text from a vector source. The following example samples five random sentences from a book downloaded with the gutenbergr R-package.

library(gutenbergr)
full_text <- gutenberg_download(5314)

textsampler::sample_text(n = 5, source = full_text$text[1:1000], type = "sentences", shuffle = TRUE)
#> # A tibble: 5 x 3
#>      Id Text                                                         Length
#>   <int> <chr>                                                         <int>
#> 1    90 59 frederick and catherine (der frieder und das catherliesc~      9
#> 2   281 "thou wilt have, dear frog,\" said she--\"my clothes, my pe~     13
#> 3   245 legend 4 poverty and humility lead to heaven (armut und dem~     14
#> 4   736 "\"one of this kind has never come my way before.\""             10
#> 5   453 they set                                                          2
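
To make the idea of sampling sentences from a character vector more concrete, here is a minimal base R sketch. It is not the implementation used by textsampler: the helper `sample_sentences()` and its `seed` argument are hypothetical, the sentence splitting is a naive rule (break after `.`, `!` or `?`), and `Length` is assumed to be a whitespace-based word count.

```r
# Hypothetical helper for illustration only; NOT part of textsampler.
# Joins raw lines, splits them into sentences, and samples n of them.
sample_sentences <- function(lines, n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)

  # Join the lines so that sentences spanning a line break are kept together
  text <- paste(lines, collapse = " ")

  # Naive sentence boundary: ., ! or ? followed by whitespace
  marked <- gsub("([.!?])\\s+", "\\1\n", text)
  sentences <- trimws(unlist(strsplit(marked, "\n", fixed = TRUE)))
  sentences <- sentences[nzchar(sentences)]

  # Draw at most n sentences and keep them in document order
  n <- min(n, length(sentences))
  idx <- sort(sample.int(length(sentences), n))

  data.frame(
    Id = idx,
    Text = tolower(sentences[idx]),
    Length = lengths(strsplit(sentences[idx], "\\s+")),  # word count
    stringsAsFactors = FALSE
  )
}

lines <- c("The cat sat. The dog", "barked loudly! Was it late?")
sample_sentences(lines, n = 2, seed = 1)
```

Real book text contains headings, abbreviations, and quotations that a rule this simple will split incorrectly, which is one reason to rely on the package rather than a hand-rolled splitter.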

Example: Sampling text data with specific text characteristics

The textsampler R-package allows one to sample texts with specific text characteristics. The following example samples five sentences from Amazon reviews, all of which have a maximum length of 5 words and contain the word ‘great’.

sample_text(n = 5, source = "amazon_sentences", type = "sentences", 
            max_length = 5, word_list = c("great"))
#> # A tibble: 5 x 3
#>      Id Text                          Length
#>   <int> <chr>                          <int>
#> 1   557 great product for the price!.      5
#> 2   291 great phone.                       2
#> 3   474 great software for motorolas.      4
#> 4   793 great phone.                       2
#> 5   234 great sound and service.           4
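
The filtering step can be pictured as follows. This is a base R sketch under stated assumptions, not the package's code: `filter_and_sample()` is a hypothetical helper, words are counted by splitting on anything that is not a letter or an apostrophe, and a text is assumed to match when it contains at least one entry of `word_list` as a whole word. How textsampler itself tokenizes and matches is not documented here.

```r
# Hypothetical helper for illustration only; NOT part of textsampler.
# Keeps texts with at most max_length words that contain a listed word,
# then samples up to n of the remaining candidates.
filter_and_sample <- function(texts, n, max_length = Inf,
                              word_list = character(0)) {
  # Tokenize into lower-case words (assumed tokenization rule)
  words <- strsplit(tolower(texts), "[^a-z']+")
  words <- lapply(words, function(w) w[nzchar(w)])
  n_words <- lengths(words)

  # Length filter
  keep <- n_words <= max_length

  # Word filter: at least one word from word_list must be present
  if (length(word_list) > 0) {
    has_word <- vapply(words,
                       function(w) any(w %in% tolower(word_list)),
                       logical(1))
    keep <- keep & has_word
  }

  # Sample indices of the candidates (sample.int avoids the sample(x) pitfall
  # when only a single candidate is left)
  candidates <- which(keep)
  idx <- candidates[sample.int(length(candidates),
                               min(n, length(candidates)))]

  data.frame(Id = idx, Text = texts[idx], Length = n_words[idx],
             stringsAsFactors = FALSE)
}

reviews <- c("Great phone.",
             "Battery died after two days of light use.",
             "Not great, not terrible.",
             "Works fine.",
             "It is a great product for the price overall!")
filter_and_sample(reviews, n = 5, max_length = 5, word_list = "great")
```

Because only two of the five reviews pass both filters, the result has two rows even though `n = 5` was requested; the sketch returns fewer rows rather than sampling with replacement.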

Contributing

If you experience any difficulties with the package, or have suggestions, or want to contribute directly, you have the following options:

License

textsampler is released under the MIT License.

Copyright (c) 2019 Nicolas Pröllochs
