
simecek/capek

capek

An R Data Package with Karel Čapek's Novels

This package provides access to the full texts of six Czech novels by Karel Čapek, the Czech writer best known for his play R.U.R., which introduced the word robot. It is closely modeled on Julia Silge's janeaustenr package. The package is intended to provide a non-English corpus for experimenting with tidy text analysis.

The plain text of each novel was downloaded from the Municipal Library of Prague:

  • tovarna_na_absolutno: Továrna na absolutno (The Absolute at Large), published in 1922
  • krakatit: Krakatit, published in 1922
  • hordubal: Hordubal, published in 1933
  • povetron: Povětroň (Meteor), published in 1934
  • obycejny_zivot: Obyčejný život (An Ordinary Life), published in 1934
  • valka_s_mloky: Válka s mloky (War with the Newts), published in 1936

There is also a function capek_books() that returns a tidy data frame of all six novels.

Installation

To install the package from GitHub, use the following:

# install.packages("devtools")  # uncomment if devtools is not installed yet
library(devtools)
install_github("simecek/capek")
library(capek)

Usage

library(capek)
library(dplyr)

# Count the number of text lines in each novel
capek_books() %>%
     group_by(book) %>%
     summarise(total_lines = n())
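To go one step further toward tidy text analysis, the lines can be split into individual words with the tidytext package. This is a minimal sketch, assuming that capek_books() stores each line of a novel in a column named text alongside book (as janeaustenr's austen_books() does); if the column is named differently, check names(capek_books()) and adjust the call.

```r
# Sketch: assumes capek_books() has a `text` column (one line per row)
# next to `book`, mirroring janeaustenr's austen_books().
library(capek)
library(dplyr)
library(tidytext)

# Split each line into lowercase word tokens: one row per word
tidy_capek <- capek_books() %>%
  unnest_tokens(word, text)

# Count how often each word occurs in each novel, most frequent first
word_counts <- tidy_capek %>%
  count(book, word, sort = TRUE)

word_counts
```

Note that tidytext's built-in stop_words lexicon covers English only, so the most frequent tokens here will be Czech function words; filtering them out requires a Czech stop-word list from another source.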

About

Six novels by Karel Čapek, prepared for analysis in R/tidytext
