Scripts for processing and mining (classic) literature and other text data, such as screenplays
README.md
text_mining_dracula.R
text_mining_the_room.R

text-mining-literature

Scripts for processing and mining (classic) literature and PDF files

Description: text_mining_dracula

The script covers

  • downloading and processing public domain works in the Project Gutenberg collection with gutenbergr
  • transforming works into a tidy format
  • mining works by
    • calculating and plotting word frequencies
    • plotting word and comparison clouds
    • conducting sentiment analyses (nrc)

using the example of Bram Stoker's Dracula.
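
A minimal sketch of the tidy-format and word-frequency steps listed above. It assumes the dplyr and tidytext packages, which this README does not name, and may differ from what text_mining_dracula.R actually does. The helper `tidy_words` and the two sample lines are made up for illustration. For the real text, `gutenbergr::gutenberg_download(345)` should return a data frame with a `text` column (345 is assumed to be the Project Gutenberg ID of Dracula).

```r
library(dplyr)
library(tidytext)

# Hypothetical helper (not from the script): one lower-cased word per row,
# with common English stop words removed.
tidy_words <- function(book) {
  book %>%
    unnest_tokens(word, text) %>%
    anti_join(stop_words, by = "word")
}

# Made-up placeholder lines standing in for the downloaded book.
book <- data.frame(
  text = c(
    "The castle stood dark against the night.",
    "The count left the castle at night."
  ),
  stringsAsFactors = FALSE
)

# Word frequencies, most frequent first.
word_counts <- tidy_words(book) %>%
  count(word, sort = TRUE)

print(word_counts)
```

The same tidy data frame can then be joined with a sentiment lexicon, for example `get_sentiments("nrc")`. In recent tidytext versions that lexicon is downloaded through the textdata package on first use.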

Description: text_mining_the_room

Corresponding blog post: https://lhehnke.github.io/notes/2018/01/25/text_mining_the_room

The script covers

  • downloading, importing and processing PDF files in R
  • transforming PDF files into a tidy format
  • mining PDF files by
    • calculating and plotting word frequencies
    • conducting sentiment analyses (nrc; bing)
    • plotting word and comparison clouds
    • visualizing the most frequent positive and negative words (bing sentiments)

using the script of The Room a.k.a. the worst film ever made (directed, produced, written by and starring Tommy Wiseau).

Source: https://theroomscriptblog.files.wordpress.com/2016/04/the-room-original-script-by-tommy-wiseau.pdf
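
A minimal sketch of the bing sentiment step, again assuming dplyr and tidytext rather than reproducing text_mining_the_room.R. The two sample lines are made up. For the real script, `pdftools::pdf_text()` would return one character string per PDF page, which can be put into the `text` column in the same way (pdftools is an assumption, the README does not name it).

```r
library(dplyr)
library(tidytext)

# Made-up placeholder lines standing in for the imported PDF pages.
script <- data.frame(
  text = c(
    "What a wonderful, happy day.",
    "This is terrible and sad, truly terrible."
  ),
  stringsAsFactors = FALSE
)

# Tokenize, keep only words found in the bing lexicon, and count them
# per sentiment (positive / negative), most frequent first.
bing_counts <- script %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE)

print(bing_counts)
```

Grouping `bing_counts` by `sentiment` and keeping the top rows of each group gives the input for a plot of the most frequent positive and negative words.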
