text_mining

This repo contains data from Ted Kwartler's "Text Mining in Practice With R" book.

Code Changes

In December 2017, the tm package changed: readTabular was removed.

An example on page 43 of the book no longer works as written, but the code below corrects the issue.

When using DataframeSource, the first column MUST be named doc_id and the second column MUST be named text. Any other columns are treated as metadata, associated row-wise with each document.

This is easier than manually declaring metadata through a readerControl.
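To illustrate the column convention, here is a minimal sketch on a toy data frame. The data and the author and retweets columns are hypothetical and do not come from the book. It assumes a tm version from after the December 2017 change.

```r
library(tm)

# Hypothetical toy data: doc_id and text come first; the remaining
# columns (author, retweets) are illustrative metadata only.
docs <- data.frame(
  doc_id   = c("1", "2"),
  text     = c("first example tweet", "second example tweet"),
  author   = c("alice", "bob"),
  retweets = c(3, 0),
  stringsAsFactors = FALSE
)

# No custom reader is needed; DataframeSource picks up doc_id and text.
corpus <- VCorpus(DataframeSource(docs))

content(corpus[[1]])  # text of the first document
meta(corpus)          # data frame holding the extra columns
```

Calling meta(corpus) should return the extra columns as a data frame with one row per document.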

Page 43 Example

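# text.df and clean.corpus() are assumed to be defined earlier, as in the book.

# No longer works: the ID column must now be named doc_id.
#tweets<-data.frame(ID=seq(1:nrow(text.df)),text=text.df$text)
tweets<-data.frame(doc_id=seq_len(nrow(text.df)),text=text.df$text)

# No longer works: readTabular was removed from tm.
#meta.data.reader <- readTabular(mapping=list(content="text", id="ID"))
#corpus <- VCorpus(DataframeSource(tweets), readerControl=list(reader=meta.data.reader))

# DataframeSource reads doc_id and text directly, so no custom reader is needed.
corpus <- VCorpus(DataframeSource(tweets))
corpus<-clean.corpus(corpus)
corpus[[103]][1] # content of document 103
corpus[[103]][2] # metadata of document 103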
