Each day, please perform a git pull to get the most up-to-date files and lessons.
- This class session we will not have virtual machines set up, so please forgive my mistake in the video. You will have to install R, R-Studio and git locally on your laptop. Welcome to the class video.
| Day | URL | Topic |
|---|---|---|
| Monday | Setup\*: R, R-Studio & git, Vid2\*\*, Vid3\*\* | Setup, R Basics, String Manipulation, Text Organizations |
| Tuesday | Vid1, Vid2, Vid3 | Text Mining Visuals |
| Wednesday | Vid1, Vid2, Vid3, Vid4, Vid5 | Sentiment Analysis, Unsupervised Methods |
| Thursday | Vid1, Vid2, Vid3, Vid4 | Supervised Methods |
| Friday | Vid1, Vid2, Vid3 | Ethics, Data Sources, Syntactic Parsing & Lemmatization |
\*Vid1 has been replaced with a presentation on R, R-Studio & git setup.

\*\*Due to an editing error, you will have to download these videos instead of streaming them.
| Day | URL |
|---|---|
| Monday | 1st day |
| Tuesday | 2nd day (second half only) :( |
| Wednesday | 3rd day |
| Thursday | 4th day |
| Friday | 5th day |
Each day's lesson is contained in the lesson folder. Each individual lesson folder will contain the following files and folders.
- slides - A copy of the presentation covered in the recording, provided because some students print the slides and take notes.
- A sub folder containing the data we will work through together.
- scripts - Commented scripts demonstrating the lesson's concepts.
- HW - The daily homework will be in this folder.
- You must install R, R-Studio & Git locally on your laptop or, if you have the knowledge to set it up, work from a server instance with all the software installed. [www.rstudio.cloud](https://www.rstudio.cloud) is another option, but the free tier has significant time limitations. Part of day 1 will be devoted to ensuring everyone's instance works correctly.
- If you encounter any errors during setup, don't worry! Please request technical help from Prof K. The library that requires Java and rJava is usually the trickiest, so if you get any errors, try removing it from the code below and rerunning. Installation takes a long time, so if possible please run it before class, at a time you don't need your computer (e.g. overnight). We will work to resolve any issues before class or during Monday's live session.
# Easiest method: run this in your console
# (if pacman is not installed yet, first run install.packages("pacman"))
pacman::p_load(ggplot2, ggthemes, stringi, hunspell, qdap, spelling, tm, dendextend,
wordcloud, RColorBrewer, wordcloud2, pbapply, plotrix, ggalt, tidytext, textdata, dplyr, radarchart,
lda, LDAvis, treemap, clue, cluster, fst, skmeans, kmed, text2vec, caret, glmnet, pROC, textcat,
xml2, stringr, rvest, twitteR, jsonlite, docxtractr, readxl, udpipe, reshape2, openNLP, vtreat, e1071,
lexicon, echarts4r, lsa, yardstick, textreadr, pdftools, tesseract, mgsub, mapproj, ggwordcloud)
# Additionally, we need this package from a different repository (source only)
install.packages('openNLPmodels.en', repos = 'http://datacube.wu.ac.at/', type = 'source')
# If pacman fails, you can install packages individually with base R,
# passing the package names as a character vector built with c()
install.packages(c("lda", "LDAvis", "treemap"))
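Because the full install takes a long time and a few packages may fail, it can help to check afterwards which ones are still unavailable. The sketch below uses only base R; `course_pkgs` is a hypothetical, shortened subset of the list above, so extend it with the remaining names as needed. Note that `requireNamespace()` also returns `FALSE` for a package that is installed but cannot be loaded (for example, when Java is not found).

```r
# Hypothetical subset of the course packages; add the rest from the list above
course_pkgs <- c("ggplot2", "stringi", "tm", "qdap", "tidytext", "dplyr",
                 "lda", "LDAvis", "treemap", "openNLPmodels.en")

# requireNamespace() returns FALSE (quietly) when a package cannot be loaded
is_installed <- vapply(course_pkgs,
                       function(p) requireNamespace(p, quietly = TRUE),
                       logical(1))
missing_pkgs <- course_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing or unloadable packages: ", paste(missing_pkgs, collapse = ", "))
  # To retry the CRAN ones with base R, you could run:
  # install.packages(setdiff(missing_pkgs, "openNLPmodels.en"))
} else {
  message("All listed packages are installed.")
}
```

Re-running this after each fix shows whether the problem package has been resolved without reinstalling everything.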
For most students, these two links have helped them install Java and then make sure R/R-Studio can find it when loading qdap. Keep in mind, you don't have to install qdap to earn a good grade; it is used primarily for a few functions, including polarity().
Once Java is installed, this command from the terminal often resolves the issue:
sudo R CMD javareconf
If this causes hardship, don't worry! It's only a small part of our overall learning, and I will cover an alternative in the live session.
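To see whether R can find Java before (or after) running `R CMD javareconf`, a quick check from the R console may help. This is a minimal sketch: `Sys.which()` is base R, and the second part runs only if rJava is already installed, starting the JVM with `rJava::.jinit()` and asking it for its version. An error in that step usually means R still cannot locate the Java installation.

```r
# Sys.which() returns "" when no `java` executable is found on the PATH
java_path <- Sys.which("java")

if (nzchar(java_path)) {
  message("java executable found at: ", java_path)
} else {
  message("java not found on the PATH; install Java, then run R CMD javareconf")
}

# Only if rJava is installed: start the JVM and report which Java it uses
if (requireNamespace("rJava", quietly = TRUE)) {
  rJava::.jinit()
  java_version <- rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
  message("rJava is using Java ", java_version)
}
```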
| HW | Covered in Class | Due |
|---|---|---|
| HW1 | Monday | Tuesday |
| HW2 | Tuesday | Wednesday |
| HW3 | Wednesday | Thursday |
| HW4 | Thursday | Friday |
| Case | NA | July 1 |
- Read chapter 1 of the book Text Mining in Practice with R.