Text analysis (also called "text mining" or "distant reading," a phrase coined in 2000 by Franco Moretti) is a term for research techniques that analyze texts based on their statistical patterns. This can be as simple as counting the raw frequencies of words or as involved as statistical "modeling" of textual language.
We've already dipped our toes into text analysis when we used command-line tools to find keywords in a set of texts and when we used a Python script to extract word frequencies from inaugural addresses. Other common, out-of-the-box tools for lightweight text analysis include Voyant Tools, a web-based environment for analyzing patterns in small collections of text files (for example, tracing word counts over a corpus). Text analysis often draws on methods from natural language processing (NLP), "a field that explores the interactions between computers and human (natural) languages" (Digital Humanities Literary Guidebook).
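As a reminder of what "counting raw frequencies" looks like in practice, here is a minimal sketch in Python. It is not the script we used in class: the function name `word_frequencies` and the sample sentence are made up for illustration, and the tokenizing rule (lowercase everything, keep runs of letters with an optional internal apostrophe) is just one simple assumption among many possible ones.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its raw count."""
    # Lowercase the text, then pull out runs of letters, allowing one
    # internal apostrophe so that words like "don't" stay whole.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)


# A made-up sample sentence, used only for illustration.
sample = "The cat sat on the mat. The dog sat too."
freqs = word_frequencies(sample)

# Print the two most frequent words with their counts.
for word, count in freqs.most_common(2):
    print(word, count)
```

Even this tiny example involves choices (what counts as a word? does capitalization matter?) that shape the results, which is worth keeping in mind as you read about more advanced methods.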
For today's homework, we'll explore some of the more complicated methods that scholars are using to analyze texts:
- Explore existing tools for text analysis
- Read about more advanced methods of "distant reading" collections of texts
- Sign up for a GitHub account (which we'll be using in our in-class work)
Complete the simple walkthrough of Voyant Tools
- Choose one of the following tutorials:
- Matthew Lavin, "Analyzing Documents with tf-idf" (2019)
- Zoë Wilkinson Saldaña, "Sentiment Analysis for Exploratory Data Analysis" (2018)
- Digital Research Institute (DRI), "Introduction to Text Analysis with Python and NLTK" (2018)
- Write down one question that you have after reading the tutorial.
NOTE: Your goal should be to read through the tutorial to get a sense of what your chosen method does, along with its limitations and advantages. This means:
- You don't need to complete the actual steps of the tutorial (unless you want to)
- You should read the tutorial with an eye towards what this method does and how it works (in plain English, not the minutiae of the Python code)
- Ex: What sorts of research questions can we (and can't we) responsibly use it to explore, and what other considerations might we have to take into account when using it?
Read Imran Haider's (5-minute read) "A Dead Simple Intro to GitHub for the Non-Technical"
Finally, sign up for a GitHub account:
- Go to github.com and click "Sign up"
Try to finish the assignment we worked on in class and as part of your homework last Thursday. If you find yourself getting stuck, take a look at my notebook (interactive Binder version here).
I'm not requiring you to turn this in, but if you'd like feedback, include it as part of your Week 8 homework.
If you want further guidance on downloading files from our course website, here are my instructions on how to download web resources using wget.
NOTE: To view the interactive visualization, you will have to open the Jupyter notebook in "Jupyter Notebook," not JupyterLab. To do this, launch Anaconda Navigator and, from the menu, select "Jupyter Notebook" (not JupyterLab).