# Wrangling Humanities Data with AI and R – A Very Brief Introduction

In this workshop we will explore how generative AI can help you create R code for working with humanistic data.
This portion of the session focuses on a set of letters written between 1915 and 1919 by H.J.C. (Jack) Peirs, a British officer in World War I.
The data are stored in a spreadsheet (a CSV file) in which each row is one letter. The columns hold the text of the letter (transcribed by a human, not produced by OCR), the date it was sent, and its recipient. Data organized into rows and columns like this is a type of **structured data**.

## What we will do:
1. Load the letters dataset.
2. Verify its structure.
3. Use generative AI to write R code that:
   - visualizes some aspect of the data, and
   - analyzes some aspect of the word usage.

You will generate most of the R code yourself by working with an AI tool of your choice (ChatGPT, Claude, Gemini, etc.).


## How to Run Code in This mybinder.org Notebook

- Click into any `code` cell.
- Press **Shift + Enter** to run the code.

If a cell returns an error:
- Check the error message.
- Ask an AI tool to help fix it, usually by copying and pasting it into the AI tool itself.

You may sometimes see warnings. You can usually ignore them, although they sometimes contain useful information for refining your R script.



## Loading the Letters Dataset

The file **letters.csv** is in the same folder as this notebook. Loading it displays the first few rows along with the column names **text**, **recipient**, and **date** (mostly in YYYY-MM-DD format, although some dates give only a year and month, or only a year). The column names matter when you work with AI to generate R scripts, because the code has to refer to them exactly.

The code first loads the tidyverse, a collection of R packages that includes the `read_csv()` function we use to read the CSV file.

Run the next cell to load it by clicking in the cell and pressing **Shift + Enter**.

Note: If you are already familiar with the dataset, this step is not strictly necessary, but it is a quick way to confirm that the file loads correctly.

In [None]:
# loads the tidyverse R library
library(tidyverse)
# create a variable `letters` for the CSV file
letters <- read_csv("letters.csv", show_col_types = FALSE)
# Peek at the dataset
head(letters)
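Step 2 of the plan is verifying the structure, and `head()` only shows the first rows. The sketch below shows one way you (or an AI tool) might check that the expected columns exist and tally which date formats appear. The helper name `check_letters()` is hypothetical, and the three sample rows are made up (they are not Peirs's letters) so that the block runs on its own.

```r
# Already loaded in this notebook; included so the sketch runs on its own
library(dplyr)
library(stringr)

# Hypothetical helper: summarise the structure of a letters table.
# Assumes columns named text, date, and recipient, as in letters.csv.
check_letters <- function(df) {
  # Stop early if any expected column is missing
  expected <- c("text", "date", "recipient")
  missing_cols <- setdiff(expected, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  df %>%
    # Treat dates as text so partial dates (YYYY-MM, YYYY) are not lost
    mutate(date = as.character(date)) %>%
    # Label each row by the shape of its date string
    mutate(date_format = case_when(
      str_detect(date, "^\\d{4}-\\d{2}-\\d{2}$") ~ "YYYY-MM-DD",
      str_detect(date, "^\\d{4}-\\d{2}$") ~ "YYYY-MM",
      str_detect(date, "^\\d{4}$") ~ "YYYY",
      TRUE ~ "other or missing"
    )) %>%
    # One row per date format, with the number of letters using it
    count(date_format)
}

# Made-up sample rows for illustration only
sample_letters <- tibble(
  text = c("Sample letter one.", "Sample letter two.", "Sample letter three."),
  date = c("1915-06-01", "1916-03", "1917"),
  recipient = c("Recipient A", "Recipient A", "Recipient B")
)

format_counts <- check_letters(sample_letters)
print(format_counts)
```

In the notebook, `check_letters(letters)` would list how many letters use each date format, and `nrow(letters)` gives the total number of letters.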


## Using AI to Write R Code

You will now generate your own R code with the help of a generative AI tool.

### Instructions:
1. Ask an AI tool to write R code for the visualization or analysis you want.  
2. Copy and paste **only the code** into the cell below, underneath the library lines.  
3. Run it by clicking in the cell and pressing **Shift + Enter**.  
4. Sometimes you may get unexpected results. You can use the AI to troubleshoot and refine the code, often by giving it examples of the unexpected results. If errors appear, copy and paste the error messages and ask the AI to help debug.

### Tips for Successful Prompt Writing 
Successful prompt writing requires that you provide the AI tool with some information about your coding environment and your data. In this case, we can say:

**I am using a mybinder.org notebook to run R scripts from a GitHub repository provided by a facilitator. This repository includes a file of correspondence, `letters.csv`, which has the columns `text`, `date` (generally in YYYY-MM-DD format, but sometimes in YYYY-MM and sometimes just YYYY) and `recipient`. Each row in `letters.csv` is a different letter, for a total of 269 letters. The mybinder.org notebook has the following R libraries preloaded:** 

**`tidyverse, readr, dplyr, stringr, lubridate, tidytext, ggplot2, syuzhet, wordcloud, RColorBrewer`**

**I want to create some R scripts that analyze and visualize the dataset. When creating the code, do not re-load libraries. Assume I know nothing about R, so please include comments that help me understand what the different parts of the script are doing.** 

You can send this description before you ask for any code, to prime the AI tool for what comes next. Once it has your environment and data, ask it to write an R script for one (or both) of the tasks below.

### Task A: Create a Visualization Using AI-Generated Code
For this task, we will see whether we can create a data visualization based on the letters. R can draw charts directly (for example with the ggplot2 package), but for interactive visualizations or finer design control it is often easier to have your R code write a CSV file that you then import into a data visualization program such as Flourish.Studio.
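As a sketch of the CSV route, the code below counts letters per year and writes the result to a file you could upload to Flourish.Studio. The function name `letters_per_year()`, the output filename `letters_per_year.csv`, and the sample rows are hypothetical. Taking the first four digits of `date` is one way to get the year that works for all three date formats in `letters.csv`.

```r
# Already loaded in this notebook; included so the sketch runs on its own
library(dplyr)
library(stringr)
library(readr)

# Hypothetical helper: count letters per year.
# The first four digits work for YYYY-MM-DD, YYYY-MM, and YYYY dates.
letters_per_year <- function(df) {
  df %>%
    mutate(year = str_extract(as.character(date), "^\\d{4}")) %>%
    # Drop rows whose date has no recognisable year
    filter(!is.na(year)) %>%
    count(year, name = "letters")
}

# Made-up sample rows for illustration only
sample_letters <- tibble(
  text = c("Sample letter one.", "Sample letter two.", "Sample letter three."),
  date = c("1915-06-01", "1915-09", "1916"),
  recipient = c("Recipient A", "Recipient A", "Recipient B")
)

year_counts <- letters_per_year(sample_letters)

# Write a CSV file (two columns: year, letters) that Flourish can import
write_csv(year_counts, "letters_per_year.csv")
```

In the notebook, `letters_per_year(letters)` would use the real data; the CSV file then appears in the mybinder.org file browser, where you can download it.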

#### Example prompts you might use:
- **Write R code that counts how many letters were written per year and creates a bar chart**
- **Write R code that extracts the 20 most frequent non-stopword words from the text column in letters.csv and creates a word cloud**
- **Write R code that filters letters written between 1914 and 1918 and graphs them by month**
- **Write R code that creates a visualization of how many letters each recipient received**
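For comparison with what your AI tool produces, here is a minimal sketch of the last prompt: a bar chart of letters per recipient built with ggplot2. The sample rows are made up so the block runs on its own; in the notebook you would start from `letters` instead of `sample_letters`.

```r
# Already loaded in this notebook; included so the sketch runs on its own
library(dplyr)
library(ggplot2)

# Made-up sample rows for illustration only
sample_letters <- tibble(
  text = c("Sample letter one.", "Sample letter two.", "Sample letter three."),
  date = c("1915-06-01", "1915-09", "1916"),
  recipient = c("Recipient A", "Recipient A", "Recipient B")
)

# Count how many letters each recipient received, largest first
recipient_counts <- sample_letters %>%
  count(recipient, sort = TRUE)

# Horizontal bar chart; reorder() plus coord_flip() puts the
# recipient with the most letters at the top
recipient_plot <- ggplot(recipient_counts,
                         aes(x = reorder(recipient, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Recipient", y = "Number of letters",
       title = "Letters per recipient")

print(recipient_plot)
```

Horizontal bars are a design choice here: recipient names can be long, and they stay readable on the vertical axis.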



In [None]:
# DO NOT DELETE OR PASTE OVER THESE LINES
# load our R libraries
library(tidyverse)
library(readr)
library(dplyr)
library(stringr)
library(lubridate)
library(tidytext)
library(ggplot2)
library(syuzhet)
library(wordcloud)
library(RColorBrewer)
# Paste your AI-generated code below this line and run it.
# --------------------------------------------------------




### Task B: Analyze the Text Using AI-Generated Code
For this task, we will use text-mining tools to run some basic analyses of the letters' text.
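Most of the prompts below rest on one step: splitting each letter into individual words (tokens) and dropping common stop words such as "the" and "and". A minimal sketch using tidytext's `unnest_tokens()` and its built-in `stop_words` table follows; the two sample sentences are made up, and in the notebook you would start from `letters` instead.

```r
# Already loaded in this notebook; included so the sketch runs on its own
library(dplyr)
library(tidytext)

# Made-up sample rows for illustration only
sample_letters <- tibble(
  text = c("Apples and pears in the basket.", "More apples today."),
  recipient = c("Recipient A", "Recipient B")
)

word_counts <- sample_letters %>%
  # One row per word; lowercases the text and strips punctuation
  unnest_tokens(word, text) %>%
  # Remove words that appear in tidytext's stop word list
  anti_join(stop_words, by = "word") %>%
  # Count the remaining words, most frequent first
  count(word, sort = TRUE)

print(word_counts)
```

Grouping by `recipient` (or by a year column) before `count()` is the usual way to extend this to the "by recipient" and "by year" prompts.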

#### Example prompts you might use:
- **Write R code that creates a table of the 10 most common non-stopword words in the letters to each recipient**
- **Write R code that creates a table of the 10 most common non-stopword words for each year**  
- **Write R code that creates a table of the 25 most common bigrams** (two-word phrases)  
- **Write R code that creates a table of average Syuzhet sentiment analysis scores for each letter recipient** 
- **Write R code that creates a table of Syuzhet sentiment analysis scores showing the highest- and lowest-scoring letter for each year, including the date of each letter**  
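One possible shape for the average-sentiment prompt is sketched below, using syuzhet's `get_sentiment()` with `method = "syuzhet"`, which returns one score per text. The sample rows are made up so the block runs on its own. Because the syuzhet method adds up word-level scores, longer letters can score further from zero, which is worth keeping in mind when comparing recipients.

```r
# Already loaded in this notebook; included so the sketch runs on its own
library(dplyr)
library(syuzhet)

# Made-up sample rows for illustration only
sample_letters <- tibble(
  text = c("I am happy and grateful for your kind letter.",
           "This is a terrible, sad day.",
           "A good and pleasant afternoon."),
  date = c("1915-06-01", "1915-09", "1916"),
  recipient = c("Recipient A", "Recipient A", "Recipient B")
)

# Score each letter; positive numbers lean positive, negative lean negative
scored <- sample_letters %>%
  mutate(sentiment = get_sentiment(text, method = "syuzhet"))

# Average score and number of letters for each recipient
recipient_sentiment <- scored %>%
  group_by(recipient) %>%
  summarise(letters = n(),
            mean_sentiment = mean(sentiment),
            .groups = "drop")

print(recipient_sentiment)
```

In the notebook, replacing `sample_letters` with `letters` would score every letter in the dataset.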



In [None]:
# DO NOT DELETE OR PASTE OVER THESE LINES
# load our R libraries
library(tidyverse)
library(readr)
library(dplyr)
library(stringr)
library(lubridate)
library(tidytext)
library(ggplot2)
library(syuzhet)
library(wordcloud)
library(RColorBrewer)
# Paste your AI-generated code below this line and run it.
# --------------------------------------------------------


