# Wrangling Humanities Data with AI and R – A Very Brief Introduction

In this workshop we will explore how generative AI (ChatGPT, Copilot, etc.) can help you create R code for working with humanistic data.
This session focuses on a set of letters written between 1915 and 1919 by H.J.C. (Jack) Peirs, a British World War I officer.
This is **structured data**, meaning the various elements are in pre-defined rows and columns in a spreadsheet.

### What we will do:
1. Load the letters dataset.
2. Explore its structure.
3. Use generative AI to write R code that:
   - visualizes the data,
   - analyzes word usage,
   - or performs any analysis you choose.
4. Reflect on how AI + R can support humanities research.

You will **generate most of the R code yourself** by asking an AI tool.


## How to Run Code in This Notebook

- Click any code cell.
- Press **Shift + Enter** to run it.
- New cells can be added with the "+" button above.

If a cell returns an error:
- Check the error message.
- Ask an AI tool to help fix it.
- Try updating column names or syntax accordingly.

Errors are normal — they’re part of the process!


In [None]:
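library(tidyverse)  # attaches ggplot2, dplyr, readr, stringr, and more
library(readr)      # CSV import (already attached by tidyverse; listed for clarity)
library(dplyr)      # data manipulation
library(stringr)    # string helpers
library(lubridate)  # date parsing
library(tidytext)   # tidy text mining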

sessionInfo()
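
If one of the `library()` calls above fails, the package is probably not installed. Here is a minimal sketch of how you might check, using only base R. The variable names `needed` and `not_installed` are illustrative and not part of the workshop materials.

```r
# Packages loaded in the cell above
needed <- c("tidyverse", "readr", "dplyr", "stringr", "lubridate", "tidytext")

# Which of them are absent from this R installation?
not_installed <- setdiff(needed, rownames(installed.packages()))

if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install them
} else {
  message("All packages are installed.")
}
```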


# Loading the Letters Dataset

The file `letters.csv` should be in the same folder as this notebook.

Run the next cell to load it.



In [None]:
letters <- read_csv("letters.csv", show_col_types = FALSE)

# Peek at the dataset
head(letters)
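
If `read_csv()` reports that the file does not exist, the notebook's working directory is probably not the folder that holds the CSV. The following is one hedged way to check before loading. `load_letters()` is a hypothetical helper introduced here for illustration, not a function from any package.

```r
library(readr)

# Hypothetical helper: returns the data frame, or NULL if the file is missing
load_letters <- function(path) {
  if (!file.exists(path)) {
    message("Could not find '", path, "' in ", getwd())
    return(NULL)
  }
  df <- read_csv(path, show_col_types = FALSE)
  # problems() lists values readr could not parse as the guessed column type
  if (nrow(problems(df)) > 0) print(problems(df))
  df
}

letters <- load_letters("letters.csv")
```

Note that a data frame named `letters` masks R's built-in `letters` vector (the lowercase alphabet) for the rest of the session. That is harmless in this notebook, but AI-generated code that relies on the built-in vector would behave differently.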


# Inspecting the Data Structure

Before generating any R code, let's examine what we're working with:

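- What columns exist? In this case, you should see **text**, **date** (in YYYY-MM-DD format), and **recipient** columns. If a name is spelled differently in your copy of the file, use the spelling that `names(letters)` prints.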

Run the next cell to explore the structure.


In [None]:
names(letters)
summary(letters)

# Optional: attempt to parse the date column
# (Adjust if your date column has a different name)
letters <- letters %>%
  mutate(date = ymd(date))

summary(letters$date)
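
`ymd()` turns any value it cannot read into `NA` (with a warning) rather than stopping, so it is worth checking how many dates failed to parse. The sketch below shows the pattern on three made-up values, not rows from the Peirs letters. The names `toy`, `parsed`, and `failed` are illustrative.

```r
library(dplyr)
library(lubridate)

# Made-up values for illustration only; the last one is deliberately not a date
toy <- data.frame(
  date = c("1915-10-03", "1916-07-01", "undated"),
  stringsAsFactors = FALSE
)

toy_parsed <- toy %>%
  mutate(parsed = ymd(date, quiet = TRUE))  # quiet = TRUE silences the parse warning

# Rows where the original value was present but parsing produced NA
failed <- toy_parsed %>%
  filter(is.na(parsed) & !is.na(date))

failed
```

Keeping the parsed result in a separate column, as here, lets you inspect the original value of any row that failed.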


# Using AI to Write R Code

You will now generate your own R code with the help of a generative AI tool.

### Example prompts you might use:
- **“Write R code that counts how many letters were written per year and creates a bar chart using ggplot2.”**
- **“Write R code that extracts the 20 most frequent non-stopword words from the `text` column in letters.csv.”**
- **“Write R code that filters letters written between 1914 and 1918 and graphs them by month.”**
- **“Here is the head of my dataset. Tell me which column contains the text and write tidytext code to analyze it.”**

### Important:
- Paste **only the code** into the next blank cell.
- Run your code.
- If it fails → copy the error message → ask your AI tool to fix it.

You are in full control of the experiment.
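
For orientation, the first prompt might come back with something like the following. This is a sketch of one plausible answer, not the only correct one. It runs on five made-up dates (`toy_letters`) so that it works on its own. To use it in the workshop, replace `toy_letters` with `letters`, assuming the `date` column has already been parsed.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Made-up dates for illustration only, not taken from the Peirs letters
toy_letters <- data.frame(
  date = as.Date(c("1915-10-03", "1915-12-24", "1916-07-01",
                   "1917-03-15", "1917-11-03"))
)

# Count letters per calendar year
letters_per_year <- toy_letters %>%
  mutate(year = year(date)) %>%
  count(year, name = "n_letters")

# Bar chart; factor(year) gives one bar per year with no gaps
p <- ggplot(letters_per_year, aes(x = factor(year), y = n_letters)) +
  geom_col() +
  labs(x = "Year", y = "Number of letters", title = "Letters per year")

p
```

A quick check worth making on any AI-generated count is that the counts add up to the number of rows: `sum(letters_per_year$n_letters) == nrow(toy_letters)`.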


# Task 1: Create a Visualization Using AI-Generated Code

Choose one visualization task:

1. **Letters per year**  
2. **Letters per month**  
3. **Letters per recipient**  
4. **Any pattern you choose**

### Instructions:
1. Ask an AI tool to write the R code for the visualization you choose.  
2. Paste the code in the cell below.  
3. Run it.  
4. If errors appear, copy the error message (along with the code that produced it) and ask the AI to help debug.


In [None]:
# Paste AI-generated visualization code here and run it.



# Task 2: Analyze the Text Using AI-Generated Code

Choose one of these text-analysis tasks:

1. **Find the most common words** (excluding stopwords)  
2. **Find the most common bigrams** (two-word phrases)  
3. **Look at word usage during a specific time period**  
4. **Any analysis you propose**

### Instructions:
1. Ask an AI tool for R code that performs the analysis.  
2. Paste the generated code into the next code cell.  
3. Run it.  
4. Ask AI for help if something breaks — that’s part of the learning!
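
As a reference point for options 1 and 2, here is a minimal tidytext sketch. The two sentences in `toy_letters` are invented for illustration and are not quotations from the Peirs letters. On the real data, replace `toy_letters` with `letters`, assuming the text column is called `text`.

```r
library(dplyr)
library(tidytext)

# Invented sentences for illustration only
toy_letters <- data.frame(
  text = c("The mud in the trenches is dreadful.",
           "We marched through the mud again today."),
  stringsAsFactors = FALSE
)

# Option 1: most common words, with stopwords removed
word_counts <- toy_letters %>%
  unnest_tokens(word, text) %>%            # one lowercase word per row
  anti_join(stop_words, by = "word") %>%   # drop words found in tidytext's stop_words
  count(word, sort = TRUE)

head(word_counts, 20)

# Option 2: most common bigrams (two-word phrases), stopwords kept
bigram_counts <- toy_letters %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  count(bigram, sort = TRUE)

head(bigram_counts, 20)
```

The bigram counts here keep stopwords, so phrases such as "the mud" rank highly. Whether to filter them is a research decision rather than a technical one.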


In [None]:
# Paste AI-generated text-analysis code here and run it.



# Reflection

Take a moment to consider:

- What did AI make easier?
- What did you still need to understand or decide as a humanist?
- What kinds of research questions might benefit from these methods?
- What problems or risks should we watch for when using AI-generated code?

AI can help with:
- Generating code quickly  
- Fixing errors  
- Suggesting methods

But YOU provide:
- Interpretation  
- Historical context  
- Critical judgment  
- Research questions


# Thank You!

You now have a working environment where:

- You load humanities data
- You use AI to write R scripts
- You run the scripts in a reproducible environment
- You interpret outputs using your scholarly expertise

### Optional next steps:
- Sentiment analysis (`syuzhet`)
- Topic modeling (`topicmodels`, `stm`)
- Named-entity recognition
- Comparing multiple authors or time periods

