# Sample Text Analysis: Word Frequency in Shakespeare

This notebook demonstrates basic text analysis techniques using Python. You'll use this for the formative assessment by embedding specific cells in your reflection notes.

[[start]]

In [1]:
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
from collections import Counter
import re

print("Libraries imported successfully!")

Libraries imported successfully!


In [3]:
# Sample text from Hamlet's "To be or not to be" soliloquy
hamlet_text = """
To be, or not to be, that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles
And by opposing end them. To die—to sleep,
No more; and by a sleep to say we end
The heartache and the thousand natural shocks
That flesh is heir to: 'tis a consummation
Devoutly to be wished. To die, to sleep;
To sleep, perchance to dream—ay, there's the rub:
For in that sleep of death what dreams may come,
When we have shuffled off this mortal coil,
Must give us pause.
"""

print(f"Text loaded: {len(hamlet_text.split())} words")

Text loaded: 105 words


In [4]:
# Clean and tokenize the text
def clean_text(text):
    # Convert to lowercase and remove punctuation
    text = re.sub(r'[^\w\s]', '', text.lower())
    # Split into words
    words = text.split()
    return words

words = clean_text(hamlet_text)
print(f"Cleaned text: {len(words)} words")
print(f"First 10 words: {words[:10]}")

Cleaned text: 105 words
First 10 words: ['to', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']
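
A caveat on the cleaning step: because `clean_text` deletes punctuation outright, words joined by an em-dash with no surrounding spaces (for example "die—to" in the soliloquy) end up fused into a single token. The following is a minimal sketch of one alternative, assuming you would rather treat dashes as separators and keep apostrophes that sit inside a word. The function name `tokenize_keep_apostrophes` is hypothetical and not part of this notebook, and the pattern only handles unaccented ASCII letters.

```python
import re

def tokenize_keep_apostrophes(text):
    """Lowercase the text and return a list of word tokens.

    Dashes and other punctuation act as separators. An apostrophe is kept
    only when it sits between letters (as in "there's"), so a leading
    apostrophe (as in "'tis") is dropped.
    """
    # A token is a run of letters, optionally followed by apostrophe + letters.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

sample = "To die—to sleep; perchance to dream—ay, there's the rub"
print(tokenize_keep_apostrophes(sample))
```

Swapping this tokenizer in would change the counts in the later cells, so the outputs recorded in this notebook correspond to the original `clean_text`.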


In [5]:
# Count word frequencies
word_counts = Counter(words)
print("Most common words:")
for word, count in word_counts.most_common(10):
    print(f"{word}: {count}")

Most common words:
to: 12
the: 6
sleep: 5
and: 4
be: 3
that: 3
of: 3
a: 3
or: 2
is: 2


In [5]:
# Create a visualization
top_words = dict(word_counts.most_common(8))

plt.figure(figsize=(10, 6))
# Pass plain lists rather than dict views to the plotting call
plt.bar(list(top_words.keys()), list(top_words.values()))
plt.title("Most Frequent Words in Hamlet's Soliloquy")
plt.xlabel("Words")
plt.ylabel("Frequency")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print("Visualization complete!")

Visualization complete!


## Interpretation

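This simple frequency analysis reveals three patterns:

1. **"To" is dominant** - with 12 occurrences it is twice as frequent as the next word ("the", 6), largely because of the infinitive constructions ("to be", "to suffer", "to die", "to sleep") that structure Hamlet's philosophical reasoning
2. **"Sleep" appears frequently** - its 5 occurrences make it the most common content word, and it is the central metaphor for death in this passage
3. **Function words dominate** - nine of the ten most frequent words are function words (treating forms of "be" as such), while thematically important content words such as "death" and "dreams" each appear only once, so the thematic weight is not visible in raw counts alone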

**Digital Humanities Insight**: Even basic computational analysis can reveal linguistic patterns that support close reading interpretations.
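
One common way to bring content words to the surface is to filter out function words before counting. The sketch below assumes a deliberately tiny, hand-picked stop-word set; `STOP_WORDS` and `content_word_counts` are hypothetical names introduced for illustration, not part of this notebook or of any library.

```python
from collections import Counter

# Hypothetical, deliberately small stop-word set (illustration only).
STOP_WORDS = {"to", "the", "and", "be", "that", "of", "a", "or", "is",
              "in", "by", "we", "not"}

def content_word_counts(tokens, stop_words=STOP_WORDS):
    """Count only the tokens that are not in the stop-word set."""
    return Counter(t for t in tokens if t not in stop_words)

# Toy token list; in the notebook you could pass the `words` list instead.
tokens = "to die to sleep to sleep perchance to dream".split()
counts = content_word_counts(tokens)
print(counts.most_common(1))  # prints [('sleep', 2)]
```

The choice of stop words is itself an interpretive decision, so it is worth recording which list you used alongside any counts you report.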

In [6]:
# Analysis summary
total_words = len(words)
unique_words = len(word_counts)
# Lexical diversity (type-token ratio): unique words / total words
lexical_diversity = unique_words / total_words
# Look up the most frequent word once instead of calling most_common() twice
top_word, top_count = word_counts.most_common(1)[0]

print("Analysis Summary:")
print(f"Total words: {total_words}")
print(f"Unique words: {unique_words}")
print(f"Lexical diversity: {lexical_diversity:.2f}")
print(f"Most repeated word: '{top_word}' ({top_count} times)")

Analysis Summary:
Total words: 105
Unique words: 67
Lexical diversity: 0.64
Most repeated word: 'to' (12 times)
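
The lexical diversity figure above is a type-token ratio: the number of distinct words divided by the total number of words. As a small worked sketch on a toy phrase (the function name `type_token_ratio` is an illustrative assumption, not part of this notebook):

```python
def type_token_ratio(tokens):
    """Return unique tokens / total tokens, or 0.0 for an empty list."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

toy = "to be or not to be".split()
# 4 distinct words (to, be, or, not) out of 6 tokens
print(f"{type_token_ratio(toy):.2f}")  # prints 0.67
```

The ratio tends to fall as a text gets longer, because common words keep repeating, so it is best compared only across passages of similar length.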
