# Check-In #1: Literary Form


**This assignment has multiple parts. Make sure you read the assignment very carefully and address every part.**

Bring your completed notebook to class on Wednesday, October 11. You will work on it more in class, and then submit the final notebook by the end of class.

## Poems for Children, Poems for Adults

Poems and children's literature are very different literary forms, although they are not mutually exclusive (there are, for example, poems for children). One question literary scholars often ask is which morphological features (features of the text and the words themselves) distinguish one literary form from another. While you certainly will not answer this question definitively, you will explore it using two example texts that I provide. You will use built-in Python functions, including string and list functions, to analyze two potential morphological features that may distinguish literary forms: word length and pronoun use.

The example texts were chosen strategically to hold as many variables constant as possible, allowing you to focus more precisely on the role of word length and pronoun use in different literary forms. Both example books are collections of poems by women, written within a few decades of each other. One book introduces poetry to children, and the other is aimed at adults.

*Conversations Introducing Poetry* was written by Charlotte (Turner) Smith; the edition used here is dated 1819. As the title suggests, this book was meant to introduce children to the genre of poetry.

*Poems* is a collection of poems by Anna Laetitia (Aikin) Barbauld, published in 1773. These poems are aimed at adults.

Both of these texts come to us from [Northeastern's Women Writers Project](http://www.wwp.northeastern.edu/).

Your task is to determine whether word length distinguishes these two texts from each other, and whether their use of pronouns differs. You will use both quantitative and qualitative methods to do so.

### Part 1: Word Length

Does word length distinguish one literary form from another?

One hypothesis is that poems written for children will use shorter words that are easier to understand. If so, we would expect Smith's book to contain proportionally fewer long words than Barbauld's book.


Alternatively, children's books often use made-up words that are colorful and fun to say. For example:
>["In the world of Dr. Seuss’s *The Lorax*, to hear the tale of the Lorax, you have [to] pay the “Once-ler,” after which, whispering to you through a “snergelly” hose, he will paint you a word-picture of “truffula” trees, and “Bar-ba-loots” frolicking around, and the horrible garments called “thneeds” that started a path of environmental destruction."](https://www.theatlantic.com/science/archive/2015/11/the-secret-to-dr-seusss-made-up-words/417405/)

A second hypothesis, then, is that poems for children will use proportionally more long words than poems written for adults.

A third hypothesis is that each form will use a similar proportion of long words, but they will use very different *types* of long words.

Your task is to (1) quantify how word length differs (or, perhaps, is similar) overall between our two example texts, and (2) examine the long words in more detail to determine whether there are qualitative differences between the two forms.

In the first part of this assignment, you should:

1. Include one or two cells explaining what you expect to find and why.
2. Calculate and print the average word length in each text.
3. Calculate the proportion of total words in each text that are longer than four (4) characters.
4. Calculate the proportion of total words in each text that are longer than ten (10) characters.
5. Print some of these long words from each text to examine them qualitatively. Do you see differences that word length on its own is not capturing?
6. Include one or two cells, at the end or throughout, interpreting and discussing all of this output. What have you learned? What next steps would you take if you wanted to pursue this further?
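
Steps 2 through 5 can be sketched with built-in string and list functions. This is a minimal sketch, not the required approach: the helper names `tokenize`, `word_length_summary`, and `long_words`, the extra punctuation characters, and the short `sample` string are all hypothetical stand-ins. In your notebook you would pass `smith_string` and `barbauld_string` to `tokenize` instead of `sample`.

```python
import string

# ASCII punctuation plus curly quotes and the em dash (assumed to appear in the texts)
PUNCT = string.punctuation + "\u201c\u201d\u2018\u2019\u2014"

def tokenize(text):
    """Lowercase the text, split on whitespace, strip punctuation from
    both ends of each token, and drop tokens that end up empty."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(PUNCT)
        if word:
            tokens.append(word)
    return tokens

def word_length_summary(tokens, thresholds=(4, 10)):
    """Return the average word length and, for each threshold n,
    the proportion of tokens longer than n characters."""
    total = len(tokens)
    if total == 0:
        raise ValueError("no tokens to analyze")
    summary = {"average_length": sum(len(w) for w in tokens) / total}
    for n in thresholds:
        longer = [w for w in tokens if len(w) > n]
        summary[f"prop_longer_than_{n}"] = len(longer) / total
    return summary

def long_words(tokens, n=10, limit=20):
    """Return up to `limit` distinct tokens longer than n characters,
    in order of first appearance, for qualitative inspection."""
    found = []
    for w in tokens:
        if len(w) > n and w not in found:
            found.append(w)
            if len(found) == limit:
                break
    return found

# Hypothetical sample text standing in for smith_string or barbauld_string
sample = "The butterfly, unconscious of its transformation, flutters; the child wonders!"
tokens = tokenize(sample)
print(word_length_summary(tokens))
print(long_words(tokens))
```

Note that `strip` only removes punctuation from the ends of a token, so hyphenated words and words with internal apostrophes stay whole; whether that is what you want is one of the decisions to document under *Details and Reproducibility*.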

### Part 2: Pronouns

[Research has shown](https://www.researchgate.net/publication/253291274_Gender_Differences_in_Language_Use_An_Analysis_of_14000_Text_Samples) that women use more personal pronouns than men, while men use more possessive pronouns. Both of our texts are written by women, but what about literary form? Will literature written for children use different types of pronouns than literature written for adults?

Again, there are two possible hypotheses. On the one hand, both of these texts were written by women, so we would expect them to use pronouns in similar ways. On the other hand, women might use more personal pronouns than men because women more often write children's books, and children's books as a literary form may contain more personal pronouns. If this is the case, we would expect Smith's book to contain more personal pronouns than Barbauld's book, and, perhaps, Barbauld's book to contain comparatively more possessive pronouns.


For our purposes, here is a list of personal and possessive pronouns:

>Personal pronouns: I, you, he, she, it, we, they, what, who, me, him, her, us, them  
>Possessive pronouns: mine, yours, his, hers, ours, theirs

In the second part of this assignment, you should:

1. Include one or two cells explaining what you expect to find and why. 
2. Compare the proportion of total words in Smith's book that are *personal* pronouns to the proportion of total words in Barbauld's collection that are *personal* pronouns. Show these results.
3. Compare the proportion of words in Smith's book that are *possessive* pronouns to the proportion of words in Barbauld's collection that are *possessive* pronouns. Show these results.
4. Include one or two cells interpreting and discussing the output, bringing in knowledge from your particular field. What have you learned about these two literary forms from your analyses? (You can bring in results from Part 1 too if appropriate.) What further steps might you take to learn more?
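
Steps 2 and 3 can be sketched in a similar way by storing the two pronoun lists above as sets and counting matches. This sketch assumes lowercased tokens produced by a simple whitespace-and-punctuation tokenizer; the names `PERSONAL`, `POSSESSIVE`, `tokenize`, and `pronoun_proportions`, and the two sample sentences, are hypothetical placeholders for your own code and for `smith_string` and `barbauld_string`.

```python
import string

# Pronoun lists from the assignment, lowercased to match lowercased tokens
PERSONAL = {"i", "you", "he", "she", "it", "we", "they",
            "what", "who", "me", "him", "her", "us", "them"}
POSSESSIVE = {"mine", "yours", "his", "hers", "ours", "theirs"}

# ASCII punctuation plus curly quotes and the em dash
PUNCT = string.punctuation + "\u201c\u201d\u2018\u2019\u2014"

def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation, drop empties."""
    return [w for w in (raw.strip(PUNCT) for raw in text.lower().split()) if w]

def pronoun_proportions(tokens):
    """Return the share of all tokens that are personal and possessive pronouns."""
    total = len(tokens)
    if total == 0:
        raise ValueError("no tokens to analyze")
    personal = sum(1 for w in tokens if w in PERSONAL)
    possessive = sum(1 for w in tokens if w in POSSESSIVE)
    return {"personal": personal / total, "possessive": possessive / total}

# Hypothetical sample strings standing in for smith_string and barbauld_string
texts = {
    "sample_a": "I saw her, and she saw me; the garden was ours.",
    "sample_b": "His verse is his alone, and the praise is theirs.",
}
for name, text in texts.items():
    props = pronoun_proportions(tokenize(text))
    print(f"{name}: personal = {props['personal']:.3f}, possessive = {props['possessive']:.3f}")
```

Two caveats for interpretation: because tokens are lowercased, a Roman numeral "I" in a poem title would be counted as a personal pronoun, and "her" is counted as personal even when it works as a possessive ("her garden"), because that is how the list above classifies it.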


### Details and Reproducibility

For both parts of the assignment, details are important. Think about how to handle punctuation, whether to lowercase words, whether to include or exclude digits, and so on. Document each step you take and your reasons for taking it.
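
To illustrate how these choices change the token list, here is a sketch of a regular-expression tokenizer with switches for lowercasing and digits. The function name `normalize_tokens`, its parameters, and the sample sentence are hypothetical; the pattern `[^\W_]+` matches runs of Unicode letters and digits, and the optional apostrophe group keeps forms such as "o'er" as single tokens.

```python
import re

# One token = letters/digits, optionally joined by internal apostrophes ("o'er", "lark's")
TOKEN_PATTERN = re.compile(r"[^\W_]+(?:'[^\W_]+)*")

def normalize_tokens(text, lowercase=True, keep_digits=False):
    """Split text into word tokens.

    lowercase:   fold case so that "The" and "the" count as the same word
    keep_digits: if False, drop tokens made only of digits (page numbers, dates)
    """
    # Treat the curly apostrophe as a straight one so both spellings match the pattern
    text = text.replace("\u2019", "'")
    if lowercase:
        text = text.lower()
    tokens = TOKEN_PATTERN.findall(text)
    if not keep_digits:
        tokens = [t for t in tokens if not t.isdigit()]
    return tokens

# Hypothetical sample line mixing a page number, a date, and both apostrophe styles
sample = "Page 12. O'er the hills, the Lark\u2019s song rang in 1773!"
print(normalize_tokens(sample))
print(normalize_tokens(sample, keep_digits=True))
print(normalize_tokens(sample, lowercase=False))
```

Whatever settings you choose, record them next to your results, since the proportions in Parts 1 and 2 depend on them (dropping digits, for instance, changes the total word count used as the denominator).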

### Format

Treat this assignment as you would a paper: formatting and writing matter. Include text describing what you are doing and your results, make sure the text and the output are easy for me as a reader to follow and understand, and make it look good. I will take this into consideration when grading the notebook.


## In-Class: Wednesday, October 11

In class, you will pair up with a classmate and compare and contrast your notebooks. 

In the final part of your notebook:

1. Compare the general programming approach you and your classmate took. What was different? What was similar? Is one approach better? Why?
2. Provide three (3) further techniques you could use to extend this analysis (qualitative or quantitative). Explain how each technique would expand our understanding of the morphological features of literary forms.
3. Provide three (3) substantive questions about the social world this line of analysis could address. For this part you may propose additional corpora that could extend or expand this research.

By the end of class on Wednesday, you must submit your completed notebook to Blackboard. It should include the two parts of the assignment (which you may revise during class) and the final part summarizing your in-class work.


## Grading Rubric

I will give a letter grade for the assignment based on carrying out the required calculations, your presentation of the results, and the thoughtfulness of your interpretation and discussion. "Thoughtfulness" will be determined by (1) whether or not your interpretation is appropriate for the techniques you used and your output, and (2) how well your interpretation and your further substantive questions relate to a humanities, social science, or scientific perspective on the social world. See the attached file in Blackboard for a more detailed grading rubric.

Remember, this check-in counts for 10% of your final grade.

# Read the Smith text into a single string (the with-block closes the file afterwards)
with open("../data/smith_conversations.txt", "r", encoding="utf-8") as f:
    smith_string = f.read()

# Read the Barbauld text into a single string
with open("../data/barbauld_poems.txt", "r", encoding="utf-8") as f:
    barbauld_string = f.read()

# Check that both files loaded correctly by printing the first 500 characters of each
print("SMITH:")
print(smith_string[:500])
print("\n\nBARBAULD:")
print(barbauld_string[:500])