# Scenario: Pride and Prejudice

Today we will work with a classic of English literature: _[Pride and Prejudice](https://en.wikipedia.org/wiki/Pride_and_Prejudice)_.

<img src="https://www.gutenberg.org/files/1342/1342-h/images/cover.jpg" width="33%"/>

Since there is not enough time to read the whole book in one lecture, we will take an approach called [distant reading](https://uta.pressbooks.pub/datanotebook/chapter/1-4-distant-reading/), which uses computers to ingest large amounts of literary text (novels, short stories, etc.) and identify patterns in them. 

To keep things simple, we will look at how often different words (for example, the names of the main characters) appear in each chapter of _Pride and Prejudice_. This will give us an idea of which characters appear the most (central characters) and at which stages of the story.

# The data file

To make the data easier to process, we have prepared a .txt file where each chapter is a single (very long) line of text. This is what the beginning of each of the first 10 lines of the file looks like:
```
Pride and Prejudice by Jane Austen
Chapter 1 It is a truth universally acknowledged, that a single man in possession of  ...
Chapter 2 Mr. Bennet was among the earliest of those who waited on Mr. Bingley.  He h ...
Chapter 3 Not all that Mrs. Bennet, however, with the assistance of her five daughter ...
Chapter 4 When Jane and Elizabeth were alone, the former, who had been cautious in he ...
Chapter 5 Within a short walk of Longbourn lived a family with whom the Bennets were  ...
Chapter 6 The ladies of Longbourn soon waited on those of Netherfield. The visit was  ...
Chapter 7 Mr. Bennet's property consisted almost entirely in an estate of two thousan ...
Chapter 8 At five o'clock the two ladies retired to dress, and at half-past six Eliza ...
Chapter 9 Elizabeth passed the chief of the night in her sister's room, and in the mo ...
```

To view the full file in Jupyter, open the navigator sidebar and double click on `19-pandp12-simple.txt`, then do _File_ &rarr; _Wrap Words_.
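
The beginning of each line can also be previewed from Python. The following is a minimal sketch, assuming a hypothetical helper named `preview_lines` (it is not part of the exercise) that returns the first `width` characters of the first `n_lines` lines of a file:

```python
def preview_lines(filename, n_lines=10, width=80):
    """Return the first `width` characters of the first `n_lines` lines."""
    previews = []
    fh = open(filename, "r")
    for line in fh:
        # Stop once we have collected enough lines
        if len(previews) == n_lines:
            break
        # Drop the trailing newline, then keep only the first characters
        previews.append(line.rstrip("\n")[:width])
    fh.close()
    return previews
```

Calling `preview_lines("19-pandp12-simple.txt")` and printing each element of the returned list should give a view similar to the one above.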

---

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

<div class="alert alert-info"><strong>Good to know</strong> If at any time you accidentally delete or modify the data file, come back here and run this cell. It will restore the file for you. This cell cannot be edited.</div>

In [None]:
!git restore 19-pandp12-simple.txt

---

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

## Task \#1: count how many times a character appears in each chapter

Write a function called `count_mentions` that takes one string parameter &mdash; the `name` of a character. 

Your function should go through each chapter in the input file and count the number of times the name of that character appears in the chapter. 

It should then return a list of these counts, one per chapter. You can skip the first line of the file, since it contains only the title of the book.

This is what the list looks like when calling `count_mentions` with `"Elizabeth"`:

```python
[0, 1, 4, 3, 2, 11, 14, 14, 8, 16, 13, 7, 2, 1, 6, 18, 9, 26, 11, 9, 8, 10, 11,
 9, 7, 17, 8, 13, 14, 9, 10, 10, 4, 9, 2, 2, 4, 10, 9, 6, 10, 10, 40, 14, 18,
 12, 19, 8, 13, 3, 15, 5, 15, 12, 18, 20, 11, 14, 13, 7]
```

### Hint

A quick way to count how many times a string appears in some text is the `.count()` method of the `str` type. If you are unsure how it works, create a new cell and type `help(str.count)` to look up the docstring of this method.
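
As a small illustration of `.count()`, here is a sketch on a made-up sentence (the sentence is an assumption for this example and is not taken from the data file):

```python
sentence = "Mr. Bennet spoke to Mrs. Bennet about Mr. Bingley."

# .count() returns the number of non-overlapping occurrences of a substring
bennet_count = sentence.count("Bennet")
mr_count = sentence.count("Mr.")
lower_count = sentence.count("bennet")

print(bennet_count)  # 2
print(mr_count)      # 2 ("Mrs." does not contain "Mr.")
print(lower_count)   # 0 (the search is case-sensitive)
```

Note that `.count()` matches substrings exactly, so upper and lower case matter, and `"Mr."` is not found inside `"Mrs."`.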

In [None]:
def count_mentions(name):
    ...

# Try with different arguments!
mentions = count_mentions("Elizabeth")
    
#  This code displays the results in compact form
import pprint
pprint.pprint(mentions, compact=True)

---

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

## Task \#2: create a dictionary of mentions of various characters

Write a function called `create_dict` that takes one parameter &mdash; a list of character `names` (included in the cell below) &mdash; and returns a dictionary with their mentions. 

Your function should start with an empty dictionary.

For each character in `names`, your function should call `count_mentions` to count the mentions of that character throughout the book, and it should create a new key&ndash;value pair in the dictionary associating the name of the character (the key) with the list of mentions (the value). 

Finally, your function should return the created dictionary.
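
The general pattern of filling a dictionary inside a loop can be sketched as follows. This example uses made-up data and a hypothetical helper `word_length`, both of which are assumptions for illustration only:

```python
def word_length(word):
    # Hypothetical helper used only for this illustration
    return len(word)

words = ["pride", "and", "prejudice"]

lengths = {}  # start with an empty dictionary
for word in words:
    # Associate each word (the key) with its computed value
    lengths[word] = word_length(word)

print(lengths)  # {'pride': 5, 'and': 3, 'prejudice': 9}
```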

In [None]:
def create_dict(names):
    ...

names = [
    "Elizabeth",
    "Mr. Darcy",
    "Mr. Bennet",
    "Mrs. Bennet",
    "Jane",
    "Mary",
    "Kitty",
    "Lydia",
    "Mr. Bingley",
    "Caroline",
]
    
# This code displays the dictionary in compact format
import pprint
data = create_dict(names)
pprint.pprint(data, compact=True)

---

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

## Task \#3: store the mentions of all characters in a file

Write a function called `write_data` that takes two parameters: the `filename` of the output file and the dictionary of character mentions created in the previous step. The function should store the data in a text file.


The data should be written in comma-separated format, with the character name first. For example, this is what the file should look like for Elizabeth, Mr. Darcy, Mr. Bennet and Mrs. Bennet:
```
Elizabeth,0,1,4,3,2,11,14,14,8,16,13,7,2,1,6,18,9,26,11,9,8,10,11,9,7,17,8,13,14,9,10
Mr. Darcy,0,0,6,0,3,14,1,4,3,11,8,1,0,0,2,18,4,32,0,0,3,0,0,4,4,3,0,0,1,4,7,7,11,7,4,
Mr. Bennet,6,6,4,0,0,0,5,0,0,0,0,0,4,8,3,0,0,2,0,5,0,2,5,2,0,0,0,0,0,0,0,0,0,0,0,0,0,
Mrs. Bennet,0,3,6,0,3,0,5,0,9,1,0,3,7,3,4,1,3,7,3,10,2,2,9,3,4,1,0,0,0,0,0,0,0,0,0,0,
```

Last but not least, make sure to close the output file.

### Hint

There are several ways to write a comma-separated values file. You can use the file handle's `.write()` method together with a for loop to write each value separately. Alternatively, you can use the `.join()` method of the `str` type. If you are unsure how it works, create a new cell and type `help(str.join)` to look up the docstring of this method.
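
As a rough sketch of how `.join()` behaves, the following example joins a short made-up list of numbers (an assumption for illustration, not the real data). Since `.join()` only accepts strings, each number is converted with `str()` first:

```python
values = [0, 1, 4, 3]

# .join() only works on strings, so convert each number first
parts = []
for v in values:
    parts.append(str(v))

# The string before .join() is placed between consecutive elements
line = ",".join(parts)
print(line)  # 0,1,4,3
```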

In [None]:
# Enter your code here!
...

names = [
    "Elizabeth",
    "Mr. Darcy",
    "Mr. Bennet",
    "Mrs. Bennet",
    "Jane",
    "Mary",
    "Kitty",
    "Lydia",
    "Mr. Bingley",
    "Caroline",
]

# create data dictionary
data = create_dict(names)

# save data to file in CSV format
write_data("output.txt", data)

To view the full file in Jupyter, open the navigator sidebar and double click on `output.txt`, then do _File_ &rarr; _Wrap Words_.

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

---

# Solutions

## Task 1

In [None]:
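def count_mentions(name):
    ### BEGIN SOLUTION
    fh = open("19-pandp12-simple.txt", "r")
    counts = []
    line_no = 0
    for line in fh:
        line_no += 1
        # Skip the first line, which only contains the book title
        if line_no == 1:
            continue
        line = line.strip()
        # Count the occurrences of the name in this chapter
        counts.append(line.count(name))
    # Close the input file before returning
    fh.close()
    return counts
    ### END SOLUTION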

# Try with different arguments!
mentions = count_mentions("Elizabeth")
    
#  This code displays the results in compact form
import pprint
pprint.pprint(mentions, compact=True)

## Task 2

In [None]:
def create_dict(names):
    ### BEGIN SOLUTION
    d = {}
    for name in names:
        d[name] = count_mentions(name)
    return d
    ### END SOLUTION

names = [
    "Elizabeth",
    "Mr. Darcy",
    "Mr. Bennet",
    "Mrs. Bennet",
    "Jane",
    "Mary",
    "Kitty",
    "Lydia",
    "Mr. Bingley",
    "Caroline",
]
    
# This code displays the dictionary in compact format
import pprint
data = create_dict(names)
pprint.pprint(data, compact=True)

## Task 3

In [None]:
### BEGIN SOLUTION
def write_data(filename, data):
    fh = open(filename, mode='w')
    for name in data:
        values = data[name]
        fh.write(name)
        for c in values:
            fh.write("," + str(c))
        fh.write("\n")
    fh.close()
### END SOLUTION

names = [
    "Elizabeth",
    "Mr. Darcy",
    "Mr. Bennet",
    "Mrs. Bennet",
    "Jane",
    "Mary",
    "Kitty",
    "Lydia",
    "Mr. Bingley",
    "Caroline",
]

# create data dictionary
data = create_dict(names)

# save data to file in CSV format
write_data("output.txt", data)