# Regex Exercise

Let's try something more complicated.  Use regular expressions to figure out which characters in the play speak the most lines.

In [1]:
import re

## Step One

Read in the text of the play and cut out the front-matter, such as the list of characters 

In [2]:
with open('Tempest Notes.txt') as f:
    tempest = f.read()

In [3]:
re.search(r'ACT', tempest)

<re.Match object; span=(1006, 1009), match='ACT'>

In [4]:
tempest[1006:1020]

'ACT 1\n=====\n\nS'

In [5]:
tempest = tempest[1006:]

## Step Two

There are stage directions within speeches that will throw off our counting.  Stage directions are in square brackets. Remove all of them from the text.  NB. Some stage directions extend for more than one line.

In [6]:
re.findall(r'\[.*?\]', tempest, flags=re.S)

['[A tempestuous noise of thunder and lightning heard.\nEnter a Shipmaster and a Boatswain.]',
 '[He exits.]',
 '[Enter Mariners.]',
 '[Enter Alonso, Sebastian, Antonio, Ferdinand, Gonzalo,\nand others.]',
 '[He exits.]',
 '[He exits with Alonso, Sebastian,\nand the other courtiers.]',
 '[Enter Boatswain.]',
 '[(A cry\nwithin.)]',
 '[Enter Sebastian, Antonio, and Gonzalo.]',
 '[Enter more Mariners, wet.]',
 '[Mariners exit.]',
 '[Boatswain exits.]',
 '[A confused noise within:]',
 '[He exits with Antonio.]',
 '[He exits.]',
 '[Enter Prospero and Miranda.]',
 '[Putting aside his cloak.]',
 '[They sit.]',
 '[standing]',
 '[Miranda falls asleep.]',
 '[Prospero puts on his cloak.]',
 '[Enter Ariel.]',
 '[He folds his arms.]',
 '[Ariel exits.]',
 '[Miranda wakes.]',
 '[rising]',
 '[within]',
 '[Enter Ariel like a water nymph.]',
 '[He whispers to Ariel.]',
 '[He exits.]',
 '[to Caliban]',
 '[Enter Caliban.]',
 '[Aside.]',
 '[Caliban exits.]',
 '[Enter Ferdinand; and Ariel, invisible,\nplayi

In [None]:
new_text = re.sub(r'', '', tempest)



## Step Three

Write a regex that captures the details of each speech in the play. It should capture (in parentheses) the name of the speaker, and then it should capture (in parentheses) the rest of the speech.

In [16]:
re.findall(r'^([A-Z][A-Z\s]+?)\s{2,}(.+?)(?=(?:\n[A-Z][A-Z\s]+\s{2,})|\n\[|$)', tempest, flags=re.S | re.M)

[('MASTER', 'Boatswain!'),
 ('BOATSWAIN', 'Here, master. What cheer?'),
 ('MASTER', "Good, speak to th' mariners. Fall to 't yarely,"),
 ('BOATSWAIN', 'Heigh, my hearts! Cheerly, cheerly, my'),
 ('ALONSO', "Good boatswain, have care. Where's the Master?"),
 ('BOATSWAIN', 'I pray now, keep below.'),
 ('ANTONIO', 'Where is the Master, boatswain?'),
 ('BOATSWAIN', 'Do you not hear him? You mar our labor.'),
 ('GONZALO', 'Nay, good, be patient.'),
 ('BOATSWAIN', 'When the sea is. Hence! What cares these'),
 ('GONZALO', 'Good, yet remember whom thou hast'),
 ('BOATSWAIN', 'None that I more love than myself. You are'),
 ('GONZALO', 'I have great comfort from this fellow. Methinks'),
 ('BOATSWAIN', 'Down with the topmast! Yare! Lower, lower!'),
 ('SEBASTIAN', "A pox o' your throat, you bawling, blasphemous,"),
 ('BOATSWAIN', 'Work you, then.'),
 ('ANTONIO', 'Hang, cur, hang, you whoreson, insolent'),
 ('GONZALO', "I'll warrant him for drowning, though the"),
 ('BOATSWAIN', 'Lay her ahold, aho

In [19]:
speech = re.findall(r'^\s*([A-Z]{4,})\s+(.*?)\n\n', new_text, re.S|re.M)
speech

[('MASTER', 'Boatswain!'),
 ('BOATSWAIN', 'Here, master. What cheer?'),
 ('MASTER',
  "Good, speak to th' mariners. Fall to 't yarely,\nor we run ourselves aground. Bestir, bestir!\n[He exits.]"),
 ('BOATSWAIN',
  "Heigh, my hearts! Cheerly, cheerly, my\nhearts! Yare, yare! Take in the topsail. Tend to th'\nMaster's whistle.--Blow till thou burst thy wind, if\nroom enough!"),
 ('ALONSO', "Good boatswain, have care. Where's the Master?\nPlay the men."),
 ('BOATSWAIN', 'I pray now, keep below.'),
 ('ANTONIO', 'Where is the Master, boatswain?'),
 ('BOATSWAIN',
  'Do you not hear him? You mar our labor.\nKeep your cabins. You do assist the storm.'),
 ('GONZALO', 'Nay, good, be patient.'),
 ('BOATSWAIN',
  'When the sea is. Hence! What cares these\nroarers for the name of king? To cabin! Silence!\nTrouble us not.'),
 ('GONZALO', 'Good, yet remember whom thou hast\naboard.'),
 ('BOATSWAIN',
  'None that I more love than myself. You are\na councillor; if you can command these elements\nto sil

## Step Four

Process the output of the previous step, which is a list of tuples containing the speaker and the content for each speech.  Iterate over it, counting up the number of lines in each speech and adding it to the running total for each speaker. You might want to use a Counter. 

In [None]:
from collections import Counter
lines = Counter()