# Functions, Objects, and Interpreters
This is a demonstration of some Python features. 

Here we open a file containing Shakespeare's plays:

In [5]:
shakes = open('shakespeare.txt')

Here we tell Python to read the entire file, which contains all the plays that Shakespeare wrote, and split it into individual words:

In [6]:
text = shakes.read().split()
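
As a small sketch of what `read().split()` does, the snippet below applies `split()` to a short made-up string. Here `sample` is a hypothetical stand-in for the file contents, not the real text, and it assumes punctuation is already separated by spaces, as the token list shown next suggests is the case in `shakespeare.txt`. With no arguments, `split()` breaks on any run of whitespace, including newlines.

```python
# Hypothetical stand-in for the contents of the file; not the real text.
sample = "Now , fair Hippolyta ,\nour nuptial hour\n"

# With no arguments, split() separates on any run of whitespace
# (spaces, tabs, newlines) and never produces empty strings.
tokens = sample.split()
print(tokens)
```

Because the comma is surrounded by spaces in `sample`, it ends up as a token of its own.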

Now `text` contains Shakespeare's words. The first 25 words are the following,

In [7]:
text[:25]

['A',
 "MIDSUMMER-NIGHT'S",
 'DREAM',
 'Now',
 ',',
 'fair',
 'Hippolyta',
 ',',
 'our',
 'nuptial',
 'hour',
 'Draws',
 'on',
 'apace',
 ':',
 'four',
 'happy',
 'days',
 'bring',
 'in',
 'Another',
 'moon',
 ';',
 'but',
 'O']
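
The first three words, `A MIDSUMMER-NIGHT'S DREAM`, form the title of the first play in the file. Note that punctuation marks such as `,` and `:` show up as separate items. Below we evaluate how many words (including punctuation marks) there are in the text: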

In [8]:
len(text)

980637

It turns out that the word "the" appears a lot in the text! We can count how many times it appears:

In [9]:
text.count('the')

23272

One word typical of the Shakespearean era is "thou". We can count how many times "thou" appears in the text:

In [10]:
text.count('thou')

4501

We can count other words too:

In [11]:
text.count('you')

12361

In [13]:
text.count('forsooth')

40
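
Note that `list.count` compares items exactly, so it is case-sensitive: a capitalized `'The'` is not counted as `'the'`. A minimal sketch on a made-up list (`tokens` is hypothetical, not the real `text`) shows the difference between an exact count and a case-insensitive one:

```python
# Hypothetical token list standing in for `text`.
tokens = ["The", "king", "and", "the", "queen", "and", "THE", "fool"]

# Exact match: only the lowercase form is counted.
exact = tokens.count("the")

# Case-insensitive match: lowercase each token before comparing.
any_case = sum(1 for t in tokens if t.lower() == "the")

print(exact, any_case)
```

Which count is the right one depends on the question being asked; the counts above in this section are exact matches.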

What is the most common item in the text? Judging from the first 25 words, the comma `,` seems to appear a lot. We can count the number of commas in the text:

In [15]:
text.count(',')

81827

81,827! We can also calculate the proportion of commas in the text by dividing the count by the total length of the text:

In [16]:
text.count(',') / len(text)

0.0834427010198473

As we can see, the comma `,` makes up about 8% of the text.
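
Counting one candidate at a time does not tell us which item is actually the most common. One way to answer that directly, sketched here on a made-up list (`tokens` is a hypothetical stand-in for `text`), is `collections.Counter` from the standard library, which tallies every item in a single pass:

```python
from collections import Counter

# Hypothetical token list standing in for `text`.
tokens = ["Now", ",", "fair", "Hippolyta", ",", "our",
          "nuptial", "hour", ",", "our", "joy"]

counts = Counter(tokens)

# most_common(1) returns a list with one (item, count) pair.
top_token, top_count = counts.most_common(1)[0]
proportion = top_count / len(tokens)

print(top_token, top_count, round(proportion, 3))
```

The same three lines would work on the full `text`, at the cost of building a tally for every distinct item.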

We can also check whether a word appears in the text. First, we create a `set`:

In [18]:
words = set(text)

A `set` is an unordered collection of unique elements. In IPython, appending `?` to a name displays its documentation:

In [20]:
set?

Now we can check whether certain words are in `words`:

In [21]:
'forsooth' in words

True

In [22]:
'the' in words

True

We can see how many unique words are in the text by evaluating the length of `words`.

In [23]:
len(words)

33505
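
The effect of `set()` is easiest to see on a small example. In this sketch, `tokens` is a made-up list rather than the real `text`: duplicates collapse into a single element, and `in` then checks membership with a hash lookup instead of scanning a list from start to end.

```python
# Hypothetical token list standing in for `text`.
tokens = ["to", "be", ",", "or", "not", "to", "be"]

# Duplicates collapse to a single element.
unique = set(tokens)

print(len(tokens), len(unique))
print("not" in unique, "forsooth" in unique)
```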

By now, we have covered two major themes of the course:

1. Functions, such as `len()` and `open()`
2. Objects
   * `words` is a `set` object. It represents, and behaves like, the set of all words in Shakespeare.

Last but not least, there is the programming language itself: it is how we express everything in the cells above, and it is what the computer interprets to give us the result we see from each cell.
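
To make the distinction between functions and objects concrete, here is a minimal sketch using a made-up set (`sample_words` is hypothetical, not the real `words`). A function such as `len()` is called on a value, while an object such as a set carries its own operations, called methods.

```python
# Hypothetical small set standing in for `words`.
sample_words = {"draw", "ward", "level"}

# A function is called on its argument...
n = len(sample_words)

# ...while a method is a function attached to an object.
# union() returns a new set and leaves sample_words unchanged.
grown = sample_words.union({"madam"})

print(type(sample_words).__name__, n, sorted(grown))
```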

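Most of what we have run so far in each cell is an expression (the assignments, such as `words = set(text)`, are statements that contain an expression on the right-hand side). An expression can be as simple as the following: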

In [24]:
'draw'

'draw'

We can also have an expression that contains an operation:

In [25]:
'draw'[::-1]

'ward'

Above, `[::-1]` is a slice that reverses a string. We can use this operation inside a bigger expression:

In [26]:
{w for w in words if w == w[::-1] and len(w) > 4}

{'level', 'madam', 'minim', 'redder', 'refer', 'rever'}

Here we obtain the words that are longer than 4 characters and read the same forwards and backwards, that is, palindromes.
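
For readers new to slicing, the following sketch unpacks the pieces. A slice has the form `[start:stop:step]`, and leaving `start` and `stop` empty means "the whole string". The set `sample_words` is made up for illustration and is not the real `words`.

```python
# Slices take the form [start:stop:step]; a step of -1 walks backwards.
word = "draw"
reversed_word = word[::-1]   # whole string, reversed
every_second = word[::2]     # characters at indices 0 and 2

print(reversed_word, every_second)

# Hypothetical small set standing in for `words`.
sample_words = {"level", "madam", "draw", "noon", "refer", "speed"}

# A palindrome equals its own reverse; keep only those longer than 4.
palindromes = {w for w in sample_words if w == w[::-1] and len(w) > 4}
print(sorted(palindromes))
```

Here `noon` is a palindrome too, but it is filtered out by the length condition.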

In [27]:
{w for w in words if w[::-1] in words and len(w) == 4}

{'bard',
 'bats',
 'brag',
 'deed',
 'deem',
 'deer',
 'dial',
 'doom',
 'door',
 'drab',
 'draw',
 'ecce',
 'elle',
 'esse',
 'evil',
 'flow',
 'garb',
 'gnat',
 'gums',
 'hoop',
 'keel',
 'laid',
 'leek',
 'leer',
 'lees',
 'liar',
 'live',
 'loop',
 'maws',
 'meed',
 'meet',
 'mood',
 'moor',
 'nips',
 'noon',
 'part',
 'peep',
 'pins',
 'pooh',
 'pool',
 'poop',
 'port',
 'pots',
 'rail',
 'rats',
 'reed',
 'reel',
 'rood',
 'room',
 'seel',
 'sees',
 'smug',
 'snip',
 'spin',
 'spit',
 'spot',
 'stab',
 'star',
 'stop',
 'swam',
 'tang',
 'teem',
 'tips',
 'tops',
 'trap',
 'trop',
 'trow',
 'ward',
 'wolf',
 'wort'}

Above, we display all words that are 4 characters long and whose reverse is also contained in `words`. We can check whether such words exist at greater lengths:

In [28]:
{w for w in words if w[::-1] in words and len(w) == 5}

{'asses',
 'deeps',
 'devil',
 'keels',
 'knits',
 'leets',
 'leper',
 'level',
 'lived',
 'madam',
 'minim',
 'refer',
 'repel',
 'rever',
 'sessa',
 'sleek',
 'speed',
 'spots',
 'steel',
 'stink',
 'stops'}

In [29]:
{w for w in words if w[::-1] in words and len(w) == 6}

{'diaper', 'drawer', 'redder', 'repaid', 'reward'}

In [30]:
{w for w in words if w[::-1] in words and len(w) > 6}

set()

It seems that within Shakespeare's plays, there isn't any word longer than 6 characters whose reverse is also contained in the plays.
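
Rather than editing the length by hand in each cell, the same search can be wrapped in a function and run for every length at once. This is only a sketch: the name `reversible` and the set `sample_words` are hypothetical, introduced here for illustration.

```python
def reversible(words, length):
    """Return the words of the given length whose reverse is also in `words`."""
    return {w for w in words if len(w) == length and w[::-1] in words}

# Hypothetical small set standing in for `words`.
sample_words = {"draw", "ward", "evil", "live",
                "reward", "drawer", "level", "moon"}

# Group reversible words by length, skipping lengths with no matches.
by_length = {}
for n in sorted({len(w) for w in sample_words}):
    found = reversible(sample_words, n)
    if found:
        by_length[n] = sorted(found)

print(by_length)

# Longest length with at least one reversible word.
longest = max(by_length)
print(longest)
```

As in the cells above, palindromes such as `level` count as reversible, since a palindrome's reverse is the word itself.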