# Jane Eyre - A Mathematical Theory of Communication

[A Mathematical Theory of Communication by Claude Shannon](https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf)

[The Mathematical Theory of Communication by Claude Shannon and Warren Weaver](https://pure.mpg.de/rest/items/item_2383164/component/file_2383163/content)

[Jane Eyre: An Autobiography by Charlotte Brontë](https://www.gutenberg.org/ebooks/1260)

***
## Imports

In [19]:
# The random module from the Python Standard Library.
import random

## Zero-order Approximation

In [20]:
# The allowed symbols.
symbols = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

Select several symbols uniformly at random, with replacement, so that every symbol is equally likely on each draw.

https://docs.python.org/3/library/random.html#random.choices

In [21]:
# Randomly select k symbols from the string above.
L = random.choices(symbols, k=100)

# Show.
''.join(L)

'GDKH IMTIPSHV PUCMQMQRDTPEWADOTEWFMTAWODJFCGDMHQCQYDRUOHNCTMQWJXTQKOASXJFBEMBOCZZNDKIJLPR XZBWOBLWRQ'
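The output above changes on every run. A minimal sketch of a reproducible version follows, assuming a hypothetical helper named `zero_order` and a `seed` parameter, neither of which is part of the original notebook. It uses a private `random.Random` instance so the module-level generator is left untouched.

```python
import random

# The same allowed symbols as in the cell above.
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "


def zero_order(k, seed=None):
    """Return k symbols drawn uniformly at random, with replacement.

    Hypothetical helper for illustration; not part of the notebook.
    """
    # A private generator, so seeding does not affect other code.
    rng = random.Random(seed)
    return ''.join(rng.choices(SYMBOLS, k=k))


# The same seed gives the same string on every run.
sample = zero_order(100, seed=0)
print(sample)
```

Passing `seed=None` falls back to unseeded behaviour, which matches the original cell.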

## First-order Approximation

In [22]:
# Open the book.
with open('data/janeeyre.txt', 'r') as f:
    # Read the book into one long string.
    text = f.read().upper()

In [23]:
# Count how many times each allowed symbol occurs in the book.
counts = {s: text.count(s) for s in symbols}

In [24]:
counts

{'A': 34572,
 'B': 6380,
 'C': 10497,
 'D': 20748,
 'E': 55729,
 'F': 9157,
 'G': 8522,
 'H': 25327,
 'I': 30331,
 'J': 541,
 'K': 3379,
 'L': 17840,
 'M': 11970,
 'N': 29905,
 'O': 32867,
 'P': 6842,
 'Q': 541,
 'R': 26503,
 'S': 28083,
 'T': 37020,
 'U': 12455,
 'V': 3930,
 'W': 10049,
 'X': 719,
 'Y': 9019,
 'Z': 188,
 ' ': 91911}
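The dictionary comprehension calls `text.count(s)` once per symbol, so it scans the whole book 27 times. A possible alternative, sketched here on a short stand-in string rather than the book itself, is `collections.Counter`, which tallies every character in a single pass. The name `sample_text` is an assumption for illustration.

```python
from collections import Counter

symbols = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

# A short stand-in for the book text (assumption, for illustration only).
sample_text = "Reader, I married him.".upper()

# One pass over the text, counting every character it contains.
tally = Counter(sample_text)

# Keep only the allowed symbols; a Counter returns 0 for missing keys.
counts = {s: tally[s] for s in symbols}

print(counts['R'], counts['E'], counts[' '], counts['Z'])
```

Characters outside `symbols`, such as the comma and full stop, are counted by `Counter` but dropped when the dictionary is built, so the result matches the `text.count` approach.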

In [25]:
counts['E']

55729

In [26]:
# Show the items in counts, sorted by count in descending order.
# Adapted from: https://stackoverflow.com/a/613218
sorted(counts.items(), key=lambda item: -item[1])

[(' ', 91911),
 ('E', 55729),
 ('T', 37020),
 ('A', 34572),
 ('O', 32867),
 ('I', 30331),
 ('N', 29905),
 ('S', 28083),
 ('R', 26503),
 ('H', 25327),
 ('D', 20748),
 ('L', 17840),
 ('U', 12455),
 ('M', 11970),
 ('C', 10497),
 ('W', 10049),
 ('F', 9157),
 ('Y', 9019),
 ('G', 8522),
 ('P', 6842),
 ('B', 6380),
 ('V', 3930),
 ('K', 3379),
 ('X', 719),
 ('J', 541),
 ('Q', 541),
 ('Z', 188)]
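Raw counts are hard to compare at a glance. As a rough sketch, they can be converted to relative frequencies by dividing each count by the total. The counts below are copied from the output above; the name `probs` is an assumption.

```python
# Symbol counts copied from the notebook output above.
counts = {
    'A': 34572, 'B': 6380, 'C': 10497, 'D': 20748, 'E': 55729,
    'F': 9157, 'G': 8522, 'H': 25327, 'I': 30331, 'J': 541,
    'K': 3379, 'L': 17840, 'M': 11970, 'N': 29905, 'O': 32867,
    'P': 6842, 'Q': 541, 'R': 26503, 'S': 28083, 'T': 37020,
    'U': 12455, 'V': 3930, 'W': 10049, 'X': 719, 'Y': 9019,
    'Z': 188, ' ': 91911,
}

# Total number of allowed symbols counted in the book.
total = sum(counts.values())

# Relative frequency of each symbol; the values sum to 1.
probs = {s: c / total for s, c in counts.items()}

# Show the five most frequent symbols with their frequencies.
for s, p in sorted(probs.items(), key=lambda item: -item[1])[:5]:
    print(repr(s), round(p, 4))
```

`random.choices` accepts unnormalised weights, so this step is for reading the data rather than a requirement for sampling.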

In [27]:
# The dictionary keys.
counts.keys()

dict_keys(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' '])

In [28]:
# The dictionary values.
counts.values()

dict_values([34572, 6380, 10497, 20748, 55729, 9157, 8522, 25327, 30331, 541, 3379, 17840, 11970, 29905, 32867, 6842, 541, 26503, 28083, 37020, 12455, 3930, 10049, 719, 9019, 188, 91911])

In [31]:
# Randomly select k symbols, weighting each symbol by its count in the book.
L = random.choices(list(counts.keys()), weights=list(counts.values()), k=100)

# Show.
''.join(L)


'BII  EOSCMMENN BH  DI   IAROLSGNA  RUNRA  LNFTNME ETDYNILA HA HYUR GRE L RTMTH BTRWETDHMET RDOC RCTS'
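One way to check that the weights steer the selection as intended is to draw a large sample and compare the observed frequencies with the expected ones. The sketch below assumes a reduced alphabet of three symbols, with counts copied from the table above, and a fixed seed; both are choices made only to keep the example short and repeatable.

```python
import random
from collections import Counter

# Three counts copied from the table above. The reduced alphabet is an
# assumption made only to keep the example short.
counts = {' ': 91911, 'E': 55729, 'Z': 188}

# A private, seeded generator so the run is repeatable.
rng = random.Random(0)

# Weighted selection with replacement, as in the cell above.
draws = rng.choices(list(counts), weights=list(counts.values()), k=10_000)
observed = Counter(draws)

# Compare expected and observed relative frequencies.
total = sum(counts.values())
for s in counts:
    expected = counts[s] / total
    seen = observed[s] / len(draws)
    print(repr(s), round(expected, 3), round(seen, 3))
```

With a sample this large, the observed frequencies should sit close to the expected ones, and the rare symbol should appear only a handful of times.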

***
## End