## [Comprehensions](https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Comprehensions.html)

A comprehension is a language element that enables the concise definition of lists, sets, and dicts. It is somewhat similar to the [set builder notation](https://en.wikipedia.org/wiki/Set-builder_notation) used in mathematics (for example, the set of odd numbers can be given as $\{2k + 1\ |\ k \in \mathbb{Z}\}$).
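As a small illustration of the analogy, the set-builder expression above maps almost symbol for symbol onto a set comprehension. A Python set must be finite, so the sketch below restricts $k$ to an arbitrarily chosen range; that range is an assumption made only for this example.

```python
# Odd numbers of the form 2k + 1, with k restricted to -3..3
# (the finite range is an arbitrary choice for illustration).
odds = {2 * k + 1 for k in range(-3, 4)}
print(sorted(odds))
```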

### Unconditional comprehension

In [1]:
# Prepare the list of the first 10 square numbers using an accumulator variable!
l = []
for i in range(1, 11):
    l.append(i**2)
l

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [2]:
# The same with a list comprehension:
l = [i**2 for i in range(1, 11)]
l

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [3]:
# Prepare the set of the first 10 square numbers using an accumulator variable!
s = set()
for i in range(1, 11):
    s.add(i**2)
s

{1, 4, 9, 16, 25, 36, 49, 64, 81, 100}

In [4]:
# The same with a set comprehension:
s = {i**2 for i in range(1, 11)}
s

{1, 4, 9, 16, 25, 36, 49, 64, 81, 100}

In [5]:
# Prepare a dict that maps each lowercase vowel to its ASCII code!
# Use an accumulator variable!
d = {}
for c in 'aeiou':
    d[c] = ord(c)
d

{'a': 97, 'e': 101, 'i': 105, 'o': 111, 'u': 117}

In [6]:
# The same with a dict comprehension:
d = {c: ord(c) for c in 'aeiou'}
d

{'a': 97, 'e': 101, 'i': 105, 'o': 111, 'u': 117}

In [7]:
# Exercise: Swap pair members in a list of pairs.
pairs = [('apple', 10), ('pear', 20), ('peach', 30)]
# => [(10, 'apple'), (20, 'pear'), (30, 'peach')]

[(p[1], p[0]) for p in pairs]

[(10, 'apple'), (20, 'pear'), (30, 'peach')]

In [8]:
# (Alternative solution using unpacking that we will learn later:)
[(y, x) for x, y in pairs]

[(10, 'apple'), (20, 'pear'), (30, 'peach')]

In [9]:
# There is no tuple comprehension in Python!
g = (i**2 for i in range(1, 11))
# This is a generator object we can iterate over.

In [10]:
for j in g:
    print(j)

1
4
9
16
25
36
49
64
81
100
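Two properties of generator expressions are worth seeing in action: a generator can be consumed only once, and a tuple can still be built by passing a generator expression to `tuple()`. The variable names below are chosen only for this sketch.

```python
g = (i**2 for i in range(1, 11))

first_pass = list(g)   # consumes every item of the generator
second_pass = list(g)  # the generator is now exhausted

# A tuple can be built by feeding a generator expression to tuple().
t = tuple(i**2 for i in range(1, 11))

print(first_pass)
print(second_pass)
print(t)
```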


### Conditional comprehension

In [11]:
# Conditional list comprehension.
[i**2 for i in range(1, 11) if i % 2 == 0]

[4, 16, 36, 64, 100]

In [12]:
# Conditional set comprehension.
{i**2 for i in range(1, 11) if i % 2 == 0}

{4, 16, 36, 64, 100}

In [13]:
# Conditional dict comprehension.
{c: ord(c) for c in 'aeiou' if c != 'o'}

{'a': 97, 'e': 101, 'i': 105, 'u': 117}
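For comparison with the accumulator style used earlier, the conditional list comprehension can be read as a loop with an `if` inside it. The sketch below spells out that loop; the name `result` is chosen only for this example.

```python
# Loop equivalent of [i**2 for i in range(1, 11) if i % 2 == 0].
result = []
for i in range(1, 11):
    if i % 2 == 0:           # the condition filters the items...
        result.append(i**2)  # ...and the expression transforms them
print(result)
```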

## Sorting

In [14]:
# Sorting a list in place.
l = [2, 10, 5, 11, 7]
l.sort()

In [15]:
l

[2, 5, 7, 10, 11]

- Python uses the Timsort algorithm for sorting (named after Tim Peters).
- Time requirement: O(n×log(n)).
- The sorting algorithm is stable (it preserves the original order of equal elements).
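Stability becomes visible when items are compared by only part of their value. The sketch below sorts a made-up word list by length, using the `key` parameter of `sorted()`; words of equal length keep their original relative order.

```python
# Sample data invented for this illustration.
words = ['pear', 'fig', 'kiwi', 'nut', 'plum']

# Only the length is compared, so ties are resolved by the original order.
by_length = sorted(words, key=len)
print(by_length)
```

Here 'fig' stays before 'nut', and 'pear', 'kiwi', 'plum' keep their original order among themselves.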

In [16]:
# Sorting to descending order.
l = [2, 10, 5, 11, 7]
l.sort(reverse=True)
l

[11, 10, 7, 5, 2]

In [17]:
# The list items have to be comparable!
l = [2, 10, 5, 11, 7, 'foo']
l.sort()

TypeError: '<' not supported between instances of 'str' and 'int'

In [18]:
# Sorting a list of strings.
l = ['baz', 'foo', 'bar']
l.sort()
l

['bar', 'baz', 'foo']

In [19]:
# A tuple cannot be sorted in place because it is immutable.
t = (2, 10, 5)
t.sort()

AttributeError: 'tuple' object has no attribute 'sort'

In [20]:
# Sorting a list into a new list.
l1 = [2, 10, 5, 11, 7]
l2 = sorted(l1)
print(l1, l2)

[2, 10, 5, 11, 7] [2, 5, 7, 10, 11]


In [21]:
# A tuple can also be sorted into a new list.
t = (2, 10, 5)
sorted(t)

[2, 5, 10]

In [22]:
# ...and a set too.
s = {2, 10, 5}
sorted(s)

[2, 5, 10]

In [23]:
# For a dict, the keys will be sorted.
sorted({'b': 10, 'a': 20})

['a', 'b']

In [24]:
# A string is sorted character by character; the result is a list.
sorted('bar')

['a', 'b', 'r']

In [25]:
# Sorting a list of pairs (lexicographically).
pairs = [(20, 'foo'), (10, 'xyz'), (20, 'bar')]
sorted(pairs)

[(10, 'xyz'), (20, 'bar'), (20, 'foo')]

## [File handling](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files)

- A file is a collection of data stored in one unit.
- The life cycle of a file is as follows:
  1. opening
  2. reading, writing, positioning, ...
  3. closing
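The three stages of the life cycle can be sketched in a single self-contained snippet. The scratch file name and the temporary directory are assumptions made for this example, so that it runs without any pre-existing file.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'life_cycle_demo.txt')

f = open(path, 'w+', encoding='utf-8')  # 1. opening (for writing and reading)
f.write('first line\n')                 # 2. writing
f.seek(0)                               #    positioning: back to the start
content = f.read()                      #    reading
f.close()                               # 3. closing

print(repr(content))
print(f.closed)

# Clean up the scratch file and directory.
os.remove(path)
os.rmdir(tmp_dir)
```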

In [26]:
# Opening the file called example_file.txt.
f = open('example_file.txt')

In [27]:
f

<_io.TextIOWrapper name='example_file.txt' mode='r' encoding='UTF-8'>

In [28]:
type(f)

_io.TextIOWrapper

In [29]:
# Closing the file.
f.close()

In [30]:
# Reading the content of the file into a string.
f = open('example_file.txt')
s = f.read()
f.close()
s

'# example data\napple,10\npear,20\ncherry,30\n'

In [31]:
# ...shorter version of the same:
s = open('example_file.txt').read()
s

'# example data\napple,10\npear,20\ncherry,30\n'

In [1]:
# ...the same using the with statement, which closes the file automatically:
with open('example_file.txt') as f:
    s = f.read()
s

'# example data\napple,10\npear,20\ncherry,30\n'

In [33]:
# Reading the first 2 lines.
f = open('example_file.txt')
l1 = f.readline()
l2 = f.readline()
f.close()
print(l1, l2)

# example data
 apple,10



In [34]:
# Remark: readline keeps the line break at the end of the result.
l1

'# example data\n'

In [35]:
# The line break can be removed using strip().
l1.strip()

'# example data'

In [36]:
# Reading the lines of a file into a string list.
lines = open('example_file.txt').readlines()
for line in lines:
    print(line)

# example data

apple,10

pear,20

cherry,30



In [37]:
# Iterating over the lines of a text file.
for line in open('example_file.txt'):
    print(line)

# example data

apple,10

pear,20

cherry,30



In [38]:
# Splitting a string along a delimiter sequence (tokenization).
line.split(',')

['cherry', '30\n']

<small><u>Remark</u>: If split() is called without a parameter, the string is split by any whitespace characters (spaces, tabs, newlines),
and consecutive whitespace is treated as a single delimiter. Empty strings are automatically removed from the result.</small>
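The difference between the two calling styles can be checked on a few short strings. The sample strings below are made up for this sketch.

```python
# With an explicit delimiter, the trailing line break stays in the last token.
with_delimiter = 'cherry,30\n'.split(',')

# Without a parameter, runs of whitespace act as one delimiter
# and no empty strings appear in the result.
without_parameter = '  apple \t 10\n'.split()

# With an explicit delimiter, consecutive delimiters yield empty strings.
empty_between = 'a,,b'.split(',')

print(with_delimiter)
print(without_parameter)
print(empty_between)
```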

In [39]:
# Skipping the first line of the file and processing further lines.
data = []

f = open('example_file.txt')
f.readline()
for line in f:
    tok = line.split(',')
    data.append((tok[0], int(tok[1])))
f.close()

data

[('apple', 10), ('pear', 20), ('cherry', 30)]

In [40]:
# Writing a string into file.
f = open('example_file_2.txt', 'w')
f.write('hello\n')
f.close()

In [41]:
# ...a shorter version of the same:
open('example_file_2.txt', 'w').write('hello\n')

6

<small>
    <u>Remark</u>: The CPython interpreter closes the file immediately, as there are no more references to it. This is an implementation detail of CPython, so it is safer not to rely on it.
</small>

In [42]:
# Professional solution, using the with statement.
with open('example_file_2.txt', 'w') as f:
    f.write('hello\n')
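One reason the with statement is preferred is that the file gets closed even if an error occurs inside the block. The sketch below simulates such an error; the scratch file path and the raised exception are assumptions made only for this demonstration.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'with_demo.txt')

try:
    with open(path, 'w') as f:
        f.write('hello\n')
        raise ValueError('simulated failure')  # error inside the block
except ValueError:
    pass

closed_after_error = f.closed  # the with statement closed the file anyway

# The data written before the error reached the file.
with open(path) as f2:
    written = f2.read()

print(closed_after_error, repr(written))

# Clean up the scratch file and directory.
os.remove(path)
os.rmdir(tmp_dir)
```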

In [43]:
# Preparing a file that contains a Celsius-Fahrenheit table.
# [°F] = [°C] · 1.8 + 32

file = open('celsius_fahrenheit.txt', 'w')
file.write('     °C     °F\n')
for c in range(-20, 41, 5):
    # round() is also correct for negative values; int(x + 0.5) truncates toward zero.
    f = round(c * 1.8 + 32)
    file.write(f'{c:7}{f:7}\n')
file.close()

In [44]:
# Determine the set of words contained in the text file real_programmers.txt!
words = set()
for line in open('real_programmers.txt'):
    words.add(line.strip())
words

{'accounting',
 'all',
 'artificial',
 'at',
 'do',
 'fortran',
 'if',
 'in',
 'intelligence',
 'it',
 'list',
 'manipulation',
 'processing',
 'programmers',
 'programs',
 'real',
 'string',
 'they'}

In [45]:
# ...the same with a one-liner:
{line.strip() for line in open('real_programmers.txt')}

{'accounting',
 'all',
 'artificial',
 'at',
 'do',
 'fortran',
 'if',
 'in',
 'intelligence',
 'it',
 'list',
 'manipulation',
 'processing',
 'programmers',
 'programs',
 'real',
 'string',
 'they'}

In [46]:
# Read the content of matrix.txt into a list of int lists!
matrix = []
for line in open('matrix.txt'):
    row = [int(s) for s in line.split()]
    matrix.append(row)
matrix

[[0, 1, 1, 0, 1, 0, 1, 1, 0, 1],
 [0, 0, 1, 0, 1, 1, 0, 1, 0, 1],
 [0, 0, 1, 0, 0, 0, 1, 1, 0, 0],
 [0, 1, 0, 0, 1, 0, 1, 1, 0, 0],
 [1, 0, 1, 1, 0, 0, 1, 0, 1, 1],
 [1, 0, 1, 0, 0, 1, 1, 0, 1, 0],
 [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
 [0, 0, 0, 0, 0, 1, 0, 1, 0, 1],
 [1, 1, 0, 1, 0, 1, 1, 1, 0, 0],
 [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]]

In [None]:
# ...the same using a double comprehension:
[[int(s) for s in line.split()] for line in open('matrix.txt')]

## Exercise: Word statistics in Hamlet

The file [hamlet.txt](hamlet.txt) contains the full text of [Hamlet](https://en.wikipedia.org/wiki/Hamlet). Write a program that computes and prints the 30 most frequent words in the text. Use the following definitions and conditions:

- A word is defined as a sequence of characters separated by any whitespace (spaces, tabs, or line breaks).
- The comparison of words should be case-insensitive (i.e., treat lowercase and uppercase letters as equivalent).
- Remove any leading or trailing punctuation characters (such as periods, commas, exclamation points, etc.) from each word before counting its occurrences.

In [47]:
# Read the text and split it into lowercase words.
words = open('hamlet.txt').read().lower().split()

In [48]:
# Remove leading and trailing punctuation characters.
import string
words = [word.strip(string.punctuation) for word in words]
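To see what `strip(string.punctuation)` does and does not remove, it can be tried on a few tokens. The sample tokens below are invented for this sketch and are not taken from hamlet.txt.

```python
import string

# Made-up tokens for illustration.
samples = ['"Hamlet,', 'lord!', "don't", '--']
cleaned = [token.strip(string.punctuation) for token in samples]
print(cleaned)
```

Punctuation inside a word (the apostrophe in "don't") is kept, and a token made up only of punctuation turns into an empty string, which would then be counted like any other word.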

In [49]:
# We need a dict to count word frequencies. Keys are words, values are word counts.
# We loop over the words. For each word, we increment the counter if the word is already
# in the dict; otherwise, we put the word into the dict with count 1.

word_counts = {}
for word in words:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1
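The same counting pattern is also available in the standard library as `collections.Counter`. This is mentioned only as an alternative to the hand-written loop, and the sample sentence is made up so that the sketch runs without hamlet.txt.

```python
from collections import Counter

# Made-up sample text for illustration.
sample_words = 'the king and the queen and the prince'.split()

counts = Counter(sample_words)  # counts the occurrences of each word
top_two = counts.most_common(2) # (word, count) pairs, most frequent first
print(top_two)
```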

In [50]:
# Sort by descending count values.
result = sorted([(p[1], p[0]) for p in word_counts.items()], reverse=True)

In [51]:
# Display the 30 most common words.
for i in range(30):
    print(result[i])

(1145, 'the')
(973, 'and')
(736, 'to')
(674, 'of')
(565, 'i')
(539, 'you')
(534, 'a')
(513, 'my')
(431, 'in')
(409, 'it')
(381, 'that')
(358, 'ham')
(339, 'is')
(310, 'not')
(297, 'this')
(297, 'his')
(268, 'with')
(258, 'but')
(248, 'for')
(241, 'your')
(231, 'me')
(223, 'lord')
(219, 'as')
(216, 'be')
(213, 'he')
(200, 'what')
(195, 'king')
(195, 'him')
(194, 'so')
(180, 'have')


In [56]:
# (Alternative solution using the key parameter and a lambda expression.)
sorted(word_counts.items(), key=lambda p: -p[1])[:30]

[('the', 1145),
 ('and', 973),
 ('to', 736),
 ('of', 674),
 ('i', 565),
 ('you', 539),
 ('a', 534),
 ('my', 513),
 ('in', 431),
 ('it', 409),
 ('that', 381),
 ('ham', 358),
 ('is', 339),
 ('not', 310),
 ('his', 297),
 ('this', 297),
 ('with', 268),
 ('but', 258),
 ('for', 248),
 ('your', 241),
 ('me', 231),
 ('lord', 223),
 ('as', 219),
 ('be', 216),
 ('he', 213),
 ('what', 200),
 ('king', 195),
 ('him', 195),
 ('so', 194),
 ('have', 180)]

In [None]:
# Alternative solution using the get() method as the sort key (yields only the words, without the counts).
sorted(word_counts, key=word_counts.get, reverse=True)[:30]

In [54]:
# get() returns the given default value if the key is missing...
word_counts.get('queen2', 0)

0

In [55]:
# ...whereas indexing with a missing key raises a KeyError.
word_counts['queen2']

KeyError: 'queen2'
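Because `get()` returns a default value instead of raising an error, it can shorten the counting loop to a single statement per word. The sketch below uses a made-up word list and a separate name, `sample_counts`, so that it does not overwrite the Hamlet results.

```python
# Made-up sample data for illustration.
sample_words = ['the', 'king', 'the', 'queen', 'the']

sample_counts = {}
for word in sample_words:
    # A missing word counts as 0, so no if/else is needed.
    sample_counts[word] = sample_counts.get(word, 0) + 1

print(sample_counts)
```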