# Agenda

- Data structures
 * `list` (recap)
 * `tuple`    
 * `dict`
 * `set`
 * generators
- Comprehensions
 * `list` comprehensions
 * `dict` comprehensions 
 * generator comprehensions
- Functions
 * parameters, arguments and return types
 * named parameters
 * default parameters
 * higher order functions
 * [`raise` errors]

# Data structures
## Lists

Lists hold an arbitrary number of values, which can be of arbitrary, mixed types. You can append things to a list, remove things from a list, or create new lists, e.g. by slicing an existing one.

In [1]:
# create by literal brackets [ ]

numbers = [15, 25, 35, 45, 55, 65, 75, 85]

In [2]:
# loop a list

for n in numbers:
    print(n)

In [3]:
# test element membership

35 in numbers

In [4]:
# slice a list

numbers[0:4:2]

In [5]:
# call methods on a list

numbers.append(95)
numbers

## Tuples

Tuples are quite similar to lists as they
- may contain mixed objects of any size
- can be indexed
- can be sliced
- can be nested

But there is a key difference: Tuples are **immutable**. Once created, items cannot be added, removed or reassigned. Because their size is fixed at creation, Python does not need to resize them dynamically, which generally makes tuples a bit more compact and cheaper to create than lists.

Tuples are great for storing data with a well-known, heterogeneous structure of fixed dimensions, whereas lists are great for storing homogeneous things of arbitrary length.
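
As a small sketch of what immutability means in practice (the `person` and `profile` names are made up for illustration): item assignment fails, "changing" a tuple really means building a new one, and a mutable object stored inside a tuple can still be modified.

```python
person = ("Horst", "Schneider", 34)

# Item assignment is rejected with a TypeError
try:
    person[2] = 35
except TypeError as e:
    print(type(e).__name__)

# "Changing" a tuple means building a new one from the old parts
older = person[:2] + (35,)
print(older)  # ('Horst', 'Schneider', 35)

# Caveat: the tuple itself is fixed, but a mutable item inside it is not
profile = ("Horst", ["Python"])
profile[1].append("Ruby")
print(profile)  # ('Horst', ['Python', 'Ruby'])
```

The original `person` tuple is untouched by all of this; only new objects were created.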

In [6]:
# create tuple by tuple literal parentheses ( )

me = ("Horst", "Schneider", 34)
me

In [7]:
# if you create a tuple with a single element, you need a trailing comma!

x = (1,)
x

In [8]:
# type is 'tuple'

type(me)

In [9]:
# index 

me[2]

34

In [10]:
# length

len(me)

3

In [11]:
# membership test

"Horst" in me

True

In [12]:
# slice - creates a new tuple

me[0:2]

('Horst', 'Schneider')

In [13]:
# some Python fun

me[::-1]

(34, 'Schneider', 'Horst')

In [14]:
# loop entries

for fact in me:
    print(fact)

Horst
Schneider
34


In [15]:
# unpacking

first_name, surname, age = me

print(first_name)
print(surname)
print(age)

Horst
Schneider
34


Unpacking also works for lists! But it's quite a dangerous operation, as you can't know the length of a list upfront and a mismatch raises an error. Due to the immutable, fixed-length nature of tuples, unpacking them is generally safer, although Python will not stop you from shooting yourself in the foot ;-).
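
To make the risk concrete, here is a minimal sketch of what happens when the number of names does not match the number of items. The exact wording of the error message may differ between Python versions, so it is only printed, not relied upon.

```python
me = ("Horst", "Schneider", 34)

# Two names for three items: Python raises a ValueError
error_message = None
try:
    first_name, surname = me
except ValueError as e:
    error_message = str(e)
    print(error_message)

# A starred name absorbs any surplus, so this works for any length >= 1
first_name, *rest = me
print(first_name, rest)
```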

In [16]:
# partial unpacking, collect the remainder

first_name, *rest = me

print(first_name)
print(rest)  # it's a list!

Horst
['Schneider', 34]


In [17]:
# ... the other way around

*_, age = me

print(age)
print(_)  # the leading items, collected into a list

34
['Horst', 'Schneider']


In [18]:
# convert list to tuple

vertex = [1, 3, -7]
tuple(vertex)

(1, 3, -7)

In [19]:
# convert tuple to list

list(me)

['Horst', 'Schneider', 34]

In [20]:
# tuples offer no way to mutate them in any way!

dir(tuple)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'count',
 'index']

In [21]:
# __add__ allows you to combine tuples! (Note the comma)

me + ("Mannheim",)

('Horst', 'Schneider', 34, 'Mannheim')

In [22]:
# __mul__ allows for this:

me * 2

('Horst', 'Schneider', 34, 'Horst', 'Schneider', 34)

In [23]:
# tuples are orderable! Items are compared sequentially.

x = (1, 2, 5)
y = (2, 6, 8)

x < y

True

In [24]:
# tuples can be compared

a = (1, 2, 3)
b = (1, 2, 3)

a == b

True

## Dictionaries

Dictionaries store values assigned to a key. Values can be queried, updated or removed from a dictionary.

In [25]:
# create a dict by dict literal braces { }
# keys must be hashable (e.g. str, int, tuple), values can be of any type!

horst = {
    "first_name": "Horst",
    "surname": "Schneider",
    "age": 34,
    "languages": ["Javascript", "Python", "C#"]
}

horst

{'first_name': 'Horst',
 'surname': 'Schneider',
 'age': 34,
 'languages': ['Javascript', 'Python', 'C#']}

In [26]:
# type is 'dict'

type(horst)

dict

In [27]:
# query for a key

horst["age"]

34

In [28]:
# KeyError if key not found

try:
    horst["city"]
except KeyError as e:
    print(type(e), e)

<class 'KeyError'> 'city'


In [29]:
# set a new key

horst["city"] = "Mannheim"
horst

{'first_name': 'Horst',
 'surname': 'Schneider',
 'age': 34,
 'languages': ['Javascript', 'Python', 'C#'],
 'city': 'Mannheim'}

In [30]:
# getting older :-(

horst["age"] = 35
horst

{'first_name': 'Horst',
 'surname': 'Schneider',
 'age': 35,
 'languages': ['Javascript', 'Python', 'C#'],
 'city': 'Mannheim'}

In [31]:
# learning new stuff :-)

horst["languages"].append("Ruby")
horst

{'first_name': 'Horst',
 'surname': 'Schneider',
 'age': 35,
 'languages': ['Javascript', 'Python', 'C#', 'Ruby'],
 'city': 'Mannheim'}

In [32]:
# get all keys

horst.keys()

dict_keys(['first_name', 'surname', 'age', 'languages', 'city'])

In [33]:
# get all values

horst.values()

dict_values(['Horst', 'Schneider', 35, ['Javascript', 'Python', 'C#', 'Ruby'], 'Mannheim'])

In [34]:
# get both - it's tuples!

horst.items()

dict_items([('first_name', 'Horst'), ('surname', 'Schneider'), ('age', 35), ('languages', ['Javascript', 'Python', 'C#', 'Ruby']), ('city', 'Mannheim')])

In [35]:
# membership works - but for keys!

"age" in horst

True

In [36]:
# ... but not for values

35 in horst

False

In [37]:
# ... unless explicitly asked for!

35 in horst.values()

True

In [38]:
# dicts have a length - keys and values are 1:1, so len() means both
len(horst)

5

In [39]:
# loop keys

for key in horst:
    print(key)

first_name
surname
age
languages
city


In [40]:
# loop keys and values

for key, value in horst.items():
    print(f"'{key}': {value}")

'first_name': Horst
'surname': Schneider
'age': 35
'languages': ['Javascript', 'Python', 'C#', 'Ruby']
'city': Mannheim


In [41]:
# convert to list - returns the keys!

list(horst)

['first_name', 'surname', 'age', 'languages', 'city']

In [42]:
# convert to list - the values (will be nested)

list(horst.values())

['Horst', 'Schneider', 35, ['Javascript', 'Python', 'C#', 'Ruby'], 'Mannheim']

In [43]:
# can be converted to a tuple as well! (again, takes the keys by default)

tuple(horst)

('first_name', 'surname', 'age', 'languages', 'city')

In [44]:
# Nested tuples - brain explodes

tuple(horst.items())

(('first_name', 'Horst'),
 ('surname', 'Schneider'),
 ('age', 35),
 ('languages', ['Javascript', 'Python', 'C#', 'Ruby']),
 ('city', 'Mannheim'))
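
The section intro mentions removing values, which the cells above do not show. A minimal sketch using a smaller, made-up `person` dict: `get()` avoids the `KeyError` on lookup, `pop()` removes a key and hands back its value, and `del` removes a key without returning anything.

```python
person = {"first_name": "Horst", "age": 34, "city": "Mannheim"}

# get() returns a fallback instead of raising KeyError
fallback = person.get("surname", "unknown")
print(fallback)  # unknown

# pop() removes a key and returns its value
city = person.pop("city")
print(city)  # Mannheim

# pop() with a default is safe even if the key is missing
missing = person.pop("country", None)

# del removes a key without returning anything (KeyError if missing)
del person["age"]

print(person)  # {'first_name': 'Horst'}
```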

## Sets

Sets store a set of distinct values.

In [45]:
# create a set by set literal braces { }
# notice how duplicates are immediately removed
# sets do not have an order!

nums = { 1, 1, 2, 5, 5, 3, 4 }
nums

{1, 2, 3, 4, 5}

In [46]:
# type is 'set'

type(nums)

set

In [47]:
# membership test

1 in nums

True

In [48]:
6 in nums

False

In [49]:
# len (more like "size")

len(nums)

5

In [50]:
# no way to index a set, as it does not have any order

try:
    nums[0]
except Exception as e:
    print(e)

'set' object is not subscriptable


In [51]:
# compare sets

{2, 1} == {1, 2}

True

In [52]:
# loop

for v in nums:
    print(v)

1
2
3
4
5


In [53]:
# add value to set

nums.add(6)
nums

{1, 2, 3, 4, 5, 6}

In [54]:
# update set with multiple values

nums.update((7, 8, 9))
nums

{1, 2, 3, 4, 5, 6, 7, 8, 9}

In [55]:
# remove a value - safe even if value not in set

nums.discard(6)
nums

{1, 2, 3, 4, 5, 7, 8, 9}

In [56]:
# remove a value - error if value not in set

nums.remove(5)
nums

{1, 2, 3, 4, 7, 8, 9}

In [57]:
# Fun Fact: No city in Germany is farther away from an Autobahn than Salzwedel (at least 65km)
# https://www.salzwedel.de/de/tourismus/an-und-abreise.html

has_nature = { "wuppertal", "salzwedel", "freudenstadt" }
autobahn_nearby = { "wuppertal", "ludwigshafen", "mannheim" }

In [58]:
# union (combine two sets)

# All the cities that either have an Autobahn or nature nearby

has_nature | autobahn_nearby

{'freudenstadt', 'ludwigshafen', 'mannheim', 'salzwedel', 'wuppertal'}

In [59]:
has_nature.union(autobahn_nearby)

{'freudenstadt', 'ludwigshafen', 'mannheim', 'salzwedel', 'wuppertal'}

In [60]:
# intersection (shared by a and b)

# Cities that have Autobahn as well as nature nearby

has_nature & autobahn_nearby

{'wuppertal'}

In [61]:
has_nature.intersection(autobahn_nearby)

{'wuppertal'}

In [62]:
# difference (in a, but not in b)

# I want to only have the Autobahn nearby, I don't care about nature

autobahn_nearby - has_nature

{'ludwigshafen', 'mannheim'}

In [63]:
autobahn_nearby.difference(has_nature)

{'ludwigshafen', 'mannheim'}

In [64]:
# symmetric difference (exclusively in a or b, but not in both)

# If you want to settle for either Autobahn or nature, but NOT BOTH, Wuppertal is not for you.

autobahn_nearby ^ has_nature

{'freudenstadt', 'ludwigshafen', 'mannheim', 'salzwedel'}

In [65]:
autobahn_nearby.symmetric_difference(has_nature)

{'freudenstadt', 'ludwigshafen', 'mannheim', 'salzwedel'}

In [66]:
# Subsets (< tests for a proper subset, <= also allows equality)

# set { "mannheim" } is a subset of all cities with "Autobahn nearby"

{ "mannheim" } < autobahn_nearby

True

In [67]:
# Supersets (> tests for a proper superset, >= also allows equality)

# Having an Autobahn seems to correlate with being an industrial town...

has_huge_bridge = { "mannheim", "wuppertal", "ludwigshafen", "nistertal" }
has_huge_bridge > autobahn_nearby

True

In [68]:
# convert to set

set([1, 2, 1, 2])

{1, 2}

In [69]:
# convert to tuple

tuple(has_nature)

('freudenstadt', 'salzwedel', 'wuppertal')

In [70]:
# convert to list

list(autobahn_nearby)

['mannheim', 'ludwigshafen', 'wuppertal']

In [71]:
# use for duplicate checks

len(set("horst")) == len("horst")

True

In [72]:
len(set("tamas")) == len("tamas")

False

## Generators

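Generators produce values "on the fly". Nothing is computed or stored in memory upfront; each value is produced only when you request it. The `range` used below is strictly speaking a lazy sequence rather than a generator (it can be iterated more than once), but it illustrates the same idea.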

In [73]:
three = range(3)
three

range(0, 3)

In [74]:
# Not a list, but a range!

type(three)

range

In [75]:
# resolve to a list

list(three)

[0, 1, 2]

In [76]:
# loop

for number in three:
    print(number)

0
1
2


In [77]:
# manually resolve

iter(three)

<range_iterator at 0x7f0acc1c51b0>

In [78]:
# Create iterator. Iterators can only be consumed once!

iterator = iter(three)
next(iterator)

0

In [79]:
next(iterator)

1

In [80]:
next(iterator)

2

In [81]:
try:
    next(iterator)
except StopIteration as e:
    print("Done")

Done
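
The cells above use `range` and a plain iterator. For comparison, here is a minimal sketch of a generator function written with `yield`; the name `countdown` is made up for illustration.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1 (hypothetical example function)."""
    current = start
    while current > 0:
        # Execution pauses here until the next value is requested
        yield current
        current -= 1


gen = countdown(3)

first = next(gen)      # runs the body up to the first yield
remaining = list(gen)  # consumes whatever is left
exhausted = list(gen)  # a generator can only be consumed once

print(first, remaining, exhausted)  # 3 [2, 1] []
```

Calling `countdown(3)` does not run any of the function body; the work happens step by step as values are requested.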


# Comprehensions

## List comprehension

Like SQL for lists. A mini language for value selection and projection.

In [82]:
ten = range(10)
ten

range(0, 10)

In [83]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [84]:
# Brackets literal (mirrors list creation)

[n for n in range(10)]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [85]:
# with selection (filter condition)

[n for n in range(10) if n > 5]

[6, 7, 8, 9]

In [86]:
# with projection (transform source value to destination value)

[n ** 2 for n in range(10) if n > 5]

[36, 49, 64, 81]

In [87]:
# same as above, with a more descriptive variable name

[num ** 2 for num in range(10) if num > 5]

[36, 49, 64, 81]

In [88]:
# Break complex statements into multiple lines

[
    num ** 2     # SELECT SQUARE(num) 
    for num 
    in range(10) # FROM range(10)
    if num > 5   # WHERE num > 5
]

[36, 49, 64, 81]

In [89]:
# multiple values

census = [
    ("Mannheim", 320_000),
    ("Wuppertal", 350_000),
    ("Dresden", 550_000),
    ("Nistertal", 1322)   
]

[
    f"{city}: {population}" 
    for city, population 
    in census
]

['Mannheim: 320000', 'Wuppertal: 350000', 'Dresden: 550000', 'Nistertal: 1322']

In [90]:
# filter down a list 

remote_villages = [
    district 
    for district, population 
    in census 
    if population < 2500
]

remote_villages

['Nistertal']

In [91]:
# no need to do this (note the implicit unpacking of tuples!)

remote_villages = []

for district, population in census:
    if population < 2500:
        remote_villages.append(district)

remote_villages

['Nistertal']

## Dictionary comprehension

Quite similar to list comprehensions, but you need to handle key and value!

In [92]:
census

[('Mannheim', 320000),
 ('Wuppertal', 350000),
 ('Dresden', 550000),
 ('Nistertal', 1322)]

In [93]:
# Braces literal (mirrors dict creation)

census_dict = {
    city: population
    for city, population
    in census
}

census_dict

{'Mannheim': 320000, 'Wuppertal': 350000, 'Dresden': 550000, 'Nistertal': 1322}

In [94]:
# Filter

hip_urban_megacities = {
    city: population
    for city, population
    in census
    if population >= 350_000
}

hip_urban_megacities

{'Wuppertal': 350000, 'Dresden': 550000}

In [95]:
# Project key and value to new representation

hip_urban_megacities = {
    city.lower(): int(population / 1000)
    for city, population
    in census
    if population >= 350_000
}

hip_urban_megacities

{'wuppertal': 350, 'dresden': 550}

## Generator comprehensions

In [96]:
evens_generator = (
    num
    for num 
    in range(7) 
    if num % 2 == 0
)

evens_generator

<generator object <genexpr> at 0x7f0acc1b3900>

In [97]:
list(evens_generator)

[0, 2, 4, 6]

In [98]:
# generator instances can be only consumed once!

list(evens_generator)

[]

In [99]:
# can be chained! No intermediate lists stored in memory.

cities = ["mannheim", "wuppertal", "nistertal", "dresden"]
uppers = (city.upper() for city in cities)
abbreviated = (city[:3] for city in uppers)

list(abbreviated)

['MAN', 'WUP', 'NIS', 'DRE']

In [100]:
# Efficient if you only partially process things

cities = ["mannheim", "wuppertal", "nistertal", "dresden"]
uppers = (city.upper() for city in cities)
abbreviated = (city[:3] for city in uppers)
first = (next(abbreviated) for _ in range(1))

list(first)

['MAN']
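
Another way to take only the first few items is `itertools.islice` from the standard library. The sketch below records which cities were actually processed (the `shout` helper and the `seen` list are made up for illustration), to show that the rest of the pipeline is never evaluated.

```python
from itertools import islice

cities = ["mannheim", "wuppertal", "nistertal", "dresden"]
seen = []  # records which cities were actually processed


def shout(city):
    seen.append(city)
    return city.upper()


uppers = (shout(city) for city in cities)
abbreviations = (city[:3] for city in uppers)

# islice stops after two items; the remaining cities are never touched
first_two = list(islice(abbreviations, 2))

print(first_two)  # ['MAN', 'WUP']
print(seen)       # ['mannheim', 'wuppertal']
```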

# Functions

Functions are parametrized chunks of code. When a function is called, the arguments you pass are substituted for its parameters, and the function may - or may not - return a result. Many definitions of functions focus on the "reusability" of the code. Conceptually, it matters far more that functions let you abstract things and introduce concepts into code, giving each concept a name (the function name).

In [101]:
# A simple function definition

def is_capitalized(word):
    """
    Checks whether word is capitalized, 
    i.e. starts with an uppercase letter.
    """
    initial = word[0]
    
    return initial == initial.upper()

In [102]:
# Call a function

is_capitalized("Hallo")

True

In [103]:
is_capitalized("hallo")

False
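
The agenda lists `raise`, and `is_capitalized` offers a natural place to try it: with an empty string, `word[0]` fails with a rather unhelpful `IndexError`. A hedged sketch of a stricter variant follows; the name `is_capitalized_strict` and the error message are assumptions for illustration, not part of the original function.

```python
def is_capitalized_strict(word):
    """
    Like is_capitalized, but rejects empty input explicitly
    (hypothetical variant for illustration).
    """
    if not word:
        # Raise a descriptive error instead of failing on word[0]
        raise ValueError("word must not be empty")

    initial = word[0]
    return initial == initial.upper()


message = None
try:
    is_capitalized_strict("")
except ValueError as e:
    message = str(e)

print(message)  # word must not be empty
```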

In [104]:
# A "void" function (not really, it returns the constant `None`)
names = ["horst", "tamas"]

def clear(a_list):
    """
    Clears a list. Returns nothing!
    """
    a_list.clear()
    
result = clear(names)

In [105]:
names

[]

In [106]:
type(result)

NoneType

In [107]:
result is None

True

In [108]:
# More than one parameter

def abbreviated(text, length):
    return text[0:length] + "..."

In [109]:
# Call with two arguments

abbreviated("Sphinx of black quartz, judge my vow!", 10)

'Sphinx of ...'

In [110]:
# Positional and keyword argument

abbreviated("Sphinx of black quartz, judge my vow!", length=10)

'Sphinx of ...'

In [111]:
# This would fail with a syntax error - positional arguments come first, then come the keyword arguments
# abbreviated(text="Sphinx of black quartz, judge my vow!", 10)

In [112]:
# All arguments must be provided - you'll get a TypeError otherwise

# abbreviated("Sphinx of black quartz, judge my vow!")

In [113]:
# Default arguments

def abbreviated(text, length=8):
    return text[0:length] + "..."

In [114]:
# Uses default argument length=8

abbreviated("Sphinx of black quartz, judge my vow!")

'Sphinx o...'

In [115]:
# Override default argument by position

abbreviated("Sphinx of black quartz, judge my vow!", 20)

'Sphinx of black quar...'

In [116]:
# Override default argument by keyword

abbreviated("Sphinx of black quartz, judge my vow!", length=20)

'Sphinx of black quar...'

In [117]:
# Higher order functions - accept function as an argument

def words(sentence, transform, separator=" "):
    return [
        transform(word) 
        for word 
        in sentence.split(separator)
    ]

In [118]:
def emojize(word):
    replacements = {
        "dog": "🐶",
        "fox": "🦊"
    }
    
    return replacements.get(word, word)

In [119]:
# Pass a function as an argument (functions are ordinary objects in Python)

sentence = "the quick brown fox jumps over the lazy dog"
words(sentence, emojize)

['the', 'quick', 'brown', '🦊', 'jumps', 'over', 'the', 'lazy', '🐶']

In [120]:
def parse(csv, skip_header, value_separator, line_separator="\n"):
    rows = [row for row in csv.split(line_separator) if row]
    
    if skip_header:
        rows = rows[1:]
            
    return [tuple(row.split(value_separator)) for row in rows]

In [121]:
csv = """
first_name,surname,age
horst,schneider,34
tamas,janusko,26
lars,,
"""

parse(csv, skip_header=True, value_separator=",", line_separator="\n")

[('horst', 'schneider', '34'), ('tamas', 'janusko', '26'), ('lars', '', '')]

In [122]:
# Return a function from a function

def create_parser(skip_header, value_separator, line_separator):
    """
    Wraps arguments and creates a new function that closes over these arguments
    """
    def parser(csv):
        return parse(csv, skip_header, value_separator, line_separator)
    
    return parser

In [123]:
# the result of create_parser() is a function!
custom_parser = create_parser(False, ";", "\n")
type(custom_parser)

function

In [124]:
# you can now invoke the parser; the whole parse specification is abstracted

csv = """
first_name;surname;age
horst;schneider;34
tamas;janusko;26
lars;;
"""

custom_parser(csv)

[('first_name', 'surname', 'age'),
 ('horst', 'schneider', '34'),
 ('tamas', 'janusko', '26'),
 ('lars', '', '')]
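
As an alternative to writing a closure by hand, the standard library's `functools.partial` can pre-fill arguments of an existing function. The sketch below repeats `parse` so that it runs on its own; `semicolon_parser` and the sample `data` are made-up names for illustration.

```python
from functools import partial


def parse(csv, skip_header, value_separator, line_separator="\n"):
    rows = [row for row in csv.split(line_separator) if row]

    if skip_header:
        rows = rows[1:]

    return [tuple(row.split(value_separator)) for row in rows]


# Pre-fill two keyword arguments; the result is a new callable
semicolon_parser = partial(parse, skip_header=True, value_separator=";")

data = "name;age\nhorst;34\ntamas;26\n"
result = semicolon_parser(data)

print(result)  # [('horst', '34'), ('tamas', '26')]
```

Whether a closure or `partial` reads better is largely a matter of taste; a closure leaves more room for extra logic, while `partial` is shorter when you only need to fix some arguments.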