# Modules, functions and data types

[Modules and imports](#modules-imports)

[Defining functions](#functions)

[Function parameters](#parameters)

[Scope](#scope)

[Recursive functions](#recursion)

[Basic data types](#data-types)

[Immutability and mutability](#mutability)

## Modules and imports
<a id='modules-imports'></a>

A single file containing definitions of Python functions, types and/or variables is known as a __module__. Modules can be imported using the __import__ statement:

In [None]:
import math

We can then access functions and variables in the module:

In [None]:
math.factorial(5)

In [None]:
math.pi

Functions and variables can also be imported directly by name:

In [None]:
from math import factorial, pi

In [None]:
factorial(5)

In [None]:
pi

An import statement can change the name under which some function or variable can be accessed:

In [None]:
from math import factorial as fac

In [None]:
fac(5)

Similarly, module names can also be imported under an alias:

In [None]:
import math as m

In [None]:
m.pi

## Defining functions
<a id='functions'></a>

__Functions__ are units of a program that can be executed. Like mathematical functions, they can return a value; some return nothing and just perform a task. Functions help us design programs by breaking them down into smaller units, much as in everyday life we plan in larger steps rather than in details. For example, when I plan my day, I think in tasks like getting up, having breakfast and going to work, not in details such as how I spread butter on the bread. But when I actually start making breakfast, I do have to spread the butter. Similarly, we can write functions that each perform a specific task and call them whenever that task is needed. Another advantage of functions is that they are reusable: if I eat bread and butter for both breakfast and dinner, I only have to define once how it is made.

To be able to call a function (= perform a complex task), we first need a function definition (= the description of how the task is performed). A __function definition__ consists of a header specifying the name of the function and listing its parameters using names that will be used to refer to them in the definition. The header is followed by an indented block containing the definition itself, i.e. the lines of code to be executed when the function is called. One or more __return__ statements within the definition indicate when the function should terminate and what value it should return.

In [None]:
def double_plus_one(x):
    return 2*x+1

In [None]:
double_plus_one(10)

If a function terminates without encountering a return statement, i.e. the interpreter reaches the end of its code, it returns None:

In [None]:
def hello():
    print("Hello World")

In [None]:
x = hello()

In [None]:
print(x)

Functions are Python objects in their own right and may themselves be passed to other functions. The built-in function _map_ takes as its arguments a function and an iterable (e.g. a list) and applies the function to each of its elements. In Python 3 it returns a lazy iterator over the results rather than a list. We can thus use the double_plus_one function to generate odd numbers:

In [None]:
for i in map(double_plus_one, range(10)):
    print(i)
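
Because _map_ returns a lazy iterator in Python 3, its results have to be consumed, e.g. by a loop or by _list_. A minimal sketch of this behaviour (the function is redefined here so that the cell runs on its own):

```python
def double_plus_one(x):
    return 2*x + 1

# map() is lazy: nothing is computed until the iterator is consumed
odd_iter = map(double_plus_one, range(5))

# list() consumes the iterator and collects the results
odds = list(odd_iter)
print(odds)  # [1, 3, 5, 7, 9]

# the iterator is now exhausted, so a second pass yields nothing
leftover = list(odd_iter)
print(leftover)  # []
```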

Another notable function is _reduce_, which takes as its arguments a two-place function and an iterable, and repeatedly calls the function on the leftmost two elements of the iterable to eventually reduce it to a single value. The example below sums numbers between 1 and 100:

In [None]:
def plus(x, y):
    return x + y

In [None]:
from functools import reduce
reduce(plus, range(1, 101))

It is often the case that we need to define simple functions just so we can pass them to functions such as _map_, _reduce_, etc. This is more convenient if we use __anonymous functions__, also known as __lambda functions__. The plus function could be defined as a lambda function like this:

In [None]:
plus = lambda x, y: x + y

This is equivalent to the previous definition. However, since a single line contains the entire definition (along with its header), it need not be given a name before we pass it to some other function:

In [None]:
reduce(lambda x, y: x + y, range(1, 101))
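
For common operations such as addition, the standard library already provides ready-made functions, so a lambda is not always needed. The sketch below uses _operator.add_ and the built-in _sum_ to compute the same total; the variable names are chosen only for illustration:

```python
from functools import reduce
from operator import add

# operator.add(x, y) does the same as lambda x, y: x + y
total_reduce = reduce(add, range(1, 101))

# for plain summation the built-in sum is the usual choice
total_sum = sum(range(1, 101))

print(total_reduce, total_sum)  # 5050 5050
```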

Similarly, we can generate odd numbers using an anonymous function and _map_:

In [None]:
for i in map(lambda x: 2*x + 1, range(20)):
    print(i)

One more notable function that takes a function as its argument is _filter_. Given a sequence and a function that can process all sequence elements, it will keep only those elements for which function(element) is True (when typecast to _bool_). Here's another way to generate odd numbers, using _filter_:

In [None]:
for i in filter(lambda x: x%2==1, range(40)):
    print(i)

Lambdas are practical for short expressions but are more limited than real functions, since their body can only contain a single expression, not statements. The if statement has an expression variant as well, which is sometimes practical for lambdas. For example, you can find the longest string in a list in the following way:

In [None]:
from functools import reduce

l = ["apple", "pear", "honeymelon", "grape", "mango"]

longest = reduce(lambda x, y: x if len(x) > len(y) else y, l)
print(longest)

## Function parameters
<a id='parameters'></a>

So far we have seen definitions of functions that only allow a fixed number of parameters. An error is raised if a function is called with an incorrect number of parameters:

In [None]:
double_plus_one(3, 5)

In [None]:
hello("world")

A function may take any number of mandatory parameters. These can be followed by optional parameters, whose arguments are often passed by name and are therefore known as __keyword arguments__. Optional parameters must have default values, specified in the function header:

In [None]:
def hello(what="world"):
    print("Hello " + what + "!")

This function has zero mandatory parameters and one optional parameter. It can be called without any parameters:

In [None]:
hello()

Or with one parameter which will become the value of the variable "what" in the definition:

In [None]:
hello("friend")

Finally, it may also be called by stating explicitly the value of a particular optional argument:

In [None]:
hello(what="friend")

This syntax allows us to specify any subset of optional parameters that a function might take:

In [None]:
def hello(what="world", punct="!"):
    print("Hello " + what + punct)

In [None]:
hello()

In [None]:
hello(punct='.')

In [None]:
hello(what='friend')

In [None]:
hello(what='friend', punct='.')

In [None]:
hello(punct='.', what='friend')

If argument names are not passed to the function, parameters will be interpreted in the order specified by the definition header:

In [None]:
hello("friend", ".")

In [None]:
hello(".", "friend")

In case we'd like to allow for an arbitrary number of arguments, we must indicate this in the definition with an asterisk and then process the arguments as a tuple:

In [None]:
def hello_multi(*args):
    for name in args:
        hello(name)

In [None]:
hello_multi("world", "friend", "John", "Mary")
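
Inside the function, the starred parameter is an ordinary tuple holding all positional arguments. A small sketch, using a hypothetical helper called describe_args, shows this and also the reverse operation of unpacking a list into separate arguments:

```python
def describe_args(*args):
    # args is a tuple holding all positional arguments
    return type(args).__name__, len(args)

print(describe_args())             # ('tuple', 0)
print(describe_args("a", "b", 3))  # ('tuple', 3)

# a list (or any iterable) can be unpacked into separate arguments with *
names = ["world", "friend"]
print(describe_args(*names))       # ('tuple', 2)
```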

## Scope
<a id='scope'></a>

Variable names inside a function definition are __local__ to the given function:

In [None]:
def double(x):
    y = 2*x
    return y

In [None]:
y = 1
x = double(2)
print(x)
print(y)

The variables in the last block, outside of any function definition, are __global__ variables and they are not affected by assigning values to local variables of the same name inside a function. Functions may access global variables, however:

In [None]:
def foo():
    print(y)

In [None]:
foo()

If a function needs to assign a value to a global variable, it must explicitly declare the name of that variable using the keyword __global__. Without the declaration, the assignment only creates a local variable:

In [None]:
def foo():
    y = 5

In [None]:
foo()
y

In [None]:
def foo():
    global y
    y = 5

In [None]:
foo()

In [None]:
y
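
The difference between the two versions of foo can also be checked in a single cell. In this sketch the names counter, set_local and set_global are made up for illustration:

```python
counter = 1

def set_local():
    counter = 5      # creates a new local variable; the global is untouched

def set_global():
    global counter
    counter = 5      # rebinds the module-level variable

set_local()
after_local = counter
set_global()
after_global = counter

print(after_local, after_global)  # 1 5
```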

Unlike in e.g. Java or C++, loops like _while_ and _for_ do not have their own scope:

In [None]:
for x in range(10):
    print(x)

In [None]:
x

Global variables should generally be avoided - every variable should be local to some function (or class). When we decide to use them, e.g. to set some global parameters that can be accessed by various functions, it is customary to give them names in ALL_UPPERCASE so that we do not confuse them with any local variables.

In [None]:
GLOBAL_PARAM = 3
def sum_plus(x, y):
    return x + y + GLOBAL_PARAM

In [None]:
sum_plus(2, 4)

## Recursive functions
<a id='recursion'></a>

Any function definition may contain a call to itself. Functions that call themselves under certain conditions are called __recursive__. A recursive function must ensure that it will terminate under certain conditions and that these conditions will be met eventually. Recursive algorithms are similar to induction in mathematics: we solve the problem directly for a simple case, and when a more complex case is given, we phrase its solution in terms of a less complex case. For example, the factorial of __n__ can be expressed with the factorial of __n-1__: n! = (n-1)! * n

Below is a recursive definition of the factorial operation:

In [None]:
def fac(x):
    if x == 0:
        return 1
    else:
        return x*fac(x-1)

In [None]:
fac(5)

This definition is faulty: it does not check that its argument is a non-negative integer, so for other inputs the recursion never reaches the base case. Python refuses to recurse beyond a fixed depth (1000 levels by default), hence the error below:

In [None]:
fac(-1)

The corrected version of the _fac_ function displays an error message and terminates (returning None) if its argument is not a non-negative integer:

In [None]:
def fac(x):
    if x < 0 or int(x) != x:
        print("Invalid argument")
    elif x == 0:
        # 0! = 1 is the base case, so fac(0) is handled as well
        return 1
    else:
        return x*fac(x-1)

In [None]:
fac(-1)
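
A function that only prints a message still returns None for invalid input, which the caller may overlook. A common alternative, sketched here under the hypothetical name fac_checked, is to raise an exception instead:

```python
def fac_checked(x):
    # reject anything that is not a non-negative whole number
    if x < 0 or int(x) != x:
        raise ValueError("fac_checked expects a non-negative integer")
    if x == 0:
        return 1
    return x * fac_checked(x - 1)

print(fac_checked(5))  # 120

try:
    fac_checked(-1)
except ValueError as err:
    print("Error:", err)
```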

Recursive functions may be used to traverse complex data structures. The following function will collect all elements from a list of lists:

In [None]:
def flatten(some_list):
    flat_list = []
    for element in some_list:
        if type(element) != list:
            flat_list.append(element)
        else:
            flat_list += flatten(element)
    return flat_list        

In [None]:
l = [1, [2, 3], [4, [5, 6]], [7], [8, 9], [10]]

In [None]:
flatten(l)

## Basic data types
<a id='data-types'></a>

### Numbers (int, float, complex)

__Ints__ can be of arbitrary size. __Floats__ store numbers using the C type _double_. An exceptional feature of Python is that it has built-in support for complex numbers as well through the __complex__ type.

Converting floats to integers truncates towards zero, i.e. the fractional part is dropped. For positive numbers this is equivalent to a floor operation:

In [None]:
int(3.74)

Floats can also be __round__ed to the nearest whole number. In Python 3, _round_ called with a single argument returns an int.

In [None]:
round(3.74)
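
The differences between truncation, floor and rounding show up with negative numbers and with values exactly halfway between two integers. A short sketch, using _math.floor_ from the standard library for comparison:

```python
import math

print(int(3.74), int(-3.74))                # 3 -3  (truncation towards zero)
print(math.floor(3.74), math.floor(-3.74))  # 3 -4  (always rounds down)

rounded = round(3.74)
print(rounded, type(rounded).__name__)      # 4 int

# ties are rounded to the nearest even integer
print(round(2.5), round(3.5))               # 2 4
```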

The __complex__ type represents complex numbers. Literals take the form of "a+bj", where a and b are the real and imaginary part of the complex number, respectively.

In [None]:
type(1+2j)

In [None]:
complex(1.5)

In [None]:
1+2j+3+4j

Operations over complex numbers are implemented in the __cmath__ module. E.g. to compute the square root of a negative number, one can use cmath.sqrt (math.sqrt raises an error for negative arguments):

In [None]:
from cmath import sqrt
sqrt(-1)

### Sequences and mappings (string, list, tuple, dict)


#### Strings

String literals may be enclosed between either single quotes (') or double quotes ("). Python treats the two forms identically, so it is best to pick one and use it consistently.

Two common operations that we use on strings are _strip_ and _split_. We have already seen strip: it removes whitespace from the beginning and the end of the string. For example:

In [None]:
"      A string with lots of whitespaces in the beginning and in the end.       ".strip()

Called without arguments, _split_ splits the string at runs of whitespace characters and returns a list:

In [None]:
"Some   words\tseparated\nby    whitespaces.".split()

We can also specify at which character we want to split the string:

In [None]:
"apple,pear,orange,melon".split(",")

We can also split at a substring:

In [None]:
"apple, pear, orange, melon".split(", ")

One important limitation of split is that we cannot tell it to split the string at any one of several characters. For example, we cannot use it to split a text into words, because words can be delimited by spaces, commas, dots, question marks and so on. For this we must use the re module's _re.split_ function. This function takes a pattern in the form of a regular expression and a string.

What are regular expressions? They are patterns that describe a set of strings; a string either matches the pattern or it does not. These patterns can be used, for example, for advanced search or input validation (= checking whether the input the user entered is in the expected format). Let's check some simple regular expressions!

- An ordinary letter or number matches itself. For example the pattern "giraffe" only matches the exact word "giraffe".

- A dot matches any single character. For example, the pattern "j.y" matches "joy", "jay", "j3y", "jzy", "j_y" and lots of other words. Any single character can stand at the place of the dot. If we want exactly a dot, we must escape it: "example\\.com" will only match "example.com".

- A pipe character (|) can be used to describe alternatives. For example, "a|b" matches "a" and "b". The pipe applies to everything on either side of it, so "ba|ottle" matches "ba" or "ottle", not "battle" or "bottle". To restrict the alternatives to part of the pattern, we group them with parentheses: "b(a|o)ttle" matches "battle" or "bottle", and "s(ee|oi)l" matches "seel" or "soil". If we want to match the pipe character exactly, we can escape it.

- The ? character makes the preceding element optional: "beetles?" will match "beetle" and "beetles". The * character marks zero or more repetitions of the preceding element and + marks one or more repetitions.
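
The rules listed above can be tried out with _re.fullmatch_, which succeeds only if the whole string matches the pattern. The helper function matches below is a hypothetical convenience wrapper, not part of the re module:

```python
import re

def matches(pattern, text):
    # re.fullmatch returns a match object if the whole string fits, else None
    return re.fullmatch(pattern, text) is not None

print(matches("j.y", "joy"), matches("j.y", "jooy"))            # True False
print(matches("example\\.com", "example.com"))                  # True
print(matches("example\\.com", "exampleXcom"))                  # False
print(matches("s(ee|oi)l", "soil"), matches("s(ee|oi)l", "sel"))  # True False
print(matches("beetles?", "beetle"), matches("beetles?", "beetless"))  # True False
print(matches("a*b+", "b"), matches("a*b+", "aaa"))             # True False
```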

Now that we know the basics of regular expressions, we can write a regular expression that captures all the characters that may delimit a word, so we can split text into words. With regular expressions, it is practical to use r-strings (raw strings). These are string literals prefixed with an r character. When you write regular expressions, you often escape characters as noted above. However, to write a backslash inside an ordinary string, you have to escape that backslash as well, which means that every backslash would have to be doubled. With r-strings, you do not have to escape backslashes for the Python interpreter. See: https://www.codevscolor.com/python-raw-string/

In [None]:
import re

text = "This a long text, I swear. Does it have several punctuation marks? Yes, it does!"
words = re.split(r" |\.|,|\?|!", text)
print(words)
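
The pattern above splits at every single delimiter, so a comma followed by a space leaves an empty string in the result. One possible refinement, sketched here, is to use a character class with + so that a run of delimiters counts as a single split point, and then to drop the empty string left at the end of the text:

```python
import re

text = "This a long text, I swear. Does it have several punctuation marks? Yes, it does!"

# [ .,?!]+ matches one or more consecutive delimiter characters
pieces = re.split(r"[ .,?!]+", text)

# the text ends with a delimiter, so the last piece is an empty string
words = [w for w in pieces if w]
print(words)
```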

<hr />

__Lists__ are the most common sequence type in Python; they will be discussed in greater detail later on. List literals consist of square brackets surrounding elements separated by commas.

In [None]:
l = [3, "foo", 2.4, True, (3+2j), None]

A list can hold elements of any type, including other lists:

In [None]:
[[1, 2], [3, 4], [5, 6]]

<hr />

__Tuples__ are also sequences of arbitrary elements. Tuple literals enclose the elements between round brackets:

In [None]:
t = (1, 2, "penguin", 3)

A crucial difference between lists and tuples is that of mutability. (See later.) In short: tuples do not support item assignment: once created, their contents will remain unchanged for the object's lifetime. This makes them hashable (as long as all of their elements are hashable), allowing us to e.g. use them as dictionary keys or as members of sets.

Another difference between lists and tuples is their typical use cases. We can think of lists as structures that impose a given __ordering__ on an arbitrary number of elements. Tuples, on the other hand, impose __structure__, i.e. positions within a tuple all have some meaning. For example, it would be unusual to represent points in 2D space as lists of two floats each, because these points are ordered pairs of two real numbers, not a collection of numbers that happens to be 2 in size and happens to have a particular order.

<hr />

__Dictionaries__ are maps: they store a set of unique __keys__, each of which is mapped to a __value__ (since Python 3.7, keys are kept in insertion order). Dictionary literals contain comma-separated lists of colon-separated key-value pairs, enclosed in curly brackets:

In [None]:
d = {'a': 1, 'b': 2, 'c': 3}

The value for a given key is accessed by key lookup, using square brackets as if we were indexing a list or string:

In [None]:
d['a']

In [None]:
d['b'] = 5

In [None]:
d

Looking up a key that does not exist raises an error:

In [None]:
d['d']

There are many ways to avoid KeyErrors, including alternative methods for accessing a dictionary and built-in types extending dictionaries. All these will be discussed later.
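
As a brief preview, two common ways of avoiding a KeyError are testing membership with _in_ and using the dictionary's _get_ method, which returns None (or a default of our choice) for missing keys. The variable names below are chosen only for illustration:

```python
d = {'a': 1, 'b': 2, 'c': 3}

# check for the key before looking it up
if 'd' in d:
    print(d['d'])
else:
    print("no such key")

missing = d.get('d')       # None, because 'd' is not a key
fallback = d.get('d', 0)   # 0, the default we supplied
present = d.get('a', 0)    # 1, the stored value wins over the default

print(missing, fallback, present)  # None 0 1
```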

<hr />

Strings, lists and tuples are all __sequences__; dictionaries are, strictly speaking, mappings rather than sequences, but many common operations treat all four types similarly. Most of these will be discussed later. A notable example is that it is possible to __iterate__ over each of them with a for loop (iterating over a dictionary yields its keys).

Another shared property of these types is that we can test for the membership of any element in them, i.e. ask whether a given element is contained in them or not (for dictionaries, the test applies to the keys). This is achieved using the keyword __in__.

In [None]:
3 in [2,3,4]

In [None]:
'a' in 'penguin'

In [None]:
"d" in {"a": 1, "b": 2, "c": 3}

In [None]:
2.0 in (2, 4)
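
For dictionaries, _in_ looks at the keys, not the values, and for strings it tests for substrings rather than single characters only. A small sketch (the variable names are illustrative):

```python
d = {"a": 1, "b": 2, "c": 3}

key_found = "a" in d            # True: "a" is a key
value_as_key = 1 in d           # False: 1 is a value, not a key
value_found = 1 in d.values()   # True: search the values explicitly

substring_found = "gui" in "penguin"  # True: "gui" occurs inside "penguin"

print(key_found, value_as_key, value_found, substring_found)
```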

### Sets
<a id='sets'></a>

Sets store a collection of elements without imposing an ordering and without measuring multiplicity. In other words, every Python object is either a member of a set or not, and that's all a set will ever tell us. Sets can be created from any iterable; an empty set is created by calling _set_ with no arguments:

In [None]:
s = set()
s

In [None]:
l = [1,2,3,2,2,3,4]

In [None]:
s = set(l)

In [None]:
s

__Note:__ set literals use curly brackets, e.g. {1, 2, 3}, which is also how sets are displayed above. However, an empty pair of curly brackets ({}) represents an empty dictionary, NOT an empty set. There is no literal for the empty set; it is written as "set()".

Sets implement the __add__ method for adding elements:

In [None]:
s = set()
s.add(3)
s.add(2)
s.add(3)
s.add(3)
s

The __remove__ method removes an element from a set and raises an error if the element is not present:

In [None]:
s.remove(3)

In [None]:
s.remove(3)

The __discard__ method, on the other hand, will remove an element if it is a member of the set and leave the set unchanged if it is not, without raising an error:

In [None]:
s.discard(2)

In [None]:
s

In [None]:
s.discard(2)

Standard set operations such as union, intersection, etc. are all implemented for sets in Python:

In [None]:
s1 = set(range(0, 40, 5))
s2 = set(range(0, 40, 7))

In [None]:
s1

In [None]:
s2 

In [None]:
s1.union(s2) 

In [None]:
s1.intersection(s2)
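
The same operations are also available as operators, along with difference and symmetric difference. The sketch below repeats the two sets so that it runs on its own, and sorts the results only to make the printed order predictable:

```python
s1 = set(range(0, 40, 5))   # multiples of 5 below 40
s2 = set(range(0, 40, 7))   # multiples of 7 below 40

print(s1 | s2 == s1.union(s2))         # True
print(s1 & s2 == s1.intersection(s2))  # True

print(sorted(s1 & s2))   # [0, 35]
print(sorted(s1 - s2))   # [5, 10, 15, 20, 25, 30]
print(sorted(s1 ^ s2))   # [5, 7, 10, 14, 15, 20, 21, 25, 28, 30]
```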

### Booleans

Objects of the __boolean__ type can take one of two values: True and False. These are the boolean literals.

In [None]:
type(True)

Any Python object can be converted to a boolean, as we have seen earlier. This means that any expression can be used as a condition in e.g. an if statement:

In [None]:
l = []
m = 3
if l:
    print("bool(l) is True")
if m:
    print("bool(m) is True")

When used in conditions (i.e. typecast to booleans), zero values of numeric types (0, 0.0, 0j), empty sequences ('', (), [], etc.), empty dictionaries ({}) and empty sets all evaluate to False, and so does None, the only object of its own type, NoneType.

Boolean operators __and, or__ and __not__ work as expected on boolean values:

In [None]:
True and True

In [None]:
True and False

In [None]:
True or False

In [None]:
not True

In [None]:
not False

When using these operators on other types, we must understand what they actually mean:

"X or Y" evaluates to X if bool(X) is True and Y otherwise:

In [None]:
0 or 4

In [None]:
3 or 4

In [None]:
"penguin" or "giraffe"

In [None]:
"giraffe" or "penguin"

"X and Y" evaluates to X if bool(X) is False and Y otherwise:

In [None]:
3 and 5

In [None]:
"penguin" and "giraffe"

In [None]:
"" and "penguin"

"not X" evaluates to "not bool(X)", so its result is always a boolean:

In [None]:
not 0

In [None]:
not 3
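
These rules are often used to supply fallback values and to guard an expression that would otherwise fail. The following sketch illustrates both; the function greet and the variable names are hypothetical:

```python
def greet(name=""):
    # an empty name is falsy, so "or" falls back to the right-hand operand
    return "Hello " + (name or "stranger")

print(greet("Mary"))  # Hello Mary
print(greet())        # Hello stranger

# "and" stops at the first falsy operand, so len(None) is never evaluated
value = None
length = value and len(value)
print(length)         # None
```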

## Immutability and mutability
<a id='mutability'></a>

An object is __mutable__ if its contents can be modified "in place". Of the types mentioned so far, lists, dictionaries and sets are mutable, i.e. if some elements of a list are modified, this does not create a new list object in the background:

In [None]:
l = [1,2,3,4,5]

In [None]:
m = l

In [None]:
l[2] = 7

In [None]:
l

In [None]:
m

Strings, ints, floats and tuples, on the other hand, are __immutable__ - operations that appear to modify them create a new object in the background:

In [None]:
x = "a"
y = x

In [None]:
x *= 2

In [None]:
x

In [None]:
y

In [None]:
x = [1, 2]
y = list(x)
x == y
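
The cell above compares a list with a copy of itself: == tests whether two objects have equal contents, not whether they are the same object. The following sketch uses the _is_ operator, introduced here only for illustration, to contrast a second name for a list with a copy of it:

```python
x = [1, 2]
alias = x            # a second name for the same list object
duplicate = list(x)  # a new list with the same contents

x.append(3)

print(alias)      # [1, 2, 3]  the alias sees the change
print(duplicate)  # [1, 2]     the copy does not
print(alias is x, duplicate is x)  # True False
```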

Another way to demonstrate this behavior is by using the built-in function __id__, which returns an identifier for any Python object that is unique among the objects existing at the same time.

In [None]:
l = [1,2,3]

In [None]:
id(l)

In [None]:
l2 = [1,2,3]

In [None]:
id(l2)

In [None]:
l2.append(4)

In [None]:
id(l2)

In [None]:
l == l2

In [None]:
x = 3

In [None]:
id(x)

In [None]:
x += 1

In [None]:
id(x)

An important consequence of immutability is that tuples do not support item assignment:

In [None]:
t = (1, 2, 3)

In [None]:
t[1] = 7

Lists, on the other hand, are not __hashable__, so e.g. they cannot be used as keys in a dictionary:

In [None]:
d = {}

In [None]:
d[[1,2]] = 3

In [None]:
d[(1, 2)] = 3

In [None]:
d
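
A tuple is hashable only if all of its elements are. The sketch below uses the built-in _hash_ function to show that a tuple containing a list cannot serve as a dictionary key; the variable nested_hashable is introduced only for this example:

```python
# equal tuples have equal hashes, which is what dictionary lookup relies on
print(hash((1, 2)) == hash((1, 2)))  # True

try:
    hash((1, [2, 3]))   # the list inside makes the whole tuple unhashable
    nested_hashable = True
except TypeError as err:
    nested_hashable = False
    print("TypeError:", err)

print(nested_hashable)  # False
```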