# Introduction to Python - Data Types II, Functions and Packages

In [1]:
# Authors: Matthias Huber (huber@ifo.de), Alex Schmitt (schmitt@ifo.de)

import datetime
print('Last update: ' + str(datetime.datetime.today()))

Last update: 2017-04-26 08:59:12.504311


## Sets

*Sets* are unordered collections of unique elements. Unlike tuples, they are mutable: elements can be added or removed after the set has been created. Because the elements are not stored in a particular order, you cannot use indices with sets. In addition, a set never contains duplicates.

In [2]:
# set
B = {4,5,6}
print (B)

{4, 5, 6}
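
As a small sketch of the "unique elements" property: building a set from a list with repeated values keeps each value only once, and membership can be tested with `in`. The variable names below are chosen for illustration only and do not appear elsewhere in this notebook.

```python
# Illustrative names; not part of the original notebook
values = [4, 5, 5, 6, 6, 6]
unique_values = set(values)          # duplicates are dropped

print(unique_values == {4, 5, 6})    # compare contents, order is irrelevant
print(5 in unique_values)            # membership test
print(7 in unique_values)
```

Comparing against `{4, 5, 6}` rather than printing the set avoids relying on any particular display order.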


In [3]:
#print(B[0])  # will throw an error, since sets do not support indexing!

As with other collections, the length (number of items) can be measured with the function **len()**:

In [4]:
print (len(B))

3


### Methods for Sets

The equivalent to the **append()** method for sets is the **add()** method. Unsurprisingly, adding an element to a set which is already in there will not change the set.

In [5]:
B.add(3)
print(B)
# BTW: adding an element to a set which is already in there will not change the set
B.add(4)
print(B)

{3, 4, 5, 6}
{3, 4, 5, 6}


Further methods of sets are the major mathematical/logical operations. Important ones are:
* **intersection:** $A \cap B$
* **union:** $A \cup B$
* **issubset:** $A \subseteq B$
* **issuperset:** $A \supseteq B$
* **difference:** $A \setminus B$

In [6]:
# Define a second set, "A"
A={1,2,3,4}
print (A)
print (B)
# Intersect and union A with B
print (A.intersection(B))
print (A.union(B))

{1, 2, 3, 4}
{3, 4, 5, 6}
{3, 4}
{1, 2, 3, 4, 5, 6}


In [7]:
# Test whether A is subset or superset of B
print (A.issubset(B))
print (A.issuperset(B))

False
False


In [8]:
# Compute the differences of the sets
print (A.difference(B))
print (B.difference(A))

{1, 2}
{5, 6}
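
As a side note, Python also offers operator shorthands for these set methods. The sketch below redefines `A` and `B` with the same values as above so that it runs on its own, and checks that each operator agrees with the corresponding method.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

print(A & B == A.intersection(B))    # & is the intersection operator
print(A | B == A.union(B))           # | is the union operator
print(A - B == A.difference(B))      # - is the difference operator
print((A <= B) == A.issubset(B))     # <= tests "is subset of"
print((A >= B) == A.issuperset(B))   # >= tests "is superset of"
```

Each line prints `True`, since operator and method compute the same result.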


We can also define a new set that is built by an operation of two existing sets, e.g. $C=A \cup B$

In [9]:
C=A.union(B)
print (C)

{1, 2, 3, 4, 5, 6}


In [10]:
# C, the union of A and B, is a superset of both A and B; equivalently, A and B are subsets of C
print (C.issuperset(B))
print (C.issubset(B))
print (B.issubset(C))

True
False
True


## Dictionaries

A very important and useful data type is the *dictionary*. Dictionaries are similar to lists, but their entries (*values*) are indexed by names (*keys*) rather than by position. In other words, dictionaries are *key-value mappings*: they map a key (e.g. 'name') to a value (e.g. the string 'Alex'). Values can be of any type (integers, floats, strings, booleans, lists etc.). Keys can also be of different types, but they must be hashable, which in practice means immutable objects such as integers, strings or tuples; a list cannot serve as a key.

In [11]:
# dictionary
info = {'name': 'Alex', 'age': 34, 'likes_football': True, 'interests': ['Python', 'Economics', 'Game of Thrones']}

print(info)
print(info['name'])
print(info['age'])
print(info['likes_football'])
print(info['interests'])

{'age': 34, 'interests': ['Python', 'Economics', 'Game of Thrones'], 'likes_football': True, 'name': 'Alex'}
Alex
34
True
['Python', 'Economics', 'Game of Thrones']


You can add new key-value pairs to an existing (or an empty) dictionary. 

In [12]:
# add a new entry to an existing dictionary
info['height'] = 1.82
print(info)

# create an empty dictionary and fill it 
residents = dict()
residents['Munich'] = 1.5e+6
residents['Berlin'] = 3.5e+6
residents['London'] = 8.5e+6

print(residents)

{'age': 34, 'interests': ['Python', 'Economics', 'Game of Thrones'], 'likes_football': True, 'height': 1.82, 'name': 'Alex'}
{'Berlin': 3500000.0, 'London': 8500000.0, 'Munich': 1500000.0}


Like a list, a dictionary is mutable, i.e. the values of its entries can be changed:

In [13]:
residents['Munich'] += 100000
residents['London'] = residents['London'] * 0.9
print(residents)

{'Berlin': 3500000.0, 'London': 7650000.0, 'Munich': 1600000.0}


As with the other collections above, **len()** can be used to determine the length (number of key-value pairs) of a dictionary.

In [14]:
print(len(residents))

3


You can also define a new dictionary using a *dictionary comprehension* (i.e. a compact version of a loop). They work similarly to list comprehensions. You can iterate over a single sequence, over two sequences in nested fashion, or over a list of tuples:

In [15]:
D1 = { x:abs(x) for x in range(-10, 10) }
print(D1)

{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, -2: 2, -10: 10, -9: 9, -8: 8, -7: 7, -6: 6, -5: 5, -4: 4, -3: 3, -1: 1}


In [16]:
D2 = { (x,y):x*y for x in [1,2,3] for y in [1,2,3] }
print(D2)

{(1, 2): 2, (3, 2): 6, (1, 3): 3, (2, 3): 6, (3, 3): 9, (2, 2): 4, (3, 1): 3, (1, 1): 1, (2, 1): 2}


In [17]:
D3 = { k:v for (k,v) in [(3,2),(4,0),(100,1)] }
print(D3)

{100: 1, 3: 2, 4: 0}
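
To make the "compact version of a loop" idea concrete, here is a minimal sketch that builds the same mapping as `D1` with an explicit `for` loop. The name `D1_loop` is introduced only for this comparison.

```python
# Explicit loop: start with an empty dictionary and add one entry per iteration
D1_loop = {}
for x in range(-10, 10):
    D1_loop[x] = abs(x)

# Dictionary comprehension: the same logic in a single expression
D1 = {x: abs(x) for x in range(-10, 10)}

print(D1_loop == D1)
```

Both versions produce equal dictionaries; the comprehension is simply shorter to write and read.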


With the **zip** function, you don't even need a comprehension to define a dictionary:

In [18]:
names = ['Daenerys', 'Tyrion', 'Arya', 'Samwell']
houses = ['Targaryen', 'Lannister', 'Stark', 'Tarly']

D4 = {name: house for (name, house) in zip(names, houses)}
print(D4)    

D5 = dict(zip(names, houses))
print(D5)

{'Arya': 'Stark', 'Daenerys': 'Targaryen', 'Samwell': 'Tarly', 'Tyrion': 'Lannister'}
{'Arya': 'Stark', 'Daenerys': 'Targaryen', 'Samwell': 'Tarly', 'Tyrion': 'Lannister'}


### Methods for Dictionaries

Both the complete list of keys and of values of a dictionary can be accessed by using the **.keys()** and **.values()** methods, respectively:

In [19]:
print(info)

print(info.keys())
print(info.values())

{'age': 34, 'interests': ['Python', 'Economics', 'Game of Thrones'], 'likes_football': True, 'height': 1.82, 'name': 'Alex'}
dict_keys(['age', 'interests', 'likes_football', 'height', 'name'])
dict_values([34, ['Python', 'Economics', 'Game of Thrones'], True, 1.82, 'Alex'])
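
A related standard method, not used elsewhere in this notebook, is **.items()**, which yields the keys and values together as pairs. The sketch below uses a shortened version of the `info` dictionary so that it is self-contained.

```python
info = {'name': 'Alex', 'age': 34, 'likes_football': True}

# Loop over key-value pairs directly
for key, value in info.items():
    print(key, '->', value)

# keys() and values() line up position by position with items()
pairs = list(zip(info.keys(), info.values()))
print(pairs == list(info.items()))
```

This is often the most convenient way to loop over a dictionary when both the key and the value are needed.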


Another helpful method for dictionaries is **.update(other)**. It is used to extend (update) an existing dictionary with another dictionary or a list of two-item tuples (key-value pairs). Values of keys that already exist are overwritten.

In [20]:
de_en={'blau':'blue','grün':'green'}
de_en_add={'rot':'red','gelb':'yellow'}
print(de_en)
de_en.update(de_en_add)
print(de_en)
de_en.update([('blau', 'newblue'),('schwarz','black')])
print (de_en)

{'blau': 'blue', 'grün': 'green'}
{'blau': 'blue', 'grün': 'green', 'rot': 'red', 'gelb': 'yellow'}
{'blau': 'newblue', 'grün': 'green', 'rot': 'red', 'schwarz': 'black', 'gelb': 'yellow'}


In [21]:
print(de_en['schwarz'])

black


## Writing Functions

In order to get an intuition behind the idea of a function in Python (or in any other programming language), recall what a function in Math does, for example $y = f(x) = x^2$: it is a mapping that takes a number $x$, performs some operation on it -- here multiplies it by itself -- and then returns the result as "output" $y$. A function in programming does pretty much the same, with the difference that inputs and outputs can be anything, not just numbers. More formally, a function is a *named sequence of statements* that are executed *when the function is called*. We say that a function *takes* one or more arguments (of any type) and *returns* some kind of output.

Functions come in two varieties: built-in and user-defined. Built-in functions are part of the core language or are contained in the standard library or some package, and can be used right away (if they are part of a module, it has to be imported first, see below). We already encountered some built-in functions, namely **print()**, **type()**, **len()** and **range()**. The full list of built-in functions can be found here:

In [22]:
import webbrowser     
url = 'https://docs.python.org/3/library/functions.html'
webbrowser.open(url)

True

In addition, you can (and should) write your own functions. Here are three examples. The first one, called **sum_squared**, translates the math function $f(x,y) = x^2 + y^2$ into Python code: it takes two numbers (**int** or **float**) as inputs and returns the sum of their squares. The second function, **reverse_order**, takes a list and returns it in reverse order. The third example just prints a string, and hence does not return anything:

In [23]:
def sum_squared(x, y):
    return x**2 + y**2

def reverse_order(ls):
    return ls[::-1]

def all_men_must_die():
    print("Valar Morghulis!")


In [24]:
print(sum_squared(8, 3))

73


In [25]:
names = ['Daenerys', 'Tyrion', 'Arya', 'Samwell']
names_reverse = reverse_order(names)
print(names)
print(names_reverse)

['Daenerys', 'Tyrion', 'Arya', 'Samwell']
['Samwell', 'Arya', 'Tyrion', 'Daenerys']


In [26]:
all_men_must_die()

## Note: the return value of this function is None (a special type of object)
x = all_men_must_die()
print(x)
print(type(x))

Valar Morghulis!
Valar Morghulis!
None
<class 'NoneType'>


Note that Python treats functions as just another type of object, i.e. the name of a function refers to an object in memory:

In [27]:
print(type(all_men_must_die))

<class 'function'>
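
Because a function is an object, it can be bound to a second name or passed as an argument to another function. The following is a minimal sketch; `apply_twice` and `add_one` are hypothetical helpers introduced only for this illustration.

```python
def sum_squared(x, y):
    return x**2 + y**2

# Bind a second name to the same function object
f = sum_squared
print(f is sum_squared)   # both names refer to one object
print(f(8, 3))

def apply_twice(func, value):
    """Hypothetical helper: apply func to value two times."""
    return func(func(value))

def add_one(n):
    return n + 1

# Pass a function as an argument to another function
print(apply_twice(add_one, 5))
```

Note that `f = sum_squared` (without parentheses) refers to the function itself, whereas `sum_squared(8, 3)` calls it.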


#### Function syntax

Some comments about the syntax for writing functions. The *header* of a function consists of the following elements:
1. A function always starts with the keyword **def** (for define or definition). This is followed by the *function name*, which is the choice of the programmer and can be virtually anything - be careful though not to use names that are *already used for built-in functions*!
2. The function name is followed by *parentheses* containing the names for the inputs into the function. Sometimes functions may not take inputs, in which cases the parentheses are left empty (as seen in the third example above).
3. As is the case with loops and if-statements, the function header is concluded with a colon (**:**).

The *body* or code block of a function is a sequence of statements that the function should perform. The rules about *indentation* that we discussed in the context of loops above apply here as well. The code block can consist of a single return statement or many lines of code. 

The output that a function gives is determined by the **return** statement. If there is no return statement, the function returns **None**. Note that a function can have arbitrarily many return statements; execution of the function terminates when the first return is hit:

In [28]:
def f(x):
    if x < 0:
        return 'negative'
    return 'nonnegative'

print(f(-3))

negative


#### Why use functions?

Functions are an extremely important tool in computing. User-defined functions help improve the clarity and readability of your code (and make it easier to debug!) by
- separating different strands of logic;
- eliminating repetitive code, which makes the program smaller and means that a change has to be made in only one place;
- facilitating code reuse across several programs/scripts.

In other words, very often a (large) computational problem is broken up into smaller subproblems, which are coded up as functions. The main program then coordinates these functions, calling them to do their job at the appropriate time.
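
As a small sketch of this idea, the task "summarize a list of numbers" can be split into two helper functions and one coordinating function. All three names (`mean`, `variance`, `summarize`) are illustrative choices, not functions used elsewhere in this notebook.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def variance(values):
    """Return the population variance, reusing mean()."""
    m = mean(values)
    return sum((v - m)**2 for v in values) / len(values)

def summarize(values):
    """Coordinate the helper functions and collect their results."""
    return {'mean': mean(values), 'variance': variance(values)}

print(summarize([1, 2, 3, 4]))
```

If the definition of the mean ever had to change, only `mean` would need to be edited; `variance` and `summarize` would pick up the change automatically.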

#### docstrings

In order to increase the clarity of your code, it is good practice to include a description of what the function does. Inserting regular comments using "#" would do the trick, but a better way is using *docstrings*, as in the following example. In contrast to regular code, they are written within triple quotation marks (**"""**). The great advantage is that you can access the description without actually opening the function (this is very useful when your function is stored in a different file or in an imported package):

In [29]:
def reverse_order(ls):
    """
    Takes a list and returns it in reverse order
    """
    return ls[::-1]

## get information about the function
reverse_order?

As a side note, using the question mark in connection with a function name to access its docstring (a feature of IPython/Jupyter) also works for built-in Python functions. This is useful if you want to check what a function does and what input arguments it requires.

In [30]:
?len

Note that for more complex functions, it is often considered good practice to include not only a description of the function, but also some information about the input(s) and output(s). For example:

In [31]:
def sum_squared(x, y):
    """
    (float, float) -> float
    
    Returns the sum of two squared numbers.
    """
    
    return x**2 + y**2
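
Outside of IPython/Jupyter the question-mark syntax is not available, but the docstring can still be read from the function object. A minimal sketch using the `__doc__` attribute and `inspect.getdoc` from the standard library (the built-in `help()` prints similar information interactively):

```python
import inspect

def sum_squared(x, y):
    """
    (float, float) -> float

    Returns the sum of two squared numbers.
    """
    return x**2 + y**2

# The raw docstring is stored on the function object itself
print(sum_squared.__doc__)

# inspect.getdoc returns the same text with the indentation cleaned up
print(inspect.getdoc(sum_squared))
```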

### Testing a function using doctest

Suppose you write a function and want to check if it does what it is supposed to do, using some test input. An easy way to do this would be to include a few **print** statements after the function definition. A more elegant way is to include the test cases in the docstring and then run all the tests at once using the **doctest** module (more on what a module is and how to use it in the next section). Using **doctest** is a way to free your code of unnecessary clutter (such as print statements) and thus make it more readable.

When writing the docstring, you need to include both the function call with the test data (using ***>>>***, as in a console) and the expected output: 

In [32]:
def reverse_order(ls):
    """
    Takes a list and returns it in reverse order
    
    >>> reverse_order([-2, 0, 2])
    [2, 0, -2]
    
    >>> reverse_order('Matthias')
    'aihttaM'
    
    """
    return ls[::-1]

Next, we import the **doctest** module and run its **testmod()** function:

In [33]:
import doctest
doctest.testmod()

**********************************************************************
File "__main__", line 8, in __main__.reverse_order
Failed example:
    reverse_order('Matthias')
Expected:
    'aihttaM'
Got:
    'saihttaM'
**********************************************************************
1 items had failures:
   1 of   2 in __main__.reverse_order
***Test Failed*** 1 failures.


TestResults(failed=1, attempted=2)

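If all tests deliver the expected results, you will get a one-line message. If one or more tests fail, you will get a report containing information on the expected and actual function output. In the report above, the second test fails because the expected string in the docstring, 'aihttaM', is missing the leading 's' of the correct result 'saihttaM'.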

One word of warning: when using **doctest** in the Jupyter environment, as above, **testmod()** checks the docstrings of the functions as they are currently defined in the notebook, i.e. it uses the most recently executed definition of each function. This can create some confusion if you run the test in a different cell than the function definition. Typically, if you define functions in separate modules, you can include the call to **doctest** in there and then run them automatically (more on this below).
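
One way to sidestep this confusion in a notebook is to test a single function explicitly. The sketch below uses `doctest.run_docstring_examples` from the standard library; the docstring is a variant of the one above in which the second expected value has been corrected, so that both examples pass.

```python
import doctest

def reverse_order(ls):
    """
    Takes a sequence and returns it in reverse order

    >>> reverse_order([-2, 0, 2])
    [2, 0, -2]

    >>> reverse_order('Matthias')
    'saihttaM'
    """
    return ls[::-1]

# Run only the examples in this function's docstring.
# Nothing is printed if all examples pass; failures produce a report.
doctest.run_docstring_examples(reverse_order, globals(), name='reverse_order')
```

Passing `globals()` makes the names defined so far (here `reverse_order`) available to the examples in the docstring.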

## Importing Modules and Packages

So far, we have used data types and functions which are part of the core language and which you can use without any additional code. In addition to these core functionalities, the Python standard library also contains *modules*. Modules are basically files that contain additional functions and definitions. In order to use the functions provided by a module, you need to **import** it. We have done this already at the beginning of this tutorial when we imported the module **webbrowser** in order to open a webpage, and in the previous section when we used the **doctest** module. As another example, the following cell imports a module called *random*, which you can use, among other things, to draw a random number from a uniform distribution. Importing the whole module makes all its functions available for use in your program. In this case, the name of the function - e.g. *uniform* - must be preceded by the name of the module, i.e. *module_name.function_name*.

In [34]:
import random   # import module

print(random.random()) # draws a random number from a uniform distribution between 0 and 1
print(random.uniform(0,1)) # draws a random number from a uniform distribution between 0 and 1
print(random.uniform(-5,5)) # draws a random number from a uniform distribution between -5 and 5
print(random.randrange(0, 11))  # draws a random integer from 0 to 10 (i.e. excluding the given endpoint)

0.46429433357183514
0.8139462669477247
-2.060021085048888
1


Alternatively, you can import individual functions from a module. Then, calling the name of the function is sufficient. However, I would avoid this syntax for the most part since the imported name may conflict with variables or functions of the same name in your own code.

In [35]:
from random import uniform   # import a single function from the module

print(uniform(0,1)) # draws a random number from a uniform distribution between 0 and 1

0.27169598480660495
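
The kind of name conflict meant here can be sketched as follows. The user-defined `uniform` below is a hypothetical function that merely happens to share its name with `random.uniform`; the later import silently rebinds the name.

```python
def uniform(a, b):
    """Hypothetical user-defined helper that shares a name with random.uniform."""
    return "my own uniform({}, {})".format(a, b)

before = uniform(0, 1)       # calls the user-defined function

from random import uniform   # rebinds the name `uniform` without any warning

after = uniform(0, 1)        # now calls random.uniform

print(before)
print(type(after))
```

With `import random` and `random.uniform(0, 1)`, both functions could coexist, because the module name keeps them apart.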


The problem set contains examples for other useful modules that are part of the standard library, such as **math** and **time**.

In addition to the functions and modules contained in the standard library, there is a large number of *packages* or *external libraries*. Those are usually written and maintained by external developers and consist of one or more modules. If you have installed the Anaconda distribution of Python, many packages are automatically included, which means you just need to import them. If you have only the core package installed or if you want to use a package that is not part of Anaconda, you will need to download and install it first. 

## Writing and importing your own modules

Python allows you to define your own modules and import them in the same way as modules from the standard library or from other developers. As a simple example, we put the functions that were defined above in an external file called "firstmodule.py". The file can be written in any text editor (and even directly in the Jupyter environment) and has to be saved with the ending ".py". Writing your own modules is another way of enhancing the readability of your code and makes sense in particular for functions that you use in several programs.

You import the module without the ".py" ending:

In [36]:
import firstmodule
firstmodule.sum_squared(2,3)

13

As an aside, note that for modules with long names or that you use very often, you can import them using an abbreviation, as in the example below. We will see this on a regular basis for frequently used modules such as **numpy** or **pandas** later on.

In [37]:
import firstmodule as fm
fm.all_men_must_die()

Valar Morghulis!


#### Reloading modules

As modules can change during their development, they must be updated in the current working environment. We can achieve this using the **reload** function from the **importlib** module:

In [38]:
from importlib import reload

In [39]:
# Alternatively there is a so-called magic function that enables automatic reloading/updating of modules:
# %load_ext autoreload
# %autoreload 2

After changing the module, we reload it in the following way, to make the changes active:

In [40]:
reload(fm)

<module 'firstmodule' from '/Users/Alex1/Dropbox/PythonTeaching/firstmodule.py'>
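
The full edit-and-reload cycle can be sketched in a self-contained way by writing a throwaway module to a temporary directory. The module name `demo_reload_module` and its function `greeting` are hypothetical and exist only for this demonstration; they are unrelated to "firstmodule.py".

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module name and location, used only for this demonstration
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, 'demo_reload_module.py')

def write_module(source):
    """Write the given source code to the demo module file."""
    with open(module_path, 'w') as fh:
        fh.write(source)

sys.dont_write_bytecode = True    # avoid stale cached bytecode in this quick demo
sys.path.insert(0, module_dir)    # make the temporary directory importable

write_module("def greeting():\n    return 'first version'\n")
importlib.invalidate_caches()
import demo_reload_module as drm
first = drm.greeting()

# Edit the file on disk: the already imported module does not change by itself
write_module("def greeting():\n    return 'second version, after editing'\n")
unchanged = drm.greeting()

importlib.reload(drm)             # re-executes the module's source
second = drm.greeting()

print(first, '|', unchanged, '|', second)
```

The middle value shows why reloading is needed: until `reload` is called, the interpreter keeps using the version of the module that was imported first.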

#### Running vs. importing a module

There is a subtle, but important difference between importing a module as above, and running it as you would a normal Python script. Consider the module **secondmodule.py** containing the function **same_start_and_end** that checks whether an array starts and ends with the same value and returns a boolean. 

Running the file has the same effect as defining this function (and all others, if there were more) in our Jupyter notebook, as we have done several times above. In more technical terms, it adds the function **same_start_and_end** to Python's *namespace* (which is essentially a directory of all modules, functions and variables that were either defined by us or are built-in).

In [41]:
%run secondmodule.py
## ignore the output below for the moment!

**********************************************************************
File "/Users/Alex1/Dropbox/PythonTeaching/secondmodule.py", line 12, in __main__.same_start_and_end
Failed example:
    same_start_and_end(s)
Expected:
    True
Got:
    False
**********************************************************************
1 items had failures:
   1 of   4 in __main__.same_start_and_end
***Test Failed*** 1 failures.


Importing the module adds only the name of the module to Python's namespace (here **sm**); the function is then available as **sm.same_start_and_end**.

In [42]:
import secondmodule as sm

The difference between running and importing a module is also important in a related context. Suppose you have a module that does not only include functions, but also other content. A typical example would be some test calls to the functions in the module, for example using **doctest**. We don't want to see these calls every time we import the module. There is a neat way to achieve this, exploiting the way Python stores functions and variables. The file **secondmodule.py** contains the following lines at the end:

In [43]:
# if __name__ == '__main__':
#     import doctest
#     doctest.testmod()

The conditional statement **if __name__ == '__main__':** is a way to tell Python to execute the code block below *only if the module is run, but not if it is imported*. That's why above we get a test report when running **secondmodule.py** (note that one of the test cases is misspecified), but not when importing it.  

Hence, running a module is useful in development, when writing the functions for the first time or when adding new test cases. Once we have ensured the functions work properly, we do not need to test them any more, and hence import the module.  
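
For illustration, a module following this pattern might look like the sketch below. This is a hypothetical reconstruction and not the actual content of **secondmodule.py**, which is not shown in this notebook; in particular, the implementation and the test cases are assumptions (and, unlike in the file above, all of them pass).

```python
# Hypothetical sketch of a module in the style of secondmodule.py

def same_start_and_end(seq):
    """
    Return True if the sequence starts and ends with the same value.

    >>> same_start_and_end([1, 2, 1])
    True
    >>> same_start_and_end('abc')
    False
    >>> same_start_and_end([7])
    True
    """
    # An empty sequence has no first or last element, so return False
    return len(seq) > 0 and seq[0] == seq[-1]


if __name__ == '__main__':
    # Executed only when the file is run directly, not when it is imported
    import doctest
    doctest.testmod()
```

When this file is run, `__name__` equals `'__main__'` and the doctests are executed; when it is imported, `__name__` equals the module name and the block is skipped.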

In [44]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3