**Author:** Andrej Gajdoš  <br> 
_[Ústav matematických vied](https://www.upjs.sk/prirodovedecka-fakulta/ustav/umv/), [Prírodovedecká fakulta](https://www.upjs.sk/prirodovedecka-fakulta/), Univerzita Pavla Jozefa Šafárika v Košiciach,_ <br> 
email: [andrej.gajdos@upjs.sk](mailto:andrej.gajdos@upjs.sk)
***  

**_This material was created with the support of grant VVGS-2022-2412._**

***

**<font size=6 color=gold> Introduction to Python</font>**  

<a id=table_of_contents></a>
###  Table of Contents 


* [What is Python?](#python) - a brief description of Python


* [Python as a calculator](#calculator) - basic operations and arithmetic, getting help


* [Strings](#strings) - manipulations with strings


* [Data structures](#data_structures) - fundamental data structures 


* [Flow control tools](#flow) - loops, conditionals, list/set comprehensions


* [Functions](#functions) - defining functions


* [Code profiling](#profiling) - basic profiling of Python code


* [Useful Python libraries](#libraries) - selected Python libraries


* [References](#references)


**To get back to the contents, use <font color=brown>the Home key</font>.**

***
<a id=python></a>
# <font color=brown> What is Python?</font>

Python is an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms.

The Python interpreter and the extensive standard library are freely available in source or binary form for all major platforms from the Python Web site, https://www.python.org/, and may be freely distributed. The same site also contains distributions of and pointers to many free third party Python modules, programs and tools, and additional documentation.

The Python interpreter is easily extended with new functions and data types implemented in C or C++ (or other languages callable from C). Python is also suitable as an extension language for customizable applications.

***
<a id=calculator></a>
# <font color=brown> Python as a calculator</font>

Variables are typed dynamically.

In [8]:
# b is an integer
b = 1 
print(b, type(b))

1 <class 'int'>


In [10]:
# now b is a float
b = 2.0*b 
print(b, type(b))

2.0 <class 'float'>


Additional types of numbers include `complex`, `Decimal` and `Fraction`.

In [16]:
# complex number
3+5j

(3+5j)
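The `Decimal` and `Fraction` types mentioned above live in the standard-library modules `decimal` and `fractions`. The following is a minimal sketch (the variable names are only illustrative) of how they avoid the rounding error of binary floating point numbers:

```python
from decimal import Decimal
from fractions import Fraction

float_sum = 0.1 + 0.2                           # binary floating point, slightly off
decimal_sum = Decimal("0.1") + Decimal("0.2")   # exact decimal arithmetic
fraction_sum = Fraction(1, 3) + Fraction(1, 6)  # exact rational arithmetic

print(float_sum)      # 0.30000000000000004
print(decimal_sum)    # 0.3
print(fraction_sum)   # 1/2
```

Note that `Decimal` is constructed from a string here; `Decimal(0.1)` would inherit the inexact binary value of the float `0.1`.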

In [1]:
(50 - 5*6) / 4

5.0

Division always returns a floating point number. 

In [2]:
8 / 5

1.6

The `%` operator returns the remainder of the division. 

In [3]:
17 % 3

2
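Since `/` always yields a float, integer (floor) division is written with `//`. A short sketch combining it with `%` and the built-in `divmod`; the variable names are illustrative:

```python
quotient = 17 // 3        # floor division, result is an int
remainder = 17 % 3        # remainder of the same division
pair = divmod(17, 3)      # quotient and remainder at once
negative = -17 // 3       # rounds toward minus infinity, not toward zero

print(quotient, remainder, pair, negative)   # 5 2 (5, 2) -6
```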

With Python, it is possible to use the `**` operator to calculate powers.

In [4]:
2 ** 7

128

The equal sign `=` is used to assign a value to a variable. Afterwards, no result is displayed before the next interactive prompt.

In [5]:
width = 20
height = 5 * 9
width * height

900

Examples of built-in ('core') Python functions:

In [74]:
abs(-1)

1

In [75]:
max(1, 2, 3)

3

Python also provides many mathematical functions, for example in the *math* library. First we need to **load** this **library** into our notebook so that we can call its functions. 

In [64]:
import math

Run the following command to get some information about the library (the trailing `?` is IPython/Jupyter syntax, not plain Python). The same approach works for Python **functions** (including those in libraries) and **classes**. In addition, the built-in `help` function prints a more detailed description of a particular function.

In [18]:
# info about math library
math?

In [23]:
# info about sqrt function
math.sqrt?

In [22]:
# help 
help(math.sqrt)

Help on built-in function sqrt in module math:

sqrt(x, /)
    Return the square root of x.



Let's calculate the square root of 2: $\sqrt{2}$.

In [14]:
math.sqrt(2)

1.4142135623730951

What is the value of $\sin(\pi)$?

In [15]:
math.sin(math.pi)

1.2246467991473532e-16
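The result above is not exactly zero because `math.pi` is only a floating point approximation of $\pi$. When comparing such results, a tolerance-based check is usually more appropriate than `==`. A minimal sketch using `math.isclose` (the tolerance `1e-12` is an arbitrary choice for illustration):

```python
import math

value = math.sin(math.pi)

print(value == 0.0)                              # False
# abs_tol is needed because the default relative tolerance
# can never be satisfied when comparing against exactly 0.0
print(math.isclose(value, 0.0, abs_tol=1e-12))   # True
```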

***
<a id=strings></a>
# <font color=brown> Strings</font>

Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes `'...'` or double quotes `"..."` with the same result. `\` can be used to escape quotes. 

In [25]:
'spam eggs'

'spam eggs'

In [28]:
"spam eggs"

'spam eggs'

In [27]:
'"Yes," they said.'

'"Yes," they said.'

In [26]:
# use \' to escape the single quote
'doesn\'t'

"doesn't"

The `print` function produces a more readable output, by omitting the enclosing quotes and by printing escaped and special characters.

In [29]:
'"Isn\'t," they said.'

'"Isn\'t," they said.'

In [30]:
print('"Isn\'t," they said.')

"Isn't," they said.


With `print`, `\n` produces a new line. 

In [33]:
s = 'First line.\nSecond line.'
print(s)

First line.
Second line.


If you don't want characters prefaced by `\` to be interpreted as special characters, you can use raw strings by adding an r before the first quote.

In [34]:
print('C:\some\name') # here \n means newline!

C:\some
ame


In [35]:
print(r'C:\some\name')  # note the r before the quote

C:\some\name


Two or more string literals (i.e. the ones enclosed between quotes) next to each other are automatically concatenated.

In [36]:
'Py' 'thon'

'Python'

This feature is particularly useful when you want to break long strings. 

In [38]:
text = ('Put several strings within parentheses '
        'to have them joined together.')
text

'Put several strings within parentheses to have them joined together.'

If you want to concatenate variables or a variable and a literal, use `+` .

In [39]:
prefix = 'Py'
prefix + 'thon'

'Python'

Strings can be indexed (subscripted), with the first character having index 0. There is no separate character type; a character is simply a string of size one.

In [41]:
word = 'Python'
word[0], word[5]

('P', 'n')

Indices may also be negative numbers, to start counting from the right. 

In [42]:
word[-1], word[-2]

('n', 'o')

Characters from position 0 (included) to 2 (excluded).

In [43]:
word[0:2]

'Py'

In [44]:
word[:2] + word[2:]

'Python'
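Slices also accept an optional step, and out-of-range slice bounds are handled gracefully. Strings are immutable, so assigning to an index fails and a new string has to be built instead. A short sketch of these points (`new_word` is an illustrative name):

```python
word = 'Python'

print(word[::2])     # every second character: Pto
print(word[::-1])    # negative step reverses the string: nohtyP
print(word[4:42])    # an out-of-range end is clipped: on

try:
    word[0] = 'J'    # strings cannot be changed in place
except TypeError as err:
    print('TypeError:', err)

new_word = 'J' + word[1:]   # create a new string instead
print(new_word)             # Jython
```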

***
<a id=data_structures></a>
# <font color=brown> Data structures</font>

### Lists

The most versatile data structure is the list, which can be written as a list of comma-separated values (items) between square brackets. Lists might contain items of different types, but usually the items all have the same type.

In [12]:
squares = [1, 4, 9, 16, 25]
squares

[1, 4, 9, 16, 25]

Like strings (and all other built-in sequence types), lists can be indexed and sliced. 

In [13]:
# indexing returns the item
squares[0]

1

In [14]:
# indexing returns the item
squares[-1]

25

In [15]:
# slicing returns a new list
squares[-3:]

[9, 16, 25]

Lists also support operations like concatenation. 

In [17]:
squares + [36, 49, 64, 81, 100]

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Unlike strings, which are immutable, lists are a mutable type, i.e. it is possible to change their content. 

In [19]:
cubes = [1, 8, 27, 65, 125]
cubes

[1, 8, 27, 65, 125]

In [20]:
cubes[3] = 64
cubes

[1, 8, 27, 64, 125]

Assignment to slices is also possible, and this can even change the size of the list or clear it entirely. 

In [21]:
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
letters

['a', 'b', 'c', 'd', 'e', 'f', 'g']

In [23]:
# replace some values
letters[2:5] = ['C', 'D', 'E']
letters

['a', 'b', 'C', 'D', 'E', 'f', 'g']

In [24]:
# now remove them
letters[2:5] = []
letters

['a', 'b', 'f', 'g']

The built-in function `len` gives the length of a list. 

In [25]:
len(letters)

4

Two more list methods, `append` and `insert`, add items to a list: `append` adds an item at the end, while `insert` places it at a given index. 

In [35]:
a = [1, 'one']
a

[1, 'one']

In [36]:
a.append('un')
a

[1, 'one', 'un']

In [37]:
a.insert(0, '0')
print(a)

['0', 1, 'one', 'un']


A list can hold any data type or data structure, including other lists. Such nested lists can be used to define matrices (but *numpy* arrays are much more convenient - see Introduction to NumPy).

In [26]:
a = ['a', 'b', 'c']
n = [1, 2, 3]
x = [a, n]
print(x)

[['a', 'b', 'c'], [1, 2, 3]]


In [27]:
x[0]

['a', 'b', 'c']

In [28]:
x[0][1]

'b'

In [29]:
simple_matrix = [[1],[2],[3]]
simple_matrix

[[1], [2], [3]]

There is a way to remove an item from a list given its index instead of its value: the `del` statement. This differs from the `pop` method which returns a value. The `del` statement can also be used to remove slices from a list or clear the entire list (which we did earlier by assignment of an empty list to the slice). For example:

In [30]:
a = [-1, 1, 66.25, 333, 333, 1234.5]
a

[-1, 1, 66.25, 333, 333, 1234.5]

In [31]:
del a[0]

In [32]:
a

[1, 66.25, 333, 333, 1234.5]

In [33]:
del a[2:4]

In [34]:
a

[1, 66.25, 1234.5]
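For comparison with `del`, the list methods `pop` and `remove` can be sketched as follows: `pop` removes by index and returns the removed item, while `remove` removes the first item equal to a given value. The list `values` is just an illustrative example:

```python
values = [-1, 1, 66.25, 333, 1234.5]

last = values.pop()      # removes and returns the last item
first = values.pop(0)    # removes and returns the item at index 0
values.remove(66.25)     # removes the first item equal to 66.25

print(last, first, values)   # 1234.5 -1 [1, 333]
```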

### Tuples and sequences

We saw that lists and strings have many common properties, such as indexing and slicing operations. They are two examples of sequence data types. Since Python is an evolving language, other sequence data types may be added. There is also another standard sequence data type - the tuple. 

A tuple consists of a number of values separated by commas, for instance:

In [38]:
t = 12345, 54321, 'hello!'
t

(12345, 54321, 'hello!')

In [39]:
t[0]

12345

Tuples may be nested. 

In [40]:
u = t, (1, 2, 3, 4, 5)
u

((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))

Unlike lists, tuples are immutable!

In [41]:
t[0] = 88888

TypeError: 'tuple' object does not support item assignment

But they can contain mutable objects. 

In [42]:
v = ([1, 2, 3], [3, 2, 1])
v

([1, 2, 3], [3, 2, 1])

Though tuples may seem similar to lists, they are often used in different situations and for different purposes. Tuples are immutable, and usually contain a heterogeneous sequence of elements that are accessed via unpacking (see later in this section) or indexing (or even by attribute in the case of namedtuples). Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list.
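The attribute access mentioned above can be sketched with `collections.namedtuple`; the `Point` type is a hypothetical example. The same sketch also shows that a mutable object stored inside a tuple can still be changed, even though the tuple itself cannot:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])   # hypothetical record type
p = Point(x=1.5, y=-2.0)

print(p[0], p.y)     # access by index or by attribute: 1.5 -2.0
x, y = p             # unpacking works as for any tuple

v = ([1, 2, 3], [3, 2, 1])
v[0].append(4)       # the inner list is mutable, the tuple is unchanged
print(v)             # ([1, 2, 3, 4], [3, 2, 1])
```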

A special problem is the construction of tuples containing 0 or 1 items: the syntax has some extra quirks to accommodate these. Empty tuples are constructed by an empty pair of parentheses; a tuple with one item is constructed by following a value with a comma (it is not sufficient to enclose a single value in parentheses). Ugly, but effective. For example:

In [43]:
empty = ()
singleton = 'hello', # <-- note trailing comma
len(empty), len(singleton)

(0, 1)

In [44]:
singleton

('hello',)

The statement `t = 12345, 54321, 'hello!'` is an example of tuple packing: the values `12345`, `54321` and `'hello!'` are packed together in a tuple. The reverse operation (unpacking) is also possible:

In [45]:
x, y, z = t

In [46]:
x,y,z

(12345, 54321, 'hello!')

### Sets

Python also includes a data type for sets. A set is an unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations like union, intersection, difference, and symmetric difference.

Curly braces or the `set` function can be used to create sets. Note: to create an empty set you have to use `set`, not `{}`; the latter creates an empty dictionary, a data structure that we discuss in the next section.

Here is a brief demonstration:

In [47]:
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
basket # show that duplicates have been removed

{'apple', 'banana', 'orange', 'pear'}

In [48]:
# fast membership testing
'orange' in basket

True

In [49]:
'crabgrass' in basket

False

Demonstration of set operations on unique letters from two words. 

In [50]:
a = set('abracadabra')
a # unique letters in a

{'a', 'b', 'c', 'd', 'r'}

In [52]:
b = set('alacazam')
b # unique letters in b

{'a', 'c', 'l', 'm', 'z'}

In [53]:
# letters in a but not in b 
a - b

{'b', 'd', 'r'}

In [54]:
# letters in a or b or both 
a | b

{'a', 'b', 'c', 'd', 'l', 'm', 'r', 'z'}

In [55]:
# letters in both a and b 
a & b

{'a', 'c'}

In [56]:
# letters in a or b but not both 
a ^ b 

{'b', 'd', 'l', 'm', 'r', 'z'}
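As noted earlier, `{}` creates a dictionary, so an empty set has to be created with `set()`. The sketch below checks this and shows a few common set methods; `letters` is an illustrative name:

```python
empty_set = set()
empty_dict = {}
print(type(empty_set).__name__, type(empty_dict).__name__)   # set dict

letters = set('abc')
letters.add('d')         # add a single element
letters.add('a')         # adding an existing element has no effect
letters.discard('z')     # discard does not fail if the element is missing

print(sorted(letters))         # ['a', 'b', 'c', 'd']
print({'a', 'b'} <= letters)   # subset test: True
```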

### Dictionaries

Another useful data type built into Python is the dictionary. Dictionaries are sometimes found in other languages as "associative memories" or "associative arrays". Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by keys, which can be any immutable type; strings and numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key. You can’t use lists as keys, since lists can be modified in place using index assignments, slice assignments, or methods like `append` and `extend`.

It is best to think of a dictionary as a set of key: value pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: `{}`. Placing a comma-separated list of key:value pairs within the braces adds initial key:value pairs to the dictionary; this is also the way dictionaries are written on output.

In [67]:
tel = {'jack': 4098, 'sape': 4139}
tel

{'jack': 4098, 'sape': 4139}

In [68]:
tel['guido'] = 4127
tel

{'jack': 4098, 'sape': 4139, 'guido': 4127}

The main operations on a dictionary are storing a value with some key and extracting the value given the key. It is also possible to delete a key:value pair with `del`. If you store using a key that is already in use, the old value associated with that key is forgotten. It is an error to extract a value using a non-existent key.

In [69]:
del tel['sape']
tel

{'jack': 4098, 'guido': 4127}

In [70]:
tel['irv'] = 4127
tel

{'jack': 4098, 'guido': 4127, 'irv': 4127}

Performing `list(d)` on a dictionary returns a list of all the keys used in the dictionary, in insertion order (if you want it sorted, just use `sorted(d)` instead). To check whether a single key is in the dictionary, use the `in` keyword.

In [71]:
list(tel)

['jack', 'guido', 'irv']

In [72]:
sorted(tel)

['guido', 'irv', 'jack']

In [73]:
'guido' in tel, 'jack' not in tel

(True, False)

The `dict` constructor builds dictionaries directly from sequences of key-value pairs.

In [74]:
dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])

{'sape': 4139, 'guido': 4127, 'jack': 4098}
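Two points from the text above can be illustrated with a short sketch: looking up a non-existent key raises `KeyError` (the `get` method avoids this by returning a default), and tuples of immutable values can serve as keys. The dictionary `grid` is a made-up example:

```python
tel = {'jack': 4098, 'guido': 4127}

try:
    tel['sape']                   # key is not present
except KeyError as err:
    print('missing key:', err)    # missing key: 'sape'

print(tel.get('sape'))        # None
print(tel.get('sape', 0))     # 0, an explicit default

grid = {(0, 0): 'origin', (1, 0): 'east'}   # tuples as keys
print(grid[(1, 0)])           # east
```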

***
<a id=flow></a>
# <font color=brown> Flow control tools</font>

### Loops

`for` allows us to repeat tasks over a range of values or objects.

In [45]:
for i in range(5):
    print(i)

0
1
2
3
4


In [46]:
for animal in ['cat', 'dog', 'chinchilla']:
    print(animal)

cat
dog
chinchilla


In [47]:
a = ['a', 1, 2]
for x in a:
    print(x)

a
1
2
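The `range` function used above also accepts a start value and a step. A brief sketch; wrapping `range` in `list` is only done here to display its contents:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 10, 3)))   # start, stop (excluded), step: [2, 5, 8]
print(list(range(5, 0, -1)))   # negative step counts down: [5, 4, 3, 2, 1]

total = 0
for i in range(1, 101):        # 1, 2, ..., 100
    total += i
print(total)                   # 5050
```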


`while` allows us to repeat tasks as long as a particular condition is satisfied. 

In [54]:
i = 1
while i < 6:
    print(i)
    i += 1

1
2
3
4
5


The `break` statement terminates the innermost enclosing loop. 

In [55]:
i = 1
while i < 6:
    print(i)
    if i == 3:
        break
    i += 1

1
2
3


The `continue` statement (borrowed from C) skips the rest of the current iteration and continues with the next iteration of the loop.

In [53]:
i = 0
while i < 6:
    i += 1
    if i == 3:
        continue
    print(i)

1
2
4
5
6


The `pass` statement does nothing. It can be used when a statement is required syntactically but the program requires no action. For example:

In [51]:
while True: 
    pass  # Busy-wait for keyboard interrupt (Ctrl+C)

KeyboardInterrupt: 

### Conditionals

`if` statements are the most basic unit of logic and allow us to execute code conditionally.

In [56]:
x = 4
if x > 5:
    print("x is greater than 5")
elif x < 5:
    print("x is less than 5")
else:
    print("x is equal to 5")

x is less than 5


In [62]:
x = int(input("Please enter an integer: "))
if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')

Please enter an integer: -5
Negative changed to zero


In [63]:
for n in range(2, 10):
    is_prime = True
    for x in range(2, n):
        if n % x == 0:
            print(n, 'equals', x, '*', n / x)
            is_prime = False
            break
    if is_prime:
        print("%s is a prime number" % (n))

2 is a prime number
3 is a prime number
4 equals 2 * 2.0
5 is a prime number
6 equals 2 * 3.0
7 is a prime number
8 equals 2 * 4.0
9 equals 3 * 3.0


### List / set / dict comprehensions and some looping techniques

List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.

For example, assume we want to create a list of squares, like:

In [59]:
squares = []
for x in range(10): 
    squares.append(x**2)
    
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Note that this creates (or overwrites) a variable named `x` that still exists after the loop completes. We can calculate the list of squares without any side effects using:

In [60]:
squares = list(map(lambda x: x**2, range(10)))
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

or, equivalently:

In [61]:
squares = [x**2 for x in range(10)]
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

which is more concise and readable.

A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it. For example, this listcomp combines the elements of two lists if they are not equal:

In [62]:
[(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]

[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

List comprehensions can contain complex expressions and nested functions:

In [66]:
[str(round(math.pi, i)) for i in range(1, 6)]

['3.1', '3.14', '3.142', '3.1416', '3.14159']

Similarly to list comprehensions, set comprehensions are also supported. 

In [58]:
a = {x for x in 'abracadabra' if x not in 'abc'} 
a 

{'d', 'r'}

In addition, dict comprehensions can be used to create dictionaries from arbitrary key and value expressions. 

In [77]:
{x: x**2 for x in (2, 4, 6)}

{2: 4, 4: 16, 6: 36}

When looping through dictionaries, the key and corresponding value can be retrieved at the same time using the `items` method.

In [75]:
knights = {'gallahad': 'the pure', 'robin': 'the brave'}
for k, v in knights.items():
    print(k ,v)

gallahad the pure
robin the brave


When looping through a sequence, the position index and corresponding value can be retrieved at the same time using the `enumerate` function.

In [76]:
for i, v in enumerate(['tic', 'tac', 'toe']): 
    print(i, v)

0 tic
1 tac
2 toe
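To loop over two or more sequences at the same time, the entries can be paired with the built-in `zip` function. A minimal sketch in the spirit of the official tutorial (the lists are illustrative data):

```python
questions = ['name', 'quest', 'favorite color']
answers = ['lancelot', 'the holy grail', 'blue']

for q, a in zip(questions, answers):
    print('What is your {0}?  It is {1}.'.format(q, a))

pairs = list(zip(questions, answers))   # materialize the pairs as a list
print(pairs[0])                         # ('name', 'lancelot')
```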


***
<a id=functions></a>
# <font color=brown> Functions</font>

A function is defined using the `def` keyword.

In [76]:
def my_print_function(x):
    print(x)

In [77]:
my_print_function(3)

3


In [78]:
def my_add_function(a, b):
    return a + b, b

In [79]:
my_add_function(3, 5)

(8, 5)

In [72]:
def my_great_function(a, b, c):
    return (a+b)*c

In [73]:
my_great_function(1, 2, 3)

9

We can create a function which prints the Fibonacci series up to an arbitrary bound. 

In [66]:
# write Fibonacci series up to n
def fib(n):  
    """Print a Fibonacci series up to n."""
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()

Now call the function we have just defined. 

In [67]:
fib(2000)

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 


The keyword `def` introduces a function definition. It must be followed by the function name and the parenthesized list of formal parameters. The statements that form the body of the function start at the next line, and must be indented.

The first statement of the function body can optionally be a string literal; this string literal is the function's documentation string, or docstring. (More about docstrings can be found in the Documentation Strings section of the official Python tutorial.) There are tools which use docstrings to automatically produce online or printed documentation, or to let the user interactively browse through code; it's good practice to include docstrings in code that you write, so make a habit of it.

The execution of a function introduces a new symbol table used for the local variables of the function. More precisely, all variable assignments in a function store the value in the local symbol table; whereas variable references first look in the local symbol table, then in the local symbol tables of enclosing functions, then in the global symbol table, and finally in the table of built-in names. Thus, global variables and variables of enclosing functions cannot be directly assigned a value within a function (unless, for global variables, named in a global statement, or, for variables of enclosing functions, named in a nonlocal statement), although they may be referenced.
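The scoping rules described above can be sketched as follows. All names (`counter`, `read_only`, `increment_global`, `make_counter`) are hypothetical and serve only to illustrate the `global` and `nonlocal` statements:

```python
counter = 0                      # global variable

def read_only():
    return counter + 1           # globals can be referenced without a declaration

def increment_global():
    global counter               # required in order to assign to the global
    counter += 1

def make_counter():
    count = 0                    # local variable of the enclosing function
    def step():
        nonlocal count           # required in order to assign to the enclosing variable
        count += 1
        return count
    return step

increment_global()
step = make_counter()
step()
print(counter, read_only(), step())   # 1 2 2
```

Without the `global` or `nonlocal` statement, the assignments `counter += 1` and `count += 1` would raise `UnboundLocalError`, because assignment makes a name local to the function.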

The actual parameters (arguments) to a function call are introduced in the local symbol table of the called function when it is called; thus, arguments are passed using call by value (where the value is always an object reference, not the value of the object). When a function calls another function, a new local symbol table is created for that call.
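A short sketch of what passing an object reference means in practice: mutating the argument is visible to the caller, whereas rebinding the parameter name is not. The function names are illustrative:

```python
def append_item(items):
    items.append(99)      # mutates the object the caller passed in

def rebind(items):
    items = [0]           # rebinds only the local name
    return items

data = [1, 2, 3]
append_item(data)
rebound = rebind(data)

print(data)       # [1, 2, 3, 99]
print(rebound)    # [0]
```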

A function definition introduces the function name in the current symbol table. The value of the function name has a type that is recognized by the interpreter as a user-defined function. This value can be assigned to another name which can then also be used as a function. This serves as a general renaming mechanism:

In [70]:
fib

<function __main__.fib>

In [71]:
f = fib
f(100)

0 1 1 2 3 5 8 13 21 34 55 89 


It is simple to write a function that returns a list of the numbers of the Fibonacci series, instead of printing it. 

In [68]:
# return Fibonacci series up to n
def fib2(n):
    """Return a list containing the Fibonacci series up to n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)    # see below
        a, b = b, a+b
    return result

In [69]:
f100 = fib2(100)
f100 

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

The `return` statement returns with a value from a function. return without an expression argument returns `None`. Falling off the end of a function also returns `None`.
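The two ways of returning `None` can be checked with a small sketch; `no_return` and `bare_return` are hypothetical helper names:

```python
def no_return(x):
    x * 2                 # the result is discarded; falls off the end

def bare_return(x):
    if x < 0:
        return            # return without an expression
    return x

print(no_return(3))       # None
print(bare_return(-1))    # None
print(bare_return(5))     # 5
```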

The statement `result.append(a)` calls a method of the list object `result`. A method is a function that 'belongs' to an object and is named `obj.methodname`, where `obj` is some object (this may be an expression), and `methodname` is the name of a method that is defined by the object's type. Different types define different methods. Methods of different types may have the same name without causing ambiguity. (It is possible to define your own object types and methods, using classes.) The method `append` shown in the example is defined for list objects; it adds a new element at the end of the list. In this example it is equivalent to `result = result + [a]`, but more efficient.

It is also possible to define functions with a variable number of arguments. The most useful form is to specify a default value for one or more arguments. This creates a function that can be called with fewer arguments than it is defined to allow. For example:

In [80]:
def my_crazy_function(a, b, c=1):
    d = a + b**c
    return d

In [81]:
my_crazy_function(2, 3)

5

In [82]:
my_crazy_function(2, 3, 2)

11

In [83]:
my_crazy_function(2, 3, c=2)

11

The following example also introduces the `in` keyword. This tests whether or not a sequence contains a certain value. 

In [84]:
def ask_ok(prompt, retries=4, reminder='Please try again!'):
    while True:
        ok = input(prompt)
        if ok in ('y', 'ye', 'yes'):
            return True
        if ok in ('n', 'no', 'nop', 'nope'):
            return False
        retries = retries - 1
        if retries < 0:
            raise ValueError('invalid user response')
        print(reminder)

In [85]:
ask_ok('yes')

yesye


True

This function can be called in several ways:

* giving only the mandatory argument: `ask_ok('Do you really want to quit?')`


* giving one of the optional arguments: `ask_ok('OK to overwrite the file?', 2)`


* or even giving all arguments: `ask_ok('OK to overwrite the file?', 2, 'Come on, only yes or no!')`

In [86]:
ask_ok('Do you really want to quit?')

Do you really want to quit?no


False

In [87]:
ask_ok('OK to overwrite the file?', 2)

OK to overwrite the file?n


False

In [88]:
ask_ok('OK to overwrite the file?', 2, 'Come on, only yes or no!')

OK to overwrite the file?no yes
Come on, only yes or no!
OK to overwrite the file?no


False

The default values are evaluated at the point of function definition in the defining scope, so that

In [89]:
i = 5

def f(arg=i):
    print(arg)

i = 6
f()

5


prints 5. 

**Important warning**: The default value is evaluated only once. This makes a difference when the default is a mutable object such as a list, dictionary, or instances of most classes. For example, the following function accumulates the arguments passed to it on subsequent calls:

In [97]:
def f(a, L=[]):
    L.append(a)
    return L

In [98]:
print(f(1))
print(f(2))
print(f(3))

[1]
[1, 2]
[1, 2, 3]


If you don't want the default to be shared between subsequent calls, you can write the function like this instead:

In [99]:
def f(a, L=None):
    if L is None:
        L = []
    L.append(a)
    return L

In [100]:
print(f(1))
print(f(2))
print(f(3))

[1]
[2]
[3]


Small anonymous functions can be created with the `lambda` keyword. This function returns the sum of its two arguments: `lambda a, b: a+b`. Lambda functions can be used wherever function objects are required. They are syntactically restricted to a single expression. Semantically, they are just syntactic sugar for a normal function definition. Like nested function definitions, lambda functions can reference variables from the containing scope.

In [4]:
def make_incrementor(n):
    return lambda x: x + n

In [6]:
f = make_incrementor(42)

In [9]:
f(0)

42

In [10]:
f(1)

43

The above example uses a `lambda` expression to return a function.
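Another common use is to pass a small function as an argument, for instance as the `key` of `sorted`. A minimal sketch with illustrative data:

```python
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

# sort by the second element of each pair (the word), alphabetically
by_name = sorted(pairs, key=lambda pair: pair[1])

print(by_name)   # [(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]
```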

***
<a id=profiling></a>
# <font color=brown> Code profiling</font>

Python documentation defines a profile as "a set of statistics that describes how often and for how long various parts of the program executed." In addition to measuring time, profiling can also tell us about memory usage.

The idea of profiling code is to identify bottlenecks in performance. It may be tempting to guess where the bottlenecks could be but profiling is more objective and quantitative. Profiling is a necessary step before attempting to optimize any program. Profiling can lead to restructuring code, implementing better algorithms, releasing unused memory, caching computational results, improving data handling, etc.

There are two types of profiling:

**1. Deterministic profiling**: All events are monitored. It provides accurate information but has a big impact on performance (overhead), which means the code runs slower under profiling. Its use in production systems is often impractical. This type of profiling is suitable for small functions.


**2. Statistical profiling**: Sampling the execution state at regular intervals to compute indicators. This method is less accurate, but it also reduces the overhead.


Python comes with two modules for deterministic profiling: **_cProfile_** and **_profile_**. Both are different implementations of the same interface. The former is a C extension with relatively small overhead, and the latter is a pure Python module. As the official documentation says, the module **_profile_** would be suitable when we want to extend the profiler in some way. Otherwise, **_cProfile_** is preferred for long-running programs. Unfortunately, there is no built-in module for statistical profiling, but we will see some external packages for it.

After this brief description, let us turn to a concrete example of code profiling. As a toy example, we evaluate the sum of the reciprocals of squares up to a certain integer $n$ in order to approximate $\pi$. The relation we want to use was proven by Euler in 1735 and solves the [Basel problem](https://en.wikipedia.org/wiki/Basel_problem). A simple Python code for evaluating the truncated sum looks like this: 

In [1]:
def recip_square(i):
    return 1. / i ** 2

def approx_pi(n=10000000):
    val = 0.
    for k in range(1, n + 1):
        val += recip_square(k)
    return (6 * val) ** .5
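For reference, the code above relies on the solution of the Basel problem. Truncating the series after $n$ terms and solving for $\pi$ gives the approximation that `approx_pi` returns:

$$
\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}
\quad\Longrightarrow\quad
\pi \approx \sqrt{6 \sum_{k=1}^{n} \frac{1}{k^2}}
$$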

The higher we choose $n$, the better the approximation of $\pi$. An experienced Python programmer will already see plenty of places to optimize this code. But remember the golden rule of optimization: never optimize without having profiled. Your thoughts about which part of your code takes too much time are wrong. At least, mine are always wrong. So let's write a short script to profile our code. 

In [3]:
import pstats, cProfile

Notice that we have also imported *pstats*, a statistics browser for reading and examining profile dumps. It has a simple line-oriented interface (implemented using cmd) and interactive help. The *pstats* module is what formats the collected statistics into reports. 

In [6]:
cProfile.runctx("approx_pi()", globals(), locals(), "Profile.prof")

In [7]:
s = pstats.Stats("Profile.prof")
s.strip_dirs().sort_stats("time").print_stats()

Mon Apr  6 17:57:21 2020    Profile.prof

         10000004 function calls in 9.395 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 10000000    6.904    0.000    6.904    0.000 <ipython-input-1-828668dcb981>:1(recip_square)
        1    2.490    2.490    9.394    9.394 <ipython-input-1-828668dcb981>:4(approx_pi)
        1    0.000    0.000    9.395    9.395 {built-in method builtins.exec}
        1    0.000    0.000    9.394    9.394 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




<pstats.Stats at 0x1e5d85a74a8>

As we can see, the profiler has recorded information for each function of our code. How can we interpret the results? 

- ncalls: as the name suggests, the number of calls. We should try to optimize functions that have a lot of calls or consume too much time per call; 


- tottime: the total time spent in the function itself, excluding sub calls. This is the column to examine most closely. 


- cumtime: cumulative time. It includes sub calls. 


- percall: We have two "per call" metrics. The first one: total time per call, and the second one: cumulative time per call. Again, we should focus on the total time metric. 
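The columns described above can also be collected without writing a dump file. The following is a sketch, not the approach used in this notebook: it drives a `cProfile.Profile` object directly and captures the *pstats* report in a string. The functions are repeated to keep the block self-contained, and the smaller default `n` is an assumption chosen only to keep the run short:

```python
import cProfile
import io
import pstats

def recip_square(i):
    return 1. / i ** 2

def approx_pi(n=100000):           # smaller n than above, for a quick run
    val = 0.
    for k in range(1, n + 1):
        val += recip_square(k)
    return (6 * val) ** .5

profiler = cProfile.Profile()
profiler.enable()                  # start collecting statistics
estimate = approx_pi()
profiler.disable()                 # stop collecting statistics

stream = io.StringIO()             # capture the report instead of printing it directly
stats = pstats.Stats(profiler, stream=stream)
stats.strip_dirs().sort_stats("cumulative").print_stats(3)   # top 3 entries only
report = stream.getvalue()
print(report)
```

Sorting by `"cumulative"` puts `approx_pi` first, since its cumulative time includes all calls to `recip_square`.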

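We can also sort the functions by other criteria by passing a different key to `sort_stats`, for example `"cumulative"` or `"ncalls"`. Note that the key `"tottime"` used below is a synonym for `"time"` (both mean internal time), so the ordering is the same as above. 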

In [8]:
s.strip_dirs().sort_stats("tottime").print_stats()

Mon Apr  6 17:57:21 2020    Profile.prof

         10000004 function calls in 9.395 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 10000000    6.904    0.000    6.904    0.000 <ipython-input-1-828668dcb981>:1(recip_square)
        1    2.490    2.490    9.394    9.394 <ipython-input-1-828668dcb981>:4(approx_pi)
        1    0.000    0.000    9.395    9.395 {built-in method builtins.exec}
        1    0.000    0.000    9.394    9.394 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




<pstats.Stats at 0x1e5d85a74a8>

Other useful modules to install include *line_profiler*, *memory_profiler* and *psutil*. The module *pprofile* takes longer to profile but gives more detailed information than *line_profiler*. For memory profiling, *heapy* and *meliae* may also help. *PyCounters* can be useful in production. "A picture is worth a thousand words", so you can also try to install a visual profiler called *vprof*. 

***
<a id=libraries></a>
# <font color=brown> Useful Python libraries</font>

- [numpy](https://numpy.org/) - fundamental package for scientific computing (powerful N-dimensional array object, sophisticated (broadcasting) functions, useful linear algebra, Fourier transform, and random number capabilities, ...)


- [matplotlib](https://matplotlib.org/) - plotting, visualizations


- [scipy](http://www.scipy.org/) - optimization, ODEs, sparse linear algebra, etc.


- [sympy](http://sympy.org/) - symbolic computation


- [pandas](http://pandas.pydata.org/) - data analysis


- [numba](http://numba.pydata.org/) - open source JIT compiler that translates a subset of Python and NumPy code into fast machine code; Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN


- [mpi4py](http://mpi4py.scipy.org/) - parallel computing


- [petsc4py](http://code.google.com/p/petsc4py/), [pytrilinos](http://trilinos.sandia.gov/packages/pytrilinos/) - Python bindings for the "big 2" parallel scientific libraries


- [pyCUDA](http://mathema.tician.de/software/pycuda), [pyOpenCL](http://mathema.tician.de/software/pyopencl) - GPGPU computing


- [FEniCS](http://fenicsproject.org/), [FiPy](http://www.ctcms.nist.gov/fipy/), [PyClaw](http://clawpack.github.io/doc/pyclaw/) - solve complicated PDEs with very sophisticated numerical methods


- [networkX](http://networkx.github.com/), [pygraphviz](http://networkx.lanl.gov/pygraphviz/) - graphs

***
<a id=references></a>
# <font color=brown> References</font>

* Jake VanderPlas (2016). [A Whirlwind Tour of Python](https://jakevdp.github.io/WhirlwindTourOfPython/). Copyright 2016 O’Reilly Media, Inc., 978-1-491-96465-1. 


* Jake VanderPlas (2016). Python Data Science Handbook: Essential Tools for Working with Data (1st. ed.). O’Reilly Media, Inc.


* Zimmermann, P., Casamayou, A., Cohen, N., Connan, G., Dumont, T., Fousse, L., Maltey, F., Meulien, M., Mezzarobba, M., Pernet, C., Thiéry, N.M., Bray, E., Cremona, J., Forets, M., Ghitza, A., & Thomas, H.H. (2018). [Computational Mathematics with SageMath](http://sagebook.gforge.inria.fr/english.html).

* This tutorial was created with the help of the official [Python tutorial](https://docs.python.org/3/tutorial/) and also thanks to [Kyle Mandli](https://github.com/mandli/), [Tristan Glatard](https://github.com/tgteacher), [Antonio Molner](https://medium.com/@antoniomdk1). 