# Python

## Basic concepts

### Basic input and output

The traditional "Hello, world" program is very simple in Python. You can run the program by selecting the cell with the mouse and pressing control-enter on the keyboard. Try editing the string in the quotes and rerunning the program.

In [117]:
print("Hello world!")

Hello world!


Multiple strings can be printed. By default, they are concatenated with a space:

In [118]:
print("Hello,", "John!", "How are you?")

Hello, John! How are you?


In the print function, numerical expressions are first evaluated and then automatically converted to strings. Subsequently, the strings are concatenated with spaces:

In [119]:
print(1, "plus", 2, "equals", 1+2)

1 plus 2 equals 3


Reading textual input from the user can be achieved with the input function. The input function is given a string parameter, which is printed and prompts the user to give input. In the example below, the string entered by the user is stored in the variable `name`.

In [120]:
name=input("Give me your name: ")
print("Hello,", name)

Give me your name: Jarkko
Hello, Jarkko


### Indentation

Repetition is possible with the for loop. Note that the body of the for loop is indented with a tabulator or four spaces.
Unlike in some other languages, braces are not needed to denote the body of the loop. When the indentation stops, the body of the loop ends.

In [121]:
for i in range(3):
    print("Hello")
print("Bye!")

Hello
Hello
Hello
Bye!


Indentation applies to other compound statements as well, such as bodies of functions, different branches of an if statement, and while loops. We shall see examples of these later.

The range(3) expression above actually results in the sequence of integers 0, 1, and 2. So, the range is a half-open interval with the end point excluded from the range. In general, the expression range(n) gives the integers 0, 1, 2, ..., n-1. Modify the above program to make it also print the value of variable i at each iteration. Rerun the code with control-enter.

#### <div class="alert alert-info"> Exercise 1 (hello world)</div>
Fill in the missing piece in the solution stub to make it print the following:

`Hello, world!`

Make sure you use correct indenting.
<hr/>

#### <div class="alert alert-info"> Exercise 2 (compliment)</div>
Fill in the stub solution to make the program work as follows. The program should ask the user for an input, and then print an answer as the examples below show. The string the user entered is shown below in red.

What country are you from? <font color='red'>Sweden</font>  
I have heard that Sweden is a beautiful country.

What country are you from? <font color='red'>Chile</font>  
I have heard that Chile is a beautiful country.
<hr/>

#### <div class="alert alert-info">Exercise 3 (multiplication)</div> 
Make a program that gives the following output. You must use a for loop in your solution.

```
4 multiplied by 0 is 0
4 multiplied by 1 is 4
4 multiplied by 2 is 8
4 multiplied by 3 is 12
4 multiplied by 4 is 16
4 multiplied by 5 is 20
4 multiplied by 6 is 24
4 multiplied by 7 is 28
4 multiplied by 8 is 32
4 multiplied by 9 is 36
4 multiplied by 10 is 40
```
<hr/>

### Variables and data types

We already saw earlier that assigning a value to a variable is very simple:

In [122]:
a=1
print(a)

1


Note that we did not need to introduce the variable a in any way. No type was given for the variable. Python automatically detected that the type of a must be int. We can query the type of a variable with the builtin function type:

In [123]:
type(a)

int

Note also that the type of a variable is not fixed:

In [124]:
a="some text"
type(a)

str

In Python the type of a variable is not attached to the name of the variable, as it is in C for instance, but to the actual value. This is called dynamic typing.

![typing.svg](typing.svg)

We say that a variable is a name that *refers* to a value or an object, and the assignment operator *binds* a variable name to a value.
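
The idea of names being bound to objects can be made visible with the `is` operator, which tests whether two names refer to the same object. The following is only a small sketch, and the values are chosen just for illustration:

```python
a = "some text"
b = a              # b is bound to the same object as a; nothing is copied
print(a is b)      # True: both names refer to one object
a = 42             # rebinding a to a new value does not affect b
print(a, b)        # 42 some text
print(type(a), type(b))
```

Note that after the rebinding the name `a` refers to an int while `b` still refers to the original string.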

The basic data types in Python are: int, float, complex, str (a string), bool (a boolean with values True and False), and bytes. Below are a few examples of their use.

In [125]:
i=5
f=1.5
b = i==4
print("Result of the comparison:", b)
c=0+2j
print("Complex multiplication:", c*c)
s="conca" + "tenation"
print(s)

Result of the comparison: False
Complex multiplication: (-4+0j)
concatenation


The names of the types act as conversion operators between types:

In [126]:
print(int(-2.8))
print(float(2))
print(int("123"))
print(bool(-2), bool(0))  # Zero is interpreted as False
print(str(234))

-2
2.0
123
True False
234


A *byte* can represent numbers between 0 and 255. A byte consists of 8 *bits*, each of which can in turn represent either 0 or 1. All the data that is stored on disks or transmitted across the internet are sequences of bytes. Normally we don't have to care about bytes, since our strings and other variables are automatically converted to a sequence of bytes when needed. An example of the correspondence between the usual data types and bytes is the characters in a string. A single character is encoded as a sequence of one or more bytes. For example, in the common [UTF-8](https://en.wikipedia.org/wiki/UTF-8) encoding the character `c` corresponds to the byte with integer value 99 and the character `ä` corresponds to the sequence of bytes [195, 164]. An example conversion between characters and bytes:

In [127]:
b="ä".encode("utf-8")     # Convert character(s) to a sequence of bytes
print(b)                  # Prints bytes in hexadecimal notation
print(list(b))            # Prints bytes in decimal notation

b'\xc3\xa4'
[195, 164]


In [128]:
bytes.decode(b, "utf-8")  # convert sequence of bytes to character(s)

'ä'

During this course we don't have to care much about bytes, but in some cases, when loading data sets, we might have to specify the encoding if it deviates from the default one.

#### Creating strings
A string is a sequence of characters commonly used to store input or output data in a program. The characters of a string are specified either between single (') or double (") quotes. This choice is useful if a string needs to contain a quotation mark:
"I don't want to go!". You can also achieve this by *escaping* the quotation mark with the backslash: 'I don\'t want to go'.

The string can also contain other escape sequences like \n for newline and \t for a tabulator. See [literals](https://docs.python.org/3/reference/lexical_analysis.html#literals) for a list of all escape sequences.

In [129]:
print("One\tTwo\nThree\tFour")

One	Two
Three	Four


A string containing newlines can be easily given within triple double or triple single quotes:

In [130]:
s="""A string
spanning over
several lines"""

Although we can concatenate strings using the + operator, for efficiency reasons one should use the join method to concatenate a large number of strings:

In [131]:
a="first"
b="second"
print(a+b)
print(" ".join([a, b, b, a]))   # More about the join method later


firstsecond
first second second first


Sometimes printing by concatenation from pieces can be clumsy:

In [132]:
print(str(1) + " plus " + str(3) + " is equal to " + str(4))
# slightly better
print(1, "plus", 3, "is equal to", 4)

1 plus 3 is equal to 4
1 plus 3 is equal to 4


The multiple concatenation operators and quotation marks break the flow of thought. *String interpolation* offers a somewhat easier syntax:

In [133]:
print("%i plus %i is equal to %i" % (1, 3, 4))

1 plus 3 is equal to 4


Or alternatively using the newer `format` method:

In [134]:
print("{} plus {} is equal to {}".format(1, 3, 4))

1 plus 3 is equal to 4


The %i format specifier corresponds to integers and the specifier %f corresponds to floats.
It is often useful to specify the number of decimals when printing a float:

In [135]:
print("%.1f %.2f %.3f" % (1.6, 1.7, 1.8))               # Old style
print("{:.1f} {:.2f} {:.3f}".format(1.6, 1.7, 1.8))     # new style

1.6 1.70 1.800
1.6 1.70 1.800


The specifier `%s` is used for strings. An example:

In [136]:
print("%s concatenated with %s produces %s" % ("water", "melon", "water"+"melon"))

water concatenated with melon produces watermelon


Look [here](https://pyformat.info/#number) for more details about format specifiers, and for comparison between the old and new style of string interpolation.

### Expressions
An *expression* is a piece of Python code that results in a value. It consists of values combined together with *operators*. Values can be literals, such as `1`, `1.2`, `"text"`, or variables. Operators include arithmetic operators, comparison operators, function calls, indexing, and attribute references, among others. Below are a few examples of expressions:

```
1+2
7/(2+0.1)
a
cos(0)
mylist[1]
c > 0 and c !=1
(1,2,3)
a<5
obj.attr
(-1)**2 == 1
```

<div class="alert alert-warning">Note that in Python the operator `//` performs integer division and the operator `/` performs float division. The `**` operator denotes exponentiation. These operators might therefore behave differently than in many other common languages.</div>
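
A few concrete values may make the difference between these operators clearer. This is only an illustrative sketch; the variable names are made up for the example:

```python
quotient = 7 / 2             # float division always gives a float
floored = 7 // 2             # integer division rounds down
floored_negative = -7 // 2   # rounding is towards negative infinity, not towards zero
power = 2 ** 10              # exponentiation
xor = 2 ^ 10                 # in Python ^ is bitwise exclusive or, not a power
print(quotient, floored, floored_negative, power, xor)
# 3.5 3 -4 1024 8
```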

As another example the following expression computes the kinetic energy of a non-rotating object:
`0.5 * mass * velocity**2`

### Statements
Statements are commands that have some effect. For example, a function call (that is not part of another expression) is a statement. Also, the variable assignment is a statement:

In [137]:
i = 5
i = i+1    # This is a common idiom to increment the value of i by one
i += 1     # This is a short-hand for the above

It turns out that the operators `+ - * / // % & | ^ >> << **` have the corresponding *augmented assignment operators* `+= -= *= /= //= %= &= |= ^= >>= <<= **=`
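
As a quick sketch of how a few of these augmented assignment operators behave, each line below is shorthand for the ordinary assignment shown in its comment:

```python
x = 10
x -= 3     # x = x - 3   gives 7
x *= 2     # x = x * 2   gives 14
x //= 4    # x = x // 4  gives 3
x **= 2    # x = x ** 2  gives 9
x %= 5     # x = x % 5   gives 4
print(x)   # 4
```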

Another large set of statements is the flow-control statements such as if-else, for and while loops. We will look into these in the next sections.

#### Loops for repetitive tasks
In Python we have two kinds of loops: while and for. We briefly saw the for loop earlier. Let's now look at the while loop. A while loop repeats a set of statements while a given condition holds. An example:

In [138]:
i=1
while i*i < 1000:
    print("Square of", i, "is", i*i)
    i = i + 1
print("Finished printing all the squares below 1000.")

Square of 1 is 1
Square of 2 is 4
Square of 3 is 9
Square of 4 is 16
Square of 5 is 25
Square of 6 is 36
Square of 7 is 49
Square of 8 is 64
Square of 9 is 81
Square of 10 is 100
Square of 11 is 121
Square of 12 is 144
Square of 13 is 169
Square of 14 is 196
Square of 15 is 225
Square of 16 is 256
Square of 17 is 289
Square of 18 is 324
Square of 19 is 361
Square of 20 is 400
Square of 21 is 441
Square of 22 is 484
Square of 23 is 529
Square of 24 is 576
Square of 25 is 625
Square of 26 is 676
Square of 27 is 729
Square of 28 is 784
Square of 29 is 841
Square of 30 is 900
Square of 31 is 961
Finished printing all the squares below 1000.


Note again that the body of the while statement was marked with the indentation.

Another way of repeating statements is with the for statement. An example:

In [139]:
s=0
for i in [0,1,2,3,4,5,6,7,8,9]:
    s = s + i
print("The sum is", s)

The sum is 45


The for loop executes the statements in the block as many times as there are elements in the given list. At each iteration the variable i refers to another value from the list in order. Instead of giving the list explicitly as above, we could have used the expression range(10), which hands out the values of the sequence 0,1,...,9 one at a time as the for loop asks for a new value (much like a *generator* does, although a range is strictly speaking a lazily evaluated sequence). In the most general form the for loop goes through all the elements in an *iterable*.
Besides lists and generators there are other iterables. We will talk about iterables and generators later this week.
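
As a small sketch of these ideas, the sum from the previous example can be computed with `range(10)` instead of an explicit list, and a string can serve as another kind of iterable. The word `"banana"` is just an arbitrary example value:

```python
s = 0
for i in range(10):        # same values as the explicit list [0,1,...,9]
    s = s + i
print("The sum is", s)     # The sum is 45

count = 0
for ch in "banana":        # a string is an iterable of its characters
    if ch == "a":
        count += 1
print("Number of a's:", count)   # Number of a's: 3
```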

When one wants to iterate through all the elements in an iterable, then the for loop is a natural choice. But sometimes while loops offer a cleaner solution. For instance, if we want
to go through all Fibonacci numbers up to a given limit, then it is easier to do with a `while` loop.
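
One possible way to write the Fibonacci loop mentioned above is sketched below. The limit of 100 is an arbitrary value chosen for this example, and the line `a, b = b, a + b` rebinds both names at once (the right-hand side is evaluated first):

```python
limit = 100            # arbitrary limit, chosen only for this example
a, b = 0, 1            # two consecutive Fibonacci numbers
largest = a
while a <= limit:
    print(a)
    largest = a
    a, b = b, a + b    # step to the next pair of consecutive Fibonacci numbers
print("The largest Fibonacci number not exceeding", limit, "is", largest)
```

With a for loop we would have to know in advance how many iterations are needed, whereas the while loop simply stops when the condition fails.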

#### <div class="alert alert-info">Exercise X (multiplication table)</div>

In the main function print a multiplication table, which is shown below:
```
   1   2   3   4   5   6   7   8   9  10
   2   4   6   8  10  12  14  16  18  20
   3   6   9  12  15  18  21  24  27  30
   4   8  12  16  20  24  28  32  36  40
   5  10  15  20  25  30  35  40  45  50
   6  12  18  24  30  36  42  48  54  60
   7  14  21  28  35  42  49  56  63  70
   8  16  24  32  40  48  56  64  72  80
   9  18  27  36  45  54  63  72  81  90
  10  20  30  40  50  60  70  80  90 100
```
For example at row 4 and column 9 we have 4*9=36.

Use two nested for loops to achieve this. Note that you can use the following form to stop the `print` function from automatically starting a new line:

In [140]:
print("text", end="")
print("more text")

textmore text


Print the numbers in a field with width four, so that the numbers are nicely aligned. For instructions on how to adjust the field width refer to [pyformat.info](https://pyformat.info/#number_padding).
<hr/>

#### Decision making with the if statement
The if-else statement works as one would expect.
Try running the cell below by pressing control+enter.

In [141]:
x=input("Give an integer: ")
x=int(x)
if x >= 0:
    a=x
else:
    a=-x
print("The absolute value of %i is %i" % (x, a))

Give an integer: -1
The absolute value of -1 is 1


The general form of an if-else statement is

```
if condition1:
    statement1_1
    statement1_2
    ...
elif condition2:
    statement2_1
    statement2_2
    ...
...
else:
    statementn_1
    statementn_2
    ...
```

Another example:

In [142]:
c=float(input("Give a number: "))
if c > 0:
    print("c is positive")
elif c<0:
    print("c is negative")
else:
    print("c is zero")

Give a number: -3
c is negative


#### Breaking and continuing a loop
A loop can be ended as soon as the wanted element is found by using the `break` statement:

In [143]:
l=[1,3,65,3,-1,56,-10]
for x in l:
    if x < 0:
        break
print("The first negative list element was", x)

The first negative list element was -1


The `continue` statement stops the current iteration and continues to the next one:

In [144]:
from math import sqrt, log
l=[1,3,65,3,-1,56,-10]
for x in l:
    if x < 0:
        continue
    print("Square root of %i is %f" % (x, sqrt(x)))
    print("Natural logarithm of %i is %f" % (x, log(x)))

Square root of 1 is 1.000000
Natural logarithm of 1 is 0.000000
Square root of 3 is 1.732051
Natural logarithm of 3 is 1.098612
Square root of 65 is 8.062258
Natural logarithm of 65 is 4.174387
Square root of 3 is 1.732051
Natural logarithm of 3 is 1.098612
Square root of 56 is 7.483315
Natural logarithm of 56 is 4.025352


#### <div class="alert alert-info">Exercise X (two dice)</div>

Let us consider throwing two dice. (A die can give a value between 1 and 6.) Use two nested `for`
loops in the `main` function to iterate through all possible combinations the pair of dice can give. 
There are 36 possible combinations. Print, as pairs, all those combinations whose values sum to 5. 
For example, your printout should include the pair (2,3). Print one pair per line.
<hr/>

### Functions
A function is defined with the `def` statement. Let's do a doubling function.

In [145]:
def double(x):
    "This function multiplies its argument by two."
    return x*2
print(double(4), double(1.2), double("abc")) # It even happens to work for strings!

8 2.4 abcabc


The double function takes only one parameter. Notice the *docstring* on the second line. It documents the purpose and usage of the function. Let's try to access it.

In [146]:
print("The docstring is:", double.__doc__)
help(double)   # Another way to access the docstring

The docstring is: This function multiplies its argument by two.
Help on function double in module __main__:

double(x)
    This function multiplies its argument by two.



Most of Python's builtin functions, classes, and modules should contain a docstring.

In [147]:
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.



Here's another example function:

In [148]:
def sum_of_squares(a, b):
    "Computes the sum of arguments squared"
    return a**2 + b**2
print(sum_of_squares(3, 4))

25


<div class="alert alert-warning">Note the terminology: in the function definition the names a and b are called *parameters* of the function; in the function call, however, 3 and 4 are called *arguments* to the function.
</div>

It would be nice if the number of arguments could be arbitrary, not just two. We could pass a list to the function as a parameter.

In [149]:
def sum_of_squares(lst):
    "Computes the sum of squares of elements in the list given as parameter"
    s=0
    for x in lst:
        s += x**2
    return s
print(sum_of_squares([-2]))
print(sum_of_squares([-2,4,5]))

4
45


This works perfectly! There is however some extra typing with the brackets around the lists. Let's see if we can do better:

In [150]:
def sum_of_squares(*t):
    "Computes the sum of squares of arbitrary number of arguments"
    s=0
    for x in t:
        s += x**2
    return s
print(sum_of_squares(-2))
print(sum_of_squares(-2,4,5))

4
45


The strange looking argument notation is called *argument packing*. It packs all the given positional arguments into a tuple `t`. We will encounter tuples again later, but it suffices now to say that tuples are immutable lists. With the for loop we can iterate through all the elements in the tuple.
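
To see the packing itself, one can write a function that does nothing but return the tuple it received. The function `collect` below is a hypothetical helper made up for this illustration:

```python
def collect(*t):                  # hypothetical helper, only for illustration
    "Returns the tuple into which the positional arguments were packed."
    return t

print(collect())                  # ()
print(collect(1))                 # (1,)
print(collect(1, "two", 3.0))     # (1, 'two', 3.0)
print(type(collect(1, 2)))        # <class 'tuple'>
```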

Conversely, there is also syntax for *argument unpacking*. Confusingly, it has exactly the same notation as argument packing, but the two are distinguished by the location where they are used. Packing happens in the parameter list of the function definition, and unpacking happens where the function is called:

In [151]:
lst=[1,5,8]
print("With list unpacked as arguments to the functions:", sum_of_squares(*lst))
# print(sum_of_squares(lst))    # Does not work correctly

With list unpacked as arguments to the functions: 90


The second call, which is commented out above, would fail because the function would try to raise the list itself to the second power. Inside the function body we would have t=([1,5,8],), where the parentheses and the trailing comma denote a tuple with one element, a list.
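
The failure can be observed safely with the sketch below. It repeats the definition of `sum_of_squares` so that it is self-contained, and it uses a `try`/`except` block, which has not been introduced yet, only to catch the error so that the cell keeps running:

```python
def sum_of_squares(*t):
    "Computes the sum of squares of arbitrary number of arguments"
    s = 0
    for x in t:
        s += x**2
    return s

lst = [1, 5, 8]
try:
    sum_of_squares(lst)          # here t has one element, which is the whole list
    failed = False
except TypeError as e:
    failed = True
    print("TypeError:", e)       # a list cannot be raised to a power
print(sum_of_squares(*lst))      # 90, the list is unpacked into three arguments
```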

In addition to the positional parameters we have seen so far, a function can also have *named parameters*. The examples below explain this concept best.

One can also specify an optional parameter by giving the parameter a default value. The parameters that have default values must come after those parameters that don't. We saw that the parameters of the print function were of the form `print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)`. There were four parameters with default values. If some default values don't suit us, we give them in the function call using the name of the parameter:

In [152]:
print(1, 2, 3, end=' |', sep=' -*- ')
print("first", "second", "third", end=' |', sep=' -*- ')

1 -*- 2 -*- 3 |first -*- second -*- third |

Note that the named arguments didn't need to be in the same order as in the function definition. Nor did we need to specify all the parameters with default values, only those we wanted to change.

In [153]:
def length(*t, degree=2):
    """Computes the length of the vector given as parameter. By default, it computes
    the Euclidean distance (degree==2)"""
    s=0
    for x in t:
        s += abs(x)**degree
    return s**(1/degree)
print(length(-4,3))
print(length(-4,3, degree=3))

5.0
4.497941445275415


With the default value of the parameter `degree` this is the Euclidean distance, and for a general degree $p\ne 2$ it is called the [p-norm](https://en.wikipedia.org/wiki/P-norm).

We saw that it was possible to use packing and unpacking of arguments with the `*` notation when one wants to specify an arbitrary number of *positional arguments*. This is also possible for an arbitrary number of named arguments with the `**` notation. We will talk about this more in the data structures section.
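
As a brief preview, and only as a sketch, the `**` notation packs named arguments into a dictionary (a data structure that maps names to values) and can unpack such a dictionary again in a function call. The function `collect_named` is a hypothetical helper made up for this illustration:

```python
def collect_named(**d):           # hypothetical helper, only for illustration
    "Returns the dictionary into which the named arguments were packed."
    return d

packed = collect_named(sep=" -*- ", end=" |\n")
print(packed)                     # {'sep': ' -*- ', 'end': ' |\n'}
print(1, 2, 3, **packed)          # same as print(1, 2, 3, sep=' -*- ', end=' |\n')
```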

#### Visibility of variables
A function definition creates a new namespace (also called local scope). Variables created inside this scope are not available from outside the function definition. Also, the function parameters are only visible inside the function definition. Variables that are not defined inside any function are called *global variables*.

Global variables are also readable in local scopes, but an assignment creates a new local variable without rebinding the global variable. If we are inside a function, a local variable hides a global variable by the same name:

In [154]:
i=2
def f():
    i=3       # this creates a new variable, it does not rebind the global i
    print(i)  # This will print 3    
f()
print(i)      # This will print 2

3
2
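
The other half of the rule, that a global variable can be read inside a function as long as the function does not assign to it, can be checked with a small sketch. The names `limit` and `below_limit` are made up for this illustration:

```python
limit = 10                 # a global variable

def below_limit(x):
    return x < limit       # reads the global limit; no local variable is created

print(below_limit(3))      # True
limit = 2                  # rebinding the global is seen by later calls
print(below_limit(3))      # False
```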


If you really need to rebind a global variable from a function, use the `global` statement. Example:

In [155]:
i=2
def f():
    global i
    i=5       # rebind the global i variable
    print(i)  # This will print 5
f()
print(i)      # This will print 5

5
5


Unlike languages like C or C++, Python allows defining a function inside another function. This *nested* function will have nested scope:

In [156]:
def f():            # outer function
    b=2
    def g():        # inner function
        #nonlocal b # Without this nonlocal statement,
        b=3         # this will create a new local variable
        print(b)
    g()
    print(b)
f()

3
2


Try first running the above cell and see the result. Then uncomment the nonlocal statement and run the cell again. The `global` and `nonlocal` statements are similar. The first will force a variable to refer to a global variable, and the second will force a variable to refer to the variable in the nearest enclosing function scope (but not the global scope).
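
For comparison, here is a sketch of a variant of the same functions with the `nonlocal` statement enabled. To make the effect easy to inspect, this variant returns the values instead of printing them inside the functions:

```python
def f():                # outer function
    b = 2
    def g():            # inner function
        nonlocal b      # b now refers to the variable b of the enclosing function f
        b = 3           # rebinds the b of f instead of creating a new local variable
        return b
    inner = g()
    return inner, b     # both values are 3, since g rebound the b of f

print(f())              # (3, 3)
```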

#### <div class="alert alert-info">Exercise X (triple square)</div>

Write two functions: `triple` and `square`. Function `triple` multiplies its parameter by three. Function `square` raises its parameter to the power of two. For example, we have equalities `triple(5)==15`
and `square(5)==25`.

Part 1.

In the `main` function write a `for` loop that iterates through values 1 to 10, and for each value prints its triple and its square. The output should be as follows:
```
triple(1)==3 square(1)==1
triple(2)==6 square(2)==4
...
```

Part 2.

Now modify this `for` loop so that it stops iteration when the square of a value is larger than the
triple of the value, without printing anything in the last iteration.
<hr/>

#### <div class="alert alert-info">Exercise X (areas of shapes)</div>

Create a program that can compute the areas of three shapes, triangles, rectangles and circles, when
their dimensions are given.
An endless loop should ask for which shape you want the area to be calculated. If the user gives a
string that is none of the given shapes, the message “unknown shape!” should be printed.
An empty string will exit the loop.
Then it will ask for dimensions for that particular shape. When all the necessary dimensions are
given, it prints the area, and starts the loop all over again. Use format specifier "%f" for the radius.
What happens if you give incorrect dimensions, like giving string "aa" as radius? You don't have to check for errors in the input.

Example interaction:
```
Choose a shape (triangle, rectangle, circle): triangle
Give base of the triangle: 20
Give height of the triangle: 5
The area is 50.000000
Choose a shape (triangle, rectangle, circle): rectangel
Unknown shape!
Choose a shape (triangle, rectangle, circle): rectangle
Give width of the rectangle: 20
Give height of the rectangle: 4
The area is 80.000000
Choose a shape (triangle, rectangle, circle): circle
Give radius of the circle: 10
The area is 314.159265
Choose a shape (triangle, rectangle, circle): 
```
<hr/>

### Data structures
The main data structures in Python are strings, lists, tuples, dictionaries, and sets. We saw some examples of lists when we discussed for loops, and we briefly saw tuples when we introduced argument packing and unpacking. Let's get into more details now.

#### Sequences
A *list* contains an arbitrary number of elements (even zero) that are stored in sequential order. The elements are separated by commas and written between brackets. The elements don't need to be of the same type. An example of a list with four values:

In [157]:
[2, 100, "hello", 1.0]

[2, 100, 'hello', 1.0]

A *tuple* is a fixed-length, immutable, and ordered container. Elements of a tuple are separated by commas and written between parentheses. Examples of tuples:

In [158]:
(3,)               # a singleton
(1,3)              # a pair
(1, "hello", 1.0); # a triple

<div class="alert alert-warning">Note the difference between `(3)` and `(3,)`. The first one defines an integer, and the second one defines a tuple with a single element.</div>

As we can see, both lists and tuples can contain values of different types.

Lists, tuples, and strings are called *sequences* in Python, and they have several commonalities:

* their length can be queried with the `len` function
* the `min` and `max` functions find the minimum and maximum element of a sequence, and `sum` adds all the elements of a sequence of numbers together
* Sequences can be concatenated with the `+` operator, and repeated with the `*` operator: `"hi"*3=="hihihi"`
* Since sequences are ordered, we can refer to the elements of a sequence by integers using the *indexing* notation: `"abcd"[2] == "c"`
* Note that the indexing begins from 0
* Negative integers start indexing from the end: -1 refers to the last element, -2 refers to the second last, and so on
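
The common operations listed above can be tried out with a short sketch such as the following, where the values are arbitrary examples:

```python
t = (4, 1, 3)
print(len(t), min(t), max(t), sum(t))   # 3 1 4 8
print([1, 2] + [3])                     # [1, 2, 3]
print("hi" * 3)                         # hihihi
s = "abcd"
print(s[0], s[2], s[-1], s[-2])         # a c d c
```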

Above we saw that we can access a single element of a sequence using indexing. If we want a subset of a sequence, we can use the *slicing* syntax. A slice consists of elements of the original sequence, and it is itself a sequence as well. A simple slice is a range of elements:

In [159]:
s="abcdefg"
s[1:4]

'bcd'

Note that Python slices exclude the last index. The generic form of a slice is
`sequence[first:last:step]`. If any of the three parameters is left out, it is set to a default value as follows: first=0, last=len(sequence), step=1. So, for instance "abcde"[1:]=="bcde". The step parameter selects elements that are step distance apart from each other. For example:

In [160]:
print([0,1,2,3,4,5,6,7,8,9][::3])

[0, 3, 6, 9]
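
A few more slices, given here only as an illustrative sketch, show how the defaults and negative values behave. Note that with a negative step the sequence is walked backwards, and the defaults for first and last are then the end and the beginning of the sequence:

```python
s = "abcdefg"
print(s[:3])      # abc, first defaults to 0
print(s[3:])      # defg, last defaults to len(s)
print(s[-2:])     # fg, negative indices count from the end
print(s[1:6:2])   # bdf, every second element from index 1 up to, but excluding, index 6
print(s[::-1])    # gfedcba, a negative step walks backwards
```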


#### <div class="alert alert-info">Exercise X (solve quadratic)</div>

In mathematics, the quadratic equation $ax^2+bx+c=0$ can be solved with the formula 
$x=\frac{-b\pm \sqrt{b^2 -4ac}}{2a}$. 

Write a function `solve_quadratic`, that returns both solutions of a generic quadratic as a pair (2-tuple)
when the coefficients are given as parameters. It should work like this:
```python
print(solve_quadratic(1,-3,2))
(2.0,1.0)
print(solve_quadratic(1,2,1))
(-1.0,-1.0)
```

Use the `math.sqrt` function from the `math` module in your solution. Test that your function works in the main function!
<hr/>

#### Modifying lists
We can assign values to elements of a list by indexing or by slicing. An example:

In [161]:
L=[11,13,20,32]
L[1]=2          # Changes the second element
print(L)

[11, 2, 20, 32]


Or we can assign a list to a slice:

In [162]:
L[1:3]=[4]
print(L)

[11, 4, 32]


We can also modify a list by using *mutating methods* of the list class, namely the methods `append`, `extend`, `insert`, `remove`, `pop`, `reverse`, and `sort`. Try Python's help functionality to find more about these methods: e.g. `help(list.extend)` or `help(list)`.
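
The following sketch runs through these mutating methods one after another on an example list; the comments show the contents of the list after each call:

```python
L = [11, 13, 20]
L.append(32)          # [11, 13, 20, 32]
L.extend([5, 7])      # [11, 13, 20, 32, 5, 7]
L.insert(1, 99)       # [11, 99, 13, 20, 32, 5, 7]
L.remove(20)          # [11, 99, 13, 32, 5, 7], removes the first occurrence of the value 20
last = L.pop()        # removes and returns the last element, 7
L.reverse()           # [5, 32, 13, 99, 11]
L.sort()              # [5, 11, 13, 32, 99]
print(L, last)
```

All of these methods modify the list in place, and apart from `pop` they return `None`.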

<div class="alert alert-warning">Note that we cannot perform these modifications on tuples or strings since they are *immutable*</div>
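
What immutability means in practice can be seen in the sketch below. The `try`/`except` block is used only to catch the resulting error so that the cell keeps running:

```python
t = (11, 13, 20)
try:
    t[1] = 2                  # item assignment is not supported by tuples
    modified = True
except TypeError as e:
    modified = False
    print("TypeError:", e)

s = "abc"
s2 = s + "d"                  # this creates a new string; s itself is unchanged
print(s, s2)                  # abc abcd
```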

#### Generating sequences
Trivial lists can be tedious to write: `[0,1,2,3,4,5,6]`. The function range creates numeric ranges automatically. The above sequence can be generated with the function call range(7). Note again that the end value is not included in the sequence. An example of using the range function:

In [163]:
L=range(3)
for i in L:
    print(i)
# Note that L is not a list!
print(L)

0
1
2
range(0, 3)


So `L` is not a list, but it is a sequence. We can, for instance, access its last element with `L[-1]`. If really needed, it can be converted to a list with the `list` constructor:

In [164]:
L=range(10)
print(list(L))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


<div class="alert alert-warning">Note that using a range consumes less memory than a corresponding list. This is because in a list all the elements are stored in the memory, whereas the range generates the requested elements only when needed. For example, when the for loop asks for the next element from the range at each iteration, only a single element from the range exists in memory at the same time.</div>
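
One rough way to observe the difference is the function `sys.getsizeof` from the standard library. This is only a sketch: the exact numbers depend on the Python version and platform, and for a list the function counts only the list's own storage, not the integer objects it refers to:

```python
import sys

n = 10**6
r = range(n)
lst = list(r)
print(sys.getsizeof(r))      # small, and it does not grow with n
print(sys.getsizeof(lst))    # several megabytes for a million elements
```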

The range function works in a similar fashion to slices. So, for instance, the step of the sequence can be given:

In [165]:
print(list(range(0, 7, 2)))

[0, 2, 4, 6]


#### Sorting sequences

In Python there are two ways to sort sequences. The `sort` *method* modifies the original list, whereas the `sorted` *function* returns a new sorted list and leaves the original intact. A couple of examples will demonstrate this:

In [166]:
L=[5,3,7,1]
L.sort()      # here we call the sort method of the object L
print(L)
L2=[6,1,7,3,6]
print(sorted(L2))
print(L2)

[1, 3, 5, 7]
[1, 3, 6, 6, 7]
[6, 1, 7, 3, 6]


The parameter `reverse=True` can be given (both to sort and sorted) to get descending order of elements:

In [167]:
L=[5,3,7,1]
print(sorted(L, reverse=True))

[7, 5, 3, 1]


#### <div class="alert alert-info">Exercise X (merge)</div>

Suppose we have two lists `L1` and `L2` that contain integers which are sorted in ascending order.
Create a function `merge` that gets these lists as parameters and returns a new sorted list `L` that has
all the elements of `L1` and `L2`. So, `len(L)` should equal to `len(L1)+len(L2)`. Do this using the
fact that both lists are already sorted. You can’t use the `sorted` function or the `sort` method in implementing the `merge` function. You can however use these functions in the main function for creating inputs to the `merge` function.
Test with a couple of examples in the `main` function that your solution works correctly.
<hr/>

#### <div class="alert alert-info">Exercise X (detect ranges)</div>

Create a function named `detect_ranges` that gets a list of integers as a parameter. The function
should then sort this list, and transform the list into another list where pairs are used for all the
detected intervals. So `3,4,5,6` is replaced by the pair `(3,7)`. Numbers that are not part of any
interval result in just single numbers. The resulting list consists of these numbers and
pairs, separated by commas.
An example of how this function works:
```python
print(detect_ranges([2,5,4,8,12,6,7,10,13]))
[2,(4,9),10,(12,14)]
```

Note that the second element of the pair does not belong to the range. This is consistent with the way Python's `range` function works.
<hr/>

#### Zipping sequences

The `zip` function combines two (or more) sequences into one sequence. If, for example, two sequences are zipped together, the resulting sequence contains pairs. In general, if `n` sequences are zipped together, the elements of the resulting sequence are `n`-tuples. Here's an example of using the `zip` function.

In [168]:
days="Monday Tuesday Wednesday Thursday Friday Saturday Sunday".split()
weathers="rainy rainy sunny cloudy rainy sunny sunny".split()
temperatures=[10,12,12,9,9,11,11]
for day, weather, temperature in zip(days,weathers,temperatures):
    print("On %s it was %s and the temperature was %i degrees celsius." % (day,weather,temperature))

# Or equivalently, the following would be shorter:
#for t in zip(days,weathers,temperatures):
#    print("On %s it was %s and the temperature was %i degrees celsius." % t)

On Monday it was rainy and the temperature was 10 degrees celsius.
On Tuesday it was rainy and the temperature was 12 degrees celsius.
On Wednesday it was sunny and the temperature was 12 degrees celsius.
On Thursday it was cloudy and the temperature was 9 degrees celsius.
On Friday it was rainy and the temperature was 9 degrees celsius.
On Saturday it was sunny and the temperature was 11 degrees celsius.
On Sunday it was sunny and the temperature was 11 degrees celsius.


If the sequences are not of equal length, then the resulting sequence will be as long as the shortest input sequence is.
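
A minimal sketch of this truncation. The lists `names` and `scores` are made up for this illustration and are not part of the weather example above:

```python
names = ["a", "b", "c"]
scores = [1, 2]                    # one element shorter than names
pairs = list(zip(names, scores))   # zip stops when the shortest input ends
print(pairs)
```

The element `"c"` has no partner, so it does not appear in the result.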

#### <div class="alert alert-info">Exercise X (interleave)</div>

Write a function `interleave` that gets an arbitrary number of lists as parameters. You may assume that all the lists have equal length. The function should return one list containing all the elements from the input lists interleaved.

Example:
`interleave([1,2,3], [20,30,40], ['a', 'b', 'c'])`
should return
`[1, 20, 'a', 2, 30, 'b', 3, 40, 'c']`.
Use the `zip` function and the `extend` method of the list object to implement `interleave`.
<hr/>

#### Enumerating sequences

In some other programming languages one iterates through the elements using their indices (0,1, ...) in the sequence. In Python we normally don't need to think about indices when iterating, because the for loop allows simpler iteration through the elements. But sometimes you really need to know the index of the current element in the sequence. In this case one uses Python's `enumerate` function. In the next example we would like to find the second occurrence of integer 5 in a list.

In [169]:
L=[1,2,98,5,-1,2,0,5,10]
counter = 0
for i, x in enumerate(L):
    if x == 5:
        counter += 1
        if counter == 2:
            break
print(i)

7


The `enumerate(L)` function call can be thought to be equivalent to `zip(range(len(L)), L)`.
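
As a quick check of this equivalence, the following sketch compares the two forms on a short list. The variable names `via_enumerate` and `via_zip` are only illustrative:

```python
L = [1, 2, 98, 5]
via_enumerate = list(enumerate(L))
via_zip = list(zip(range(len(L)), L))
print(via_enumerate)
print(via_enumerate == via_zip)   # both give the same (index, element) pairs
```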

#### Dictionaries
A *dictionary* is a dynamic container that has no positional indexing (since Python 3.7 it does, however, remember the insertion order of its items). Instead of using integers to access the elements of the container, the dictionary uses *keys* to access the stored *values*. The dictionary can be created by listing the comma separated key-value pairs in braces. Keys and values are separated by a colon. A tuple (key,value) is called an *item* of the dictionary.

Let's demonstrate the dictionary creation and usage:

In [170]:
d={"key1":"value1", "key2":"value2"}
print(d["key1"])
print(d["key2"])

value1
value2


Keys can have different types even in the same container. So the following code is legal:
`d={1:"a", "z":1}`. The only restriction is that the keys must be *hashable*. That is, there has to be a mapping from keys to integers. Lists are *not* hashable, but tuples are!

There are alternative syntaxes for dictionary creation:

In [171]:
dict([("key1", "value1"), ("key2", "value2"), ("key3", "value3")]) # list of items
dict(key1="value1", key2="value2", key3="value3");

If a key is not found in a dictionary, the indexing `d[key]` results in an error (*exception* `KeyError`). But an assignment with a non-existing key causes the key to be added to the dictionary, associated with the corresponding value:

In [172]:
d={}
d[2]="value"
print(d)

{2: 'value'}


In [173]:
# d[1]   # This would cause an error

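A dictionary object has several non-mutating methods:
```
d.copy()
d.items()
d.keys()
d.values()
d.get(k[,x])
```

The Python 2 methods `has_key`, `iteritems`, `iterkeys`, and `itervalues` no longer exist in Python 3. Use the expression `k in d` for membership testing; `items`, `keys`, and `values` return views that can be iterated directly.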

Some methods change the dictionary:
```
d.clear()
d.update(d1)
d.setdefault(k[,x])
d.pop(k[,x])
d.popitem()
```

Try out some of these in the cell below. You can find more info with `help(dict)` or `help(dict.keys)`.

In [174]:
d=dict(a=1, b=2, c=3, d=4, e=5)
d.values()

dict_values([1, 2, 3, 4, 5])
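
As a starting point for experimenting, here is a small sketch that uses a few of the methods listed above. The dictionary contents are made up for this illustration:

```python
d = dict(a=1, b=2)
print(d.get("a"))        # value of an existing key
print(d.get("z", 0))     # default value instead of a KeyError
d.setdefault("c", 3)     # adds the key because it is missing
d.update({"a": 10})      # overwrites the value of an existing key
removed = d.pop("b")     # removes the key and returns its value
print(removed, d)
```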

#### Sets
A set is a dynamic, unordered container. It works a bit like a dictionary, but only the keys are stored. And each key can be stored only once. The set requires that the keys to be stored are hashable. Below are a few ways of creating a set:

In [175]:
s=set([1,2,2,'a'])
print(s)
s=set()  # empty set
print(s)
s.add(7) # add one element
print(s)

{1, 2, 'a'}
set()
{7}


A more useful example:

In [176]:
s="mississippi"
print("There are %i distinct characters in %s" % (len(set(s)), s))

There are 4 distinct characters in mississippi


The `set` provides the following non-mutating methods:

In [177]:
s=set()
s1=set()
s.copy()
s.issubset(s1)
s.issuperset(s1)
s.union(s1)
s.intersection(s1)
s.difference(s1)
s.symmetric_difference(s1);

The last four operations can be tedious to write when creating a more complicated expression. The alternative is to use the corresponding operator forms: `|`, `&`, `-`, and `^`. An example of these:

In [178]:
s=set([1,2,7])
t=set([2,8,9])
print("Union:", s|t)
print("Intersection:", s&t)
print("Difference:", s-t)
print("Symmetric difference", s^t)

Union: {1, 2, 7, 8, 9}
Intersection: {2}
Difference: {1, 7}
Symmetric difference {1, 7, 8, 9}


There are also the following mutating methods:
```
s.add(x)
s.clear()
s.discard(x)
s.pop()
s.remove(x)
```

And the set operators `|`, `&`, `-`, and `^` have the corresponding mutating, augmented assignment forms: `|=`, `&=`, `-=`, and `^=`.
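
A short sketch of the mutating forms, with made-up values. It also shows the difference between `discard` and `remove` when the element is missing; the flag `raised` is only there for the illustration:

```python
s = {1, 2, 7}
s |= {2, 8}        # in-place union
s -= {1}           # in-place difference
s.discard(100)     # no error, even though 100 is not in the set
try:
    s.remove(100)  # remove raises KeyError for a missing element
    raised = False
except KeyError:
    raised = True
print(s, raised)
```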

#### <div class="alert alert-info">Exercise X (distinct characters)</div>

Write a function `distinct_characters` that gets a list of strings as a parameter. It should return a dictionary whose keys are the strings of the input list and the corresponding values are the numbers of distinct characters in the key.

Use the `set` container to temporarily store the distinct characters in a string.
Example of usage:
`distinct_characters(["check", "look", "try", "pop"])`
should return
`{ "check" : 4, "look" : 3, "try" : 3, "pop" : 2}`.
<hr/>

#### Miscellaneous stuff

To find out whether a container includes an element, the `in` operator can be used. The operator returns a truth value. Some examples of the usage:

In [179]:
print(1 in [1,2])
d=dict(a=1, b=3)
print("b" in d)
s=set()
print(1 in s)
print("x" in "text")

True
True
False
True


As a special case, for strings the `in` operator can be used to check whether a string is part of another string:

In [180]:
print("issi" in "mississippi")
print("issp" in "mississippi")

True
False


Elements of a container can be unpacked into variables:

In [181]:
first, second = [4,5]
a,b,c = "bye"
print(c)
d=dict(a=1, b=3)
key1, key2 = d
print(key1, key2)

e
a b


In membership testing and unpacking only the keys of a dictionary are used, unless either values or items (like below) are explicitly asked.

In [182]:
for key, value in d.items():
    print("For key '%s' value %i was stored" % (key,value))

For key 'a' value 1 was stored
For key 'b' value 3 was stored


To remove the binding of a variable, use the `del` statement. For example:

In [183]:
s="hello"
del s
# print(s)    # This would cause an error

To delete an item from a container, the `del` statement can again be applied:

In [184]:
L=[13,23,40,100]
del L[1]
print(L)

[13, 40, 100]


In similar fashion `del` can be used to delete a slice. Later we will see that `del` can delete attributes from an object.
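
A minimal sketch of deleting a slice from a list. Deleting an item from a dictionary by its key works the same way; the values here are made up for the illustration:

```python
L = [13, 23, 40, 100, 7]
del L[1:3]          # deletes the elements at indices 1 and 2
print(L)
d = {"a": 1, "b": 3}
del d["a"]          # deletes the item with key "a"
print(d)
```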

#### <div class="alert alert-info">Exercise X (reverse dictionary)</div>

Let `d` be a dictionary that has English words as keys and a list of Finnish words as values. So, the
dictionary can be used to find out the Finnish equivalents of an English word in the following way:

```
d["move"]
["liikuttaa"]
d["hide"]
["piilottaa", "salata"]
```

Make a function `reverse_dictionary` that creates a Finnish to English dictionary based on an English to Finnish dictionary given as a parameter. It should work like this:
```
d={"move":["liikuttaa"], "hide":["piilottaa", "salata"]}
reverse_dictionary(d)
{'liikuttaa': ['move'], 'salata': ['hide'], 'piilottaa': ['hide']}
```

Be careful with synonyms!
<hr/>

#### <div class="alert alert-info">Exercise X (find matching)</div>

Write a function `find_matching` that gets a list of strings and a search string as parameters. The function should return the indices of those elements in the input list that contain the search string.

An example:
`find_matching(["sensitive", "engine", "rubbish", "comment"], "en")`
should return the list
`[0, 1, 3]`.
<hr/>

### Compact way of creating data structures
We can now easily create complicated data structures using for loops:

In [185]:
L=[]
for i in range(10):
    L.append(i**2)
print(L)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


Because this kind of pattern is often used, Python offers a short-hand for this. A *list comprehension* is an expression that allows creating complicated lists on one line. The notation is familiar from mathematics:

$\{a^3 : a \in \{1,2, \ldots, 10\}\}$

The same written in Python as a list comprehension:

In [186]:
L=[ a**3 for a in range(1,11)]
print(L)

[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]


The generic form of a list comprehension is:
`[ expression for element in iterable lc-clauses ]`.
Let's break this syntax into pieces. The iterable can be any sequence (or something more general). The lc-clauses consist of zero or more of the following clauses:
* for elem in iterable
* if expression
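
To make the two kinds of clauses concrete, here is a small sketch with made-up names (`evens`, `pairs`, `pairs_loop`). The second comprehension is also written out as ordinary nested loops to show what the clauses correspond to:

```python
evens = [n**2 for n in range(10) if n % 2 == 0]                # one if clause
pairs = [(i, j) for i in range(3) for j in range(3) if i < j]  # extra for clause and an if clause

# The second comprehension written as ordinary nested loops
pairs_loop = []
for i in range(3):
    for j in range(3):
        if i < j:
            pairs_loop.append((i, j))

print(evens)
print(pairs)
print(pairs == pairs_loop)
```

The clauses are read from left to right in the same order as the nested statements in the loop version.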

A more complicated example. How would you describe these numbers?

In [187]:
L=[ 100*a + 10*b +c for a in range(0,10)
                    for b in range(0,10)
                    for c in range(0,10) 
                    if a <= b <= c]
print(L)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 22, 23, 24, 25, 26, 27, 28, 29, 33, 34, 35, 36, 37, 38, 39, 44, 45, 46, 47, 48, 49, 55, 56, 57, 58, 59, 66, 67, 68, 69, 77, 78, 79, 88, 89, 99, 111, 112, 113, 114, 115, 116, 117, 118, 119, 122, 123, 124, 125, 126, 127, 128, 129, 133, 134, 135, 136, 137, 138, 139, 144, 145, 146, 147, 148, 149, 155, 156, 157, 158, 159, 166, 167, 168, 169, 177, 178, 179, 188, 189, 199, 222, 223, 224, 225, 226, 227, 228, 229, 233, 234, 235, 236, 237, 238, 239, 244, 245, 246, 247, 248, 249, 255, 256, 257, 258, 259, 266, 267, 268, 269, 277, 278, 279, 288, 289, 299, 333, 334, 335, 336, 337, 338, 339, 344, 345, 346, 347, 348, 349, 355, 356, 357, 358, 359, 366, 367, 368, 369, 377, 378, 379, 388, 389, 399, 444, 445, 446, 447, 448, 449, 455, 456, 457, 458, 459, 466, 467, 468, 469, 477, 478, 479, 488, 489, 499, 555, 556, 557, 558, 559, 566, 567, 568, 569, 577, 578, 579, 588, 589, 599, 666, 667, 668, 669, 677, 678, 679, 688, 689, 699, 777, 778, 779,

If one needs to iterate through the list only once, it is more memory efficient to use a *generator expression* instead. The only thing that changes syntactically is that the surrounding brackets are replaced by parentheses:

In [188]:
G = ( 100*a + 10*b + c for a in range(0,10)
                       for b in range(0,10)
                       for c in range(0,10) 
                       if a <= b <= c )
print(sum(G))   # This iterates through all the elements from the generator
print(sum(G))   # It doesn't start from the beginning, so all elements are already consumed

60885
0


<div class="alert alert-warning">Note above that one can only iterate through the generator once.</div>

Similarly a *dictionary comprehension* creates a dictionary:

In [189]:
d={ k : k**2 for k in range(10)}
print(d)

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}


And a *set comprehension* creates a set:

In [190]:
s={ i*j for i in range(10) for j in range(10)}
print(s)

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 18, 20, 21, 24, 25, 27, 28, 30, 32, 35, 36, 40, 42, 45, 48, 49, 54, 56, 63, 64, 72, 81}


#### <div class="alert alert-info">Exercise X (two dice comprehension)</div>

Redo the earlier exercise which printed all the pairs of two dice results that sum to 5. But this time use a list comprehension.
<hr/>

### Processing sequences
In this section we will go through some useful tools that may be familiar to you from some functional programming language like *Lisp* or *Haskell*. These tools rely on functions being first-class objects in Python, that is, you can

* pass a function as a parameter to another function
* return a function as a return value from some function
* store a function in a data structure or a variable

We will talk about `map`, `filter`, and `reduce` functions. We will also cover how to create functions with no name using the *lambda* expression.

#### Map and lambda functions
The `map` function gets a function and a list as parameters, and it returns a new sequence whose
elements are the elements of the original list transformed by the parameter function. For this to work the parameter function must take exactly one parameter and return a value. An example will clarify this concept:

In [191]:
def double(x):
    return 2*x
L=[12,4,-1]
print(map(double, L))

<map object at 0x7fe8cc6c86a0>


The `map` function returns a map object for efficiency reasons. However, since we only want to print the contents, we first convert it to a list and then print it:

In [192]:
print(list(map(double,L)))

[24, 8, -2]


When one reads numeric data from a file or from the internet, the numbers are usually in string form. Before they can be used in computations, they must first be converted to ints or floats.
A simple example will showcase this.

In [193]:
s="12 43 64 6"
L=s.split()        # The split method of the string class, breaks the string at whitespaces
                   # to a list of strings.
print(L)
print(sum(map(int, L)))  # The int function converts a string to an integer

['12', '43', '64', '6']
125


Sometimes it feels unnecessary to write a function if you are only going to use it in one `map` function call. For example the function

In [194]:
def add_double_and_square(x):
    return 2*x+x**2 

It is not likely that you will need it elsewhere in your program. The solution is to use an *expression* called *lambda* to define a function with no name. Because it is an expression, we can put it, for instance, in an argument list of a function call. The lambda expression has the form `lambda param1,param2, ... : expression`, where after the lambda keyword you list the parameters of the function, and after the colon is the expression that uses the parameters to compute the return value of the function. Let's replace the above `add_double_and_square` function with a lambda function and apply it to a list using the `map` function.

In [195]:
L=[2,3,5]
print(list(map(lambda x : 2*x+x**2, L)))

[8, 15, 35]


#### <div class="alert alert-info">Exercise X (transform)</div>

Write a function `transform` that gets two strings as parameters and returns a list of integers. The function should split the strings into words, and convert these words to integers. This should give two lists of integers. Then the function should return a list whose elements are the products of the two integers in the respective positions in the lists.
For example
`transform("1 5 3", "2 6 -1")`
should return the list of integers
`[2, 30, -3]`.

You **have** to use `split`, `map`, and `zip` functions/methods. You may assume that the two input strings are in correct format.
<hr/>

#### Filter function


The `filter` function takes a function and a list as parameters. But unlike with the `map` construct, now the parameter function must take exactly one parameter and return a truth value (True or False). The `filter` function then creates a new sequence with only those elements from the original list for which the parameter function returns True. The elements for which the parameter function returns False are filtered out. An example will demonstrate the `filter` function:

In [196]:
def is_odd(x):
    """Returns True if x is odd and False if x is even"""
    return x % 2 == 1         # The % operator returns the remainder of integer division
L=[1, 4, 5, 9, 10]
print(list(filter(is_odd, L)))

[1, 5, 9]


The even elements of the list were filtered out.
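
The parameter function can also be given as a lambda expression. The following sketch keeps the even elements instead; the name `evens` is only illustrative:

```python
L = [1, 4, 5, 9, 10]
evens = list(filter(lambda x: x % 2 == 0, L))   # keep elements with remainder 0
print(evens)
```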

#### <div class="alert alert-info">Exercise X (positive list)</div>

Write a function `positive_list` that gets a list of numbers as a parameter, and returns a list with the negative numbers filtered out using the `filter` function.

The function call `positive_list([2,-2,1,-7])` should return the list `[2,1]`. Test your function in the `main` function.
<hr/>

#### The reduce function
The `sum` function, which returns the sum of a numeric list, can be thought to reduce a list to a single element. It does this reduction by repeatedly applying the `+` operator until all the list elements are consumed. For instance, the list `[1,2,3,4]` is reduced by the expression `(((0+1)+2)+3)+4` of repeated applications of the `+` operator. We could implement this with the following function:

In [197]:
def sumreduce(L):
    s=0
    for x in L:
        s = s+x
    return s

Because this is a common pattern, the designers of Python included a function called `reduce` to simplify the reduction of a sequence. You give the operator you want to use as a parameter to reduce (addition in the above example). And you also give a starting value of the computation (starting value 0 was used above). We can now get rid of the separate function sumreduce by using the reduce function:

In [198]:
L=[1,2,3,4]
from functools import reduce   # import the reduce function from the functools module
reduce(lambda x,y:x+y, L, 0)

10

If we wanted to get a product of all numbers in a sequence, we would use

In [199]:
reduce(lambda x,y:x*y, L, 1)

24

This corresponds to the sequence `(((1*1)*2)*3)*4` of application of operator `*`.

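<div class="alert alert-warning">Note that the starting value is needed if we want to be able to reduce empty lists as well. If the starting value is left out, <code>reduce</code> uses the first element of the sequence as the starting value, and an empty sequence causes a <code>TypeError</code>.</div>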
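
A small sketch of how the starting value affects `reduce`. The helper name `add` and the result names are made up for this illustration:

```python
from functools import reduce

add = lambda x, y: x + y

empty_sum = reduce(add, [], 0)   # empty list: the starting value is returned
single = reduce(add, [5])        # no starting value: the first element is used
print(empty_sum, single)

try:
    reduce(add, [])              # empty list and no starting value
except TypeError as error:
    print("TypeError:", error)
```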

## String handling
We have already seen how to index, slice, concatenate, and repeat strings. Let's now look into what methods the `str` class offers. In Python strings are immutable. This means that for instance the following assignment is not legal:

In [200]:
s="text"
# s[0] = "a"    # This is not legal in Python

Because of the immutability of the strings, the string methods work by returning a value; they don't have any side-effects. In the rest of this section we briefly describe several of these methods. The methods are here divided into five groups.

### Classification of strings
All the following methods will take no parameters and return a truth value. An empty string will always result in `False`.

* `s.isalnum()` True if all characters are letters or digits
* `s.isalpha()` True if all characters are letters
* `s.isdigit()` True if all characters are digits
* `s.islower()` True if contains letters, and all are lowercase
* `s.isupper()` True if contains letters, and all are uppercase
* `s.isspace()` True if all characters are whitespace
* `s.istitle()` True if uppercase in the beginning of word, elsewhere lowercase
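
A few of these methods in action. The test strings are made up, and the list `checks` exists only to collect the results:

```python
checks = [
    "abc123".isalnum(),       # True: letters and digits only
    "abc123".isalpha(),       # False: digits are not letters
    "2024".isdigit(),         # True
    "Hello World".istitle(),  # True
    "   ".isspace(),          # True
    "".isalpha(),             # False: the empty string
]
print(checks)
```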

### String transformations
The following methods do conversions between lower and uppercase characters in the string. All these methods return a new string.

* `s.lower()`      Change all letters to lowercase
* `s.upper()`      Change all letters to uppercase
* `s.capitalize()` Change the first character to uppercase and the rest to lowercase
* `s.title()` Change to titlecase
* `s.swapcase()` Change all uppercase letters to lowercase, and vice versa
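
The following sketch applies each method to the same made-up string, so that the differences are easy to compare:

```python
s = "hello WORLD"
print(s.lower())       # hello world
print(s.upper())       # HELLO WORLD
print(s.capitalize())  # Hello world
print(s.title())       # Hello World
print(s.swapcase())    # HELLO world
print(s)               # the original string is unchanged
```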







### Searching for substrings
All the following methods get the wanted substring as the
parameter, except the replace method, which also gets the
replacing string as a parameter.

* `s.count(substr)` Counts the number of occurrences of a substring
* `s.find(substr)` Finds index of the first occurrence of a substring, or -1
* `s.rfind(substr)` Finds index of the last occurrence of a substring, or -1
* `s.index(substr)` Like find, except ValueError is raised if not found
* `s.rindex(substr)` Like rfind, except ValueError is raised if not found
* `s.startswith(substr)` Returns True if string starts with a given substring
* `s.endswith(substr)` Returns True if string ends with a given substring
* `s.replace(substr, replacement)` Returns a string where occurrences of one string
are replaced by another

Keep also in mind that the expression `"issi" in "mississippi"` returns a truth value of whether the first string occurs in the second string.
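
A short sketch of the search methods, using the same string as above:

```python
s = "mississippi"
print(s.count("ss"))          # 2
print(s.find("ss"))           # 2, index of the first occurrence
print(s.rfind("ss"))          # 5, index of the last occurrence
print(s.find("xyz"))          # -1, not found
print(s.startswith("miss"))   # True
print(s.replace("ss", "SS"))  # miSSiSSippi
try:
    s.index("xyz")            # like find, but raises an exception
except ValueError:
    print("not found")
```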








### Trimming and adjusting
* `s.strip(x)` Removes leading and trailing whitespace by default, or characters found in string x
* `s.lstrip(x)` Same as strip but only leading characters are removed
* `s.rstrip(x)` Same as strip but only trailing characters are removed
* `s.ljust(n)` Left justifies string inside a field of length n
* `s.rjust(n)` Right justifies string inside a field of length n
* `s.center(n)` Centers string inside a field of length n

An example of using the `center` method and string repetition:

In [201]:
L=[1,3,5,7,9,1,1]
print("-"*11)
for i in L:
    s="*"*i 
    print("|%s|" % s.center(9))
print("-"*11)

-----------
|    *    |
|   ***   |
|  *****  |
| ******* |
|*********|
|    *    |
|    *    |
-----------


### Joining and splitting
The `join(seq)` method joins the strings of the sequence `seq`. The string itself is used as a delimiter. An example:

In [202]:
"--".join(["abc", "def", "ghi"])

'abc--def--ghi'

In [203]:
L=list(map(lambda x : " %s" % x, range(100)))
s=""
for x in L:
    s = s + x   # Don't ever do this, it creates a new string at every iteration
print(s)
print("".join(L))  # This is the correct way of building a string out of smaller strings

 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99


<div class="alert alert-warning">If you want to build a string out of smaller strings, then
first put the small strings into a list, and then use the `join` method to catenate the pieces together. It is much more efficient this way. Use the `+` catenation operator only if you have very few short strings that you want to catenate.</div>

The method `split(sep=None)` divides a string into pieces that are separated by the string `sep`. The pieces are returned in a list. For instance, the call `'abc--def--ghi'.split("--")` will result in

In [204]:
'abc--def--ghi'.split("--")

['abc', 'def', 'ghi']
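
When `sep` is left out, runs of whitespace act as a single separator and leading and trailing whitespace is ignored. With an explicit separator, empty pieces are kept. A small sketch with made-up strings:

```python
words = "  12   43\t64\n6 ".split()   # default: split at whitespace
fields = "a,,b".split(",")            # explicit separator keeps the empty piece
print(words)    # ['12', '43', '64', '6']
print(fields)   # ['a', '', 'b']
```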

#### <div class="alert alert-info">Exercise X (acronyms)</div>

Write a function `acronyms` which takes a string as a parameter and returns a list of acronyms. A word is an acronym if it has length at least two, and all its characters are in uppercase. Before acronym detection, delete punctuation with the `strip` method.

Test this function in the `main` function with the following call:
```python
print(acronyms("""For the purposes of the EU General Data Protection Regulation (GDPR), the controller of your personal information is International Business Machines Corporation (IBM Corp.), 1 New Orchard Road, Armonk, New York, United States, unless indicated otherwise. Where IBM Corp. or a subsidiary it controls (not established in the European Economic Area (EEA)) is required to appoint a legal representative in the EEA, the representative for all such cases is IBM United Kingdom Limited, PO Box 41, North Harbour, Portsmouth, Hampshire, United Kingdom PO6 3AU."""))
```

This should return MISSING!!!!!!!
<hr/>

#### <div class="alert alert-info">Exercise X (sum equation)</div>

Write a function `sum_equation` which takes a list of positive integers as a parameter and returns a string with an equation of the sum of the elements.

Example:
`sum_equation([1,5,7])`
returns
`"1 + 5 + 7 = 13"`
Observe, the spaces should be exactly as shown above. For an empty list the function should return the string "0 = 0".
<hr/>

## Modules

To ease management of large programs, software is divided
into smaller pieces. In Python these pieces are called *modules*.
A module should be a unit that is as independent from other
modules as possible.
Each file in Python corresponds to a module.
Modules can contain classes, objects, functions, and so on.
For example, functions to handle regular expressions are in
the module `re`.

The standard library of Python consists of hundreds of
modules. Some of the most common standard modules include

* `re`
* `math`
* `random`
* `os`
* `sys`

Any file with extension `.py` that contains Python source code
is a module. So, no special notation is needed to create a module.

### Using modules

Let’s say that we need to use the cosine function.
This function, and many other mathematical functions are
located in the `math` module.
To tell Python that we want to access the features offered by
this module, we can give the statement `import math`.
Now the module is loaded into memory.
We can now call the function like this:
```python
math.cos(0)
1.0
```

Note that we need to include the module name where the `cos`
function is found.
This is because other modules may have a function (or other
attribute of a module) with the same name.
This usage of different namespace for each module prevents
name clashes. For example, `gzip.open`, `os.open` are not to be confused
with the builtin `open` function.

### Breaking the namespace

If the cosine is needed a lot, then it might be tedious to
always specify the namespace, especially if the name of the
namespace/module is long.
For these cases there is another way of importing modules.
Bring a name to the current scope with
`from math import cos` statement.
Now we can use it without the namespace specifier: `cos(1)`.

Several names can be imported to the current scope with
`from math import name1, name2, ...`.
Or even all names of the module with `from math import *`.
The last form is sensible only in few cases, normally it just
confuses things since the user may have no idea what names
will be imported.
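
The different import forms side by side, as a minimal sketch. The alias `square_root` is a made-up name for this illustration:

```python
import math                             # access names through the namespace
from math import cos, pi                # bring selected names to the current scope
from math import sqrt as square_root    # import under a different, local name

print(math.cos(0))
print(cos(pi))
print(square_root(16))
```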

### Module lookup

When we try to import a module mod with the import
statement, the lookup proceeds in the following order:

* Check if it is a builtin module
* Check if the file `mod.py` is found in any of the directories in
the list `sys.path`. The first item in this list is the current
directory
* When Python is started, the `sys.path` list is initialised with
the contents of the `PYTHONPATH` environment variable
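
You can inspect the search path yourself. The sketch below prints the first few entries; the actual output depends on your installation and on how Python was started:

```python
import sys

print(type(sys.path))             # sys.path is an ordinary list of strings
for directory in sys.path[:3]:    # only the first few entries
    print(repr(directory))
```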

### Module hierarchy

The standard library contains hundreds of modules.
Hence, it is hard to comprehend what the library includes.
The modules therefore need to be organised somehow.
In Python the modules can be organised into hierarchies using
packages.
A package is a module that can contain other packages and
modules.
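For example, the `numpy` package has contained subpackages such as `core`,
`distutils`, `f2py`, `fft`, `lib`, `linalg`, `ma`,
`random`, and `testing`; the exact set depends on the NumPy version
(the old `numarray` and `oldnumeric` subpackages, for instance, have been removed).
And package `numpy.linalg` in turn has contained modules `linalg`,
`lapack_lite` and `info`.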

### Importing from packages

The statement `import numpy` imports the top-level package `numpy`
but not its subpackages:

* `import numpy.linalg` imports the subpackage, and
* `import numpy.linalg.linalg` imports the module only

If we want to skip the long namespace specification, we can
use the form

```python
from numpy.linalg import linalg
```

or

```python
from numpy.linalg import linalg as lin
```

if we want to use a different name for the module. The following command imports the function `det` (computes the determinant of a matrix) from the module linalg, which is contained in a subpackage linalg, which belongs to package numpy:
```python
from numpy.linalg.linalg import det
```


### Correspondence between folder and module hierarchies

The packages are represented by folders in the filesystem.
The folder should contain a file named `__init__.py` that
makes up the package body. This handles the initialisation of
the package.
The directory may also contain further directories
(subpackages) or Python files (normal modules).

```
a/
    __init__.py
    b.py
    c/
        __init__.py
        d.py
        e.py
```
![package.svg](package.svg)

### Contents of a module

Suppose we have a module named `mod.py`.
All the assignments, class definitions with the `class` statement,
and function definitions with `def` statement will create new
attributes to this module.
Let’s import this module from another Python file using the
`import mod` statement.
After the import we can access the attributes of the module
object using the normal dot notation: `mod.f()`,
`mod.myclass()`, `mod.a`, etc.
Note that Python doesn’t really have global variables that are
visible to all modules. All variables belong to some module
namespace.

Just like other objects, the module object contains its
attributes in the dictionary `modulename.__dict__`.
Usually a module contains at least the attributes `__name__` and
`__file__`. Other common attributes are `__version__`,
`__author__` and `__doc__`, which contains the docstring of the
module.
If the first statement of a file is a string, this is taken as the
docstring of that module.
The attribute `__file__` is always the filename of the module.

The module attribute `__name__` has the value `"__main__"` if we are in the main program;
otherwise some other module has imported us and `__name__`
equals the name of the module, for example `mod` for the file `mod.py`.
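
This attribute is commonly used to run some code only when the file is executed directly, not when it is imported. A minimal sketch, where the function name `main` and its return value are made up for this illustration:

```python
import math

def main():
    # Code that should run only when the file is executed directly
    return "running as the main program"

print(math.__name__)        # an imported module reports its own name

if __name__ == "__main__":  # true only in the program that Python started
    print(main())
```

If another file imports this one, the condition is false and `main` is not called automatically.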

#### <div class="alert alert-info">Exercise x (usemodule) </div>

Create your own module as file `triangle.py` in the `src` folder. The module should contain two functions:

* `hypothenuse` which returns the length of the hypothenuse when given the lengths of two other sides of a right-angled triangle
* `area` which returns the area of the right-angled triangle, when two sides, perpendicular to each other, are given as parameters.

Make sure both the functions and the module have descriptive docstrings. Add also the `__version__` and `__author__` attributes to the module. Call both your functions from the main function (which is in file `usemodule.py`).

## Regular expressions

### Examples

We have already seen that we can ask from a string `str`
whether it begins with some substring as follows:
`str.startswith('Apple')`.
If we would like to know whether it starts with `"Apple"` or
`"apple"`, we would have to call the `startswith` method twice.
Regular expressions offer a simpler solution:
`re.match(r'[Aa]pple', str)`.
The bracket notation is one example of the special syntax of
*regular expressions*. In this case it says that any of the
characters inside brackets will do: either `'A'` or `'a'`. The other
letters in `"pple"` will act normally. The string `r'[Aa]pple'` is
called a pattern.

A more complicated example asks whether the string `str`
starts with either `apple` or `banana` (no matter if the first letter
is capital or not):
`re.match(r'[Aa]pple|[Bb]anana', str)`.
In this example we saw a new special character `|` that denotes
an alternative. On either side of the bar character we have a
subpattern.
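
A small sketch of the pattern in use. The test strings in `words` are made up, and `results` only collects whether each string matched:

```python
import re

words = ["Apple pie", "apple", "banana split", "pineapple"]
# re.match returns a match object on success and None otherwise
results = [re.match(r'[Aa]pple|[Bb]anana', w) is not None for w in words]
print(results)
```

The last string does not match, because `re.match` only looks at the beginning of the string.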

A legal variable name in Python starts with a letter or an
underscore character and the following characters can also be
digits.
So legal names are, for instance: `_hidden`, `L_value`, `A123_`.
But the name `2abc` is not a valid variable name.
Let's see what would be the regular expression pattern to
recognise valid variable names:
`r'[A-Za-z_][A-Za-z_0-9]*\Z'`.
Here we have used a shorthand for character ranges: `A-Z`.
This means all the characters from `A` to `Z`.

The first character of the variable name is defined in the first
brackets. The subsequent characters are defined in the second
brackets.
The special character `*` means that we allow any number
(0, 1, 2, ...) of the previous subpattern. For example the
pattern `r'ba*'` allows strings `'b'`, `'ba'`, `'baa'`, `'baaa'`, and
so on.
The special syntax `\Z` denotes the end of the string.
Without it we would also accept `abc$` as a valid name since
match normally checks only that a string starts with a pattern.
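
The pattern can be wrapped into a small helper for testing. The function name `is_valid_name` is made up for this sketch, and the pattern covers only the ASCII names described above; it does not, for example, reject Python keywords:

```python
import re

def is_valid_name(s):
    """Return True if s matches the variable name pattern."""
    return re.match(r'[A-Za-z_][A-Za-z_0-9]*\Z', s) is not None

for name in ["_hidden", "L_value", "A123_", "2abc", "abc$"]:
    print(name, is_valid_name(name))
```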

The special notations, like `\Z`, also cause problems with string
handling.
Remember that normally in string literals we have some
special notation: `\n` stands for newline, `\t` stands for tab, and
so on.
So, both string literals and regular expressions use similar
looking notations, which can create serious confusion.
This can be solved by using the so-called raw strings. We
denote a raw string by having an `r` letter before the first
quotation mark, for example `r'ab*\Z'`.
When using raw strings, the newline (`\n`), tab (`\t`), and other
special string literal notations aren’t interpreted. One should
always use raw strings when defining regular expression
patterns.
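
The difference between normal and raw string literals can be seen in a small experiment. The example strings below are our own.

```python
import re

normal = "a\tb"     # \t is interpreted as a single tab character
raw = r"a\tb"       # backslash and t are kept as two separate characters
lengths = (len(normal), len(raw))
print(lengths)

# In a normal string literal \b means the backspace character,
# so the regular expression never sees the word boundary notation.
with_raw = re.findall(r'\bcat\b', "cat concat cat")
without_raw = re.findall('\bcat\b', "cat concat cat")
print(with_raw, without_raw)
```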

### Patterns

A pattern represents a set of strings. This set can even be
potentially infinite.
They can be used to describe a set of strings that have some
commonality; some regular structure.
Regular expressions are a classical computer science topic.
They are very common in programming tasks. Scripting
languages, like Python, are very fluent in regular expressions.
Very complex text processing can be achieved using regular
expressions.

Normal characters (letters, numbers) just represent
themselves, unless preceded by a backslash, which may trigger
some special meaning.
Many punctuation characters have a special meaning, unless preceded
by a backslash (`\`), which removes their special meaning.
Use `\\` to represent a backslash character without any special
meaning.
Next we will go through some of the more
common RE notations.

```
.      Matches any character except a newline
[...]  Matches any character contained within the brackets
[^...] Matches any character not appearing after the hat (^)
^      Matches the start of the string
$      Matches the end of the string
*      Matches zero or more repetitions of the previous RE
+      Matches one or more repetitions of the previous RE
{m,n}  Matches m to n occurrences of the previous RE
?      Matches zero or one occurrence of the previous RE
```

We have already seen that a `|` character denotes alternatives.
For example, the pattern `r'Get (on|off|ready)'` matches
the following strings: `"Get on"`, `"Get off"`, `"Get ready"`.
We can use parentheses to create groupings inside a pattern:
`r'(ab)+'` will match the strings `"ab"`, `"abab"`, `"ababab"`,
and so on.
These groups are also given a reference number starting from 1.
We can refer to groups using backreferences: `\number`.
For example, we can find separated patterns that get
repeated: `r'([a-z]{3,}) \1 \1'`.
This will recognise, for example, the following strings:
`"aca aca aca"`, `"turn turn turn"`. But not the strings
`"aca aba aca"` or `"ac ac ac"`.
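
A short sketch of grouping and backreferences, using the strings mentioned above. We add `\Z` to the `(ab)+` pattern ourselves so that the whole string has to consist of repetitions.

```python
import re

# Grouping: the + applies to the whole group (ab).
only_ab = [bool(re.match(r'(ab)+\Z', t)) for t in ["ab", "ababab", "aba"]]
print(only_ab)

# Backreference: \1 must repeat exactly what group 1 matched.
repeated = r'([a-z]{3,}) \1 \1'
tests = ["aca aca aca", "turn turn turn", "aca aba aca", "ac ac ac"]
is_repeated = [bool(re.match(repeated, t)) for t in tests]
print(is_repeated)
```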


In the following, note that a hat (`^`) as the first character
inside brackets will create a complement set of characters:

```
\d  same as [0-9] for ASCII text, matches a digit
\D  same as [^0-9] for ASCII text, matches anything but a digit
\s  matches a whitespace character (newline, tab, ...)
\S  matches a non-whitespace character
\w  same as [a-zA-Z0-9_] for ASCII text, matches one alphanumeric character or underscore
\W  matches one character that is not matched by \w
```

Using the above notation we can now shorten our previous
variable name example to `r'[a-zA-Z_]\w*\Z'`.

The patterns `\A`, `\b`, `\B`, and `\Z` will all match an empty
string, but in specific places.
The patterns `\A` and `\Z` will recognise the beginning and end
of the string, respectively.
Note that the patterns `^` and `$` can in some cases match also
after a newline and before a newline, correspondingly.
So, `\A` is distinct from `^`, and `\Z` is distinct from `$`.
The pattern `\b` matches at the start or end of a word. The
pattern `\B` does the reverse.

### Match and search functions

We have so far only used the `re.match` function, which tries
to find a match at the beginning of a string.
The function `re.search` can find a match anywhere in a
string.
Example: `re.search(r'\bback\b', s)` will match
strings `"back"`, `"a back, is a body part"`, `"get back"`. But it
will not match the strings `"backspace"` or `"comeback"`.
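
The difference between `re.match` and `re.search` can be checked with the example strings above. This is only a small sketch; the list names are our own.

```python
import re

texts = ["back", "a back, is a body part", "get back", "backspace", "comeback"]

# search looks for the word anywhere in the string
found_by_search = [t for t in texts if re.search(r'\bback\b', t)]
# match only accepts the word at the very beginning of the string
found_by_match = [t for t in texts if re.match(r'\bback\b', t)]

print(found_by_search)
print(found_by_match)
```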

The function `re.search` finds only the first occurrence.
We can use the `re.findall` function to find all occurrences.
Let’s say we want to find all present participle words in a
string `s`. The present participle words have ending `'ing'`.
The function call would look like this:
`re.findall(r'\w+ing\b', s)`.
Let’s try running this:

In [205]:
import re
s = "Doing things, going home, staying awake, sleeping later"
re.findall(r'\w+ing\b', s)

['Doing', 'going', 'staying', 'sleeping']

Let’s say we want to pick up all the integers from a string.
We can try that with the following function call:
`re.findall(r'[+-]?\d+', s)`.
An example run:

In [206]:
re.findall(r'[+-]?\d+', "23 + -24 = -1")

['23', '-24', '-1']

Suppose we are given a string of if/then sentences, and we
would like to extract the conditions from these sentences.
Let’s try the following function call:
`re.findall(r'[Ii]f (.*), then', s)`.
An example run:

In [207]:
s = ("If I’m not in a hurry, then I should stay. " +
    "On the other hand, if I leave, then I can sleep.")
re.findall(r'[Ii]f (.*), then', s)

['I’m not in a hurry, then I should stay. On the other hand, if I leave']

But we wanted the result `['I’m not in a hurry', 'I leave']`. That
is, the condition from both sentences. How can this be fixed?

The problem is that the pattern `.*` tries to match as many
characters as possible.
This is called *greedy matching*.
One way of solving this problem is to notice that the two
sentences are separated by a full-stop (.).
So, instead of matching all the characters, we need to match
everything but the dot character.
This can be achieved by using the complement character
class: `[^.]`. The hat character (`^`) in the beginning of a
character class means the complement character class.

After the modification the function call looks like this:
`re.findall(r'[Ii]f ([^.]*), then', s)`.
Another way of solving this problem is to use non-greedy
matching.
The repetition specifiers `+`, `*`, `?`, and `{m,n}` have
corresponding non-greedy versions: `+?`, `*?`, `??`, and `{m,n}?`.
These expressions use as few characters as possible to make
the whole pattern match some substring.
By using the non-greedy version, the function call looks like this:
`re.findall(r'[Ii]f (.*?), then', s)`.
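
The three variants can be compared side by side. In this sketch the example sentence is typed with a plain ASCII apostrophe, which is our own choice.

```python
import re

s = ("If I'm not in a hurry, then I should stay. "
     "On the other hand, if I leave, then I can sleep.")

greedy = re.findall(r'[Ii]f (.*), then', s)          # one long match
complement = re.findall(r'[Ii]f ([^.]*), then', s)   # cannot cross a full stop
non_greedy = re.findall(r'[Ii]f (.*?), then', s)     # stops as early as possible

print(len(greedy))
print(complement)
print(non_greedy)
```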



### Functions in the `re` module

Below is a list of the most common functions in the `re` module:

* re.match(pattern, str)
* re.search(pattern, str)
* re.findall(pattern, str)
* re.finditer(pattern, str)
* re.sub(pattern, replacement, str, count=0)

The functions `match` and `search` return a match object, or `None` if no match was found.
A match object describes the found occurrence.
The function `findall` returns a list of all the occurrences of
the pattern. The elements in the list are strings (or the contents of the groups, if the pattern contains groups).
The function `finditer` works like the `findall` function except
that instead of returning a list, it returns an iterator whose
items are match objects.
The function `sub` replaces all the occurrences of the pattern in
`str` with the string `replacement` and returns the new string.
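
The following sketch compares `findall`, `finditer`, and `sub` on the integer example used earlier. The replacement string `'N'` is chosen only for illustration.

```python
import re

text = "23 + -24 = -1"

numbers = re.findall(r'[+-]?\d+', text)              # list of strings
positions = [(m.group(0), m.start(), m.end())        # match objects carry positions too
             for m in re.finditer(r'[+-]?\d+', text)]
masked = re.sub(r'\d+', 'N', text)                   # replace every run of digits

print(numbers)
print(positions)
print(masked)
```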

An example: the following program will replace all "she"
words with "he":

```
import re
s = "She goes where she wants to, she's a sheriff."
newstr = re.sub(r'\b[Ss]he\b', 'he', s)   # \b keeps "sheriff" untouched
print(newstr)
```

This will print `he goes where he wants to, he's a sheriff.`

The `sub` function can also use backreferences to refer to the
matched string. The backreferences \1, \2, and so on, refer
to the groups of the pattern, in order.
An example:
```
import re
s = """He is the president of Russia.
He’s a powerful man."""
# count=1 limits the substitution to the first occurrence
newstr = re.sub(r'(\b[Hh]e\b)', r'\1 (Putin)', s, count=1)
print(newstr)
```

This will print
```
He (Putin) is the president of Russia.
He’s a powerful man.
```

### Match object

Functions `match`, `search`, and `finditer` use match objects
to describe the found occurrence.
The method `groups()` of the match object returns the tuple
of all the substrings matched by the groups of the pattern.
Each pair of parentheses in the pattern creates a new group.
These groups are referred to by indices 1, 2, ...
The group 0 is a special one: it refers to the match created by
the whole pattern.

Let’s look at the match object returned by the call

```
mo = re.search(r'\d+ (\d+) \d+ (\d+)',
'first 123 45 67 890 last')
```

The call `mo.groups()` returns a tuple `('45', '890')`.
We can access just some individual groups by using the
method `group(gid, ...)`.
For example, the call `mo.group(1)` will return `'45'`.
The zeroth group, `mo.group(0)`, will represent the whole match:
`'123 45 67 890'`.

In addition to accessing the strings matched by the pattern
and its groups, the corresponding indices of the original string
can be accessed:

* The `start(gid=0)` and `end(gid=0)` methods return the start
and end indices of the matched group gid, correspondingly
* The method `span(gid)` just returns the pair of these start
and end indices
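
The methods of the match object can be explored with the same example call. This is a small sketch; the variable names other than `mo` are our own.

```python
import re

mo = re.search(r'\d+ (\d+) \d+ (\d+)', 'first 123 45 67 890 last')

all_groups = mo.groups()        # substrings matched by groups 1 and 2
whole = mo.group(0)             # the whole match
both = mo.group(1, 2)           # several groups at once give a tuple
whole_span = mo.span()          # (start, end) of the whole match
first_span = (mo.start(1), mo.end(1))   # same as mo.span(1)

print(all_groups, whole, both, whole_span, first_span)
```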

The match object mo can also be used like a boolean value:

```python
mo = re.search(...)
if mo:
    # do something
```

will do something if a match was found.
Alternatively, the match object can be converted to a boolean
value by the call `found = bool(mo)`.

### Miscellaneous stuff

If the same pattern is used in many function calls, it may be
wise to precompile the pattern, mainly for efficiency reasons.
This can be done using the `compile(pattern, flags=0)` function
in the `re` module. The function returns a so-called RE object.
The RE object has method versions of the functions found in
module `re`.
The only difference is that the first parameter is not the
pattern since the precompiled pattern is stored in the RE
object.
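
A minimal sketch of using a precompiled pattern. The name `integer_re` is our own.

```python
import re

integer_re = re.compile(r'[+-]?\d+')    # compile once, use many times

# The methods take the same parameters as the module level functions,
# except that the pattern is left out.
all_integers = integer_re.findall("23 + -24 = -1")
first = integer_re.match("42 apples").group(0)
nothing = integer_re.search("no numbers here")

print(all_integers, first, nothing)
```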

The details of matching operation can be specified using
optional flags.
These flags can be given either inside the pattern or as a
parameter to the compile function.
Some of the more common flags are given in the following
table:

| Inline flag | Flag attribute |
|-----|--------------|
|`(?i)` | `re.IGNORECASE`|
|`(?m)` | `re.MULTILINE`|
|`(?s)` | `re.DOTALL`|

The inline flags on the left must appear at the beginning of the
pattern (since Python 3.11, placing them elsewhere is an error).
On the right there are attributes of the `re` module that can be
given to the compile function as the second parameter.
The `IGNORECASE` flag makes lower- and uppercase
characters appear as equal.
The `MULTILINE` flag makes the special characters `^` and `$`
match the beginning and end of each line in addition to the
beginning and end of the whole string. This flag makes `\A`
differ from `^`, and `\Z` differ from `$`.
The `DOTALL` flag makes the character class `.` (dot) also
accept the newline character, in addition to all the other
characters.

When giving multiple flags to the compile function, the flags
can be separated with the `|` sign.
For example, `re.compile(pattern, re.MULTILINE | re.DOTALL)`.
This is equal to `re.compile('(?m)(?s)' + pattern)`.
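
The effect of the flags can be seen in the following sketch. The three-line test string is made up for this illustration.

```python
import re

text = "first line\nSecond line\nthird LINE"

without_flag = re.findall(r'^\w+', text)                  # ^ matches only at the string start
line_starts = re.findall(r'^\w+', text, re.MULTILINE)     # ^ matches at every line start
string_start = re.findall(r'\A\w+', text, re.MULTILINE)   # \A is not affected by the flag
any_case = re.findall(r'line', text, re.IGNORECASE)
dot_default = re.search(r'first.*third', text)            # . stops at the newline
dot_all = re.search(r'first.*third', text, re.DOTALL)

# Flags combined with | and the equivalent inline notation
combined = re.compile(r'^second.*line$', re.IGNORECASE | re.MULTILINE)
inline = re.compile(r'(?im)^second.*line$')
print(combined.findall(text), inline.findall(text))
```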

#### <div class="alert alert-info">Exercise X (integers in brackets)</div>

Write function `integers_in_brackets` that finds from a given string all integers that are enclosed in brackets.

Example run:
`integers_in_brackets("  afd [asd] [12 ] [a34]  [ -43 ]tt [+12]xxx")`
returns
`[12, -43, 12]`.
So there can be whitespace between the number and the brackets, but no other character besides those that make up the integer.
<hr/>

## Basic file processing

A file can be opened with the `open` function. The call `open(filename, mode="r")` will return a *file object*. (For a file opened in text mode its type is `io.TextIOWrapper`.) This file object can be used to refer to a file on disk. For example, when we want to read from or write to a file, we can use the methods `read` and `write` of the file object. After the file object is no longer needed, a call to the `close` method should be made.

We can control what kind of operations we can perform on a file with the *mode* parameter of the `open` function. Different options include opening a file for reading or writing,
whether the file should exist already or be created with the
call to `open`, etc. Here's a list of the most common opening modes:

| Mode | Description |
| ---- | ----------- |
| `r`  | read-only mode, file must exist |
| `w`  | write-only mode, creates a new file or overwrites an existing one |
| `a`  | write-only mode, write always appends to the end |
| `r+` | read/write mode, file must already exist |
| `w+` | read/write mode, creates a new file or overwrites an existing one |
| `a+` | read/write mode, write will append to end |

In the end of the mode string either the letter `t` or `b` can be appended. These stand for text mode and binary mode. If this letter is not given, the file type is text mode by default. 

For binary mode the contents of the file are not interpreted in any way, and the read and write methods handle bytes. (A byte consists of 8 bits and can be used to represent a number in the range 0 to 255.)

In the text mode two interpretations happen:

* On the Windows operating system the end of a line in files is encoded by two characters (`'\r\n'`). When the file is read, these two characters are converted to a single `'\n'` character. During writes to a file this conversion happens in the opposite direction.
* One character is encoded in the file as one or more bytes. This conversion happens automatically during read and write operations. One common encoding between bytes and characters is utf-8. In this encoding, the Finnish character `'ä'`, for example, is encoded as the following sequence of bytes:

In [208]:
"ä".encode("utf-8")

b'\xc3\xa4'

Above the two bytes were expressed as hexadecimals. In decimal notation they would be 195 and 164. (Both in the range from 0 to 255.)

In [209]:
list("ä".encode("utf-8"))              # Show as a list of integers

[195, 164]

What is the utf-8 encoding of the letter `'a'`?

During this course we will only consider files containing text, so the default text mode is fine for us.
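
The difference between text mode and binary mode can be seen by reading the same file in both modes. The sketch below writes a scratch file into a temporary folder; the file name `example.txt` and the explicit `encoding` and `newline` arguments are our own choices, made so that the result is the same on every operating system.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")    # hypothetical scratch file

    # Text mode: the string is encoded to bytes when written
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("ä\n")

    # Text mode: bytes are decoded back to characters
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode: we get the raw bytes without any interpretation
    with open(path, "rb") as f:
        as_bytes = f.read()

print(repr(as_text), as_bytes)
```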

### Some common file object methods
* `read(size)` will read `size` characters (text mode) or bytes (binary mode) and return them as a string or a bytes object
* `write(string)` will write a string (or bytes in binary mode) to a file
* `readline()` will read a string up to and including the next newline character
* `readlines()` will return a list of all the remaining lines of a file
* `writelines(lines)` will write a list of lines to a file
* `flush()` will try to make sure that the changes made to a file are written to disk immediately
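
A small sketch that exercises most of these methods on a scratch file. The file name `lines.txt` and its contents are made up for this illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "lines.txt")      # hypothetical scratch file

    with open(path, "w") as f:
        f.write("first\n")
        f.writelines(["second\n", "third\n"])     # writelines adds no newlines itself

    with open(path, "r") as f:
        head = f.read(3)                # the first three characters
        rest_of_line = f.readline()     # continues from where read stopped
        remaining = f.readlines()       # all the lines that are left

print(repr(head), repr(rest_of_line), remaining)
```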

In [210]:
f = open("basics.ipynb", "r") # Let's open this notebook file, 
                              # which is essentially a text file.
                              # So you can open it in a texteditor as well.
        
for i in range(5):            # And read the first five lines
    line = f.readline()
    print("Line %i: %s" % (i, line), end="")
f.close()

Line 0: {
Line 1:  "cells": [
Line 2:   {
Line 3:    "cell_type": "markdown",
Line 4:    "metadata": {},


It is easy to forget to close the file. One can use a *context manager* to solve this problem. A context manager is used with the `with` statement. After the indented block of the `with` statement exits, the file will be automatically closed.

In [211]:
with open("basics.ipynb", "r") as f:          # the file will be automatically closed,
                                              # when the with block exits
    for i in range(5):
        line = f.readline()
        print("Line %i: %s" % (i, line), end="")

Line 0: {
Line 1:  "cells": [
Line 2:   {
Line 3:    "cell_type": "markdown",
Line 4:    "metadata": {},


A file object is iterable. This means that we can iterate through the lines in the file using a for loop, as in the example below:

In [212]:
max_len = 0
with open("basics.ipynb", "r") as f:
    for line in f:    # iterates through all the lines in the file
        if len(line) > max_len:
            max_len = len(line)
print("The longest line in this file has length %i" % max_len)

The longest line in this file has length 1002


### Standard file objects
Python automatically has three file objects open:

* `sys.stdin` for *standard input*
* `sys.stdout` for *standard output*
* `sys.stderr` for *standard error*

To read a line from a user (keyboard), you can call `sys.stdin.readline()`. To write a line to a user (screen), call `sys.stdout.write(line)`. The standard error is meant for error messages only, even though its output often goes to the same destination as standard output.

The print function uses the file `sys.stdout` and input function uses the file `sys.stdin`. An example of usage:

In [213]:
import sys
import random
i=random.randint(-10,10)
if i >= 0:
    sys.stdout.write("Got a positive integer.\n")
else:
    sys.stderr.write("Got a negative integer.\n")

Got a positive integer.


These standard file objects are meant to be a basic input/output mechanism in textual form. The destinations of the file objects can be changed to point
somewhere else than the usual keyboard and screen. Very often these are redirected to some files. For example, it is usual to point the stderr to a file where all
error messages are logged.
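
Redirection can also be done from inside a program. The sketch below is one possible way to do it: it uses `contextlib.redirect_stdout` from the standard library to collect the output of `print` into an in-memory text buffer instead of the screen, and the `file` parameter of `print` to write to standard error.

```python
import contextlib
import io
import sys

buffer = io.StringIO()                      # an in-memory file object
with contextlib.redirect_stdout(buffer):    # sys.stdout points to buffer in this block
    print("This goes to the buffer, not to the screen")
captured = buffer.getvalue()

print("Captured:", repr(captured))          # back to the normal standard output
print("An error message", file=sys.stderr)  # print can write to any file object
```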

## sys module

We saw above that the `sys` module contains the three file objects `sys.stdin`, `sys.stdout`, and `sys.stderr`. It also has a few other useful attributes. The attribute `sys.path` is the list of folders that Python uses to look for imported modules. The list `sys.argv` contains the so-called *command line parameters*. For example, in Linux, if you are using the terminal, you can run your program with the command `python3 programname.py param1 param2 ...`. (The terminal window is a textual interface to your computer instead of the usual graphical interface.) After Python has started your program, the command line parameters are visible as follows. The name of the program is in `sys.argv[0]`. The rest of the command line parameters are after the program name in this list: `sys.argv[1]=="param1"`, `sys.argv[2]=="param2"`, and so on. The command line parameters can be useful in adjusting the behaviour of your program. A few examples of these will be in the following exercises.

The function `sys.exit` can be used to exit your program immediately. The integer parameter given to this function is the return value of the program. Usually the return value 0 means that the program ran successfully, and a non-zero integer means that an error occurred. This return value is accessible from the terminal window from which you started the program.
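
A minimal sketch of how the command line parameters and the return value could be used together. The program name `greet.py` and the function `main` are hypothetical. To keep the example runnable in a notebook, we pass simulated parameter lists to `main` instead of reading the real `sys.argv`.

```python
import sys

def main(argv):
    """Greets each name given as a parameter and returns the exit status."""
    if len(argv) < 2:
        print("Usage: %s name ..." % argv[0], file=sys.stderr)
        return 1                      # non-zero: an error occurred
    for name in argv[1:]:             # argv[0] is the name of the program
        print("Hello,", name)
    return 0                          # zero: success

status_ok = main(["greet.py", "Alice", "Bob"])    # simulated command line
status_error = main(["greet.py"])
print(status_ok, status_error)

# In a real script one would finish with: sys.exit(main(sys.argv))
```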

#### <div class="alert alert-info">Exercise X (word frequencies)</div>

Create function `word_frequencies` that gets a filename as a parameter and returns a dict with the word frequencies. In the dictionary the keys are the words and the corresponding values are the number of times that word occurred in the file specified by the function parameter. Read all the lines from the file and split the lines into words using the `split()` method. Further, remove punctuation from the words using the `strip('"[],.:?!')` method call.

Test this function in the main function using the file `alice.txt`. The output format should be like:
MISSING!!!!
<hr/>

#### <div class="alert alert-info">Exercise X (summary)</div>

Part 1.

Create a function called `summary` that gets a filename as a parameter. The input
file should contain a floating point number on each line of the file. Make your program read these
numbers and then return a triple containing the sum, average, and standard deviation of these numbers for the file.
As a reminder, the formula for the corrected sample standard deviation is
$\sqrt{\frac{\sum_{i=1}^n (x_i - \overline x)^2}{n-1}}$,
where $\overline x$ is the average.

Example of usage from the command line:
`python3 src/summary.py src/example.txt src/example2.txt`
or
`python3 summary.py example.txt example2.txt`
if you are in the folder src.

The output should look like this:
```
File: src/example.txt Sum: 51.400000 Average: 10.280000 Stddev: 8.904606
File: src/example2.txt Sum: 5446.200000 Average: 1815.400000 Stddev: 3124.294045
```

Part 2.

If some line doesn’t represent a number, you can just ignore that line. You can achieve this with the *try-except* block. An example of recovering from an exceptional situation:
```python
try:
    x = float(line)   # The float constructor raises a ValueError exception if the conversion is not possible
except ValueError:
    pass              # Statements in here are executed when the above conversion fails
```
We will cover more about exceptions later in the course.
The `main` function should call the function summary for each filename in the list `sys.argv[1:]` of command line parameters.
<hr/>

#### <div class="alert alert-info">Exercise X (file count)</div>

Part 1.

Create a function `file_count` that gets a filename as a parameter and returns a triple of numbers. The function should read the file, count the number of lines, words, and characters in the file, and return a triple with these counts in this order. You get the division into words by splitting at whitespace; you don't have to remove punctuation.

Part 2.

Create a main function that calls `file_count` for each filename in the list of command line parameters `sys.argv[1:]`.
For the call `python3 src/file_count.py file1 file2 ...`
the output should be
```
?      ?       ?       file1
?      ?       ?       file2
...
```
The fields are separated by tabs (`\t`). The fields are in order: linecount, wordcount, charactercount, filename.
<hr/>

#### <div class="alert alert-info">Exercise X (file extensions)</div>

Part 1.

Write function `file_extensions` that gets as a parameter a filename.
It should read through the lines from this file. Each line contains a filename.
Find the extension for each filename. The function should return a pair, where the
first element is a list containing all filenames with no extension (no period (.) appears in the filename).
The second element of the pair is a dictionary with extensions as keys and corresponding values are lists with filenames having that extension.

Sounds a bit complicated, but hopefully the next example will clarify this.
If the file contains the following lines
```
file1.txt
mydocument.pdf
file2.txt
archive.tar.gz
test
```
then the return value should be the pair:
`(["test"], { "txt" : ["file1.txt", "file2.txt"], "pdf" : ["mydocument.pdf"], "gz" : ["archive.tar.gz"] } )`

Part 2.

Write a main method that calls the `file_extensions` function with "src/filenames.txt" as the argument. Then output a line for each extension with the number of files with that extension.
With the example in part 1, the output should be
```
1 files with no extension
gz 1
pdf 1
txt 2
```
Had there been no filenames without extension then the first line would have been `0 files with no extension`. In the printout list the extensions in alphabetical order.
<hr/>

## Objects and classes

Python is an object-oriented programming language like Java
and C++.
But unlike Java, Python doesn’t force you to use classes,
inheritance and methods.
If you like, you can also choose the procedural programming
paradigm with functions and modules.

Every value in Python is an object.
Objects are a way to combine data and the functions that
handle that data.
This combination is called encapsulation.
The data items and functions of objects are called attributes,
and in particular the function attributes are called methods.
For example, the operator `+` on integers calls a method of
integers, and the operator `+` on strings calls a method of
strings.

Functions, modules, methods, classes, etc are all first class
objects. This means that these objects can be

* stored in a container
* passed to a function as a parameter
* returned by a function
* bound to a variable

One can access an attribute of an object using the *dot
operator*: `object.attribute`.
For example: if `L` is a list, we can refer to the method `append`
with `L.append`. The method call can look, for instance, like
this: `L.append(4)`.
Because modules are also objects in Python, we can interpret
the expression `math.pi` as accessing the data attribute `pi` of
module object `math`.

Numbers like 2 and 100 are instances of type `int`. Similarly,
`"hello"` is an instance of type `str`.
When we write `s=set()`, we are actually creating a new
instance of type `set`, and binding the resulting instance object to
`s`.

Users can define their own data types.
These are called classes.
A user can call these classes as if they were functions, and they
return a new instance object of that type.
Classes can be thought of as recipes for creating objects.

An example of class definition:
```python
class MyClass(object):
    """Documentation string of the class"""

    def __init__(self, param1, param2):
        "This initialises an instance of type MyClass"
        self.b = param1 # creates an instance attribute
        c = param2      # creates a local variable of the function
        # statements ...
    
    def f(self, param1):
        """This is a method of the class"""
        # some statements
    
    a=1 # This creates a class attribute
```

The class definition starts with the class statement.
With this statement you give a name for your new type, and
also in parentheses list the base classes of your class.
The next indented block is the class body.
After the whole class body is read, a new type is created.
Note that no instances are created yet.
All the attributes and methods of the class are defined in the
class body.

The example class has two methods: `__init__` and `f`.
Note that their first parameter is special: `self`. It
corresponds to the `this` variable of C++ or Java.
The `__init__` method
does the initialisation when an instance is created.
At instantiation with `i=MyClass(2,3)` the parameters
`param1` and `param2` are bound to values 2 and 3, respectively.
Now that we have an instance `i`, we can call its method `f`
with the dot operator: `i.f(1)`.
The parameters of `f` are bound in the following way:
`self=i` and `param1=1`.

There are differences in how an assignment inside a class
creates variables.
The attribute `a` is at class level and is common for all
instances of the class `MyClass`.
The variable `c` is a local variable of the function `__init__`, and
cannot therefore be used outside the function.
The attribute `b` is specific to each instance of `MyClass`. Note
that `self` refers to the current instance.
An example: for objects `x=MyClass(1,0)` and
`y=MyClass(2,0)` we have `x.b != y.b`, but `x.a == y.a`.
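
The three kinds of assignments can be checked with a stripped-down version of the class. In this sketch the method `f` is left out and the value `10` is chosen only for illustration.

```python
class MyClass(object):
    a = 1                           # class attribute, shared by all instances

    def __init__(self, param1, param2):
        self.b = param1             # instance attribute
        c = param2                  # local variable, gone when __init__ returns

x = MyClass(1, 0)
y = MyClass(2, 0)
print(x.b, y.b)                     # instance attributes differ
print(x.a, y.a)                     # the class attribute is common

MyClass.a = 10                      # rebinding the class attribute is seen by both
print(x.a, y.a)
print(hasattr(x, "c"))              # the local variable never became an attribute
```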

All methods of a class have a mandatory first parameter which
refers to the instance on which you called the method.
This parameter is usually named `self`.
If you want to access the class attribute `a` from a method of
the class, use the fully qualified form `MyClass.a`.
The methods whose names both begin and end with two
underscores are called *special methods*. For example, `__init__`
is a special method. These methods will be discussed in detail
later.

### Instances

We can create instances by calling a class as if it were a
function: `i = ClassName(...)`.
Then the parameters given in the call will be passed to the
`__init__` function.
In the `__init__` method you can create the instance-specific
attributes.
If `__init__` is missing, we can create an instance without
giving any parameters. As a consequence, the instance has no
instance-specific attributes.
Later you can (re)bind attributes with the assignment
`instance.attribute = new_value`.

If that attribute did not exist before, it will be added to the
instance with the assigned value.
In Python we really can add or delete attributes to/from an
existing instance.
This is possible because the attribute names and the
corresponding values are actually stored in a dictionary.
This dictionary is also an attribute of the instance and is
called `__dict__`.
Another standard attribute in addition to `__dict__` is called
`__class__`. This attribute stores the class of the instance.
That is, the type of the object.
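
The following sketch shows how adding and deleting attributes is reflected in the instance dictionary. The class `Empty` and the attribute `colour` are made up for this illustration.

```python
class Empty(object):        # a class without an __init__ method
    pass

e = Empty()
before = dict(e.__dict__)   # no instance attributes yet

e.colour = "red"            # binding a new attribute adds it to the dictionary
during = dict(e.__dict__)

del e.colour                # deleting the attribute removes it again
after = dict(e.__dict__)

print(before, during, after)
print(e.__class__ is Empty)
```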

### Attribute lookup

Suppose `x` is an instance of class `X`, and we want to read an
attribute `x.a`.
The lookup has three phases:
* First it is checked whether the attribute `a` is an attribute of
the instance `x`
* If not, then it is checked whether `a` is a class attribute of `x`’s
class `X`
* If not, then the base classes of `X` are checked

If instead we want to bind the attribute `a`, things are much
simpler.
`x.a = value` will set the instance attribute.
And `X.a = value` will set the class attribute.
Note that if a base of `X`, the class `X`, and the instance `x` each
have an attribute called `a`, then `x.a` hides `X.a`, and `X.a` hides
the attribute of the base class.
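
The lookup order can be traced with a short experiment. The class names `Base` and `X` and the attribute values are hypothetical.

```python
class Base(object):
    a = "from Base"

class X(Base):
    pass

x = X()
lookups = [x.a]             # found only in the base class

X.a = "from X"              # now the class attribute hides the base class attribute
lookups.append(x.a)

x.a = "from x"              # now the instance attribute hides the class attribute
lookups.append(x.a)

del x.a                     # removing the instance attribute reveals X.a again
lookups.append(x.a)

print(lookups)
```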

#### <div class="alert alert-info">Exercise X (prepend)</div>

Create a class called `Prepend`. We create an instance of the class by giving a string as a parameter
to the initializer. The initializer stores the parameter in an instance attribute `start`. The class
also has a method `write(s)` which prints the string `s` prepended with the `start` string.
An example of usage:
```python
p=Prepend("+++ ")
p.write("Hello")
```
Will print
```
+++ Hello
```

Try out the class in the main function.
<hr/>

### Inheritance

Inheritance allows us to reuse the code of an existing class `B`
in creating a new class `C`.
Let’s recap how the attribute lookup worked for classes.
When looking for an attribute, the lookup procedure starts
with the instance dictionary, and continues with the class
attributes.
If both fail, then the attribute is searched from the base
classes and, recursively, from their base classes.

So, it may look like we access an attribute of a class `C`, when
in reality we are accessing the attribute of its base class `B`.
In this case we say that the class `C` inherits the attribute from
its base class `B`.
If we have attributes with the same name in both the class
and its base class, the attribute of the base class is hidden.
We say that the class `C` overrides the attribute of the base
class `B`.
Terminology: `B` is a base class and `C` is a derived class.

Example:

In [214]:
class B(object):
    def f(self):
        print("Executing B.f")
    def g(self):
        print("Executing B.g")
    
class C(B):
    def g(self):
        print("Executing C.g")
        
x=C()
x.f() # inherited from B
x.g() # overridden by C

Executing B.f
Executing C.g


A derived class is sometimes also called a *subclass* and the
base class is called a *superclass*.
The inheritance relation of two classes `B` and `C` can be tested
with the function `issubclass`:
`issubclass(C,B)==True` but `issubclass(B,C)==False`.
The function `isinstance(obj, cls)` allows us to test whether
an instance has type `cls` or has an ancestor class of type `cls`.
Let’s create instances `x=C()` and `y=B()`.
Now we have
`isinstance(x,B)==isinstance(x,C)==isinstance(y,B)==True`.
But `isinstance(y,C)==False`.
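
These relations can be verified with minimal versions of the classes `B` and `C`. The methods are left out of this sketch since they do not affect the result.

```python
class B(object):
    pass

class C(B):
    pass

x = C()
y = B()

subclass_checks = (issubclass(C, B), issubclass(B, C))
instance_checks = (isinstance(x, B), isinstance(x, C),
                   isinstance(y, B), isinstance(y, C))
print(subclass_checks)
print(instance_checks)
```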

![inheritance_hierarchy.svg](inheritance_hierarchy.svg)

`object` is a base class or an ancestor class of every
other class.
This means that `isinstance(x, object)==True` for all
instances `x`.

By deriving from an existing class we can modify and/or
extend its behaviour, without touching the original class.
For example, if we want to add one method to a list class,
we can use inheritance. Therefore we have to only code the
part that has changed and reuse the rest of the code of type
list.
Another use of inheritance is to create conceptual hierarchies.
For instance, later we will learn about the exception hierarchy
of Python.
A third use is to use classes to create interfaces. There
can be several classes that have the same interface (that is, they
offer the same attributes), but their behaviour or
implementation can be very different. This allows changing a
part of your program with minimal changes required elsewhere
in the code.

If in the definition of the method `C.f` we need to call the
corresponding method of the base class `B`, we can use the fully qualified
call `B.f(self, ...)`.
This is called delegation.
It is useful, for instance, when you want to call the `__init__`
method of the base class from the `__init__` of the derived
class to initialise the base class attributes.
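
A minimal sketch of delegation. The attributes `name` and `extra` are made up for this illustration.

```python
class B(object):
    def __init__(self, name):
        self.name = name

    def f(self):
        return "B.f called for " + self.name

class C(B):
    def __init__(self, name, extra):
        B.__init__(self, name)      # delegate the initialisation of name to B
        self.extra = extra          # initialise only what is new in C

    def f(self):
        # Reuse the behaviour of the base class and extend it
        return B.f(self) + " (extended by C.f)"

c = C("example", 42)
print(c.name, c.extra)
print(c.f())
```

Note that `self` has to be passed explicitly in the fully qualified call, because the method is accessed through the class and not through the instance.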

### Special methods

We have already encountered one special method, namely the
`__init__` method.
This method sets the instance attributes to some initial value.
Its first parameter is `self`, and the subsequent parameters
are the ones that were passed to the call of the class.
The `__init__` method should return no value.
Below, the main general-purpose special
methods are introduced.
They are executed when certain operations on objects are
performed.

In the following, `C` is a class and `x` and `y` are its instances.

* `__hash__` returns an int value, with the following requirement: `x==y` implies `x.__hash__() == y.__hash__()`. The value is used in storing objects in dictionaries and sets. The instances `x` and `y` should be immutable, at least with respect to the data that affects equality and the hash value.
* A class with a `__call__` method makes its instances callable. That is, the call `x(a, b, ...)` will result in calling this special method with the given parameters.
* The method `__del__` gets called when the corresponding instance is about to be destroyed.
* The method `__new__` is used to control the creation of new instances. It can be used, for example, to create classes that have only one instance.
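
The following sketch shows `__hash__` and `__call__` in use. The classes `Point` and `Multiplier` are hypothetical examples; `Point` is immutable only by convention (its attributes are not meant to be changed after creation).

```python
class Point:
    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Equal points must have equal hashes, so hash the data that __eq__ compares
        return hash((self._x, self._y))

class Multiplier:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        # Makes the instance usable like a function
        return self.factor * value

p, q = Point(1, 2), Point(1, 2)
points = {p, q}          # p and q are equal, so the set keeps only one of them
triple = Multiplier(3)
print(len(points), triple(5))  # 1 15
```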

The method `__str__` is called when the `print` function needs
to print the value of an instance. It returns a string. String
formatting with the conversion `%s` calls it as well.
The method `__repr__` is called when the interactive interpreter
prints the value of an evaluated expression, and when the
conversion `%r` is used in string formatting. It returns a
canonical representation string that (at least in theory) can be
used to recreate the original object.
Special methods `__eq__`, `__ge__`, `__gt__`, `__le__`, `__lt__`, and
`__ne__` get called when the corresponding operators `x==y`,
`x>=y`, `x>y`, `x<=y`, `x<y`, and `x!=y` are used.
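
A short sketch of these methods, using a hypothetical class `Temperature` (only `__eq__` and `__lt__` are defined here to keep the example small):

```python
class Temperature:
    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        # Human-readable form, used by print and the %s conversion
        return "%.1f degrees" % self.degrees

    def __repr__(self):
        # Canonical form that could be used to recreate the object
        return "Temperature(%r)" % self.degrees

    def __eq__(self, other):
        return self.degrees == other.degrees

    def __lt__(self, other):
        return self.degrees < other.degrees

t1 = Temperature(20.5)
t2 = Temperature(25.0)
print(t1)                 # 20.5 degrees
print(repr(t1))           # Temperature(20.5)
print(t1 < t2, t1 == t2)  # True False
```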

If you want the instances of your class to support the numeric
operations (like +, -, *, /, etc.), you must define a set of
special methods in your class.
For example, the expression `x+y` will result in a call
`x.__add__(y)`, which should return the result of the operation.
Here are a few of the most common numerical special
methods:

|Method|Description|
|---|------------|
|`__add__` | addition (+) |
|`__sub__` | subtraction (-) |
|`__mul__` | multiplication (*) |
|`__truediv__` | division (/) |
|`__floordiv__` | floor division (//) |

The corresponding augmented assignments `+=`, `-=`, `*=`, and `/=`
have special methods `__iadd__`, `__isub__`, `__imul__`, and `__itruediv__`.
The conversion functions `complex()`, `float()`, and `int()`
call the following special methods:

|Method|Description|
|------|-----------|
|`__complex__` | convert to a complex number|
|`__float__` | convert to a float|
|`__int__` | convert to an integer|
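
The sketch below shows a few of the numerical special methods on a hypothetical class `Vector2D`. The choice that `float()` returns the Euclidean length of the vector is an assumption made only for illustration.

```python
class Vector2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Vector2D(self.x + other.x, self.y + other.y)

    def __sub__(self, other):
        return Vector2D(self.x - other.x, self.y - other.y)

    def __mul__(self, scalar):
        return Vector2D(self.x * scalar, self.y * scalar)

    def __iadd__(self, other):
        # In-place addition; the returned object is bound to the name on the left
        self.x += other.x
        self.y += other.y
        return self

    def __float__(self):
        # Assumption for illustration: convert to the length of the vector
        return (self.x ** 2 + self.y ** 2) ** 0.5

v = Vector2D(3, 4)
w = v + Vector2D(1, 1)   # calls v.__add__(...)
v += Vector2D(0, 0)      # calls v.__iadd__(...)
print(w.x, w.y, float(v))  # 4 5 5.0
```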

In addition to the normal methods of containers, like the
append method of the list, there are several operations that
are handled by calls to special methods of the container class.
The test whether `x` is a member of container `c` is done by the
operation `x in c`. The corresponding special method call is
`c.__contains__(x)`.
Deletion of an element of container `c` can be done with the
operation `del c[key]`. This will result in the method call
`c.__delitem__(key)`.

Reading an item of a container `c` is done with the operation
`c[key]`. The corresponding method call is
`c.__getitem__(key)`.
Similarly, setting an item with `c[key]=value` results in the
call `c.__setitem__(key,value)`.
The number of elements in a container `c` can be queried with
the function call `len(c)`. This function call actually calls the
special method `c.__len__`.
The call `iter(c)` will call the special method `c.__iter__`. More
about the purpose of this function later.
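
The container operations above can be tried out with a small sketch. The class `Inventory` is a hypothetical example that simply forwards each operation to an internal dictionary.

```python
class Inventory:
    def __init__(self):
        self._items = {}

    def __contains__(self, key):       # key in inv
        return key in self._items

    def __getitem__(self, key):        # inv[key]
        return self._items[key]

    def __setitem__(self, key, value): # inv[key] = value
        self._items[key] = value

    def __delitem__(self, key):        # del inv[key]
        del self._items[key]

    def __len__(self):                 # len(inv)
        return len(self._items)

    def __iter__(self):                # iter(inv), for loops
        return iter(self._items)

inv = Inventory()
inv["apple"] = 3
inv["pear"] = 5
del inv["pear"]
print("apple" in inv, inv["apple"], len(inv), list(inv))  # True 3 1 ['apple']
```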


#### <div class="alert alert-info">Exercise X (rational)</div>


Create a class `Rational` whose instances are rational numbers. A new rational number can be
created with the call to the class. For example, the call `r=Rational(1,4)` creates a rational
number “one quarter”. Make the instances support the following operations:
`+` `-` `*` `/` `<` `>` `==`
with their natural behaviour. Make the rationals also printable so that from the printout we can
clearly see that they are rational numbers.
<hr/>

## Exceptions

When an error occurs, what can we do?

* Print an error message
* Stop the execution of a program
* Indicate the error by returning a special value, like -1 or None
* Ignore the error
* ...

These solutions tend to combine the indication of a problem
and the reaction to the problem indication.
The behaviour of the program in error situations cannot be
changed; it is fixed in the implementation of the function.
When an erroneous situation is noticed, it may not be clear
how to handle the situation.
Usually it is the user or the caller of a function that knows
what to do.

Most modern computer languages have a system called
exception handling. This system separates the recognition of errors and the
handling of these situations. We can signal an error or anomalous situation by raising an
exception. Exceptions can be raised in Python with the `raise` statement:

* `raise` instance
* `raise` exception class

In the second form, the exception class is instantiated without
arguments. Parameters are given to the exception by instantiating
the class explicitly, as in `raise ValueError("message")`.
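
A minimal sketch of raising an exception with a message. The function `check_age` is a hypothetical example; the `try` statement used here to catch the exception is described further below.

```python
def check_age(age):
    if age < 0:
        # Signal the problem; the caller decides how to react
        raise ValueError("age cannot be negative: %d" % age)
    return age

try:
    check_age(-5)
except ValueError as e:
    message = str(e)
    print("Caught:", message)  # Caught: age cannot be negative: -5
```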

The functions of the Python standard library raise exceptions
in error situations. Sometimes exceptions aren’t really errors. For example, when
an iterator runs out of elements, it will signal this by raising
the `StopIteration` exception.
Another exception that does not necessarily indicate an error is the `Warning` exception.
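
The `StopIteration` case can be observed by calling `next` on an iterator by hand. This is only a sketch; a `for` loop handles the exception automatically.

```python
it = iter(["a", "b"])
first = next(it)
second = next(it)
try:
    next(it)            # no elements left
    exhausted = False
except StopIteration:
    exhausted = True
print(first, second, exhausted)  # a b True
```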

The general form of exception catching statement is the following:

```
try:
    # here are the statements that can cause exceptions
except (Exceptionname1, Exceptionname2, ...):
    # here we handle the exceptions
else:
    # this gets executed if try-block caused no exceptions
finally:
    # this is always executed, clean-up code
```

Usually, just the try and except parts are needed.
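
To see in which order the parts are executed, here is a small sketch. The function `safe_divide` and the list `log` are hypothetical helpers that record which branches were run.

```python
def safe_divide(a, b):
    log = []
    try:
        result = a / b
    except ZeroDivisionError:
        log.append("except")    # runs only if the division failed
        result = None
    else:
        log.append("else")      # runs only if no exception was raised
    finally:
        log.append("finally")   # runs in both cases
    return result, log

print(safe_divide(6, 3))  # (2.0, ['else', 'finally'])
print(safe_divide(1, 0))  # (None, ['except', 'finally'])
```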

In [215]:
L=[1,2,3]
try:
    print(L[3])
except IndexError:
    print("Index does not exist")

Index does not exist


In [216]:
def compute_average(L):
    n=len(L)
    s=sum(L)
    return float(s)/n # error is noticed here !!!
mylist=[]
while True:
    try:
        x=float(input("Give a number (non-number quits): "))
        mylist.append(x)
    except ValueError:
        break
try:
    average=compute_average(mylist)
    print("Average is", average)
except ZeroDivisionError:
    # and the error is handled here
    if len(mylist) == 0:
        print("Tried to compute the average of empty list of numbers")
    else:
        print("Something strange happened")

Give a number (non-number quits): 12
Give a number (non-number quits): a
Average is 12.0


### Exception hierarchy

In Python exceptions are objects, like all values in Python.
These objects are instantiated from exception classes.
Exception classes naturally form hierarchies:

* New exception classes can be made by inheriting from existing exception classes and extending them
* The root of the hierarchy of ordinary exceptions is the class `Exception` (which itself derives from `BaseException`)
* Python defines several base classes to derive from, and several ready-to-use exception classes

![exception hierarchy](exception_hierarchy.svg)
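
As a sketch of how such a hierarchy can be extended, the example below defines a few hypothetical exception classes for an imaginary bank module and catches them through their common base class.

```python
class BankError(Exception):
    """Hypothetical base class for all errors of the bank module."""

class InsufficientFunds(BankError):
    pass

class AccountNotFound(BankError):
    pass

def withdraw(balance, amount):
    if amount > balance:
        raise InsufficientFunds("balance %d is less than %d" % (balance, amount))
    return balance - amount

try:
    withdraw(10, 50)
except BankError as e:   # catches InsufficientFunds and AccountNotFound alike
    caught = type(e).__name__
    print("Bank operation failed:", caught)  # Bank operation failed: InsufficientFunds

# Built-in exceptions are organised the same way
print(issubclass(IndexError, LookupError), issubclass(KeyError, LookupError))  # True True
```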

### Too general exception specifications

The exception hierarchy allows us to catch multiple similar
exceptions by catching their common base class.
This feature has to be used carefully. Over-general exception
specification, like `except Exception:`, can hide the real
reason for an error. Example of this:

In [217]:
import sys
s=input("Give a number: ") # input() already strips the trailing \n character
try:
    x=int(s)
    sys.stdout.wr1te("You entered %d\n" % x)
except Exception:
    print("You didn’t enter a number")

Give a number: 12
You didn’t enter a number


In the previous example, if the user doesn’t enter a string that
represents an integer, a `ValueError` is raised by the int
function. Instead of catching the `ValueError`, we catch the base class of
all ordinary exceptions, namely `Exception`. This results in catching nearly all possible exceptions.
As a consequence, one typing error in the program goes undetected.
Change the exception specification from `Exception` to `ValueError` to see what this error is.

### What is the error handling policy in Python

Python uses a different approach to error checking than many
other common languages.
Instead of checking beforehand that all the inputs are of the
correct type and that the contents of the input variables are sensible
for some operations, Python first tries the operations and then
checks whether they caused any exceptions.
This is partly what duck typing is about: a function works for
a set of inputs if all the operations in the function body make
sense for those inputs.
So, that’s why the parameters of functions aren’t specified to
be of any certain type.
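
A small sketch of this policy: the hypothetical function `total_length` does not check the types of its inputs beforehand, but simply tries `len` on each item and skips the items for which the operation does not make sense.

```python
def total_length(items):
    total = 0
    for item in items:
        try:
            total += len(item)   # works for any object that has a length
        except TypeError:
            # The item has no length (for example a number), so skip it
            pass
    return total

print(total_length(["abc", [1, 2], 42, (1,)]))  # 6
```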

#### <div class="alert alert-info">Exercise X (extract numbers)</div>

Write a function `extract_numbers` that gets a string as a parameter. It should return a list of numbers that can be both ints and floats. Split the string into words at whitespace using the `split()` method. Then iterate through each word, and initially try to convert it to an int. If unsuccessful, then try to convert it to a float. If the word is not a number, skip it.

Example run:
`print(extract_numbers("abd 123 1.2 test 13.2 -1"))`
will print
`[123, 1.2, 13.2, -1]`
<hr/>