# Notebook №6. Python programming for data collection and analysis

Performed by Movenko Konstantin, IS/b-21-2-o

## Sorting. String formatting

### `**kwargs`

Let's talk about one way to pass function arguments using dictionaries. Recall two ways to pass arguments:

In [1]:
# define a function that prints its arguments
def myfunc(x=0, y=1):
    print("x =", x)
    print("y =", y)

In [2]:
myfunc(12, 19) # test the function by calling it with specific arguments

x = 12
y = 19


In [3]:
myfunc(y=2, x=5) # assigning function arguments by name

x = 5
y = 2


As the second example shows, arguments can be passed by specifying their names. Let's say we want to write a function that takes an arbitrary number of named arguments (we don't even know in advance which ones). This can be done like this:

In [4]:
# define a function that prints an arbitrary number of named arguments
def new_func(**kwargs):
    print(kwargs)

In [5]:
new_func(x=1, y=5, z=8, s="Some string") # test the function

{'x': 1, 'y': 5, 'z': 8, 's': 'Some string'}


The two asterisks in the definition of `new_func()` mean the following: "all named arguments passed to this function should be collected in the `kwargs` dictionary." As you can see, this is exactly what happens: for example, the argument `x=1` became the entry `'x': 1` in the `kwargs` dictionary. However, this applies only to arguments that do not match any declared parameter: if the function had a separate parameter `x`, then `x` would not end up in `kwargs`. For example:

In [6]:
# define function that prints a dictionary of arguments. Also the function takes the param x
def other_func(x, **kwargs):
    print(kwargs)
    
other_func(x=10, y=5) # test the function

{'y': 5}
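
The same two asterisks also work in the other direction, at the call site: a dictionary can be unpacked into named arguments. The sketch below is only an illustration; `myfunc_pair` and `collect` are hypothetical helpers (variants of the functions above that return values instead of printing them).

```python
def myfunc_pair(x=0, y=1):
    # hypothetical variant of myfunc that returns its arguments as a tuple
    return (x, y)

params = {"y": 2, "x": 5}        # keys must match the parameter names
result = myfunc_pair(**params)   # same as myfunc_pair(y=2, x=5)
print(result)

def collect(**kwargs):
    # kwargs is an ordinary dict, so it can be iterated like one
    return list(kwargs.items())

pairs = collect(a=1, b=2)
print(pairs)
```

If the dictionary contains a key that is not a parameter of the function (and the function has no `**kwargs`), the call raises a `TypeError`.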


### Sorting

Sorting, that is, arranging the elements of a list in some particular order, is a common programming task. There are two main tools for sorting lists in Python. The first is the `sort()` method, which sorts *in place*, that is, it modifies the list itself. For example:

In [7]:
my_list = [6, 9, 2, 7, 12, 8] # create a list of integers

In [8]:
my_list.sort() # sort the list

In [9]:
my_list # print sorted list

[2, 6, 7, 8, 9, 12]

The `sort()` method *changes* the original list (and therefore, by the way, can only work with lists - tuples do not have such a method). If you want to create a new list instead, you should use the `sorted()` function.

In [10]:
my_list = [6, 9, 2, 7, 12, 8] # create list of integers

In [11]:
sorted_list = sorted(my_list) # create new sorted list
sorted_list # print new sorted list

[2, 6, 7, 8, 9, 12]

So we created a new list. The old one remained unchanged.

In [12]:
my_list # print original list

[6, 9, 2, 7, 12, 8]

The `sorted()` function can be applied not only to lists, but also to immutable sequences, such as tuples. The output is always a list.

In [13]:
my_tuple = (7, 1, 2, 6) # create a tuple of integers
print(sorted(my_tuple)) # print sorted tuple as list

[1, 2, 6, 7]
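
A short sketch of two consequences of the difference between `sort()` and `sorted()`. The variable names are made up for the illustration. Because `sort()` works in place, it returns `None`, so writing `x = x.sort()` loses the list. And `sorted()` accepts any iterable, for example a string or a set.

```python
numbers = [3, 1, 2]
returned = numbers.sort()     # sorts in place and returns None
print(returned)               # None
print(numbers)                # the list itself is now sorted

letters = sorted("python")    # a string is iterated character by character
print(letters)

unique = sorted({3, 1, 2})    # a set has no order, the result is a sorted list
print(unique)
```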


### Sorting strings

You can sort lists consisting not only of numbers, but also of more complex objects, as long as those objects can be compared with each other. For example, strings can be compared: they are ordered *lexicographically*, that is, “alphabetically”, the way they would appear in a dictionary (meaning a regular paper dictionary, not the Python data type).

In [14]:
"abcd" < "b" # compare two strings

True

In [15]:
"abcd" < "addd" # compare two other strings

True

In [16]:
"a" < "aa" # a string is less than any longer string that starts with it

True

This is what sorting a list of strings looks like:

In [17]:
str_list = ["Bob", "Alice", "Bill", "Weigu"] # create a list of strings
str_list.sort() # sort that list
str_list # print whole sorted list

['Alice', 'Bill', 'Bob', 'Weigu']
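
One caveat worth illustrating: the comparison is based on character codes, not on a true alphabet, so every uppercase Latin letter sorts before every lowercase one. A minimal sketch with made-up names:

```python
print(ord("Z"), ord("a"))     # character codes: 90 and 97

upper_first = "Zed" < "alice" # 'Z' (90) comes before 'a' (97)
print(upper_first)

mixed = sorted(["bob", "Alice", "Zed"])
print(mixed)                  # capitalized names come first
```

So a list with mixed capitalization is not sorted the way a paper dictionary would sort it.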

### Sorting and loops

You can use the `sorted()` function together with the `for` statement to process the elements of a collection in a particular order. For example, suppose we have a dictionary and want to display its elements in ascending order of their keys. Python will not do this on its own; you have to ask for it explicitly. First, a plain loop:

In [18]:
gradebook = {'Bob': 3, 'Alice': 5, 'Weigu': 4, 'Bill': 2} # create a dictionary
# print each pair of key and value
for k in gradebook:
    print(k, gradebook[k])

Bob 3
Alice 5
Weigu 4
Bill 2


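As you can see, the elements come out in the order in which they were inserted, not sorted by key (before Python 3.7 even this order was not guaranteed). This is how they can be ordered when outputting: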

In [19]:
# do the same, but iterate over the keys in sorted order
for k in sorted(gradebook):
    print(k, gradebook[k])

Alice 5
Bill 2
Bob 3
Weigu 4
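
A small variation, shown only as a sketch: instead of looking up `gradebook[k]` inside the loop, you can sort the `(key, value)` pairs returned by `items()`. Pairs are compared by their first element first, and the keys of a dictionary are unique, so the result is ordered by key.

```python
gradebook = {'Bob': 3, 'Alice': 5, 'Weigu': 4, 'Bill': 2}

pairs = sorted(gradebook.items())   # a list of (name, grade) tuples
for name, grade in pairs:
    print(name, grade)
```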


### More complex sorting examples

You can sort the list in reverse order (descending). For this, the `reverse` parameter is used.

In [20]:
sorted([4, 8, 1, 7], reverse=True) # sort the list of integers in descending order

[8, 7, 4, 1]

You can sort not only numbers and strings, but also more complex objects. For example, consider such a table (implemented as a list of tuples), in which the names of students and their grades for several papers are recorded.

In [21]:
# create a list of tuples
names = [("Bob", 8, 4, 9),
         ("Alice", 7, 8, 9),
         ("Weigu", 7, 5, 3),
         ("Dan", 6, 4, 3)]

In [22]:
names.sort() # sort that list

In [23]:
names # print sorted list

[('Alice', 7, 8, 9), ('Bob', 8, 4, 9), ('Dan', 6, 4, 3), ('Weigu', 7, 5, 3)]

Judging by the result, it is logical to assume that the sorting was performed by the first element — the student's name. Indeed, tuples are compared in much the same way as strings, lexicographically. First, the first elements are compared:

In [24]:
('a', 8) < ('b', 7) # compare tuples by first field

True

If the first element matches, then the second elements are compared, and so on.

In [25]:
('a', 8) < ('a', 7) # compare tuples by second field

False
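
To make the rule concrete, here is a rough sketch of lexicographic comparison written by hand. The function `lex_less` is a hypothetical helper for illustration only; Python's built-in `<` already does this for tuples, lists and strings.

```python
def lex_less(a, b):
    """Return True if sequence a comes before sequence b lexicographically."""
    for x, y in zip(a, b):
        if x != y:
            return x < y       # the first differing position decides
    return len(a) < len(b)     # one is a prefix of the other: shorter first

print(lex_less(('a', 8), ('b', 7)))   # decided by the first elements
print(lex_less(('a', 8), ('a', 7)))   # first elements equal, second decides
print(lex_less("a", "aa"))            # a prefix comes before the longer string
```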

But what if we wanted to sort the tuples in the `names` list not by the first element, but by the second or some other one? To do this, you use the `key` parameter, which specifies the sort key. Before we do that, we need to say a few words about how a function can be passed as an argument to another function.

### Digression: functions as function arguments

Consider this function:

In [26]:
# define a function that calls the function passed to it with the argument 2
def superfunc(f):
    return f(2)

It takes some function `f` as an argument, calls this function, passes the number 2 to it as an argument, and returns the result that `f` returned.

For example:

In [27]:
from math import sqrt # import specific function from module
superfunc(sqrt) # call function with sqrt as parameter

1.4142135623730951

We imported the `sqrt()` function from the `math` module and then passed it as an argument to `superfunc()`. Note that there are no parentheses after `sqrt` when we pass it: we do not *call* it, we *pass* it to another function. The `superfunc` function took our `sqrt` function and called it with the number 2 as the argument. That is, it calculated the square root of two.

You can imagine that `sqrt` is a recipe written down on a piece of paper. We hand this piece of paper to the `superfunc` function, and it uses the recipe in its own way. If we hand over another piece of paper, it uses that one instead. For example:

In [28]:
# define the function that increases the value of the argument by 1
def plusodin(x):
    return x + 1

In [29]:
superfunc(plusodin) # pass it to other function

3

If we try to pass something else to the `superfunc` function - for example, a string or a number - nothing will work (it expects a function).

In [30]:
superfunc("sqrt") # call the function with incorrect argument type

TypeError: 'str' object is not callable
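
If you want to detect such a mistake before the call, the built-in `callable()` tells you whether an object can be called. The sketch below is one possible variant; `superfunc_checked` is a hypothetical name, not part of the notebook.

```python
from math import sqrt

def superfunc_checked(f):
    """Variant of superfunc that checks its argument before calling it."""
    if not callable(f):
        raise TypeError("expected a function, got " + type(f).__name__)
    return f(2)

print(callable(sqrt), callable("sqrt"))  # a function is callable, a string is not
print(superfunc_checked(sqrt))

try:
    superfunc_checked("sqrt")
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```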

### Sort keys

Let's return to the problem of sorting a table represented as a list of tuples. To sort such a list by the second element, you must first create a function that will return the second element of the tuple (or list) passed to it.

In [31]:
# define a function that returns the second element of the collection passed to it
def get_second_element(x):
    return x[1]

Let's see how it works:

In [32]:
get_second_element([7, 8, 4, 2]) # test the function

8

We now pass this function as the `key` parameter to the `sort()` method (the `sorted()` function will also work):

In [33]:
names.sort(key=get_second_element) # call the sort method with the key parameter

In [34]:
names # print sorted list

[('Dan', 6, 4, 3), ('Alice', 7, 8, 9), ('Weigu', 7, 5, 3), ('Bob', 8, 4, 9)]

It can be seen that now the rows are sorted by the second column (the first score): `Dan` has the lowest (6), `Bob` has the highest (8), and `Alice` and `Weigu` have the same (7).

A natural question arises: how are the lines corresponding to `Alice` and `Weigu` ordered in this case? Answer: in the order in which they appeared in the original list. This is handy if you want to sort first by one parameter and then by another: just sort sequentially, first by the *second* parameter, and then by the first.
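
This property of the sort is called *stability*. A minimal sketch with made-up data (`rows` and `get_score` are hypothetical names): the three rows with the score 7 keep the relative order they had in the input.

```python
rows = [("Weigu", 7), ("Alice", 7), ("Dan", 6), ("Bob", 7)]

def get_score(row):
    # sort key: the second element of the tuple
    return row[1]

by_score = sorted(rows, key=get_score)
print(by_score)   # Dan first, then Weigu, Alice, Bob in their original order
```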

To avoid defining a function like `get_second_element` each time, you can use a ready-made one: import the `itemgetter` function from the `operator` module:

In [35]:
# import specific function from module
from operator import itemgetter

In [36]:
sorted(names, key=itemgetter(2)) # sort by third column and print a result

[('Dan', 6, 4, 3), ('Bob', 8, 4, 9), ('Weigu', 7, 5, 3), ('Alice', 7, 8, 9)]

In [37]:
sorted(names, key=itemgetter(3)) # sort by fourth column and print a result

[('Dan', 6, 4, 3), ('Weigu', 7, 5, 3), ('Alice', 7, 8, 9), ('Bob', 8, 4, 9)]

Let's say we want to sort by the third column, and if the values in the third column are equal, then alphabetically. This can be done like this: *first* we sort alphabetically, and *then* by the third column.

In [38]:
print(names) # print the list
names.sort(key=itemgetter(0)) # call the sort method with the key parameter
print(names) # print sorted by first column list
names.sort(key=itemgetter(2)) # call the sort method with the key parameter
print(names) # print sorted by third column list

[('Dan', 6, 4, 3), ('Alice', 7, 8, 9), ('Weigu', 7, 5, 3), ('Bob', 8, 4, 9)]
[('Alice', 7, 8, 9), ('Bob', 8, 4, 9), ('Dan', 6, 4, 3), ('Weigu', 7, 5, 3)]
[('Bob', 8, 4, 9), ('Dan', 6, 4, 3), ('Weigu', 7, 5, 3), ('Alice', 7, 8, 9)]
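
As an alternative sketch, the same ordering can be obtained in a single pass. When `itemgetter` is given several indices, it returns a tuple of the corresponding elements, and tuples are compared lexicographically, so the key `(third column, name)` sorts by the third column first and by name second. The list is redefined here so that the example is self-contained.

```python
from operator import itemgetter

names = [("Bob", 8, 4, 9),
         ("Alice", 7, 8, 9),
         ("Weigu", 7, 5, 3),
         ("Dan", 6, 4, 3)]

key = itemgetter(2, 0)            # returns (row[2], row[0])
sample = key(("Bob", 8, 4, 9))
print(sample)

ordered = sorted(names, key=key)  # by third column, ties broken by name
print(ordered)
```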


More details about sorting can be found in the [official tutorial](https://docs.python.org/3/howto/sorting.html); we now move on to the next topic.

### String formatting

Often you need to insert the values of some variables into a string. Here is an example of the kind we have already seen.

In [39]:
# create string and integer variables
name = "Alice"
grade = 5
print("Student", name, "has grade", grade) # print those variables

Student Alice has grade 5


Using `print()`, you can print such a string, but if we wanted to pass it to some other function, we would have to build the string itself first. That is a different task, called string formatting.

There are two common ways to substitute the value of variables into a string (this is often called *interpolation*, although it has nothing to do with the mathematical operation of the same name). The first way is more classic.

In [40]:
new_str = "Student %s has grade %i" % (name, grade) # format the string by using placeholders %
print(new_str) # print the result of formatting

Student Alice has grade 5


The `%` operator is used here, which performs the following operation for strings: it takes a string to the left of it, finds all the *“placeholders”* there - in this case it is `%s` and `%i`, after which it takes the variables listed to the right of it (it can be one variable or a tuple of several variables, as in this case) and substitutes them sequentially - the first variable in place of the first placeholder, the second in place of the second, etc.

The letters in the placeholders indicate how the value is formatted: in this case, `%s` stands for a string and `%i` for an integer. Here are some more examples:

In [41]:
print("The number is %i" % 2.3) # print the formatted string (i means integer)
print("The number is %f" % 2) # print the formatted string (f means float)
print("The number is %.2f" % 2.1393) # print the formatted string (two digits after the decimal point)
print("The number is %04i" % 3) # print the formatted string (padded to four digits with zeros)

The number is 2
The number is 2.000000
The number is 2.14
The number is 0003
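
Besides substituting values by position, the `%` operator also accepts a dictionary on the right-hand side; the placeholders then name the keys in parentheses. A small sketch (the `student` dictionary is made up for the example):

```python
student = {"name": "Alice", "grade": 5}

# %(key)s looks up the value by key instead of by position
line = "Student %(name)s has grade %(grade)i" % student
print(line)
```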


When using the `%` operator, you need to be careful: it has the same precedence as `*` and `/`, and operators of equal precedence are evaluated left to right, so you can get unexpected results if you do not include parentheses:

In [42]:
print("a = %i" % 3*3) # print formatted string (unexpected result)

a = 3a = 3a = 3


Here is what happened: first, the expression `"a = %i" % 3` was evaluated, and then the result was multiplied by 3 (for strings, this means repeating the string three times). To substitute the result of `3 * 3` instead, put it in parentheses:

In [43]:
print("a = %i" % (3*3)) # print formatted string (expected result)

a = 9
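
The same care is needed with addition. Since `%` binds more tightly than `+`, an expression without parentheses first formats the string and then tries to add a number to it, which fails. A brief sketch:

```python
ok = "a = %i" % (3 + 3)          # parentheses: the sum is computed first
print(ok)

try:
    bad = "a = %i" % 3 + 3       # parsed as ("a = %i" % 3) + 3
except TypeError as err:
    bad = None                   # adding an int to a str is not allowed
    print("TypeError:", err)
```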


The second way to format (the "new") is to use the `format()` method. It works like this:

In [44]:
"hello, {0}, this is {1}, again {0}, {var}".format(7, 9, var="test") # print formatted string by using format() method

'hello, 7, this is 9, again 7, test'

There is no need to specify data types explicitly here (the string representation of each value is substituted). The same value can be used several times, and values can be referred to by number or by name. You can also omit the numbers, in which case the values are substituted in order:

In [45]:
"First var: {}, the second one: {}".format(8, 1) # format the string with format() without argument numbers

'First var: 8, the second one: 1'

Formatting can be quite complex and no one can remember all the details. Good documentation on this topic (both on the `%` operator and on the `format()` method) is collected [here](https://pyformat.info/).
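
As a small illustration of those details, the `format()` method supports format specifications after a colon inside the braces, similar to the `%` examples earlier. The last line uses an f-string, a related syntax available since Python 3.6 that the notebook does not cover; it is included only for comparison.

```python
two_digits = "The number is {:.2f}".format(2.1393)   # two digits after the point
padded = "The number is {:04d}".format(3)            # padded with zeros to width 4
named = "Student {name} has grade {grade}".format(name="Alice", grade=5)
print(two_digits)
print(padded)
print(named)

name, grade = "Alice", 5
fstring = f"Student {name} has grade {grade}"        # variables are read directly
print(fstring)
```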

### Tricks with real numbers

By the way:

In [46]:
print("%f" % (0.1+0.2)) # print formatted string

0.300000


Nothing unexpected so far, but let's increase the precision...

In [47]:
print("%.18f" % (0.1+0.2)) # print formatted string (with precision increased to 18 digits)

0.300000000000000044


When we asked to display the result with $18$ decimal places, unexpected digits appeared at the end. This is because computers store numbers in binary, and in binary a number like $0.1$ is an *infinite* repeating fraction that cannot be written with a finite number of digits. The stored values are therefore slightly rounded, and these rounding errors surface in arithmetic operations.

Sometimes these effects become dangerous. Do you think `0.1 + 0.2` is `0.3`? Your computer has a different opinion on this matter:

In [48]:
0.1 + 0.2 == 0.3 # unexpected result

False
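
When you have to stay with ordinary floats, a common approach is to compare them with a tolerance instead of `==`. The sketch below uses `math.isclose` from the standard library; the tolerance `1e-9` in the manual check is an arbitrary choice for the example.

```python
import math

total = 0.1 + 0.2
exact = (total == 0.3)                  # exact comparison fails
close = math.isclose(total, 0.3)        # comparison with a relative tolerance
manual = abs(total - 0.3) < 1e-9        # the same idea written by hand
print(exact, close, manual)
```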

However, don't despair: you can use common fractions or the special `decimal` module to work with decimals.

In [49]:
# import specific class from module
from fractions import Fraction

In [50]:
Fraction(1, 10) + Fraction(2, 10) # print a sum of two fractions

Fraction(3, 10)

In [51]:
Fraction(1, 3) + Fraction(1,2) # print the sum of two other fractions

Fraction(5, 6)

In [52]:
# import specific class from module
from decimal import Decimal

In [53]:
Decimal("0.1") + Decimal("0.2") # print a sum of two decimal numbers by using Decimal class

Decimal('0.3')

In [54]:
Decimal("0.1") + Decimal("0.2") == Decimal("0.3") # compare two decimal numbers

True
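
Note that the examples above pass *strings* to `Decimal`. This matters: if you pass a float, the rounding error is already present in the float and is carried over exactly. A short sketch of the difference (variable names are made up):

```python
from decimal import Decimal
from fractions import Fraction

from_float = Decimal(0.1)      # exact value of the binary float 0.1
from_text = Decimal("0.1")     # exactly one tenth
print(from_float)              # a long expansion, not 0.1
print(from_float == from_text)

frac_float = Fraction(0.1)     # the same effect with fractions
frac_text = Fraction("0.1")    # parsed from text: exactly 1/10
print(frac_float == Fraction(1, 10), frac_text == Fraction(1, 10))
```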

You can read more about decimal and binary fractions in [the official documentation](https://docs.python.org/3/tutorial/floatingpoint.html#tut-fp-issues).