## Data Analysis in Python
_Author: Ioann Dovgopoliy_

## Seminar 5

### Seminar outline

* Multiple assignment
* Split & join
* For loop & list comprehension
* While loop
* Map (extra part)
* Summary
* Practice

### Multiple assignment
Multiple assignment is a way to define several variables at once. For instance, you want to define `a = 5` and `b = 2`. How can you do it? The traditional way:

In [16]:
a = 5
b = 2

But another option is possible:

In [1]:
a, b = 5, 2 # you can specify any number of variables

Why is it useful? Why not use the traditional way? Imagine you have a list `some_list = [5, 2]`. You would like to store the first element as variable `a` and the second as variable `b`. One possible way to do it:

In [2]:
some_list = [5, 2] # define list

a = some_list[0] # first element
b = some_list[1] # second element

But it can be shorter and more elegant using multiple assignment! Let's see:

In [3]:
some_list = [5, 2]

a, b = some_list # the way to "unpack" your list (other collections can be unpacked too)
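
As a small illustration of the comment above (a sketch with made-up values), unpacking works the same way for tuples and strings, and Python raises a `ValueError` when the number of names does not match the number of elements:

```python
# Unpacking works with any collection, not only lists
x, y = (10, 20)        # tuple
first, second = 'ab'   # string: one character per variable

print(x, y)            # 10 20
print(first, second)   # a b

# The number of names must match the number of elements
try:
    p, q = [1, 2, 3]
except ValueError as error:
    print(f'Unpacking failed: {error}')
```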

In the same way, you can change several elements of a list simultaneously. For instance, suppose you want to swap the first and last elements of the list. The traditional way is rather bulky:

In [14]:
another_list = [5, 1, 0, 8] 

first_elem = another_list[0] # define variable for the first element
last_elem = another_list[-1] # define variable for the last element

another_list[0] = last_elem # reassign the first element to the last element
another_list[-1] = first_elem # reassign the last element to the first element

print(another_list)

[8, 1, 0, 5]


Multiple assignment gives a much shorter solution:

In [15]:
another_list = [5, 1, 0, 8]

another_list[0], another_list[-1] = another_list[-1], another_list[0] # just swap them

print(another_list)

[8, 1, 0, 5]
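
The same trick swaps two ordinary variables. A minimal sketch (the values are arbitrary): it works because Python evaluates the whole right-hand side first and only then assigns the results to the names on the left.

```python
a, b = 5, 2

# The right-hand side (b, a) is evaluated first, then unpacked into a and b
a, b = b, a

print(a, b)  # 2 5
```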


### Split & join
**Question.** What ways of creating a list do you remember?

Here is how a list can be made from a string using the `list()` function:

In [18]:
future_list = 'I want to become a list.'

happy_list = list(future_list)

print(happy_list)

['I', ' ', 'w', 'a', 'n', 't', ' ', 't', 'o', ' ', 'b', 'e', 'c', 'o', 'm', 'e', ' ', 'a', ' ', 'l', 'i', 's', 't', '.']


In this case the string is split into _individual characters_. But can we control the split, for example, split the sentence into words? Of course we can. We just need to remember how the words in a sentence are separated (spoiler: by whitespace) and use the handy `.split()` method (it is a **string** method, not a list method):

In [20]:
future_list.split(' ') # specify the separator as an argument

['I', 'want', 'to', 'become', 'a', 'list.']

In [21]:
future_list.split() # with no argument, the string is split on whitespace

['I', 'want', 'to', 'become', 'a', 'list.']
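
The two calls above give the same result only because the words are separated by single spaces. A short sketch with a hypothetical string that contains repeated spaces shows the difference:

```python
spaced = 'I  want   a list'   # hypothetical string with repeated spaces

by_space = spaced.split(' ')  # splits at every single space
by_default = spaced.split()   # splits on runs of any whitespace

print(by_space)    # ['I', '', 'want', '', '', 'a', 'list']
print(by_default)  # ['I', 'want', 'a', 'list']
```

With an explicit `' '` separator, every extra space produces an empty string in the result; with no argument, consecutive whitespace counts as one separator.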

If we can travel from a string to a list, is it possible to go back? Here the `.join()` method can help us.

**Notice.** Although `.join()` takes us from a list to a string, it is still a **string** method, not a list method (just like `.split()`).

Imagine we have a list of persons and want to turn it into a string in which these persons are listed, separated by commas:

In [22]:
persons = ['Igor', 'Georgy', 'Ioann']

**Question.** We want to get `'Igor, Georgy, Ioann'` from `['Igor', 'Georgy', 'Ioann']`. What separator should we use?

In [71]:
', '.join(persons) # call join on the separator string (', ') and pass the list as an argument

'Igor, Georgy, Ioann'
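
To see that `.split()` and `.join()` really mirror each other, here is a small round-trip sketch (the variable names are illustrative): splitting a sentence into words and joining them back with a single space restores the original string, and any other separator string can be used as well.

```python
sentence = 'I want to become a list.'

words = sentence.split()      # string -> list of words
restored = ' '.join(words)    # list of words -> string

print(restored == sentence)   # True
print(' | '.join(words))      # I | want | to | become | a | list.
```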

### For loop & list comprehension
`for` loops are one of the basics of Python. Suppose we have a `numbers` list storing some numbers, and we want to multiply every element of the list by 2. Of course, it can be done element by element (or with multiple assignment):

In [27]:
numbers = [1, 2, 3, 4, 5]

numbers[0] = numbers[0] * 2
numbers[1] = numbers[1] * 2
numbers[2] = numbers[2] * 2
numbers[3] = numbers[3] * 2
numbers[4] = numbers[4] * 2

print(numbers) # didn't you find it cumbersome?

[2, 4, 6, 8, 10]


The code above is overloaded and awkward. We would like a tool that can iterate (go) through the list and perform the same operation on every element. This is where the `for` loop comes in. The syntax is as follows:

In [None]:
for element in sequence: # pseudocode
    some_function(element) # do not forget about indentation (just like in conditional statements)

In our case:

In [29]:
numbers = [1, 2, 3, 4, 5]

for element in numbers:
    print(element * 2) # iterate (go) through numbers, take each item, name it element and multiply it by 2

2
4
6
8
10


**Notice.** You can replace `element` with any name you want. It is just an alias for `<each element>` in `for <each element> in <sequence>`. Just use the same name in the body of the loop after `:`:

In [30]:
numbers = [1, 2, 3, 4, 5]

for num in numbers: # here 'num'
    print(num * 2) # and here 'num'

2
4
6
8
10


The cell above is identical to the first one. Just to clarify:

```
for <each_elem_name> in <sequence>:
    do_something(<each_elem_name>)
```

Translated from Python into human language, it means: go through `<sequence>`, take the next element in turn (call it `<each_elem_name>`), `do_something` with it; repeat this for every element in the `<sequence>`.

Let us return to the idea of creating a list with the elements of the `numbers` list multiplied by 2 (in the previous cells we only printed the multiplied numbers). We can obtain it as follows:

In [72]:
numbers = [1, 2, 3, 4, 5]
doubled_numbers = [] # pre-define an empty list

for num in numbers: # here 'num'
    doubled_numbers.append(num * 2) # at each step append multiplied value to doubled_numbers

print(doubled_numbers)

[2, 4, 6, 8, 10]


Of course, a `for` loop can also be used just to repeat some operation N times. In this case the syntax looks like this:

```
for <num> in <sequence>:
    do_something() # notice that <num> does not occur here
```

Here we just want to repeat `do_something()` N times, where N is the length of the `<sequence>`. We do not need to create a sequence by hand every time we just want to repeat some operation:

In [73]:
for i in range(6):
    print('nonsense print')

nonsense print
nonsense print
nonsense print
nonsense print
nonsense print
nonsense print


Let's get familiar with the `range()` function. The function has three arguments: `range(<start>, <stop>, <step>)` (what does it look like?). The same rules as for slicing apply here. But you do not have to specify all the arguments. There are several options:

* `range(<value>)` is `range(<stop>)`;
* `range(<value1>, <value2>)` is `range(<start>, <stop>)`;
* `range(<value1>, <value2>, <value3>)` is `range(<start>, <stop>, <step>)`.

The `<stop>` argument has no default value: you must specify it. `<start>` is 0 by default (if you do not specify it), and `<step>` is 1 by default. Example:

In [78]:
print(list(range(10))) # values from 0 (included) to 9 (10 is excluded) with the step 1
print(list(range(5, 10))) # values from 5 (included) to 9 (10 is excluded) with the step 1
print(list(range(2, 10, 2))) # values from 2 (included) to 8 (10 is excluded) with the step 2

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[5, 6, 7, 8, 9]
[2, 4, 6, 8]


You should wrap `list()` around your `range()` if you want to get a list, because `range()` returns a special `range` object:

In [81]:
numbers = range(1, 6)
print(numbers) # specific range object (should be converted to a list)
print(type(numbers))

range(1, 6)
<class 'range'>
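
For many tasks the conversion is not even necessary. As a brief sketch, a `range` object already behaves like a read-only sequence: it supports `len()`, indexing and the `in` operator.

```python
numbers = range(1, 6)

print(len(numbers))   # 5
print(numbers[0])     # 1
print(numbers[-1])    # 5
print(3 in numbers)   # True
print(10 in numbers)  # False
```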


But you do not have to make a `list` from a `range` to use it in a `for` loop:

In [83]:
for i in list(range(6)):
    print(i)

0
1
2
3
4
5


Is identical to:

In [87]:
for i in range(6):
    print(i)

0
1
2
3
4
5


Some more examples:

In [86]:
print(list(range(6)))
print(list(range(1, 10, 2)))
print(list(range(10, 5, -1)))
print(list(range(10, -10, -2)))

[0, 1, 2, 3, 4, 5]
[1, 3, 5, 7, 9]
[10, 9, 8, 7, 6]
[10, 8, 6, 4, 2, 0, -2, -4, -6, -8]


We can add some conditions inside the `for` loop. For instance, multiply by 2 only if the number is even:

In [93]:
numbers = [1, 2, 3, 4, 5]
doubled_numbers = [] # pre-define an empty list

for num in numbers: # here 'num'
    if num % 2 == 0:
        doubled_numbers.append(num * 2) # at each step append multiplied value to doubled_numbers
    else:
        print(f'{num} is not even, I cannot multiply it by 2:(')

print(f'\nThe final filtered list is {doubled_numbers}.')

1 is not even, I cannot multiply it by 2:(
3 is not even, I cannot multiply it by 2:(
5 is not even, I cannot multiply it by 2:(

The final filtered list is [4, 8].


Let's return once more to the multiplied `numbers` list. Our `for` solution is nice (unlike the initial one):

In [94]:
numbers = [1, 2, 3, 4, 5]
doubled_numbers = [] # pre-define an empty list

for num in numbers: # here 'num'
    doubled_numbers.append(num * 2) # at each step append multiplied value to doubled_numbers

print(doubled_numbers)

[2, 4, 6, 8, 10]


Python would not be Python if there were no shorter way to do it. Here that way is a **list comprehension**. The syntax looks like a child of a list and a `for` loop:

`[do_something(<elem>) for <elem> in <sequence>]`

Let's apply it to our current task:

In [96]:
numbers = [1, 2, 3, 4, 5]

doubled_numbers = [i * 2 for i in numbers]

print(doubled_numbers)

[2, 4, 6, 8, 10]


You can even insert conditional statements here:

`[do_something(<elem>) for <elem> in <sequence> if <elem> satisfies some <condition>]`

In practice:

In [97]:
numbers = [1, 2, 3, 4, 5]

doubled_numbers = [i * 2 for i in numbers if i % 2 == 0]

print(doubled_numbers)

[4, 8]


**Notice.**

List comprehensions have several advantages over traditional `for` loops:

* the code is more elegant;
* and shorter;
* list comprehensions are usually slightly faster than loops (see the benchmarks further below).

However:

* sometimes list comprehensions are harder to read;
* complex conditions and operations are awkward to express inside a list comprehension (but you can overcome this by defining the necessary functions beforehand).
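
The last point can be sketched as follows. The helper `transform` is a hypothetical function invented for this illustration: the branching logic lives in the function, and the comprehension itself stays short.

```python
def transform(number):
    """Hypothetical helper: double even numbers, negate odd ones."""
    if number % 2 == 0:
        return number * 2
    return -number

numbers = [1, 2, 3, 4, 5]
transformed = [transform(n) for n in numbers]

print(transformed)  # [-1, 4, -3, 8, -5]
```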

### While loop
The `while` loop is the second type of loop in Python. Imagine you ask the user to enter a decision in binary form (`1` for `yes` or `0` for `no`). Your further code cannot interpret any other symbols. So how do you make the user enter a valid value and avoid an error? Of course, by using a `while` loop. In its simplest form the syntax is as follows:

```
while <condition>:
    do_something()
```

`do_something()` is repeated as long as the condition holds; the loop stops as soon as the condition becomes false. Let's try to solve our task using Python:

In [102]:
decision = input('Please, type 1 if you agree and 0 otherwise: ') # ask for the first time
print()

while decision not in ['0', '1']: # repeat while the input is neither '0' nor '1'
    decision = input(f'You should type 0 or 1, not {decision}: ') # politely ask to input correct value

print('\nIf you see this print, we managed to exit the loop. Congratulations!')

Please, type 1 if you agree and 0 otherwise: d

You should type 0 or 1, not d: d
You should type 0 or 1, not d: d
You should type 0 or 1, not d: d
You should type 0 or 1, not d: d
You should type 0 or 1, not d: d
You should type 0 or 1, not d: d
You should type 0 or 1, not d: 5
You should type 0 or 1, not 5: 1

If you see this print, we managed to exit the loop. Congratulations!


Another example. Suppose you are going through a list and dividing `8` by each of its values. If you remember your math, you know that `0` is dangerous here: division by zero is undefined. So, if we meet `0`, we should stop:

In [52]:
dangerous_list = list(range(10, -1, -1))
dangerous_list

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

In [108]:
current_index = 0 # begin with 0

while dangerous_list[current_index] != 0: # while the value at the current index is not 0
    print(f'Let us divide 8 by {dangerous_list[current_index]}, got {8 / dangerous_list[current_index]}.')
    current_index = current_index + 1 # increase current_index by 1 to go to the next element

Let us divide 8 by 10, got 0.8.
Let us divide 8 by 9, got 0.8888888888888888.
Let us divide 8 by 8, got 1.0.
Let us divide 8 by 7, got 1.1428571428571428.
Let us divide 8 by 6, got 1.3333333333333333.
Let us divide 8 by 5, got 1.6.
Let us divide 8 by 4, got 2.0.
Let us divide 8 by 3, got 2.6666666666666665.
Let us divide 8 by 2, got 4.0.
Let us divide 8 by 1, got 8.0.
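
The loop above stops only because the list contains `0`. If it did not, `current_index` would run past the end of the list and Python would raise an `IndexError`. A possible safeguard, sketched here with a hypothetical `safe_list`, is to check the index before reading the element:

```python
safe_list = [4, 2, 1]  # hypothetical list that contains no 0
current_index = 0
results = []

# Check the index first: once it reaches len(safe_list), the second
# condition is not evaluated, so no IndexError is raised
while current_index < len(safe_list) and safe_list[current_index] != 0:
    results.append(8 / safe_list[current_index])
    current_index = current_index + 1

print(results)  # [2.0, 4.0, 8.0]
```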


### Map (extra part)
The `map()` function is another way to apply some operation to each value of a sequence. Suppose we have a `letters` list and want to make each letter uppercase. The first two ways to do it are familiar to you:

In [110]:
letters = ['a', 'b', 'c', 'd', 'e']
uppered_letters = []

for l in letters:
    uppered_letters.append(l.upper()) # for loop way

print(uppered_letters)

['A', 'B', 'C', 'D', 'E']


In [111]:
letters = ['a', 'b', 'c', 'd', 'e']
uppered_letters = [i.upper() for i in letters] # list comprehension way

print(uppered_letters)

['A', 'B', 'C', 'D', 'E']


The third path (path of the true Jedi) is to use `map()` function. Its syntax:

`map(<some_function>, <sequence>)`

Here `<some_function>` will be applied to each element of the `<sequence>`. You can pass a function you have defined yourself or use an anonymous function:

`lambda <value>: do_something(<value>)`

It means: take `<value>` and `do_something()` with it.

Application to our case:

In [115]:
letters = ['a', 'b', 'c', 'd', 'e']
uppered_letters = list(map(lambda x: x.upper(), letters)) # function: take x (element of the sequence) and upper it

print(uppered_letters)

['A', 'B', 'C', 'D', 'E']
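
A `lambda` is not required: any existing function can be passed to `map()`. In the sketch below, `shout` is a hypothetical helper defined only for this example, while `str.upper` and `len` are built into Python.

```python
def shout(word):
    """Hypothetical helper: uppercase a word and add an exclamation mark."""
    return word.upper() + '!'

words = ['hello', 'world']

print(list(map(shout, words)))      # ['HELLO!', 'WORLD!']
print(list(map(str.upper, words)))  # ['HELLO', 'WORLD']
print(list(map(len, words)))        # [5, 5]
```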


Again, you have to wrap `list()` around `map()` because `map()` returns a special `map` object:

In [116]:
letters = ['a', 'b', 'c', 'd', 'e']
uppered_letters = map(lambda x: x.upper(), letters) # needs to be converted to list

print(uppered_letters)
print(type(uppered_letters))

<map object at 0x0000022EAAE3A3D0>
<class 'map'>
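
One more property of the `map` object is worth knowing: it produces its values lazily and can be consumed only once. A minimal sketch:

```python
letters = ['a', 'b', 'c']
uppered = map(lambda x: x.upper(), letters)

first_pass = list(uppered)   # consumes the map object
second_pass = list(uppered)  # nothing is left to iterate over

print(first_pass)   # ['A', 'B', 'C']
print(second_pass)  # []
```

If the result is needed more than once, store the list rather than the `map` object.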


How does `map` compare in speed? Keep in mind that `map()` is lazy: it does no work until its result is consumed, for example by `list()`. Compare three variants on a small sequence (of length 5):

In [117]:
%%timeit

letters = ['a', 'b', 'c', 'd', 'e']
uppered_letters = []

for l in letters:
    uppered_letters.append(l.upper())

497 ns ± 30.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [118]:
%%timeit

letters = ['a', 'b', 'c', 'd', 'e']
uppered_letters = [i.upper() for i in letters]

515 ns ± 28.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [119]:
%%timeit

letters = ['a', 'b', 'c', 'd', 'e']
uppered_letters = list(map(lambda x: x.upper(), letters)) # here map is not efficient

796 ns ± 46.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


Now try a large sequence (of length 100000):

In [120]:
%%timeit

numbers = list(range(100000))
doubled_numbers = []

for i in numbers:
    doubled_numbers.append(i * 2)

9.34 ms ± 769 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [121]:
%%timeit

numbers = list(range(100000))
doubled_numbers = [i * 2 for i in numbers]

6.11 ms ± 459 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [123]:
%%timeit

numbers = list(range(100000))
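doubled_numbers = map(lambda x: x * 2, numbers) # no list() here: map is lazy, so nothing is doubled yet

1.31 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

**Notice.** This timing is not comparable with the previous two. Without `list()`, `map()` only creates a lazy iterator and never doubles a single number, so the cell mostly measures building `numbers`. Once the result is wrapped in `list()`, `map()` with a `lambda` is typically no faster than a list comprehension.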
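
A fairer comparison forces `map` to produce its results with `list()`. The sketch below uses the standard `timeit` module instead of the `%%timeit` magic so that it runs outside a notebook; the function names are hypothetical, and the timings depend on the machine and Python version, so no expected numbers are shown.

```python
import timeit

numbers = list(range(100000))

def with_loop():
    doubled = []
    for i in numbers:
        doubled.append(i * 2)
    return doubled

def with_comprehension():
    return [i * 2 for i in numbers]

def with_map():
    return list(map(lambda x: x * 2, numbers))  # list() forces the work

# All three variants must agree before their timings are compared
assert with_loop() == with_comprehension() == with_map()

for variant in (with_loop, with_comprehension, with_map):
    seconds = timeit.timeit(variant, number=20)
    print(f'{variant.__name__}: {seconds / 20 * 1000:.2f} ms per call')
```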


### Summary

* multiple assignment is a very powerful way of unpacking collections and assigning several variables at once;
* you can go from a string to a list using the `.split()` method and back using `.join()`;
* a `for` loop allows you to apply any operation to each element of a sequence;
* a list comprehension is a shorter and more elegant way to build a list than a `for` loop;
* a `while` loop is helpful when you want to repeat some operation as long as a condition holds;
* `map()` applies a function to each element lazily: nothing is computed until the result is consumed (for example, by `list()`), so its timings are comparable with loops and list comprehensions only when the result is actually built.

### Practice

#### Task 1


A string consisting of Russian letters in different cases is given. Remove all the capital letters.


In [None]:
letters = 'ЫгВЫоЯСремДШНККАыкЩЙФа'
#your code here

#### Task 2

A list of lists is a widespread extension of basic lists in Python, since it allows two-dimensional data to be represented conveniently. Write a program that prints each element of the list of lists separately. Use a ```for``` loop.

In [None]:
list_of_lists = [['Marie', 17, 'Paris'], ['Daniel', 3, 'London'], ['Alina', 7, 'Moscow']]
#your code here

#### Task 3

The sequence consists of natural numbers and ends with the number 0. Determine the value of the largest element of the sequence.

In [None]:
#your code here