## Data Analysis in Python
_Author: Ioann Dovgopoliy_

## Seminar 5

### Seminar outline

* Multiple assignment
* Split & join
* For loop & list comprehension, range and enumerate
* While loop
* Map & reduce
* Summary
* Practice

### Multiple assignment
Multiple assignment is a way to define several variables at once. For instance, you want to define `a = 5` and `b = 2`. How can you do it? The traditional way:

In [None]:
a = 5
b = 2

But another option is possible:

In [None]:
a, b = 5, 2 # you can specify any number of variables

Why is it useful? Why not just use the traditional way? Imagine you have a list `some_list = [5, 2]`. You would like to store the first element as variable `a` and the second as variable `b`. One possible way to do it:

In [None]:
some_list = [5, 2] # define list

a = some_list[0] # first element
b = some_list[1] # second element

But it can be shorter and more elegant using multiple assignment! Let's see:

In [None]:
some_list = [5, 2]

a, b = some_list # the way to "unpack" your list (other collections can be unpacked too)

In [None]:
a, b
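Unpacking is not limited to lists. The short sketch below (the names `point`, `word` and `pair` are made up for illustration) unpacks a tuple and a string, and notes that the number of names on the left must match the number of elements on the right.

```python
point = (3, 7)            # a tuple
x, y = point              # x is 3, y is 7

word = 'hi'               # a string is a sequence of characters
first, second = word      # first is 'h', second is 'i'

pair = [5, 2]
p, q = pair               # works: two names, two elements
# p, q, r = pair          # would raise ValueError: three names, but only two elements

print(x, y, first, second, p, q)
```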

You can also reassign several elements of a list at once in the same way. For instance, suppose you want to swap the first and last elements of a list. The traditional way is rather bulky:

In [None]:
another_list = [5, 1, 0, 8]

first_elem = another_list[0] # define variable for the first element
last_elem = another_list[-1] # define variable for the last element

another_list[0] = last_elem # put the old last element into the first position
another_list[-1] = first_elem # put the old first element into the last position

print(another_list)

Multiple assignment gives a much neater solution:

In [None]:
another_list = [5, 1, 0, 8]

another_list[0], another_list[-1] = another_list[-1], another_list[0] # just swap them

print(another_list)

In [None]:
small_list = [5, 1, 2]

small_list[1], small_list[2] = small_list[2], small_list[1]

small_list
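A swap like this works because Python first evaluates the whole right-hand side and only then assigns the values to the targets on the left. A small sketch with illustrative names, which also rotates three values in one statement:

```python
left, right = 'L', 'R'
left, right = right, left        # the right-hand side ('R', 'L') is built first
print(left, right)               # R L

x, y, z = 1, 2, 3
x, y, z = y, z, x                # rotate three values at once
print(x, y, z)                   # 2 3 1
```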

### Split & join
**Question.** What ways of creating a list do you remember?

Here is how a list can be made from a string using the `list()` function:

In [None]:
future_list = 'I want to become a list.'

happy_list = list(future_list)

print(happy_list)

In this case the string is split into _individual characters_. But can we control the split, for example, split the sentence into words? Of course we can. We just need to remember how the words in a sentence are separated (spoiler: by whitespace) and use the nice `.split()` method (it is a **string** method, not a list method):

In [None]:
future_list.split(' ') # specify the separator as an argument (without an argument, any run of whitespace is used)

In [None]:
future_list.split('o')

In [None]:
future_list.split() # string becomes a list
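For a sentence with single spaces, `.split(' ')` and `.split()` give the same result, but they are not the same in general. The sketch below uses a made-up string `messy` with a double space and a tab to show the difference.

```python
messy = 'two  spaces and\ta tab'

by_space = messy.split(' ')   # splits at every single space character
by_default = messy.split()    # splits at any run of whitespace (spaces, tabs, newlines)

print(by_space)     # ['two', '', 'spaces', 'and\ta', 'tab']
print(by_default)   # ['two', 'spaces', 'and', 'a', 'tab']
```

Note the empty string produced by the double space in the first variant; the argument-free form is usually the safer choice for splitting text into words.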

If we can travel from a string to a list, is it possible to go back? Here the `.join()` method can help us.

**Notice.** Although `.join()` takes us from a list to a string, it is still a **string** method, not a list method (just like `.split()`).

Imagine we have a list of persons and want to turn it into a string where these persons are listed, separated by commas:

In [None]:
persons = ['Igor', 'Georgy', 'Ioann']

**Question.** We want to get `'Igor, Georgy, Ioann'` from `['Igor', 'Georgy', 'Ioann']`. What separator should we use?

In [None]:
', '.join(persons) # here we call join on the separator string (', ') and pass the list as an argument

In [None]:
print('\n'.join(persons))
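`.split()` and `.join()` combine well into a round trip. As a small sketch (the string `sentence` is invented for the example), splitting on whitespace and joining back with a single space cleans up repeated spaces. Keep in mind that `.join()` expects every element of the list to be a string.

```python
sentence = 'Python   is    fun'

words = sentence.split()        # ['Python', 'is', 'fun']
normalized = ' '.join(words)    # 'Python is fun'
dashed = '-'.join(words)        # 'Python-is-fun'

print(normalized)
print(dashed)
```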

### For loop & list comprehension
`for` loops are among the basics of Python. Suppose we have a `numbers` list storing some numbers. We want to multiply all the elements of the list by 2. Of course, it can be done like this (or with multiple assignment):

In [None]:
numbers = [1, 2, 3, 4, 5]

numbers[0] = numbers[0] * 2
numbers[1] = numbers[1] * 2
numbers[2] = numbers[2] * 2
numbers[3] = numbers[3] * 2
numbers[4] = numbers[4] * 2

print(numbers) # don't you find it cumbersome?

The code above is overloaded and awkward. We wish we had a tool that could iterate (go) through the list and perform the same operation on every element. This is where the `for` loop comes in. The syntax is as follows:

In [None]:
for element in sequence: # pseudocode
    some_function(element) # do not forget about indentation (just like in conditional statements)

In our case:

In [None]:
numbers = [1, 2, 3, 4, 5]

for element in numbers:
    print(element * 2) # iterate (go) through numbers, take each item, name it element and multiply it by 2

**Notice.** You can replace `element` with any name you want. It is just an alias for `<each element>` in `for <each element> in <sequence>`. Just use the same name in the body of the loop after `:`:

In [None]:
numbers = [1, 2, 3, 4, 5]

for num in numbers: # here 'num'
    print(num * 2) # and here 'num'

In [None]:
numbers = [1, 2, 3, 4, 5]

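for num in numbers:
    print(element * 2) # mistake: the loop variable is 'num', so 'element' is a stale value from an earlier cell (or a NameError)

The cell with `num` is equivalent to the first one with `element`; the last cell shows what goes wrong when the names do not match. Just to clarify: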

```
for <each_elem_name> in <sequence>:
    do_something(<each_elem_name>)
```

Translated from Python to human language, it means: go through `<sequence>`, take the next element in turn (call it `<each_elem_name>`), `do_something` with it; repeat this for every element in the `<sequence>`.

Let us return to the idea of creating a list with the elements of the `numbers` list multiplied by 2 (in the previous cells we only printed the multiplied numbers). We can obtain it as follows:

In [None]:
numbers = [1, 2, 3, 4, 5]
doubled_numbers = [] # pre-define an empty list

for num in numbers: # here 'num'
    doubled_num = num * 2
    doubled_numbers.append(doubled_num) # at each step append multiplied value to doubled_numbers
    print(doubled_numbers) # show the list at each iteration

# print(doubled_numbers)

Of course, a `for` loop can also be used just to repeat some operation N times. In this case the syntax looks like this:

```
for <num> in <sequence>:
    do_something() # notice that <num> does not occur here
```

Here we just want to repeat `do_something()` N times, where N is the length of the `<sequence>`. Any existing sequence will do as a counter, so we do not have to create a new one each time:

In [None]:
for i in numbers:
    print('nonsense print')

Let's get familiar with the `range()` function. The function has three arguments: `range(<start>, <stop>, <step>)` (what does it remind you of?). The same rules as for slicing apply here: `<start>` is included and `<stop>` is excluded. But you do not have to specify all the arguments. There are several options:

* `range(<value>)` is `range(<stop>)`;
* `range(<value1>, <value2>)` is `range(<start>, <stop>)`;
* `range(<value1>, <value2>, <value3>)` is `range(<start>, <stop>, <step>)`.

The `<stop>` argument has no default value: you must specify it. `<start>` is 0 by default (if you do not specify it). `<step>` is 1 by default. Example:

In [None]:
print(list(range(10))) # values from 0 (included) to 9 (10 is excluded) with the step 1
print(list(range(5, 10))) # values from 5 (included) to 9 (10 is excluded) with the step 1
print(list(range(2, 10, 2))) # values from 2 (included) to 8 (10 is excluded) with the step 2

You should wrap `list()` around your `range()` if you want to get a list because `range()` returns specific `range`-class object:

In [None]:
numbers = range(5, 10)
print(numbers) # specific range object (should be converted to a list)
print(type(numbers))

In [None]:
numbers = list(range(5, 10))
numbers

But you do not have to make a `list` from a `range` to use it in a `for` loop:

In [None]:
for i in range(6):
    print(i)

In [None]:
for i in list(range(6)):
    print(i)

Is identical to:

In [None]:
for i in range(6):
    print(i)

Some more examples:

In [None]:
print(list(range(6)))
print(list(range(1, 10, 2)))
print(list(range(10, 5, -1)))
print(list(range(10, -10, -2)))

We can add some conditions inside the `for` loop. For instance, multiply by 2 only if the number is even:

In [None]:
numbers = [1, 2, 3, 4, 5]
doubled_numbers = [] # pre-define an empty list

for num in numbers: # here 'num'
    if num % 2 == 0:
        doubled_numbers.append(num * 2) # at each step append multiplied value to doubled_numbers
    else:
        print(f'{num} is not even, I cannot multiply it by 2:(')

print(f'\nThe final filtered list is {doubled_numbers}.')

In [None]:
numbers2 = [6, 8, 1, 0, 9]
filtered_numbers = [] # pre-define an empty list

for num in numbers2: # here 'num'
    if num % 4 == 0:
        filtered_numbers.append(num)

filtered_numbers

Let's return once more to the multiplied `numbers` list. Our `for` solution is nice (unlike the initial one):

In [None]:
numbers = [1, 2, 3, 4, 5]
doubled_numbers = [] # pre-define an empty list

for num in numbers: # here 'num'
    doubled_numbers.append(num * 2) # at each step append multiplied value to doubled_numbers

print(doubled_numbers)

Python would not be Python if there were no shorter way to do it. Here that way is a **list comprehension**. The syntax looks like a child of a list and a `for` loop:

`[do_something(<elem>) for <elem> in <sequence>]`

Let's apply it to our current task:

In [None]:
numbers = [1, 2, 3, 4, 5]

doubled_numbers = [i * 2 for i in numbers]

print(doubled_numbers)

In [None]:
words = ['key', 'sun', 'hand']
new_words = []

for word in words:
    new_words.append(word.upper())

new_words

In [None]:
words = ['key', 'sun', 'hand']

new_words = [word.upper() for word in words]

new_words

You can even insert conditional statements here:

`[do_something(<elem>) for <elem> in <sequence> if <elem> satisfies some <condition>]`

In practice:

In [None]:
numbers = [1, 2, 3, 4, 5]

doubled_numbers = [i * 2 for i in numbers if i % 2 == 0]

print(doubled_numbers)

In [None]:
numbers2 = [6, 8, 1, 0, 9]
filtered_numbers = [] # pre-define an empty list

for num in numbers2: # here 'num'
    if num % 4 == 0:
        filtered_numbers.append(num)

filtered_numbers

In [None]:
numbers2 = [6, 8, 1, 0, 9]

filtered_numbers = [i for i in numbers2 if i % 4 == 0]

filtered_numbers

**Notice.**

List comprehensions have several advantages in comparison with the traditional `for`-loops:

* code is more elegant;
* and shorter;
* list comprehensions are usually slightly faster than the equivalent `for` loop with `.append()` (see the benchmarks in the Map part).

However:

* sometimes list comprehensions are harder to read;
* you cannot put complex conditions and operations directly into a list comprehension (but you can work around this by defining the necessary function beforehand).
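The workaround from the last point can be sketched as follows. The helper `describe` is a hypothetical function written only for this example: it holds the branching logic that would not fit neatly into the comprehension itself.

```python
def describe(number):
    """Return a label for a number (hypothetical helper for this example)."""
    if number == 0:
        return 'zero'
    elif number % 2 == 0:
        return 'even'
    else:
        return 'odd'

sample_numbers = [0, 1, 2, 3, 4]
labels = [describe(n) for n in sample_numbers]  # the comprehension itself stays short

print(labels)  # ['zero', 'odd', 'even', 'odd', 'even']
```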

Let's cover another function which, along with the `range()` function, can be very useful for us. Assume we want to iterate through the list and print each index and its element. It can be done as follows:

In [None]:
nice_list = [5, 9, 3, 0, 4, 7]

list(range(len(nice_list)))

In [None]:
len(nice_list)

In [None]:
nice_list = [5, 9, 3, 0, 4, 7]
indexes = range(len(nice_list))

for i in indexes:
    print(f'Element with index {i} is {nice_list[i]}.')

There is a way to do without the `indexes` variable: the `enumerate()` function. What does it do? It is very simple. Assume we have a list `nice_list = [5, 9, 3, 0, 4, 7]`. `enumerate` takes each element and pairs it with its index in the form `(index, value)`: `[(0, 5), (1, 9), (2, 3), (3, 0), (4, 4), (5, 7)]` (by the way, it is a list of tuples). See:

In [None]:
list(enumerate(nice_list)) # again we need to convert it to list

Now we can apply `enumerate` to solve the initial task:

In [None]:
nice_list = [5, 9, 3, 0, 4, 7]

for index, value in enumerate(nice_list): # each item is an (index, value) pair, so we "unpack" it
    # (recall multiple assignment)
    print(f'Element with index {index} is {value}.')

We can observe the pairs explicitly:

In [None]:
nice_list = [5, 9, 3, 0, 4, 7]

for pair in enumerate(nice_list): # without unpacking, each item is a whole (index, value) tuple
    print(f'Our current pair is {pair}.')
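`enumerate()` also accepts an optional `start` argument, which is handy when you want human-friendly numbering from 1 instead of indexes from 0. A minimal sketch, reusing the same values under a new illustrative name:

```python
sample_list = [5, 9, 3, 0, 4, 7]

numbered = list(enumerate(sample_list, start=1))
print(numbered)  # [(1, 5), (2, 9), (3, 3), (4, 0), (5, 4), (6, 7)]

for position, value in enumerate(sample_list, start=1):
    print(f'Element number {position} is {value}.')
```

Remember that with `start=1` the first number of each pair is no longer a valid index into the list.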


### While loop
A `while` loop is the second type of loop in Python. In its simplest form the syntax is as follows:

```
while <condition>:
    do_something()
```

`do_something()` is executed repeatedly as long as `<condition>` is true; the loop stops as soon as the condition is no longer met.
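A minimal concrete sketch (the names `countdown` and `steps` are made up for the example): the body runs as long as the expression after `while` evaluates to `True`, and the loop ends once it becomes `False`.

```python
countdown = 3
steps = []

while countdown > 0:              # the body runs as long as this is True
    steps.append(countdown)
    print(f'{countdown}...')
    countdown = countdown - 1     # without this line the condition would stay True forever

print('Go!')
```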

Imagine that you and your friend Nicholas have agreed to meet near the bar at 7 pm. You come to the bar and see many people. It won't be easy to find Nicholas here. In this case you would probably start inspecting each person one by one until you recognize Nicholas. Let's translate your potential actions into code:

In [None]:
visitors = ['Andrew', 'Mary', 'Michael', 'Margaret', 'Alex', 'Nicholas', 'Ellen', 'Sophie', 'Max'] # all people near
# the bar

Suppose that you inspect the persons in the same order as in the `visitors` list. When you notice Nicholas, you stop searching, as Nicholas is your only aim. Here is how it looks in Python:

In [None]:
index = 0 # start with the first person (index 0)

while visitors[index] != 'Nicholas': # keep going while the current person is not Nicholas (if he is absent, this ends with an IndexError)
    print(f'Shit, it is not Nicholas. This {index + 1}-th person is {visitors[index]}.') # notice order and name
    index += 1 # index += 1 equals index = index + 1 (increase index by 1 to go to the next person)

print()
print('Wonderful, here he is! Finally! It is time to go to the bar now.')

In [None]:
a = 5

a = a + 2
a

In [None]:
a += 2

What trouble can we run into with a `while` loop? You could get caught in an *infinite loop*. It occurs when the condition specified for `while` never becomes false. For instance, suppose we forget to increase our `index` by one on each iteration. In this case we do not move further through the list, and we will never find Nicholas:

In [None]:
index = 0 # start with the first person (index 0)

while visitors[index] != 'Nicholas': # keep going while the current person is not Nicholas
    print(f'Shit, it is not Nicholas. This {index + 1}-th person is {visitors[index]}.') # notice order and name

print()
print('Wonderful, here he is! Finally! It is time to go to the bar now.') # we will never get here because index always stays 0,
# i.e. our eternal person here is Andrew

You have to avoid such situations. To do so, before using `while`, always formulate the exact stopping condition and check that your code can actually reach it.
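One way to make the stopping condition reachable is to also bound the index by the length of the list. The sketch below is only an illustration: `small_crowd` is a made-up list that deliberately does not contain Nicholas, so the search has to stop at the end of the list.

```python
small_crowd = ['Andrew', 'Mary', 'Michael']   # note: no Nicholas here
position = 0

# The first check stops the loop at the end of the list, so
# small_crowd[position] is never evaluated with an invalid index.
while position < len(small_crowd) and small_crowd[position] != 'Nicholas':
    position += 1

if position < len(small_crowd):
    verdict = f'Found Nicholas at position {position + 1}.'
else:
    verdict = 'Nicholas is not here.'

print(verdict)
```

The order of the two checks matters: `and` evaluates its right-hand side only when the left-hand side is true.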

**Example.** Imagine you ask the user to enter a decision in binary form (`1` for `yes` or `0` for `no`). You cannot interpret any other symbols in the code that follows. How do you make sure the user enters a valid value and prevent an error later on? Of course, with a `while` loop:

In [1]:
decision = input('Please, type 1 if you agree and 0 otherwise: ') # ask for the first time
print()

while decision not in ['0', '1']: # in case the user is foolish
    decision = input(f'You should type 0 or 1, not {decision}: ') # politely ask to input correct value

print('\nIf you see this print, we managed to exit the loop. Congratulations!')

Please, type 1 if you agree and 0 otherwise: 9

You should type 0 or 1, not 9: sh
You should type 0 or 1, not sh: s
You should type 0 or 1, not s: f
You should type 0 or 1, not f: d
You should type 0 or 1, not d: d
You should type 0 or 1, not d: 1

If you see this print, we managed to exit the loop. Congratulations!


**Another example.** Suppose you are going through a list and dividing `8` by each value of the list. If you remember math, you know that the value `0` is very dangerous for us. So, if we meet `0`, we should stop:

In [None]:
dangerous_list = list(range(10, -1, -1))
dangerous_list

In [None]:
current_index = 0 # begin with 0

while dangerous_list[current_index] != 0: # while the value for the according index is not 0
    print(f'Let us divide 8 by {dangerous_list[current_index]}, got {8 / dangerous_list[current_index]}.')
    current_index = current_index + 1 # increase current_index by 1 to go to the next element

`while`-loop syntax can be extended this way:

```
while <condition>:
    do_something()
else:
    do_another()
```

`do_another()` is executed once the condition becomes false, i.e. when the loop finishes normally. So the input-checking cell from the first example can be rewritten as follows:

In [None]:
decision = input('Please, type 1 if you agree and 0 otherwise: ') # ask for the first time
print()

while decision not in ['0', '1']: # in case the user is foolish
    decision = input(f'You should type 0 or 1, not {decision}: ') # politely ask to input correct value
else:
    print('\nIf you see this print, we managed to exit the loop. Congratulations!')

You probably wonder why we need `else` if the `do_another()` action can simply be placed after the loop. It makes sense in combination with the `break` statement. After `break` is executed, the loop is terminated completely, and the `else` part is not performed. Recall our task about Nicholas. Perhaps, while searching, you meet your friend Margaret. You are so happy that you forget about Nicholas for a while and simply stop looking. Let me show it in code:

In [None]:
index = 0 # start with the first person (index 0)

while visitors[index] != 'Nicholas': # keep going while the current person is not Nicholas
    if visitors[index] == 'Margaret': # specify condition in case of Margaret encounter
        print('Wow, it is Margaret! I have not seen her for such a long time... I have to go and speak to her.')
        break # abandon the search and LEAVE THE LOOP (if Margaret is met)
    else: # neither Nicholas nor Margaret; continue the search
        print(f'Shit, it is not Nicholas. This {index + 1}-th person is {visitors[index]}.') # notice order and name
        index += 1 # index += 1 equals index = index + 1 (increase index by 1 to go to the next person)
else: # we will never reach this 'else' because insidious Margaret stands in the list before Nicholas
    print()
    print('Wonderful, here he is! Finally! It is time to go to the bar now.')

When using `break` in a `while` loop, the syntax may look like this:

```
while <condition>:
    if <another condition>:
        break # from this point on, NO more iterations are performed because you BREAK the loop
else:
    do_another() # skipped if the loop was left via break
```
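To see both outcomes side by side, here is a small sketch wrapped in a hypothetical function `search` (the function and the list `queue` are invented for this example). The `else` branch runs only when the loop ends without `break`.

```python
def search(names, target):
    """Look for target in names using while/else (illustrative helper)."""
    index = 0
    while index < len(names):
        if names[index] == target:
            result = f'{target} found at index {index}'
            break                       # leaves the loop, the else block is skipped
        index += 1
    else:
        result = f'{target} not found'  # runs only if the loop ended without break
    return result

queue = ['Andrew', 'Mary', 'Nicholas']
print(search(queue, 'Mary'))    # Mary found at index 1
print(search(queue, 'Ellen'))   # Ellen not found
```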

It is also worth covering the `continue` statement. After `continue` is executed, the next iteration of the loop begins, regardless of the instructions that follow it in the loop body. Let's change our situation slightly. Imagine you have quarreled with Margaret recently, and now you are trying not to think about her. Consequently, if you see her, you will not think 'Shit, it is not Nicholas...', but just glance at the next person. Here the `continue` statement helps us:

In [None]:
index = 0 # start with the first person (index 0)

while visitors[index] != 'Nicholas': # keep going while the current person is not Nicholas
    if visitors[index] == 'Margaret': # specify condition in case of Margaret encounter
        index += 1 # go further at once
        print('*it is Margaret, but we are offended, we do not think about her*')
        continue # see Margaret, but DO NOT THINK ABOUT HER (offended)
    print(f'Shit, it is not Nicholas. This {index + 1}-th person is {visitors[index]}.') # skipped for Margaret (thanks to continue)
    index += 1 # skipped for Margaret (thanks to continue)
else:
    print()
    print('Wonderful, here he is! Finally! It is time to go to the bar now.')

**Notice.** We skipped Margaret (she is person number 4).

When using `continue` in a `while` loop, the syntax may look like this:

```
while <condition>:
    if <another condition>:
        continue
    do_something() # if <another condition> is met, 'continue' is executed, and this row is skipped
else:
    do_another()
```
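One practical caveat: in a `while` loop, whatever moves the loop forward must happen before `continue`, otherwise the same element is examined forever. A minimal sketch with made-up data that sums only the non-negative values:

```python
values = [4, -1, 7, -3, 2]
cursor = 0
total = 0

while cursor < len(values):
    current = values[cursor]
    cursor += 1             # advance BEFORE continue, otherwise the loop never ends
    if current < 0:
        continue            # skip negative numbers: the line below is not executed
    total += current

print(total)  # 13
```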

**Question.** We all know how to take an `int`, `float` or `str` as input:

In [2]:
int_input = int(input())
float_input = float(input())
str_input = input() # remember, the input() function always returns a string

print()
print(int_input, float_input, str_input)

5
6.8
string

5 6.8 string


But how do we take a list (or any collection) as input? One possible way is to:

* enter list elements one by one;
* choose a stop value (for instance, `-1`);
* use a `while` loop.

Example:

In [3]:
one_by_one_list = []

elem = int(input()) # here we collect int elements, but can collect elements of any type

while elem != -1:
    one_by_one_list.append(elem)
    elem = int(input()) # define the next element

print()
print(one_by_one_list)

8
3
5
6
-1

[8, 3, 5, 6]


It is nice, but what if we just want to enter a sequence of elements (for instance, separated by whitespace) that will be converted to a list? For this task we should recall the `.split()` method:

In [4]:
sequence = input()
sequence

8 3 5 6


'8 3 5 6'

In [5]:
sequence = input() # enter numbers divided by whitespace
split_list = sequence.split()

print(split_list) # isn't it easier than the previous way?

8 3 5 6
['8', '3', '5', '6']


Or even shorter:

In [6]:
split_list = input().split()

print(split_list)

8 3 5 6
['8', '3', '5', '6']


You probably noticed that our 'numbers' are still strings. This is easily fixed with a **list comprehension**:

In [7]:
split_list = input().split()
split_list = [int(elem) for elem in split_list]

split_list

8 3 5 6


[8, 3, 5, 6]

Or even shorter!

In [8]:
split_list = [int(elem) for elem in input().split()] # one line with a list comprehension replaces 5 lines with a while loop

print(split_list)

8 3 5 6
[8, 3, 5, 6]
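If you need this conversion in several places, it can be wrapped in a small function. `parse_numbers` below is a hypothetical helper written for this example; it takes the text as an argument (in a notebook you could pass it `input()`) and converts to `float` so that decimal values are accepted too.

```python
def parse_numbers(text):
    """Convert a whitespace-separated string of numbers into a list of floats."""
    return [float(piece) for piece in text.split()]

print(parse_numbers('8 3.5 -2'))   # [8.0, 3.5, -2.0]
print(parse_numbers(''))           # []
```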


<b style="color: red;">Notice.</b> It may be very useful for **HA3**.

### Map (optional part)
The `map` function is another way to perform some operation on each value of a sequence. Suppose we have a `letters` list and want to make each letter uppercase. The first two ways to do it are familiar to you:

In [None]:
letters = ['a', 'b', 'c', 'd', 'e']
uppered_letters = []

for l in letters:
    uppered_letters.append(l.upper()) # for loop way

print(uppered_letters)

In [None]:
letters = ['a', 'b', 'c', 'd', 'e']
uppered_letters = [i.upper() for i in letters] # list comprehension way

print(uppered_letters)

The third path (path of the true Jedi) is to use `map()` function. Its syntax:

`map(<some_function>, <sequence>)`

Here `<some_function>` is applied to each element of the `<sequence>`. You can pass a function you defined yourself or use an anonymous function:

`lambda <value>: do_something(<value>)`

It means: take `<value>` and `do_something()` with it.

Application to our case:

In [None]:
letters = ['a', 'b', 'c', 'd', 'e']
uppered_letters = list(map(lambda x: x.upper(), letters)) # function: take x (element of the sequence) and upper it

print(uppered_letters)
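A `lambda` is not required: `map()` accepts any callable that takes one argument. The sketch below passes the built-in string method `str.upper` directly, and then a hypothetical named function `shout` defined only for this example.

```python
letters = ['a', 'b', 'c', 'd', 'e']

def shout(letter):
    """Hypothetical helper: uppercase a letter and add an exclamation mark."""
    return letter.upper() + '!'

plain_upper = list(map(str.upper, letters))   # pass an existing method, no lambda needed
shouted = list(map(shout, letters))           # pass a function defined with def

print(plain_upper)  # ['A', 'B', 'C', 'D', 'E']
print(shouted)      # ['A!', 'B!', 'C!', 'D!', 'E!']
```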

Again, you have to wrap `list()` around `map()` because `map()` returns specific `map`-class object:

In [None]:
letters = ['a', 'b', 'c', 'd', 'e']
uppered_letters = map(lambda x: x.upper(), letters) # needs to be converted to list

print(uppered_letters)
print(type(uppered_letters))
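A `map` object is lazy: it computes values only when something asks for them, and it can be consumed only once. A short sketch with illustrative names:

```python
abc = ['a', 'b', 'c']
lazy_upper = map(lambda x: x.upper(), abc)   # nothing is computed yet

first_pass = list(lazy_upper)    # the values are computed here
second_pass = list(lazy_upper)   # the map object is already exhausted

print(first_pass)    # ['A', 'B', 'C']
print(second_pass)   # []
```

If you need the results more than once, store the list rather than the `map` object.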

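`map` can sometimes be faster than a list comprehension, mostly on large sequences and when you pass an existing function rather than a `lambda`; the difference is usually small, so it is worth measuring. Compare three variants with the small sequence (of length 5):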

In [None]:
%%timeit # magic command that measures the execution time of the current cell

letters = ['a', 'b', 'c', 'd', 'e']
uppered_letters = []

for l in letters:
    uppered_letters.append(l.upper())

In [None]:
%%timeit

letters = ['a', 'b', 'c', 'd', 'e']
uppered_letters = [i.upper() for i in letters]

In [None]:
%%timeit

letters = ['a', 'b', 'c', 'd', 'e']
uppered_letters = list(map(lambda x: x.upper(), letters)) # here map is not efficient

Try with large sequence (of length 100000):

In [None]:
%%timeit

numbers = list(range(100000))
doubled_numbers = []

for i in numbers:
    doubled_numbers.append(i * 2)

In [None]:
%%timeit

numbers = list(range(100000))
doubled_numbers = [i * 2 for i in numbers]

In [None]:
%%timeit

numbers = list(range(100000))
doubled_numbers = list(map(lambda x: x * 2, numbers)) # list() forces map to compute the values; without it almost nothing is measured
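Outside Jupyter, where the `%%timeit` magic is not available, a similar comparison can be made with the standard `timeit` module. This is only a sketch: the function names and the deliberately small `sample` are assumptions made for the example, and the timings depend on your machine and Python version. Note that the `map` variant is wrapped in `list()`, because a bare `map` object is lazy and does no work until it is consumed.

```python
import timeit

sample = list(range(1000))  # small on purpose so the sketch runs quickly

def with_loop():
    result = []
    for value in sample:
        result.append(value * 2)
    return result

def with_comprehension():
    return [value * 2 for value in sample]

def with_map():
    return list(map(lambda value: value * 2, sample))  # list() forces the computation

# All three variants must produce the same list, otherwise the comparison is meaningless.
assert with_loop() == with_comprehension() == with_map()

timings = {}
for func in (with_loop, with_comprehension, with_map):
    timings[func.__name__] = timeit.timeit(func, number=200)  # total seconds for 200 calls
    print(f'{func.__name__}: {timings[func.__name__]:.4f} s')
```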

### Summary

* multiple assignment is a very powerful way of unpacking collections and assigning several variables at once;
* you can go from a string to a list using the `.split()` method and back using `.join()`;
* a `for` loop allows you to apply any operation to each element of a sequence;
* a list comprehension is often more elegant and shorter than the equivalent `for` loop;
* a `while` loop is helpful when you want to repeat some operation as long as a condition holds;
* `map()` is another way to apply a function to every element; it is lazy, so wrap it in `list()` to get the values, and measure before assuming it is faster than a list comprehension.

### Practice

#### Task 1
Iterate through the list of lists and create a new list (or change the existing) where the first and the second elements of each sub-list are swapped. List of lists is pre-defined for you:

In [None]:
list_of_lists = [
    [5, 6, 1, 2],
    [8, 3, 0, 9],
    [4, 2, 3, 1],
    [1, 1, 0, 8],
    [0, 3, 3, 0]
]

for sublist in list_of_lists:
    sublist[0], sublist[1] = sublist[1], sublist[0]

print(list_of_lists)

#### Task 2
Write the code that takes a sentence, removes any `.`-s and `,`-s, splits it by whitespace and prints each word on a new line in reverse order (bonus task: after the sentence print the name of the person to whom the expression belongs):

In [None]:
example_sentence = 'Ceterum censeo, Carthaginem esse delendam.'
# your code here

#### Task 3
Write the code that takes a sentence, removes any `.`-s and `,`-s, splits it by whitespace and prints each word on a new line in reverse order (you should use the `.join()` method for this task):

In [None]:
example_sentence2 = 'Non est potestas super terram.'
# your code here

#### Task 4
Print the result of multiplication by `3` for each member of the sequence from `0` to `30` whose remainder after division by `3` is `2`:

In [None]:
# your code here

#### Task 5
Create a list of such members using a `for` loop:

In [None]:
# your code here

#### Task 6
Create a list of such members using a list comprehension:

In [None]:
# your code here

#### Task 7
For each member of the list above print the result of division of the member by its index (unless the index is `0`) using `enumerate()` function:

In [None]:
# your code here