# Notebook №5. Python programming for data collection and analysis

Performed by Movenko Konstatin, IS/b-21-2-o

## Dictionaries

Consider this problem: we have students' grades in a certain subject and want to work with this information, for example, to look up a student's grade by name. We could try to solve this problem by creating two lists, one with the names of the students and the other with their grades:

In [1]:
students = ["Vasya", "Nick", "Peter", "Anna"] # new list of strings (names)
grades = [5, 4, 2, 3] # new list of integers (marks)
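With two parallel lists, looking up a grade by name takes two steps: find the position of the name, then read the grade at the same position. A minimal sketch of that lookup (the helper name `grade_of` is hypothetical, introduced only for illustration):

```python
students = ["Vasya", "Nick", "Peter", "Anna"]
grades = [5, 4, 2, 3]

def grade_of(name):
    # list.index() scans the list from the start
    # and raises ValueError if the name is absent
    position = students.index(name)
    return grades[position]

print(grade_of("Peter"))  # 2
```

This works, but the two lists have to be kept in sync by hand, and every lookup scans the list of names.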

It would be nice if we could have a data type in which elements are numbered not by natural numbers, but by arbitrary objects. This data type exists: in Python it is called a *dictionary*.

This is how you can create a dictionary in Python:

In [2]:
gradebook = {"Vasya": 5, "Nick": 4, "Peter": 2, "Anna": 3} # new dictionary string-int

This is similar to creating a list, but there are a number of differences. First, we used curly brackets instead of square brackets to show that we are creating a dictionary. Second, the dictionary consists of entries, and each entry has two parts: a *key* and a *value*, separated by a colon. For example, we have the entry `"Anna": 3` with the key `"Anna"` and the value `3`. In total, our `gradebook` dictionary now contains four entries, whose keys are the names of students and whose values are their grades.

In [3]:
gradebook # print the dictionary

{'Vasya': 5, 'Nick': 4, 'Peter': 2, 'Anna': 3}

Note that entries in a dictionary are not numbered. Since Python 3.7 a dictionary keeps its entries in insertion order (which is why the output above matches the order in which we wrote them), but you still cannot ask for, say, the "first entry" by index. Instead, you access an entry by its key:

In [4]:
gradebook['Anna'] # print value with specific key

3

In [5]:
gradebook['Vasya'] # print value with specific key

5

You can change the value of an entry, just like you can change a list item.

In [6]:
gradebook['Anna'] = 5 # change value with specific key

In [7]:
gradebook # print the dictionary

{'Vasya': 5, 'Nick': 4, 'Peter': 2, 'Anna': 5}

You can add a new entry.

In [8]:
gradebook['Innokenty'] = 4 # add new entry

In [9]:
gradebook # print updated dictionary

{'Vasya': 5, 'Nick': 4, 'Peter': 2, 'Anna': 5, 'Innokenty': 4}

If we try to access an entry that does not exist, we get an error:

In [10]:
gradebook['Alice'] # accessing a nonexistent entry

KeyError: 'Alice'

Often we want to request an entry and, if it does not exist, get some "default value" rather than an error. To do this, use the `get()` method instead of square brackets.

In [None]:
gradebook.get('Alice') # accessing a nonexistent entry with get method

The call returned `None`, which the notebook does not display. Printing the result makes it visible:

In [11]:
print(gradebook.get('Alice')) # print what gotten by nonexistent key entry

None


In [12]:
gradebook.get('Vasya') # accessing an existent entry with get method

5

You can also pass `get()` a second argument: if the key is not in the dictionary, `get()` returns that argument instead of `None`.

In [13]:
gradebook.get('Alice', 'No such student') # get entry by specific key or default value if not exist

'No such student'

In [14]:
gradebook.get('Vasya', 'No such student') # get entry by specific key or default value if not exist

5
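A common use of the default value is counting: `get(key, 0)` returns the current count or `0` if the key has not been seen yet. The sketch below is an illustration with made-up data (the names `marks` and `counts` are not part of the notebook):

```python
marks = [5, 4, 2, 5, 4]  # illustrative data

counts = {}
for mark in marks:
    # current count (0 if the mark has not been seen yet) plus one
    counts[mark] = counts.get(mark, 0) + 1

print(counts)  # {5: 2, 4: 2, 2: 1}
```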

You can get a list of all the dictionary keys:

In [15]:
gradebook.keys() # get a list of all dictionary keys

dict_keys(['Vasya', 'Nick', 'Peter', 'Anna', 'Innokenty'])

Strictly speaking, this is not a list but a *view* object: it can be iterated over like a list, and you can make a list out of it with `list()`. The same applies to the dictionary's values.

In [16]:
gradebook.values() # get a list of all values in dictionary

dict_values([5, 4, 2, 5, 4])
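The difference between a view and a list shows up when the dictionary changes: the view follows the dictionary, while a list made from it is a snapshot. A small sketch with its own sample dictionary (the variable names are chosen here for illustration):

```python
gradebook = {"Vasya": 5, "Nick": 4}

keys_view = gradebook.keys()   # a view: stays connected to the dictionary
keys_list = list(keys_view)    # a list: a snapshot taken right now

gradebook["Anna"] = 3          # add an entry afterwards

print(keys_list)        # ['Vasya', 'Nick']
print(list(keys_view))  # ['Vasya', 'Nick', 'Anna']
```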

Dictionary keys do not have to be strings. Let's say we want to create a dictionary in which the keys are numbers. Nothing could be easier:

In [17]:
squares={1:1, 2:4, 3:9} # create a dictionary int-int

In [18]:
squares # print the dictionary

{1: 1, 2: 4, 3: 9}

In [19]:
squares[1] # get value by specific key

1

In [20]:
squares[2] # get value by specific key

4

In the previous two lines, `squares` behaves roughly like a list, but if you look closely, you can see that this is not a list, but still a dictionary.

In [21]:
squares # print the dictionary again 

{1: 1, 2: 4, 3: 9}

For example, any non-empty list has an element with index 0, but `squares` has no entry with the key `0`:

In [26]:
squares[0] # get a value by specific non-existent key 

KeyError: 0

## Entry iteration in dictionary

How do we process the information in a dictionary? To iterate through all the elements of a list, you can use a `for` loop. What happens if you give it a dictionary instead of a list? Let's try:

In [27]:
# print each dictionary's key on new line
for k in gradebook:
    print(k)

Vasya
Nick
Peter
Anna
Innokenty


Clear! The `for` loop in this case iterates through all the *keys* of our dictionary. Knowing the key, you can also get the value:

In [28]:
# print each key and its value on a new line
for k in gradebook:
    print("The Student", k, "has mark", gradebook[k])

The Student Vasya has mark 5
The Student Nick has mark 4
The Student Peter has mark 2
The Student Anna has mark 5
The Student Innokenty has mark 4


However, there is a more elegant way to get the key and the value of each entry at once: use `items()`.

In [29]:
# print each key and its value on a new line, iterating with the items() method
for k, v in gradebook.items():
    print("The Student", k, "has mark", v)

The Student Vasya has mark 5
The Student Nick has mark 4
The Student Peter has mark 2
The Student Anna has mark 5
The Student Innokenty has mark 4


How does this code work? The `items()` method returns a view (an iterable object that behaves much like a list) consisting of tuples of the form `(key, value)`.

In [30]:
list(gradebook.items()) # create list from dictionary tuples

[('Vasya', 5), ('Nick', 4), ('Peter', 2), ('Anna', 5), ('Innokenty', 4)]

In this case, the `for` loop takes the next tuple on each pass and assigns its first element (the key) to the variable `k` and its second element (the value) to the variable `v` (of course, these variables could be named differently). We have already met similar behavior when discussing `enumerate`.

This is how you can find all entries with a given value, for example, all students who received mark $4$:

In [31]:
# print all keys (students) where their value equals 4
for k, v in gradebook.items():
    if v==4:
        print(k)

Nick
Innokenty


Note that this "search by value" requires iterating over all entries in the dictionary, so if the dictionary is large, it will take a lot of time, whereas a "search by key" is still performed quickly. By the way, you can quickly check whether the dictionary has an entry with a given key:

In [32]:
"Nick" in gradebook # true if the key exists in the dictionary

True

In [33]:
"Alice" in gradebook # true if the key exists in the dictionary

False

If we want to search among the values instead, we have to say so explicitly by using the `values()` method:

In [34]:
1 in gradebook.values() # true if the value exists in the dictionary

False

In [35]:
4 in gradebook.values() # true if the value exists in the dictionary

True
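When lookups by value are needed often, one option is to build a "reverse" dictionary once and then query it by key. The sketch below is one possible way to do it, not the only one; the name `students_by_grade` is introduced here for illustration, and `setdefault()` is a standard dictionary method that returns the existing value for a key or stores and returns the given default.

```python
gradebook = {"Vasya": 5, "Nick": 4, "Peter": 2, "Anna": 5, "Innokenty": 4}

# map each grade to the list of students who received it
students_by_grade = {}
for name, grade in gradebook.items():
    # create an empty list for a new grade, then append the name to it
    students_by_grade.setdefault(grade, []).append(name)

print(students_by_grade[4])  # ['Nick', 'Innokenty']
```

Building the reverse dictionary still takes one full pass over the entries, but every later lookup by grade is a fast lookup by key.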

In general, the `in` operator is not limited to dictionaries: it can also be used, for example, with lists and ranges:

In [36]:
5 in [1, 2, 3, 5, 8] # true if the value exists in the list

True

In [37]:
6 in range(1,5) # true if the number belongs to the range

False

## Dictionary creation and `zip()` function 

There are different ways to create dictionaries. For example, you can create an empty dictionary and gradually fill it with elements:

In [38]:
my_dict = {} # create empty dictionary

In [39]:
my_dict[1] = 1 # add new entry with specific key
my_dict['hello'] = 'world' # add another new entry with another specific key

In [40]:
my_dict # print whole dictionary

{1: 1, 'hello': 'world'}

Note that elements of different types coexist perfectly well in the same dictionary (in this case, strings and integers).

You can also create a dictionary by passing the `dict()` function a list of key-value pairs (in a sense, this is the inverse of the `items()` method):

In [41]:
dict([('hello','world'), ('one', 'two')]) # create new dictionary from the specific list

{'hello': 'world', 'one': 'two'}

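Let's say we have two lists: one contains the names of students, and the other contains their grades. How can we create from these lists a dictionary in which the names are keys and the grades are values?

The first step is to pair each name with its grade using `zip()`; the resulting list of pairs is exactly the kind of argument that `dict()` accepts: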

In [42]:
# create two new lists
students = ["Vasya", "Nick", "Peter", "Anna"]
grades = [5, 4, 2, 3]

new_gradebook = list(zip(students, grades)) # create the list of pairs 
new_gradebook # print gotten list

[('Vasya', 5), ('Nick', 4), ('Peter', 2), ('Anna', 3)]
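To finish the job and obtain an actual dictionary, the pairs produced by `zip()` can be passed straight to `dict()`. A minimal sketch (the variable name `gradebook_from_lists` is introduced here only for illustration):

```python
students = ["Vasya", "Nick", "Peter", "Anna"]
grades = [5, 4, 2, 3]

# dict() accepts any iterable of (key, value) pairs, including a zip object
gradebook_from_lists = dict(zip(students, grades))

print(gradebook_from_lists)  # {'Vasya': 5, 'Nick': 4, 'Peter': 2, 'Anna': 3}
```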

This uses the handy `zip()` function, which is not limited to creating dictionaries. Like a zipper, it zips together (hence the name) multiple lists. For example, `zip()` makes a list of pairs from a pair of lists:

In [43]:
list(zip([1, 2, 3], ['a', 'b', 'c'])) # turn pair of lists into list of pairs

[(1, 'a'), (2, 'b'), (3, 'c')]

This construction can be used when we need to iterate over the elements of two related lists. For example, this is how you can display which student has which grade without using dictionaries:

In [44]:
# print "student-mark" pairs 
for student, grade in zip(students, grades):
    print(student, "has grade", grade)

Vasya has grade 5
Nick has grade 4
Peter has grade 2
Anna has grade 3


The `zip()` function can also be used with more than two lists:

In [45]:
list(zip([1, 2, 3, 4], [5, 6, 7, 8], ['a', 'b', 'c', 'd'])) # create a list of tuples consisting of three elements

[(1, 5, 'a'), (2, 6, 'b'), (3, 7, 'c'), (4, 8, 'd')]

If one of the lists is shorter than the others, `zip()` stops at the end of the shortest list and discards the remaining elements of the longer ones:

In [46]:
list(zip([1, 2, 3], ['a', 'b'])) # print list of pairs from lists with different length

[(1, 'a'), (2, 'b')]
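If dropping the extra elements is not what you want, the standard library offers `itertools.zip_longest()`, which pads the shorter inputs instead. A short sketch (using `None` as the filler is just one possible choice):

```python
from itertools import zip_longest

# pad the shorter list with the fill value instead of truncating
pairs = list(zip_longest([1, 2, 3], ['a', 'b'], fillvalue=None))

print(pairs)  # [(1, 'a'), (2, 'b'), (3, None)]
```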

## What objects can be dictionary keys

So far, we have considered dictionaries whose keys are strings and numbers. In fact, more complex objects can also be keys. For example, here is a fragment of an addition table implemented as a dictionary:

In [47]:
sums = {(2, 3): 5, (4, 1): 5, (5, 7): 12} # create a dictionary of pairs "tuple-int"

In [48]:
sums # print the dictionary

{(2, 3): 5, (4, 1): 5, (5, 7): 12}

Here, the keys are tuples consisting of two numbers, and the values are the sums of these numbers.

In [49]:
sums[(2, 3)] # get entry by specific key

5

In [50]:
sums[(4, 1)] # get entry by specific key

5

In [51]:
sums[(5, 7)] # get entry by specific key 

12

At this point, an important difference between tuples and lists shows up: lists cannot be dictionary keys, because they are mutable.

In [52]:
sums = { [1,2]: 3} # trying to create a dictionary of "list-int"

TypeError: unhashable type: 'list'
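If the data you want to use as a key happens to be stored in a list, a simple workaround is to convert it to a tuple first. A minimal sketch (the variable name `pair` is illustrative):

```python
pair = [1, 2]                # a list cannot be used as a key directly

sums = {tuple(pair): 3}      # convert it to an immutable tuple first

print(sums[(1, 2)])  # 3
```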

This concludes our brief introduction to dictionaries. Let's move on to the next topic.

## List Comprehensions

Previously, we often encountered this problem: given a list of numbers that are written as strings, create a new list in which the numbers are actual numbers. We could solve this problem using a loop:

In [53]:
str_list = ["1", "5", "12", "7"] # create a list of strings
int_list = []                    # create an empty list

# parse to int each string in str_list and append it to int_list
for s in str_list:
    int_list.append(int(s))
    
print(int_list) # print the gotten list

[1, 5, 12, 7]


Three lines are responsible for creating a new list. Writing them every time is rather dreary, and the creators of Python came up with (more precisely, borrowed from functional programming languages, and they borrowed it from mathematicians) much more elegant syntax. It's set up like this:

In [54]:
int_list = [int(s) for s in str_list] # list comprehension

The square brackets around the expression suggest that we are creating a list (because when we need to create a list, we usually enclose its elements in square brackets). The expression inside the brackets should be read literally:

*a list of `int(s)` elements for (`for`) `s` elements from (`in`) `str_list` list*

In [55]:
int_list # print gotten list of ints

[1, 5, 12, 7]

See? The quotes have disappeared: we have a list consisting of numbers. Magic! The source list `str_list` has not changed:

In [56]:
str_list # print original list of strings

['1', '5', '12', '7']

Similarly, you can apply any operation to the elements of the list. For example, let's square all the elements of `int_list`:

In [57]:
[x**2 for x in int_list] # create a list of squares

[1, 25, 144, 49]

or double all elements of the list:

In [58]:
double_list = [x*2 for x in int_list] # create a list of doubled values in int_list

In [59]:
double_list # print gotten list

[2, 10, 24, 14]

or add 1 to them:

In [60]:
[x+1 for x in int_list] # create a list of incremented values in int_list

[2, 6, 13, 8]

or cast them to floating point numbers:

In [61]:
[float(x) for x in int_list] # create a list in which each integer is cast to a floating point number

[1.0, 5.0, 12.0, 7.0]

As you can see, you can do anything with list items! However, that's not all. The list comprehension syntax also supports filtering. For example, suppose we need only those elements that are greater than 6. We can select them in this way:

In [62]:
[x for x in int_list if x > 6] # print elements of int_list which are greater than 6

[12, 7]

When we write `x for x` here, we mean that the elements of the old list go into the new list unchanged (we only choose the ones we need). But you can also modify them along the way:

In [63]:
[x**2 for x in int_list if x > 6] # print squares of the elements that are greater than 6

[144, 49]
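To see exactly what the filtering comprehension does, it may help to spell it out as an ordinary loop. The sketch below is an equivalent long form (the variable name `result` is introduced for illustration):

```python
int_list = [1, 5, 12, 7]

result = []
for x in int_list:       # the "for x in int_list" part
    if x > 6:            # the "if x > 6" filter
        result.append(x**2)  # the expression at the front of the comprehension

print(result)  # [144, 49]
```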

Now let's solve the following problem: there are two lists with numbers, and we want to find their element-wise sum.

In [64]:
# create two lists of integers
X = [2, 5, 8]
Y = [1, 3, 100]

It can be solved in this way (to sort through the elements of two lists at the same time, we use the `zip()` construct discussed above):

In [65]:
Z = [] # create empty list

# fill the list
for x, y in zip(X, Y):
    Z.append(x + y)
    
print(Z) # print the list

[3, 8, 108]


But with list comprehensions, the same code looks much prettier:

In [66]:
[x + y for x, y in zip(X, Y)] # do the same by using list comprehension

[3, 8, 108]

By the way, you can use a similar syntax, a *dictionary comprehension*, to create dictionaries:

In [67]:
squared = {i: i**2 for i in range(10)} # create a dictionary by using list comprehension syntax
squared # print gotten dictionary

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
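Dictionary comprehensions support filtering too, and they combine naturally with `items()`. As an illustration, the sketch below keeps only the students with a grade of 4 or higher (the threshold and the name `good_grades` are assumptions made for this example):

```python
gradebook = {"Vasya": 5, "Nick": 4, "Peter": 2, "Anna": 5, "Innokenty": 4}

# keep only the entries whose value passes the filter
good_grades = {name: grade for name, grade in gradebook.items() if grade >= 4}

print(good_grades)  # {'Vasya': 5, 'Nick': 4, 'Anna': 5, 'Innokenty': 4}
```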

## `map()` function

List comprehensions have an analogue that is now considered less convenient but is still sometimes encountered: the `map()` function. Recall the problem of converting a list of strings into a list of numbers:

In [68]:
str_list # print original list of strings

['1', '5', '12', '7']

This is how the conversion of these strings to integers is done using `map()`:

In [69]:
int_list = list(map(int, str_list)) # cast each string to integer by using map and create list of integers

The `map()` function takes two arguments. It takes a function as its first argument and applies this function to each element of the list passed as the second argument. In general, expressions of the form `list(map(int, str_list))` and `[int(x) for x in str_list]` are almost equivalent.

When the action to be applied already exists as a function (as in the case of `int`), the `map()` construction looks even more concise than the list comprehension. But if something less trivial needs to be done, list comprehensions are clearly simpler:

In [70]:
[int(x) + 1 for x in str_list] # create list of integers by using list comprehension

[2, 6, 13, 8]

To implement this with `map()`, you need to declare a new function that returns the value of the expression `int(x)+1` and pass it to `map()`.

In [71]:
# define the function that cast the argument to integer and increment it
def my_func(x):
  return int(x)+1

In [72]:
list(map(my_func, str_list)) # apply my_func to each element in specific list and save results to a new list

[2, 6, 13, 8]

For brevity, *lambda* functions can be used, but this approach is often considered less transparent than list comprehensions.
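For comparison, here is a sketch of the same computation written with a lambda function, next to the list comprehension it replaces (the variable names `with_lambda` and `with_comprehension` are introduced for illustration):

```python
str_list = ["1", "5", "12", "7"]

# an anonymous function passed directly to map()
with_lambda = list(map(lambda x: int(x) + 1, str_list))

# the equivalent list comprehension
with_comprehension = [int(x) + 1 for x in str_list]

print(with_lambda)         # [2, 6, 13, 8]
print(with_comprehension)  # [2, 6, 13, 8]
```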

## Some words about efficiency

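Using list comprehensions is not only nice, but also useful: they usually run faster than the equivalent explicit loop. The measurements below compare a loop, a list comprehension, and `map()` on the same task; in this test, `map()` with a ready-made function turns out to be faster still.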

In [73]:
# import specific functions from modules
from random import random
from math import sqrt

N = 10000 # store an integer
mylist = [random() for _ in range(N)] # create list of random numbers by using list comprehension

In [74]:
%%timeit

# create list of square roots of random numbers using loop
newlist = []
for x in mylist:
    newlist.append(sqrt(x))

1.1 ms ± 7.43 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [75]:
%%timeit

# do the same by using list comprehension
newlist = [sqrt(x) for x in mylist]

950 µs ± 7.28 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [76]:
%%timeit

# do the same by using map function
newlist = list(map(sqrt, mylist))

835 µs ± 4.65 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
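The `%%timeit` magic only works inside Jupyter/IPython. Outside a notebook, a similar comparison can be sketched with the standard `timeit` module. The function names below are hypothetical, and the absolute numbers will differ from machine to machine, so no expected output is shown:

```python
from math import sqrt
from random import random
from timeit import timeit

mylist = [random() for _ in range(10000)]

def with_loop():
    newlist = []
    for x in mylist:
        newlist.append(sqrt(x))
    return newlist

def with_comprehension():
    return [sqrt(x) for x in mylist]

def with_map():
    return list(map(sqrt, mylist))

RUNS = 100  # number of repetitions per measurement (an arbitrary choice)
for func in (with_loop, with_comprehension, with_map):
    total_seconds = timeit(func, number=RUNS)
    # convert the total time to microseconds per single call
    print(func.__name__, round(total_seconds / RUNS * 1e6), "microseconds per call")
```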


## Complex data structures

Lists allow you to store a sequence of values, but often you need to be able to work with more complex structures, for example, with tables. Some programming languages have *two-dimensional arrays*. An analogue of a two-dimensional array in Python is a "list of lists", that is, a list whose elements are other lists. We have already met something similar.

Consider an example: a table that records the results of several homework tasks for several students. (Let's say we assigned numbers to the students, so we don't need to know their names.) It can be written as a list of lists, for example, row by row:

In [77]:
table = [["HW1", "HW2", "HW3", "HW4"], [4, 3, 4, 4], [3, 4, 3, 4], [4, 5, 5, 4]] # create two-dimensional list

Here, each element of the `table` list is a row of our table, that is, it is also a list. For example, this is how you can find out what is written in the third row and fourth column of our table:

In [78]:
table[2][3] # get value at third row fourth column

4

What happened here? We first selected the third row of the table with

In [79]:
table[2] # get third row from the table

[3, 4, 3, 4]

And then the fourth element of this third row was selected with `[3]`. This could be written out in more detail:

In [80]:
row = table[2] # store given row to the variable
print(row[3]) # print value in the row

4


This is how you can print all the elements of the table line by line:

In [81]:
# print each row in table 
for row in table:
  print(*row)

HW1 HW2 HW3 HW4
4 3 4 4
3 4 3 4
4 5 5 4
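A list of rows makes it easy to work row by row, but sometimes you need columns, for example, to total the marks for each homework. One possible approach, sketched below, is to split off the header row and transpose the remaining rows with `zip(*rows)`; the names `header`, `rows`, and `totals` are introduced for illustration.

```python
table = [["HW1", "HW2", "HW3", "HW4"], [4, 3, 4, 4], [3, 4, 3, 4], [4, 5, 5, 4]]

header, *rows = table  # first row is the header, the rest are the marks

# zip(*rows) passes every row as a separate argument to zip(),
# so it yields the columns of the table one by one
totals = {name: sum(column) for name, column in zip(header, zip(*rows))}

print(totals)  # {'HW1': 11, 'HW2': 12, 'HW3': 12, 'HW4': 12}
```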


Now suppose that we still want to know which student received which grade. Then, instead of a list of lists, we could use a dictionary whose values are lists:

In [82]:
gradebook = {'Bill': [4, 3, 2], 'Alice': [3, 4, 5], 'Bob': [5, 5, 4]} # create a dictionary of pairs string-list

This is how you can see what grade Bob got on the second homework:

In [83]:
gradebook['Bob'][1] # print the second mark in the list with key 'Bob'

5
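A dictionary of lists combines well with the tools from earlier sections. As an illustration, the sketch below computes each student's average mark with `items()` and a dictionary comprehension (the name `averages` and the rounding to two decimal places are choices made for this example):

```python
gradebook = {'Bill': [4, 3, 2], 'Alice': [3, 4, 5], 'Bob': [5, 5, 4]}

# average mark per student, rounded to two decimal places
averages = {name: round(sum(marks) / len(marks), 2)
            for name, marks in gradebook.items()}

print(averages)  # {'Bill': 3.0, 'Alice': 4.0, 'Bob': 4.67}
```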

That's all for today! :)