# D1 - 01 - Variables and Data Structures

## Content
- How do I use a Jupyter notebook?
- What are variables and what can I do with them?
- Which data structures are available?

## Jupyter notebooks...
... are a single environment in which you can run code interactively, visualize results, and even add formatted documentation. This text, for example, sits in a **Markdown**-type cell. To run the currently highlighted cell, hold <kbd>&#x21E7; Shift</kbd> and press <kbd>&#x23ce; Enter</kbd>.

## Variables
Let's have a look at a code cell which will show us how to handle variables:

In [1]:
a = 1
b = 1.5

Click on the above cell with your mouse pointer and run it with <kbd>&#x21E7;</kbd>+<kbd>&#x23ce;</kbd>. You have now assigned the value $1$ to the variable `a` and $1.5$ to the variable `b`. By running the next cell, we print out the contents of both variables along with their types:

In [2]:
print(a, type(a))
print(b, type(b))

1 <class 'int'>
1.5 <class 'float'>


`a=1` represents an **integer** while `b=1.5` is a **floating point number**.

Next, we try to add, subtract, multiply and divide `a` and `b` and print out the result and its type:

In [3]:
c = a + b
print(c, type(c))

2.5 <class 'float'>


In [4]:
c = a - b
print(c, type(c))

-0.5 <class 'float'>


In [5]:
c = a * b
print(c, type(c))

1.5 <class 'float'>


In [6]:
c = a / b
print(c, type(c))

0.6666666666666666 <class 'float'>


Python can handle very small floats...

In [7]:
c = 1e-300
print(c, type(c))

1e-300 <class 'float'>


...as well as very big numbers:

In [8]:
c = int(1e20)
print(c, type(c))

100000000000000000000 <class 'int'>


Note that, in the last cell, we have used a type conversion: `1e20` is actually a `float`, which we cast to an `int` using the function of the same name. Let's try this again to convert between floats and integers:

In [9]:
c = float(1)
print(c, type(c))

1.0 <class 'float'>


In [10]:
c = int(1.9)
print(c, type(c))

1 <class 'int'>


We observe that an `int` can easily be made into a `float`, but in the reverse process, trailing digits are cut off without proper rounding.
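
A small sketch of the difference between cutting off digits and rounding. The variable names are only illustrative, and the built-in `round()` function is not used elsewhere in this notebook:

```python
# int() drops the fractional part (it truncates towards zero) ...
truncated_pos = int(1.9)
truncated_neg = int(-1.9)
# ... while round() returns the nearest integer
rounded = round(1.9)
print(truncated_pos, truncated_neg, rounded)  # 1 -1 2
```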

Here is another example of how Python handles rounding: we can choose between two division operators with different behavior.

In [11]:
c = 9 / 5
print(c, type(c))

1.8 <class 'float'>


In [12]:
c = 9 // 5
print(c, type(c))

1 <class 'int'>


The first version performs ordinary floating-point division, even for integer arguments. The second version performs integer (floor) division: the result is rounded down to the nearest integer, which for positive numbers again amounts to cutting off the trailing digits.
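
For negative numbers, `//` and `int()` disagree. The following sketch (with illustrative variable names) shows that `//` rounds down towards negative infinity, whereas `int()` applied to the float quotient truncates towards zero:

```python
floor_pos = 9 // 5        # 1
floor_neg = -9 // 5       # -2: rounded down, not towards zero
trunc_neg = int(-9 / 5)   # -1: int() cuts off the digits of -1.8
# with a float operand, // still rounds down but returns a float
floor_float = 9.0 // 5    # 1.0
print(floor_pos, floor_neg, trunc_neg, floor_float)
```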

The last division-related operation is the modulo division:

In [13]:
c = 3 % 2
print(c, type(c))

1 <class 'int'>
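
Integer division and modulo belong together: the quotient times the divisor plus the remainder gives back the original number. As a minimal sketch, the built-in `divmod()` function (not used elsewhere in this notebook) returns both results at once:

```python
quotient, remainder = divmod(9, 5)   # same as (9 // 5, 9 % 5)
print(quotient, remainder)           # 1 4
# reassemble the original number from both parts
check = quotient * 5 + remainder
print(check)                         # 9
```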


For exponentiation, Python provides the `**` operator:

In [14]:
c = 2**3
print(c, type(c))

8 <class 'int'>


If we want to "update" the content of a variable, e.g. add a constant, we could write

```Python
c = c + 3
```

For such cases, however, Python provides a more compact syntax:

In [15]:
c += 3
print(c)

11


The versions `-=`, `*=`, `/=`, `//=`, `%=`, and `**=` are also available.

Now we shall see how Python stores variables. We create a variable `a` and assign its value to another variable `b`:

In [16]:
a = 1
b = a
print(a, id(a))
print(b, id(b))

1 94701830363648
1 94701830363648


When we now `print` the values and `id`s of both variables, we see that `a` and `b` share the same address: both are referencing the same **object**.

In [17]:
a += 1
print(a, id(a))
print(b, id(b))

2 94701830363680
1 94701830363648


If, however, we modify `a`, we see that `a` changes its value as well as its address while `b` remains unchanged. This is because the built-in data types `float` and `int` are **immutable**: you cannot change a `float`, you can only create a new one.
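
Instead of comparing `id`s by eye, we can ask Python directly whether two names reference the same object. This sketch uses the `is` operator, which does not appear elsewhere in this notebook; the variable names are illustrative:

```python
x = 1.5
y = x
same_before = y is x    # True: both names reference one object
x += 1.0                # creates a new float and rebinds x
same_after = y is x     # False: x now references a different object
print(same_before, same_after, x, y)  # True False 2.5 1.5
```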

Here is a nice property of Python that allows easy swapping of variables:

In [18]:
print(a, b)

2 1


If we want to swap `a` and `b`, we do not need a third (swapping) variable as we can use two or even more variables on the left of the assignment operator `=`:

In [19]:
a, b = b, a
print(a, b)

1 2
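
The same pattern works for more than two variables. In this short sketch (with illustrative names), the right-hand side is evaluated completely before any assignment takes place, so three values can be rotated in one line:

```python
x, y, z = 1, 2, 3
# the right-hand side is evaluated first, then assigned left to right
x, y, z = z, x, y
print(x, y, z)  # 3 1 2
```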


Finally: text. A variable containing text has the type `str` (**string**). We use either single `'` or double `"` quotes.

In [21]:
a = 'some text...'
print(a, type(a))

some text... <class 'str'>


When we add two strings, they are simply concatenated:

In [22]:
b = a + " and some more"
print(b)

some text... and some more
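
Concatenation only works between strings. The following sketch, using made-up example values, shows that adding a number to a string raises a `TypeError` unless the number is converted with `str()` first, and that `*` repeats a string:

```python
text = 'value: '
number = 42
try:
    combined = text + number       # str + int is not allowed
except TypeError as error:
    print('TypeError:', error)
combined = text + str(number)      # convert the number first
repeated = 'ab' * 3                # repetition, as with lists
print(combined, repeated)          # value: 42 ababab
```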


### Playground: variables

Time to get creative! Create some variables, add or subtract them, cast them into other types, and get a feeling for their behavior...

In [70]:
variable = [1,2,3]
variable *=2
print(variable)

set_variable = {3,4,2}
set_2_variable = {6,7,8}
set_variable = list(set_variable)
set_2_variable = tuple(set_2_variable)
print(set_variable, set_2_variable)

int_variable = 3
int_variable *= 2
print(int_variable)

[1, 2, 3, 1, 2, 3]
[2, 3, 4] (8, 6, 7)
6


In [84]:
import numpy as np
str_var = [2,3,4]
print(str_var*2)
print(np.array(str_var)*2)


str_var_2 = [5,6,7]
print(str_var+str_var_2)
print(str_var - str_var_2)

print(str_var)

set_var = {3,1}
set_var_2 = {2,0}
#print(set_var*2)
#print(set_var+set_var_2)

[2, 3, 4, 2, 3, 4]
[4 6 8]
[2, 3, 4, 5, 6, 7]


TypeError: unsupported operand type(s) for -: 'list' and 'list'

## Data structures

Apart from the basic data types `int`, `float`, and `str`, Python provides more complex data types to store more than one value. There are several types of such structures which may seem very similar but differ significantly in their behavior:

In [92]:
a = ['one', 'two', 'three', 'four']
b = ('one', 'two', 'three', 'four')
c = {'one', 'two', 'three', 'four'}

print(a, id(a), type(a))
print(b, id(b), type(b))
print(c, id(c), type(c))

['one', 'two', 'three', 'four'] 140237876026440 <class 'list'>
('one', 'two', 'three', 'four') 140237875023128 <class 'tuple'>
{'four', 'two', 'three', 'one'} 140237885966824 <class 'set'>


Now, we have created a `list`, a `tuple`, and a `set`, each containing four strings.

We create a new variable `d` from the `list` in `a` and modify the first element of `d`:

In [93]:
d = a
print(d[0])
d[0] = 'ONE'
print(a, id(a), type(a))

one
['ONE', 'two', 'three', 'four'] 140237876026440 <class 'list'>


The `print` statement tells us that the change of `d` did change the content of `a`, but not its address. This means that a `list` is **mutable** and that `a` and `d` both point to the same, changeable object.
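
If we want an independent `list` rather than a second name for the same object, we have to copy it explicitly. A minimal sketch with illustrative variable names, using the `copy()` method of `list` (a shallow copy):

```python
original = ['one', 'two', 'three']
alias = original             # second name for the same object
shallow = original.copy()    # new list holding the same elements
alias[0] = 'ONE'             # also visible through original
shallow[1] = 'TWO'           # only changes the copy
print(original)  # ['ONE', 'two', 'three']
print(shallow)   # ['one', 'TWO', 'three']
```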

In [94]:
d += ['five']
print(a, id(a), type(a))

['ONE', 'two', 'three', 'four', 'five'] 140237876026440 <class 'list'>


We can also extend the `list` with another `list` via `+=` (as in the cell above) ...

In [95]:
d.append('six')
print(a, id(a), type(a))

['ONE', 'two', 'three', 'four', 'five', 'six'] 140237876026440 <class 'list'>


... or another element via the `append()` method.
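
Note that `+=` and `+` behave differently for a `list`. In this sketch (illustrative names), `+=` changes the existing object in place, while `+` builds a new `list` and leaves the old one untouched:

```python
first = [1, 2]
second = first
second += [3]                        # in place: first sees the change
same_after_iadd = second is first    # True
second = second + [4]                # new list, name is rebound
same_after_add = second is first     # False
print(first, second)                 # [1, 2, 3] [1, 2, 3, 4]
```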

#### Exercise

What is the difference between adding two lists and using the `append` method?

In [99]:
var = ['e', 'f']
d.append(var)
d.extend(var)
print(d)

['ONE', 'two', 'three', 'four', 'five', 'six', ['e', 'f'], ['e', 'f'], 'e', 'f']


#### Exercise

Can you access the first element of a `set` like we did for a `list`?

In [98]:
print(type(c))
c[0]

<class 'set'>


TypeError: 'set' object does not support indexing

#### Exercise

Can you modify the first element of a tuple like we did for a `list`?

In [104]:
print(b, type(b))  # a tuple is an immutable object
b[0] = 'OOONE'


('one', 'two', 'three', 'four') <class 'tuple'>


TypeError: 'tuple' object does not support item assignment

A set can be modified by adding new elements with the `add()` method; the `+` operator does not work:

In [105]:
c.add('five')
print(c, id(c), type(c))

{'three', 'four', 'two', 'five', 'one'} 140237885966824 <class 'set'>


We also observe that the address does not change: `set`s are, like `list`s, mutable. Why should we use a `set` instead of a list if we cannot access elements by index? Let's see how `list`s and `set`s behave for non-unique elements:

In [106]:
d = ['one', 'one', 'one', 'two']
print(d)

['one', 'one', 'one', 'two']


In [107]:
d = {'one', 'one', 'one', 'two'}
print(d)

{'two', 'one'}


While a `list` (like a `tuple`) preserves all elements exactly and in order, a `set` keeps only one copy of each distinct element. Thus, a `set` does not care how often an element is given, only whether it is given at all.

Finally: a `tuple` is like a `list`, but **immutable**.

What is the matter with **mutable** and **immutable** objects?

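- A **mutable** object can be changed in place, and every variable referencing it sees the change.
- An **immutable** object cannot be changed, only replaced by a newly created object; in return, it is safe to share and (if all of its elements are immutable, too) can serve as an element of a `set`. Looking up an element by index is equally cheap for a `list` and a `tuple`.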
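
One practical consequence of immutability can be sketched with a `set`: a `tuple` of numbers is accepted as an element, while a `list` with the same content is rejected with a `TypeError`. The names and values below are made up for illustration:

```python
visited = set()
visited.add((1, 2))          # a tuple of ints works as a set element
try:
    visited.add([1, 2])      # a list is mutable and therefore rejected
except TypeError as error:
    print('TypeError:', error)
print(visited)               # {(1, 2)}
```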

`list`s, `tuple`s, and `set`s can be converted into each other:

In [108]:
a_tuple = ('one', 'two', 'three')
print(a_tuple, type(a_tuple))

a_list = list(a_tuple)
print(a_list, type(a_list))

('one', 'two', 'three') <class 'tuple'>
['one', 'two', 'three'] <class 'list'>


In [109]:
a_tuple = ('one', 'two', 'three')
print(a_tuple, type(a_tuple))

a_set = set(a_tuple)
print(a_set, type(a_set))

('one', 'two', 'three') <class 'tuple'>
{'two', 'three', 'one'} <class 'set'>


In [110]:
a_list = ['one', 'two', 'three']
print(a_list, type(a_list))

a_set = set(a_list)
print(a_set, type(a_set))

['one', 'two', 'three'] <class 'list'>
{'two', 'three', 'one'} <class 'set'>


In [111]:
a_list = ['one', 'two', 'three']
print(a_list, type(a_list))

a_tuple = tuple(a_list)
print(a_tuple, type(a_tuple))

['one', 'two', 'three'] <class 'list'>
('one', 'two', 'three') <class 'tuple'>


#### Exercise

Take the given list and remove all multiple occurrences of elements, i.e., no element may occur more than once. It is not important to preserve the order of elements.

In [118]:
a_list = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
print(a_list, id(a_list))
a_list = set(a_list)
print(id(a_list))
a_list = list(a_list)
print(a_list, id(a_list))



[1, 2, 2, 3, 3, 3, 4, 4, 4, 4] 140237875995848
140237875213032
[1, 2, 3, 4] 140237876582216
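
The exercise does not require it, but if the order of first appearance did matter, one common approach is `dict.fromkeys()`, sketched here with made-up values. It relies on dictionaries (introduced further below) keeping their keys in insertion order, which is guaranteed from Python 3.7 on:

```python
values = [3, 1, 3, 2, 1]
# each value becomes a key once; keys keep their insertion order
unique_in_order = list(dict.fromkeys(values))
print(unique_in_order)  # [3, 1, 2]
```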


Let's have a closer look at indexing for `list`s and `tuple`s:

In [113]:
a = list(range(10))  # range(10) lazily produces the integers 0 to 9
b = tuple(range(10))

print(a, type(a))
print(b, type(b))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] <class 'list'>
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9) <class 'tuple'>


In both cases, we access the first element by appending `[0]` to the variable name...

In [119]:
print(a[0], b[0])

0 0


... and the second element by appending `[1]`:

In [120]:
print(a[1], b[1])

1 1


Likewise, we access the last or second to last element using `[-1]` or `[-2]`:

In [121]:
print(a[-1], b[-1])
print(a[-2], b[-2])

9 9
8 8


Using `[:5]` we get all elements up to the index $5$ (excluded)...

In [122]:
print(a[:5])

[0, 1, 2, 3, 4]


... or starting from index $5$ until the end:

In [123]:
print(a[5:])

[5, 6, 7, 8, 9]


We can give both a start and end index to access any range...

In [41]:
print(a[2:7])

[2, 3, 4, 5, 6]


... and if we add another `:` and a number $>1$, this acts as a step size:

In [124]:
print(a[2:7:2])  # the third value is the step size

[2, 4, 6]


A negative step size (with swapped start and end indices) allows us to select a range in backwards order:

In [43]:
print(a[7:2:-2])  # walk backwards from index 7 down to index 2 (excluded)

[7, 5, 3]


This always follows the same pattern: `start:end:step`.

With this `slicing`, we can easily reverse an entire `list`:

In [125]:
print(a[::-1])  # omitted start and end default to the whole sequence

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
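
Two further properties of slicing are worth a quick sketch (variable names are illustrative): a slice of a `list` is a new `list`, so changing it leaves the original untouched, and slice bounds beyond the end are silently clipped, whereas a single out-of-range index raises an `IndexError`:

```python
numbers = list(range(10))
part = numbers[2:7]
part[0] = 'changed'           # modifies only the sliced copy
print(numbers[2], part[0])    # 2 changed
clipped = numbers[8:100]      # the end index is clipped to the length
print(clipped)                # [8, 9]
try:
    numbers[100]              # a single index is not clipped
except IndexError as error:
    print('IndexError:', error)
```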


#### Exercise

Convince yourself that the above indexing patterns also work for a `tuple`.

In [136]:
tup_var = (1,2,3,4,5)
print(type(tup_var))
print(tup_var[::-1])
print(tup_var[2:5])
print(tup_var[::2])
print(tup_var[::-1])
print(tup_var[::-2])
print(tup_var[4:1:-2])

<class 'tuple'>
(5, 4, 3, 2, 1)
(3, 4, 5)
(1, 3, 5)
(5, 4, 3, 2, 1)
(5, 3, 1)
(5, 3)


A remark on strings: `str`-type objects behave like a tuple...

In [137]:
c = 'this is a sentence'
print(c, type(c))
print(c[::-1])

this is a sentence <class 'str'>
ecnetnes a si siht


... they are immutable, but elements can be (read-)accessed by index and `slicing`.
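
A short sketch of what this means in practice: assigning to a single character fails with a `TypeError`, so to "change" a string we build a new one from slices. The variable names are illustrative:

```python
sentence = 'this is a sentence'
try:
    sentence[0] = 'T'                  # strings do not allow item assignment
except TypeError as error:
    print('TypeError:', error)
capitalized = 'T' + sentence[1:]       # new string built from a slice
print(capitalized)                     # This is a sentence
```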

Let us now revisit `set`s:

In [139]:
a = set('An informative Python tutorial')
b = set('Nice spring weather')

print(a)
print(b)

{'l', 'i', 'u', 'n', 'm', 'a', 'e', 'y', 'o', 'h', 'f', 'v', 'r', 'A', 'P', 't', ' '}
{'i', 'p', 'c', 'n', 'e', 'w', 'a', 'h', 'N', 's', 'g', 'r', 't', ' '}


Each `set` stores all letters used in the above sentences (but not the number of occurrences). To make things easier to read, we pass each `set` through the `sorted()` function, which returns the letters as a sorted `list`:

In [140]:
print(sorted(a))
print(sorted(b))

[' ', 'A', 'P', 'a', 'e', 'f', 'h', 'i', 'l', 'm', 'n', 'o', 'r', 't', 'u', 'v', 'y']
[' ', 'N', 'a', 'c', 'e', 'g', 'h', 'i', 'n', 'p', 'r', 's', 't', 'w']


A very nice feature of `set`s that `list`s and `tuple`s do not have is that we can use them with the operators `&` (and), `|` (or), and `^` (exclusive or), which for `set`s compute the intersection, union, and symmetric difference.
Thus, we can easily get the intersection of two sets...

In [143]:
print(sorted(a & b)) #intersection

[' ', 'a', 'e', 'h', 'i', 'n', 'r', 't']


... their union...

In [144]:
print(sorted(a | b))  # or: union

[' ', 'A', 'N', 'P', 'a', 'c', 'e', 'f', 'g', 'h', 'i', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'v', 'w', 'y']


... or all letters which appear in only one of the two sentences:

In [147]:
print(sorted(a ^ b))

['A', 'N', 'P', 'c', 'f', 'g', 'l', 'm', 'o', 'p', 's', 'u', 'v', 'w', 'y']
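
Beyond these three operators, `set`s also support `-` for the difference, and each operator has an equivalent method. The following sketch rebuilds the two sets under new, illustrative names so that it runs on its own:

```python
first = set('An informative Python tutorial')
second = set('Nice spring weather')

# letters that occur in the first sentence but not in the second
only_in_first = first - second
print(sorted(only_in_first))

# each operator has a method counterpart
same_intersection = first.intersection(second) == (first & second)
same_union = first.union(second) == (first | second)
same_xor = first.symmetric_difference(second) == (first ^ second)
print(same_intersection, same_union, same_xor)  # True True True
```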


There is a fourth type of data structure, a `dict` (dictionary), which has some resemblance to `set`s. A `dict` contains pairs of `keys` and `values`, and for the `keys`, a `dict` behaves like a `set`, i.e., no `key` may appear more than once. The `values` can then be accessed by their `keys` instead of indices. You can create a `dict` either with the function of the same name...

In [150]:
a = dict(one=1, two=2, three=3)
print(a, type(a))

{'one': 1, 'two': 2, 'three': 3} <class 'dict'>


... or with a `set`-like syntax:

In [151]:
b = {'four': 4, 'five': 5, 'six': 6}
print(b, type(b))

{'four': 4, 'five': 5, 'six': 6} <class 'dict'>


We convince ourselves that `dict`s are mutable...

In [152]:
c = a
c.update(zero=0)
print(a, type(a))

{'one': 1, 'two': 2, 'three': 3, 'zero': 0} <class 'dict'>


... and see how we can access an element by key:

In [155]:
print(a['two'])

2
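
Accessing a key that does not exist raises a `KeyError`. As a minimal sketch (with an illustrative `dict` named `numbers`), we can test for a key with `in` or use the `get()` method, which returns `None` or a default value of our choice instead of raising an error:

```python
numbers = dict(one=1, two=2, three=3)
has_two = 'two' in numbers          # membership is tested on the keys
has_ten = 'ten' in numbers
print(has_two, has_ten)             # True False
missing = numbers.get('ten')        # None instead of an error
fallback = numbers.get('ten', 10)   # explicit default value
print(missing, fallback)            # None 10
try:
    numbers['ten']                  # direct access to a missing key fails
except KeyError as error:
    print('KeyError:', error)
```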


To get all `keys` of a `dict` we can use the `keys()` method:

In [156]:
print(a.keys())

dict_keys(['one', 'two', 'three', 'zero'])


To repeat what we have done three cells before: we can add more `key`-`value` pairs with the `update()` method which accepts the same syntax as the `dict` function as well as entire `dict` objects:

In [157]:
a.update(b)  # update() adds the entries of b to the dictionary a
print(a)

{'one': 1, 'two': 2, 'three': 3, 'zero': 0, 'four': 4, 'five': 5, 'six': 6}
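
Because no `key` may appear twice, `update()` overwrites the value of a key that already exists. A short sketch with made-up data, which also shows that a single pair can be set by plain assignment:

```python
prices = {'apple': 1.0, 'pear': 2.0}         # hypothetical example data
prices.update({'pear': 2.5, 'plum': 3.0})    # 'pear' is overwritten
print(prices)  # {'apple': 1.0, 'pear': 2.5, 'plum': 3.0}
prices['fig'] = 4.0                          # add a single key-value pair
print(len(prices))                           # 4
```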


A very useful feature of Python is that `list`s, `tuple`s, and `dict`s can be nested:

In [158]:
a = [0, 1, 2, ('one', 'two'), {5, 5, 5}]
print(a)

[0, 1, 2, ('one', 'two'), {5}]


In [159]:
b = dict(some_key=list(range(10)))
print(b)

{'some_key': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}


And, while a tuple itself is immutable, any mutable object within can be changed:

In [161]:
c = ('one', list(range(10)))  # the tuple itself cannot change, but a mutable element such as a list can be modified
print(c)
c[1].append(10)  # the address of the tuple stays the same
print(c)

('one', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
('one', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
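
To make the distinction explicit, this sketch (with illustrative names) checks that the `id` of the `tuple` is unchanged after modifying the inner `list`, and that replacing the element itself is still forbidden:

```python
pair = ('one', [0, 1, 2])
address_before = id(pair)
pair[1].append(3)                       # changes the list inside the tuple
same_object = id(pair) == address_before
print(pair, same_object)                # ('one', [0, 1, 2, 3]) True
try:
    pair[1] = [0]                       # rebinding a tuple slot is not allowed
except TypeError as error:
    print('TypeError:', error)
```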


#### Exercise

Build your own nested structure of all the data types shown in this notebook. Can you create a set which contains another set, list, tuple or dictionary?

In [177]:
new_set = dict({'two':2, 'three':3}, [1,2,3])
print(new_set['two'])

TypeError: dict expected at most 1 arguments, got 2

In [186]:
new_set_2 = {(2,3,4), (2,3,4)}

new_dict = {'key': {'two': 2, 'three': 3}}  # a dict nested inside a dict
print(new_dict['key']['two'])


2


In [170]:
new_list = [0,1,2, {3,4,5}, {'one':1, 'two':2}]
print(new_list[4])

{'one': 1, 'two': 2}
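
Regarding the question about `set`s: a sketch of what can and cannot go inside one. Immutable objects such as a `tuple` or a `str` are accepted, while a `list` or another `set` raises a `TypeError`. The built-in `frozenset` type, an immutable variant of `set` that is not covered in this notebook, can stand in for a nested `set`:

```python
nested = {(1, 2), frozenset({3, 4}), 'text'}   # all elements are immutable
print(len(nested))                             # 3
try:
    rejected = {[1, 2]}                        # a list cannot be a set element
except TypeError as error:
    print('TypeError:', error)
try:
    rejected = {{1, 2}}                        # neither can an ordinary set
except TypeError as error:
    print('TypeError:', error)
```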
