# Notebook №7. Information systems

Performed by Movenko Konstantin, IS/b-21-2-o

## Features of working with mutable data types

First, I'd like to discuss some of the subtleties when working with lists, dictionaries and other mutable data types. Let's consider a simple example of a function call.

In [1]:
# function which increments its argument and returns result
def my_func(x):
    x = x + 1
    return x

This function increments its argument by one and returns the result.

In [2]:
y = 5                                   # assign new variable y
print("Function returns", my_func(y))   # print result of prev. function with y
print("y =", y)                         # print current value of y

Function returns 6
y = 5


Nothing unexpected. The function did not change the variable `y` and wasn't supposed to. Let's now try to do something similar with a list.

In [3]:
# function which appends number 1 to the list
def other_func(my_list):
    my_list.append(1)
    return my_list

This function takes a list as an argument, appends element 1 to the end of it and returns the result. Let's try calling it:

In [4]:
this_list = [6, 9, 33]                             # create a list
print("this_list before function:", this_list)     # list before working with the function
print("function returns:", other_func(this_list))  # result of the function with a list
print("this_list after function:", this_list)      # list after working with the function

this_list before function: [6, 9, 33]
function returns: [6, 9, 33, 1]
this_list after function: [6, 9, 33, 1]


Oh. Something strange has happened. The function `other_func` modified the list that was passed to it, even though the list was created outside the function and was not declared global inside it (inside the function we were working with a different variable, `my_list`).

Why did this happen? The easiest way to understand it is to look at the visualization.

In [5]:
%load_ext tutormagic

In [6]:
%%tutor --lang python3

# function which appends number 1 to the list
def other_func(my_list):
    my_list.append(1)
    return my_list

this_list = [6, 9, 33]  # create a new list
other_func(this_list)   # call the function with the created list
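
If the visualizer is not at hand, the same fact can be checked directly in code. The following is a minimal sketch (the names `result` and `same_object` are introduced here only for illustration): the `is` operator tells us whether two names refer to the very same object.

```python
# function which appends number 1 to the list
def other_func(my_list):
    my_list.append(1)
    return my_list

this_list = [6, 9, 33]               # create a new list
result = other_func(this_list)       # the function returns the list it received

same_object = result is this_list    # True: no copy was made anywhere
print(same_object)
print(id(result) == id(this_list))   # the identities of the two names coincide
```

Both lines print `True`: the parameter `my_list`, the returned value and `this_list` are three names for one list.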

## Two sorts

Let's look at two more examples that show the difference between the two scenarios; these again involve lists. Run the two visualizations and find the difference. After that, read on.

In [7]:
%%tutor --lang python3

# function for sorting a list (with "sorted" function)
def return_sorted(my_list):
    my_list = sorted(my_list)
    return my_list

this_list = [33, 1, 55]                                  # create a list
print("this_list before function:", this_list)           # list before working with the function 
print("function returned:", return_sorted(this_list))    # result of the function with a list
print("this_list after function:", this_list)            # list after working with the function

In [8]:
#%%tutor --lang python3

# function for sorting a list (with "sort" method)
def sort_and_return(my_list):
    my_list.sort()
    return my_list

this_list = [33, 1, 55]                                    # create a list
print("this_list before function:", this_list)             # list before working with the function
print("function returned:", sort_and_return(this_list))    # result of the function with a list
print("this_list after function:", this_list)              # list after working with the function

this_list before function: [33, 1, 55]
function returned: [1, 33, 55]
this_list after function: [1, 33, 55]


So, what's the difference?

The functions `return_sorted()` and `sort_and_return()` do approximately the same job: they take a list as input, sort it and return it. At the moment the function is entered (Step 5), the situation in both fragments is identical: outside the function there is the variable `this_list`, inside the function there is the variable `my_list`, and both point to the same list. The key difference occurs in the next step. The `return_sorted()` function uses the `sorted()` function, which creates a new list, and then assigns (`=`) the result to the variable `my_list`. From then on this name refers to the new object (Step 7), while the old one (still referenced by `this_list`) remains intact. `sort_and_return()` works quite differently: it uses the `sort()` method, which sorts the list *in place*. No new object is created, no assignment is performed, and `my_list` continues to refer to the same list as before. That list just turns out to be sorted.
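
The difference between `sorted()` and `sort()` can also be seen without any functions. This is a small sketch; the names `original`, `new_list` and `result` are chosen here only for illustration.

```python
original = [33, 1, 55]

new_list = sorted(original)    # builds and returns a new list
print(new_list is original)    # False: two different objects
print(original)                # [33, 1, 55] - the original is untouched

result = original.sort()       # sorts the existing list in place
print(result)                  # None - sort() does not return the list
print(original)                # [1, 33, 55] - the same object, now sorted
```

Note that `sort()` returns `None`, so writing `my_list = my_list.sort()` would lose the list rather than sort it.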

Note the similarity between `return_sorted()` and `my_func()` from the example above: both use the assignment operator. The difference is that in the case of a numeric variable there is no other way to deal with it, because numbers are immutable, while in the case of a list the developer has a choice: they can create a new object and assign it to the old variable, or they can modify the existing object without creating a new one.

## Creating a copy

Now suppose that we want to write a function that takes a list as input, returns the same list, but with one added element, and does not change the original list itself. This can be done, for example, like this:

In [9]:
# function which appends variable to the copy of the list
def return_append(L, a):
    new_L = L.copy()
    new_L.append(a)
    return new_L

In [10]:
outer_list = [7, 54, 69]                                   # create a list
print("outer_list before function", outer_list)            # list before working with the function
print("function returned", return_append(outer_list, 55))  # result of the function with a list as argument
print("outer_list after function", outer_list)             # original list after working with the function

outer_list before function [7, 54, 69]
function returned [7, 54, 69, 55]
outer_list after function [7, 54, 69]


The main trick here is to use `L.copy()`: recall that this method creates a copy of the existing list. We then assign that copy to a new name (so `new_L` is a name for the copy of the list `L`, not for `L` itself), and we can do whatever we want with `new_L`. The original list will not change.
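
`L.copy()` is not the only way to get a new list. The sketch below shows a few equivalent spellings; the variable names are illustrative and not part of the notebook.

```python
L = [7, 54, 69]

copy_a = L.copy()      # list method
copy_b = list(L)       # list constructor
copy_c = L[:]          # slice of the whole list
extended = L + [55]    # concatenation also builds a new list

print(extended)        # [7, 54, 69, 55]
print(L)               # [7, 54, 69] - unchanged
print(copy_a is L, copy_b is L, copy_c is L)   # False False False
```

All of these copy only the outer list. If the elements are themselves lists, the copies still share those inner lists with the original.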

## Not just function calls

Problems similar to those discussed above occur not only when calling functions. Let's start with a simple example with a loop.

In [11]:
some_list = [7, 9, 11] # create a list
for x in some_list:    # attempt to change the list in a loop
    x = x + 1    
print(some_list)       # print the list after loop

[7, 9, 11]


The `some_list` list hasn't changed, and that's no surprise. But now let's take a look at a slightly more complicated situation with a list of lists.

In [12]:
table = [[1, 5], [7, 9]]  # create a list of lists (matrix)
for row in table:         # attempt to change inner lists in a loop
    row.append(77)
print(table)              # print the list after loop

[[1, 5, 77], [7, 9, 77]]


And "oh" again. What happened? Let's take a look at the visualizer.

In [13]:
%%tutor --lang python3

table = [[1, 5], [7, 9]]  # create a list of lists (matrix)
for row in table:         # attempt to change inner lists in a loop
    row.append(77)
print(table)              # print the list after loop

When the first step of the loop (Step 3) is executed, the first element of the `table` list is assigned to the `row` variable. However, this element is itself a list, or rather a reference to a list. The next step (Step 4) adds the element `77` to that list. Then `row` becomes a reference to the second element of the list `table`, and `77` is added to it as well.

Pay attention to the parallel with the previous section: here, too, a list method is called that changes the list *in place*.

## Puzzle

What do you think will happen if you execute the following code? Try to execute it, look at the result and try to explain it.

In [14]:
# create a list with 5 inner empty lists, having the same reference
A = [[]]*5

# append number 1 to the list referenced in A[0]
A[0].append(1)

# number 1 was appended to each list with this reference
print(A)

[[1], [1], [1], [1], [1]]
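
For comparison, here is a sketch of how five independent inner lists could be created. The names `shared` and `independent` are introduced here for illustration only.

```python
shared = [[]] * 5                       # five references to one and the same list
independent = [[] for _ in range(5)]    # the expression [] is evaluated five times

shared[0].append(1)
independent[0].append(1)

print(shared)                           # [[1], [1], [1], [1], [1]]
print(independent)                      # [[1], [], [], [], []]
print(shared[0] is shared[1])           # True
print(independent[0] is independent[1]) # False
```

The list comprehension creates a fresh empty list on every iteration, so changing one of them does not affect the others.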


## Changing an iterated object in a loop

In the example above, we changed the contents of the "inner" lists, but the `table` list itself remained unchanged: the number of elements stayed the same, and the elements remained references to the same row lists as before. But is it possible to change the list itself during iteration? As it turns out, it is, although in most cases it is better not to do it. Before we look at the example, let us remind you how the `pop()` method works.

In [15]:
L = [6, 9, 44, 8]   # create a list
print(L.pop())      # remove the last item from the list and print it
print(L)            # print the updated list

8
[6, 9, 44]


It removes the last item from the list and returns it. Let's apply it now as follows:

In [16]:
L = [7, 8, 9, 10]                  # create a list
for x in L:                        # loop which makes list 1 item shorter with each iteration (removing half of the items)
    print("Pop element", L.pop())  
    print(x)                       
print(L)                           # print updated list

Pop element 10
7
Pop element 9
8
[7, 8]


The loop is executed twice: by the time the loop finishes processing element 8, elements 9 and 10 from the list will have already been deleted, there will be no unprocessed items left in the list and the loop will stop.

You can guess what will happen only by looking at the code very carefully. This means that the code is not very good: by looking at good code, you can figure out what it will do.
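
As a rough illustration of what more readable code might look like, here are two sketches. The first states the stopping condition explicitly; the second iterates over a copy, so shrinking the list cannot affect the loop. The names `popped` and `M` are made up for this example.

```python
# Option 1: an explicit condition instead of relying on the shrinking list
L = [7, 8, 9, 10]
popped = []
while len(L) > 2:            # stop when two items are left
    popped.append(L.pop())
print(popped, L)             # [10, 9] [7, 8]

# Option 2: iterate over a snapshot of the list
M = [7, 8, 9, 10]
for x in M.copy():           # the copy keeps all four items
    M.pop()                  # so the loop body runs four times
print(M)                     # []
```

In both variants the number of iterations can be read directly from the code.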

The situation is different with dictionaries.

In [17]:
d = {1:2, 3:4}          # create a dictionary          
for k, v in d.items():  # iterate dictionary by its items
    del d[3]            # try to delete item from dictionary while processing (creates error)
    print(k, v)

1 2


RuntimeError: dictionary changed size during iteration

Here, the command `del d[3]` removes the element with key `3` from the dictionary. When a key is removed (or added) while the dictionary is being iterated, the iterator has no reliable way to continue, so Python forbids the operation and raises `RuntimeError` on the next step of the loop.
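
If keys really have to be removed in a loop, one common approach is to iterate over a snapshot of the keys instead of the dictionary itself. A minimal sketch, with a condition (`> 3`) chosen arbitrarily for the example:

```python
d = {1: 2, 3: 4, 5: 6}    # create a dictionary
for k in list(d):         # list(d) is a separate list of the keys
    if d[k] > 3:          # arbitrary condition for this example
        del d[k]          # safe: we are iterating the list, not the dictionary
print(d)                  # {1: 2}
```

The dictionary changes size, but the loop walks over the list created by `list(d)`, which is not affected.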

However, this does not mean that it is forbidden to change the values in a dictionary while looping over it. For example, suppose we want to add 1 to all the values. The following naive method, as expected, won't work:

In [18]:
d = {1:2, 3:4}          # create a dictionary          
for k, v in d.items():  # iterate dictionary by its items
    v = v + 1           # try to increment iterated item
print(d)                # print "updated" dictionary

{1: 2, 3: 4}


In fact, this problem should be solved in the following way:

In [19]:
d = {1:2, 3:4}        # create a dictionary          
for k in d:           # iterate dictionary by keys
    d[k] = d[k] + 1   # increment value by key
print(d)              # print updated dictionary

{1: 3, 3: 5}

## Sets

Another basic data type in Python is a set. It corresponds to the mathematical concept of a set, that is, a collection of some elements. Each element either belongs to the set or does not.

*If you are currently enrolled in our Python course, you belong to the set of its students. You can't be "twice in the course": each element can only belong to a set once.*

In [20]:
my_set = {6, 9, 11, 11, 9, 'hello'} # create a set (mentioning identical values)

In [21]:
my_set # print created set (no duplicates in the set)

{11, 6, 9, 'hello'}

As can be seen from this simple example, the elements of the set are also not ordered.

In [22]:
{6, 9, 11, 11, 9, 'hello'} == {9, 'hello', 11, 6} # comparing sets (order of elements means nothing)

True

This is how you can check whether an element is in the set.

In [23]:
9 in my_set   # check element in set (True)

True

In [24]:
10 in my_set  # check element in set (False)

False

Of course, the `in` operator works not only for sets. For example, `4 in [2, 4, 8, 10]` returns `True`. However, for lists this operation is slow, or rather *linear*: the larger the list, the more comparisons are needed to see whether a particular element is in it. For sets, the checking time practically does not grow as the number of elements increases.
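
The difference can be measured with the standard `timeit` module. The sketch below is only an illustration: the size `n`, the number of repetitions and all variable names are arbitrary choices, and the exact timings depend on the machine.

```python
import timeit

n = 100_000
big_list = list(range(n))   # a list with n elements
big_set = set(big_list)     # a set with the same elements
target = n - 1              # the last element: the worst case for the list

# run each membership test 100 times and measure the total time
list_time = timeit.timeit(lambda: target in big_list, number=100)
set_time = timeit.timeit(lambda: target in big_set, number=100)

print(f"list: {list_time:.4f} s, set: {set_time:.6f} s")
```

On a typical machine the list lookup should come out much slower here, and the gap should widen as `n` grows.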

You can perform various operations on sets; we are familiar with them from math courses. For example, the union and the intersection of two sets each give a new set.

In [25]:
{6, 8, 9} | {6, 11, 7}  # union of two sets

{6, 7, 8, 9, 11}

In [26]:
{6, 8, 9} & {6, 11, 7}  # intersection of two sets

{6}
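
Union and intersection are not the only operations available. As a brief sketch using the same two sets (the names `a` and `b` are introduced here), difference, symmetric difference and the subset test look like this; `sorted()` is used only to make the printed order predictable.

```python
a = {6, 8, 9}
b = {6, 11, 7}

print(sorted(a - b))   # difference: elements of a that are not in b -> [8, 9]
print(sorted(a ^ b))   # symmetric difference: in exactly one of the sets -> [7, 8, 9, 11]
print({6} <= a)        # subset test -> True
```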

*Note again: the order of the elements in the set is undefined. If you need to output the elements of the set in some predefined order, you can turn it into a sorted list using the `sorted()` function.*

In [27]:
s = {"Hello", "World", "Aaaaa", "Test", "Guest", "Aaaaa", "Zzzzz","Zz","Q"} # creating a set
print(s)                                                                    # print the set
print(sorted(s))  # here the output is no longer a set, but a list: note the square brackets

{'World', 'Aaaaa', 'Test', 'Zzzzz', 'Zz', 'Q', 'Guest', 'Hello'}
['Aaaaa', 'Guest', 'Hello', 'Q', 'Test', 'World', 'Zz', 'Zzzzz']


## Example of using sets

Let's say we ask a user to enter a command, but we want to give the user the ability to enter the same command in different ways. For example, to stop a program the user could enter the word `stop` or `STOP` or `Stop` or just the letter `s` or `S`. We can handle this case with several conditions connected by `or`:

In [28]:
s = 'stop'

# check a variable for equality to one of the values
if s == 'stop' or s == 'Stop' or s == 'STOP' or s == 'S' or s == 's':
    print("Okay, stopping")

Okay, stopping


Or we can create a set for all possible variations of the `stop` command and check if our command is included in this set:

In [29]:
s = 'stop'

# check a variable for equality to one of the set elements
STOPS = {'stop', 'Stop', 'STOP', 'S', 's'}
if s in STOPS:
    print("Okay, stopping")

Okay, stopping


In this particular case, though, a plain list would probably work just as well as a set.
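
Another possible approach is to normalize the input before the check, so that the set only has to contain lowercase variants. This is a sketch under that assumption; the helper `is_stop()` is hypothetical and, unlike the explicit set above, it also accepts spellings such as `sToP` and input with surrounding spaces.

```python
STOPS = {"stop", "s"}                 # only lowercase variants are needed

def is_stop(command):
    # strip surrounding whitespace and ignore letter case before the lookup
    return command.strip().lower() in STOPS

for s in ["stop", "STOP", " S ", "go"]:
    print(repr(s), is_stop(s))
```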

## A little more about the strings

I have been meaning to tell you about string methods for a long time. There are a lot of them and I won't cover them all, but we will discuss some of them now.

In [30]:
s = "hello world, hello"
new_s = s.replace("hello", "Hi") # replace one substring with another
print(new_s)                     # updated string
print(s)                         # original string

Hi world, Hi
hello world, hello


This is how, for example, you can replace a substring in a string. Note: string is an immutable data type, so, unlike list methods like `append()`, string methods never change the string itself (this is not possible at all), but instead create a new string and return the result.

If you wanted to replace only the first few occurrences (that is, only the first word `hello`, but not the second one), you could add a third argument to the `replace` method - it shows how many times the replacement should be done.

In [31]:
# replace "hello" with "Hi" in the line only 1 time
"hello world, hello".replace("hello", "Hi", 1)

'Hi world, hello'

This is how you can find a substring in a string:

In [32]:
# index() method returns the index of the first character of the substring in the string
s.index("world")

6

In [33]:
# find() method does the same
s.find("world")

6

Both methods return the index of the first character of the substring. The difference is that if `index()` cannot find the substring at all, it raises a `ValueError` exception, whereas `find()` in the same situation returns `-1`.
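
A short sketch of both behaviours for a substring that is absent (the word `planet` and the flag `raised` are chosen for this example):

```python
s = "hello world, hello"

position = s.find("planet")    # substring is absent
print(position)                # -1

raised = False
try:
    s.index("planet")          # the same search with index()
except ValueError as err:
    raised = True
    print("index() raised ValueError:", err)
```

Because `-1` is also a valid index (the last character), the result of `find()` should be compared with `-1` before it is used for indexing.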

By the way, you can also check whether a substring is contained in a string this way:

In [34]:
# check substring in the string with "in" operation
"world" in s

True

And this is how you can count how many times a substring occurs in a string:

In [35]:
# count() method counts all occurrences of substring
s.count("o")

3

## File input-output

We are starting to work with files. For now we will discuss only reading and writing. How to run files for execution is a separate story: there is the `subprocess` module for that, and we will get to it someday (maybe). Also, to begin with, we'll talk about text or text-like files (for example, Python source code and CSV files are both text). There are also binary files, which are useless to read "with your eyes"; we will talk about some of them separately.

Let's say we want to read a file.

In [36]:
f = open("func.txt") # open a text file "func.txt" for reading
s = f.read()         # read text from the file
f.close()            # close the file
print(s)             # print what was read in the file

Hello, world!
Welcome to func.txt file!
Nice to see you there!



What happened here? First, we opened the file `func.txt`, located in our current working directory, for reading. To find out which directory is the current one, you can do the following:

In [37]:
import os
os.getcwd() # get the path of the current working directory

'C:\\Users\\kosta\\Documents\\Учёба\\СДЕЛАНО\\5 семестр\\ИТ'

The `open()` function returned a file object, which we can use to work with the file. Then we read the contents of the file into the string `s`, and after that we closed the file. Closing files matters: if you forget to close a file, another application may be unable to open it (for example, to write something to it).
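
To avoid forgetting `close()`, a file can be opened in a `with` statement, which closes it automatically when the block ends. The sketch below is self-contained, so it first creates its own demo file in a temporary directory (the file name `func_demo.txt` is made up, and the `"w"` mode used to create it is covered later in this notebook).

```python
import os
import tempfile

# create a small demo file so that the example does not depend on func.txt
path = os.path.join(tempfile.mkdtemp(), "func_demo.txt")
with open(path, "w") as f:
    f.write("Hello, world!\n")

with open(path) as f:    # open the file for reading
    s = f.read()         # read text from the file
# here the block has ended and the file is already closed

print(f.closed)          # True
print(s, end="")
```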

The `read()` method reads the entire file into one big string. This is not always convenient (given that strings in Python are immutable and therefore not always efficient to work with), so there are various other scenarios for working with files. For example, you can read the contents of a file into a list, broken down by line.

In [38]:
f = open("func.txt")   # open a text file "func.txt" for reading
lines = f.readlines()  # read text from file, broken down by lines
f.close()              # close the file 

In [39]:
print(lines)           # print what was read in the file

['Hello, world!\n', 'Welcome to func.txt file!\n', 'Nice to see you there!\n']


Note that each of the lines ends with a newline character `\n`; they were present in the file and we honestly read them from it. This is how you can output a file by lines, numbering them:

In [40]:
# output lines of the file, numbering them
for i, line in enumerate(lines, 1):
    print(i, line, end="")

1 Hello, world!
2 Welcome to func.txt file!
3 Nice to see you there!


Another way to do this is not to create a separate list, but to iterate over the file object right away.

In [41]:
f = open("func.txt")             # open a text file "func.txt" for reading
for i, line in enumerate(f, 1):  # iterate the file by its lines
    print(i, line, end="")       # print the line with its number
f.close()                        # close the file 

1 Hello, world!
2 Welcome to func.txt file!
3 Nice to see you there!


This method is preferable if the file is large. In this case it may not be possible to read the whole file into memory, but it is quite possible to process it one line at a time.

There are, however, some tricks here. Consider, for example, the following code:

In [42]:
f = open("func.txt")           # open a text file "func.txt" for reading
for line in f:                 # print the file's content by lines
    print(line, end="")       
print("----The next one----")
for line in f:                 # try to iterate file's lines again
    print(line, end="")  
f.close()                      # close the file 

Hello, world!
Welcome to func.txt file!
Nice to see you there!
----The next one----


What happened here? Why didn't the second loop execute at all (nothing is printed after the line `----The next one----`)? Very simple: the variable `f`, although it pretends to be a list of lines when we iterate over it, is actually not one. In fact, an open file object remembers the position up to which the file has been read. Initially it points to the very beginning of the file, but it shifts with each iteration. Once we have read the whole file, further attempts to read something from it yield nothing: the current position has moved to the very end and the file is over.

However, it is possible to go back to the beginning: to do this, you need to use the `seek()` method.

In [43]:
f = open("func.txt")           # open a text file "func.txt" for reading
for line in f:                 # print the file's content by lines
    print(line, end="")       
print("----The next one----")
f.seek(0)                      # return the pointer of the file reading to the beginning
for line in f:                 # print the file's content by lines again
    print(line, end="")  
f.close()                      # close the file 

Hello, world!
Welcome to func.txt file!
Nice to see you there!
----The next one----
Hello, world!
Welcome to func.txt file!
Nice to see you there!
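
The same behaviour can be observed with `readlines()` and `read()`. The following sketch creates its own temporary file (the file name and its contents are made up for the example) and reads it twice through one file object.

```python
import os
import tempfile

# create a small demo file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")
with open(path, "w") as f:
    f.write("first\nsecond\n")

with open(path) as f:
    first_pass = f.readlines()    # reads everything up to the end of the file
    leftover = f.read()           # the position is at the end: nothing is left
    f.seek(0)                     # return to the beginning of the file
    second_pass = f.readlines()   # the same lines can be read again

print(first_pass)                 # ['first\n', 'second\n']
print(repr(leftover))             # ''
print(second_pass == first_pass)  # True
```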


## Writing to files

To create a file and write something to it, you need to open it *for writing*. This is done by passing a second argument to the `open` function: the string `"w"` (from *write*).

***Attention!** If the file you are trying to open for writing already exists, its previous contents **will be erased without any warning.***

You can write information to a file that is open for writing, for example, using the `write()` method.

In [44]:
f = open("other.txt", "w")  # open a text file for writing
f.write("Hello\n")          # write to a file string "Hello\n"
f.close()                   # close the file

Let's check what happened:

In [45]:
open("other.txt").read()  # open the file for reading and read it right away

'Hello\n'

We see that we have indeed written the line `Hello\n` to the file `other.txt`. Note that here we opened the file for reading and didn't assign the file object to any variable, but immediately called its `read()` method. In this case, the file will be closed automatically some time after this command is executed (Python may issue a warning that we have not closed the file explicitly; in some cases leaving files open can cause problems).
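
The warning about the `"w"` mode can be illustrated with a small sketch. It works in a temporary directory with a made-up file name, uses `with` so that every file is closed explicitly, and also shows the `"a"` (append) mode, which is not used elsewhere in this notebook and adds to the end of the file instead of erasing it.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "other_demo.txt")

with open(path, "w") as f:    # create the file and write to it
    f.write("Hello\n")

with open(path, "w") as f:    # opening with "w" again erases the old contents
    f.write("Bye\n")

with open(path) as f:
    after_w = f.read()

with open(path, "a") as f:    # "a" keeps the contents and appends to the end
    f.write("Again\n")

with open(path) as f:
    after_a = f.read()

print(repr(after_w))          # 'Bye\n'
print(repr(after_a))          # 'Bye\nAgain\n'
```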