# 2 Loops

In [None]:
import pathlib  # pathlib ships with Python (no installation needed), but we still need to import it to be able to use it in our code.

infile = pathlib.Path('UD_English-GUM', 'en_gum-ud-dev.conllu')
with open(infile, encoding='utf-8') as f:  # Stating the encoding explicitly avoids platform-dependent defaults.
    for line in f:                    # Loops over individual lines in the file.  <<<<<<<<<<<
        line = line.strip()           
        if line.startswith('#'):      # Checks if a line starts with '#'.         <<<<<<<<<<<
            continue                  
        if line:                      # Otherwise, if we read a non-empty line, ...   <<<<<<<<<<<
            print(line)               
        else:                         # The only other possible case is that we encounter an empty line.      <<<<<<<<<<
            break                     # Then we break out of the loop.            <<<<<<<<<<<

In Python, objects are "things" that have values. They are referred to in code via "expressions". Can you think of a few examples?

In contrast to objects and expressions, control structures are keywords (or combinations of keywords) that tell the Python interpreter which statement to execute next.
There are two kinds: loops and branches.

Loops execute something repeatedly. Branches execute something only under certain conditions.
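As a minimal sketch of the difference (the names `numbers`, `visited` and `large_count` are made up for illustration), the loop below runs its block once per list element, while the branch inside it only runs for some of them:

```python
numbers = [1, 2, 3, 4]   # hypothetical example data
visited = 0
large_count = 0

for n in numbers:        # loop: the indented block runs once for every element
    visited += 1
    if n > 2:            # branch: the indented block runs only if the condition is True
        large_count += 1

print(visited, large_count)  # 4 2
```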

Note: Spaces are important! Indentation shows the structure of the code.

## Blocks

A "block" is a grouping of statements.
Statements of the same block must be indented by the same number of the same type of whitespace characters (spaces or tabs).
Best practice: always stick to the same type of whitespace! Using an IDE (e.g. VSCode) makes your life easier.
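A small sketch of how indentation decides which statements belong to a block (the variable names are assumptions chosen for this example):

```python
total = 0
for number in [1, 2, 3]:
    doubled = number * 2   # indented: part of the loop's block, runs three times
    total += doubled       # same indentation, so it belongs to the same block
print(total)               # not indented: runs once, after the loop has finished -> 12
```

If the `print` line were indented like the two lines above it, it would become part of the block and run in every iteration.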

## For-loops

For-loops are used to iterate over iterable objects, usually collections like lists.

In [None]:
my_list = ['apples', 'bananas', 'citrus']  # This is a list. What does it consist of?

# For-loops are declared with the "for" and "in" keywords.
for item in my_list:   # The thing that comes between the "for" and the "in" is a variable, 
                       # which in each iteration of the loop takes the respective next value.
    print(item)        # It can be accessed inside the loop.

for item in ['apples', 'bananas', 'citrus']:  # You can also declare the list directly in the loop declaration.
    print(item)

Here is another example:

In [None]:
weekdays = ['Tuesday', 'Thursday']
for day in weekdays:
    print("Today is a", day)

## Exercise

What is the output of the following program?

In [None]:
fruits = ["apple", "banana", "melon"]
for i in range(2, 6, 2):
    for f in fruits:
        print(str(i) + " " + f + "s")

## Ranges

A special type of iterable that is very useful for looping is the range.

In [None]:
for i in range(0,5):
    print(i)

For now, you can imagine that `range(0,5)` creates a list: `[0, 1, 2, 3, 4]`

Note: `range(start, end)` - The end point is not included in the sequence.

`range(start, end, step)` - All arguments must be integers (whole numbers). `step` is the distance between consecutive values; it may be negative to count downwards.

`range(0,10,2)` produces the values `0, 2, 4, 6, 8`

`range(10,0,-2)` produces the values `10, 8, 6, 4, 2`
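Strictly speaking, `range(...)` gives you a range object rather than a list. One way to look at the values it contains is to convert it with `list(...)`, as in this short sketch (the variable names are chosen for illustration only):

```python
evens = range(0, 10, 2)
print(evens)                     # range(0, 10, 2)  (the range object itself)
print(list(evens))               # [0, 2, 4, 6, 8]

countdown = list(range(10, 0, -2))
print(countdown)                 # [10, 8, 6, 4, 2]

first_five = list(range(5))      # with a single argument, start defaults to 0
print(first_five)                # [0, 1, 2, 3, 4]
```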

As we can see in our working example at the beginning of this notebook and the ones from yesterday, file objects are iterable, too! They directly let us iterate over individual lines.
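To try this without a file on disk, the sketch below uses `io.StringIO` from the standard library as a stand-in for an opened file. The text content and the variable names are invented for this example; a real file opened with `open(...)` can be looped over in the same way.

```python
import io

# A stand-in for an opened text file with three lines (made-up content).
fake_file = io.StringIO("# a comment\nfirst line\nsecond line\n")

lines = []
for line in fake_file:           # each iteration yields one line, including its newline
    lines.append(line.strip())   # strip() removes the trailing newline

print(lines)  # ['# a comment', 'first line', 'second line']
```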

## If-statements

Another very important, and thankfully very intuitive, control structure in the working example is the if-else branch.

First, the expression after "if" is evaluated as a logical expression, resulting in a truth value, `True` or `False`.

In [None]:
# Logical operators

a = True
b = False

not a     # inverts the value (not True -> False, not False -> True)
a and b   # logical and (remember truth tables?)
a or b    # logical or

# comparison operators

x = 3
y = 4

x == y    # True if values are equal, False otherwise
x < y     # less than
x > y     # greater than
x <= y    # less than or equal
x >= y    # greater than or equal
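To connect these operators back to `if`, here is a minimal sketch in which a comparison is evaluated first and its truth value then decides which branch runs (`is_smaller` and `result` are names made up for this example):

```python
x = 3
y = 4

is_smaller = x < y               # the comparison results in a truth value
print(is_smaller)                # True

if is_smaller and not x == y:    # operators can be combined into larger expressions
    result = "x is smaller than y"
else:
    result = "x is not smaller than y"

print(result)                    # x is smaller than y
```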

## Exercises

What are the values of these expressions?

`(True or False) and False`

`1+4 >= 2*2`

```
hypothesis = True
p = False
q = True
not ((not hypothesis) or (p or q))
```

In [None]:
# Test your solutions here by copy+pasting! You can also try your own complex expressions!

Let's put it all together!

What are the values of a, b and c after executing the following piece of code?

In [None]:
a = b = 2
c = False
if not c:
    if b < a:
        b += 5
        a = b-1
    elif a < b:
        c = True
    else:
        if a+b < 4:
            c = False
        a = 11
        b = 2.2
print(a,  b,  c)

What do the following two scripts print for

a) x = True; y = True

b) x = False; y = True

c) x = True; y = False

d) x = False; y = False

In [None]:
if x:
    print("Hello")
if y:                 # This check is completely independent of the previous one.
    print("World")
else:                 # The "else" branch directly depends on the previous "if".
    print("Bye bye")

In [None]:
if x:
    print("Hello")
elif y:             # "elif" is short for else+if, so it depends on the previous "if".
    print("World")
else:
    print("Bye bye")

## While-loops

```
while expr:
   block
```

Evaluate `expr`.

If `False`: continue program after loop (next statement with same indent as `while`)

If `True`: execute the statements of the block. Then go back to the first step and evaluate `expr` again.
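The steps above can be traced with a small sketch (the names `countdown` and `steps` are chosen for illustration):

```python
countdown = 3
steps = 0

while countdown > 0:     # evaluated before every pass through the block
    print(countdown)     # prints 3, then 2, then 1
    countdown -= 1       # without this line the condition would stay True forever
    steps += 1

print("Liftoff after", steps, "steps")  # Liftoff after 3 steps
```

The block must change something that `expr` depends on; otherwise the loop never ends.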

In [None]:
a = 8
b = 1
while a > 1:
    b += 3
    a = a / 2
print(a, b)

What does this program output?

# Lists

Lists are defined with square brackets and commas: `[1, 2, 3]`

Here are a few useful things you can do with them.

In [None]:
my_list = ['apples', 'bananas', 'citrus']  # declaring a list

print(my_list[0])  # accessing the first element (counting starts at 0)
print(my_list[0:2])  # accessing the elements at indexes 0 through 1 (end index is excluded)

In [None]:
print(len(my_list))  # computing the length of the list

In [None]:
number_list = [1,3,5,3,2,3,5,6,7,5,3,2]

print(set(number_list))  # reducing a list to the set of only the uniquely occurring values

If all the values in the list are numbers, you can even do some statistics!

In [None]:
print(max(number_list))   # extracting the maximum value
print(min(number_list))   # extracting the minimum value

In [None]:
my_sum = sum(number_list)  # computing the sum of the values
my_avg = my_sum / len(number_list)  # What does this do?

print(my_sum)
print(my_avg)

Lists can be changed! 

To add something to a list, use `my_list.append(item)`. It will get added at the end.

There are a bunch of different ways to remove something from a list, depending on whether you want to get rid of a specific position (index) in the list (without knowing the value that is there), or a specific value (without knowing where it is in the list).

In [None]:
my_list.append('dolphin')
my_list.append(5)  # You can combine items of different types (e.g. strings and numbers) in the same list.

print(my_list)

In [None]:
del my_list[2]  # removes item at index 2 (third item)
print(my_list)

In [None]:
my_list.pop()  # removes last item
print(my_list)

In [None]:
new_list = [5,3,6,2,7,1,2,3,4,5]
print(new_list)

new_list.remove(3)  # removes first occurrence of the given value
print(new_list)

# Tuples

Tuples are similar to lists in many ways: They also store collections of objects of arbitrary types, keeping track of their order.
You can access individual indexes and slices from a start index to an end index in the same way.

The first obvious difference is that tuples use round parentheses rather than square brackets: `(1,2,3)`

In [None]:
my_tuple = ('apples', 'bananas', 'citrus')  # declaring a tuple

print(my_tuple[0])  # accessing the first element (counting starts at 0)
print(my_tuple[0:2])  # accessing the elements at indexes 0 through 1 (end index is excluded)

print(len(my_tuple))  # computing the length of the tuple

number_tuple = (1,3,5,3,2,3,5,6,7,5,3,2)

print(set(number_tuple))  # reducing a tuple to the set of only the uniquely occurring values

print(max(number_tuple))   # extracting the maximum value
print(min(number_tuple))   # extracting the minimum value

my_sum = sum(number_tuple)  # computing the sum of the values
my_avg = my_sum / len(number_tuple)  # What does this do?

print(my_sum)
print(my_avg)

The other major difference is that tuples cannot be changed after they have been declared!

In [None]:
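# Each of these lines fails on its own. Execution stops at the first error,
# so run them one at a time to see each message.
my_tuple.append('dolphin')  # AttributeError: tuples have no append method

del my_tuple[2]  # TypeError: tuples do not support item deletion

my_tuple.remove('apple')  # AttributeError: tuples have no remove method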
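Because a tuple cannot be changed in place, one common workaround is to build a new tuple instead. The sketch below shows two ways of doing that; the names `longer_tuple`, `as_list` and `rebuilt` are assumptions made for this example.

```python
my_tuple = ('apples', 'bananas', 'citrus')

# Option 1: concatenate with a one-element tuple (note the trailing comma).
longer_tuple = my_tuple + ('dolphin',)

# Option 2: convert to a list, change the list, convert back.
as_list = list(my_tuple)
as_list.append('dolphin')
rebuilt = tuple(as_list)

print(my_tuple)                 # ('apples', 'bananas', 'citrus')  (the original is untouched)
print(longer_tuple)             # ('apples', 'bananas', 'citrus', 'dolphin')
print(rebuilt == longer_tuple)  # True
```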

# Exercises

Extract a list of all genres from `en_gum-ud-dev.conllu`! How many unique genres are there?

In [None]:
# Your code here




# Example solution (read and understand the code first, before running it!)
#
# genres = []
# 
# with open(infile, encoding='utf-8') as f:
#     for line in f:
#         line = line.strip()
#         if line.startswith('# meta::genre'):
#             meta, genre = line.split(' = ')
#             genres.append(genre)
# 
# genres = set(genres)
# print(genres)
# print(len(genres))

How many sentences are there in `en_gum-ud-dev.conllu`? How many in `en_gum-ud-train.conllu`?

In [None]:
# Your code here






# Example solution (read and understand the code first, before running it!)
#
# infile = pathlib.Path('UD_English-GUM', 'en_gum-ud-dev.conllu')
# 
# n = 0
# 
# with open(infile, encoding='utf-8') as f:
#     for line in f:
#         line = line.strip()
#         if not line:
#         # if line.startswith('# text ='):
#             n += 1
# 
# print(n, 'sentences in dev')
# 
# infile = pathlib.Path('UD_English-GUM', 'en_gum-ud-train.conllu')
# 
# n = 0
# 
# with open(infile, encoding='utf-8') as f:
#     for line in f:
#         line = line.strip()
#         if not line:
#             n += 1
# 
# print(n, 'sentences in train')

What are the average, maximum, and minimum sentence lengths in terms of number of word tokens in `en_gum-ud-train.conllu`?

In [None]:
# Your code here

What is the average sentence length in terms of number of word tokens in `en_gum-ud-train.conllu` __per genre__?

In [None]:
# Your code here