# List comprehensions, generators, and iteration exercises

## 1. Comprehension

Convert the following for loops into comprehensions:
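
As a reminder of the pattern, here is a worked conversion of a loop that is not one of the exercises. The names `squares_loop`, `squares_comp` and `square_of` are made up for this illustration.

```python
# Loop version: collect the squares of the even numbers below 10.
squares_loop = []
for n in range(10):
    if n % 2 == 0:
        squares_loop.append(n * n)

# Equivalent list comprehension: expression first, then the loop, then the filter.
squares_comp = [n * n for n in range(10) if n % 2 == 0]

# A dict comprehension has the same shape, with a key: value expression.
square_of = {n: n * n for n in range(10) if n % 2 == 0}

assert squares_loop == squares_comp
```

Nested loops translate the same way: the `for` clauses appear in the comprehension in the same order as in the loop.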

In [None]:
l = []
for i in range(-5, 10, 2):
    l.append(i-2)

In [None]:
l = []
for i in range(100):
    if i % 10 == 4:
        l.append(i)

In [None]:
l1 = [12, 1, 0, 13, -3, -4, 0, 2]
l2 = []

for e in l1:
    if e % 2 == 1:
        l2.append(e)

In [None]:
l1 = [12, 1, 0, 13, -3, -4, 0, 2]
l2 = []

for e in l1:
    if e % 2 == 1:
        l2.append(True)
    else:
        l2.append(False)

In [None]:
l1 = [3, 5, 7, 11, 13, 17, 19]
l2 = [2, 4, 6, 8, 10]

products = []

for x in l1:
    for y in l2:
        products.append(x*y)

In [None]:
l1 = [3, 5, 7, 11, 13, 17, 19]
l2 = [2, 4, 6, 8, 10]

products = []

for x in l1:
    for y in l2:
        if (x + y) % 3 == 0:
            products.append(x*y)

In [None]:
fruits = ["apple", "plum", "pear", "avocado"]

mtx = []
for fruit in fruits:
    row = []
    for i, c in enumerate(fruit):
        row.append(c*(i+1))
    mtx.append(row)
    
mtx

In [None]:
text = "ababaacdsadb"

char_freqs = {}

for c in text:
    try:
        char_freqs[c] += 1
    except KeyError:
        char_freqs[c] = 1
        
char_freqs

In [None]:
d1 = {"a": 1, "b": 3, "c": 2}
d2 = {"a": 2, "b": 1}

d3 = {}

for key in set(d1.keys()) | set(d2.keys()):
    max_val = max(d1.get(key, 0), d2.get(key, 0))
    d3[key] = max_val

d3

## 2. Generators

The code below downloads a small sample of the Hungarian Webcorpus. The exercises in this section work with this sample.

The corpus contains one word per line, and sentence boundaries are marked by empty lines.

Each non-empty line has four columns separated by TABs:
1. original word
2. lemma (dictionary form of the word)
3. morphological analysis
4. morphological analysis candidates

Take a look at the file before continuing.

In [None]:
import os
import urllib.request

fn = 'web2-4p-9-17'
zipname = fn + '.zip'

if not os.path.exists(zipname):
    print("Downloading corpus")
    webcorp_url = "http://avalon.aut.bme.hu/~judit/resources/webcorp_parts/web2-4p-9-17.zip"
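    # URLopener is deprecated and missing from recent Python versions;
    # urlretrieve downloads the URL straight into the given file name.
    urllib.request.urlretrieve(webcorp_url, zipname)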

if not os.path.exists(fn):
    from zipfile import ZipFile
    with ZipFile(zipname) as myzip:
        myzip.extractall()
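
Once the file is extracted, one way to take a look at it is to read only its first few lines and split them into columns. This is a sketch: `split_columns` and `peek` are hypothetical helpers, the example line is a made-up placeholder rather than real corpus data, and the `utf-8` default is an assumption that may need changing for this file.

```python
from itertools import islice


def split_columns(line):
    """Split one corpus line into its TAB-separated fields."""
    return line.rstrip("\n").split("\t")


def peek(filename, n=10, encoding="utf-8"):
    """Return the first n raw lines of a text file without reading all of it."""
    with open(filename, encoding=encoding) as f:
        return list(islice(f, n))


# Made-up placeholder line, only to show the shape of the data.
example_line = "WORD\tLEMMA\tANALYSIS\tCANDIDATES\n"
word, lemma, analysis, candidates = split_columns(example_line)
```

Calling `peek(fn)` after the download cell has run returns the first ten lines of the sample.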

## 2.1. Write a generator function that yields one sentence at a time as a list of tokens. Make sure to yield the very last sentence of the file as well.

In [None]:
def read_sentences(filename):
    raise NotImplementedError  # TODO: replace with your generator body
    
sentence = next(read_sentences(fn))

assert(len(sentence) == 19)
assert isinstance(sentence, list)

sentences = read_sentences(fn)

import types
assert isinstance(sentences, types.GeneratorType)

sentences = list(sentences)

assert(len(sentences) == 90764)
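
The "very last sentence" requirement is easy to miss, so here is the general shape of a generator that groups items separated by blank lines. It is a sketch on an in-memory list, not on the corpus file; `group_by_blank_lines` is a hypothetical name, and choosing what counts as a token is left to the exercise.

```python
def group_by_blank_lines(lines):
    """Yield lists of consecutive non-blank lines; blank lines act as separators."""
    group = []
    for line in lines:
        line = line.strip()
        if line:
            group.append(line)
        elif group:
            # A blank line closes the current group, if there is one.
            yield group
            group = []
    # Without this final check the last group would be lost whenever
    # the input does not end with a blank line.
    if group:
        yield group


blocks = list(group_by_blank_lines(["a", "b", "", "c", "", "", "d", "e"]))
assert blocks == [["a", "b"], ["c"], ["d", "e"]]
```

The `elif group` test also keeps consecutive blank lines from producing empty groups.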

## 2.2. Write a generator function that yields one sentence at a time but skips short sentences. The minimum length should be a parameter of the generator that defaults to 5.

In [None]:
def read_long_sentences(filename, min_length=5):
    raise NotImplementedError  # TODO: replace with your generator body
    
sentences = read_long_sentences(fn)
assert isinstance(sentences, types.GeneratorType)

sentences = list(sentences)
assert len(sentences) == 85163

sentences = read_long_sentences(fn, 15)

sentences = list(sentences)
assert len(sentences) == 50059
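
A filtering generator with a default parameter can be sketched on plain lists as follows. `keep_long` is a hypothetical name, and treating the limit as inclusive (`>=`) is an assumption; the expected counts in the asserts above decide which comparison the exercise wants.

```python
def keep_long(items, min_length=5):
    """Yield only the items with at least min_length elements (assumed inclusive)."""
    for item in items:
        if len(item) >= min_length:
            yield item


words = [["a"], list("abcde"), list("abcdefg")]
assert list(keep_long(words)) == [list("abcde"), list("abcdefg")]
assert list(keep_long(words, min_length=6)) == [list("abcdefg")]
```

Because `keep_long` accepts any iterable, it can wrap another generator without first building a list.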

## 3. Binary search tree

Create a binary search tree class for integers. Write tests for your solution as well.

Implement the following:
- iteration protocol for the tree. Traversal should be in-order (increasing order).
- sum(tree) - sum of all the elements
- min(tree), max(tree) - smallest, largest element
- len(tree) - number of nodes
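
The built-ins in this list rely on two protocols: `sum`, `min` and `max` work on any object that defines `__iter__`, while `len` needs `__len__`. The sketch below shows this on a hypothetical singly linked list rather than on the tree itself; `Node` and `LinkedList` are illustrative names.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


class LinkedList:
    """Hypothetical container used only to illustrate the protocols."""

    def __init__(self, values=()):
        self.head = None
        self.size = 0
        # Prepend in reverse so that iteration follows the original order.
        for value in reversed(list(values)):
            self.head = Node(value, self.head)
            self.size += 1

    def __iter__(self):
        # Writing __iter__ as a generator avoids a separate iterator class.
        node = self.head
        while node is not None:
            yield node.value
            node = node.next_node

    def __len__(self):
        return self.size


numbers = LinkedList([4, 1, 7])
assert list(numbers) == [4, 1, 7]
assert sum(numbers) == 12
assert min(numbers) == 1 and max(numbers) == 7
assert len(numbers) == 3
```

Plain `assert` statements like these are one lightweight way to test a solution; a tree version would additionally check that iteration returns the elements in increasing order.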