<div style="text-align: right">
    <i>
        LIN 537: Computational Linguistics 1 <br>
        Fall 2019 <br>
        Alëna Aksënova
    </i>
</div>

# Notebook 5: list comprehensions and dictionaries

This notebook shows how to construct a list dynamically with a _list comprehension_, and then introduces a new basic Python data type: the dictionary. An optional advanced section introduces `map`.

## List comprehensions

Let's say we have the following task. We are given a list of words:

In [None]:
words = ["sky", "water", "air", "nature", "forest", "ice"]

We want to create a list that collects the lengths of the words in `words`. With the Python we have learned so far, the code could look like this:

In [None]:
lengths = []

for w in words:
    lengths.append(len(w))
print(lengths)

### Basics of list comprehensions

**List comprehensions** allow us to construct a new list while looping over another list (or any other container) that already exists in memory.
So now, instead of the following code

    new_list = []
    for item in some_container:
      new_list.append(function(item))
      
we can use a shorter, equivalent version:
    
    new_list = [function(item) for item in some_container]

In [None]:
lengths = [len(w) for w in words]
print(lengths)
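
To convince ourselves that the two versions really are interchangeable, here is a small sketch that runs both on the same `words` list and compares the results (the variable names `lengths_loop` and `lengths_comp` are just for illustration).

```python
words = ["sky", "water", "air", "nature", "forest", "ice"]

# Loop version: start with an empty list and append one length per word
lengths_loop = []
for w in words:
    lengths_loop.append(len(w))

# Comprehension version: the same expression and the same loop, in one line
lengths_comp = [len(w) for w in words]

print(lengths_loop == lengths_comp)  # True
print(lengths_comp)                  # [3, 5, 3, 6, 6, 3]
```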

Or, for example, we can create a new list where every single word from the old list would be reversed.

In [None]:
reversals = [w[::-1] for w in words]
print(reversals)

Or maybe you want to create a list where all letters in `words` are masked:

In [None]:
masked_words = ["*" * len(w) for w in words]
print(masked_words)

And, of course, it is possible to map all values of the variable `w` to something that does not depend on `w` at all:

In [None]:
masked_words_revisited = ["WORD" for w in words]
print(masked_words_revisited)

**Practice 1:** create a list that contains the last letter of every word in `words`.

**Practice 2:** You are given two lists: `words` and `new_indices`.

In [None]:
words = ["sky", "water", "air", "nature", "forest", "ice"]
new_indices = [3, 0, 5, 1, 2, 4]

Write a list comprehension that would yield the following list:

    ['nature', 'sky', 'ice', 'water', 'air', 'forest']

### Adding conditions to list comprehensions

So far, the list comprehensions we have written followed this logic:

  1. consider an item from some container;
  2. based on the value of the item, create a new item;
  3. add this new item to a new list.
  
We have the following two lists defined: `swadesh` and `words`.

In [None]:
swadesh = ["fish", "bird", "dog", "house", "tree", "seed"]
words = ["bird", "laptop", "puppy", "house", "seed", "Python"]

The task is to make a copy of `words` that will contain only the words that are also included in the Swadesh list.

In [None]:
words_copy = []
for w in words:
    if w in swadesh:
        words_copy.append(w)
print(words_copy)

It is also possible to do this with a list comprehension by adding a condition at the end. The syntax is the following:

    new_list = [function(item) for item in some_container if condition]

In [None]:
words_copy = [w for w in words if w in swadesh]
print(words_copy)
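
The `function(item)` part and the `if condition` part can be used together in one comprehension. A minimal sketch, reusing `swadesh` and `words` from above; `.upper()` is just an arbitrary choice of transformation:

```python
swadesh = ["fish", "bird", "dog", "house", "tree", "seed"]
words = ["bird", "laptop", "puppy", "house", "seed", "Python"]

# The condition at the end decides WHICH words are kept,
# the expression in front decides WHAT is stored for each kept word
shared_upper = [w.upper() for w in words if w in swadesh]
print(shared_upper)  # ['BIRD', 'HOUSE', 'SEED']
```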

**Practice.** Make a copy of the list `swadesh` that contains only the words ending in a vowel.

In [None]:
swadesh = ["fish", "bird", "dog", "house", "tree", "seed"]

# your code

### More complicated list comprehensions

It is also possible to add `else` and a second `for` loop to a list comprehension. However, the more "overloaded" a list comprehension is, the less readable it becomes.

My general advice: do not use a list comprehension if you see that it looks very scary and unreadable. :)

However, consider the following examples of list comprehensions and their "unfolded" versions.

**Example 1.** Make a copy of the list `words`. Leave the words unmasked if those words are included in the `swadesh` list, otherwise mask them as "UNK".

In [None]:
swadesh = ["fish", "bird", "dog", "house", "tree", "seed"]
words = ["bird", "laptop", "puppy", "house", "seed", "Python"]

No list comprehension:

In [None]:
words_1 = []
for w in words:
    if w in swadesh:
        words_1.append(w)
    else:
        words_1.append("UNK")
print(words_1)

With a list comprehension:

In [None]:
words_2 = [w if (w in swadesh) else "UNK" for w in words]
print(words_2)
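
Note where the `if` goes in each case. A short sketch contrasting the two forms on the same lists: a filter at the end can drop items, while `if ... else` in front produces exactly one output per input.

```python
swadesh = ["fish", "bird", "dog", "house", "tree", "seed"]
words = ["bird", "laptop", "puppy", "house", "seed", "Python"]

# `if` AFTER the `for`: a filter, so some words can be dropped
filtered = [w for w in words if w in swadesh]

# `if ... else` BEFORE the `for`: a conditional expression,
# so every word produces exactly one output item
masked = [w if w in swadesh else "UNK" for w in words]

print(len(words), len(filtered), len(masked))  # 6 3 6

# Note: [w for w in words if w in swadesh else "UNK"] is a SyntaxError,
# because a filter at the end cannot have an `else` branch.
```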

**Example 2.** Imagine we want to create a list of all letters from `words`, preserving their order.

In [None]:
letters_1 = []
for w in words:
    for letter in w:
        letters_1.append(letter)
print(letters_1)

The same can be done with a list comprehension:

In [None]:
letters_2 = [letter for w in words for letter in w]
print(letters_2)
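
The `for` clauses of a nested comprehension are written in the same order as the nested loops: outer loop first, inner loop second. A small sketch with a hypothetical two-word list, where each output records which word a letter came from:

```python
# Hypothetical two-word list, short enough to see every step
short_words = ["ab", "cd"]

# Nested loops: the outer loop is written first, the inner loop second
pairs_loop = []
for w in short_words:
    for letter in w:
        pairs_loop.append((w, letter))

# The comprehension lists its `for` clauses in exactly the same order
pairs_comp = [(w, letter) for w in short_words for letter in w]

print(pairs_comp)                # [('ab', 'a'), ('ab', 'b'), ('cd', 'c'), ('cd', 'd')]
print(pairs_loop == pairs_comp)  # True
```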

Bonus way to get the same output:

In [None]:
letters_3 = list("".join(words))
print(letters_3)

## Dictionaries

So far we know one way to uniquely identify an item within an object: _by index_. In order to be able to access something by index, objects need to be ordered.

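Instead of a position, we can also identify a value by a name, or _key_. A **dictionary** (type `dict`) is a data type that associates every _value_ with its _key_. (Since Python 3.7, dictionaries remember the order in which items were inserted, but items are still looked up by key, not by position.)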

    dictionary = {key_1:value_1, key_2:value_2, ..., key_n:value_n}
    
### Requirements for keys and values

The **keys** must be unique (of course, because they are used _instead_ of indices). The data type of the key must be _immutable_. **Immutable** objects cannot be modified directly after they are created. Remember how difficult it is to modify a string and how easy it is to modify a list? Strings are _immutable objects_, and lists are _mutable_.

The **values** can be anything, and they can be repeated as well.

In [None]:
int_keys = {37: "hello", 9: "world"}
float_keys = {48.2: "hello", 3.0: "world"}
string_keys = {"hello": "world", "goodbye": "earth"}
bool_keys = {True: "hello", False: "world"}

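However, lists are mutable, and therefore they cannot be used as keys. (More precisely, keys must be _hashable_; built-in mutable types such as lists are _unhashable_, so Python raises a `TypeError` if you try to use one as a key.)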

In [None]:
list_keys = {[1, 2]: "Hello"}  # raises a TypeError, because a list is unhashable
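
Tuples, unlike lists, are immutable, so they can serve as keys. A small sketch with a hypothetical dictionary of bigram counts (the words and counts are made up), plus the list version wrapped in `try`/`except` so that the cell still runs to the end:

```python
# Hypothetical bigram counts: tuple keys are allowed
bigram_counts = {("the", "cat"): 2, ("a", "dog"): 1}
print(bigram_counts[("the", "cat")])  # 2

# The same idea with a list key fails
try:
    bad_keys = {["the", "cat"]: 2}
    list_key_failed = False
except TypeError as err:
    list_key_failed = True
    print("TypeError:", err)

print(list_key_failed)  # True
```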

As we learned earlier, keys must be _unique_. If the same key appears several times in a dictionary definition, only the value given last is kept.

In [None]:
int_keys = {42: "hello", 0: "world", 42: "again"}
print(int_keys)

**Question:** what is the maximal size of a dictionary where all the keys are of the type `bool`?

**Dictionary with ISO 639 language codes**

For example, consider the following dictionary with some of the [ISO 639](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) codes of languages.

In [None]:
iso_639 = {'ny': ['Chichewa', 'Chewa', 'Nyanja'], 
           'zh': ['Chinese'], 
           'cs': ['Czech'], 
           'da': ['Danish'], 
           'dv': ['Divehi', 'Maldivian']}

Values can be accessed by their keys in the same way as we accessed items by their indices before:

In [None]:
print("The value of \"ny\" is", iso_639["ny"])
print("The value of \"da\" is", iso_639["da"])

**Question.** How would you access the string "Catalan"?

Adding a new item to the dictionary is extremely easy.
    
    dictionary[new_key] = new_value

In [None]:
iso_639["ru"] = ["Russian"]
print(iso_639)
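
Assigning to a key that already exists does not add a second item; it replaces the old value. A sketch on a smaller, illustrative copy of `iso_639` (the extra name "dansk", the Danish word for the language, is added only for demonstration), which also shows that `in` checks keys, not values:

```python
iso_639 = {"cs": ["Czech"], "da": ["Danish"]}

# Assigning to a key that is not there yet adds a new item
iso_639["ru"] = ["Russian"]

# Assigning to a key that already exists replaces its value;
# the dictionary does not get a second "da" item
iso_639["da"] = ["Danish", "dansk"]

print(len(iso_639))   # 3
print(iso_639["da"])  # ['Danish', 'dansk']

# `in` checks the keys, not the values
print("ru" in iso_639)       # True
print("Russian" in iso_639)  # False
```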

### Iterating over dictionaries

One thing to be aware of is that iterating over a dictionary actually iterates over its keys.

In [None]:
for language in iso_639:
    print(language, end=" ")

**Practice.** Modify the for loop above so that it produces the following output:

    ny -> ['Chichewa', 'Chewa', 'Nyanja']
    zh -> ['Chinese']
    cs -> ['Czech']
    da -> ['Danish']
    dv -> ['Divehi', 'Maldivian']
    ru -> ['Russian']

One way to iterate over the `key:value` pairs is to apply the `.items()` method to the dictionary. It yields each key together with its value as a _tuple_. (Remember `zip` and `enumerate`? Tuples are basically _immutable_ versions of lists.)

In [None]:
for pair in iso_639.items():
    print("Pair:", pair)
    print("Key:", pair[0])
    print("Value:", pair[1], "\n")
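
Instead of indexing into each tuple with `pair[0]` and `pair[1]`, the tuple can be unpacked directly in the `for` line, just like with `zip` and `enumerate`. A sketch on a smaller illustrative dictionary (the names `code`, `names`, and `summary` are my own choice):

```python
iso_639 = {"cs": ["Czech"], "dv": ["Divehi", "Maldivian"]}

summary = []
# Each item is a (key, value) tuple; unpacking it names both parts at once
for code, names in iso_639.items():
    summary.append(code + ": " + str(len(names)) + " name(s)")

print(summary)  # ['cs: 1 name(s)', 'dv: 2 name(s)']
```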

### Methods defined for dictionaries

Below are examples of the following dictionary methods and statements:
  * method `.keys()` returns a collection of keys;
  * method `.values()` returns a collection of values;
  * method `.clear()` removes all items from a dictionary;
  * statement `del` deletes an item by its key.
  
Click on [this link](https://www.programiz.com/python-programming/methods/dictionary) if you want to learn more.

First of all, it is possible to split a dictionary into separate collections of keys and values.

In [None]:
zip_codes = {"Stony Brook": [11733, 11790, 11794],
             "Port Jefferson": [11777],
             "Lake Grove": [11755, 11790]
            }

Method `.keys()` returns a collection (a _view_) of the keys of a dictionary, and it can easily be converted into a list.

In [None]:
print("Keys:            ", zip_codes.keys())
print("Keys (as a list):", list(zip_codes.keys()))

Method `.values()` returns a collection of the values, and it can also easily be converted into a list.

In [None]:
print("Values:            ", zip_codes.values())
print("Values (as a list):", list(zip_codes.values()))
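
There is one difference between the object returned by `.keys()` and its list version: the view stays connected to the dictionary, while the list is a snapshot. A small sketch (the variable names are illustrative; the data comes from `zip_codes` above):

```python
zip_codes = {"Port Jefferson": [11777],
             "Lake Grove": [11755, 11790]}

keys_view = zip_codes.keys()        # a live view of the keys
keys_list = list(zip_codes.keys())  # a snapshot copied into a new list

# Add an item after both were created
zip_codes["Stony Brook"] = [11733, 11790, 11794]

print(len(keys_view))  # 3: the view sees the new key
print(len(keys_list))  # 2: the list was copied before the change
```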

**Practice 1.** Create a list of keys in `zip_codes` using a list comprehension.

**Practice 2.** Now create a list of values in `zip_codes` using a list comprehension.

The `del` statement removes an item from a dictionary. When we looked at lists, we saw that `del` deletes an item by its index. Dictionary items do not have indices, but they do have keys, so `del` deletes an item from a dictionary _by key_.

In [None]:
del zip_codes["Stony Brook"]
print(zip_codes)
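
Running the cell above a second time fails, because `"Stony Brook"` is no longer there and `del` raises a `KeyError` for a missing key. A sketch of one way to guard against that, by checking membership with `in` first:

```python
zip_codes = {"Port Jefferson": [11777],
             "Lake Grove": [11755, 11790]}

key = "Stony Brook"
# `del zip_codes[key]` would raise a KeyError here, because the key is absent,
# so check membership first
if key in zip_codes:
    del zip_codes[key]
else:
    print(key, "is not in the dictionary")

print(zip_codes)  # {'Port Jefferson': [11777], 'Lake Grove': [11755, 11790]}
```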

Finally, method `.clear()` wipes the dictionary:

In [None]:
zip_codes.clear()
print(zip_codes)

**Example.** We are given lists `fruits` and `prices`.

In [None]:
fruits = ["banana", "apple", "apple", "peach", "kiwi", "kiwi", "kiwi"]
prices = ["$1.20", "$0.87", "$0.48", "$2.9", "$0.93", "$1.48", "$1.05"]

It means that the only observed price of a banana is $\$1.20$. In different stores, apples cost $\$0.87$ and $\$0.48$, and so on. We want to create a dictionary that will store all the prices in the following way:

    {'banana': ['$1.20'], 'apple': ['$0.87', '$0.48'], 'peach': ['$2.9'], 'kiwi': ['$0.93', '$1.48', '$1.05']}

In [None]:
fruit_prices = {}
for i in range(len(fruits)):
    if fruits[i] not in fruit_prices:
        fruit_prices[fruits[i]] = [prices[i]]
    else:
        fruit_prices[fruits[i]].append(prices[i])
print(fruit_prices)
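
The same dictionary can be built without indices by pairing the two lists with `zip`. A sketch of this alternative (not the only way to do it): each fruit gets an empty list the first time it is seen, and then every price is appended.

```python
fruits = ["banana", "apple", "apple", "peach", "kiwi", "kiwi", "kiwi"]
prices = ["$1.20", "$0.87", "$0.48", "$2.9", "$0.93", "$1.48", "$1.05"]

fruit_prices = {}
# zip pairs each fruit with the price at the same position
for fruit, price in zip(fruits, prices):
    if fruit not in fruit_prices:
        fruit_prices[fruit] = []       # first sighting: start an empty list
    fruit_prices[fruit].append(price)  # every sighting: record the price

print(fruit_prices)
```

This prints the same dictionary as the index-based version.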

**Practice.** An easy way to calculate the sum of all non-negative integers below a given integer (excluding that integer itself) is the following:

In [None]:
a = 8
print("1 + 2 + 3 + 4 + 5 + 6 + 7 =", sum(range(a)))

b = 16
print("1 + 2 + 3 + ... + 14 + 15 =", sum(range(b)))
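
As a sanity check of the arithmetic, these sums agree with the well-known closed formula $0 + 1 + \dots + (n-1) = \frac{n(n-1)}{2}$. A quick sketch comparing both ways:

```python
for n in [8, 16]:
    by_sum = sum(range(n))         # adds 0, 1, ..., n - 1 one by one
    by_formula = n * (n - 1) // 2  # closed formula; n * (n - 1) is always even
    print(n, by_sum, by_formula)   # prints 8 28 28, then 16 120 120
```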

Create a dictionary where keys will be natural numbers from $1$ to $10$. For every key $n$, its value is the sum of all numbers from $0$ up to $n-1$.

    sums = {1: 0, 2: 1, 3: 3, 4: 6, 5: 10, 6: 15, 7: 21, 8: 28, 9: 36, 10: 45}

## Advanced section: `map`

There is a way to "map" every single item of a given iterable to a new value using `map`. 
    
    new_iterable = map(function, old_iterable)
    
The parameter `function` represents any function such as `len`, `sum`, `range`, or others. `map` creates a _map_ object that can be easily converted to a list. (You can also use a function name that you will define by yourself, and we will learn how to do it soon!)

In [None]:
lengths = map(len, words)
print("Map object:", lengths)
print("As a list: ", list(lengths))
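
One thing to watch out for: a map object can be consumed only once, so converting it to a list a second time gives an empty list. A sketch showing this, the equivalent list comprehension, and `map` with a string method (`str.upper`, chosen just for illustration):

```python
words = ["bird", "laptop", "puppy", "house", "seed", "Python"]

lengths = map(len, words)
first = list(lengths)   # consumes the map object
second = list(lengths)  # nothing left: a map object can be used only once

print(first)   # [4, 6, 5, 5, 4, 6]
print(second)  # []

# The same result with a list comprehension
print(first == [len(w) for w in words])  # True

# Methods work too, e.g. str.upper applied to every word
print(list(map(str.upper, words)))  # ['BIRD', 'LAPTOP', 'PUPPY', 'HOUSE', 'SEED', 'PYTHON']
```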

**A very advanced question:** You are given the `initial_range`.

In [None]:
initial_range = list(range(12))
print(initial_range)

Find a way to create the following list (how it is laid out on the screen doesn't matter):

        [[],
         [0],
         [0, 1],
         [0, 1, 2],
         [0, 1, 2, 3],
         [0, 1, 2, 3, 4],
         [0, 1, 2, 3, 4, 5],
         [0, 1, 2, 3, 4, 5, 6],
         [0, 1, 2, 3, 4, 5, 6, 7],
         [0, 1, 2, 3, 4, 5, 6, 7, 8],
         [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
         [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
         
You might need to use `map` twice!

# Homework 5

**Due on Sunday, October 6th, 11.59pm**

Send your notebook (don't forget to save your solutions!) to <alena.aksenova@stonybrook.edu> with the subject **\[CompLing1\] Homework 5**.

**Problem 1.** Below you will find several code snippets that build lists. Rewrite them using list comprehensions.

**Subproblem A.** No list comprehension:

In [None]:
words = ["apple", "banana", "kiwi", "orange"]
masked_words = []

for w in words:
    masked_words.append("UNK")
    
print(masked_words)

List comprehension:

In [None]:
# your code

**Subproblem B.** No list comprehension:

In [None]:
names = ["andrew", "Mary", "jimmy", "Noam"]
title_names = []

for name in names:
    if name.istitle():
        title_names.append(name)
        
print(title_names)

List comprehension:

In [None]:
# your code

**Subproblem C.** No list comprehension:

In [None]:
numbers = range(30)
even = []

for i in numbers:
    if i % 2 == 0:
        even.append(i)
        
print(even)

List comprehension:

In [None]:
# your code

**Problem 2.** You are given the following lists: `text` and `words`.

In [None]:
text = ['a', 'infinity', 'reflection', 'with', 'like', 'big', 'briefly', 'into', 'children', 'which', 
        'fruit', 'picking', 'there', 'try', 'little', 'around', 'appearances', 'appeared', 'all', 
        'crossed', 'basis', 'improbability', 'their', 'discworld', 'black', 'to', 'death', 'future', 
        'only', 'my', 'robe', 'things', 'for', 'it', 'existed', 'said', 'sake', 'sometimes', 'right', 
        'way', 'that', 'country', 'chessboard', 'quoth', 'well', 'domestic', 'skull', 'wonderful', 
        'hooded', 'or', 'empty', 'bottom', 'mirror', 'himself', 'rather', 'over', 'every', 'triangle', 
        'roses', 'border', 'orbiting', 'was', 'from', 'show', 'be', 'pecked', 'bones', 'just', 'universe', 
        'me', 'triangular', 'gets', 'worth', 'have', 'climbed', 'service', 'fluttered', 'top', 'but', 
        'grey', 'claws', 'at', 'rats', 'creep', 'own', 'pattern', 'point', 'white', 'than', 'dark', 
        'therefore', 'frame', 'this', 'not', 'the', 'could', 'mind', 'turtle', 'scrabble', 'better', 
        'industries', 'looked', 'an', 'cherubs', 'life', 'anything', 'more', 'small', 'and', 'of', 'his', 
        'on', 'skulls', 'elephants', 'in', 'thoughts', 'seen', 'nearest', 'expectantly', 'other', 'side', 
        'shape', 'total', 'so', 'world', 'look', 'sun']

words = ["shape", "lingusitics", "every", "even", "world", "chessboard", "water", "sake"]

Create a dictionary where the words from the list `words` are the keys, and their values are `True` or `False` depending on whether the word appears in `text`. Expected output:
    
    {'shape': True, 'lingusitics': False, 'every': True, 'even': False, 'world': True, 
     'chessboard': True, 'water': False, 'sake': True}

**Problem 3. (optional)** You are given the following list of numbers:

In [None]:
numbers = [1, 2, 3, 4, 5]

Find a way to get the following output:

    [2.0, 1.5, 1.3333333333333333, 1.25]
    
Notice that $2 / 1 = 2$, $3 / 2 = 1.5$, and so on. Use a list comprehension if you can.