# Dictionaries
A dictionary is a collection of *`key-value`* pairs; we use dictionaries to map a key to its value.


In [20]:
point = { 'x': 1,
          'y': 2 }

print(type(point))
print(point)

<class 'dict'>
{'x': 1, 'y': 2}


In the example above, each key is a string and each value is an integer. We can also use the built-in `dict` function to create new dictionaries.

In [21]:
point = dict(x=1, y=2)
print(type(point))
print(point)

<class 'dict'>
{'x': 1, 'y': 2}


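Only hashable objects can be used as a `key`. In practice this means immutable types such as numbers, strings, and tuples (provided the tuple's elements are themselves hashable); most often we use strings and numbers. A value can be of any type.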

In [22]:
english_to_spanish = dict(one = 'uno', two = 'dos', three = 'tres')
print(english_to_spanish)


{'one': 'uno', 'two': 'dos', 'three': 'tres'}
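
To illustrate the restriction on key types mentioned above, here is a minimal sketch. The dictionaries `grid` and `mixed` and their contents are made up for illustration; the point is only that tuples, strings, and numbers work as keys while a list does not.

```python
# Keys must be hashable: strings, numbers, and tuples of hashable items all work.
grid = {(0, 0): 'origin', (1, 2): 'point A'}   # tuple keys (hypothetical data)
mixed = {'name': 'Ada', 42: 'answer', 3.5: 'ratio'}

print(grid[(1, 2)])
print(mixed[42])

# A list is mutable and therefore unhashable, so it cannot be used as a key.
try:
    bad = {[1, 2]: 'point A'}
except TypeError as error:
    error_message = str(error)
    print('TypeError:', error_message)
```

The exact wording of the `TypeError` message differs between Python versions, so it is not reproduced here.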


Since Python 3.7, dictionaries preserve insertion order, so the key-value pairs are printed in the order in which they were added. In Python 3.5 and earlier the order was unpredictable, so code that must run on old versions should not rely on it.

In a dictionary, the key plays the role of the index. Because a dictionary is a collection of key-value pairs, we access an item by its key rather than by a numeric position as we do with a list.

In [23]:
point = dict(x=1, y=2)
print(point['x']) 
print(point['y'])

1
2


If the key isn’t in the dictionary, you get an exception (`KeyError`):

In [24]:
point = dict(x=1, y=2)
print(point['z']) 

KeyError: 'z'

There are two ways to avoid raising an exception.

1. One solution is to check for the existence of a `key` using the `in` operator.

2. The other solution is to use the `get` method. If the key doesn't exist, `get()` returns `None`, or a *default value* if one is specified as the second argument.

In [None]:
# using 'in' operator
point = dict(x=1, y=2)

# Entering invalid key
if 'z' in point:
    print(point['z'])

# Entering valid key
if 'x' in point:
    print(point['x'])

1


In [None]:
# using 'get()' method
point = dict(x=1, y=2)

# Entering invalid key
print(point.get('z'))

# Entering valid key
print(point.get('x'))

None
1
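
The `get` cell above relies on the implicit `None`. As a small sketch of the second argument, the following passes an explicit default of `0` (an arbitrary choice for illustration):

```python
point = dict(x=1, y=2)

# 'z' is missing, so get() returns the supplied default instead of None
z_value = point.get('z', 0)
print(z_value)        # 0

# 'x' exists, so the default is ignored
x_value = point.get('x', 0)
print(x_value)        # 1

# get() never adds the missing key to the dictionary
print('z' in point)   # False
```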


The `len` function works on dictionaries; it returns the number of key-value pairs:

In [None]:
point = dict(x=1, y=2)
print("Number of key-value pairs: ",len(point))

Number of key-value pairs:  2


We can also use the `keys` method to retrieve the keys of a dictionary. This is helpful if you want to sort the keys in alphabetical order: you first make a list of the keys using the `keys` method available on dictionary objects, and then sort that list.

In [None]:
counts = dict(chuck = 1, annie = 42, jan = 100)
lst = list(counts.keys())
lst.sort()
print(lst)


['annie', 'chuck', 'jan']


To see whether something appears as a value in a dictionary, you can use the method `values`, which returns the values as a type that can be converted to a list, and then use the `in` operator:

In [None]:
point = dict(x=1, y=2)

coordinates = point.values()
print(type(coordinates))

coordinates = list(coordinates)
print(type(coordinates))

print(1 in coordinates)

<class 'dict_values'>
<class 'list'>
True


The `in` operator uses different algorithms for lists and dictionaries. For lists, it uses a *`linear search algorithm`*: as the list gets longer, the search time grows in direct proportion to its length. Dictionaries are implemented with a data structure called a *`hash table`*, which has a remarkable property: the `in` operator takes about the same amount of time no matter how many items the dictionary holds.
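
One rough way to observe this difference is to time the same membership test on a list and on a dictionary using the standard `timeit` module. This is only a sketch: the size `n`, the repetition count, and the variable names are arbitrary choices, and the measured times depend on your machine. Note that `in` on a dictionary tests the keys, not the values.

```python
import timeit

n = 100_000                              # collection size (arbitrary choice)
numbers_list = list(range(n))
numbers_dict = dict.fromkeys(range(n))   # keys 0..n-1, every value is None

target = n - 1                           # worst case for the list: the last element

# Both containers agree on the answer; only the cost differs.
in_list = target in numbers_list
in_dict = target in numbers_dict
print(in_list, in_dict)

# Time 100 repetitions of each membership test.
list_time = timeit.timeit(lambda: target in numbers_list, number=100)
dict_time = timeit.timeit(lambda: target in numbers_dict, number=100)
print(f"list: {list_time:.6f} s, dict: {dict_time:.6f} s")
```

Increasing `n` should make the list timing grow roughly in proportion, while the dictionary timing should stay about the same.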

## Loops and Dictionaries
If you use a dictionary as the sequence in a `for` statement, it traverses the keys of the dictionary.

In [None]:
counts = dict(chuck = 1, annie = 42, jan = 100)
for key in counts:
    print(key, counts[key])

chuck 1
annie 42
jan 100


You can also use the `items` method of a dictionary to retrieve each key together with its value in a `for` loop. The `items` method yields each key-value pair as a tuple, which you can unpack into a key and a value.

In [None]:
counts = dict(chuck = 1, annie = 42, jan = 100)
for key, value in counts.items():
    print(key, value)

chuck 1
annie 42
jan 100


If we wanted to find all the entries in a dictionary with a value above 10, we could write the following code:

In [None]:
counts = dict(chuck = 1, annie = 42, jan = 100)
for key, value in counts.items():
    if value > 10:
        print(key, value)

annie 42
jan 100


If you want to print the keys in alphabetical order, you first make a list of the keys in the dictionary using the `keys` method available in dictionary objects, and then sort that list and loop through the sorted list, looking up each key and printing out key-value pairs in sorted order as follows:

In [None]:
counts = dict(chuck = 1, annie = 42, jan = 100)
keys = list(counts.keys())
print(keys)
keys.sort()
for key in keys:
    print(key, counts[key])

['chuck', 'annie', 'jan']
annie 42
chuck 1
jan 100


First you see the list of keys in unsorted order that we get from the `keys` method. Then we see the key-value pairs in order from the `for` loop.
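
A more compact variant of the same idea uses the built-in `sorted` function, which returns a new sorted list from any iterable. This is a sketch of an alternative, not a replacement for the approach above; the name `sorted_keys` is introduced only for illustration.

```python
counts = dict(chuck=1, annie=42, jan=100)

# sorted() accepts any iterable; iterating a dictionary yields its keys
sorted_keys = sorted(counts)
for key in sorted_keys:
    print(key, counts[key])
```

This prints the same three key-value pairs in alphabetical order without building and sorting the list in separate steps.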

## Dictionary as a Set of Counters
Suppose you are given a string and you want to count how many times each letter appears. 

You could create a dictionary with characters as keys and counters as the corresponding values. The first time you see a character, you would add an item to the dictionary. After that you would increment the value of an existing item.

An advantage of this implementation is that we don’t have to know ahead of time which letters appear in the string, and we only have to make room for the letters that do appear.

Here is what the code might look like:

In [None]:
word = 'brontosaurus'
characters_count = dict()  # empty dictionary

for char in word:
    if char not in characters_count:
        characters_count[char] = 1
    else:
        characters_count[char] += 1

print(characters_count)

{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}


The `for` loop traverses the string. Each time through the loop, if the character (`char`) is not in the dictionary, we create a new item with key `char` and the initial value 1 (since we have seen this letter once). If `char` is already in the dictionary we increment `characters_count[char]`.

We can also use the `get` method to write our loop more concisely. Because the `get` method automatically handles the case where a key is not in a dictionary, we can reduce four lines down to one and eliminate the `if` statement.

In [None]:
word = 'brontosaurus'
characters_count = dict() 

for char in word:
    characters_count[char] = characters_count.get(char, 0) + 1 # 0 is a default value

print(characters_count)

{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}
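
For comparison, the standard library also ships a ready-made counter: `collections.Counter`, a `dict` subclass that performs this kind of tallying. The sketch below is an alternative to the hand-written loops above, not the approach this chapter builds on; the name `letter_counts` is chosen for illustration.

```python
from collections import Counter

word = 'brontosaurus'
letter_counts = Counter(word)   # counts each character of the string

print(letter_counts['r'])       # 2
print(letter_counts['z'])       # 0: missing keys count as 0 instead of raising KeyError
print(dict(letter_counts))
```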


## Dictionaries and Files
One of the common uses of a dictionary is to count the occurrence of words in a file with some written text. Let’s start with a very simple file of words taken from the text of Romeo and Juliet.

For the first set of examples, we will use a shortened and simplified version of the text with no punctuation (`romeo.txt` in the `Text Files` folder). Later we will work with the text of the scene with punctuation included.



In [None]:
path = "Text Files/romeo.txt"
file_handle = open(path)

word_counts = dict()  # maps each word to the number of times it appears

for line in file_handle:
    words = line.split()
    for word in words:
        if word not in word_counts:
            word_counts[word] = 1
        else:
            word_counts[word] += 1

file_handle.close()
print(word_counts)

{'But': 1, 'soft': 1, 'what': 1, 'light': 1, 'through': 1, 'yonder': 1, 'window': 1, 'breaks': 1, 'It': 1, 'is': 3, 'the': 3, 'east': 1, 'and': 3, 'Juliet': 1, 'sun': 2, 'Arise': 1, 'fair': 1, 'kill': 1, 'envious': 1, 'moon': 1, 'Who': 1, 'already': 1, 'sick': 1, 'pale': 1, 'with': 1, 'grief': 1}


We can rewrite the above program using the `get` method:

In [10]:
path = "Text Files/romeo.txt"
file_handle = open(path)

word_counts = dict()  # maps each word to the number of times it appears

for line in file_handle:
    words = line.split()
    for word in words:
        word_counts[word] = word_counts.get(word, 0) + 1

file_handle.close()
print(word_counts)

{'But': 1, 'soft': 1, 'what': 1, 'light': 1, 'through': 1, 'yonder': 1, 'window': 1, 'breaks': 1, 'It': 1, 'is': 3, 'the': 3, 'east': 1, 'and': 3, 'Juliet': 1, 'sun': 2, 'Arise': 1, 'fair': 1, 'kill': 1, 'envious': 1, 'moon': 1, 'Who': 1, 'already': 1, 'sick': 1, 'pale': 1, 'with': 1, 'grief': 1}


We have two for loops. The outer loop is reading the lines of the file and the inner loop is iterating through each of the words on that particular line. The combination of the two nested loops ensures that we will count every word on every line of the input file.

### Advanced Text Parsing
In the above example using the file `romeo.txt`, we made the file as simple as possible by removing all punctuation by hand. The actual text has lots of punctuation (see `romeo_punc.txt`).

In [3]:
path = "Text Files/romeo_punc.txt"
file_handle = open(path)

for line in file_handle:
    print(line.rstrip())

file_handle.close()

But, soft! what light through yonder window breaks?
It is the east, and Juliet is the sun.
Arise, fair sun, and kill the envious moon,
Who is already sick and pale with grief,


Since the Python `split` method looks for spaces and treats words as tokens separated by spaces, we would treat the words “soft!” and “soft” as *different* words and create a separate dictionary entry for each word.

Also since the file has capitalization, we would treat “who” and “Who” as different words with different counts.

We can solve both these problems by using the string methods `lower` and `translate` together with the `string.punctuation` constant from the `string` module. Of these, `translate` is the most subtle. Here is the documentation for `translate`:

`line.translate(str.maketrans(fromstr, tostr, deletestr))`

*Replace the characters in `fromstr` with the character in the same position in `tostr` and delete all characters that are in `deletestr`. The `fromstr` and `tostr` can be empty strings and the `deletestr` parameter can be omitted.*

We will leave `fromstr` and `tostr` empty and use the `deletestr` parameter to delete all of the punctuation.
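
Before applying this to the file, it may help to see `translate` and `str.maketrans` on a single string. The sketch below uses the first line of the text as sample input; the names `delete_punctuation`, `cleaned`, `swap`, and `swapped` are introduced only for illustration.

```python
import string

line = 'But, soft! what light through yonder window breaks?'

# Empty fromstr and tostr: nothing is replaced, punctuation is only deleted
delete_punctuation = str.maketrans('', '', string.punctuation)
cleaned = line.translate(delete_punctuation)
print(cleaned)   # But soft what light through yonder window breaks

# Non-empty fromstr and tostr: replace 'a' with '4' and 'e' with '3'
swap = str.maketrans('ae', '43')
swapped = 'breaks'.translate(swap)
print(swapped)   # br34ks
```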


In [13]:
import string
path = "Text Files/romeo_punc.txt"
file_handle = open(path)

word_counts = dict()  # maps each word to the number of times it appears

for line in file_handle:
    # Delete every punctuation character, then normalize the case
    line = line.translate(str.maketrans('', '', string.punctuation))
    line = line.lower()
    words = line.split()
    for word in words:
        word_counts[word] = word_counts.get(word, 0) + 1

file_handle.close()
print(word_counts)

{'but': 1, 'soft': 1, 'what': 1, 'light': 1, 'through': 1, 'yonder': 1, 'window': 1, 'breaks': 1, 'it': 1, 'is': 3, 'the': 3, 'east': 1, 'and': 3, 'juliet': 1, 'sun': 2, 'arise': 1, 'fair': 1, 'kill': 1, 'envious': 1, 'moon': 1, 'who': 1, 'already': 1, 'sick': 1, 'pale': 1, 'with': 1, 'grief': 1}
