# DICTIONARIES

The function dict creates a new, empty dictionary:

In [1]:
eng2sp = dict()

To add items to the dictionary, you can use square brackets:

In [2]:
eng2sp['one'] = 'uno'
print eng2sp

{'one': 'uno'}


You can think of a dictionary as a mapping between a set of indices (which are called keys) and a set of values. Each key maps to a value.

In [33]:
eng2sp = {'one': ['uno', 'uno1'], 'two': 'dos', 'three': 'tres'}
eng2sp

{'one': ['uno', 'uno1'], 'three': 'tres', 'two': 'dos'}

In [4]:
# The len function works on dictionaries; it returns the number of key-value pairs:
len(eng2sp)

3

The in operator works on dictionaries; it tells you whether something appears as a key in the dictionary (appearing as a value is not enough).

In [5]:
'one' in eng2sp

True

To see whether something appears as a value in a dictionary, you can use the method values, which returns the values as a list, and then use the in operator.

In [6]:
vals = eng2sp.values()
# 'one' maps to the list ['uno', 'uno1'], so test a plain string value instead
'dos' in vals

True
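
A minimal sketch of the difference between the two membership tests, written in Python 3 syntax (an assumption; the cells in this notebook use Python 2). In Python 3, values returns a view rather than a list, but the in operator works on it the same way. The dictionary below uses plain string values for simplicity and is illustrative only.

```python
# Python 3 sketch; the dictionary is illustrative, not the notebook's data.
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

key_hit = 'one' in eng2sp             # 'one' is a key
value_as_key = 'uno' in eng2sp        # 'uno' is only a value, so this is False
value_hit = 'uno' in eng2sp.values()  # searching the values finds it

print(key_hit, value_as_key, value_hit)
```

Searching the values walks through them one by one, while testing a key uses the dictionary's lookup directly.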

## Dictionary as a set of counters

Suppose you are given a string and you want to count how many times each letter appears.

**METHOD 1**

In [8]:
word = 'brontosaurus'
d = dict()
for i in word:
    if i not in d:
        d[i] = 1
    else:
        d[i] = d[i] + 1
print d

{'a': 1, 'b': 1, 'o': 2, 'n': 1, 's': 2, 'r': 2, 'u': 2, 't': 1}


**EXPLANATION 1**

The for loop traverses the string. Each time through the loop, if the character i is not in the dictionary, we create a new item with key i and the initial value 1 (since we have seen this letter once). If i is already in the dictionary, we increment d[i] by 1.

Dictionaries have a method called **get** that takes a key and a default value. If the key appears in the dictionary, get returns the corresponding value; otherwise it returns the default value.

In [13]:
counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
print (counts.get('jan', None))

100


In [8]:
print (counts.get('tim', None))

None


We can use **get** to write our histogram loop more concisely. Because the get method automatically handles the case where a key is not in a dictionary, we can reduce four lines down to one and eliminate the if statement.

**METHOD 2**

In [19]:
word = 'brontosaurus'
d = dict()
for i in word:
    d[i] = d.get(i,0) + 1
print d

{'a': 1, 'b': 1, 'o': 2, 'n': 1, 's': 2, 'r': 2, 'u': 2, 't': 1}


Using the get method to simplify this counting loop is a very common idiom in Python.

**METHOD 3**

In [25]:
import collections
collections.Counter('brontosaurus')

Counter({'o': 2, 's': 2, 'r': 2, 'u': 2, 'a': 1, 'b': 1, 'n': 1, 't': 1})
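
As a quick check, the three methods above can be compared side by side. This is a sketch in Python 3 syntax (an assumption; the notebook cells use Python 2), and the helper names count_with_if and count_with_get are hypothetical, introduced only for this comparison.

```python
import collections

def count_with_if(word):
    """Method 1: explicit if/else on key membership."""
    d = dict()
    for ch in word:
        if ch not in d:
            d[ch] = 1
        else:
            d[ch] = d[ch] + 1
    return d

def count_with_get(word):
    """Method 2: get supplies 0 for letters not seen yet."""
    d = dict()
    for ch in word:
        d[ch] = d.get(ch, 0) + 1
    return d

word = 'brontosaurus'
by_if = count_with_if(word)
by_get = count_with_get(word)
# Method 3: convert the Counter to a plain dict for a like-for-like comparison
by_counter = dict(collections.Counter(word))

all_equal = (by_if == by_get == by_counter)
print(all_equal)
```

All three build the same letter-to-count mapping; they differ only in how much of the bookkeeping is written by hand.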

## Dictionaries and files

One of the common uses of a dictionary is to count the occurrences of words in a file containing written text.

We will write a Python program that reads through the lines of the file, breaks each line into a list of words, and then loops through the words in each line, counting each word using a dictionary.

In [None]:
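fname = raw_input('Enter the file name: ')
try:
    fhand = open(fname)
except IOError:
    print 'File cannot be opened', fname
    exit()

# Map each word to the number of times it has been seen so far
counts = dict()
for line in fhand:
    words = line.split()
    for i in words:
        # Check the dictionary, not the word list, for a first occurrence
        if i not in counts:
            counts[i] = 1
        else:
            counts[i] += 1

print counts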

It is a bit inconvenient to look through the dictionary to find the most common words and their counts, so we need to add some more Python code to get us the output that will be more helpful.
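
The word-counting program above can also be sketched as a small function, shown here in Python 3 syntax (an assumption; the notebook cells use Python 2). The function name count_words and the sample lines are hypothetical; a list of strings stands in for the file so the example runs without any input.

```python
def count_words(lines):
    """Count whitespace-separated words across an iterable of lines."""
    counts = dict()
    for line in lines:
        for word in line.split():
            # get returns 0 the first time a word is seen
            counts[word] = counts.get(word, 0) + 1
    return counts

# Illustrative sample text in place of a real file
sample = ['the quick brown fox', 'jumps over the lazy dog', 'the end']
counts = count_words(sample)
print(counts)
```

Because an open file is itself an iterable of lines, the same function should accept a file handle directly, for example inside a with open(fname) as fhand: block.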

## Looping and dictionaries

If you use a dictionary as the sequence in a for statement, it traverses the keys of the dictionary. This loop prints each key and the corresponding value:

In [1]:
counts = { 'chuck': 1 , 'annie': 42, 'jan': 100}
for key in counts:
    print key, counts[key]

jan 100
chuck 1
annie 42


We want to find all the entries in a dictionary with a value above ten:

In [2]:
counts = { 'chuck': 1 , 'annie': 42, 'jan': 100}
for key in counts:
    if counts[key] > 10:
        print key, counts[key]

jan 100
annie 42


If you want to print the keys in alphabetical order, first make a list of the keys using the keys method available in dictionary objects, then sort that list and loop through the sorted list:

In [11]:
counts = { 'chuck': 1 , 'annie': 42, 'jan': 100}
lst = counts.keys()
print lst
lst.sort()
for key in lst:
    print key, counts[key]

['jan', 'chuck', 'annie']
annie 42
chuck 1
jan 100
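
In Python 3, keys returns a view object that has no sort method, so the cell above would not run there unchanged. A minimal sketch of the same idea, assuming Python 3, uses the built-in sorted function, which returns a new sorted list of the keys:

```python
counts = {'chuck': 1, 'annie': 42, 'jan': 100}

# sorted() iterates over the dictionary's keys and returns them as a sorted list
lines = []
for key in sorted(counts):
    lines.append('%s %d' % (key, counts[key]))

print('\n'.join(lines))
```

The dictionary itself is left untouched; only the order in which the keys are visited changes.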


## Text parsing

Since the Python split function looks for spaces and treats words as tokens separated by spaces, we would treat the words “soft!” and “soft” as different words and create a separate dictionary entry for each word. Also since the file has capitalization, we would treat “who” and “Who” as different words with different counts.

We can solve both of these problems by using the string methods lower and translate together with the constant string.punctuation. We will use the "deletechars" parameter of translate to delete all of the punctuation:

**string.translate(s, table[, deletechars])**

In [5]:
import string
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
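
The translate signature shown above is the Python 2 one. In Python 3, str.translate takes a single table argument, and the deletion set is passed as the third argument to str.maketrans instead. A minimal sketch of the same clean-up step, assuming Python 3 (the function name normalize and the sample sentence are hypothetical):

```python
import string

def normalize(line):
    """Strip ASCII punctuation and lowercase a line (Python 3 sketch)."""
    # The third argument of str.maketrans lists characters to delete
    table = str.maketrans('', '', string.punctuation)
    return line.translate(table).lower()

words = normalize('Soft! Who is there? soft').split()
print(words)
```

After this step, "Soft!" and "soft" become the same token, so they share one dictionary entry when counted.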

In [None]:
import string

fname = raw_input('Enter the file name: ')
try:
    fhand = open(fname)
except IOError:
    print 'File cannot be opened', fname
    exit()

counts = dict()
for line in fhand:
    # Delete punctuation, then lowercase, so "Soft!" and "soft" match
    line = line.translate(None, string.punctuation)
    line = line.lower()
    words = line.split()
    for i in words:
        # Check the dictionary, not the word list, for a first occurrence
        if i not in counts:
            counts[i] = 1
        else:
            counts[i] += 1

print counts

Looking through this output is still unwieldy. We can use Python to give us exactly what we are looking for, but to do so we first need to learn about Python tuples.