# Lesson 9: Dictionary

The dictionary data structure allows us to store multiple values in an object and look up each value by its key.


## What is a Dictionary?

A *dictionary* is like a list, but more general. In a list, the index positions have to be integers; in a dictionary, the indices can be (almost) any type.

You can think of a dictionary as a mapping between a set of indices (which are called *keys*) and a set of *values*. *Each key maps to a value*. The association of a key and a value is called a *key-value pair* or sometimes an *item*.

As an example, we'll build a dictionary that maps from English to Spanish words, so the keys and the values are all strings.

The function `dict` creates a new dictionary with no items. Because `dict` is the name of a built-in function, you should avoid using it as a variable name.

Video: Dictionaries - Part 1 

### <https://www.youtube.com/embed/yDDRMb-1cxI>

```python
eng2sp = dict()
print(eng2sp)
```

```
{}
```


```python
eng2sp = {}
print(eng2sp)
```

```
{}
```


The curly brackets, `{}`, represent an empty dictionary. To add items to the dictionary, you can use square brackets:

```python
eng2sp['one'] = 'uno'
print(eng2sp)
```

```
{'one': 'uno'}
```


This output format is also an input format. For example, you can create a new dictionary with three items.

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
print(eng2sp)
```

```
{'one': 'uno', 'two': 'dos', 'three': 'tres'}
```


Since Python 3.7, dictionaries preserve insertion order: the key-value pairs come out in the same order in which they were added.

But that doesn't really matter because the elements of a dictionary are never indexed with integer indices. Instead, you use the keys to look up the corresponding values:

```python
print(eng2sp['two'])
```

```
dos
```


The key `'two'` always maps to the value `'dos'`, so the order of the items doesn't matter.

If the key isn't in the dictionary, you get an exception:

```python
print(eng2sp['four'])
```

```
KeyError: 'four'
```
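If a lookup might fail, one option is to catch the `KeyError` instead of letting the program stop. A minimal sketch, reusing the same three-item `eng2sp` dictionary; the helper name `lookup` and the `'<unknown>'` marker are hypothetical choices for illustration:

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

def lookup(word):
    # Hypothetical helper: return the Spanish word, or a marker if missing
    try:
        return eng2sp[word]
    except KeyError:
        # raised when `word` is not one of the dictionary's keys
        return '<unknown>'

print(lookup('two'))   # dos
print(lookup('four'))  # <unknown>
```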

The `len` function works on dictionaries; it returns the number of key-value pairs:

```python
# number of key:value pairs inside our dictionary
len(eng2sp)
```

```
3
```

The `in` operator works on dictionaries; it tells you whether something appears as a key in the dictionary (appearing as a value is not good enough).

```python
'one' in eng2sp  # True
'uno' in eng2sp
```

```
False
```

To see whether something appears as a value in a dictionary, you can use the method `values`, which returns the values as a view object that can be converted to a list, and then use the `in` operator:

```python
vals = list(eng2sp.values())
'uno' in vals
```

```
True
```
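Converting to a list is not strictly required: the object returned by `values()` supports `in` directly. Finding which *key* holds a given value, however, means checking the items one by one. A small sketch of both, using the same `eng2sp` dictionary:

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# `in` also works directly on the values view, no list() needed
print('uno' in eng2sp.values())   # True
print('one' in eng2sp.values())   # False ('one' is a key, not a value)

# Reverse lookup: find the key whose value is 'dos' (a linear search)
found = None
for key in eng2sp:
    if eng2sp[key] == 'dos':
        found = key
        break
print(found)  # two
```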

The `in` operator uses different algorithms for lists and dictionaries. For lists, it uses a linear search algorithm. As the list gets longer, the search time gets longer in direct proportion to the length of the list. For dictionaries, Python uses an algorithm called a hash table that has a remarkable property: the `in` operator takes about the same amount of time no matter how many items there are in a dictionary. I won't explain why hash functions are so magical, but you can read more about it at <https://wikipedia.org/wiki/Hash_table>
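The difference is easy to observe. Here is a rough sketch using the standard `timeit` module, assuming a 100,000-element collection is large enough to make the gap visible. The exact numbers depend on your machine, but the dictionary lookup should be far faster than the worst-case list search:

```python
import timeit

# Assumption: 100,000 items is "large enough" to show the difference
n = 100_000
as_list = list(range(n))
as_dict = dict.fromkeys(as_list)  # same numbers as keys; values are all None

target = n - 1  # worst case for the list: the very last element

list_time = timeit.timeit(lambda: target in as_list, number=100)
dict_time = timeit.timeit(lambda: target in as_dict, number=100)

print(f'list: {list_time:.5f} s   dict: {dict_time:.5f} s')
```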

### Exercise 1
Download a copy of the file <http://www.py4e.com/code3/words.txt>

Write a program that reads the words in *words.txt* and stores them as keys in a dictionary. It doesn't matter what the values are. Then you can use the `in` operator as a fast way to check whether a string is in the dictionary. 

```python
def ex_09_01():
    count = 0
    word_dict = {}
    fhand = open("words.txt")
    for line in fhand:
        words = line.split()
        for word in words:
            count += 1
            if word in word_dict:
                continue
            # The value is the position where the word first appears.
            # Any value would do: only the keys are used for lookups.
            word_dict[word] = count
    print(word_dict)

    # Fast membership test on the dictionary's keys
    if "Writing" in word_dict:
        print("True")
    else:
        print("False")

ex_09_01()
```

```
{'Writing': 1, 'programs': 2, 'or': 3, 'programming': 4, 'is': 5, 'a': 6, 'very': 7, 'creative': 8, 'and': 9, 'rewarding': 10, 'activity': 11, 'You': 12, 'can': 13, 'write': 14, 'for': 16, 'many': 17, 'reasons': 18, 'ranging': 19, 'from': 20, 'making': 21, 'your': 22, 'living': 23, 'to': 24, 'solving': 25, 'difficult': 27, 'data': 28, 'analysis': 29, 'problem': 30, 'having': 32, 'fun': 33, 'helping': 35, 'someone': 36, 'else': 37, 'solve': 38, 'This': 41, 'book': 42, 'assumes': 43, 'that': 44, '{\\em': 45, 'everyone}': 46, 'needs': 47, 'know': 49, 'how': 50, 'program': 52, 'once': 55, 'you': 56, 'program,': 60, 'will': 62, 'figure': 63, 'out': 64, 'what': 65, 'want': 67, 'do': 69, 'with': 70, 'newfound': 72, 'skills': 73, 'We': 74, 'are': 75, 'surrounded': 76, 'in': 77, 'our': 78, 'daily': 79, 'lives': 80, 'computers': 82, 'laptops': 85, 'cell': 87, 'phones': 88, 'think': 91, 'of': 92, 'these': 93, 'as': 95, 'personal': 97, 'assistants': 98, 'who': 99, 'take': 101, 'care': 102, 'things
```

```python
# Counting words using idiom
def ex_09_01():
    word_dict = {}
    fhand = open("words.txt")
    for line in fhand:
        words = line.split()
        for word in words:
            word_dict[word] = word_dict.get(word, 0) + 1
    print(word_dict)

    if "Writing" in word_dict:
        print("True")
    else:
        print("False")

ex_09_01()
```

```
{'Writing': 1, 'programs': 2, 'or': 1, 'programming': 1, 'is': 2, 'a': 3, 'very': 2, 'creative': 1, 'and': 5, 'rewarding': 1, 'activity': 1, 'You': 1, 'can': 4, 'write': 1, 'for': 1, 'many': 2, 'reasons': 1, 'ranging': 2, 'from': 2, 'making': 1, 'your': 2, 'living': 1, 'to': 16, 'solving': 1, 'difficult': 1, 'data': 1, 'analysis': 1, 'problem': 2, 'having': 1, 'fun': 1, 'helping': 1, 'someone': 1, 'else': 1, 'solve': 1, 'This': 1, 'book': 1, 'assumes': 1, 'that': 4, '{\\em': 1, 'everyone}': 1, 'needs': 1, 'know': 2, 'how': 2, 'program': 1, 'once': 1, 'you': 4, 'program,': 1, 'will': 1, 'figure': 1, 'out': 1, 'what': 2, 'want': 1, 'do': 5, 'with': 2, 'newfound': 1, 'skills': 1, 'We': 2, 'are': 3, 'surrounded': 1, 'in': 2, 'our': 5, 'daily': 1, 'lives': 1, 'computers': 5, 'laptops': 1, 'cell': 1, 'phones': 1, 'think': 1, 'of': 5, 'these': 1, 'as': 1, 'personal': 1, 'assistants': 1, 'who': 1, 'take': 1, 'care': 1, 'things': 3, 'on': 2, 'behalf': 2, 'The': 1, 'hardware': 1, 'current-day': 
```

## Dictionary as a set of counters 

Video: Dictionaries Part 2 
### <https://youtu.be/LRSIuH94XM4>

Suppose you are given a string and you want to count how many times each letter appears. There are several ways you could do it:

1. You could create 26 variables, one for each letter of the alphabet. Then you could traverse the string and, for each character, increment the corresponding counter, probably using a chained conditional.

2. You could create a list with 26 elements. Then you could convert each character to a number (using the built-in function `ord`), use the number as an index into the list, and increment the appropriate counter.

3. You could create a dictionary with characters as keys and counters as the corresponding values. The first time you see a character, you would add an item to the dictionary. After that you would increment the value of an existing item.
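Option 2 is less obvious than it sounds, so here is a minimal sketch of it. It assumes the string contains only the lowercase letters `a` to `z`; any other character would produce a wrong index and either raise an `IndexError` or silently bump the wrong counter:

```python
word = 'brontosaurus'

# Option 2: one counter per letter, assuming lowercase a-z only
counts = [0] * 26
for c in word:
    index = ord(c) - ord('a')   # 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
    counts[index] += 1

# Print only the letters that actually appeared, in alphabetical order:
# a 1, b 1, n 1, o 2, r 2, s 2, t 1, u 2 (one pair per line)
for index in range(26):
    if counts[index] > 0:
        print(chr(index + ord('a')), counts[index])
```

The dictionary implementation in this section needs no such assumption about which characters can appear, which is one reason it is the more flexible choice.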

Each of these options performs the same computation, but each of them implements that computation in a different way.

An *implementation* is a way of performing a computation; some implementations are better than others. For example, an advantage of the dictionary implementation is that we don’t have to know ahead of time which letters appear in the string and we only have to make room for the letters that do appear.

Here is what the code might look like:

```python
word = 'brontosaurus'
d = dict()
for c in word:
    if c not in d:
        d[c] = 1
    else:
        d[c] = d[c] + 1
print(d)
```

```
{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}
```


We are effectively computing a *histogram*, which is a statistical term for a set of counters (or frequencies).

The `for` loop traverses the string. Each time through the loop, if the character `c` is not in the dictionary, we create a new item with key `c` and the initial value 1 (since we have seen this letter once). If `c` is already in the dictionary we increment `d[c]`.

The output above shows the histogram: the letters “a” and “b” appear once; “o” appears twice, and so on.

Dictionaries have a method called `get` that takes a key and a default value. If the key appears in the dictionary, `get` returns the corresponding value; otherwise it returns the default value. For example:

```python
counts = {'chuck': 1, 'annie': 42, 'jan': 100}
print(counts.get('jan', 0))
print(counts.get('tim', 0))
```

```
100
0
```


We can use `get` to write our histogram loop more concisely. Because the `get` method automatically handles the case where a key is not in a dictionary, we can reduce four lines down to one and eliminate the `if` statement.

```python
word = 'brontosaurus'
d = dict()
for c in word:
    d[c] = d.get(c, 0) + 1
print(d)
```

```
{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}
```


The use of the `get` method to simplify this counting loop ends up being a very commonly used “idiom” in Python and we will use it many times in the rest of the book. So you should take a moment and compare the loop using the `if` statement and `in` operator with the loop using the `get` method. They do exactly the same thing, but one is more succinct.

## Dictionaries in files 

One of the common uses of a dictionary is to count the occurrence of words in a file with some written text. Let's start with a very simple file of words taken from the text of *Romeo and Juliet*. 

For the first set of examples, we will use a shortened and simplified version of the text with no punctuation. Later we will work with the text of the scene with punctuation included. 

```
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
```

We will write a Python program to read through the lines of the file, break each line into a list of words, and then loop through each of the words in the line and count each word using a dictionary.

You will see that we have two `for` loops. The outer loop is reading the lines of the file and the inner loop is iterating through each of the words on that particular line. This is an example of a pattern called *nested loops* because one of the loops is the outer loop and the other loop is the inner loop.

Because the inner loop executes all of its iterations each time the outer loop makes a single iteration, we think of the inner loop as iterating “more quickly” and the outer loop as iterating more slowly.

The combination of the two nested loops ensures that we will count every word on every line of the input file.

```python
import sys

fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except OSError:
    # e.g. FileNotFoundError when the name is misspelled
    print('File cannot be opened:', fname)
    sys.exit()

counts = dict()
for line in fhand:            # outer loop: one line at a time
    words = line.split()
    for word in words:        # inner loop: every word on that line
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
print(counts)
```

```
Enter the file name: words.txt
{'Writing': 1, 'programs': 2, 'or': 1, 'programming': 1, 'is': 2, 'a': 3, 'very': 2, 'creative': 1, 'and': 5, 'rewarding': 1, 'activity': 1, 'You': 1, 'can': 4, 'write': 1, 'for': 1, 'many': 2, 'reasons': 1, 'ranging': 2, 'from': 2, 'making': 1, 'your': 2, 'living': 1, 'to': 16, 'solving': 1, 'difficult': 1, 'data': 1, 'analysis': 1, 'problem': 2, 'having': 1, 'fun': 1, 'helping': 1, 'someone': 1, 'else': 1, 'solve': 1, 'This': 1, 'book': 1, 'assumes': 1, 'that': 4, '{\\em': 1, 'everyone}': 1, 'needs': 1, 'know': 2, 'how': 2, 'program': 1, 'once': 1, 'you': 4, 'program,': 1, 'will': 1, 'figure': 1, 'out': 1, 'what': 2, 'want': 1, 'do': 5, 'with': 2, 'newfound': 1, 'skills': 1, 'We': 2, 'are': 3, 'surrounded': 1, 'in': 2, 'our': 5, 'daily': 1, 'lives': 1, 'computers': 5, 'laptops': 1, 'cell': 1, 'phones': 1, 'think': 1, 'of': 5, 'these': 1, 'as': 1, 'personal': 1, 'assistants': 1, 'who': 1, 'take': 1, 'care': 1, 'things': 3, 'on': 2, 'behalf': 2, 'The': 1,
```
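Because an open file and a plain list of strings can both be looped over line by line, the same nested loops can be tried without any file at all. A minimal sketch, assuming the simplified Romeo text is stored in a list; the function name `count_words` is hypothetical:

```python
def count_words(lines):
    # `lines` can be any iterable of strings: an open file or a plain list
    counts = dict()
    for line in lines:              # outer loop: one line at a time
        for word in line.split():   # inner loop: every word on that line
            counts[word] = counts.get(word, 0) + 1
    return counts

romeo = [
    'But soft what light through yonder window breaks',
    'It is the east and Juliet is the sun',
    'Arise fair sun and kill the envious moon',
    'Who is already sick and pale with grief',
]
counts = count_words(romeo)
print(counts['is'], counts['the'], counts['sun'])  # 3 3 2
```

Passing `open('romeo.txt')` instead of the list should give the same counts, since the loop only relies on getting one string per line.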

In our `else` statement, we use the more compact alternative for incrementing a variable. `counts[word] += 1` is equivalent to `counts[word] = counts[word] + 1`. Either method can be used to change the value of a variable by any desired amount. Similar alternatives exist for `-=`, `*=`, and `/=`.
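A quick sketch of the four augmented assignment operators mentioned above, plus one applied to a dictionary item (the values are arbitrary examples):

```python
x = 10
x += 5    # same as x = x + 5  -> 15
x -= 3    # same as x = x - 3  -> 12
x *= 2    # same as x = x * 2  -> 24
x /= 4    # same as x = x / 4  -> 6.0 (true division gives a float)
print(x)  # 6.0

counts = {'sun': 1}
counts['sun'] += 1   # works on dictionary items too, if the key exists
print(counts)        # {'sun': 2}
```

Note that `counts['moon'] += 1` would raise a `KeyError`, because the current value has to be read before it can be increased.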

When we run the program on romeo.txt, we see a raw dump of all of the counts in the order in which each word first appeared (the romeo.txt file is available at <http://www.py4e.com/code3/romeo.txt>).

```python
import sys

fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except OSError:
    # e.g. FileNotFoundError when the name is misspelled
    print('File cannot be opened:', fname)
    sys.exit()

counts = dict()
for line in fhand:
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
print(counts)
```

```
Enter the file name: romeo.txt
{'But': 1, 'soft': 1, 'what': 1, 'light': 1, 'through': 1, 'yonder': 1, 'window': 1, 'breaks': 1, 'It': 1, 'is': 3, 'the': 3, 'east': 1, 'and': 3, 'Juliet': 1, 'sun': 2, 'Arise': 1, 'fair': 1, 'kill': 1, 'envious': 1, 'moon': 1, 'Who': 1, 'already': 1, 'sick': 1, 'pale': 1, 'with': 1, 'grief': 1}
```


It is a bit inconvenient to look through the dictionary to find the most common words and their counts, so we need to add some more Python code to get us the output that will be more helpful.

## Looping and Dictionaries

If you use a dictionary as the sequence in a `for` statement, it traverses the keys of the dictionary. This loop prints each key and the corresponding value:

```python
counts = {'chuck': 1, 'annie': 42, 'jan': 100}
for key in counts:
    print(key, counts[key])
```

```
chuck 1
annie 42
jan 100
```


Again, the keys come out in insertion order.

We can use this pattern to implement the various loop idioms that we have described earlier. For example, if we wanted to find all the entries in a dictionary with a value above ten, we could write the following code:

```python
counts = {'chuck': 1, 'annie': 42, 'jan': 100}
for key in counts:
    if counts[key] > 10:
        print(key, counts[key])
```

```
annie 42
jan 100
```


The `for` loop iterates through the keys of the dictionary, so we must use the index operator to retrieve the corresponding *value* for each key.

We see only the entries with a value above 10.
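The `items` method (also used in the word-frequency example later in this lesson) hands you each key together with its value, so the extra `counts[key]` lookup is not needed. A sketch of the same filter written that way; how two loop variables are filled at once is explained properly with tuples:

```python
counts = {'chuck': 1, 'annie': 42, 'jan': 100}

# items() yields (key, value) pairs, so no separate counts[key] lookup
big = []
for key, value in counts.items():
    if value > 10:
        print(key, value)
        big.append(key)   # remember which keys passed the filter
```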

If you want to print the keys in alphabetical order, you first make a list of the keys in the dictionary using the `keys` method available in dictionary objects, and then sort that list and loop through the sorted list, looking up each key and printing out key-value pairs in sorted order as follows:

```python
counts = {'chuck': 1, 'annie': 42, 'jan': 100}
lst = list(counts.keys())
lstv = list(counts.values())
print(lst, lstv)
lst.sort()
print(lst)
for key in lst:
    print(key, counts[key])
```

```
['chuck', 'annie', 'jan'] [1, 42, 100]
['annie', 'chuck', 'jan']
annie 42
chuck 1
jan 100
```


First you see the list of keys from the `keys` method, in insertion order (which is not alphabetical), alongside the values from the `values` method. Then you see the sorted list of keys, and finally the key-value pairs in alphabetical order from the `for` loop.
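A shorter variant, offered as an alternative rather than a replacement: the built-in `sorted` function accepts the dictionary directly (iterating over a dictionary gives its keys) and returns a new sorted list, leaving the dictionary itself unchanged:

```python
counts = {'chuck': 1, 'annie': 42, 'jan': 100}

# sorted() returns a new list of the keys in alphabetical order
for key in sorted(counts):
    print(key, counts[key])   # annie 42, chuck 1, jan 100 (one per line)
```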

## Advanced Text Parsing

In the above example using the file romeo.txt, we made the file as simple as possible by removing all punctuation by hand. The actual text has lots of punctuation, as shown below.

```
But, soft! what light through yonder window breaks?
It is the east, and Juliet is the sun.
Arise, fair sun, and kill the envious moon,
Who is already sick and pale with grief,
```

Since the Python `split` method splits on whitespace and treats words as tokens separated by spaces, we would treat the words "soft!" and "soft" as *different* words and create a separate dictionary entry for each word.

Also since the file has capitalization, we would treat "who" and "Who" as different words with different counts.

We can solve both these problems by using the string methods `lower` and `translate`, together with the `punctuation` constant from the `string` module. The `translate` method is the most subtle of these. Here is the documentation for `translate`:

`line.translate(str.maketrans(fromstr, tostr, deletestr))`

Replace the characters in `fromstr` with the character in the same position in `tostr` and delete all characters that are in `deletestr`. The `fromstr` and `tostr` strings must have the same length and can both be empty, and the `deletestr` parameter can be omitted.
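To see the three arguments of `str.maketrans` in action, here is a small sketch on made-up strings: one table that both replaces and deletes characters, and one that only deletes punctuation:

```python
import string

# fromstr -> tostr: 'a' becomes 'x' and 'b' becomes 'y'; 'c' is deleted
table = str.maketrans('ab', 'xy', 'c')
print('abcabc'.translate(table))   # xyxy

# Only deletion: both fromstr and tostr are empty strings
no_punct = str.maketrans('', '', string.punctuation)
print('But, soft! what light'.translate(no_punct))   # But soft what light

print(string.punctuation)   # the characters Python treats as punctuation
```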

We will not specify the `tostr` but we will use the `deletestr` parameter to delete all of the punctuation. We will even let Python tell us the list of characters that it considers "punctuation":

```python
import string
import sys

fname = input("Enter the file name: ")
try:
    fhand = open(fname)
except OSError:
    print('File cannot be opened:', fname)
    sys.exit()

# Translation table that deletes every punctuation character
table = str.maketrans('', '', string.punctuation)

counts = dict()
for line in fhand:
    line = line.rstrip()
    line = line.translate(table)
    line = line.lower()
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
print(counts)
```

```
Enter the file name: clown.txt
{'the': 7, 'clown': 2, 'ran': 2, 'after': 1, 'car': 3, 'and': 3, 'into': 1, 'tent': 2, 'fell': 1, 'down': 1, 'on': 1}
```
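To check that the cleanup really merges "soft!" with "soft" and "Who" with "who", here is a sketch that keeps the four punctuated Romeo lines shown earlier in a list (instead of reading a file) and applies the same `translate` and `lower` steps:

```python
import string

romeo_full = [
    'But, soft! what light through yonder window breaks?',
    'It is the east, and Juliet is the sun.',
    'Arise, fair sun, and kill the envious moon,',
    'Who is already sick and pale with grief,',
]

table = str.maketrans('', '', string.punctuation)
counts = dict()
for line in romeo_full:
    line = line.translate(table).lower()   # strip punctuation, fold case
    for word in line.split():
        counts[word] = counts.get(word, 0) + 1

print(counts['soft'], counts['sun'], counts['who'])  # 1 2 1
```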


Part of learning the "Art of Python" or "Thinking Pythonically" is realizing that Python often has built-in capabilities for many common data analysis problems. Over time, you will see enough example code and read enough of the documentation to know where to look to see if someone has already written something that makes your job much easier.

Looking through this output is still unwieldy. We can use Python to give us exactly what we are looking for, but to do so we need to learn about Python tuples. We will pick up this example once we learn about tuples.

## Debugging

As you work with bigger datasets it can become unwieldy to debug by printing and checking data by hand. Here are some suggestions for debugging large datasets:

**Scale down the input**: If possible, reduce the size of the dataset. For example if the program reads a text file, start with just the first 10 lines, or with the smallest example you can find. You can either edit the files themselves, or (better) modify the program so it reads only the first n lines.

If there is an error, you can reduce n to the smallest value that manifests the error, and then increase it gradually as you find and correct errors.
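One way to make the program read only the first n lines is `itertools.islice`, which stops an iteration after a given number of items. A minimal sketch; the function name `count_words_head` and the three-line sample are made up for illustration, and with a real file you would pass the file handle instead, e.g. `count_words_head(open('words.txt'), 10)`:

```python
from itertools import islice

def count_words_head(lines, n):
    # Debugging aid: only look at the first n lines of the input.
    # `lines` may be an open file or any other iterable of strings.
    counts = dict()
    for line in islice(lines, n):
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

sample = ['the clown ran', 'after the car', 'and the car ran']
print(count_words_head(sample, 2))
# {'the': 2, 'clown': 1, 'ran': 1, 'after': 1, 'car': 1}
```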

**Check summaries and types**: Instead of printing and checking the entire dataset, consider printing summaries of the data: for example, the number of items in a dictionary or the total of a list of numbers.

A common cause of runtime errors is a value that is not the right type. For debugging this kind of error, it is often enough to print the type of a value.
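For a word-count dictionary, useful summaries are the number of distinct words and the total number of words. A short sketch on a small, made-up dictionary:

```python
counts = {'the': 7, 'clown': 2, 'car': 3}

# Summaries instead of a full dump
print('distinct words:', len(counts))        # distinct words: 3
print('total words:', sum(counts.values()))   # total words: 12

# A quick type check often explains a runtime error
value = counts['the']
print(type(value))   # <class 'int'>
```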

**Write self-checks**: Sometimes you can write code to check for errors automatically. For example, if you are computing the average of a list of numbers, you could check that the result is not greater than the largest element in the list or less than the smallest. This is called a "sanity check" because it detects results that are "completely illogical".

Another kind of check compares the results of two different computations to see if they are consistent. This is called a "consistency check".
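Both kinds of check can be written with Python's `assert` statement, which stops the program with an `AssertionError` if its condition is false. A sketch, assuming a made-up list of numbers and a made-up sentence:

```python
def average(numbers):
    return sum(numbers) / len(numbers)

nums = [3, 8, 1, 9, 4]
avg = average(nums)

# Sanity check: an average can never fall outside the range of the data
assert min(nums) <= avg <= max(nums), 'average out of range'

# Consistency check: the word counts must add up to the number of words
text = 'the clown ran after the car'
counts = dict()
for word in text.split():
    counts[word] = counts.get(word, 0) + 1
assert sum(counts.values()) == len(text.split()), 'counts disagree'

print(avg)   # 5.0
```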

**Pretty print the output**: Formatting debugging output can make it easier to spot an error.
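The standard `pprint` module is one option for this: `pprint.pprint` prints a structure with dictionary keys sorted and long output wrapped, and `pprint.pformat` returns the same text as a string. A sketch on a small, made-up dictionary, where `width=30` is chosen only so the wrapping is visible:

```python
from pprint import pformat

counts = {'the': 7, 'clown': 2, 'ran': 2, 'after': 1, 'car': 3, 'and': 3}

# pformat returns the text that pprint would print: keys sorted, and
# output that does not fit in `width` characters split over several lines
text = pformat(counts, width=30)
print(text)
```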

Again, time you spend building scaffolding can reduce the time you spend debugging.

## Glossary

**dictionary**: A mapping from a set of keys to their corresponding values.

**hashtable**: The algorithm used to implement Python dictionaries.

**hash function**: A function used by a hashtable to compute the location for a key.

**histogram**: A set of counters.

**implementation**: A way of performing a computation.

**item**: Another name for a key-value pair.

**key**: An object that appears in a dictionary as the first part of a key-value pair.

**key-value pair**: The representation of the mapping from a key to a value.

**lookup**: A dictionary operation that takes a key and finds the corresponding value.

**nested loops**: When there are one or more loops "inside" of another loop. The inner loop runs to completion each time the outer loop runs once.

**value**: An object that appears in a dictionary as the second part of a key-value pair. This is more specific than our previous use of the word "value".

Video: Counting Word Frequency using a Dictionary

### <https://www.youtube.com/watch?v=lLbyEYjU55A>

```python
# first method from video
fname = input('Enter file: ')
if len(fname) < 1:
    fname = 'clown.txt'
hand = open(fname)

di = dict()
for lin in hand:
    lin = lin.rstrip()
    wds = lin.split()
    for w in wds:
        if w in di:
            di[w] += 1      # existing word: increment its count
        else:
            di[w] = 1       # new word: start its count at 1

        # Alternative shown in the video. Use it *instead of* the if/else
        # above; running both would count every word twice.
        # oldcount = di.get(w, 0)   # if the key is not there, the count is zero
        # di[w] = oldcount + 1
print(di)
```

```
Enter file: 
{'the': 7, 'clown': 2, 'ran': 2, 'after': 1, 'car': 3, 'and': 3, 'into': 1, 'tent': 2, 'fell': 1, 'down': 1, 'on': 1}
```


```python
# final method from video - using the idiom
fname = input('Enter file: ')
if len(fname) < 1:
    fname = 'clown.txt'
hand = open(fname)

di = dict()
for lin in hand:
    lin = lin.rstrip()
    wds = lin.split()
    for w in wds:
        # idiom - retrieve/create/update the counter all in one line
        di[w] = di.get(w, 0) + 1

# now we want to find the most common word
largest = -1
theWord = None
for k, v in di.items():
    if v > largest:
        largest = v
        theWord = k  # capture/remember the word that was largest
print(theWord, largest)
```

```
Enter file: 
the 7
```


## Exercise 2

Write a program that categorizes each mail message by which day of the week the commit was done. To do this, look for lines that start with "From", then look for the third word and keep a running count of each of the days of the week. At the end of the program, print out the contents of your dictionary (order does not matter).

Sample Line:

    From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
    
Sample Execution:

    python dow.py
    Enter a file name: mbox-short.txt
    {'Fri': 20, 'Thu': 6, 'Sat': 1}

```python
import sys

fname = input('Enter file name: ')
if len(fname) < 1:
    fname = 'mbox-short.txt'

# Open the file only once, inside try, so a bad name is reported cleanly
try:
    hand = open(fname)
except OSError:
    print('Wrong file name: ', fname)
    sys.exit()

di = {}
for line in hand:
    words = line.split()
    # Skip blank lines and any line whose first word is not exactly "From"
    if len(words) == 0 or words[0] != 'From':
        continue
    day = words[2]
    di[day] = di.get(day, 0) + 1
print(di)
```

```
Enter file name: 
{'Sat': 1, 'Fri': 20, 'Thu': 6}
```

This gives the same result as the class example below.


```python
# from class on Saturday 19AUG23 example
file = open("mbox-short.txt")
days = {}
for line in file:
    line_split = line.split()
    if len(line_split) == 0 or line_split[0] != "From":
        continue
    day = line_split[2]
    days[day] = days.get(day, 0) + 1
print(days)
```

```
{'Sat': 1, 'Fri': 20, 'Thu': 6}
```


## Exercise 3

Write a program to read through a mail log, build a histogram using a dictionary to count how many messages have come from each email address, and print the dictionary.

```
Enter file name: mbox-short.txt
{'gopal.ramasammycook@gmail.com': 1, 'louis@media.berkeley.edu': 3,
'cwen@iupui.edu': 5, 'antranig@caret.cam.ac.uk': 1,
'rjlowe@iupui.edu': 2, 'gsilver@umich.edu': 3,
'david.horwitz@uct.ac.za': 4, 'wagnermr@iupui.edu': 1,
'zqian@umich.edu': 4, 'stephen.marquard@uct.ac.za': 2,
'ray@media.berkeley.edu': 1}
```

```python
mbox_short = open('mbox-short.txt')
emails = {}
for e in mbox_short:
    e = e.split()
    if len(e) == 0 or e[0] != "From":
        continue
    indEmails = e[1]
    emails[indEmails] = emails.get(indEmails, 0) + 1
print(emails)
```

```
{'stephen.marquard@uct.ac.za': 2, 'louis@media.berkeley.edu': 3, 'zqian@umich.edu': 4, 'rjlowe@iupui.edu': 2, 'cwen@iupui.edu': 5, 'gsilver@umich.edu': 3, 'wagnermr@iupui.edu': 1, 'antranig@caret.cam.ac.uk': 1, 'gopal.ramasammycook@gmail.com': 1, 'david.horwitz@uct.ac.za': 4, 'ray@media.berkeley.edu': 1}
```


## Exercise 4

Accept and complete the assignment in GitHub Classroom. Add code to the above program to figure out who has the most messages in the file. After all the data has been read and the dictionary has been created, look through the dictionary using a maximum loop (see Chapter 5: Maximum and minimum loops) to find who has the most messages and print how many messages the person has.

    Enter a file name: mbox-short.txt
    cwen@iupui.edu 5

    Enter a file name: mbox.txt
    zqian@umich.edu 195

```python
def ex_09_04():
    file = open('mbox-short.txt')
    histogram = {}
    for line in file:
        lineSp = line.split()
        if len(lineSp) > 0 and lineSp[0] == "From":
            email = lineSp[1]
            histogram[email] = histogram.get(email, 0) + 1

    maxCount = max(histogram.values())
    for email in histogram:
        if histogram[email] == maxCount:
            print(email, maxCount)
            return

ex_09_04()
```

```
cwen@iupui.edu 5
```


## Exercise 5

Write a program that records the domain name where each message was sent from, instead of who the mail came from (i.e., the whole email address). At the end of the program, print out the contents of your dictionary.

```
python schoolcount.py
Enter a file name: mbox-short.txt
{'media.berkeley.edu': 4, 'uct.ac.za': 6, 'umich.edu': 7,
'gmail.com': 1, 'caret.cam.ac.uk': 1, 'iupui.edu': 8}
```

```python
def ex_09_05():
    file = open('mbox-short.txt')
    histogram = {}
    for line in file:
        lineSp = line.split()
        # "From " (with a space) matches only the envelope line of each
        # message; plain "From" would also match the "From:" header line
        # and count every message twice
        if line.startswith("From "):
            mail = lineSp[1]
            ncount = mail.find("@")
            domain = mail[ncount + 1:]   # skip the "@" itself
            histogram[domain] = histogram.get(domain, 0) + 1
    print(histogram)
ex_09_05()
```

```
{'uct.ac.za': 6, 'media.berkeley.edu': 4, 'umich.edu': 7, 'iupui.edu': 8, 'caret.cam.ac.uk': 1, 'gmail.com': 1}
```


```python
def ex_09_05():
    filename = input("Enter a file name: ")
    fhand = open(filename, 'r')

    domains = {}
    for line in fhand:
        if line.startswith("From "):
            email = line.split()[1]
            domain = email.split('@')[1]
            domains[domain] = domains.get(domain, 0) + 1
    print(domains)
ex_09_05()
```

```
Enter a file name: mbox-short.txt
{'uct.ac.za': 6, 'media.berkeley.edu': 4, 'umich.edu': 7, 'iupui.edu': 8, 'caret.cam.ac.uk': 1, 'gmail.com': 1}
```
