## Dictionaries

A dictionary is like a list, but more general. In a list, the positions (a.k.a. indices) have to be integers; in a dictionary the indices can be (almost) any type.

You can think of a dictionary as a mapping between a set of indices (which are called keys) and a set of values. Each key maps to a value. The association of a key and a value is called a key-value pair or sometimes an item.
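As a minimal sketch of what "(almost) any type" means in practice: a key has to be hashable, so strings, numbers and tuples work, while a mutable list does not. The names `lookup` and `list_key_allowed` are hypothetical and used only for illustration.

```python
# Keys can be strings, integers, floats or tuples (all hashable types).
lookup = {
    'one': 'uno',            # string key
    2: 'dos',                # integer key
    3.5: 'tres y medio',     # float key
    (4, 'four'): 'cuatro',   # tuple key
}
print(lookup[(4, 'four')])

# A list is mutable and therefore unhashable, so it cannot be used as a key.
try:
    lookup[['five']] = 'cinco'
    list_key_allowed = True
except TypeError as err:
    list_key_allowed = False
    print('Cannot use a list as a key:', err)
```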

### An Example of a Dictionary

Let us build a dictionary that maps English words to Spanish words. In this dictionary the keys will be some English words and the values will be the corresponding Spanish words.

In [2]:
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

**Note:** 

* A dictionary is created using curly brackets.

* The values in a dictionary are indexed by keys, so each key has to be specified together with its value when the dictionary is created.

* As an example, 'one':'uno' represents a key-value pair.

*What if we enclose some values (separated by commas) in curly brackets, without any keys? What type of object do we get?*

* *A dictionary with only keys.*
* *A dictionary with elements indexed by integers*
* *This object is not a dictionary*

In [2]:
d={'one', 'two', 'three'}

In [2]:
type(d)

set

In [5]:
d={1.2:'one', 2.75:'two'}

In [None]:
type(d)

### Retrieving elements from a dictionary

In [None]:
eng2sp['one']

In [None]:
eng2sp['two']

In [6]:
d[1.2]

'one'
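The lookups above all use keys that exist. As a small sketch of the other case, indexing with a key that is not in the dictionary raises a KeyError, which can be caught. The variable `translation` is a hypothetical name used only here.

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# Looking up a key that is not in the dictionary raises KeyError.
try:
    translation = eng2sp['five']
except KeyError:
    translation = None
    print("'five' is not a key in eng2sp")
```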

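### Slicing a dictionary

Dictionaries do not support slice notation the way lists do, but a subset of the key-value pairs can be selected by looping over the keys of interest.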

In [None]:
l = ['one', 'three']

In [None]:
for key in l:
    print(eng2sp[key])

In [None]:
d = dict()
for key in l:
    d[key] = eng2sp[key]  # copy the selected key-value pair into the new dictionary
print(d)

In [None]:
# Alternative way: build the dictionary from a list of (key, value) tuples
dict([(key, eng2sp[key]) for key in l])
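The same selection can also be written as a dictionary comprehension, which avoids building the intermediate list of tuples. This is a sketch; `wanted` and `subset` are hypothetical names chosen for illustration.

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
wanted = ['one', 'three']   # the keys to keep

# Build a new dictionary holding only the selected key-value pairs.
subset = {key: eng2sp[key] for key in wanted}
print(subset)   # {'one': 'uno', 'three': 'tres'}
```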

### Adding a key-value pair to a dictionary

In [None]:
eng2sp['four'] = 'cuatro'

In [None]:
eng2sp

### The in operator for dictionaries

The in operator works on dictionaries; it tells you whether something appears as a key in the dictionary.

In [None]:
'one' in eng2sp   # True: 'one' is a key

In [None]:
'uno' in eng2sp   # False: 'uno' is a value, not a key
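Because in only looks at the keys, checking for a value needs the values method. A minimal sketch, with `is_key` and `is_value` as hypothetical names:

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres', 'four': 'cuatro'}

is_key = 'uno' in eng2sp              # False: 'uno' is not a key
is_value = 'uno' in eng2sp.values()   # True: search the values explicitly
print(is_key, is_value)
```

Note that searching the values scans them one by one, whereas a key lookup uses hashing and does not slow down in the same way as the dictionary grows.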

### Counting the Frequency Distribution of Letters in a Word

In [7]:
#Frequency distribution of the letters in 'banana'
word = "banana"
d = {}  # empty dictionary.

for letter in word:
    if letter not in d:
        d[letter] = 1  # entering the letter as key and its count as value.
    else:
        d[letter] += 1 # if the letter is present, then increment the value by 1
print(d)

{'b': 1, 'a': 3, 'n': 2}


**The 'get' method for dictionary**

Dictionaries have a method called get that takes a key and a default value. If the key appears in the dictionary, get returns the corresponding value; otherwise it returns the default value. If the default is omitted, get returns None for a missing key.

In [3]:
#For example,
eng2sp.get('one',0) # 'one' is a key, so its value is returned and the default 0 is ignored

'uno'

In [None]:
eng2sp.get('five',0) # 'five' is not a key, so the default 0 is returned

**Making the counting simple using the 'get' method**

In [1]:
word = "banana"
d = {}  # empty dictionary.
for letter in word:
    d[letter] = d.get(letter,0) + 1

print(d)

{'b': 1, 'a': 3, 'n': 2}
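For comparison, the standard library offers collections.Counter, a dictionary subclass made for this kind of counting. The sketch below is an alternative to the get-based loop, not a replacement for understanding it; `counts` is a hypothetical name.

```python
from collections import Counter

word = "banana"
counts = Counter(word)   # counts each letter, like the loop that uses get
print(dict(counts))
print(counts['z'])       # a missing key gives 0 instead of raising KeyError
```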


### A Common Use of Dictionaries

One of the common uses of a dictionary is to count the occurrences of words in a file containing written text.

In [22]:
#Exercise: Read the Ashop1.txt file
import os
os.chdir("C:\\Users\\DELL\\Desktop\\Python\\27022018")
ashop1 = open('Ashop1.txt')

In [23]:
d = {}
for line in ashop1:
    line = line.lower()
    l = line.split()
    for word in l:
        d[word] = d.get(word,0)+1
print(d)

{'the': 7, 'cock': 3, 'and': 3, 'pearl': 2, 'a': 5, 'was': 1, 'once': 1, 'strutting': 1, 'up': 1, 'down': 1, 'farmyard': 1, 'among': 1, 'hens': 1, 'when': 1, 'suddenly': 1, 'he': 2, 'espied': 1, 'something': 1, 'shinning': 1, 'amid': 1, 'straw': 2, 'ho': 2, 'quoth': 2, 'thats': 1, 'for': 3, 'me': 2, 'soon': 1, 'rooted': 1, 'it': 2, 'out': 2, 'from': 1, 'beneath': 1, 'what': 1, 'did': 1, 'turn': 1, 'to': 2, 'be': 2, 'but': 2, 'that': 3, 'by': 1, 'some': 1, 'chance': 1, 'had': 1, 'been': 1, 'lost': 1, 'in': 1, 'yard': 1, 'you': 1, 'may': 1, 'treasure': 1, 'master': 1, 'men': 1, 'prize': 2, 'you,': 1, 'i': 1, 'would': 1, 'rather': 1, 'have': 1, 'single': 1, 'barley-corn': 1, 'than': 1, 'peck': 1, 'of': 1, 'pearls': 1, 'precious': 1, 'things': 1, 'are': 1, 'those': 1, 'can': 1, 'them': 1}


**Exercise:** Find the frequency distribution of the words present in this file.

*(A hint is given at the bottom. Look at it only if you cannot figure out how to do it yourself.)*

In [24]:
#Printing properly
for key in d:
    print(d[key],key)

7 the
3 cock
3 and
2 pearl
5 a
1 was
1 once
1 strutting
1 up
1 down
1 farmyard
1 among
1 hens
1 when
1 suddenly
2 he
1 espied
1 something
1 shinning
1 amid
2 straw
2 ho
2 quoth
1 thats
3 for
2 me
1 soon
1 rooted
2 it
2 out
1 from
1 beneath
1 what
1 did
1 turn
2 to
2 be
2 but
3 that
1 by
1 some
1 chance
1 had
1 been
1 lost
1 in
1 yard
1 you
1 may
1 treasure
1 master
1 men
2 prize
1 you,
1 i
1 would
1 rather
1 have
1 single
1 barley-corn
1 than
1 peck
1 of
1 pearls
1 precious
1 things
1 are
1 those
1 can
1 them


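Note: The words are not printed in alphabetical order or in order of frequency. Since Python 3.7 a dictionary keeps its keys in insertion order, so each word appears in the order in which it was first seen in the file.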

In [25]:
#Printing words with count greater than or equal to 2
for key in d:
    if d[key] >= 2:
        print(d[key],key)

7 the
3 cock
3 and
2 pearl
5 a
2 he
2 straw
2 ho
2 quoth
3 for
2 me
2 it
2 out
2 to
2 be
2 but
3 that
2 prize


In [26]:
#Printing keys in alphabetical order:

#But before that....
#Method keys - returns a view object (dict_keys) of the dictionary's keys
d.keys()

dict_keys(['the', 'cock', 'and', 'pearl', 'a', 'was', 'once', 'strutting', 'up', 'down', 'farmyard', 'among', 'hens', 'when', 'suddenly', 'he', 'espied', 'something', 'shinning', 'amid', 'straw', 'ho', 'quoth', 'thats', 'for', 'me', 'soon', 'rooted', 'it', 'out', 'from', 'beneath', 'what', 'did', 'turn', 'to', 'be', 'but', 'that', 'by', 'some', 'chance', 'had', 'been', 'lost', 'in', 'yard', 'you', 'may', 'treasure', 'master', 'men', 'prize', 'you,', 'i', 'would', 'rather', 'have', 'single', 'barley-corn', 'than', 'peck', 'of', 'pearls', 'precious', 'things', 'are', 'those', 'can', 'them'])

In [27]:
type(d.keys())

dict_keys

In [28]:
keys = list(d.keys())  #putting the keys into list
keys.sort()   # sort the list
for key in keys:
    if d[key] >= 2:
        print(d[key],key)

5 a
3 and
2 be
2 but
3 cock
3 for
2 he
2 ho
2 it
2 me
2 out
2 pearl
2 prize
2 quoth
2 straw
3 that
7 the
2 to
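The alphabetical listing above could equally be ordered by frequency. The following is a sketch that sorts the (word, count) pairs with sorted and a key function; `sample_counts` is a hypothetical dictionary holding a few of the counts shown above so that the block runs on its own.

```python
# A few of the word counts from the output above, copied by hand.
sample_counts = {'the': 7, 'cock': 3, 'and': 3, 'pearl': 2, 'a': 5}

# Sort by count, largest first; words with equal counts are ordered alphabetically.
by_count = sorted(sample_counts.items(), key=lambda item: (-item[1], item[0]))
for word, count in by_count:
    print(count, word)
```

Negating the count inside the key function gives descending counts while keeping ascending alphabetical order for ties.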


### Advanced Text Parsing

The original text of this fable by Aesop is given in the file Ashop.txt. The original file contains a lot of punctuation, which has to be removed before counting. We should also take care of case sensitivity.

In [29]:
#Before we do so, let's look at two other tools.

#1. punctuation
import string
p = string.punctuation
p

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [30]:
#2. translate method for string
# str.maketrans()

#This uses the 3-argument version of str.maketrans with arguments (x, y, z) where 'x' and 'y'
# must be equal-length strings and each character in 'x' is replaced by the character at the
# same position in 'y'. 'z' is a string ("!" here) whose characters are each mapped to None,
# i.e. deleted.

s = "wow!"
trans = str.maketrans("w", "W", "!") #Replace 'w' by 'W' and '!' is mapped to none
s = s.translate(trans)
print(s)

WoW


In [31]:
text = "Wow! isn't today a great day???"  # named text so that the imported string module is not shadowed
trans = str.maketrans("", "", p)
text = text.translate(trans)
print(text)

Wow isnt today a great day


In [None]:
type(p)

In [33]:
import os
os.chdir("C:\\Users\\DELL\\Desktop\\Python\\27022018")
ashop1 = open('Ashop.txt')
d = {}
trans = str.maketrans("", "", p)
for line in ashop1:
    line = line.translate(trans)
    line = line.lower()
    l = line.split()
    for word in l:
        d[word] = d.get(word,0)+1
for key in d:
    print(d[key],key)

7 the
3 cock
3 and
2 pearl
5 a
1 was
1 once
1 strutting
1 up
1 down
1 farmyard
1 among
1 hens
1 when
1 suddenly
2 he
1 espied
1 something
1 shinning
1 amid
2 straw
2 ho
2 quoth
1 thats
3 for
2 me
1 soon
1 rooted
2 it
2 out
1 from
1 beneath
1 what
1 did
1 turn
2 to
2 be
2 but
3 that
1 by
1 some
1 chance
1 had
1 been
1 lost
1 in
1 yard
2 you
1 may
1 treasure
1 master
1 men
2 prize
1 i
1 would
1 rather
1 have
1 single
1 barleycorn
1 than
1 peck
1 of
1 pearls
1 precious
1 things
1 are
1 those
1 can
1 them
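The steps used above (strip punctuation, lower-case, split, count) can be collected into one function. This is a sketch under stated assumptions: `count_words` is a hypothetical helper, and `sample` holds two made-up stand-in lines so that the block runs without the file.

```python
import string

def count_words(lines):
    """Return a dictionary mapping each lower-cased word to its count."""
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = {}
    for line in lines:
        # Remove punctuation, ignore case, then split on whitespace.
        for word in line.translate(strip_punct).lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts

sample = ["Ho! ho! quoth he,", "that's for me."]   # made-up stand-in lines
result = count_words(sample)
print(result)
```

With a real file, the function could be called as count_words(fhand) inside a `with open('Ashop.txt') as fhand:` block, which also closes the file once the counting is done.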


### Further Problems

**Problem 1**

Write a program that categorizes each mail message by the day of the week on which the commit was done.

    Sample Line: 
    From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

    Sample Execution: 
    python dow.py 
    Enter a file name: mbox-short.txt 
    {'Fri': 20, 'Thu': 6, 'Sat': 1}

In [4]:
import os
os.chdir("C:\\Users\\DELL\\Desktop\\Python\\12022018")

In [48]:
fhand = open('mbox-short.txt')
d= {}
for line in fhand:
    if line.startswith("From "):
        line = line.rstrip()
        line = line[line.find(" ",line.find("@"))+1:]  # drop everything up to the space after the address
        line = line[:line.find(" ")]  # keep the first remaining word: the day
        d[line] = d.get(line,0)+1
print(d)

{'Sat': 1, 'Fri': 20, 'Thu': 6}
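An alternative to the two find calls is to split the line into fields and pick the day by position. The sketch below assumes every "From " line has the layout of the sample line; `sample_lines` is hypothetical, and apart from the first entry (the sample line from the problem statement) its contents are made up.

```python
sample_lines = [
    "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008",
    "From: stephen.marquard@uct.ac.za",                # header line, must be skipped
    "From alice@example.com Fri Jan 4 18:10:48 2008",  # made-up line
    "From bob@example.com Fri Jan 4 16:10:39 2008",    # made-up line
]

days = {}
for line in sample_lines:
    if line.startswith("From "):   # the trailing space excludes "From:" lines
        fields = line.split()      # ['From', address, day, month, date, time, year]
        day = fields[2]
        days[day] = days.get(day, 0) + 1
print(days)
```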


In [33]:
help(line.rstrip)

Help on built-in function rstrip:

rstrip(...) method of builtins.str instance
    S.rstrip([chars]) -> str
    
    Return a copy of the string S with trailing whitespace removed.
    If chars is given and not None, remove characters in chars instead.



**Problem 2**

From which address were the most messages received?

    Enter a file name: mbox.txt 
    zqian@umich.edu 195

In [46]:
fhand = open('mbox.txt')
d = {}  # empty dictionary.
for line in fhand:
    if line.startswith("From:"):
        line = line.rstrip()
        line = line[line.find(" ")+1:]
        d[line] = d.get(line,0) + 1
maxi = max(d.values())  # largest count, without relying on one particular address being present
for j in d:
    if d[j] == maxi:
        print(j,d[j])

zqian@umich.edu 195
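If only one top sender is needed, max can search the keys directly when it is given the dictionary's get method as the key function. This is a sketch with a made-up dictionary `senders`; unlike the loop above, it reports only the first address found when several share the highest count.

```python
# Made-up counts, standing in for the dictionary built from the mail file.
senders = {'alice@example.com': 2, 'bob@example.com': 5, 'carol@example.com': 3}

# max compares the keys by their counts, because key=senders.get looks each one up.
top_sender = max(senders, key=senders.get)
print(top_sender, senders[top_sender])
```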


**Problem 3**

Write a program that records the domain name the message was sent from, instead of the whole e-mail address of the sender. At the end of the program, print out the contents of your dictionary.

In [52]:
fhand = open('mbox-short.txt')
d= {}
for line in fhand:
    if line.startswith("From "):
        line = line.rstrip()
        line = line[line.find("@")+1:line.find(" ",line.find("@"))]
        d[line] = d.get(line,0)+1
print(d)

{'uct.ac.za': 6, 'media.berkeley.edu': 4, 'umich.edu': 7, 'iupui.edu': 8, 'caret.cam.ac.uk': 1, 'gmail.com': 1}
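The domain can also be extracted with two calls to split instead of slicing with find. A minimal sketch on the sample line from Problem 1, assuming the address is always the second field and contains exactly one "@"; `address` and `domain` are hypothetical names.

```python
line = "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008"

address = line.split()[1]        # second whitespace-separated field
domain = address.split('@')[1]   # the part after the '@'
print(domain)
```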
