## Dictionaries
Dictionaries are very similar to the associative container `map<T,K>` discussed in C++. They are implemented as __hash tables__ and go by the name of hashes or associative arrays in other languages, e.g. `perl`. The `{}` operator is used to create a `dict` object.

In [3]:
months = ['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
print(months)
days = {}
for m in months:
    days[m] = int(input("# of days in {0}: ".format(m)))
print(days)

['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
# of days in january: 31
# of days in february: 28
# of days in march: 31
# of days in april: 30
# of days in may: 31
# of days in june: 30
# of days in july: 31
# of days in august: 31
# of days in september: 30
# of days in october: 31
# of days in november: 30
# of days in december: 31
{'january': 31, 'february': 28, 'march': 31, 'april': 30, 'may': 31, 'june': 30, 'july': 31, 'august': 31, 'september': 30, 'october': 31, 'november': 30, 'december': 31}


You can also create a dictionary by hand

In [4]:
dict1 = { 'a' : 1, 'b' : (1,2,3), 'c' : ['one','two'], 'd' : 'example', }
print(dict1)

{'a': 1, 'b': (1, 2, 3), 'c': ['one', 'two'], 'd': 'example'}


In [5]:
students = { 'rio': {'name':'john', 'age':23, 'id':123456}, 'nairobi':{'name':'susan', 'id':123123, 'age':21},  'tokyo':{'name':'maria', 'id':123651, 'age':24}, }
print(students)

{'rio': {'name': 'john', 'age': 23, 'id': 123456}, 'nairobi': {'name': 'susan', 'id': 123123, 'age': 21}, 'tokyo': {'name': 'maria', 'id': 123651, 'age': 24}}


You can add a new key together with its value

In [6]:
students['oslo'] = {'name':'', 'age':30, 'id':111111} 
print(students)

{'rio': {'name': 'john', 'age': 23, 'id': 123456}, 'nairobi': {'name': 'susan', 'id': 123123, 'age': 21}, 'tokyo': {'name': 'maria', 'id': 123651, 'age': 24}, 'oslo': {'name': '', 'age': 30, 'id': 111111}}


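If the key is already in use, its value is replaced. This is similar to modifying elements of a list. Note that the whole value is replaced: after the assignment below, the entry for `'oslo'` no longer has an `'id'`.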

In [7]:
students['oslo'] = {'name':'sergey', 'age':22} 
print(students)

{'rio': {'name': 'john', 'age': 23, 'id': 123456}, 'nairobi': {'name': 'susan', 'id': 123123, 'age': 21}, 'tokyo': {'name': 'maria', 'id': 123651, 'age': 24}, 'oslo': {'name': 'sergey', 'age': 22}}


You can check if the dictionary contains a key

In [8]:
while True:
    name = input("name (press return to end): ")  
    if(name==''): break
    if name not in students:
        print("{0} not in the list. sorry.".format(name))
    else: 
        print("name: {0}\t age: {1}\t id: {2}".format(students[name]['name'], students[name]['age'], students[name]['id']))
    

name (press return to end): joe
joe not in the list. sorry.
name (press return to end): rio
name: john	 age: 23	 id: 123456
name (press return to end): 


This if-else structure is very common with dictionaries, so Python provides a dedicated method, `get`, which returns the default value when the key is missing
```python
value = some_dict.get(key, default_value)
```

In [9]:
while True:
    name = input("name (press return to end): ")  
    if(name==''): break
    val = students.get(name, "not found")
    print(val)

name (press return to end): joe
not found
name (press return to end): yuo
not found
name (press return to end): 


### Keys are unique
- There can be only one value for a given key in a dict made of `key:value`
- If you need more values for a key, then what you want is a dictionary of `key:[value]`, i.e. a dict whose values are lists

In [10]:
particles = { 'boson':['Z', 'gluon', 'W', 'photon'], 'meson':['pion', 'kaon'], 'quark':['u','d','s'], 'lepton':['electron', 'muon']}
particles

{'boson': ['Z', 'gluon', 'W', 'photon'],
 'meson': ['pion', 'kaon'],
 'quark': ['u', 'd', 's'],
 'lepton': ['electron', 'muon']}

In [11]:
particles['lepton'].append('tau')
particles

{'boson': ['Z', 'gluon', 'W', 'photon'],
 'meson': ['pion', 'kaon'],
 'quark': ['u', 'd', 's'],
 'lepton': ['electron', 'muon', 'tau']}

In [12]:
particles.keys()

dict_keys(['boson', 'meson', 'quark', 'lepton'])

In [14]:
particles['meson']

['pion', 'kaon']
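
When a dictionary of lists is built incrementally, each key must be given an empty list before the first `append`. A minimal sketch of one way to do this with the standard `dict.setdefault` method follows; the `pairs` list and the name `families` are made up for this illustration.

```python
# Hypothetical (particle, family) pairs, used only for illustration.
pairs = [('electron', 'lepton'), ('pion', 'meson'),
         ('muon', 'lepton'), ('kaon', 'meson')]

families = {}
for name, family in pairs:
    # setdefault returns the list stored under `family`,
    # inserting an empty list first if the key is missing.
    families.setdefault(family, []).append(name)

print(families)
```

The key stays unique, and the list stored under it grows as new members arrive.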

### Iterating over a dict
By default, iterating over a dict yields its keys

In [8]:
for p in particles:
    print(p)

boson
meson
quark
lepton


You can also explicitly loop over keys

In [9]:
for k in particles.keys():
    print(k)

boson
meson
quark
lepton


In [10]:
for k in particles:
    print(particles[k])

['Z', 'gluon', 'W', 'photon']
['pion', 'kaon']
['u', 'd', 's']
['electron', 'muon', 'tau']
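
When the loop body needs both the key and its value, the standard `items` method yields them as pairs, which avoids the extra lookup. A small sketch on a made-up dictionary (`sample` and `sizes` are names chosen only for this example):

```python
# Small stand-in dictionary, so the example is self-contained.
sample = {'meson': ['pion', 'kaon'], 'quark': ['u', 'd', 's']}

sizes = {}
for family, members in sample.items():
    # each iteration unpacks one (key, value) pair
    print("{0}: {1}".format(family, members))
    sizes[family] = len(members)

print(sizes)
```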


### Accessing values without keys
If you do not care about the keys but need all the values, Python provides the `values` method.

Merging the resulting lists into a single list, as done below with `extend`, is called __flattening__.

In [13]:
all_vals=[]
for v in particles.values():
    print(v)
    all_vals.extend(v)
print(all_vals)

['Z', 'gluon', 'W', 'photon']
['pion', 'kaon']
['u', 'd', 's']
['electron', 'muon', 'tau']
['Z', 'gluon', 'W', 'photon', 'pion', 'kaon', 'u', 'd', 's', 'electron', 'muon', 'tau']


In [20]:
dic2 = { 123: (1,2,3), 'one': [1.2, 2.3] , (1,2): 'tuple'}
print(dic2)
for i in dic2:
    print( type(i), type(dic2[i]) )

{123: (1, 2, 3), 'one': [1.2, 2.3], (1, 2): 'tuple'}
<class 'int'> <class 'tuple'>
<class 'str'> <class 'list'>
<class 'tuple'> <class 'str'>


The same flattening of `particles` can be obtained with a double loop

In [21]:
flat=[]
for v in particles.values():
    for i in v:
        flat.append(i)
print(flat)

['Z', 'gluon', 'W', 'photon', 'pion', 'kaon', 'u', 'd', 's', 'electron', 'muon', 'tau']
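
As a possible alternative to the two approaches above, the standard library function `itertools.chain.from_iterable` concatenates an iterable of lists. This is only a sketch on a made-up dictionary; `sample` and `flat_sample` are illustrative names.

```python
from itertools import chain

# Small stand-in dictionary of lists.
sample = {'meson': ['pion', 'kaon'], 'quark': ['u', 'd', 's']}

# chain.from_iterable walks through each list in turn;
# list() collects the elements into one flat list.
flat_sample = list(chain.from_iterable(sample.values()))
print(flat_sample)
```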


### Valid key types
- Keys must be hashable
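    - immutable scalar types like int, float, string
    - tuples, provided all their elements are hashable
- This means an integer identifier (the hash) can be computed from your key, and it does not change during the key's lifetime.
- You can check whether an object is hashable with the built-in `hash` function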

In [33]:
hash('boson')

5680941550141782090

In [81]:
hash((2,3,2.4))

-8536403370352787292

In [82]:
hash(3.1234324)

284615736650468355

In [42]:
c = 2.9
hash(c)
dict3 = { c:'value of c', 5.9:'value of something'}
print(dict3)

c = 5.4
dict3[c] = 'new'
print(dict3)

c = 5.6
dict3[-3.4] = 'new val'
print(dict3)

{2.9: 'value of c', 5.9: 'value of something'}
{2.9: 'value of c', 5.9: 'value of something', 5.4: 'new'}
{2.9: 'value of c', 5.9: 'value of something', 5.4: 'new', -3.4: 'new val'}


In [83]:
hash([1,2.3])

TypeError: unhashable type: 'list'

In [40]:
d = [1,2,3]
hash(d)

TypeError: unhashable type: 'list'
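
The `TypeError` above can be turned into a simple yes/no test. The helper `is_hashable` below is not a built-in; it is a hypothetical function written for this sketch. It also shows that a tuple is hashable only if all of its elements are.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

checks = [
    is_hashable((1, 2)),             # tuple of ints
    is_hashable((1, [2, 3])),        # tuple containing a list
    is_hashable([1, 2]),             # list
    is_hashable(frozenset([1, 2])),  # immutable variant of a set
]
print(checks)
```

Only the first and the last object in this list can be used as dictionary keys.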

## Set
- an unordered collection of __unique__ elements
- the natural example is the collection of the keys of a dictionary
- a set is created by listing its elements inside `{}` or with the `set` function; note that empty braces `{}` create a dict, not a set

Consider our `days` dictionary, which stores the number of days in each month

In [37]:
day_len = days.values()
day_len

dict_values([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

We can create a set from these values

In [39]:
days_set = set( day_len)
days_set
print(days_set)
new_set = {1,2,3,4,1, 34, 3, 2, 34}
print(new_set)

{28, 30, 31}
{1, 2, 3, 4, 34}
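
One detail worth checking by hand: empty braces give a dictionary, so an empty set has to be created with `set()`. The short sketch below also builds a set from a string, which keeps each character once. The variable names are chosen only for this example.

```python
empty_dict = {}     # empty braces create a dict
empty_set = set()   # an empty set needs the set function
print(type(empty_dict), type(empty_set))

# any iterable can be turned into a set; duplicates are dropped
letters = set('mississippi')
print(len(letters))
```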


### Sets and dictionaries for data analysis

In [44]:
import numpy as np

grades = np.random.randint(18,31, 63).tolist()
print(grades)

[23, 23, 23, 25, 18, 21, 27, 20, 30, 21, 30, 24, 18, 24, 22, 25, 26, 27, 29, 30, 18, 24, 30, 29, 23, 29, 24, 19, 30, 21, 27, 27, 25, 25, 28, 27, 20, 26, 20, 26, 25, 26, 29, 24, 22, 19, 25, 29, 25, 18, 26, 18, 24, 26, 29, 27, 21, 20, 23, 25, 20, 21, 21]


In [45]:
type(grades)

list

In [46]:
vals = set(grades)
print(vals)
grades.count(18)

{18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30}


5

In [47]:
data = {}
for v in vals:
    data[v] = grades.count(v)
    print("grade: {0}  frequency: {1}".format(v,data[v]))

grade: 18  frequency: 5
grade: 19  frequency: 2
grade: 20  frequency: 5
grade: 21  frequency: 6
grade: 22  frequency: 2
grade: 23  frequency: 5
grade: 24  frequency: 6
grade: 25  frequency: 8
grade: 26  frequency: 6
grade: 27  frequency: 6
grade: 28  frequency: 1
grade: 29  frequency: 6
grade: 30  frequency: 5
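
The set-plus-`count` approach above scans the full list once per distinct grade. As a hedged alternative, the standard library class `collections.Counter` builds the same frequency table in a single pass. The short list `sample_grades` is made up for this sketch.

```python
from collections import Counter

# Made-up grades, just to keep the example small.
sample_grades = [23, 25, 18, 23, 30, 18, 23]

# Counter is a dict subclass mapping each element to its count.
freq = Counter(sample_grades)
for grade in sorted(freq):
    print("grade: {0}  frequency: {1}".format(grade, freq[grade]))

# A grade that never occurs has count 0 instead of raising KeyError.
print(freq[19])
```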


In [60]:
%matplotlib notebook
import matplotlib.pyplot as plt

plt.bar(list(data.keys()), list(data.values()), color='blue')
plt.title("distribution of grades in midterm")
plt.xlabel('grade')
plt.ylabel('frequency')

<IPython.core.display.Javascript object>

Text(0, 0.5, 'frequency')

You could also create a histogram using `matplotlib.pyplot.hist`

In [172]:
%matplotlib notebook
n, bins, patches = plt.hist(grades, bins=len(set(grades)),facecolor='green')
print(n)
print(bins)

<IPython.core.display.Javascript object>

[4. 3. 7. 7. 1. 8. 7. 3. 1. 6. 7. 6. 3.]
[18.         18.92307692 19.84615385 20.76923077 21.69230769 22.61538462
 23.53846154 24.46153846 25.38461538 26.30769231 27.23076923 28.15384615
 29.07692308 30.        ]


## Set operations
Sets in Python support the standard operations on mathematical sets, such as intersection, union and difference.
Try `help(set)` for the full list of functionalities.

In [173]:
a = {0,1,2,3,4,5,6,7,8,9}
even = {0,2,4,6,8}
odd = {1,3,5,7,9}
prime = {2,3,5,7,11,13,17,19}

In [174]:
a & even

{0, 2, 4, 6, 8}

In [175]:
even.intersection(prime)

{2}

In [176]:
a | prime

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 17, 19}

In [177]:
even.union(prime)

{0, 2, 3, 4, 5, 6, 7, 8, 11, 13, 17, 19}

In [178]:
odd.difference(prime)

{1, 9}
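
A few more operations from `help(set)`, sketched on small made-up sets: the operator form of difference, the symmetric difference, and the subset and disjointness tests. All names below are chosen only for this example.

```python
digits = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
even_digits = {0, 2, 4, 6, 8}
low_digits = {0, 1, 2, 3, 4}

# elements of even_digits that are not in low_digits
diff = even_digits - low_digits
# elements that belong to exactly one of the two sets
sym = even_digits ^ low_digits
# subset test: every even digit is a digit
is_subset = even_digits <= digits
# disjoint test: no element in common
no_overlap = even_digits.isdisjoint({1, 3, 5})

print(diff, sym, is_subset, no_overlap)
```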

# Comprehensions for List, Set, Dict
- One of the most-loved features of python
- allows concise operation on collections without too many loops
- output of the operation is a new collection

## List comprehension
The basic expression is given below; the `if` clause is optional
```python
[ expression for val in collection if some_condition ]
```
Let's consider this example where we want to analyse the results of the midterm test

In [50]:
import numpy as np
voti = np.random.randint(10,31, 100).tolist()
print(voti)

[21, 15, 13, 27, 19, 22, 27, 14, 28, 20, 24, 13, 14, 18, 17, 23, 21, 24, 14, 27, 12, 10, 29, 12, 21, 24, 17, 17, 25, 12, 18, 10, 11, 24, 20, 28, 27, 15, 29, 23, 15, 22, 15, 20, 15, 19, 23, 26, 20, 20, 26, 21, 14, 26, 13, 27, 21, 14, 25, 23, 21, 23, 18, 25, 30, 27, 25, 29, 29, 15, 12, 18, 29, 27, 30, 30, 27, 30, 17, 24, 27, 27, 25, 29, 15, 20, 10, 24, 26, 16, 13, 17, 20, 25, 24, 19, 30, 18, 18, 25]


The most basic question is how many people failed the exam.

You could do simple counting

In [51]:
nfail = 0
for v in voti:
    if v <18:
        nfail+=1
print("# grades <18:  %2d"%(nfail))

# grades <18:  30


In general, though, keeping a list of the failing grades is more flexible for future analysis

In [52]:
failed = []
for v in voti:
    if v <18:
        failed.append(v)
print("# grades <18:  {0}".format(len(failed)))

# grades <18:  30


Note that you performed the following sequence of operations:
  - create a new empty list
  - iterate over existing objects
  - check some_condition on each object
  - if positive then add object to new list

In Python this can be written concisely, in a form close to natural language, with what is called a __comprehension__.

In [54]:
new_failed  = [ v for v in voti if v<18 ]
good_grades = [ v for v in voti if v>=18 ]
print(len(new_failed),len(good_grades))

30 70


You can also apply any function to each item


In [55]:
def isodd(x):
    # True for odd integers, False otherwise (instead of an implicit None)
    return x%2 != 0
odds  = [ v for v in voti if isodd(v) ]
evens  = [ v for v in voti if not isodd(v) ]
print(len(odds))
print(len(evens))

import math
sqrts = [ math.sqrt(v) for v in voti]
print(sqrts[:10])

54
46
[4.58257569495584, 3.872983346207417, 3.605551275463989, 5.196152422706632, 4.358898943540674, 4.69041575982343, 5.196152422706632, 3.7416573867739413, 5.291502622129181, 4.47213595499958]


Manipulation with strings is also very easy

In [56]:
months

['january',
 'february',
 'march',
 'april',
 'may',
 'june',
 'july',
 'august',
 'september',
 'october',
 'november',
 'december']

In [117]:
monthkeys = [ m[:3].upper() for m in months ]
monthkeys

['JAN',
 'FEB',
 'MAR',
 'APR',
 'MAY',
 'JUN',
 'JUL',
 'AUG',
 'SEP',
 'OCT',
 'NOV',
 'DEC']
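
The same syntax with braces instead of square brackets produces a set, so duplicates are dropped automatically. A minimal sketch on a made-up list of words (`words` and `lengths` are illustrative names):

```python
# A few month names, repeated here so the example is self-contained.
words = ['january', 'may', 'june', 'july', 'march']

# set comprehension: braces instead of square brackets
lengths = {len(w) for w in words}
print(lengths)
```

`'june'` and `'july'` both have four letters, yet 4 appears only once in the result.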

## Another example with comprehensions: motion of a body under gravity
We now revisit our program from the last lecture to use comprehensions.

The original example [gravity1](../lec23/examples/gravity1.py) is reproduced here

In [None]:
# initial conditions
g = 9.8
h = 0.
theta = (45./180.)*math.pi
v0 = 10.
dt=0.01

#compute velocity components
v0x = v0*math.cos(theta)
v0y = v0*math.sin(theta)
print("v0_x: %.3f m/s \t v0_y: %.3f m/s"%(v0x,v0y))

t=0.
x=[]
y=[]
xi=0
yi=h

while yi>=0:
    x.append(xi)
    y.append(yi)
    t+=dt
    xi=v0x*t
    yi=h+v0y*t-0.5*g*t*t

We can rewrite the computational part with comprehensions. Rather than computing the time at each step, we first create a list of times to iterate over.

In [144]:
import numpy as np
import math

# initial conditions
g = 9.8
h = 0.
theta = (45./180.)*math.pi
v0 = 10.
dt=0.01

#compute velocity components
v0x = v0*math.cos(theta)
v0y = v0*math.sin(theta)
print("v0_x: %.3f m/s \t v0_y: %.3f m/s"%(v0x,v0y))

x0 = 0
y0 = h

def x(t):
    return x0+v0x*t

def y(t):
    return y0+v0y*t-0.5*g*t*t


# generate list of times for sampling
times = np.arange(0., 1000., 0.01).tolist() 

#print first 10 elements
print(times[:10])

# compute x(t_i)
xi = [ x(t) for t in times if y(t)>=0.]

# compute y(t_i)
yi = [ y(t) for t in times if y(t)>=0. ]

print( "total steps:\t %-4d"%len(xi))
print( "last x:\t\t %.2f"%xi[-1])
print( "last y:\t\t %.3f"%yi[-1])

v0_x: 7.071 m/s 	 v0_y: 7.071 m/s
[0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09]
total steps:	 145 
last x:		 10.18
last y:		 0.022
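
The comprehensions above test all 100000 sampled times and evaluate `y(t)` for each list separately. One possible variation, sketched here with the same initial conditions, stops at the first negative height by using `itertools.takewhile` and computes each point only once. The names `x_at`, `y_at`, `flight_times`, `points`, `xs` and `ys` are chosen for this sketch and do not come from the original program.

```python
import math
from itertools import count, takewhile

# same initial conditions as above
g = 9.8
v0 = 10.
theta = (45./180.)*math.pi
dt = 0.01
v0x = v0*math.cos(theta)
v0y = v0*math.sin(theta)

def x_at(t):
    return v0x*t

def y_at(t):
    return v0y*t - 0.5*g*t*t

# lazily generate t = 0, dt, 2*dt, ... without a fixed upper limit
all_times = (i*dt for i in count())

# keep times while the body is above ground; stop at the first failure
flight_times = list(takewhile(lambda t: y_at(t) >= 0., all_times))

# compute each (x, y) point once, then split into two lists
points = [(x_at(t), y_at(t)) for t in flight_times]
xs = [p[0] for p in points]
ys = [p[1] for p in points]

print("total steps:\t %-4d" % len(xs))
print("last x:\t\t %.2f" % xs[-1])
print("last y:\t\t %.3f" % ys[-1])
```

Unlike the filtering `if`, `takewhile` does not look past the first time the condition fails, which is fine here because the body does not rise again after landing.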


### Comprehension with dictionary
We now use a comprehension to invert our dict of months and days

In [57]:
days

{'january': 31,
 'february': 28,
 'march': 31,
 'april': 30,
 'may': 31,
 'june': 30,
 'july': 31,
 'august': 31,
 'september': 30,
 'october': 31,
 'november': 30,
 'december': 31}

In [59]:
inv_map = { i: [] for i in set(days.values()) }
print(inv_map)

for i in days:
    inv_map[days[i]].append(i)
print(inv_map)

{28: [], 30: [], 31: []}
{28: ['february'], 30: ['april', 'june', 'september', 'november'], 31: ['january', 'march', 'may', 'july', 'august', 'october', 'december']}
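
The two steps above can also be folded into a single expression by nesting a list comprehension inside a dict comprehension. This is only a sketch on a shortened, made-up copy of the dictionary (`sample_days`); it rescans all keys once per distinct value, so the loop version is likely the better choice for large dictionaries.

```python
# Shortened copy of the months dictionary, for illustration only.
sample_days = {'january': 31, 'february': 28, 'march': 31, 'april': 30}

# outer dict comprehension: one key per distinct number of days
# inner list comprehension: the months that have that many days
inverted = {n: [m for m in sample_days if sample_days[m] == n]
            for n in set(sample_days.values())}
print(inverted)
```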
