# 1.1 Python Lists and Dictionaries

This section introduces Python lists, dictionaries, and comprehensions.  These objects are foundational to Python and are used extensively.

## Introduction to Lists
Lists are *sequences* of objects.  A *sequence* is a "positionally ordered collection of other objects."  Lists can include any types of Python objects (and do not need to be homogeneous).

In [1]:
# Define a list and then show the list.  Note that the list contains objects of 
# different types (ints, floats, strings).
l1 = [1, 2, 3, 17.24, 967, 45, "dog", 'cat', [1, 2, 3]]
l1

[1, 2, 3, 17.24, 967, 45, 'dog', 'cat', [1, 2, 3]]

In [2]:
# Element referencing: indices are 0-based; l1[i] counts from the start, l1[-i] from the end.
l1[0], l1[1]

(1, 2)

In [3]:
# and can be negative (-1 starts at the "end" of the list)
l1[-1]

[1, 2, 3]

In [4]:
# Lists are iterable, which allows you to easily iterate through the list elements
# without using explicit indices
for item in l1:
    print(item)

1
2
3
17.24
967
45
dog
cat
[1, 2, 3]


In [5]:
# as compared to the traditional indexing method ...
for j in range(len(l1)):
    print(l1[j])

1
2
3
17.24
967
45
dog
cat
[1, 2, 3]


In [6]:
# List slicing - as with string slicing, the slice l1[i:j]
# includes elements i, i+1, i+2, up to, but not including j (and is a separate list)
l2 = l1[2:6]
l2

[3, 17.24, 967, 45]
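
The note that a slice "is a separate list" can be checked directly. The following is a minimal sketch; the names `original` and `piece` are chosen only for this illustration and do not appear elsewhere in the notebook.

```python
# Hypothetical names used only for this sketch.
original = [1, 2, 3, 17.24, 967, 45]
piece = original[2:6]      # elements at indices 2, 3, 4, 5 (index 6 is excluded)

piece[0] = "changed"       # modify the slice only
print(piece)               # ['changed', 17.24, 967, 45]
print(original)            # [1, 2, 3, 17.24, 967, 45]
```

Assigning into the slice leaves the source list untouched, because slicing builds a new list object.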

In [7]:
# concatenation
l3 = l1 + l2 + [86.4, 91.8, 'pony']
l3

[1,
 2,
 3,
 17.24,
 967,
 45,
 'dog',
 'cat',
 [1, 2, 3],
 3,
 17.24,
 967,
 45,
 86.4,
 91.8,
 'pony']

In [8]:
# repetition
l4 = l3[:3]*3
l4

[1, 2, 3, 1, 2, 3, 1, 2, 3]

In [9]:
# Nested lists - this is a "list of lists."
l2 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
l2

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [10]:
# this one and the previous one are equivalent -- which is clearer (visually)?
l2 = [ [1, 2, 3]
     , [4, 5, 6]
     , [7, 8, 9]
     ]
l2

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [11]:
# to view the nested list (list of lists) in matrix form:
for r in l2:
    print(r)

[1, 2, 3]
[4, 5, 6]
[7, 8, 9]


### List Mutability
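This is IMPORTANT and often causes problems for beginners.  Unlike simple objects such as numbers and strings, lists are *mutable* in Python: a list can be changed in place, and every name that refers to that list sees the change.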

In [12]:
# Create a list and assign two references to that list
l1 = [1, 2, 3]
l2 = l1
l1, l2

([1, 2, 3], [1, 2, 3])

In [2]:
# Update the first element of l1
l1[0] = "The dog ate my homework."
# Now show both lists.  
l1, l2

(['The dog ate my homework.', 2, 3], ['The dog ate my homework.', 2, 3])

In [14]:
# compare to the similar actions with simple objects (which are immutable)
x = 123
y = x
x, y

(123, 123)

In [15]:
x = "The dog ate my homework."
x, y

('The dog ate my homework.', 123)

In [4]:
# compare with this version (reset the two lists first)
l1 = [1, 2, 3]
l2 = l1
l1, l2

([1, 2, 3], [1, 2, 3])

In [5]:
# Why was l2 changed last time but not this time?
l1 = ['a', 'b']
l1, l2

(['a', 'b'], [1, 2, 3])
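
One way to answer the question above is to distinguish *mutating* a list object from *rebinding* a name. The sketch below uses the names `a`, `b`, and `c`, chosen only for this illustration, and the `is` operator to test whether two names refer to the same object.

```python
a = [1, 2, 3]
b = a              # b is a second name for the same list object
c = list(a)        # c is a new list with the same elements (a shallow copy)

print(a is b)      # True
print(a is c)      # False

a[0] = "changed"   # mutate the shared object in place
print(b)           # ['changed', 2, 3]
print(c)           # [1, 2, 3]

a = ['a', 'b']     # rebind the name a to a brand-new list
print(b)           # ['changed', 2, 3]
```

Item assignment changes the object that both names share, whereas plain assignment to `a` only makes `a` point somewhere else. When an independent list is wanted, `list(a)` or `a[:]` creates a copy.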

### Sample List/Sequence Functions/Methods

In [7]:
# append
l = [1, 2, 3, 'cow']
l.append(0)
l

[1, 2, 3, 'cow', 0]

In [8]:
# pop
x = l.pop()
l, x

([1, 2, 3, 'cow'], 0)

In [9]:
l.remove?

In [10]:
# remove - try l.remove? and enter (to see the help)
l.remove(3)
l

[1, 2, 'cow']

In [11]:
#sort - Note that in the default version (with 'cow' in the list), you'll get an error
#  since the default sorting can't process numbers and strings together.  We will revisit
#  this limitation later.
l.sort()
l

TypeError: '<' not supported between instances of 'str' and 'int'
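
One possible workaround for the mixed-type error, and not necessarily the approach taken when this limitation is revisited, is to supply a `key` function so that every element is compared in the same form. The sketch below compares elements by their string representation; the list `mixed` is made up for this illustration.

```python
# Hypothetical list used only for this sketch.
mixed = [3, 'cow', 1, 'ant', 2]

# key=str compares the string form of each element, so ints and strs
# can be ordered together; the elements themselves are not converted.
ordered = sorted(mixed, key=str)
print(ordered)     # [1, 2, 3, 'ant', 'cow']
```

Note that this orders numbers as text, so `10` would sort before `9`. A different key function is needed if numeric order matters.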

In [12]:
l2 = [1, 22, 3, 9, 2]
l2.sort()
l2

[1, 2, 3, 9, 22]
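
Because `sort()` changes the list in place, it is worth contrasting with the built-in `sorted()`, which returns a new list. A brief sketch, with `values` as an illustrative name:

```python
values = [1, 22, 3, 9, 2]

ordered = sorted(values)   # builds a new sorted list; values is unchanged
print(values)              # [1, 22, 3, 9, 2]
print(ordered)             # [1, 2, 3, 9, 22]

result = values.sort()     # sorts values in place and returns None
print(result)              # None
print(values)              # [1, 2, 3, 9, 22]
```

A common beginner mistake is writing `values = values.sort()`, which replaces the list with `None`.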

In [13]:
l

[1, 2, 'cow']

In [14]:
# reverse
l.reverse()
l

['cow', 2, 1]

### Range Objects and Lists

A nice (short) discussion on Python 3 iterators - https://stackoverflow.com/questions/22147757/iterators-in-python-3

In [15]:
r = range(10)
r, list(r)

(range(0, 10), [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [16]:
r = range(7, 24, 2)
r, list(r)

(range(7, 24, 2), [7, 9, 11, 13, 15, 17, 19, 21, 23])
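
A `range` object does not store its values; it computes them on demand, yet it still behaves like a sequence. The short sketch below shows a few sequence operations that work without converting to a list first.

```python
r = range(7, 24, 2)

print(len(r))        # 9
print(r[0], r[-1])   # 7 23
print(15 in r)       # True
print(16 in r)       # False
print(list(r[:3]))   # [7, 9, 11]
```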

In [17]:
# Range objects are often used as iterators for loop operations
for j in range(5):
    print(j)

0
1
2
3
4


In [18]:
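# Caution: the name `str` shadows Python's built-in str type for the rest of
# the session; a name such as `s` is safer outside of a short demo.
str = "The dog ate my homework."
for j in range(len(str)):
    print(str[j])
# What exactly does the expression range(len(str)) return?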

T
h
e
 
d
o
g
 
a
t
e
 
m
y
 
h
o
m
e
w
o
r
k
.


In [19]:
# Note that since strings are sequences, they also have iterators
for char in str:
    print(char)

T
h
e
 
d
o
g
 
a
t
e
 
m
y
 
h
o
m
e
w
o
r
k
.


In [20]:
l = ["one", 2, "three", 4, 'Five']
for j in range(len(l)):
    print(l[j])

one
2
three
4
Five


In [21]:
# Note that list objects also have built-in iterators
for item in l:
    print(item)

one
2
three
4
Five
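
When both the position and the element are needed, the built-in `enumerate` combines the two styles shown above. A minimal sketch, using `items` as an illustrative name:

```python
items = ["one", 2, "three", 4, 'Five']

# enumerate yields (index, element) pairs, so no range(len(...)) is needed
for position, item in enumerate(items):
    print(position, item)
```

This prints each index next to its element, starting from index 0.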


## Introduction to List Comprehensions
This is an **important** concept in the Python world -- comprehensions are your friends!

In [22]:
# Define a matrix
m = [ [1, 2, 3]
    , [4, 5, 6]
    , [7, 8, 9]
    ]
# Now, m is a list of 3 lists
# Equivalent to m = [[1, 2, 3], [4,5,6],[7,8,9]] (A list of three three-element lists)
m

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [23]:
# To extract the third row (as a list)
m[2]

[7, 8, 9]

In [24]:
# the second element of the third row
m[2][1]

8

In [25]:
# extract the third column using a standard programmatic approach:
c = []
for i in [0, 1, 2]:
    c.append(m[i][2])
c

[3, 6, 9]

In [26]:
# more generally (works with any length (number of rows) matrix):
c = []
for i in range(len(m)):
    c.append(m[i][2])
c

[3, 6, 9]

In [27]:
# or, by using the list iterator rather than an index:
c = []
for r in m:
    c.append(r[2])
c

[3, 6, 9]

In [28]:
# or, equivalently - use a list comprehension
c = [r[2] for r in m]
c

[3, 6, 9]
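
The same idea extends from one column to all of them by nesting one comprehension inside another. This is a sketch that assumes every row of the matrix has the same length; `columns` is an illustrative name.

```python
m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# outer comprehension walks the column indices; inner one collects that column
columns = [[r[j] for r in m] for j in range(len(m[0]))]
print(columns)      # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(columns[2])   # [3, 6, 9]
```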

In [29]:
# to extract the diagonal (upper left to lower right)
[m[i][i] for i in [0, 1, 2]]

[1, 5, 9]

In [30]:
# more generally ... (what does this mean "more generally?")
[m[i][i] for i in range(len(m))]

[1, 5, 9]

In [31]:
# even elements of the second column
[r[1] for r in m if r[1] % 2 == 0]
# added an "if condition" to the comprehension

[2, 8]
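
A comprehension with an `if` condition is shorthand for a loop that appends only when the condition holds. The sketch below writes both forms side by side; the names `evens_loop` and `evens_comp` are chosen only for this comparison.

```python
m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# explicit loop with a filter
evens_loop = []
for r in m:
    if r[1] % 2 == 0:
        evens_loop.append(r[1])

# equivalent comprehension: expression, then the loop, then the condition
evens_comp = [r[1] for r in m if r[1] % 2 == 0]

print(evens_loop == evens_comp)   # True
print(evens_comp)                 # [2, 8]
```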

In [32]:
# 18 random die rolls
import random
[random.randint(1, 6) for i in range(18)]
# note that the loop variable (i) is only used to iterate and isn't used in the expression.

[3, 2, 2, 5, 1, 1, 3, 5, 1, 3, 5, 1, 5, 1, 3, 1, 4, 3]

In [33]:
# 100 "craps" rolls (rolling a pair of dice in each roll)
rolls = [[random.randint(1,6), random.randint(1,6)] for i in range(100)]
rolls

[[2, 1],
 [4, 3],
 [3, 1],
 [3, 1],
 [1, 3],
 [5, 2],
 [2, 1],
 [5, 3],
 [3, 6],
 [5, 5],
 [3, 4],
 [4, 5],
 [5, 1],
 [5, 1],
 [3, 6],
 [6, 4],
 [5, 1],
 [5, 4],
 [5, 4],
 [2, 2],
 [2, 4],
 [4, 4],
 [5, 5],
 [5, 3],
 [3, 3],
 [6, 5],
 [5, 2],
 [3, 3],
 [1, 2],
 [5, 6],
 [5, 3],
 [1, 3],
 [4, 4],
 [6, 4],
 [3, 1],
 [6, 1],
 [3, 3],
 [5, 3],
 [3, 2],
 [2, 1],
 [5, 6],
 [2, 4],
 [5, 3],
 [2, 6],
 [3, 3],
 [6, 1],
 [6, 4],
 [2, 3],
 [6, 2],
 [6, 4],
 [5, 3],
 [3, 3],
 [5, 3],
 [3, 4],
 [2, 4],
 [3, 4],
 [2, 2],
 [2, 6],
 [6, 1],
 [3, 1],
 [1, 1],
 [2, 6],
 [6, 1],
 [2, 4],
 [3, 1],
 [3, 4],
 [3, 2],
 [4, 3],
 [3, 4],
 [3, 4],
 [6, 4],
 [2, 3],
 [3, 4],
 [4, 6],
 [1, 4],
 [4, 5],
 [3, 1],
 [5, 4],
 [4, 5],
 [5, 4],
 [5, 4],
 [1, 1],
 [4, 6],
 [2, 2],
 [1, 6],
 [6, 4],
 [6, 4],
 [2, 5],
 [6, 5],
 [5, 5],
 [5, 1],
 [3, 5],
 [5, 5],
 [4, 1],
 [3, 4],
 [6, 6],
 [5, 2],
 [1, 1],
 [5, 4],
 [4, 4]]

In [34]:
# Or, using Numpy (which is vectorized)
import numpy as np
rolls = [np.random.randint(1, 7, 2) for i in range(100)]
rolls
# notice that np.random.randint samples from the half-open interval [low, high), whereas
# random.randint samples from the closed interval [low, high] -- look at the help to see this.

[array([4, 1]),
 array([6, 2]),
 array([3, 4]),
 array([2, 3]),
 array([1, 2]),
 array([4, 2]),
 array([1, 2]),
 array([6, 1]),
 array([2, 2]),
 array([3, 5]),
 array([6, 6]),
 array([4, 1]),
 array([3, 5]),
 array([1, 5]),
 array([4, 5]),
 array([4, 4]),
 array([6, 1]),
 array([1, 2]),
 array([5, 2]),
 array([3, 4]),
 array([4, 5]),
 array([1, 5]),
 array([4, 4]),
 array([3, 3]),
 array([6, 1]),
 array([6, 1]),
 array([1, 1]),
 array([3, 4]),
 array([6, 6]),
 array([3, 1]),
 array([1, 1]),
 array([3, 3]),
 array([2, 4]),
 array([1, 1]),
 array([5, 5]),
 array([3, 3]),
 array([6, 6]),
 array([3, 6]),
 array([4, 5]),
 array([5, 4]),
 array([3, 4]),
 array([5, 5]),
 array([2, 4]),
 array([6, 5]),
 array([6, 5]),
 array([4, 1]),
 array([2, 2]),
 array([4, 1]),
 array([3, 4]),
 array([6, 3]),
 array([4, 5]),
 array([2, 6]),
 array([4, 3]),
 array([1, 5]),
 array([1, 5]),
 array([2, 1]),
 array([1, 3]),
 array([3, 5]),
 array([6, 4]),
 array([1, 4]),
 array([3, 6]),
 array([6, 5]),
 array([

In [35]:
# Suppose you'd like to estimate the probability of rolling a '7' ...
#  (true prob. is 6/36=.1667)
import numpy as np
obs = 50000
p = len([r for r in [np.random.randint(1, 7, 2) for i in range(obs)] if sum(r) == 7])/obs
#
# Or, if you want to see it step-by-step - remember, think inside-to-outside, left-to-right.
#rolls = [np.random.randint(1, 7, 2) for i in range(obs)]
#sevens = [r for r in rolls if sum(r) == 7]
#p = len(sevens)/obs

"Estimate of prob. based on {:,d} samples: {:.4f}".format(obs,p)

'Estimate of prob. based on 50,000 samples: 0.1679'
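
The estimate can also be computed without building any intermediate lists, by feeding a generator expression to `sum`. This sketch uses only the standard `random` module; the seed value is arbitrary and is there only to make the run repeatable, so the estimate will differ from the one shown above.

```python
import random

rng = random.Random(2024)   # arbitrary seed, chosen only for repeatability
obs = 50000

# count the rolls whose two dice total 7, one roll at a time
sevens = sum(1 for _ in range(obs)
             if rng.randint(1, 6) + rng.randint(1, 6) == 7)
p = sevens / obs

print("Estimate of prob. based on {:,d} samples: {:.4f}".format(obs, p))
```

The result should land close to the true value of 6/36, with the exact digits depending on the seed.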

### Sample List-based Data Structure and Accessing/Processing/Comprehensions

In [36]:
# creating a list to define a person
person = ["Tom Howard", 54, 6.0]

# creating a list of lists to define a team
people = [
    ["Tom Howard",          54,  6.0],
    ["Jane Grimm",          19,  4.9],
    ["Sam Brown",           25,  6.2],
    ["Sarah Joan Spade",    26, 5.25],
    ["Blaine Jones",        62,  5.8],
    ["Devin Callahan",      32, 5.92],
]

In [37]:
person

['Tom Howard', 54, 6.0]

In [38]:
people

[['Tom Howard', 54, 6.0],
 ['Jane Grimm', 19, 4.9],
 ['Sam Brown', 25, 6.2],
 ['Sarah Joan Spade', 26, 5.25],
 ['Blaine Jones', 62, 5.8],
 ['Devin Callahan', 32, 5.92]]

In [39]:
# How many people on the team
len(people)

6

In [40]:
# Print each person's name and age
for p in people:
    print("{:} is {:} years old".format(p[0], p[1]))
# In this statement, each p represents a "record" in the dataset and
# each element of p (it is a list) is an "attribute"

Tom Howard is 54 years old
Jane Grimm is 19 years old
Sam Brown is 25 years old
Sarah Joan Spade is 26 years old
Blaine Jones is 62 years old
Devin Callahan is 32 years old


In [42]:
# Create a list of all names
[p[0] for p in people]

['Tom Howard',
 'Jane Grimm',
 'Sam Brown',
 'Sarah Joan Spade',
 'Blaine Jones',
 'Devin Callahan']

In [44]:
# Create a list of all last names
[p[0].split()[-1] for p in people]

['Howard', 'Grimm', 'Brown', 'Spade', 'Jones', 'Callahan']

In [45]:
# Create a list of all ages
[p[1] for p in people]

[54, 19, 25, 26, 62, 32]

In [46]:
# Compute the average age
sum(p[1] for p in people) / len(people)

36.333333333333336

In [47]:
# Using a more human-friendly format
"The average age of the team members is {:.1f} years.".format(
    sum([p[1] for p in people])/float(len(people))
)
# Note that the previous examples create anonymous objects -- nothing
# persists after displaying the expressions and the memory is
# marked for garbage collection.

'The average age of the team members is 36.3 years.'

In [51]:
# Find the oldest person.  Go through step-by-step, running one expression
# at a time to see the progress (a cell displays only its last expression).
# max age
ages = [p[1] for p in people]
max(ages)
# which is max? - 62
ages.index(62)
# who - person 4
people[4][0]

'Blaine Jones'

In [52]:
# all together
people[[p[1] for p in people].index(max([p[1] for p in people]))][0]

'Blaine Jones'

In [53]:
# Find the youngest person
people[[p[1] for p in people].index(min([p[1] for p in people]))][0]

'Jane Grimm'
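
The built-in `max` and `min` accept a `key` function, which offers a shorter route to the same answers than the index lookup above. A sketch using the same team data; the names `oldest` and `youngest` are illustrative.

```python
people = [
    ["Tom Howard",          54,  6.0],
    ["Jane Grimm",          19,  4.9],
    ["Sam Brown",           25,  6.2],
    ["Sarah Joan Spade",    26, 5.25],
    ["Blaine Jones",        62,  5.8],
    ["Devin Callahan",      32, 5.92],
]

# key tells max/min which attribute of each record to compare (here, the age)
oldest = max(people, key=lambda p: p[1])
youngest = min(people, key=lambda p: p[1])

print(oldest[0])     # Blaine Jones
print(youngest[0])   # Jane Grimm
```

Each call returns the whole record, so any other attribute of that person is available without a second search.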

In [54]:
# Suppose that we had defined variables to store the "column numbers" -- the
#  indices into the internal lists for the given data items
name   = 0
age    = 1
height = 2
# Now, we can use these instead of the literals in all of the expressions
# For example:
people[[p[age] for p in people].index(max([p[age] for p in people]))][name]
# now it's a little clearer that we're looking for the name of the oldest person
# Further, if we add or remove columns from the data set, we can simply update the
# index variables and all of the expression code will still work as expected.


'Blaine Jones'

## Dictionaries

In [23]:
# A dictionary is similar to a list, but uses a 'key' rather than an integer
# index to specify 'location' in the dictionary.
# Dictionaries are very useful, but there's usually a bit of a learning curve
d = {"a": 123, "b": 345, "c": 789}
d

{'a': 123, 'b': 345, 'c': 789}

In [24]:
d['a'], d['b'], d['c']

(123, 345, 789)

In [25]:
my_key = 'a'
d[my_key]

123
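
Looking up a key that is not in the dictionary raises a `KeyError`. The sketch below shows three common ways to deal with a possibly missing key: the `in` test, the `get` method with an optional default, and catching the exception.

```python
d = {"a": 123, "b": 345, "c": 789}

print("a" in d)        # True
print("z" in d)        # False

print(d.get("z"))      # None
print(d.get("z", 0))   # 0

try:
    d["z"]
except KeyError as err:
    print("missing key:", err)   # missing key: 'z'
```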

In [26]:
for k in ['a', 'b','c']:
    print(d[k])

123
345
789


In [27]:
# all of the keys
d.keys()

dict_keys(['a', 'b', 'c'])

In [28]:
# as a list
list(d.keys())

['a', 'b', 'c']

In [29]:
# iteration -- note that iterating over a dictionary yields its keys
for k in d:
    print(d[k])

123
345
789
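
When both the key and the value are needed in the loop, the `items` method yields them as pairs and avoids the extra `d[k]` lookup. A minimal sketch:

```python
d = {"a": 123, "b": 345, "c": 789}

# items() yields (key, value) pairs in insertion order
for key, value in d.items():
    print(key, value)
```

This prints each key next to its value, one pair per line.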


In [30]:
d.values()

dict_values([123, 345, 789])

In [31]:
list(d.values())

[123, 345, 789]

In [32]:
# a dictionary of lists
dl = { 
     'one' : [1, 2, 3, 4, 5]
    ,'two' : [6, 7, 8, 9, 10]
    ,'three' : [127, 96, 455, 32, 5] 
}
dl

{'one': [1, 2, 3, 4, 5],
 'two': [6, 7, 8, 9, 10],
 'three': [127, 96, 455, 32, 5]}

In [33]:
for key in dl:
    print(dl[key])

[1, 2, 3, 4, 5]
[6, 7, 8, 9, 10]
[127, 96, 455, 32, 5]


In [34]:
for key in dl:
    for e in dl[key]:
        print(e)

1
2
3
4
5
6
7
8
9
10
127
96
455
32
5


In [35]:
student = {'name': 'Joe', 'class':'sr', 'grade':92}
student

{'name': 'Joe', 'class': 'sr', 'grade': 92}

In [36]:
student['grade']

92

In [38]:
# Nested dictionaries
students = {
    'Jane Doe'   : {'ID':'b0001','Gender': 'F','HW1':95,'HW2':87, 'HW3':92,'Exam1': 88,'Exam2':93,'FinalExam':90},
    'John Blue'  : {'ID':'b0002','Gender': 'M','HW1':55,'HW2':76, 'HW3':89,'Exam1': 77,'Exam2':82,'FinalExam':80},
    'Kim Tester' : {'ID':'b0003','Gender': 'F','HW1':80,'HW2':75, 'HW3':65,'Exam1': 70,'Exam2':75,'FinalExam':80},
    'Larry Black': {'ID':'b0004','Gender': 'M','HW1':90,'HW2':90, 'HW3':92,'Exam1': 95,'Exam2':85,'FinalExam':94},
    'Susan White': {'ID':'b0005','Gender': 'F','HW1':65,'HW2':52, 'HW3':85,'Exam1': 45,'Exam2':80,'FinalExam':82}
    }


In [39]:
# show all students
for student in students:
    print(student)

Jane Doe
John Blue
Kim Tester
Larry Black
Susan White


In [40]:
# Final Exams
for name, record in students.items():
    print("{:} made a {:d} on the final exam".format(name, record['FinalExam']))

Jane Doe made a 90 on the final exam
John Blue made a 80 on the final exam
Kim Tester made a 80 on the final exam
Larry Black made a 94 on the final exam
Susan White made a 82 on the final exam
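
Comprehensions also work with dictionaries. As a sketch, the block below builds a new dictionary that maps each student to a homework average. To stay short it uses a reduced copy of the data with only two students and the homework scores; `hw_keys` and `hw_avg` are names chosen for this illustration.

```python
# Reduced copy of the students data, kept small for this sketch.
students = {
    'Jane Doe':    {'HW1': 95, 'HW2': 87, 'HW3': 92},
    'Susan White': {'HW1': 65, 'HW2': 52, 'HW3': 85},
}
hw_keys = ['HW1', 'HW2', 'HW3']

# dictionary comprehension: {key_expression: value_expression for ... in ...}
hw_avg = {name: sum(rec[k] for k in hw_keys) / len(hw_keys)
          for name, rec in students.items()}

for name, avg in hw_avg.items():
    print("{:} has a homework average of {:.1f}".format(name, avg))
```

For these two records the averages come out to 91.3 and 67.3.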
