<h1 id="tocheading">Table of Contents and Notebook Setup</h1>
<div id="toc"></div>

In [1]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')

# Tuple

Tuples are immutable, fixed-length sequences of Python objects. They are written as comma-separated values, optionally enclosed in parentheses.

In [2]:
tup = 4, 5, 6
tup

(4, 5, 6)

We can nest tuples (we can do this with a variety of data types). What we have here is a tuple of tuples:

In [3]:
nested_tup = (3, 1, 2), (1, 2)
nested_tup

((3, 1, 2), (1, 2))

We can convert any sequence or iterator to a tuple by calling the built-in "tuple" function. Here we convert a list, and then a string, to a tuple:

In [4]:
tuple([4, 0, 6])

(4, 0, 6)

In [5]:
tuple('words')

('w', 'o', 'r', 'd', 's')

Elements of a tuple are accessed using square brackets:

In [6]:
tup = 4, 5, 6
tup[1]

5

Python lets us combine multiple data types in a single container such as a tuple. Here we have a list, a string, and a dictionary all combined into one tuple.

In [7]:
# the colon separating each key from its value sits outside the quotes
tup = tuple([[1, 2], 'list', {'firstletter': 'a', 'secondletter': 'b'}])
tup

([1, 2], 'list', {'firstletter': 'a', 'secondletter': 'b'})

If an object inside a tuple is mutable, we can modify it in place. However, once the tuple is created, we cannot change which object is stored in each slot. See an example of each below.

In [8]:
tup[0] = [1, 2, 3]

TypeError: 'tuple' object does not support item assignment

In [9]:
tup[0].append(3)
tup

([1, 2, 3], 'list', {'firstletter': 'a', 'secondletter': 'b'})

What we did above is very subtle; it is important you understand these fundamental rules of the tuple data type.
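To make the distinction concrete, here is a minimal sketch that checks object identity before and after mutating a list stored in a tuple. The names `inner` and `t` are hypothetical and chosen only for illustration.

```python
inner = [1, 2]
t = (inner, 'list')

# Mutating the list in place is allowed: the tuple still holds the same object.
t[0].append(3)
same_object = t[0] is inner

# Rebinding a slot is not allowed and raises TypeError.
try:
    t[0] = [1, 2, 3]
    rebind_failed = False
except TypeError:
    rebind_failed = True

print(t, same_object, rebind_failed)   # ([1, 2, 3], 'list') True True
```

The tuple's slots never change; only the object that slot 0 refers to grows.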

Tuples also support concatenation with + and repetition with *, as the next two cells show.

In [10]:
(4,'tup',True)+([1,2], {'dict': 1})

(4, 'tup', True, [1, 2], {'dict': 1})

In [11]:
('bootsa', 'cotsa')*4

('bootsa', 'cotsa', 'bootsa', 'cotsa', 'bootsa', 'cotsa', 'bootsa', 'cotsa')

## Unpacking Tuples

We can unpack the values of a tuple into separate variables like this:

In [12]:
tup = 4,5,6
a, b, c = tup
print(a)
print(b)

4
5


Using this comma notation, it becomes very easy to swap variables in Python.

In [13]:
a, b = b, a
print(a)

5


Suppose we have a sequence of tuples. Unpacking makes it easy to iterate over them in a for loop.

In [14]:
seq = [(1,2,3), (4,5,6), (7,8,9)]
for a, b, c in seq:
    print(a+b+c)

6
15
24


Suppose we only want to extract a few values from a tuple and don't care about the rest. We can do this as follows (the _ variable is conventionally used for unwanted values):

In [15]:
values = 1, 2, 3, 4, 5
a, b, *_ = values

a

1

The * collects the remaining values into a list, and _ is the conventional name for a variable we do not intend to use.
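As a small illustrative sketch, the starred name can appear in any position and does not have to be `_`. The variable names here are made up for the example.

```python
values = 1, 2, 3, 4, 5

# The starred name always receives a list, even though values is a tuple.
first, *middle, last = values
print(first, middle, last)   # 1 [2, 3, 4] 5

# The star can also collect an empty remainder.
a, b, *rest = (10, 20)
print(rest)                  # []
```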

We can determine the size of a tuple using len.

In [16]:
values = 1, 2, 3, 4, 5
len(values)

5

We can also count the occurrences of an object in a tuple using .count:

In [17]:
values = ([1,2], [1,2], [1,2,3], 'size', True)

print(values.count([1,2]))
print(values.count(True))

2
1


# List

In contrast to tuples, lists are variable-length and their contents can be modified in place.

In [18]:
list1 = [1, 2, 3, 4]
list1[2] = 4

list1

[1, 2, 4, 4]

Similar to how we can use the "tuple" function to convert a list to a tuple, we can use the "list" function to convert a tuple to a list.

In [19]:
list((1, 2, 3, 4))

[1, 2, 3, 4]

The list function is often used to materialize an iterator, a generator, or another lazy iterable such as a range object (these are Python objects that produce their values on demand):

In [20]:
gen = range(10)
gen

range(0, 10)

In [21]:
seq = list(gen)
seq

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

## Adding or Removing Elements

Elements can be appended to the end of a list using the .append method. To insert at a specific position instead, use the .insert method, which takes the index followed by the value.

In [22]:
seq = list(range(10))
seq.append(11)
seq

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11]

In [23]:
seq.insert(10,'ten')
seq

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 'ten', 11]

(Note that insert is computationally more expensive than append: every element after the insertion point must be shifted to make room, whereas append only adds to the end. Keep this in mind when dealing with large amounts of data.)
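The cost difference can be observed with a rough timing sketch like the one below. The size `n` and the helper function names are assumptions made for this demo, and the exact timings will vary by machine, so no specific numbers are claimed.

```python
import timeit

n = 2_000  # hypothetical problem size, chosen only to keep the demo fast

def build_with_append():
    out = []
    for i in range(n):
        out.append(i)       # adds at the end: no elements need to move
    return out

def build_with_insert():
    out = []
    for i in range(n):
        out.insert(0, i)    # adds at the front: every existing element shifts
    return out

t_append = timeit.timeit(build_with_append, number=20)
t_insert = timeit.timeit(build_with_insert, number=20)
print(f'append: {t_append:.4f}s, insert at front: {t_insert:.4f}s')
```

Inserting at index 0 is the worst case; inserting near the end of the list is much closer to append.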

Somewhat similar to the Stack object in Java, lists have a pop method that removes and returns the value at a given index (the last element if no index is given).

In [24]:
seq.pop(10)

'ten'

In [25]:
seq

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11]

The remove method removes elements by their value (as opposed to their index). It locates the first occurrence of the value and deletes it from the list.

In [26]:
elements = [True, [1,2], (1,2,3), 'swag']
elements.remove([1,2])
elements

[True, (1, 2, 3), 'swag']
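Two details of remove are easy to miss: only the first match is deleted, and a missing value raises an error. A short sketch, using a made-up list `items`:

```python
items = ['a', 'b', 'a', 'c']
items.remove('a')          # only the first 'a' is removed
print(items)               # ['b', 'a', 'c']

# Guard with a membership test when the value may be missing.
if 'z' in items:
    items.remove('z')

# Without the guard, removing a missing value raises ValueError.
try:
    items.remove('z')
    raised = False
except ValueError:
    raised = True
print(raised)              # True
```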

## Concatenating and Combining Lists

The following outlines two different ways to combine two lists together:

In [27]:
['Python', 'rocks!']+[2, True, 'dude']

['Python', 'rocks!', 2, True, 'dude']

In [28]:
x = ['Python', 'rocks!']
x.extend([2, True, 'dude'])
x

['Python', 'rocks!', 2, True, 'dude']

Adding lists with + is computationally more expensive, since it creates a new list and copies the elements of both operands into it. If you don't need the original lists afterwards (say you're building up a large list), extend is usually the better choice.
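The difference between the two approaches shows up when checking object identity. This is a minimal sketch; `alias` and `y` are names introduced here for illustration.

```python
x = ['Python', 'rocks!']
alias = x                      # second name for the same list object

y = x + [2, True]              # builds a brand-new list; x is untouched
print(y is x, x)               # False ['Python', 'rocks!']

x.extend([2, True])            # grows the existing list in place
print(alias is x, alias)       # True ['Python', 'rocks!', 2, True]
```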

## Sorting


A list can be sorted in place by calling its sort method:

In [29]:
a = [7, 2, 4, 3, 6]
a.sort()
a


[2, 3, 4, 6, 7]

The sort method accepts an optional sort key: a function that is applied to every value in the list. The values are then sorted in ascending order of the results that function returns. For example, with the "len" function:

In [30]:
a = ['se', 's', 'seess', 'ssss']
a.sort(key=len)
a

['s', 'se', 'ssss', 'seess']

We don't need to use len as our key; depending on our purposes we can pass any function we want (for example, a key that counts the number of occurrences of the letter 's' in the word):

In [31]:
def count_s(string):
    return string.count('s')

a = ['sss', 'seeee', 'ss', 'seeesss']
a.sort(key=count_s)
a

['seeee', 'ss', 'sss', 'seeesss']
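The key can also be written inline as a lambda, combined with reverse=True, or return a tuple to break ties. The sketch below uses a made-up word list for illustration.

```python
words = ['sss', 'seeee', 'ss', 'seeesss', 'es']

# reverse=True sorts by the key in descending order.
words.sort(key=lambda w: w.count('s'), reverse=True)
by_s_desc = list(words)
print(by_s_desc)        # ['seeesss', 'sss', 'ss', 'seeee', 'es']

# A tuple key sorts by the first item, then uses the second to break ties.
words.sort(key=lambda w: (w.count('s'), len(w)))
by_s_then_len = list(words)
print(by_s_then_len)    # ['es', 'seeee', 'ss', 'sss', 'seeesss']
```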

## Binary Search and Maintaining a Sorted List


Suppose we have a sorted list that will continuously grow in size. If we want the list to remain sorted, we need a way to ensure that the new values are placed in a sorted order. For this we use the "bisect" module.

In [32]:
import bisect
c = [1, 2, 2, 4, 5, 6, 6, 7, 8]
bisect.bisect(c,3) #finds the location where element "3" should be inserted

3

In [33]:
bisect.insort(c,3)#inserts the element 3 into the appropriate location in "c"
c

[1, 2, 2, 3, 4, 5, 6, 6, 7, 8]

Note that the bisect functions do not check whether the list is actually sorted, so it is imperative that you only use this module with lists that are already sorted.
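When the value being searched for is already in the list, the module offers two insertion points. The following sketch, with a shortened example list, compares them and then grows the list with insort.

```python
import bisect

c = [1, 2, 2, 4, 5]

# For a value already present, bisect_left points before the run of equal
# values; bisect_right (the same function as bisect.bisect) points after it.
left = bisect.bisect_left(c, 2)
right = bisect.bisect_right(c, 2)

# Feeding new values through insort keeps the list sorted as it grows.
for new_value in [3, 0, 6]:
    bisect.insort(c, new_value)

print(left, right, c)   # 1 3 [0, 1, 2, 2, 3, 4, 5, 6]
```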

## Slicing

Slicing involves selecting the parts of lists that we are interested in. Rather than explain what every bit of notation does, I will define a sequence and then show what happens when we use a variety of different slices.

In [34]:
seq = [7, 2, 4, 5, 6, 1, 7, 8]

In [35]:
seq[1:5] # elements from index 1 up to, but not including, index 5

[2, 4, 5, 6]

In [36]:
seq[:5] # elements from the beginning up to, but not including, index 5

[7, 2, 4, 5, 6]

In [37]:
seq[4:] #elements from index 4 to end

[6, 1, 7, 8]

We can also use negative indices when slicing. Negative indices count from the end of the list.

In [38]:
seq[-2:] #from second last index to the end

[7, 8]

We can use a second colon followed by a step "n" to take every nth element of the sequence:

In [39]:
seq[::3]

[7, 5, 7]

In [40]:
seq[1::3] #starts at index 1 and takes every 3rd element.

[2, 6, 8]

We can also reverse the order of a list with a step of -1:

In [41]:
seq[::-1]

[8, 7, 1, 6, 5, 4, 2, 7]
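Putting the pieces together, the general form is seq[start:stop:step]. The sketch below also illustrates two behaviours not covered by the cells in this section: a slice is a new list, and a slice can be assigned to.

```python
seq = [7, 2, 4, 5, 6, 1, 7, 8]

# start=2, stop=6 (excluded), step=2 selects indices 2 and 4.
middle = seq[2:6:2]
print(middle)              # [4, 6]

# A slice is a new list, so changing it leaves seq alone.
head = seq[:3]
head[0] = 'changed'
print(seq[:3])             # [7, 2, 4]

# Assigning to a slice replaces that stretch of the original list,
# even when the replacement has a different length.
seq[1:3] = ['x', 'y', 'z']
print(seq)                 # [7, 'x', 'y', 'z', 5, 6, 1, 7, 8]
```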

# Built-in Sequence Functions

Python has a variety of built-in sequence functions. It is worth familiarizing yourself with these functions and using them at any opportunity.

## enumerate

When iterating over a sequence you often want to keep track of the index you are on. In most languages, this would be accomplished as such:

In [42]:
collection = [1, 4, 5, 7, 3]
i = 0
for item in collection:
    print('The value at index '+ str(i)+ ' is '+ str(item))
    i = i+1

The value at index 0 is 1
The value at index 1 is 4
The value at index 2 is 5
The value at index 3 is 7
The value at index 4 is 3


Python has a built-in function called enumerate which yields (i, item) tuples from the collection. It is generally preferable to the manual counter used in the previous cell:

In [43]:
for i, item in enumerate(collection):
    print('The value at index '+ str(i)+ ' is '+ str(item))

The value at index 0 is 1
The value at index 1 is 4
The value at index 2 is 5
The value at index 3 is 7
The value at index 4 is 3


This technique is useful whenever we need the index of values in a list.
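enumerate also accepts an optional start argument for the counter. A brief sketch, in which the threshold of 4 is arbitrary and chosen only for the example:

```python
collection = [1, 4, 5, 7, 3]

# start=1 shifts the counter without changing which items are visited.
numbered = list(enumerate(collection, start=1))
print(numbered)          # [(1, 1), (2, 4), (3, 5), (4, 7), (5, 3)]

# Collect the positions of all values greater than 4.
big_positions = []
for i, item in enumerate(collection):
    if item > 4:
        big_positions.append(i)
print(big_positions)     # [2, 3]
```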

## sorted

The sorted <i> function </i> is distinct from the sort <i> method. </i> "sorted" returns a new sorted list from a previously unsorted list (the original list is left unchanged); "sort" is a list method that sorts the list in place.

In [44]:
a = [3, 1, 7, 2]
b = sorted(a)

print(a)
print(b)

[3, 1, 7, 2]
[1, 2, 3, 7]


In [45]:
a = [3, 1, 7, 2]
a.sort()
a

[1, 2, 3, 7]

This is a good example which outlines the difference between a <i> method </i> and a <i> function. </i>

## zip

"zip" is a crucial sequence function that pairs up the elements of a number of lists, tuples, or other sequences. It returns an iterator of tuples, which we can materialize with "list":

In [46]:
seq1 = ['one', 'two', 'three']
seq2 = [1, 2, 3]
zipped = zip(seq1, seq2)
list(zipped)

[('one', 1), ('two', 2), ('three', 3)]

If the sequences have different lengths, "zip" simply truncates the result to the length of the shortest one:

In [47]:
seq1 = ['one', 'two', 'three']
seq2 = [1, 2, 3]
seq3 = [True, False]
zipped = zip(seq1, seq2, seq3)
list(zipped)

[('one', 1, True), ('two', 2, False)]
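If truncation is not what you want, the standard library's itertools.zip_longest pads the shorter sequences instead. A minimal sketch comparing the two:

```python
from itertools import zip_longest

seq1 = ['one', 'two', 'three']
seq3 = [True, False]

truncated = list(zip(seq1, seq3))
padded = list(zip_longest(seq1, seq3, fillvalue=None))

print(truncated)   # [('one', True), ('two', False)]
print(padded)      # [('one', True), ('two', False), ('three', None)]
```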

A common use of "zip" includes iterating over multiple sequences at once. Mixing this with "enumerate" leads to some pretty neat stuff.

In [48]:
for i, (a,b) in enumerate(zip(seq1, seq2)):
    print('Index: {0}, Seq1 Element: {1}, Seq2 Element: {2}'.format(i, a, b))

Index: 0, Seq1 Element: one, Seq2 Element: 1
Index: 1, Seq1 Element: two, Seq2 Element: 2
Index: 2, Seq1 Element: three, Seq2 Element: 3


As a side note, notice that format(i, a, b) inserts "i" "a" and "b" in the order of the {0} {1} {2} in the string.

"zip" can also be used to unzip sequences of tuples like such:

In [49]:
names = [('Luke', 'Polson'), ('Stewart', 'Copeland'), ('Jordan', 'Peterson')]
first_names, last_names = zip(*names)
list(last_names)

['Polson', 'Copeland', 'Peterson']

The \* before an object in Python "unpacks" the object into separate arguments. In this case, \*names passes ('Luke', 'Polson'), ('Stewart', 'Copeland'), and ('Jordan', 'Peterson') to zip as three separate arguments. Zipping these three tuples together yields one tuple of first names and one tuple of last names.
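One way to convince yourself of this is to pass the three tuples explicitly and compare the results, as in this short sketch:

```python
names = [('Luke', 'Polson'), ('Stewart', 'Copeland'), ('Jordan', 'Peterson')]

# Passing each tuple by hand is what the star does for us.
explicit = list(zip(names[0], names[1], names[2]))
starred = list(zip(*names))

print(starred)
# [('Luke', 'Stewart', 'Jordan'), ('Polson', 'Copeland', 'Peterson')]
print(explicit == starred)   # True
```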

## reversed

The "reversed" <i> function </i> iterates over the elements of a sequence in reverse order

In [50]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Keep in mind that "reversed" returns an <i> iterator </i> so it needs to be materialized by something like the "list" <i> function. </i>
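One practical consequence, sketched below with a throwaway list, is that the object returned by reversed can only be consumed once:

```python
rev = reversed([1, 2, 3])

first_pass = list(rev)    # consumes the iterator
second_pass = list(rev)   # nothing is left to iterate over

print(first_pass)    # [3, 2, 1]
print(second_pass)   # []
```

If the reversed values are needed more than once, store the materialized list rather than the iterator.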

# dict

The "dict" is arguably the most important Python data structure. It is also often called a hash map or an associative array. A dict is written with curly brackets and maps keys to values. It works like this:

In [51]:
d1 = {'Lukes_num': 1, 'Sams_num': 2}

We access the elements of a dict using the keys specified.

In [52]:
d1['Lukes_num']

1

We can add new elements to a dict as such:

In [53]:
d1['Pauls_num'] = 7
d1

{'Lukes_num': 1, 'Sams_num': 2, 'Pauls_num': 7}

We can, of course, mix data types. In the example below, both the key (an integer) and the value (a list) have different types from those used so far.

In [54]:
d1[5] = [3, 4, 5]
d1

{'Lukes_num': 1, 'Sams_num': 2, 'Pauls_num': 7, 5: [3, 4, 5]}

To delete or pop values from the dict, we specify them from their key (as opposed to their value).

In [55]:
del d1['Pauls_num']
d1

{'Lukes_num': 1, 'Sams_num': 2, 5: [3, 4, 5]}

In [56]:
ret = d1.pop(5)
ret

[3, 4, 5]

Notice that "del" is a statement applied to a square-bracket lookup, whereas pop is a method and is therefore called with parentheses.

We can make lists of both the keys and values of a dict as such.

In [57]:
d1 = {'Lukes_num': 1, 'Sams_num': 2}
keys = list(d1.keys())
values = list(d1.values())
print(keys)
print(values)

['Lukes_num', 'Sams_num']
[1, 2]


We can update dictionaries (similar to extending lists) using the update <i> method. </i>

In [58]:
d1 = {'Lukes_num': 1, 'Sams_num': 2}
d1.update({'Pauls_num': 3, 4: [1,4,3]})
d1

{'Lukes_num': 1, 'Sams_num': 2, 'Pauls_num': 3, 4: [1, 4, 3]}

When updating, be aware of overlapping keys: if a key already exists in the initial dictionary, its old value is overwritten by the new one.
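A short sketch of what happens with an overlapping key, followed by one possible way to add only the missing keys. The values and the `incoming` dict are made up for illustration.

```python
d1 = {'Lukes_num': 1, 'Sams_num': 2}

# 'Sams_num' already exists, so its value is replaced; 'Pauls_num' is added.
d1.update({'Sams_num': 20, 'Pauls_num': 3})
print(d1)   # {'Lukes_num': 1, 'Sams_num': 20, 'Pauls_num': 3}

# To keep existing values, add only the keys that are not present yet.
incoming = {'Lukes_num': 100, 'Marys_num': 4}
for key, value in incoming.items():
    if key not in d1:
        d1[key] = value
print(d1)   # {'Lukes_num': 1, 'Sams_num': 20, 'Pauls_num': 3, 'Marys_num': 4}
```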

## Creating dicts from sequences

Sometimes you might end up with two sequences that you want to pair up element-wise into a dictionary. Your first instinct might be to write code like this:

In [59]:
seq1 = ['word1', 'word2', 'word3']
seq2 = ['swag', 'boys', 'yee']

mapping = {}
for key, value in zip(seq1, seq2):
    mapping[key] = value
    
mapping

{'word1': 'swag', 'word2': 'boys', 'word3': 'yee'}

Despite your clever use of the zip function, there is an easier way to do this using the dict <i> function </i> which returns a dict.

In [60]:
seq1 = ['word1', 'word2', 'word3']
seq2 = ['swag', 'boys', 'yee']

mapping = dict(zip(seq1, seq2))
mapping

{'word1': 'swag', 'word2': 'boys', 'word3': 'yee'}

Sometimes we want to retrieve a value from a certain key in a dictionary, and if the key doesn't exist, then we want to return something else (we essentially don't want an error to get thrown). Rather than writing...

In [61]:
defaultvalue='none'
if 'word1' in mapping:
    print(mapping['word1'])
else:
    print(defaultvalue)

swag


... we can simply use the "get" <i> method. </i>

In [62]:
defaultvalue='no key'

value1 = mapping.get('word1', defaultvalue)
value2 = mapping.get('no_key_of_this_type', defaultvalue)

print(value1)
print(value2)

swag
no key


If we leave the second argument blank and the key isn't included in the dict then None is returned:

In [63]:
value2 = mapping.get('no_key_of_this_type')
print(value2)

None


Suppose we wish to categorize words by their first letter and store this information in a dict of lists. The naive thing to do would be...

In [64]:
words = ['apple', 'aardvark', 'baseball', 'bat', 'atom']
by_letter = {}

for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter]=[word]
    else:
        by_letter[letter].append(word)
by_letter

{'a': ['apple', 'aardvark', 'atom'], 'b': ['baseball', 'bat']}

dicts have a setdefault <i> method </i> exactly for this purpose. It works as such:

In [65]:
words = ['apple', 'aardvark', 'baseball', 'bat', 'atom']
by_letter = {}

for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
    
by_letter

{'a': ['apple', 'aardvark', 'atom'], 'b': ['baseball', 'bat']}

There is in fact an even easier way to do this. We simply need to import the defaultdict <i> class </i> from the collections <i> module. </i> 

In [66]:
from collections import defaultdict
by_letter = defaultdict(list) # the values in the dict are lists
for word in words:
    by_letter[word[0]].append(word)
dict(by_letter) # by_letter is a "defaultdict" so we convert it to a "dict" here

{'a': ['apple', 'aardvark', 'atom'], 'b': ['baseball', 'bat']}
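The same class works with other default factories. As an illustrative sketch, defaultdict(int) counts words per first letter, and the last lines show a side effect worth knowing about: merely reading a missing key creates it.

```python
from collections import defaultdict

words = ['apple', 'aardvark', 'baseball', 'bat', 'atom']

counts = defaultdict(int)       # missing keys start at int(), which is 0
for word in words:
    counts[word[0]] += 1
print(dict(counts))             # {'a': 3, 'b': 2}

# Reading a missing key with square brackets inserts it.
_ = counts['z']
print('z' in counts)            # True

# .get does not trigger the default factory, so nothing is inserted.
print(counts.get('q'), 'q' in counts)   # None False
```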

## Valid dict key types

Dict values can be nearly any Python object, but keys generally need to be <u> immutable. </u> The proper term here is <i> hashability; </i> keys need to be hashable. To check whether something is hashable, pass it to the hash function. Numbers, strings, and tuples (provided all of their elements are hashable) are hashable. Lists are not, because they are mutable. Note that the hash values of strings typically differ from one Python session to the next.

In [67]:
hash('word')

-7340175332943391004

In [68]:
hash((5, 6))

3713085962043070856

If you really want to use a list as a key in a dict, convert it to a tuple and use that as the key.
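A minimal sketch of that conversion, which also shows that a tuple containing a list is still not hashable. The names `point` and `lookup` are hypothetical.

```python
point = [5, 6]

# Hashing a list fails with TypeError.
try:
    hash(point)
    list_hashable = True
except TypeError:
    list_hashable = False

# Converting the list to a tuple gives a usable key.
lookup = {tuple(point): 'stored under (5, 6)'}
print(list_hashable, lookup[(5, 6)])   # False stored under (5, 6)

# A tuple is only hashable if everything inside it is hashable too.
try:
    hash((1, [2, 3]))
    nested_hashable = True
except TypeError:
    nested_hashable = False
print(nested_hashable)                 # False
```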

# set

A set is an unordered collection of unique elements. They're like dicts that store only keys, with no values. You can create them by calling the set <i> function </i> on a list or by writing the elements inside curly braces.

In [69]:
set([1,1,1,5,6,7,6,8,4,1])

{1, 4, 5, 6, 7, 8}

In [70]:
{1,1,3,4,5,5,6,2,1}

{1, 2, 3, 4, 5, 6}

Notice that there are no repeated elements in a set. This is similar to how we would list the items in a mathematical set. As a matter of fact, the set data structure supports operations like union, intersection, and other set functions.

In [71]:
a={1, 3, 4, 6, 7}
b={3, 6, 8, 9}
a.union(b)

{1, 3, 4, 6, 7, 8, 9}

In [72]:
a.intersection(b)

{3, 6}

We can also use a|b for union and a&b for intersection. There are a variety of different operations one can do on sets that are all documented online. When working with sets, you should have a cheat sheet open.
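For reference, here is a brief sketch of the operator forms on the same two sets, including difference, symmetric difference, and a subset test:

```python
a = {1, 3, 4, 6, 7}
b = {3, 6, 8, 9}

union = a | b            # elements in either set
common = a & b           # elements in both sets
only_a = a - b           # elements in a but not in b
either_not_both = a ^ b  # elements in exactly one of the two sets
is_subset = {3, 6} <= a  # True when every element of {3, 6} is in a

print(union, common, only_a, either_not_both, is_subset)
```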

Like dict keys, the elements of a set must be hashable. Thus list-like elements must be converted into tuples before they can be added to a set.

# List, Set, and Dict Comprehensions

<b> List Comprehensions </b> are one of the most loved features of Python. They allow you to filter (and optionally transform) the elements of a collection (list, set, dict, ...) and create a new collection in a single expression. The notation works as follows:

In [73]:
collection = [3, 4, 6, 7, 9, 10, 12]
[i for i in collection if i%3==0]

[3, 6, 9, 12]

This is equivalent to the rather cumbersome alternative that one might use in other languages:

In [74]:
collection = [3, 4, 6, 7, 9, 10, 12]
result = []
for i in collection:
    if i%3 ==0:
        result.append(i)
result

[3, 6, 9, 12]

There are similar notations we use with dicts and sets.

In [75]:
dict1 = {'Letter1': 'a', 'Number1': 1, 'Letter2': 'b'}
dict_letters = {key: val for (key, val) in dict1.items() if ('Letter' in key)}
dict_letters

{'Letter1': 'a', 'Letter2': 'b'}

In [76]:
set1 = {3, 4, 6, 7, 9, 10, 12}
set_vals = {i // 3 for i in set1 if i % 3 == 0}
set_vals

{1, 2, 3, 4}

Nested list comprehensions also exist. For these we use the following notation:

In [77]:
some_tups = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
[x for tup in some_tups for x in tup]

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Notice how we "bury" our way into the collections. This can be used to accomplish many things in Python. If you start burying more than two or three levels, however, the notation becomes hard to read, and it is often better to use regular for loops at that point.
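For comparison, here is a sketch of the equivalent for loops, which makes the order of the for clauses explicit, followed by a variant that keeps the nesting instead of flattening it:

```python
some_tups = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# The for clauses of the comprehension appear in the same order as these loops.
flat = []
for tup in some_tups:      # first for clause
    for x in tup:          # second for clause
        flat.append(x)
print(flat)                # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# A comprehension inside a comprehension preserves the nested structure.
doubled = [[2 * x for x in tup] for tup in some_tups]
print(doubled)             # [[2, 4, 6], [8, 10, 12], [14, 16, 18]]
```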