# 1. Outline

### This Week

* How to store a bunch of things in an organized way
  * We will look at **sequences** (strings, lists, tuples) and **containers** (sets, dictionaries)
  * For each type we will see how to initialize them, put stuff into them and how to get stuff back out
  * Some types are [**mutable**](http://www.merriam-webster.com/dictionary/mutable) (lists, sets, dictionaries) and others are **immutable** (strings, tuples).
  * Some things you might want to store in an organized way
   * the names of all your cats
   * months of the year
   * student grades
* We will wrap up by showing a few more operators

This is the first introduction to [_data structures_](https://en.wikipedia.org/wiki/Data_structure). An important part of programming is storing your results. If you were just going to sum two numbers, there is little need to use programming; just bust out the calculator on your phone. If you are going to sum hundreds (or thousands) of pairs of numbers, then you need to know what to do with all those results. One of the key skills of a programmer is outlining the goals of a project, and then linking those to appropriate data structure(s). Much of this course deals with the many data structure options and their pros and cons.

# 2. Strings

We introduced strings last week. Recall that a string is created by enclosing characters in single or double quotes.  

In [43]:
s = 'a string of words'
s

'a string of words'

While `s` in the cell above is a string, `s` can also be viewed as a sequence of characters starting with `'a'` and ending with `'s'`.

Because `s` is a sequence of characters, it has a length.

In [44]:
len(s)

17

**Action**: Go back and count the number of characters in `s`. Compare your count to the result of `len(s)`.  Are spaces considered characters?

**Note**: Numbers don't have a length.

In [46]:
num = 489
len(num)

TypeError: object of type 'int' has no len()

If you wanted to know how many digits are in a particular integer, you could use this little trick.

In [47]:
num_string = str(num)
num_string

'489'

In [48]:
len(num_string)

3

**Action**: In the cell below, repeat the above trick on `87.34`, which is a float. How does python treat the decimal after converting to a string? If you used this trick to count digits in a number, would it be important to know if you were counting an integer or float?

## Operators on strings

A little more review of stuff we saw last week.

In [49]:
new_s = s + ' that is really long'
new_s

'a string of words that is really long'

In [50]:
num * 5

2445

In [51]:
num_string * 5

'489489489489489'

# 3. Indexing and Slicing

* You can access the elements of a sequence using **slicing**.
* Slicing allows you to select individual (or groups of) characters from a string based on their _positions_.
* These first examples are for strings, but similar syntax will be shown for lists and tuples later in this Notebook.
* Python is a zero-offset language
  * The first element of a sequence is located at position zero (0)
  * The second element of a sequence is located at position one (1)
  * **This is important to remember**
  * [Dijkstra's explanation of this subject](http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html) (optional reading)
* Slicing is done using square brackets: `[]`

In [52]:
s

'a string of words'

In [53]:
print(s[0])
print(s[1])
print(s[2])
print(s[3])
print(s[4])

a
 
s
t
r


In [54]:
for i in range(len(s)): print(s[i])

a
 
s
t
r
i
n
g
 
o
f
 
w
o
r
d
s


The last element of a string with **n** characters is located at position **n-1**.

In [55]:
len(s)

17

In [56]:
s[16]

's'

Why is the last character at position 16, when there are 17 characters? Because python is zero-offset (scroll back up a few cells to where this was introduced).

You can count from the back using negative indexes. 

In [57]:
s[-1]

's'

In [58]:
s[-2]

'd'

**Note**: Negative indexes come in handy when you want an element near the end of a sequence, but don't know the length of the sequence. The code below gets the same result as `s[-2]`, but is more complex.

In [59]:
s[len(s)-2]

'd'

A continuous block of characters can be sliced using the `:` notation.

In [60]:
s

'a string of words'

In [61]:
s[0:5]

'a str'

**Action**: Look at the cell above, and count the number of characters returned. At this point you're probably on board with the idea that python starts counting at zero (0)... although you may not like it! But then shouldn't the range `[0:5]` return six characters, not five???  Maybe, but that is not how you do it in python. For a python range, the upper bound is not included. If the above was translated into mathematical notation, it would be: $0 \le x < 5$. You read this as: "x is greater than or equal to 0, and less than 5." The optional reading from [Dijkstra](http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html) may make more sense now.

Below is another example.

In [62]:
s[5:8]

'ing'

Omitting the number on the **left** side of the `:` implies "start counting at zero"; omitting the number on the **right** side of the `:` implies "stop counting at the end."

In [63]:
s[:10]

'a string o'

In [64]:
s[10:]

'f words'

🙊

**Note**: All this stuff can be combined. So the following cell says to start five steps back from the end, and then count to the end.

In [65]:
s[-5:]

'words'

Remember:
* If either number is omitted in a range, it is assumed to be the beginning or end of the sequence
* The brackets are inclusive on the left but exclusive on the right
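
As a quick check of the two rules above, here is a small sketch (the variable names `phrase`, `middle` and `rebuilt` are made up for this example). It shows that a slice `[a:b]` returns `b - a` characters, and that cutting a string at any position and gluing the two pieces back together gives back the original.

```python
phrase = 'a string of words'

# The slice [5:8] includes position 5 but stops before position 8,
# so it contains 8 - 5 = 3 characters.
middle = phrase[5:8]
print(middle)        # ing
print(len(middle))   # 3

# Omitting a bound means "from the beginning" or "to the end", so these
# two pieces do not overlap and together cover the whole string.
rebuilt = phrase[:10] + phrase[10:]
print(rebuilt == phrase)   # True
```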

# 4. Mutable vs. Immutable

* The value of some objects can change
* Objects whose value can change are said to be "mutable"; objects whose value is unchangeable once they are created are called "immutable"
* An object's mutability is determined by its type
* Numbers, strings and tuples are immutable
* Dictionaries, sets and lists are mutable (we will get to these later in this Notebook)

In [66]:
s

'a string of words'

In [67]:
s[3]

't'

We can ask python to tell us what is in the `3` slot of a string, but we cannot change what is in the `3` slot.

In [68]:
s[3] = 'p'

TypeError: 'str' object does not support item assignment

The string is immutable, but that doesn't mean that the variable `s` cannot be reassigned to a different string.

In [69]:
s = 'a spring of words'
s

'a spring of words'
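
Since a character cannot be swapped in place, one common workaround is to build a new string out of slices of the old one. The sketch below is just one way to do it; the names `original` and `patched` are hypothetical.

```python
original = 'a string of words'

# Take everything before position 3, add the new character,
# then add everything after position 3.
patched = original[:3] + 'p' + original[4:]
print(patched)    # a spring of words

# The original string is untouched; a new string was created.
print(original)   # a string of words
```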

# 5. Lists

### Characteristics
* Flexible sequence object
 * Heterogeneous elements (recall that strings only hold characters)
 * Nestable (you can put a list in a list)
* Ordered -> elements indexed from 0 to n-1
* Mutable (modifiable)
* Examples
 * Courses you've taken in grad school (i.e., you still have more courses to take, so you need a container that can be added to)
 * Points scored in each Seminoles football game in 2019 (i.e., the team has only played two games so far this season, more games will be added)
 * To do items (i.e., items are being added and removed from a to do list)

### Creation

Lists can be created by wrapping comma separated items with square brackets (`[]`).

In [70]:
x = [23, 89, 74, 90, 11, 68]
x

[23, 89, 74, 90, 11, 68]

In [71]:
type(x)

list

The elements of a list can be different types. The following list contains strings, lists, integers and floats.

In [72]:
y = ['dog', x, s, 87, 64.2]
y

['dog', [23, 89, 74, 90, 11, 68], 'a spring of words', 87, 64.2]

__Action__: Make sure you are comfortable with where all the stuff in the above cell came from.

An empty list can be created in the following ways:

In [73]:
z = []
z

[]

In [74]:
z = list()
z

[]

### Slicing

List slicing uses the same syntax and rules we saw for string slicing.

In [75]:
x

[23, 89, 74, 90, 11, 68]

In [76]:
x[1]

89

In [77]:
x[2:5]

[74, 90, 11]

In [78]:
x[-2]

11

### Nestable

In [79]:
nest = [[4, 8],[9,3],[5,7]]
nest

[[4, 8], [9, 3], [5, 7]]

**Note**: The above list named `nest` contains three other lists: `[4,8]`, `[9,3]` and `[5,7]`.

In [80]:
nest[1]

[9, 3]

In [81]:
nest[1][1]

3

**Action**: Take a close look at the previous two cells to understand what is being sliced out. The element in the `1` slot of `nest` is [9,3].  The syntax `nest[1][1]` then drills in to grab the element in the `1` slot of `[9,3]`, which is `3`. In the empty cell below make your own list that contains lists, and see if you can slice out particular values.

### Mutable

Lists are mutable, so the elements within a list can be replaced. Recall that strings are immutable, so the elements (i.e., characters) within a string cannot be replaced.

In [82]:
x

[23, 89, 74, 90, 11, 68]

In [83]:
x[2]

74

In [84]:
x[2] = 'dog'
x

[23, 89, 'dog', 90, 11, 68]

In the above cell we replaced the integer 74 in the `2` slot of `x` with the string `'dog'`.

In the cells below we replace two elements of `x` with two elements from another list.

In [85]:
x[3:5]

[90, 11]

In [86]:
x[3:5] = ['cat', 'rat']
x

[23, 89, 'dog', 'cat', 'rat', 68]

### Operators

As we saw last week, the same operator can do different things when applied to different types of objects. Recall that `+` between two integers results in a sum, but `+` between two strings is concatenation.

In [87]:
a = ['new', 'stuff']
a

['new', 'stuff']

In [88]:
x + a

[23, 89, 'dog', 'cat', 'rat', 68, 'new', 'stuff']

In [89]:
a * 3

['new', 'stuff', 'new', 'stuff', 'new', 'stuff']

In [90]:
a == x

False

### Methods

In python (almost) everything is an object. A list is an object, as is a string, an integer, a float, etc. Objects have methods and attributes, and you can see these using the `dir()` function (this was introduced in **Section 6** of last week's Notebook). At this point in the course, we are simply using methods that others have  created; in a few weeks you will be writing your own methods. 

In [91]:
schools = ['FSU', 'FAMU']
print(dir(schools))

['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delslice__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__setslice__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']


#### `append` and `extend`

These methods are used to tack new stuff onto an existing list.

In [92]:
schools

['FSU', 'FAMU']

In [93]:
schools.append('TCC')
schools

['FSU', 'FAMU', 'TCC']

In [94]:
schools.append(['USF', 'UM'])
schools

['FSU', 'FAMU', 'TCC', ['USF', 'UM']]

In [95]:
len(schools)

4

**Action**: The above cell says that `schools` has 4 elements. Why four instead of five? Answer: `schools` contains three strings: `'FSU'`,  `'FAMU'` and  `'TCC'`; and one list `['USF','UM']`. This is a total of four elements. 

In [96]:
schools.extend(['USF','UM'])
schools

['FSU', 'FAMU', 'TCC', ['USF', 'UM'], 'USF', 'UM']

In [97]:
len(schools)

6

**Action**: Review the last few cells and notice the subtle difference between `append` and `extend`. What are the 6 elements currently in `schools`?  Below is another example to help differentiate `append` and `extend`.

In [98]:
l = [34,56,78]
l

[34, 56, 78]

In [99]:
l.append(45)
l

[34, 56, 78, 45]

In [100]:
l.append(99)

In [101]:
l

[34, 56, 78, 45, 99]

In [102]:
l.extend([459,798,999])

In [103]:
l

[34, 56, 78, 45, 99, 459, 798, 999]

**Action**: `extend` can take any sequence style object. Recall that a string is a sequence. Before running the next two cells, what do you think will happen when you pass a string to `extend`?

In [104]:
animals = ['dog', 'cat']
animals

['dog', 'cat']

In [105]:
animals.extend('elephant')
animals

['dog', 'cat', 'e', 'l', 'e', 'p', 'h', 'a', 'n', 't']
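
If the goal was to add the whole word as a single element, two options are sketched below. The list name `pets` and the animals in it are made up for this example.

```python
pets = ['dog', 'cat']

# append adds its argument as one element, whatever it is.
pets.append('elephant')
print(pets)   # ['dog', 'cat', 'elephant']

# extend adds each element of the sequence it is given, so wrapping
# the string in a list adds the whole string as one element.
pets.extend(['giraffe'])
print(pets)   # ['dog', 'cat', 'elephant', 'giraffe']
```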

#### `sort` and `reverse`

These two methods change the order of the elements within a list.

In [106]:
nums = [45, 2, 888, 16]
nums

[45, 2, 888, 16]

In [107]:
nums.reverse()
nums

[16, 888, 2, 45]

In [108]:
nums.sort()
nums

[2, 16, 45, 888]

In [109]:
nums.reverse()
nums

[888, 45, 16, 2]

**Note**: All the methods so far have been **in place** meaning that the method is changing the object itself.
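
One practical consequence of **in place** methods is that they return `None` rather than a new list. The sketch below contrasts the `sort` method with the built-in `sorted` function, which leaves the original alone; the list name `scores` is hypothetical.

```python
scores = [45, 2, 888, 16]

# sorted() is a built-in function that returns a new sorted list
# and leaves the original list alone.
ordered = sorted(scores)
print(ordered)   # [2, 16, 45, 888]
print(scores)    # [45, 2, 888, 16]

# The sort method changes the list itself and returns None, so
# assigning its result to a variable is a common mistake.
result = scores.sort()
print(result)    # None
print(scores)    # [2, 16, 45, 888]
```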

#### Other methods

In [110]:
dups = [8, 3, 4, 5, 4, 4, 7, 9]
dups

[8, 3, 4, 5, 4, 4, 7, 9]

**`count`** counts the number of times a value appears in the list.

In [111]:
dups.count(4)

3

In [112]:
dups.count(99)

0

**`pop`** removes the last element from the list and returns that element.

In [113]:
dups.pop()

9

In [114]:
dups

[8, 3, 4, 5, 4, 4, 7]
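
`pop` also accepts an optional position, in which case it removes and returns the element in that slot instead of the last one. A small sketch, using a made-up list named `queue`:

```python
queue = [8, 3, 4, 5]

# pop() with no argument removes and returns the last element.
last = queue.pop()
print(last)    # 5

# pop(0) removes and returns the element in the 0 slot instead.
first = queue.pop(0)
print(first)   # 8
print(queue)   # [3, 4]
```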

#### `range`

**`range`** creates a sequence of integers. Note: it is NOT a method of a list.

__Aside__: I have briefly talked about the transition from Python2 to Python3. The vast majority of Python code runs fine in both versions, but `range` is a case where there was a noticeable change. In the old days, `range` created a `list` of integers based on the user's request. If it was a sequence of 10, 20 or even 1,000 values, this had very little impact on the performance of a modern computer. However, if you asked for a sequence of 1 million or 10 million numbers, the computer would have to immediately create and store all those numbers. The new version of `range` creates a lazy `range` object instead. This object only keeps track of the start value, the step size and the end value, and it produces each number when asked for it; therefore, no matter the length of the sequence, it only stores a few values. One goal of changes like this in Python3 was to improve performance of the language, in this case by saving memory.

In [115]:
help(range)

Help on built-in function range in module __builtin__:

range(...)
    range(stop) -> list of integers
    range(start, stop[, step]) -> list of integers
    
    Return a list containing an arithmetic progression of integers.
    range(i, j) returns [i, i+1, i+2, ..., j-1]; start (!) defaults to 0.
    When step is given, it specifies the increment (or decrement).
    For example, range(4) returns [0, 1, 2, 3].  The end point is omitted!
    These are exactly the valid indices for a list of 4 elements.



Let's break down this help explanation. First off, there are a few ways to call `range`. You can pass one, two or three values to it; the number of values passed dictates what `range` will return. As always, you can ignore stuff that starts with an underscore.

If you pass a single value, y, `range` interprets this as, "start counting at zero, and stop counting at y-1." Notice that ranges are inclusive of the start value, and exclusive of the end, similar to what we saw in slicing.

In [116]:
range(10)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

**Aside (cont.)**: So here is the change I was talking about. **If we were in Python3**, `range(10)` would return an object `range(0, 10)`. Calling `range` would simply create an object that can do stuff, but hasn't really done anything yet. To see all the values in the sequence, we need to use the `list` command to get the `range` object to actually spit out the numbers.

However, Python2 does not follow that process and creates the list above directly. It is good to note this in case you transition to Python3 at some point in the future. The code below returns the same result in both versions.

In [117]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

If you pass two values, *x* and *y*, the function interprets this as, "start counting at *x* and stop counting at *y-1*."

In [118]:
list(range(3, 10))

[3, 4, 5, 6, 7, 8, 9]

If you pass three values, *x*, *y* and *z*, the function interprets this as, "start counting at *x*, move in steps of size *z*, and stop before reaching *y*."

In [119]:
list(range(3, 10, 2))

[3, 5, 7, 9]
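
Two more cases may help here. The sketch below (variable names are made up) shows a negative step, which counts down, and a step that does not land on the value just before the stop.

```python
# A negative step counts down. The start value is included and the
# stop value is still excluded.
countdown = list(range(10, 3, -2))
print(countdown)   # [10, 8, 6, 4]

# With a step, the last value is not always stop - 1.
by_fours = list(range(3, 10, 4))
print(by_fours)    # [3, 7]
```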

**Action**: Go back and read the help again. Does the explanation make more sense? Notice that the text in the help gives an example; you can always try these out when reading help for this or other functions.

One final note: You may be asking yourself, "how does it 'improve performance' to make the user use two commands (i.e., `list` and `range`) when only range was needed before?" This intuition is correct; the particular _use case_ I showed here is kinda clunky. However, it was simply to introduce `range`. The more common way people use a sequence of numbers is to loop through them one at a time. For example, I want to iterate over the 10,000 trees in my forest sample and run some analysis on each one. I am simply using the sequence of 10,000 numbers to keep my place in line. We will discuss "loops" in an upcoming Notebook.
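
For the curious, the memory point can be checked roughly with `sys.getsizeof` from the standard library. This sketch assumes Python3 (in Python2, `range` already returns a list, so the comparison would not show a difference), and the names `lazy`, `full` and `total` are made up for the example.

```python
import sys

# Assumes Python3, where range returns a lazy range object.
lazy = range(1000000)
full = list(lazy)

# The range object stores only start, stop and step, so its size
# does not grow with the length of the sequence.
print(sys.getsizeof(lazy) < sys.getsizeof(full))   # True

# Looping over a range does not require building the list first.
total = 0
for i in range(5):
    total += i
print(total)   # 10
```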

# 6. Tuples

### Characteristics

* Same as lists, except they're immutable.
* Examples
 * Days of the week (i.e., every week has seven days in a particular order; this is not going to change)
 * Courses you took in high school (i.e., since you are currently in college, your high school record will not be changing, [unless you're Ferris Bueller](https://youtu.be/Hh_vLKlz2Mc))
 * Points scored in each Seminoles football game in 2017 (i.e., that season has ended, so the scores won't change)

There are a couple of advantages to immutability. First, the ability to change means that a mutable object takes up more space in memory and is more difficult to create. In most cases, this difference between a list and a tuple is imperceptible; but if you are creating 1 million objects, those small differences can add up. A second advantage is that a tuple can protect you from yourself (or from some person you share your code with). If you know the object should never change, then defining it as such ensures its stability.
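
The memory claim can be checked roughly with `sys.getsizeof` from the standard library. This is only a sketch: the names `as_list` and `as_tuple` are made up, and the exact byte counts depend on the Python version and platform, so only the comparison is shown (it holds on the standard CPython interpreter).

```python
import sys

as_list = [85, 7, 19.2, 'dog']
as_tuple = tuple(as_list)

# sys.getsizeof reports the size of the container itself in bytes.
# The exact numbers depend on the Python version and platform.
print(sys.getsizeof(as_tuple) < sys.getsizeof(as_list))   # True on CPython
```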

### Creation

Lists are created using square brackets; tuples are created using parentheses.

In [120]:
t = (85, 7, 19.2, 'dog', 'dog')
t

(85, 7, 19.2, 'dog', 'dog')

In [121]:
type(t)

tuple

**Note**: Technically you can even skip the parentheses when creating a tuple. My opinion is that the code is clearer when using the parentheses, but it is up to you.

In [122]:
t_alt = 85, 7, 19.2, 'dog', 'dog'
t_alt

(85, 7, 19.2, 'dog', 'dog')

In [123]:
t == t_alt

True

In [124]:
empty = ()   # an empty tuple
empty

()

### Methods

In [125]:
t

(85, 7, 19.2, 'dog', 'dog')

Since a tuple is immutable (unchangeable), you can see below that it cannot really do much to itself.

In [126]:
print(dir(t))

['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count', 'index']


In [127]:
t.count('dog')

2

In [128]:
t.index(7)

1

In [129]:
t.index('cat')

ValueError: tuple.index(x): x not in tuple

### Immutable

Python will return an error if you try to change something.

In [130]:
t

(85, 7, 19.2, 'dog', 'dog')

In [131]:
t[2]

19.2

In [132]:
t[2] = 555

TypeError: 'tuple' object does not support item assignment

#### Mutability caveat

If a tuple contains any elements that are mutable, then we can change those elements.

In [133]:
mixed = (34, ['Arizona', 'Alabama', 'Alaska'], 'hurricane')
mixed

(34, ['Arizona', 'Alabama', 'Alaska'], 'hurricane')

In [134]:
mixed[1]

['Arizona', 'Alabama', 'Alaska']

In [135]:
mixed[1] = 'Florida'

TypeError: 'tuple' object does not support item assignment

In [136]:
mixed[1][2]

'Alaska'

In [137]:
mixed[1][2] = 'Florida'
mixed

(34, ['Arizona', 'Alabama', 'Florida'], 'hurricane')

**Action**: Tuple slicing is the same as list slicing. Review the previous few cells to see that we cannot replace the list `['Arizona', 'Alabama', 'Alaska']` with `'Florida'`, but we can replace the list element `'Alaska'` with `'Florida'`. The reason is that tuples are immutable, while lists are mutable.
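
Here is one more small sketch of the same caveat, using a made-up tuple named `record`. The tuple never changes which objects it holds or how many, but the list inside it can still grow.

```python
record = (34, ['Arizona', 'Alabama'], 'hurricane')

# The tuple cannot be told to hold a different object in slot 1,
# but the list it already holds can still be changed in place.
record[1].append('Arkansas')
print(record)   # (34, ['Arizona', 'Alabama', 'Arkansas'], 'hurricane')

# The tuple itself still has exactly three elements.
print(len(record))   # 3
```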

### Converting between lists and tuples

You can use the functions `list` and `tuple` to convert between the two types of objects.

In [138]:
my_tup = ('a', 'b', 'c')
my_tup

('a', 'b', 'c')

In [139]:
type(my_tup)

tuple

In [140]:
my_list = list(my_tup)
my_list

['a', 'b', 'c']

In [141]:
type(my_list)

list

In [142]:
new_tup = tuple(my_list)
new_tup

('a', 'b', 'c')

In [143]:
type(new_tup)

tuple

# 7. Sets

### Characteristics

* A collection of unique unordered elements
  * Since the elements are unordered, there is no indexing (i.e., there is not a `0` or `1` slot in a set, so the elements cannot be extracted in the same way as a string, list or tuple)
  * Each element is unique (i.e., there are no duplicate elements in a set)
* Mutability
 * Sets are mutable (i.e., you can add or remove elements)
 * Elements of sets must be immutable types (i.e., strings, numbers and tuples are acceptable, but lists are not)
 * The system a set uses to keep track of its elements is called a hash table. This is useful for fast lookups, but the system relies on the elements not to change, hence the reason set *elements* must be immutable. The concept of a hash table is beyond the scope of this course, but you can [optionally read more about it](https://en.wikipedia.org/wiki/Hash_table)... and not to be confused with [hash on table](http://www.seriouseats.com/recipes/assets_c/2013/10/20131023-brussels-sprouts-kale-potato-hash-10-thumb-625xauto-360819.jpg). Note that dictionaries (later in this Notebook) also rely on hash tables.
* Allows heterogeneous elements 
* Use cases
 * Breeds of dog that are currently at an animal shelter (i.e., each day the shelter takes dogs in and adopts dogs out, but it is likely there are multiple dogs of the same breed); a set will keep track of all the unique breeds in the shelter.
 * The courses the students in this class have taken at FSU (i.e., if we wanted the aggregated pool of knowledge all the students in the class have, then we only need to track the unique classes taken).

### Creation

One way to create a set is to wrap comma separated elements with curly brackets.

In [144]:
s = {34, 10, 'cat', 99.2, (5,6)}
s

{10, 34, 99.2, 'cat', (5, 6)}

**Note**: The order we see for `s` is not the same as how we passed in the elements. You cannot count on a set to be in any particular order.

**Note**: Sets cannot contain mutable elements. Which element in the next cell is not legit?

In [145]:
s1 = {[5,6], 22, 'dog'}

TypeError: unhashable type: 'list'

Sets can also be created by passing any sequence to the `set` function.

In [146]:
num_list = [9, 4, 2, 1, 6]
s = set(num_list)
s

{1, 2, 4, 6, 9}

In [147]:
num_tup = (9, 4, 2, 1, 6)
s = set(num_tup)
s

{1, 2, 4, 6, 9}

In [148]:
quote = "We were somewhere around Barstow"
s = set(quote)
s

{' ', 'B', 'W', 'a', 'd', 'e', 'h', 'm', 'n', 'o', 'r', 's', 't', 'u', 'w'}

**Note**: A set contains unique elements. For example, there are six `"e"`s in the string `"We were somewhere around Barstow"`, but only one `"e"` appears in the set.

Below is another example showing that sets contain unique elements.

In [149]:
s = {8, 8, 8, 3, 3, 9, 9, 2, 1}
s

{1, 2, 3, 8, 9}
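
The animal shelter use case from the top of this section can be sketched the same way. The intake log below is invented purely for illustration, as are the names `dogs` and `breeds`.

```python
# Hypothetical intake log for a shelter, one entry per dog.
dogs = ['beagle', 'poodle', 'beagle', 'boxer', 'poodle', 'beagle']

# Converting the list to a set keeps one copy of each breed.
breeds = set(dogs)
print(len(dogs))        # 6
print(len(breeds))      # 3

# Sets are unordered, so sort the elements for a predictable display.
print(sorted(breeds))   # ['beagle', 'boxer', 'poodle']
```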

#### Empty set

Sets are not the only python object that uses curly brackets (`{}`). The other object that uses `{}` is the dictionary. If you just use `{}`, you will get an empty dictionary, so to make an empty set you must use the `set` function. Dictionaries will be introduced later in this Notebook.

In [150]:
empty = set()
empty

set()

In [151]:
type(empty)

set

In [152]:
empty = {}
empty

{}

In [153]:
type(empty)

dict

### Methods

In [154]:
s = {1, 4, 9, 2, 8}

In [155]:
print(dir(s))

['__and__', '__class__', '__cmp__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__iand__', '__init__', '__ior__', '__isub__', '__iter__', '__ixor__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__or__', '__rand__', '__reduce__', '__reduce_ex__', '__repr__', '__ror__', '__rsub__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__xor__', 'add', 'clear', 'copy', 'difference', 'difference_update', 'discard', 'intersection', 'intersection_update', 'isdisjoint', 'issubset', 'issuperset', 'pop', 'remove', 'symmetric_difference', 'symmetric_difference_update', 'union', 'update']


#### `add` and `update`

These methods are used to include new elements in an existing set.

In [156]:
s

{1, 2, 4, 8, 9}

In [157]:
s.add(24)
s

{1, 2, 4, 8, 9, 24}

If you try to `add` an element that is already in the set, nothing happens; the set does not change, and no error is raised.

In [158]:
s.add(4)
s

{1, 2, 4, 8, 9, 24}

Use `update` when you want to add multiple values to the set.

In [159]:
s.update([24, 5, 88])
s

{1, 2, 4, 5, 8, 9, 24, 88}

`update` expects a sequence of values, so it will complain if you pass it a single number.

In [160]:
s.update(444)

TypeError: 'int' object is not iterable

**Note**: There is a similarity to the methods we saw earlier for lists. `add` and `append` are similar, and `update` and `extend` are similar.
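
The similarity carries over to strings as well. In the sketch below (the set names `letters` and `numbers` are made up), `add` treats a string as one element, while `update` splits it into characters, just like `extend` did for lists.

```python
letters = {'a', 'b'}

# add treats the string as one element.
letters.add('cat')
print(sorted(letters))   # ['a', 'b', 'cat']

# update loops over its argument, so a string is split into characters.
letters.update('dog')
print(sorted(letters))   # ['a', 'b', 'cat', 'd', 'g', 'o']

# To add a single value with update, wrap it in a list.
numbers = {1, 2}
numbers.update([444])
print(sorted(numbers))   # [1, 2, 444]
```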

#### `discard` and `remove`

These are two methods for eliminating an item from a set.

In [161]:
s = {3, 4, 1, 9, 8}
s

{1, 3, 4, 8, 9}

In [162]:
s.remove(3)
s

{1, 4, 8, 9}

In [163]:
s.remove(888)

KeyError: 888

In [164]:
s.discard(4)
s

{1, 8, 9}

In [165]:
s.discard(888)
s

{1, 8, 9}

**Note**: The main difference is that `discard` will not raise an error if the item you're asking to remove is not in the set.

#### Set theory

Python sets can be manipulated like mathematical sets. You should read this basic [introduction to set theory](http://www.mathsisfun.com/sets/venn-diagrams.html). This [optional reading](https://en.wikipedia.org/wiki/Set_theory) goes into much more detail.

In [166]:
group1 = set([1, 2, 3, 4, 5])
group2 = set([2, 6, 7])

`union` combines two sets by keeping the unique elements from both sets.

In [167]:
group1.union(group2)

{1, 2, 3, 4, 5, 6, 7}

`intersection` combines two sets by keeping the elements both sets share.

In [168]:
group1.intersection(group2)

{2}

In [169]:
group3 = set([8, 9, 10])
group3.intersection(group1)

set()

`difference` combines two sets by keeping the elements in the first set that are not in the second.

In [170]:
group1.difference(group2)

{1, 3, 4, 5}

Notice that if you reverse the order to `difference`, you get a different result.

In [171]:
group2.difference(group1)

{6, 7}

In [172]:
s1 = set([1,2,3])
s2 = set(range(10))
print(s1)
print(s2)

set([1, 2, 3])
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


`issubset` returns `True` if all the elements in the first set are in the second set. If you change the order of the sets you can get a different answer.

In [173]:
s1.issubset(s2)

True

In [174]:
s2.issubset(s1)

False

**Note**: The above examples are a sample of the possible set theory operations available.
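
Several of these methods also have operator shortcuts. The sketch below rebuilds `group1` and `group2` so it can run on its own, and wraps each result in `sorted` only to get a predictable display order. It also shows `symmetric_difference`, which appears in the `dir` listing above but was not demonstrated.

```python
group1 = {1, 2, 3, 4, 5}
group2 = {2, 6, 7}

print(sorted(group1 | group2))   # union: [1, 2, 3, 4, 5, 6, 7]
print(sorted(group1 & group2))   # intersection: [2]
print(sorted(group1 - group2))   # difference: [1, 3, 4, 5]

# symmetric_difference keeps the elements that are in exactly one set.
print(sorted(group1 ^ group2))   # [1, 3, 4, 5, 6, 7]
print(sorted(group1.symmetric_difference(group2)))   # same result
```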

**Action**: For some set methods the order in which the sets are passed gives different results. Does order matter for `union` and `intersection`? The [introductory reading](http://www.mathsisfun.com/sets/venn-diagrams.html) will help you answer this question. Test your answer in the cell below.

# 8. Dictionaries

### Characteristics

* A collection of objects indexed by keys
 * Each **key** must be unique
 * Each key has one **value**
 * Think of unordered **key:value** pairs
* Access and storage is different from sequence types
 * that being said, you can think of lists and tuples as using "sequential" numeric keys
 * dictionaries can use other types as keys
* Mutability
 * Dictionaries are mutable (i.e., you can add and remove **key:value** pairs)
 * Keys must be immutable types (similar to sets, the keys are organized using a hash table)
 * Values can be any type
* Powerful, fast and efficient for some problems
* Use cases
 * Height of each student in an elementary school classroom (i.e., the **key** would be the student's name and the **value** the student's height; you can then look for the student by name and get back his/her height)
 * People assigned to each office in the department (i.e., the department has a set of offices that it can allocate; each office has one (or more) people assigned to it)
 * Address of every restaurant in the U.S. (i.e., **key** would be restaurant name, and the **value** its address; a dictionary allows you to use a **key** (restaurant name) to quickly find its value (address))

### Creation

A dictionary is enclosed in `{}` (like sets), but each element requires two objects separated by a `:`.

In [175]:
d = {'rat':'squeak', 'cat':'meow', 'dog':'bark'}
d

{'cat': 'meow', 'dog': 'bark', 'rat': 'squeak'}

Typically there is a substantive relationship between the key and the value, as in the example above.

In [176]:
d = {3:['a', 'b', 'c'], (4,7):87.5, 33.3:'one third'}
d

{3: ['a', 'b', 'c'], 33.3: 'one third', (4, 7): 87.5}

Above is an example to show that dictionary keys and values are very flexible. Although this example is difficult to read, notice that it still has a pattern: `{key: value, key: value, key: value}`.
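
A more realistic mix of types is the office use case mentioned earlier, where each key has one value but that value is a list of people. The room numbers and names below are invented for illustration.

```python
# Hypothetical office assignments: room number -> list of occupants.
offices = {
    301: ['Ana'],
    302: ['Ben', 'Chloe'],
}

# Look up an office by its key to get the people assigned to it.
print(offices[302])        # ['Ben', 'Chloe']

# The value is an ordinary list, so it can be changed in place.
offices[301].append('Dev')
print(offices[301])        # ['Ana', 'Dev']
```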

Dictionaries can also be built one element at a time. We start with an empty dictionary, and then add elements.

In [177]:
d = {}
d

{}

In [178]:
d['FSU'] = 'Florida'
d

{'FSU': 'Florida'}

In the above example, `'FSU'` is the key and `'Florida'` is the value. Below we add more key:value pairs in the same way.

In [179]:
d['UGA'] = 'Georgia'
d

{'FSU': 'Florida', 'UGA': 'Georgia'}

In [180]:
d['ISU'] = 'Iowa'
d

{'FSU': 'Florida', 'ISU': 'Iowa', 'UGA': 'Georgia'}

In [181]:
d['FAMU'] = 'Florida'
d

{'FAMU': 'Florida', 'FSU': 'Florida', 'ISU': 'Iowa', 'UGA': 'Georgia'}

Dictionaries can also be built from a list of lists. Each two-element sub-list forms a key:value pair.

In [182]:
trees_inventory = [['oak', 10], ['elm', 12], ['pine', 8]]
trees_inventory

[['oak', 10], ['elm', 12], ['pine', 8]]

In [183]:
trees_dict = dict(trees_inventory)
trees_dict

{'elm': 12, 'oak': 10, 'pine': 8}

### Indexing

The strength of dictionaries is the indexing (or "lookups"). In a traditional dictionary (i.e., the book), what are the keys and what are the values? A list or tuple can only be indexed using the integer that corresponds to its position in the sequence. But a dictionary can be indexed by many different types of values. In the example below we index using the university's name. If we want to know where FSU is located we can simply pass in the string `'FSU'` and we get back its location. Again, `'FSU'` is the key and `'Florida'` is the value.

In [184]:
d['FSU']

'Florida'

We can use this notation to change the **value** associated with a **key** (see next cell).

In [185]:
d['FSU'] = 'close to Georgia'

In [186]:
d['FSU']

'close to Georgia'

In [187]:
d

{'FAMU': 'Florida', 'FSU': 'close to Georgia', 'ISU': 'Iowa', 'UGA': 'Georgia'}

**Action**: If you go back to the lists section (much earlier in this Notebook) you will see that indexing and changing a value in a list uses syntax similar to what we're using here for a dictionary.
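
One difference from lists is what happens with a key that does not exist: indexing with square brackets raises a `KeyError`. The `get` method is a gentler alternative, sketched below with a small made-up dictionary named `campus`.

```python
campus = {'FSU': 'Florida', 'UGA': 'Georgia'}

# get returns the value when the key exists.
print(campus.get('FSU'))              # Florida

# For a missing key, get returns None instead of raising a KeyError.
print(campus.get('UCLA'))             # None

# An optional second argument sets the fallback value.
print(campus.get('UCLA', 'unknown'))  # unknown
```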

### Methods

In [188]:
print(dir(d))

['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values', 'viewitems', 'viewkeys', 'viewvalues']


In [189]:
d.keys()

['UGA', 'FSU', 'ISU', 'FAMU']

In [190]:
d.values()

['Georgia', 'close to Georgia', 'Iowa', 'Florida']

In [191]:
d.items()

[('UGA', 'Georgia'),
 ('FSU', 'close to Georgia'),
 ('ISU', 'Iowa'),
 ('FAMU', 'Florida')]

**Note**: The methods above all return objects that act similar to lists. Also notice that `items` returns essentially a list of tuples.

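Similar to the `pop` method for lists, the `pop` method for dictionaries removes an item and returns it; here you pass the **key** to remove, and the method returns its **value**. If the key is not in the dictionary, python raises a `KeyError` and the dictionary is left unchanged, as the next two cells show.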

In [192]:
d.pop('UCLA')

KeyError: 'UCLA'

In [193]:
d

{'FAMU': 'Florida', 'FSU': 'close to Georgia', 'ISU': 'Iowa', 'UGA': 'Georgia'}
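For contrast with the failed `pop` above, here is a minimal sketch of `pop` on a key that does exist, plus the optional second argument that acts as a fallback for a missing key. The dictionary `unis` is a small stand-in created for this example.

```python
unis = {'FSU': 'Florida', 'UGA': 'Georgia'}

state = unis.pop('UGA')    # removes the key and returns its value
print(state)               # Georgia
print(unis)                # {'FSU': 'Florida'}

# Supplying a second argument avoids the KeyError for a missing key
missing = unis.pop('UCLA', 'not found')
print(missing)             # not found
```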

Two dictionaries can be combined using the `update` method, which adds the key-value pairs from one dictionary to another in place.

In [194]:
d

{'FAMU': 'Florida', 'FSU': 'close to Georgia', 'ISU': 'Iowa', 'UGA': 'Georgia'}

In [195]:
more_unis = {'Duke':'North Carolina', 'Auburn':'Alabama'}
more_unis

{'Auburn': 'Alabama', 'Duke': 'North Carolina'}

In [196]:
d.update(more_unis)
d

{'Auburn': 'Alabama',
 'Duke': 'North Carolina',
 'FAMU': 'Florida',
 'FSU': 'close to Georgia',
 'ISU': 'Iowa',
 'UGA': 'Georgia'}
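Two details of `update` are easy to miss: it modifies the dictionary in place (returning `None`), and when both dictionaries share a key, the value from the dictionary passed in wins. The sketch below uses made-up dictionaries (`unis` and `extra`) to show both.

```python
unis = {'FSU': 'Florida', 'UGA': 'Georgia'}
extra = {'UGA': 'Athens, Georgia', 'Duke': 'North Carolina'}

result = unis.update(extra)  # update() changes `unis` in place

print(result)         # None, because update() does not return a new dictionary
print(unis['UGA'])    # Athens, Georgia (the value from `extra` wins)
print(len(unis))      # 3
print(extra)          # `extra` itself is unchanged
```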

# 9. Operators

#### Last Week

* Arithmetic operators: `+, -, *, /, **, %, //`
* Comparison (i.e., relational) operators: `==, !=, <>, >, <, >=, <=` (note: `<>` is an older spelling of `!=` that only works in Python 2)
* Assignment operators: `=, +=, -=, *=, /=, **=, %=, //=`

#### This Week

* Membership: `in, not in`
* Logical: `and, or, not`

### Membership

Membership operators test for membership in a sequence or container.

In [197]:
title = "The Black Album"
title

'The Black Album'

A string is a sequence, so we can check if a particular character is `in` the string.

In [198]:
'c' in title

True

In [199]:
'm' not in title

False

In [200]:
'u' in title

True
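With strings, `in` is not limited to single characters. The short sketch below shows that it also works for longer substrings and that the test is case-sensitive.

```python
title = "The Black Album"

has_word = 'Black' in title     # a multi-character substring also works
has_lower = 'black' in title    # the test is case-sensitive
has_space = ' ' in title        # a space is a character too

print(has_word)     # True
print(has_lower)    # False
print(has_space)    # True

# Lowercasing first gives a case-insensitive check
print('black' in title.lower())   # True
```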

We can use the same operators on lists.

In [201]:
side1 = ["99 Problems", "Moment of Clarity",
         "Dirt of Your Shoulder", "Allure", "Encore"]
side1

['99 Problems',
 'Moment of Clarity',
 'Dirt of Your Shoulder',
 'Allure',
 'Encore']

In [202]:
'Encore' in side1

True

In [203]:
"Problems" in side1

False

**Action**: Look closely at the above cell. `'99 Problems'` is in the `side1` list, but the code is asking about the string `'Problems'`. Since `'Problems'` is not an exact match to any element in the `side1` list, the operator returns `False`.
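The difference between matching a whole list element and matching part of a string can be sketched as follows. The list here is a shortened copy of `side1`, redefined so the example runs on its own.

```python
side1 = ["99 Problems", "Moment of Clarity", "Allure"]  # shortened copy for illustration

in_list = "Problems" in side1          # no element equals 'Problems'
exact = "99 Problems" in side1         # exact match to an element
in_element = "Problems" in side1[0]    # substring test on the string '99 Problems'

print(in_list)      # False
print(exact)        # True
print(in_element)   # True
```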

In [204]:
"Tom Ford" in side1

False

In [205]:
writers = {'Angelou', 'Morrison', 'Gaines', 'Hughes', 'Wright'}
writers

{'Angelou', 'Gaines', 'Hughes', 'Morrison', 'Wright'}

What type is `writers`? If you're not sure you can run the line: `type(writers)`.

In [206]:
'Gaines' in writers

True

Create a dictionary where the song title is the key and the length of the song is the value. We will use the `zip` function as an intermediate step.

In [207]:
mins = [3.55, 4.24, 4.05, 4.52, 4.11]
zip(side1, mins)

[('99 Problems', 3.55),
 ('Moment of Clarity', 4.24),
 ('Dirt of Your Shoulder', 4.05),
 ('Allure', 4.52),
 ('Encore', 4.11)]

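__Note__: In Python 3, `zip` is similar to `range` in that it simply instantiates an object, so the cell above would display something like `<zip object at ...>` rather than a list (the list output shown above is what Python 2 produces). Until you do something with it, the resulting object just sits there. In the next cell we convert it to a list.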

In [208]:
list(zip(side1, mins))

[('99 Problems', 3.55),
 ('Moment of Clarity', 4.24),
 ('Dirt of Your Shoulder', 4.05),
 ('Allure', 4.52),
 ('Encore', 4.11)]

__Action__: What does `zip` do? Why is "zip" a good name for what it does?

In [209]:
times = dict(zip(side1, mins))
times

{'99 Problems': 3.55,
 'Allure': 4.52,
 'Dirt of Your Shoulder': 4.05,
 'Encore': 4.11,
 'Moment of Clarity': 4.24}
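One thing to watch for when building a dictionary this way is that `zip` stops at the end of the shorter input. The sketch below uses made-up titles and lengths (not the real track data) to show that the unmatched item is silently dropped.

```python
titles = ['Song A', 'Song B', 'Song C']   # hypothetical titles
lengths = [3.5, 4.2]                      # one value short on purpose

pairs = list(zip(titles, lengths))
print(pairs)          # [('Song A', 3.5), ('Song B', 4.2)]

durations = dict(pairs)
print('Song C' in durations)   # False: 'Song C' had no partner, so it was dropped
```

Comparing `len(titles)` and `len(lengths)` before zipping is a simple way to catch this.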

Membership operators only look at the keys of the dictionary.

In [210]:
'Encore' in times

True

In [211]:
4.52 in times

False

If you want to check if something is in the values of the dictionary, you can extract just the values using `times.values()` and then see if the object is in there.

In [212]:
4.52 in times.values()

True
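The three kinds of dictionary membership test (keys, values, and key-value pairs) can be sketched together as below. The dictionary is a shortened copy of `times`, redefined so the example runs on its own. The `get` method shown at the end is a related way to look up a key that may not exist.

```python
times = {'Allure': 4.52, 'Encore': 4.11}   # shortened copy for illustration

print('Allure' in times)             # True: 'Allure' is a key
print(4.52 in times)                 # False: 4.52 is a value, not a key
print(4.52 in times.values())        # True

# `in` on the (key, value) pairs needs a tuple
pair_found = ('Allure', 4.52) in times.items()
print(pair_found)                    # True

# get() returns None (or a default you choose) instead of raising a KeyError
print(times.get('Tom Ford'))         # None
print(times.get('Tom Ford', 0))      # 0
```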

### Logical

`and`, `or`, and `not` are very general operators. The examples below use them in conjunction with the membership operators, but these logical operators will be applied more widely in future weeks.

In [213]:
side1

['99 Problems',
 'Moment of Clarity',
 'Dirt of Your Shoulder',
 'Allure',
 'Encore']

In the example below, the queries on the left and right of the `and` must both be `True` for the overall statement to return `True`.

In [214]:
"Allure" in side1 and "Moment of Clarity" in side1

True

We can show how the above statement works by splitting it into two parts.

In [215]:
"Allure" in side1

True

In [216]:
"Moment of Clarity" in side1

True

Let's try a different song.

In [217]:
"Tom Ford" in side1

False

In [218]:
"Tom Ford" in side1 and "Allure" in side1

False

In the example below, if the query on either the left or right of the `or` is `True`, then the overall statement will return `True`. Note: the statement will also return `True` if both are `True`.

In [219]:
"Tom Ford" in side1 or "Allure" in side1

True

In [220]:
"Allure" in side1 and not "Tom Ford" in side1

True

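**Note**: `not` flips the truth value of the expression that follows it. In the cell above, `not "Tom Ford" in side1` is evaluated as `not ("Tom Ford" in side1)`, which gives the same result as `"Tom Ford" not in side1`.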
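A small sketch of the two spellings, using a shortened copy of `side1` so the example runs on its own:

```python
side1 = ["99 Problems", "Allure", "Encore"]   # shortened copy for illustration

a = not "Tom Ford" in side1      # parsed as: not ("Tom Ford" in side1)
b = "Tom Ford" not in side1      # the `not in` membership operator
print(a, b)                      # True True

# `not` applies to a whole expression, not just to `in`
c = not ("Allure" in side1 and "Tom Ford" in side1)
print(c)                         # True
```

The `not in` form is usually considered the more readable of the two.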

We can use parentheses to make the code more readable.

In [221]:
("Allure" in side1) and ("Encore" in side1)

True

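When both sides are wrapped in parentheses and evaluate to `True` or `False`, the `&` symbol gives the same result as `and`. Strictly speaking, `&` is a bitwise operator: it always evaluates both sides and it binds more tightly than `in` and the comparison operators, which is why the parentheses are required.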

In [222]:
("Allure" in side1) & ("Encore" in side1)

True

Similarly, with parenthesized expressions you can use the pipe symbol (`|`) for `or`.

In [223]:
("Allure" in side1) | ("Tom Ford" in side1)

True
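The symbols `&` and `|` are not exact replacements for `and` and `or`. The sketch below, using simple made-up expressions, shows the two main differences: operator precedence when the parentheses are left out, and the fact that `and` stops early while `&` always evaluates both sides.

```python
# 1) Without parentheses, & binds more tightly than ==
and_result = 1 == 1 and 2 == 2
amp_result = 1 == 1 & 2 == 2     # evaluated as 1 == (1 & 2) == 2, i.e. 1 == 0 == 2
print(and_result)   # True
print(amp_result)   # False

# 2) `and` stops early when the left side is False; `&` evaluates both sides
empty = []
safe = len(empty) > 0 and empty[0] == "Allure"   # empty[0] is never evaluated
print(safe)         # False

raised = False
try:
    (len(empty) > 0) & (empty[0] == "Allure")
except IndexError:
    raised = True
    print("IndexError: & evaluated the right-hand side anyway")
```

For everyday logical tests, `and`, `or` and `not` are the safer choice.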

# 10. Test Yourself 

1) Identify the type of each variable in the code below using a comment on the __same line__ as the object. Note: this is testing both your knowledge of types and the use of comments.

In [None]:
a = {5, 6}
b = [5, 6]
c = (5, 6)
d = "5, 6"
e = {5:6}

---

2) What is the main difference between a list and a tuple?

[double click to type answer here]

---

3) Slice out the 8 from the `var1` list.

In [None]:
var1 = [[9,1,4,3], [2,8,3], [9,3]]
var1

---

4) Slice out the 8 from the `var2` list.
- Hint: your answer will look something like this `var2[0][0][0][0]`
- Hint: when you move your cursor next to a bracket, its partner bracket will highlight
- Hint: build up your answer one slice at a time; for example test `var2[0]` and see if the 8 is in there; once you get the first slice then test the next, for example `var2[0][0]`

In [None]:
var2 = [4,5,2,[9,3,[6,2,5,4],[7,[7,8,3],3,4],[1,7,9]]]
var2

---

5) Explain in a sentence or two the difference between "union" and "intersection" in set theory.

[double click to type answer here]

---

6) Answer the following questions using Python syntax on the `data1` list.

In [None]:
data1 = [65,23,14,90,56]

a) Is there a `14` in `data1`?

b) Are there more than six elements in `data1`?

c) Is the first element in `data1` larger than 30 and is the last element in `data1` smaller than 60?

d) Is there a `20` or `90` in `data1`?

---

7) Below are some scenarios. Which data storage type (string, list, tuple, set, dictionary) would be best for each? Give a one-sentence explanation of why you made this choice.

a) You are going to start a local version of Yelp in Tallahassee. You have classified each restaurant by cuisine (e.g., Italian, BBQ, etc.). You expect users might want to enter the cuisine type to find all the restaurants that meet that type.

[double click to type answer here]

b) You have the names of all the restaurants in Tallahassee, but just want the unique names.

[double click to type answer here]

c) The names of all the restaurants in Tallahassee.

[double click to type answer here]

d) Names of all the counties in Florida in case you want to expand to the whole state. 

[double click to type answer here]

e) You have the names of all the restaurants in Pensacola, and you want to see how many more restaurants you'll need to categorize if you expand there. Hint: if there is a McDonald's in Pensacola you don't need to classify it again since you classified it in Tallahassee. 

[double click to type answer here]

---

8) A statement was made above that lists and tuples are kind of like dictionaries except that keys are a sequence of integers. Why is this the case? Hint: compare the syntax for slicing and selecting values.

[double click to type answer here]