

### External References:
- sets
    - http://www.diveintopython3.net/native-datatypes.html#sets
    - https://docs.python.org/3/tutorial/datastructures.html#sets
    - Also this: https://en.wikipedia.org/wiki/Set_theory
- dictionaries
    - https://automatetheboringstuff.com/chapter5/ (first half)
    - http://www.diveintopython3.net/native-datatypes.html#dictionaries
    - http://greenteapress.com/thinkpython/html/thinkpython012.html

    

# Sets (set)
- mutable
- Can only contain hashable values (in practice, immutable ones)
  - tuple, str, int, float, bool
- unordered
- cannot contain duplicates
- very fast for checking whether an object is in the collection or not
- useful for Venn-diagram-like questions
    - e.g. what items are in this collection that aren't in that collection?
    - what items are in both collections?
- Surrounded with curly braces `{}` (but an empty `{}` is a dict, so use `set()` for an empty set)


In [5]:
# Creating an empty set
example = set()

len(example)

0

In [3]:
s2 = {'a', 'b', 'c'}

In [4]:
type(s2)

set

In [2]:
type({})

dict

Adding values to a set one at a time

In [6]:
example.add(3)
example.add(4)
example

{3, 4}

Adding a value already in a set has no effect.

In [7]:
example.add(3) # 3 is already in there, so nothing changes
example

{3, 4}

In [8]:
{'word', 'words'}

{'word', 'words'}

Removing things from a set mutates it.

In [9]:
# removing things
example.remove(3)
example

{4}

Check for inclusion just like other containers

In [11]:
4 in example

True

Type-casting to a set.
Works on any iterable (strings, lists, tuples, ...) as long as the items are hashable.

In [12]:
hello = 'hello'
set(hello)

{'e', 'h', 'l', 'o'}

In [13]:
duplicates = [1, 1, 1, 0, 0, 2, 0, 2]
set(duplicates)

{0, 1, 2}

In [14]:
list(set(duplicates))

[0, 1, 2]
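Note that `list(set(...))` makes no promise about the order of the result. If first-seen order matters, one common alternative is `dict.fromkeys`. This is only a sketch, and it relies on dictionaries (covered later in these notes) keeping insertion order, which is guaranteed from Python 3.7 on. The name `unique_in_order` is made up for this example.

```python
duplicates = [1, 1, 1, 0, 0, 2, 0, 2]

# dict keys are unique and (in Python 3.7+) keep insertion order,
# so this removes duplicates while keeping first-seen order.
unique_in_order = list(dict.fromkeys(duplicates))
print(unique_in_order)  # [1, 0, 2]
```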

Sets are most useful when you have 2 sets of data.

In [16]:
target_word = set('horse')
guessed = set('hnsje')

What guessed letters are correct?
aka. what values in guessed also appear in target_word?

In [17]:
guessed.intersection(target_word)

{'e', 'h', 's'}

In [20]:
guessed

{'e', 'h', 'j', 'n', 's'}

In [18]:
guessed & target_word # & is special operator for sets.  It means intersection

{'e', 'h', 's'}

What guessed letters are not correct?

aka what letters appear in guessed but not target_word?

In [21]:
guessed.difference(target_word)

{'j', 'n'}

In [22]:
guessed - target_word

{'j', 'n'}

What correct letters have not been guessed?
aka What letters appear in target word but not guessed?

In [23]:
target_word.difference(guessed)

{'o', 'r'}

In [24]:
target_word - guessed

{'o', 'r'}

What are all the letters that have been guessed or are in target word?

In [25]:
target_word.union(guessed)

{'e', 'h', 'j', 'n', 'o', 'r', 's'}

In [27]:
target_word.union(['a', 'b', 'c'])

{'a', 'b', 'c', 'e', 'h', 'o', 'r', 's'}

In [29]:
target_word.union('astb')

{'a', 'b', 'e', 'h', 'o', 'r', 's', 't'}

In [30]:
target_word + guessed

TypeError: unsupported operand type(s) for +: 'set' and 'set'
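Sets don't support `+`, but they do have operator forms for the set methods. As a small sketch using the same two sets: `|` is union and `^` is symmetric difference (items in exactly one of the two sets). The variable names `all_letters` and `in_only_one` are just for illustration.

```python
target_word = set('horse')
guessed = set('hnsje')

all_letters = target_word | guessed   # union, same as target_word.union(guessed)
in_only_one = target_word ^ guessed   # symmetric difference: in exactly one set

# sorted() is used only to get a predictable display order
print(sorted(all_letters))  # ['e', 'h', 'j', 'n', 'o', 'r', 's']
print(sorted(in_only_one))  # ['j', 'n', 'o', 'r']
```

Unlike `.union()`, the operators require both sides to be sets, so `target_word | 'astb'` raises a `TypeError`.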

Can be converted and looped over just like other containers

In [36]:
list(target_word)

['s', 'h', 'o', 'r', 'e']

In [34]:
for letter in target_word:
    print(letter, 'is in our goal')

s is in our goal
h is in our goal
o is in our goal
r is in our goal
e is in our goal


Sets are **unordered** meaning no indexing like strings, lists, and tuples

In [37]:
target_word[0]

TypeError: 'set' object does not support indexing

But remember, no mutable data!

In [38]:
s = set()
s.add([])

TypeError: unhashable type: 'list'

In [39]:
s.add(set())

TypeError: unhashable type: 'set'
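If you really do need a set inside a set, the built-in `frozenset` is an immutable version of `set` and is therefore hashable. A minimal sketch:

```python
s = set()
s.add(frozenset({'a', 'b'}))  # frozenset is immutable, so it is hashable
s.add((1, 2))                 # tuples of hashable values work too

print(frozenset({'a', 'b'}) in s)  # True
print(len(s))                      # 2
```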

# Dictionary data type (dict)

- `dict` in python
- also called `associative array` and `hash` in other languages.
- accessed with a `key`, not an index.
- keys are unique and must be hashable (in practice, immutable).
- keys can be: strings, tuples, numbers
- values can be anything.
- represented with curly braces in python. `{}`

A dictionary is a data structure which associates (or maps) a `key` to a `value`.  The keys of a dictionary are analogous to the variables we use in our code.  They are simply a reference to a value.  A dictionary is a collection of these references.

Key takeaway is that **dictionaries associate one value with another**.  That is most often why we use them.

In [40]:
# Creating a dict with braces literal
example_literal = {'vowels': 'aeiouy', 'date': '11-8-2016'}
example_literal

{'date': '11-8-2016', 'vowels': 'aeiouy'}

In [43]:
len(example_literal)

2

In [44]:
{'v': 'aei', 'v': 2}

{'v': 2}

In [46]:
# Creating a dict with the dict function
example_from_function = dict(vowels='aeiouy', date='11-8-2016')
example_from_function

{'date': '11-8-2016', 'vowels': 'aeiouy'}

In [47]:
empty = {}
type(empty)

dict

We access data from our dictionary based on its **key**, not index

Format is:

`dictionary[key_name]`

In [48]:
example_from_function['vowels']

'aeiouy'

In [49]:
print(example_from_function['vowels'])
print("Today's date is", example_literal['date'])

aeiouy
Today's date is 11-8-2016


Accessing a key that isn't in the dictionary raises an error

In [51]:
example_from_function['not a key']

KeyError: 'not a key'

We can add new key, value pairs to our dictionary by assigning values to that key

In [52]:
example_from_function['pi'] = 3.14
example_from_function

{'date': '11-8-2016', 'pi': 3.14, 'vowels': 'aeiouy'}

In [54]:
example_from_function[True] = False

In [55]:
example_from_function

{'vowels': 'aeiouy', 'date': '11-8-2016', 'pi': 3.14, True: False}

In [58]:
type_examples = {
    str: 'hello',
    int: 3,
    float: 3.0,
    bool: True,
    list: ['hi'],
    set: set(),
    tuple: (),
}

In [59]:
type_examples

{str: 'hello',
 int: 3,
 float: 3.0,
 bool: True,
 list: ['hi'],
 set: set(),
 tuple: ()}

In [60]:
type_examples[list]

['hi']

## Dictionary vs lists

#### The differences:
- dicts have no positional index (therefore can't be sliced); since Python 3.7 they do remember insertion order
- lists only accessed with index
- KeyError vs IndexError
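
To make the last point concrete, here is a small sketch of both errors. The data (`letters`, `ages`) is made up for the example.

```python
letters = ['a', 'b', 'c']
ages = {'Jim': 30}  # hypothetical example data

try:
    letters[10]          # no item at position 10
except IndexError as error:
    list_error = type(error).__name__

try:
    ages['Sally']        # no such key
except KeyError as error:
    dict_error = type(error).__name__

print(list_error, dict_error)  # IndexError KeyError
```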

#### The similarities:
- both are mutable
- both can contain *multiple* values
- both can be iterated over

To be equal, two lists must have the same **items** in the same **order**.

To be equal, two dicts must have the same **keys** with the same **values**; order does not matter.

##### Adjust the following data structures to make the comparison true. Keep the `==`

In [62]:
[3, 2, 1] == [3, 2, 1]

True

In [64]:
{3:'b', 1:'a', 2:'c'} == {1:'a', 2:'c', 3:'b'}

True

## keys(), values(), and items()
- dict methods
- used to access dict information in a list-like way

In [66]:
state_abbreviations = {'AK': 'Alaska',
 'AL': 'Alabama',
 'AR': 'Arkansas',
 'AZ': 'Arizona',
 'CA': 'California',
 'CO': 'Colorado',
 'CT': 'Connecticut',
 'DE': 'Delaware',
 'FL': 'Florida',
 'GA': 'Georgia',
 'HI': 'Hawaii',
 'IA': 'Iowa',
 'ID': 'Idaho',
 'IL': 'Illinois',
 'IN': 'Indiana',
 'KS': 'Kansas',
 'KY': 'Kentucky',
 'LA': 'Louisiana',
 'MA': 'Massachusetts',
 'MD': 'Maryland',
 'ME': 'Maine',
 'MI': 'Michigan',
 'MN': 'Minnesota',
 'MO': 'Missouri',
 'MS': 'Mississippi',
 'MT': 'Montana',
 'NC': 'North Carolina',
 'ND': 'North Dakota',
 'NE': 'Nebraska',
 'NH': 'New Hampshire',
 'NJ': 'New Jersey',
 'NM': 'New Mexico',
 'NV': 'Nevada',
 'NY': 'New York',
 'OH': 'Ohio',
 'OK': 'Oklahoma',
 'OR': 'Oregon',
 'PA': 'Pennsylvania',
 'RI': 'Rhode Island',
 'SC': 'South Carolina',
 'SD': 'South Dakota',
 'TN': 'Tennessee',
 'TX': 'Texas',
 'UT': 'Utah',
 'VA': 'Virginia',
 'VT': 'Vermont',
 'WA': 'Washington',
 'WI': 'Wisconsin',
 'WV': 'West Virginia',
 'WY': 'Wyoming'}

In [67]:
len(state_abbreviations)

50

In [68]:
state_abbreviations.keys()

dict_keys(['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME', 'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM', 'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY'])

In [71]:
state_abbreviations.values()

dict_values(['Alaska', 'Alabama', 'Arkansas', 'Arizona', 'California', 'Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Iowa', 'Idaho', 'Illinois', 'Indiana', 'Kansas', 'Kentucky', 'Louisiana', 'Massachusetts', 'Maryland', 'Maine', 'Michigan', 'Minnesota', 'Missouri', 'Mississippi', 'Montana', 'North Carolina', 'North Dakota', 'Nebraska', 'New Hampshire', 'New Jersey', 'New Mexico', 'Nevada', 'New York', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Virginia', 'Vermont', 'Washington', 'Wisconsin', 'West Virginia', 'Wyoming'])

In [72]:
state_abbreviations.items()

dict_items([('AK', 'Alaska'), ('AL', 'Alabama'), ('AR', 'Arkansas'), ('AZ', 'Arizona'), ('CA', 'California'), ('CO', 'Colorado'), ('CT', 'Connecticut'), ('DE', 'Delaware'), ('FL', 'Florida'), ('GA', 'Georgia'), ('HI', 'Hawaii'), ('IA', 'Iowa'), ('ID', 'Idaho'), ('IL', 'Illinois'), ('IN', 'Indiana'), ('KS', 'Kansas'), ('KY', 'Kentucky'), ('LA', 'Louisiana'), ('MA', 'Massachusetts'), ('MD', 'Maryland'), ('ME', 'Maine'), ('MI', 'Michigan'), ('MN', 'Minnesota'), ('MO', 'Missouri'), ('MS', 'Mississippi'), ('MT', 'Montana'), ('NC', 'North Carolina'), ('ND', 'North Dakota'), ('NE', 'Nebraska'), ('NH', 'New Hampshire'), ('NJ', 'New Jersey'), ('NM', 'New Mexico'), ('NV', 'Nevada'), ('NY', 'New York'), ('OH', 'Ohio'), ('OK', 'Oklahoma'), ('OR', 'Oregon'), ('PA', 'Pennsylvania'), ('RI', 'Rhode Island'), ('SC', 'South Carolina'), ('SD', 'South Dakota'), ('TN', 'Tennessee'), ('TX', 'Texas'), ('UT', 'Utah'), ('VA', 'Virginia'), ('VT', 'Vermont'), ('WA', 'Washington'), ('WI', 'Wisconsin'), ('WV', 'We

In [73]:
example = [
    [
        'one', 1
    ],
    ['two', 2]
]

In [74]:
example

[['one', 1], ['two', 2]]

In [75]:
dict(example)

{'one': 1, 'two': 2}
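
The same idea works when the keys and values start out in two separate lists: `zip` pairs them up and `dict` turns each pair into a key and a value. A short sketch with made-up lists:

```python
names = ['one', 'two', 'three']
numbers = [1, 2, 3]

# zip pairs items up by position; dict turns each pair into key: value
lookup = dict(zip(names, numbers))
print(lookup)  # {'one': 1, 'two': 2, 'three': 3}
```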

In [76]:
# A demonstration loop.
for item in state_abbreviations.items():
    print('The abbreviation for', item[1], 'is:', item[0])
    

The abbreviation for Alaska is: AK
The abbreviation for Alabama is: AL
The abbreviation for Arkansas is: AR
The abbreviation for Arizona is: AZ
The abbreviation for California is: CA
The abbreviation for Colorado is: CO
The abbreviation for Connecticut is: CT
The abbreviation for Delaware is: DE
The abbreviation for Florida is: FL
The abbreviation for Georgia is: GA
The abbreviation for Hawaii is: HI
The abbreviation for Iowa is: IA
The abbreviation for Idaho is: ID
The abbreviation for Illinois is: IL
The abbreviation for Indiana is: IN
The abbreviation for Kansas is: KS
The abbreviation for Kentucky is: KY
The abbreviation for Louisiana is: LA
The abbreviation for Massachusetts is: MA
The abbreviation for Maryland is: MD
The abbreviation for Maine is: ME
The abbreviation for Michigan is: MI
The abbreviation for Minnesota is: MN
The abbreviation for Missouri is: MO
The abbreviation for Mississippi is: MS
The abbreviation for Montana is: MT
The abbreviation for North Carolina is: NC
Th

In [None]:
# A demonstration loop, this time unpacking each (key, value) pair.
for key, value in state_abbreviations.items():
    print('The abbreviation for', value, 'is:', key)
    

In [78]:
# That was verbose. Let's just get 5 items out of the dict
state_abbreviations.items()[0:5]

TypeError: 'dict_items' object is not subscriptable

The results of items(), keys(), and values() are not lists.  They can be looped over like lists, but if we want the other functionality (accessing by index, mutability) we must convert them to lists.
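
One way these objects differ from lists: they are *views* onto the dictionary, so they reflect later changes to it. And if you only want the first few items, `itertools.islice` can take them without building the whole list first. A sketch, using a small sample dict rather than all 50 states:

```python
from itertools import islice

abbreviations = {'AK': 'Alaska', 'AL': 'Alabama', 'AR': 'Arkansas'}  # small sample

keys = abbreviations.keys()
abbreviations['AZ'] = 'Arizona'  # the view reflects this change
print(len(keys))                 # 4

# Take the first two items without building a full list first
first_two = list(islice(abbreviations.items(), 2))
print(first_two)  # [('AK', 'Alaska'), ('AL', 'Alabama')]
```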

In [79]:
state_items = state_abbreviations.items()
list(state_items)[:5]

[('AK', 'Alaska'),
 ('AL', 'Alabama'),
 ('AR', 'Arkansas'),
 ('AZ', 'Arizona'),
 ('CA', 'California')]

## the 'in' keyword with dicts

`in` and `not in` work with lists.  They also work with dictionaries, where they look at the keys by default.

In [81]:
'CA' in state_abbreviations

True

In [82]:
'CA' in state_abbreviations.keys()

True

In [83]:
'Oregon' in state_abbreviations

False

In [84]:
'Oregon' in state_abbreviations.values()

True

We can manually check if a key is set in a dictionary.

If it isn't, then we can add it.

In [86]:
if 'PR' not in state_abbreviations:
    state_abbreviations['PR'] = 'Puerto Rico'

In [87]:
state_abbreviations['PR']

'Puerto Rico'
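
The check-then-add pattern is common enough that dicts have a method for it, `setdefault`. A minimal sketch on a small sample dict:

```python
abbreviations = {'CA': 'California'}  # small sample

# setdefault adds the key only if it is missing, and returns the stored value
abbreviations.setdefault('PR', 'Puerto Rico')
abbreviations.setdefault('CA', 'Something else')  # 'CA' exists, so nothing changes

print(abbreviations)  # {'CA': 'California', 'PR': 'Puerto Rico'}
```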

## Safely accessing dictionary with get()

Sometimes you don't know if a key is set in a dictionary. If we try to access a key that isn't set we get a `KeyError`, so we could check for the key first:
```python
if some_key in some_dict:
    result = some_dict[some_key]
else:
    result = default_value
```

This can get a little verbose, so we have the `get` method.

```python
result = some_dict.get(some_key, default_value)
```

In [88]:
state_abbreviations['DC']

KeyError: 'DC'

This line says "Get the value of the 'DC' key in the abbreviations dictionary.  If it's not there, give me the default value ('Unknown' in this case) instead."

In [91]:
state_abbreviations.get('DC', 'Unknown')

'Unknown'

In [93]:
state_abbreviations.get('DC') is None  # with no default given, get returns None

True

In [94]:
import data.states
votes = data.states.ELECTORAL_VOTES

In [95]:
votes

{'Alabama': 9,
 'Alaska': 3,
 'Arizona': 11,
 'Arkansas': 6,
 'California': 55,
 'Colorado': 9,
 'Connecticut': 7,
 'Delaware': 3,
 'District of Columbia': 3,
 'Florida': 29,
 'Georgia': 16,
 'Hawaii': 4,
 'Idaho': 4,
 'Illinois': 20,
 'Indiana': 11,
 'Iowa': 6,
 'Kansas': 6,
 'Kentucky': 8,
 'Louisiana': 8,
 'Maine': 4,
 'Maryland': 10,
 'Massachusetts': 11,
 'Michigan': 16,
 'Minnesota': 10,
 'Mississippi': 6,
 'Missouri': 10,
 'Montana': 3,
 'Nebraska': 5,
 'Nevada': 6,
 'New Hampshire': 4,
 'New Jersey': 14,
 'New Mexico': 5,
 'New York': 29,
 'North Carolina': 15,
 'North Dakota': 3,
 'Ohio': 18,
 'Oklahoma': 7,
 'Oregon': 7,
 'Pennsylvania': 20,
 'Rhode Island': 4,
 'South Carolina': 9,
 'South Dakota': 3,
 'Tennessee': 11,
 'Texas': 38,
 'Utah': 6,
 'Vermont': 3,
 'Virginia': 13,
 'Washington': 12,
 'West Virginia': 5,
 'Wisconsin': 10,
 'Wyoming': 3}

In [96]:
sorted(votes.keys())

['Alabama',
 'Alaska',
 'Arizona',
 'Arkansas',
 'California',
 'Colorado',
 'Connecticut',
 'Delaware',
 'District of Columbia',
 'Florida',
 'Georgia',
 'Hawaii',
 'Idaho',
 'Illinois',
 'Indiana',
 'Iowa',
 'Kansas',
 'Kentucky',
 'Louisiana',
 'Maine',
 'Maryland',
 'Massachusetts',
 'Michigan',
 'Minnesota',
 'Mississippi',
 'Missouri',
 'Montana',
 'Nebraska',
 'Nevada',
 'New Hampshire',
 'New Jersey',
 'New Mexico',
 'New York',
 'North Carolina',
 'North Dakota',
 'Ohio',
 'Oklahoma',
 'Oregon',
 'Pennsylvania',
 'Rhode Island',
 'South Carolina',
 'South Dakota',
 'Tennessee',
 'Texas',
 'Utah',
 'Vermont',
 'Virginia',
 'Washington',
 'West Virginia',
 'Wisconsin',
 'Wyoming']

In [97]:
votes.get('District of Columbia', 0)

3

In [98]:
votes.get('Guam', 0)

0

`get` is great to use when you want to assume the value is of a certain type and a default value makes sense.  An example:

In [None]:
# Count the number of times each character appears
speech = 'Four score and seven years ago our forefathers brought forth upon this continent a new nation'
char_count = {} # empty dict

In [None]:
for char in speech:
    count = char_count.get(char, 0)
    char_count[char] = count + 1
print(char_count)
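
For counting specifically, the standard library also has `collections.Counter`, which is a dict subclass that does the `get(char, 0) + 1` bookkeeping for you. A short sketch on a shortened version of the speech:

```python
from collections import Counter

speech = 'Four score and seven years ago'
char_count = Counter(speech)

print(char_count['e'])  # 4
print(char_count['z'])  # 0 (missing keys count as zero, no KeyError)
```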

## Nested Dictionaries and Lists

Often when modeling real world data, it makes sense to compose lists and dicts within each other, e.g. a list of dicts or a dict where the values are lists.  This also applies to lists of lists and dicts where the values are dicts.

Example: You are having a picnic. You have a guest list and want to keep track of who's bringing what.  There are several ways to structure your data depending on what information you have and how you want to use it.

In [113]:
# In this example, we only care about who's bringing what.
potluck1 = {'Jim': ['apples', 'bananas'],
            'Sally': ['pears', 'bacon'],
            'Hassan': ['cups']
           }

In [114]:
potluck1.keys() # our guests

dict_keys(['Jim', 'Sally', 'Hassan'])

In [117]:
potluck1.values() # a view containing one list per guest: the foods being brought

dict_values([['apples', 'bananas'], ['pears', 'bacon'], ['cups']])

**QUESTION**: how would you generate a collection that contains all the items being brought to the picnic?  What would be a good data structure?
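
One possible answer (certainly not the only one) is a set, since we only care which items are coming and not who brings them or how many times they are listed. A sketch:

```python
potluck1 = {'Jim': ['apples', 'bananas'],
            'Sally': ['pears', 'bacon'],
            'Hassan': ['cups']}

all_items = set()
for foods in potluck1.values():
    all_items.update(foods)  # add every item from this guest's list

# sorted() is used only to get a predictable display order
print(sorted(all_items))  # ['apples', 'bacon', 'bananas', 'cups', 'pears']
```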

In [None]:
# Here we have a list of dicts, each representing one guest and what they are bringing.
# This format is more verbose, but a great example of what you might get
# if you have your guests fill out a form and then download the results.
potluck2 = [
    {'name': 'Hassan', 'foods': ['cups']},
    {'name': 'Jim', 'foods': ['apples', 'bananas'] },
    {'name': 'Sally', 'foods': ['pears', 'bacon']}
]

In [None]:
# Here we have a dict of dicts, where each value is a dict which maps the type: amount of the food they're bringing.

potluck3 = {
    'Hassan': {'cups': 5},
    'Jim': {'apples': 2, 'bananas': 5},
    'Sally': {'pears': 3, 'bacon': 10}
}

In [None]:
# Here is the same data represented as a list of lists.
# This is a good example of what our data might look like if we download it from an Excel spreadsheet.
potluck4 = [ # guest, item, quantity
    ['Hassan', 'cups', 5],
    ['Sally', 'pears', 3],
    ['Sally', 'bacon', 10],
    ['Jim', 'apples', 2],
    ['Jim', 'bananas', 5]
]
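
These formats can be converted into one another. As a sketch, here is one way to turn the spreadsheet-style rows into a dict of dicts like `potluck3`. The name `by_guest` is made up for this example.

```python
potluck4 = [  # guest, item, quantity
    ['Hassan', 'cups', 5],
    ['Sally', 'pears', 3],
    ['Sally', 'bacon', 10],
    ['Jim', 'apples', 2],
    ['Jim', 'bananas', 5],
]

by_guest = {}
for guest, item, quantity in potluck4:
    if guest not in by_guest:
        by_guest[guest] = {}  # first time we see this guest
    by_guest[guest][item] = quantity

print(by_guest['Sally'])  # {'pears': 3, 'bacon': 10}
```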

## A simple example

In [None]:
import data.states

In [103]:
electoral_votes = data.states.ELECTORAL_VOTES
state_abbreviations = data.states.STATE_ABBREVIATIONS

Say we have a list of state abbreviations representing states that have been won by a candidate.

In [107]:
current = ['PA', 'OR', 'TN', 'MN', 'Blah']

Let's calculate the total number of electoral votes these states are worth.

In [110]:
def total_votes(states):
    total = 0
    for state in states: # loop through the list of states we were given
        name = state_abbreviations.get(state) # get the name of the state from our abbreviations dictionary
        votes = electoral_votes.get(name, 0) # get the number of votes that state is worth from our votes dictionary
        total += votes # add the votes to our total
    return total

In [111]:
total_votes(current)

48
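
The same calculation can be written more compactly with `sum` and a generator expression. This is only a sketch: the two small dicts below are hypothetical stand-ins for `data.states`, so the example runs on its own.

```python
# Small hypothetical stand-ins for data.states, just so the sketch runs on its own
state_abbreviations = {'PA': 'Pennsylvania', 'OR': 'Oregon'}
electoral_votes = {'Pennsylvania': 20, 'Oregon': 7}

def total_votes(states):
    # unknown abbreviations map to None, which in turn is worth 0 votes
    return sum(electoral_votes.get(state_abbreviations.get(state), 0)
               for state in states)

print(total_votes(['PA', 'OR', 'Blah']))  # 27
```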

# Review Questions
- What does the code for an empty dictionary look like?
- What happens if you access a key that doesn't exist in a dict like so: `states['England']`
- What is the difference between the code: `'apple' in foods` and `'apple' in foods.values()`
- What is the difference between: `foo['monkey']` and `foo.get('monkey')`?
- What do lists and dicts have in common?

In [None]:
# empty dictionary
{}
dict()

In [None]:
# access a missing key
# raises KeyError

In [None]:
# difference between 'apple' in foods and 'apple' in foods.values()
# the first checks the keys, the second checks the values

In [None]:
# difference between foo['monkey'] and foo.get('monkey')?
# the first assumes 'monkey' is a key in the dictionary and raises KeyError if it is not.
# the second is the 'safe' way of accessing key 'monkey': it returns None (or a default) if the key is missing.

In [None]:
# What do lists and dicts have in common?
# - both mutable
# - can be nested
# - values can be anything
# Where they differ:
# - dictionaries are accessed by key rather than position
# - dictionaries cannot contain duplicate keys

## Practice:
Update your Recipe data format to use a dictionary instead of a list.