# Dictionaries

Now let's talk about another very useful data type: **dictionaries**. **Dictionaries** allow you to create mappings from keys to values. For example, you might use a dictionary to map Australian state names to the names of their corresponding capital cities:


In [None]:
capitals = {'Victoria': 'Melbie',
    'New South Wales': 'Sydney',
    'Queensland': 'Brisbane',
    'Tasmania': 'Hobart',
    'South Australia': 'Adelaide',
    'Western Australia': 'Perth'}
print(capitals['Tasmania'])


(Yes, there is an intentional mistake in there — `'Melbie'`. We'll fix it shortly.)

The **keys** in the above dictionary are strings which represent the names of the Australian states. Each key maps to a **value**, which also happens to be a string in this case. In the general case, the keys and values can be of different types.
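
To make the point about types concrete, here is a minimal sketch using made-up data (the names `students_by_year` and `mixed` are purely illustrative). It shows integer keys mapping to list values, and a single dictionary holding keys and values of more than one type:

```python
# Keys are integers, values are lists of strings (illustrative data only).
students_by_year = {1: ['Ana', 'Ben'], 2: ['Chen'], 3: []}
print(students_by_year[1])  # ['Ana', 'Ben']

# One dictionary can mix key types and value types.
mixed = {'count': 3, 42: 'answer'}
print(mixed['count'])  # 3
print(mixed[42])       # answer
```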

Dictionaries are written using the curly brace characters `{` and `}`, with each key separated from its value by a colon and the key-value pairs separated by commas. As a special case, you can construct an empty dictionary using an opening and closing brace with nothing in between:


In [None]:
example_empty_dict = {}


In the above code, an empty dictionary is created and assigned to the variable `example_empty_dict`. Currently this dictionary contains no mappings, so it is fairly uninteresting. As you shall soon see, it is possible to add new mappings to an existing dictionary, so it can be useful to construct empty dictionaries and later add values as necessary.


# Indexing Dictionaries

Like lists and strings, dictionaries are **indexable**. However, a dictionary is indexed by its keys, which can be of many different types, whereas the indices of lists and strings are always integers. A given key can occur only once in a dictionary and is associated with exactly one value (but you can, of course, make that value a list containing multiple objects). You can look up values in a dictionary using the normal indexing notation:


```python
capitals = {'Victoria': 'Melbie',
    'New South Wales': 'Sydney',
    'Queensland': 'Brisbane',
    'Tasmania': 'Hobart',
    'South Australia': 'Adelaide',
    'Western Australia': 'Perth'}

print(capitals['Victoria'])
print(capitals['Queensland'])
print(capitals['ACT'])  # This is an ERROR!
```

In [None]:
# Try running the last example here


Note that if the index is not a key in the dictionary (e.g. `'ACT'`), you get a `KeyError`!
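
If you would rather handle a missing key than let the program stop, one option is to catch the error. The sketch below assumes you have already met `try`/`except`; the helper function `lookup` is a hypothetical name used only for illustration:

```python
capitals = {'Queensland': 'Brisbane', 'Tasmania': 'Hobart'}

def lookup(state):
    """Return the capital of `state`, or a message if the key is missing."""
    try:
        return capitals[state]
    except KeyError:
        # Indexing with a key that is not in the dictionary lands here.
        return 'no entry for ' + state

print(lookup('Tasmania'))  # Hobart
print(lookup('ACT'))       # no entry for ACT
```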


# Dictionary Methods

Let's look at a few dictionary methods.

The `.get()` method takes a key as an argument and returns the value associated with it. The difference between this and direct indexing is that when the key doesn't exist, indexing raises a `KeyError`, while `.get()` simply returns `None`.

The `.pop()` method takes a key as an argument and returns the associated value, also **deleting** that key-value pair. This is similar to the way `.pop()` is used for lists, in that it mutates the dictionary it is called on and returns the value if it executes successfully. If the key isn't found in the dictionary, it raises a `KeyError` (what was the error type raised by `.pop()` in the case of a list, when it fails?).

Finally, the `.clear()` method deletes the entire contents of a dictionary. Like `.pop()`, this is a mutating method, but unlike `.pop()` it returns `None`, so don't assign its result to anything!


In [None]:
my_dict = {"Age": 34, "Anna": "Joe", "Jobs": "Steve"}
print(my_dict.get("Age"))
print(my_dict)
print(my_dict.pop("Jobs"))
print(my_dict)

my_dict.clear()
print(my_dict)
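
Both `.get()` and `.pop()` also accept an optional second argument, a default value to return when the key is missing. A minimal sketch (the keys and default values here are just examples):

```python
my_dict = {"Age": 34, "Jobs": "Steve"}

# .get() with no default returns None for a missing key.
print(my_dict.get("Height"))       # None
# .get() with a default returns that default instead.
print(my_dict.get("Height", 0))    # 0

# .pop() with a default never raises KeyError.
print(my_dict.pop("Jobs", "n/a"))  # Steve (and the pair is removed)
print(my_dict.pop("Jobs", "n/a"))  # n/a (the key is already gone)
print(my_dict)                     # {'Age': 34}
```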

# Updating Dictionaries

You can change the value which is associated with a key or add a new key and value pair to the dictionary using the assignment (`=`) operator:


In [None]:
capitals = {'Victoria': 'Melbie',
    'New South Wales': 'Sydney',
    'Queensland': 'Brisbane',
    'Tasmania': 'Hobart',
    'South Australia': 'Adelaide',
    'Western Australia': 'Perth'}

print(capitals['Victoria'])

capitals['Victoria'] = 'Melbourne'
capitals['ACT'] = 'Canberra'

print(capitals['Victoria'])
print(capitals['ACT'])



`'Victoria'` now maps to the value `'Melbourne'` instead of `'Melbie'`, thus fixing the previous mistake. The key `'ACT'` was not present in the dictionary before the assignment statement, so the above code extends the dictionary with a new mapping.
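
The same assignment syntax lets you start from an empty dictionary and fill it in as you go. Here is a small sketch; the dictionary name `squares` is an illustrative choice:

```python
squares = {}
for n in range(1, 6):
    # Each key is new on first use, so assignment adds a mapping.
    squares[n] = n * n

print(squares)     # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Assigning to an existing key replaces its value.
squares[3] = 'nine'
print(squares[3])  # nine
```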


# Testing Dictionary Membership

You can test if a key is present in a dictionary using the `in` operator. Try running the following code a few times, entering both valid and invalid state names:


In [None]:
capitals = {'Victoria': 'Melbourne',
    'New South Wales': 'Sydney',
    'Queensland': 'Brisbane',
    'Tasmania': 'Hobart',
    'South Australia': 'Adelaide',
    'Western Australia': 'Perth'}

state = input('Enter a state name: ')

if state in capitals:
    print(capitals[state])
else:
    print(state, 'not found')


`in` is especially useful because if we were to run the line `print(capitals[state])` where the value of `state` did not exist as a key in the dictionary, it would generate a `KeyError` as shown below. With `in`, we can confirm that a key exists and will not cause an error before we use it.


```python
capitals = {'Victoria': 'Melbourne'}
print(capitals["Queensland"])  # This is an ERROR!
```  

In [None]:
# Try running the last example here

# Accessing All Keys

You can get the keys in a dictionary using the `.keys()` method. `.keys()` returns a special iterable collection called a `dict_keys` [view object](https://docs.python.org/3/library/stdtypes.html#dict-views) that supports iteration and the `in` operator like a `list`, but **does not** support indexing:


```python
capitals = {'Victoria': 'Melbourne',
    'New South Wales': 'Sydney'}
keys = capitals.keys()

print("The keys:",keys)

print("Looping through all keys:")
for key in keys:
    print(key)

print("Is 'Victoria' in keys?",'Victoria' in keys)
print(keys[0])  # This is an ERROR!
```  

In [None]:
# Try running the last example here
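
If you do need to index the keys, one approach is to copy them into a list first. The sketch below also shows that the view stays linked to the dictionary while the list copy does not. The variable names are illustrative:

```python
capitals = {'Victoria': 'Melbourne',
    'New South Wales': 'Sydney'}
keys = capitals.keys()

# A list copy supports indexing; keys come out in insertion order (Python 3.7+).
key_list = list(keys)
print(key_list[0])    # Victoria
print(sorted(keys))   # ['New South Wales', 'Victoria']

# The view reflects later changes to the dictionary; the list copy does not.
capitals['Tasmania'] = 'Hobart'
print(len(keys))      # 3
print(len(key_list))  # 2
```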

# Accessing All Values

You can get the values in a dictionary (separate from the keys) using the `.values()` method. `.values()` returns a special iterable collection called a `dict_values` object that acts much like the `dict_keys` object:


```python
capitals = {'Victoria': 'Melbourne',
    'New South Wales': 'Sydney'}
d_values = capitals.values()

print("The values:",d_values)

print("Looping through all values:")
for value in d_values:
    print(value)

print("Is 'Victoria' in values?",'Victoria' in d_values)
print(d_values[0])  # This is an ERROR!
```

In [None]:
# Try running the last example here

# Accessing All Keys and Values

Another useful method for dictionaries is the `.items()` method. Like `.keys()` and `.values()`, `.items()` returns a view object called `dict_items` containing tuples of the `(key, value)` pairs.

You can also convert these view objects into lists using `list()`.


In [None]:
capitals = { 'Victoria': 'Melbourne',
    'New South Wales': 'Sydney' }
d_items = capitals.items()

print("Items in this dictionary:",d_items)
print("As a list:", list(d_items))

for (key,value) in d_items:
    print(key, value)
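
One thing `.items()` makes easy is building a reversed lookup, from values back to keys. This is only a sketch: it assumes every value is unique and is itself a valid key, and `states_by_capital` is a name made up for the example:

```python
capitals = {'Victoria': 'Melbourne',
    'New South Wales': 'Sydney'}

states_by_capital = {}
for (state, capital) in capitals.items():
    # Swap the roles: the old value becomes the key.
    states_by_capital[capital] = state

print(states_by_capital['Sydney'])     # New South Wales
print(states_by_capital['Melbourne'])  # Victoria
```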


# Further Notes

Dictionary keys must be immutable objects (more precisely, *hashable* ones). That means you can use `int`, `float`, `str`, `bool`, or `tuple` (provided the tuple itself contains only immutable objects) as keys, but you cannot use a `list` or a `set`. Could you use dictionaries? Are they mutable or immutable? Try it below:


In [None]:
d = {9: 34, 5.6: "abc", "str": 20, "str": 15, (1,2,3): 2, True: "hi"}
for (key, value) in d.items():
    print(key, value)


You'll also notice that `"str"` has been used as a key twice. This doesn't produce an error; rather, Python only saves the last value for that key (overwriting any earlier values, just as happens with assignment).
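
Returning to the rule about which types can be keys, the sketch below shows what happens when you try to use a list. It assumes you have met `try`/`except`; the exact wording of the error message varies between Python versions, so it is printed rather than quoted here:

```python
d = {}

# A tuple of immutable values is a valid key.
d[(1, 2)] = 'tuple key works'

# A list is mutable, so using it as a key raises TypeError.
try:
    d[[1, 2]] = 'list key'
except TypeError as error:
    print('TypeError:', error)

print(d)  # {(1, 2): 'tuple key works'}
```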

# Problem: Capital cities

Write a function `is_capital(state, city)` that returns `True` if the named `city` is the capital of the named `state` and `False` otherwise. Every city and state in the following table should be recognised.



| State | Capital city |
| --- | --- |
| New South Wales | Sydney |
| Queensland | Brisbane |
| South Australia | Adelaide |
| Tasmania | Hobart |
| Victoria | Melbourne |
| Western Australia | Perth |



If a city or state is not in the table the function must return `False`.

Here are some examples of how your function should work:


```python
>>> is_capital('Victoria', 'Melbourne')
True
>>> is_capital('Queensland', 'Adelaide')
False
```  

In [None]:
# Your solution here

# Counting Things with Dictionaries

(Important) The problem below has a very long string wrapped in triple quotation marks (we have seen this previously as a docstring). It's good to note now that docstrings are a form of **multi-line string**.

A common programming task is to keep a tally of how many times various items appear in a piece of data. Below is an example program which counts the number of occurrences of each letter in the first paragraph of Moby Dick:


In [None]:
MOBY = """Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, 
and nothing particular to interest me on shore, I thought I would sail about a little and see the watery 
part of the world. It is a way I have of driving off the spleen and regulating the circulation. 
Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; 
whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral 
I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle 
to prevent me from deliberately stepping into the street, and methodically knocking people's hats off - then, I 
account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical 
flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. 
If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings 
towards the ocean with me."""

tally = {}
for char in MOBY:
    if char in tally:
        tally[char] += 1
    else:
        tally[char] = 1

print(tally['C'])
print(tally['I'])


The above code assigns the text of the first paragraph of Moby Dick to the variable `MOBY`. A new empty dictionary is created and assigned to `tally`. The `for` loop counts the number of times each character appears in the string assigned to `MOBY`. Each iteration, the code checks if the current letter `char` is already in the `tally` dictionary. If yes, its corresponding count is incremented. If not, it is added to the dictionary and its count is set to `1`. The reason we need to do this is that if we attempt to increment a value associated with a non-existent key, we will get a `KeyError` (because there is no pre-existing value to increment):


```python
votes = {}
votes['Melbourne'] = 'many!'
votes['anywhere else'] += 1  # This is an ERROR!
```

In [None]:
# Try running the last example here


When the program completes, it prints out the number of instances of the letters `C` and `I` in the text.
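
The `if`/`else` test in the counting loop can also be written with `.get()` and a default of `0`. This is an alternative sketch rather than a replacement for the code above, and it uses a short made-up string so the counts are easy to check by hand:

```python
text = "hello world"

tally = {}
for char in text:
    # .get(char, 0) is 0 the first time a character is seen, so no KeyError.
    tally[char] = tally.get(char, 0) + 1

print(tally['l'])  # 3
print(tally['o'])  # 2
```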


# Character Histogram

After counting all the characters in Moby Dick's first paragraph, you can do interesting things with the information. For instance, you can print out a simple bar chart of the frequencies for all the upper case characters which appear in the text:


In [None]:
MOBY = """Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, 
and nothing particular to interest me on shore, I thought I would sail about a little and see the watery 
part of the world. It is a way I have of driving off the spleen and regulating the circulation. 
Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; 
whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral 
I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle 
to prevent me from deliberately stepping into the street, and methodically knocking people's hats off - then, I 
account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical 
flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. 
If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings 
towards the ocean with me."""

tally = {}
for char in MOBY:
    if char in tally:
        tally[char] += 1
    else:
        tally[char] = 1

for key in tally.keys():
    if key.isupper():
        print(key + ': ' + '=' * tally[key])


For each of the keys in `tally`, the code checks if it is an upper case character using `.isupper()`. If it is upper case, it prints a bar of `=` characters whose length is equal to the count of that character in `tally`. From the output of the code you can see that, for instance, there are two occurrences of `C` in the text, and 12 occurrences of `I`.
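
The bars above appear in the order the characters were first seen. If you would prefer them in alphabetical order, one option is to loop over `sorted(tally)` instead. The sketch below uses a shortened piece of text, and the list `bars` is an illustrative addition so the result can be inspected:

```python
text = "Call me Ishmael. It is a way I have."

tally = {}
for char in text:
    if char in tally:
        tally[char] += 1
    else:
        tally[char] = 1

# sorted(tally) gives the keys in sorted order.
bars = []
for key in sorted(tally):
    if key.isupper():
        bars.append(key + ': ' + '=' * tally[key])

for bar in bars:
    print(bar)
# C: =
# I: ===
```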


# Problem: Repeat Words

Write a function `repeat_word_count(text, n)` that takes a string `text` and a positive integer `n`, converts `text` into a list of words based on simple whitespace separation (with no removal of punctuation or changing of case), and returns a **sorted** list of words that occur `n` or more times in `text`. For example:


```python
>>> repeat_word_count("buffalo buffalo buffalo buffalo", 2)
['buffalo']
>>> repeat_word_count("one one was a racehorse two two was one too", 3)
['one']
>>> repeat_word_count("how much wood could a wood chuck chuck", 1)
['a', 'chuck', 'could', 'how', 'much', 'wood']
```

In [None]:
# Your solution here

# Problem: Mode

Write a function `mode(numlist)` that takes a single argument `numlist` (a non-empty list of numbers), and returns the sorted list of numbers which appear with the highest frequency in `numlist` (i.e. the **mode**). For example:


```python
>>> mode([0, 2, 0, 1])
[0]
>>> mode([5, 1, 1, 5])
[1, 5]
>>> mode([4.0])
[4.0]
```

In [None]:
# Your solution here

# Problem: Top-5 Words

Write a function `top5_words(text)` that takes a single argument `text` (a non-empty string), tokenises `text` into words based on whitespace (once again, without any stripping of punctuation or case normalisation), and returns the top-5 words as a list of strings, in descending order of frequency. If there is a tie in frequency at any point, the words with the same frequency should be sub-sorted alphabetically. If there are fewer than five distinct words in `text`, the function should return all of the words in descending order of frequency (with the same tie-breaking mechanism). For example:


```python
>>> top5_words("one one was a racehorse two two was one too")
['one', 'two', 'was', 'a', 'racehorse']
>>> top5_words("buffalo buffalo buffalo chicken buffalo")
['buffalo', 'chicken']
>>> top5_words("the quick brown fox jumped over the lazy dog")
['the', 'brown', 'dog', 'fox', 'jumped']
```

In [None]:
# Your solution here


> ## Sorting Tuples
> One trick that you may find useful is that when sorting a list of tuples, sorting is done based on the first element, and if there is a tie in the value of the first element, the second element is used to break the tie, etc. For example:

In [None]:
print(sorted([(1, 4), (1, 2), (1, 1)]))
print(sorted([(1, 'b'), (1, 'a'), (0, 'z')]))

> In this exercise you'll need to sort the numbers at index 0 from highest to lowest and break ties using the strings at index 1 from lowest to highest. This is difficult to do without realising that you can convert the numbers to *negative* values, in which case both the numbers and the strings can be sorted lowest to highest:

In [None]:
my_list = [(-4, "hello"), (-1, "toy"), (-7, "marshmallow"), (-4, "alphabet")]
print(sorted(my_list))
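
Here is the same trick applied to a dictionary, as a sketch on made-up data (the `scores` dictionary and the names in it are hypothetical). Each entry becomes a `(-score, name)` tuple, so a plain `sorted()` puts the highest scores first and breaks ties alphabetically:

```python
scores = {'kim': 7, 'sandy': 4, 'alex': 7, 'jo': 1}

# Build (negative score, name) tuples from the dictionary.
pairs = []
for (name, score) in scores.items():
    pairs.append((-score, name))

# Sorting lowest to highest now means highest score first, then name A to Z.
ranked = []
for (neg_score, name) in sorted(pairs):
    ranked.append(name)

print(ranked)  # ['alex', 'kim', 'sandy', 'jo']
```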

# Sets

Finally, we introduce a useful but somewhat obscure data structure called a **set**. If you are familiar with sets in mathematics, it is exactly the same concept. For the rest of us, a set is like a list, but with unique elements, and no order. Or perhaps even clearer, it is a dictionary with keys but no values.

Below we define an empty set, and a set with some elements.

Note:

* Although a set with elements uses curly braces, you cannot use empty curly braces to define an empty set as that would be an empty dictionary.
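* The order of elements in a set is not significant, and you should not rely on it. (Dictionaries are different: since Python 3.7 they preserve the order in which keys were inserted.) Try running the code below in a few fresh Python sessions; the elements may print in a different order.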


In [None]:
my_set1 = set()
my_set2 = {"a", "b", "c"}
print(my_set1, my_set2)


It doesn't make sense to use the "order" of a set anyway, but if you convert it to a type which does have ordering such as a list, remember that the order may be different each time your code is run.

Because sets contain unique elements, they automatically remove duplicate elements:


In [None]:
my_set3 = {"a", "b", "c", "a", "a"}
print(my_set3)


> ## Empty types
> We use `set()` to create an empty set because we can't use brackets or delimiters like we do for other types: the braces we use with sets are already taken for dictionaries. In fact, while we use pairs of brackets or delimiters for creating new dictionaries `{}`, tuples `()`, lists `[]` or strings `""`, this notation is actually just a shortcut to a more universal way of creating a new object: by writing the type name with a pair of parentheses:

In [None]:
empty_dict = dict()
empty_tuple = tuple()
empty_list = list()
empty_string = str()

print(empty_dict=={})
print(empty_tuple==())
print(empty_list==[])
print(empty_string=="")

# Useful Set Operators

Like lists, you can use the `in` operator with sets, to test for a given element:


In [None]:
my_set={"a", "b", "c"}
print("a" in my_set)
print("d" in my_set)


You can make sets from sequences. This is a handy way to remove duplicate elements, although the original order of the elements is lost:


In [None]:
my_sequence = "hello"
my_set = set(my_sequence)
print(my_set)
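
If the order of the de-duplicated elements matters, there are a couple of options. The sketch below shows two of them on a made-up list: sorting the set, and using a set only to remember what has been seen. The second uses the `.add()` method, which adds an element to a set. The variable names are illustrative:

```python
numbers = [3, 1, 3, 2, 1]

# Option 1: remove duplicates, then sort for a predictable order.
unique_sorted = sorted(set(numbers))
print(unique_sorted)    # [1, 2, 3]

# Option 2: keep the first occurrence of each element, in the original order.
seen = set()
unique_in_order = []
for n in numbers:
    if n not in seen:
        seen.add(n)
        unique_in_order.append(n)
print(unique_in_order)  # [3, 1, 2]
```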


You can take the difference between sets (`-`), join them together (find their union, `|`), and find their intersection (the elements that are in both, `&`).


In [None]:
my_set1 = {2, 3, 4}
my_set2 = {2, 5}

print(f"{my_set1} - {my_set2} = {my_set1 - my_set2}")
print(f"{my_set2} - {my_set1} = {my_set2 - my_set1}")
print(f"Union: {my_set2} | {my_set1} = {my_set2 | my_set1}")
print(f"Intersection: {my_set2} & {my_set1} = {my_set2 & my_set1}")

# Useful Set Methods

Finally, as with lists, sets are **mutable**, so you can add and remove elements. Note `.add()` is the method to add to a set, not `.append()` or `.insert()` as with lists:


In [None]:
my_set = {"a", 1, "sandy"}
my_set.add("kim")
print(my_set)

my_set.remove("sandy")
print(my_set)
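
Be aware that `.remove()` raises a `KeyError` if the element is not in the set. Sets also have a `.discard()` method, which does nothing in that case. A minimal sketch, assuming you have met `try`/`except`:

```python
my_set = {"a", "kim"}

# .discard() silently ignores an element that is not present.
my_set.discard("sandy")

# .remove() raises KeyError for an element that is not present.
try:
    my_set.remove("sandy")
except KeyError:
    print("'sandy' is not in the set")

print(len(my_set))  # 2
```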


You can also find the length of a set:


In [None]:
my_set = {"a", 1, "kim"}
print(len(my_set))


Because sets have no order, it does not make sense to index or slice them. You cannot do this:


```python
my_set = {"a", 1, "bob"}
print(my_set[0])  # This is an ERROR!
```

In [None]:
# Try running the last example here
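
What you can do instead is loop over a set, or convert it to a sorted list and index that. The sketch below uses a set containing only strings, since `sorted()` cannot compare strings with integers. The names are made up:

```python
my_set = {"bob", "alice", "carol"}

# Looping works, but the order of the elements is not guaranteed.
for name in my_set:
    print(name)

# A sorted list has a well-defined order, so it can be indexed.
names = sorted(my_set)
print(names[0])  # alice
```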

# Set Problem

Write a function `mutual_friends(list1, list2)` that takes two lists of friends and returns the number of friends in common. Use sets. Your function should behave as follows:


```python
>>> mutual_friends(["Bob","Joe"],["Esmerelda"])
0
>>> mutual_friends(["Bob","Joe"],["Bob","Joe"])
2
>>> mutual_friends(["Bob","Joe"],["Bob","Joe","Keitha"])
2
```  

In [None]:
# Your solution here