### Idiomatic Python: Iterating Dictionaries

This Pythonic tip is very similar in nature to the video I did on Pythonic iteration of sequence types - we typically do not use indexing to iterate over the elements of a sequence (there are cases where we need to, but they are rare).

When iterating over the elements of a Python `dict` we also have several Pythonic techniques.

Let's say you need to iterate over the keys of a Python dictionary - that is completely straightforward, as the default in Python is to iterate over the keys only:

In [1]:
characters = 'abcdefgh'
d = dict(zip(characters, (ord(ch) for ch in characters)))
d

{'a': 97, 'b': 98, 'c': 99, 'd': 100, 'e': 101, 'f': 102, 'g': 103, 'h': 104}

In [2]:
for k in d:
    print(k)

a
b
c
d
e
f
g
h


We can also get a list of all the keys this way:

In [3]:
list(d)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

But what about iterating over the values in the dictionary?

I often see people fall back on the equivalent of the access-by-index approach that you can take (but should avoid if possible) with sequence types, looking up each value by its key:

In [4]:
for k in d:
    print(d[k])

97
98
99
100
101
102
103
104


And if we just want a list of all the values we could do this (but don't!):

In [5]:
[d[k] for k in d]

[97, 98, 99, 100, 101, 102, 103, 104]

Instead we can use the **dictionary view**, obtained by using the dictionary's `values()` method:

In [6]:
d.values()

dict_values([97, 98, 99, 100, 101, 102, 103, 104])

Now, `d.values()` is a view object, not a list, but it **is** an **iterable**:

In [7]:
list(d.values())

[97, 98, 99, 100, 101, 102, 103, 104]

In [8]:
for val in d.values():
    print(val)

97
98
99
100
101
102
103
104


As you can see we can iterate over the values of a dictionary **without using key lookups**.
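One thing worth knowing about view objects is that they are live: a view keeps reflecting the dictionary as it changes, whereas a list built from the view is a one-time copy. Here is a minimal sketch of that difference, using a small stand-in dictionary (the `'z'` entry and the variable names `values` and `snapshot` are just for illustration):

```python
d = {'a': 97, 'b': 98, 'c': 99}

values = d.values()       # a view onto d, not a copy
snapshot = list(values)   # an independent list, frozen at this moment

d['z'] = 122              # mutate the dictionary after creating the view

print(list(values))       # the view reflects the new entry: [97, 98, 99, 122]
print(snapshot)           # the list does not: [97, 98, 99]
```

So if you need a stable copy of the values, build the list explicitly; if you only need to iterate, the view is enough.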

Sometimes we need to iterate over both the key and the value at the same time - and again there is a temptation to revert to using key lookups:

In [9]:
for k in d:
    print(f"{k} = {d[k]}")

a = 97
b = 98
c = 99
d = 100
e = 101
f = 102
g = 103
h = 104


But Python provides another dictionary view that yields back **both** the key and the value, as a **tuple**:

In [10]:
for item in d.items():
    print(item.__class__.__name__, item)

tuple ('a', 97)
tuple ('b', 98)
tuple ('c', 99)
tuple ('d', 100)
tuple ('e', 101)
tuple ('f', 102)
tuple ('g', 103)
tuple ('h', 104)


Since we are iterating over two-element tuples, we can easily **unpack** them as well:

In [11]:
for key, value in d.items():
    print(f"{key} = {value}")

a = 97
b = 98
c = 99
d = 100
e = 101
f = 102
g = 103
h = 104


So this is the more Pythonic way of iterating over both the keys and the values of a dictionary.

Let's take a look at a more practical example.

Suppose we have a list of widgets for sale on our web site.

These widgets have a few attributes:
- unique internal identifier (some uuid, the way we normally identify widgets)
- widget name (may not be unique)
- some kind of bar code (also unique)

Suppose we already have a dictionary that contains this data, keyed by the unique internal identifier.

But our app also needs to do quick lookups based on the bar code - this means we want a new dictionary whose keys are the bar codes, and whose values are the uuid and the widget name.

Let's build a sample data set. 

I'll use the faker library that I've covered in a previous video.

In [12]:
from random import randint, seed
from faker import Faker

seed(0)
Faker.seed(0)
fake = Faker()

widgets = {
    str(fake.uuid4(cast_to=str)): (fake.ean13(), f"widget-{randint(1, 5)}")
    for _ in range(6)
}

In [13]:
widgets

{'e3e70682-c209-4cac-a29f-6fbed82c07cd': ('6048764759387', 'widget-4'),
 'c17c6279-23c6-412f-8826-867323a7711a': ('1948924115785', 'widget-4'),
 '50f24455-6f25-42a2-9a92-118719c78df4': ('9387784080161', 'widget-1'),
 'ab0c1681-c8f8-43d0-9329-0a4cb5d32b16': ('0975351393326', 'widget-3'),
 '1759edc3-72ae-4244-8b01-63c1cd9d2b7d': ('1587148418588', 'widget-5'),
 '9a6a5f92-cca7-4147-b6be-1f723405095c': ('8947196593423', 'widget-4')}

So with this dictionary we can quickly look up a widget by its UUID, but not by its ean13 code.

So, let's create a new dict that gives us this quick lookup.

We can do it this way, but don't!

In [14]:
widget_ean_lookup = {}
for uuid in widgets:
    widget_ean_lookup[widgets[uuid][0]] = uuid, widgets[uuid][1]

In [15]:
widget_ean_lookup

{'6048764759387': ('e3e70682-c209-4cac-a29f-6fbed82c07cd', 'widget-4'),
 '1948924115785': ('c17c6279-23c6-412f-8826-867323a7711a', 'widget-4'),
 '9387784080161': ('50f24455-6f25-42a2-9a92-118719c78df4', 'widget-1'),
 '0975351393326': ('ab0c1681-c8f8-43d0-9329-0a4cb5d32b16', 'widget-3'),
 '1587148418588': ('1759edc3-72ae-4244-8b01-63c1cd9d2b7d', 'widget-5'),
 '8947196593423': ('9a6a5f92-cca7-4147-b6be-1f723405095c', 'widget-4')}

Let's rewrite this so we aren't doing these key lookups (especially since we are doing the same lookup twice in each loop iteration - we could fix that by using a temporary variable, but we really don't need to do that):

In [16]:
widgets_ean = {}
for uuid, (ean, name) in widgets.items():
    widgets_ean[ean] = uuid, name

In [17]:
widgets_ean

{'6048764759387': ('e3e70682-c209-4cac-a29f-6fbed82c07cd', 'widget-4'),
 '1948924115785': ('c17c6279-23c6-412f-8826-867323a7711a', 'widget-4'),
 '9387784080161': ('50f24455-6f25-42a2-9a92-118719c78df4', 'widget-1'),
 '0975351393326': ('ab0c1681-c8f8-43d0-9329-0a4cb5d32b16', 'widget-3'),
 '1587148418588': ('1759edc3-72ae-4244-8b01-63c1cd9d2b7d', 'widget-5'),
 '8947196593423': ('9a6a5f92-cca7-4147-b6be-1f723405095c', 'widget-4')}
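For comparison, here is a rough sketch of what the temporary-variable variant mentioned above might look like, next to the `items()` version. It uses just two rows copied from the sample data, and the names `record`, `with_temp` and `with_items` are only for illustration:

```python
# Two sample rows copied from the widgets dictionary shown earlier.
widgets = {
    'e3e70682-c209-4cac-a29f-6fbed82c07cd': ('6048764759387', 'widget-4'),
    '50f24455-6f25-42a2-9a92-118719c78df4': ('9387784080161', 'widget-1'),
}

# Key lookup, but only once per iteration thanks to a temporary variable.
with_temp = {}
for uuid in widgets:
    record = widgets[uuid]
    with_temp[record[0]] = uuid, record[1]

# Same result using items() and nested unpacking - no lookup, no temporary.
with_items = {}
for uuid, (ean, name) in widgets.items():
    with_items[ean] = uuid, name

print(with_temp == with_items)  # True
```

Both produce the same dictionary, but the unpacking version also gives the tuple elements meaningful names instead of `[0]` and `[1]`.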

And let's take this one step further by using a comprehension instead:

In [18]:
widgets_ean = {
    ean: (uuid, name) 
    for uuid, (ean, name) in widgets.items()
}

In [19]:
widgets_ean

{'6048764759387': ('e3e70682-c209-4cac-a29f-6fbed82c07cd', 'widget-4'),
 '1948924115785': ('c17c6279-23c6-412f-8826-867323a7711a', 'widget-4'),
 '9387784080161': ('50f24455-6f25-42a2-9a92-118719c78df4', 'widget-1'),
 '0975351393326': ('ab0c1681-c8f8-43d0-9329-0a4cb5d32b16', 'widget-3'),
 '1587148418588': ('1759edc3-72ae-4244-8b01-63c1cd9d2b7d', 'widget-5'),
 '8947196593423': ('9a6a5f92-cca7-4147-b6be-1f723405095c', 'widget-4')}

So this:

In [20]:
widget_ean_lookup = {}
for uuid in widgets:
    widget_ean_lookup[widgets[uuid][0]] = uuid, widgets[uuid][1]

vs this:

In [21]:
widgets_ean = {
    ean: (uuid, name) 
    for uuid, (ean, name) in widgets.items()
}
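Once the bar code dictionary exists, using it is just an ordinary key access. The sketch below assumes the same two-row subset of the sample data, and the all-zeros bar code is a made-up value that is not in the data:

```python
# Two sample rows copied from the widgets dictionary shown earlier.
widgets = {
    'e3e70682-c209-4cac-a29f-6fbed82c07cd': ('6048764759387', 'widget-4'),
    '50f24455-6f25-42a2-9a92-118719c78df4': ('9387784080161', 'widget-1'),
}

widgets_ean = {
    ean: (uuid, name)
    for uuid, (ean, name) in widgets.items()
}

# Look up a widget by its bar code and unpack the result.
uuid, name = widgets_ean['9387784080161']
print(uuid, name)  # 50f24455-6f25-42a2-9a92-118719c78df4 widget-1

# For a bar code that may not exist, get() returns None instead of raising.
missing = widgets_ean.get('0000000000000')
print(missing)  # None
```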

As an added bonus in this video, what if we want the ability to look up widgets by name? Given that the name is not unique, a lookup needs to return a list of uuid/ean matches for that particular widget name.

We can't use the same technique we just did, since dictionary keys need to be unique, so we can't just add the same widget name multiple times into the dictionary.
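To see the problem concretely, here is a small sketch of what happens if we try the same comprehension keyed by name. It uses three rows copied from the sample data (two of them share the name `widget-4`), and `by_name` is just an illustrative variable name:

```python
# Three sample rows copied from the widgets dictionary shown earlier.
widgets = {
    'e3e70682-c209-4cac-a29f-6fbed82c07cd': ('6048764759387', 'widget-4'),
    'c17c6279-23c6-412f-8826-867323a7711a': ('1948924115785', 'widget-4'),
    '50f24455-6f25-42a2-9a92-118719c78df4': ('9387784080161', 'widget-1'),
}

# Keying by name silently overwrites earlier entries with the same name.
by_name = {
    name: (uuid, ean)
    for uuid, (ean, name) in widgets.items()
}

print(len(widgets), len(by_name))  # 3 2
print(by_name['widget-4'])         # only the last 'widget-4' row survives
```

No error is raised - the later entry simply replaces the earlier one, so we lose data without noticing.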

What we need is the ability to store each widget name only once in the dictionary, and set the value to a **sequence** of possible values.

To do this, we'll make use of Python's `defaultdict`:

In [22]:
from collections import defaultdict

In [23]:
widgets_name = defaultdict(list)

for uuid, (ean, name) in widgets.items():
    widgets_name[name].append((uuid, ean))

In [24]:
dict(widgets_name)

{'widget-4': [('e3e70682-c209-4cac-a29f-6fbed82c07cd', '6048764759387'),
  ('c17c6279-23c6-412f-8826-867323a7711a', '1948924115785'),
  ('9a6a5f92-cca7-4147-b6be-1f723405095c', '8947196593423')],
 'widget-1': [('50f24455-6f25-42a2-9a92-118719c78df4', '9387784080161')],
 'widget-3': [('ab0c1681-c8f8-43d0-9329-0a4cb5d32b16', '0975351393326')],
 'widget-5': [('1759edc3-72ae-4244-8b01-63c1cd9d2b7d', '1587148418588')]}

You'll notice, by the way, that we can't really use a comprehension here, since each list has to be built up over multiple loop iterations.
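One caveat to keep in mind when a `defaultdict` is used as a lookup table: reading a missing key with square brackets creates that key as a side effect. The sketch below illustrates this with made-up names (`'widget-9'` and `'widget-10'` are not in the sample data), along with two ways around it:

```python
from collections import defaultdict

widgets_name = defaultdict(list)
widgets_name['widget-4'].append(
    ('e3e70682-c209-4cac-a29f-6fbed82c07cd', '6048764759387')
)

# Square-bracket access on a missing key returns [] AND stores the key.
hits = widgets_name['widget-9']
print(hits, 'widget-9' in widgets_name)  # [] True

# get() does not call the default factory, so nothing is inserted.
print(widgets_name.get('widget-10'), 'widget-10' in widgets_name)  # None False

# Converting to a plain dict restores the usual KeyError behavior.
plain = dict(widgets_name)
try:
    plain['widget-10']
except KeyError:
    print('KeyError')
```

Depending on how the lookup is used, either `get()` or a conversion to a plain `dict` once the data is built may be the safer choice.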

So does this mean that you should never use key lookups when iterating over a dictionary?

Of course not - you may be in a situation where you can't avoid them, or where a key lookup happens to be the more efficient option. Just be aware of the dictionary view objects, and know when to use them for more Pythonic code.
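As one example of a situation where key lookups are perfectly reasonable, suppose the keys we care about come from somewhere else, and we only want a few entries out of the dictionary. This is a minimal sketch with a hypothetical `wanted` list (note that `'x'` is deliberately not a key in `d`):

```python
d = {'a': 97, 'b': 98, 'c': 99, 'd': 100}

# Hypothetical list of keys supplied from elsewhere, in its own order.
wanted = ['d', 'a', 'x']

# Here we are iterating over `wanted`, not over the dictionary,
# so looking each key up in d is the natural approach.
found = {key: d[key] for key in wanted if key in d}

print(found)  # {'d': 100, 'a': 97}
```

Iterating over `d.items()` and filtering would also work, but it would visit every entry in the dictionary rather than just the ones we asked for.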