### Problem
You have an N-element tuple or sequence that you would like to unpack into a collection of N variables.

In [1]:
p = (4, 5)
x, y = p

In [2]:
x

4

In [3]:
y

5
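Unpacking only works when the number of variables matches the number of elements exactly. Otherwise Python raises a `ValueError`. A quick check using the same tuple `p`:

```python
p = (4, 5)

try:
    x, y, z = p  # three names, but only two values
except ValueError as exc:
    message = str(exc)

print(message)  # not enough values to unpack (expected 3, got 2)
```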

In [4]:
data = ['ACME', 50, 91.1, (2012, 12, 21)]
name, shares, price, date = data

In [5]:
name

'ACME'

In [6]:
date

(2012, 12, 21)

In [7]:
name, shares, price, (year, mon, day) = data

In [8]:
name

'ACME'

In [9]:
year

2012

In [10]:
mon

12

In [11]:
day

21
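Unpacking isn't limited to tuples and lists. It works with any iterable, including strings, iterators and generators. A small sketch (the sample values are illustrative):

```python
# A string yields its characters one by one
a, b, c, d, e = 'Hello'

# An iterator is consumed element by element
first, second = iter([10, 20])

# So is a generator expression
lo, hi = (n * n for n in (3, 4))

print(a, e, first, second, lo, hi)  # H o 10 20 9 16
```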

Throwaway variables: use a name such as `_` for values you don't need.

In [12]:
_, shares, price, _ = data

In [13]:
shares

50

In [14]:
price

91.1

### Problem
You need to unpack N elements from an iterable, but the iterable may be longer than N elements, causing a "too many values to unpack" exception.

In [15]:
def drop_first_last(grades):
    # Ignore the first and last grades, then average the rest
    first, *middle, last = grades
    return sum(middle) / len(middle)
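A self-contained version of `drop_first_last()` with a usage example. It averages the middle grades with `sum(middle) / len(middle)`, which is an assumption about what the averaging helper is meant to do; the grades are made-up sample data.

```python
def drop_first_last(grades):
    # The star expression collects everything between the first and last element
    first, *middle, last = grades
    # Average of the remaining grades (assumes at least three grades)
    return sum(middle) / len(middle)


grades = [60, 90, 80, 70, 100]  # hypothetical sample grades
print(drop_first_last(grades))  # 80.0
```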

In [16]:
record = ('Dave', 'dave@example.com', '773-555-1212', '847-555-1212')

In [17]:
name, email, *phone_numbers = record

In [18]:
name

'Dave'

In [19]:
email

'dave@example.com'

In [20]:
phone_numbers

['773-555-1212', '847-555-1212']
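A starred variable always receives a list, even when there is nothing left to put in it. For example, a hypothetical record with no phone numbers:

```python
record = ('Dave', 'dave@example.com')  # no phone numbers at all
name, email, *phone_numbers = record
print(phone_numbers)  # []

# Starred targets produce a list even when the source is a tuple
_, *rest = ('a', 'b', 'c')
print(type(rest).__name__, rest)  # list ['b', 'c']
```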

In [22]:
*trailing, current = [10, 8, 7, 1, 9, 5, 10, 3]

In [23]:
trailing

[10, 8, 7, 1, 9, 5, 10]

In [24]:
current

3

In [25]:
records = [
    ('foo', 1, 2),
    ('bar', 'hello'),
    ('foo', 3, 4),
]

In [26]:
def do_foo(x, y):
    print('foo', x, y)

In [27]:
def do_bar(s):
    print('bar', s)

In [28]:
for tag, *args in records:
    if tag == 'foo':
        do_foo(*args)
    elif tag == 'bar':
        do_bar(*args)

foo 1 2
bar hello
foo 3 4


In [29]:
line = 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false'

In [30]:
uname, *fields, homedir, sh = line.split(':')

In [31]:
uname

'nobody'

In [32]:
homedir

'/var/empty'

In [33]:
sh

'/usr/bin/false'

In [34]:
record = ('ACME', 50, 123.45, (12, 18, 2012))

In [35]:
name, *_, (*_, year) = record

In [36]:
name

'ACME'

In [37]:
year

2012

In [39]:
items = [1, 10, 7, 4, 5, 9]

In [40]:
head, *tail = items

In [41]:
head

1

In [42]:
tail

[10, 7, 4, 5, 9]

In [46]:
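# Be careful with clever recursion like this: Python has a recursion limit,
# so it only works on short sequences (and naming it sum would shadow the built-in)
def recursive_sum(items):
    head, *tail = items
    return head + recursive_sum(tail) if tail else head

In [47]:
recursive_sum(items)

36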
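To see why recursion like this is a curiosity rather than a practical tool, here is a sketch that feeds it a list longer than the interpreter's recursion limit (read with `sys.getrecursionlimit()`). The function is named `recursive_sum` so the built-in `sum()` stays available for comparison.

```python
import sys


def recursive_sum(items):
    # Each call copies the tail into a new list and adds a stack frame
    head, *tail = items
    return head + recursive_sum(tail) if tail else head


# One element more per frame than the interpreter allows
big = list(range(sys.getrecursionlimit() + 100))

try:
    recursive_sum(big)
    overflowed = False
except RecursionError:
    overflowed = True

total = sum(big)  # the built-in handles the same list without trouble
print(overflowed)  # True
```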

### Problem
You want to keep a limited history of the last few items seen during iteration or during some other kind of processing.

In [48]:
# collections.deque is a perfect fit for this
from collections import deque

In [49]:
def search(lines, pattern, history=5):
    previous_lines = deque(maxlen=history)
    for line in lines:
        if pattern in line:
            yield line, previous_lines
        previous_lines.append(line)

In [50]:
# Example use on a file
if __name__ == '__main__':
    with open('somefile.txt') as f:
        for line, prevlines in search(f, 'python', 5):
            for pline in prevlines:
                print(pline, end='')
            print(line, end='')
            print('-'*20)

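Since `somefile.txt` isn't available here, a sketch that runs `search()` on an in-memory file instead. `io.StringIO` stands in for the real file, and the sample text is made up:

```python
import io
from collections import deque


def search(lines, pattern, history=5):
    previous_lines = deque(maxlen=history)
    for line in lines:
        if pattern in line:
            yield line, previous_lines
        previous_lines.append(line)


# Iterating a StringIO yields lines (with '\n') just like a text file does
fake_file = io.StringIO('a\nb\npython here\nc\nmore python\n')

results = []
for line, prevlines in search(fake_file, 'python', history=2):
    # search() yields the same deque every time, so copy it before storing
    results.append((line, list(prevlines)))
```

The `list(prevlines)` copy matters: the generator keeps appending to one shared deque, so storing the deque itself would show only its final contents.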

In [51]:
q = deque(maxlen=3)
q.append(1)
q.append(2)
q.append(3)

In [52]:
q

deque([1, 2, 3])

In [53]:
q.append(4)

In [54]:
q

deque([2, 3, 4])

In [55]:
q.append(5)

In [56]:
q

deque([3, 4, 5])

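Although you could do the same thing by hand with a list, the deque solution is far more elegant and runs a lot faster: adding or removing items at either end of a deque is O(1), whereas removing the first item of a list is O(N).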
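For comparison, a sketch of the manual list approach next to a bounded deque. Both keep the last three items, but `del history[0]` has to shift every remaining element, while the deque discards its oldest item automatically:

```python
from collections import deque

# Manual bounded history with a list: trim the oldest item once it grows too long
history = []
for item in [1, 2, 3, 4, 5]:
    history.append(item)
    if len(history) > 3:
        del history[0]  # shifts every remaining element left: O(N)

# The same behaviour with a bounded deque: the oldest item drops off in O(1)
q = deque(maxlen=3)
for item in [1, 2, 3, 4, 5]:
    q.append(item)

print(history, list(q))  # [3, 4, 5] [3, 4, 5]
```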

If you don't give a deque a maxlen, you get an unbounded queue that supports appending and popping at either end.

In [57]:
q = deque()
q.append(1)
q.append(2)
q.append(3)

In [58]:
q

deque([1, 2, 3])

In [59]:
q.appendleft(4)

In [60]:
q

deque([4, 1, 2, 3])

In [61]:
q.pop()

3

In [62]:
q

deque([4, 1, 2])

In [63]:
q.popleft()

4

### Problem
You want to make a list of the largest or smallest N items in a collection.

In [64]:
# Using heapq module
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
print(heapq.nlargest(3, nums))
print(heapq.nsmallest(3, nums))

[42, 37, 23]
[-4, 1, 2]


In [65]:
portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'HPQ', 'shares': 35, 'price': 31.75},
    {'name': 'YHOO', 'shares': 45, 'price': 16.35},
    {'name': 'ACME', 'shares': 75, 'price': 115.65}
]

In [66]:
cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])
expensive = heapq.nlargest(3, portfolio, key=lambda s: s['price'])

In [67]:
print(cheap)
print(expensive)

[{'name': 'YHOO', 'shares': 45, 'price': 16.35}, {'name': 'FB', 'shares': 200, 'price': 21.09}, {'name': 'HPQ', 'shares': 35, 'price': 31.75}]
[{'name': 'AAPL', 'shares': 50, 'price': 543.22}, {'name': 'ACME', 'shares': 75, 'price': 115.65}, {'name': 'IBM', 'shares': 100, 'price': 91.1}]


If you are looking for the N smallest or largest items and N is small compared to the overall size of the collection, these functions provide superior performance. Underneath the covers, they work by first converting the data into a list where items are ordered as a heap. For example:

In [68]:
nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
import heapq
heap = list(nums)
heapq.heapify(heap)
heap

[-4, 2, 1, 23, 7, 2, 18, 23, 42, 37, 8]

In [69]:
# Get the smallest 3 numbers
heapq.heappop(heap)

-4

In [71]:
heapq.heappop(heap)

1

In [72]:
heapq.heappop(heap)

2
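Because `heap[0]` is always the smallest item in a heap, you can peek at it without popping. A short sketch repeating the heap built above:

```python
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
heap = list(nums)
heapq.heapify(heap)

# heap[0] is always the smallest item, so it can be inspected without removing it
smallest = heap[0]

# heappop() removes the smallest item and restores the heap invariant each time
three_smallest = [heapq.heappop(heap) for _ in range(3)]

print(smallest, three_smallest)  # -4 [-4, 1, 2]
```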

The nlargest() and nsmallest() functions are most appropriate if you are trying to find a relatively small number of items. If you are simply trying to find the single smallest or largest item (N=1), it is faster to use min() and max(). Similarly, if N is about the same size as the collection itself, it is usually faster to sort it first and take a slice (i.e., use sorted(items)[:N] or sorted(items)[-N:]). It should be noted that the actual implementation of nlargest() and nsmallest() is adaptive in how it operates and will carry out some of these optimizations on your behalf (e.g., using sorting if N is close to the same size as the input).
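The rules of thumb above can be checked directly: each `heapq` call below agrees with its `min()`/`max()` or sort-and-slice alternative. Note that `nlargest()` returns items in descending order, so the matching slice of an ascending sort has to be reversed. `N = 9` is an arbitrary value close to the collection size.

```python
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]

# N == 1: min() and max() give the same answer directly
assert heapq.nsmallest(1, nums) == [min(nums)]
assert heapq.nlargest(1, nums) == [max(nums)]

# N close to len(nums): sort once and slice
N = 9
assert heapq.nsmallest(N, nums) == sorted(nums)[:N]
# sorted(nums)[-N:] is ascending, while nlargest() returns descending order
assert heapq.nlargest(N, nums) == sorted(nums)[-N:][::-1]

print('all equivalent')
```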

### Problem
You want to implement a queue that sorts items by a given priority and always returns the item with the highest priority on each pop operation.

In [73]:
import heapq

class PriorityQueue:
    def __init__(self):
        self._queue = []
        self._index = 0
        
    def push(self, item, priority):
        heapq.heappush(self._queue, (-priority, self._index, item))
        self._index += 1
        
    def pop(self):
        return heapq.heappop(self._queue)[-1]

In [74]:
class Item:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return 'Item({!r})'.format(self.name)

In [75]:
q = PriorityQueue()

In [76]:
q.push(Item('foo'), 1)

In [77]:
q.push(Item('bar'), 5)

In [78]:
q.push(Item('spam'), 4)
q.push(Item('grok'), 1)

In [79]:
q.pop()

Item('bar')

In [80]:
q.pop()

Item('spam')

In [81]:
q.pop()

Item('foo')

In [82]:
q.pop()

Item('grok')
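Why does `push()` store `(-priority, self._index, item)` rather than just `(-priority, item)`? Negating the priority turns `heapq`'s min-heap into a max-heap, and the index breaks ties. Without it, two entries with equal priority force Python to compare the `Item` instances themselves, which fails. The index also makes equal-priority items come out in insertion order, which is why `foo` is popped before `grok` above. A sketch (`orderable` is a hypothetical helper):

```python
class Item:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return 'Item({!r})'.format(self.name)


def orderable(x, y):
    """Return True if x < y can be evaluated, False if it raises TypeError."""
    try:
        x < y
    except TypeError:
        return False
    return True


a = Item('foo')
b = Item('grok')

print(orderable(a, b))                    # False: Item defines no ordering
print(orderable((-1, a), (-1, b)))        # False: equal priorities fall through to the items
print(orderable((-1, 0, a), (-1, 1, b)))  # True: the index decides before the items are reached
```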

### Problem
You want to make a dictionary that maps keys to more than one value (a so-called "multidict").

In [83]:
d = {
    'a': [1, 2, 3],
    'b': [4, 5]
}

e = {
    'a': {1, 2, 3},
    'b': {4, 5}
}

In [89]:
# Using defaultdict
from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)
d['a'].append(2)
d['a'].append(3)
d['b'].append(4)
d['b'].append(5)

In [90]:
d

defaultdict(list, {'a': [1, 2, 3], 'b': [4, 5]})

In [95]:
# For sets
d = defaultdict(set)

d['a'].add(1)
d['a'].add(2)
d['b'].add(4)

In [96]:
d

defaultdict(set, {'a': {1, 2}, 'b': {4}})
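Whether to use a list or a set for the values depends on what you need: a list keeps every value in insertion order, while a set discards duplicates and has no defined order. A sketch with hypothetical input containing a repeated value:

```python
from collections import defaultdict

pairs = [('a', 1), ('a', 2), ('a', 1), ('b', 4)]  # hypothetical input

as_list = defaultdict(list)  # keeps every value, in insertion order
as_set = defaultdict(set)    # keeps each distinct value once
for key, value in pairs:
    as_list[key].append(value)
    as_set[key].add(value)

print(as_list['a'], as_set['a'])  # [1, 2, 1] {1, 2}
```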

In [97]:
d = {} # A regular dictionary
d.setdefault('a', []).append(1)
d.setdefault('a', []).append(2)
d.setdefault('b', []).append(4)

In [98]:
d

{'a': [1, 2], 'b': [4]}

In [99]:
# Messy way of making a multivalued dictionary
pairs = [('a', 1), ('a', 2), ('b', 4)]  # sample (key, value) pairs
d = {}
for key, value in pairs:
    if key not in d:
        d[key] = []
    d[key].append(value)

In [100]:
# Better way of making a multivalued dictionary
d = defaultdict(list)
for key, value in pairs:
    d[key].append(value)
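One thing to keep in mind with `defaultdict`: it creates an entry for any missing key you read, not only for keys you append to. A short sketch:

```python
from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)

# Merely reading a missing key inserts it with a fresh default value
values = d['missing']
print(dict(d))  # {'a': [1], 'missing': []}

# Use .get() or an `in` test when you only want to look, not create
print(d.get('other'), 'other' in d)  # None False
```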

### Problem
You want to create a dictionary, and you also want to control the order of items when iterating or serializing.

In [101]:
# Using OrderedDict
from collections import OrderedDict
d = OrderedDict()
d['foo'] = 1
d['bar'] = 2
d['spam'] = 3
d['grok'] = 4

# Outputs "foo 1", "bar 2", "spam 3", "grok 4"
for key in d:
    print(key, d[key])

foo 1
bar 2
spam 3
grok 4


In [102]:
import json
json.dumps(d)

'{"foo": 1, "bar": 2, "spam": 3, "grok": 4}'

An OrderedDict internally maintains a doubly linked list that orders the keys according to insertion order. When a new item is first inserted, it is placed at the end of this list. Subsequent reassignment of an existing key doesn’t change the order.
Be aware that the size of an OrderedDict is more than twice as large as a normal dictionary due to the extra linked list that’s created. Thus, if you are going to build a data structure involving a large number of OrderedDict instances (e.g., reading 100,000 lines of a CSV file into a list of OrderedDict instances), you would need to study the requirements of your application to determine if the benefits of using an OrderedDict outweigh the extra memory overhead.
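Since Python 3.7 a plain `dict` also preserves insertion order, so `OrderedDict` is mainly useful for its extra behaviour. A sketch showing that reassignment keeps a key's position, that `move_to_end()` reorders explicitly, and that `OrderedDict` equality is order-sensitive while `dict` equality is not:

```python
from collections import OrderedDict

od = OrderedDict()
od['foo'] = 1
od['bar'] = 2
od['spam'] = 3

od['foo'] = 10         # reassigning an existing key keeps its position
od.move_to_end('bar')  # explicitly move a key to the end
print(list(od))        # ['foo', 'spam', 'bar']

# Plain dicts compare without regard to order; OrderedDicts do not
print({'x': 1, 'y': 2} == {'y': 2, 'x': 1})            # True
print(OrderedDict(x=1, y=2) == OrderedDict(y=2, x=1))  # False
```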

### Problem
You want to perform various calculations (e.g., minimum value, maximum value, sorting, etc.) on a dictionary of data.

In [103]:
prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}

In [104]:
# Invert the keys and values of the dictionary using zip()
min_price = min(zip(prices.values(), prices.keys()))

In [105]:
max_price = max(zip(prices.values(), prices.keys()))

In [106]:
prices_sorted = sorted(zip(prices.values(), prices.keys()))

In [107]:
prices_sorted

[(10.75, 'FB'),
 (37.2, 'HPQ'),
 (45.23, 'ACME'),
 (205.55, 'IBM'),
 (612.78, 'AAPL')]
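One caveat with the `zip()` trick: `zip()` returns an iterator that can be consumed only once, so reusing it for a second calculation fails. A sketch using the same `prices` data:

```python
prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75,
}

prices_and_names = zip(prices.values(), prices.keys())
lowest = min(prices_and_names)
print(lowest)  # (10.75, 'FB')

# min() has already consumed the iterator, so max() sees nothing and raises ValueError
try:
    max(prices_and_names)
    exhausted = False
except ValueError:
    exhausted = True

print(exhausted)  # True
```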

In [108]:
min(prices, key=lambda k: prices[k])

'FB'

In [109]:
max(prices, key=lambda k: prices[k])

'AAPL'

In [111]:
min_value = prices[min(prices, key=lambda k: prices[k])]

In [112]:
min_value

10.75
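If you want both the key and the value back from a single call, an alternative is to pass `prices.items()` with a `key` function that compares on the value. A sketch:

```python
prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75,
}

# Compare on the value (kv[1]) but get back the whole (key, value) pair
cheapest = min(prices.items(), key=lambda kv: kv[1])
priciest = max(prices.items(), key=lambda kv: kv[1])

print(cheapest, priciest)  # ('FB', 10.75) ('AAPL', 612.78)
```

Unlike the `zip()` version, ties are settled by whichever item is encountered first rather than by comparing keys.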

### Problem
You have two dictionaries and want to find out what they might have in common (same keys, same values, etc.).

In [113]:
a = {
    'x' : 1,
    'y' : 2,
    'z' : 3
}

In [114]:
b = {
    'w' : 10,
    'x' : 11,
    'y' : 2
}

In [115]:
# Find keys in common
a.keys() & b.keys()

{'x', 'y'}

In [116]:
# Find keys in a that are not in b
a.keys() - b.keys()

{'z'}

In [118]:
# Find (key,value) pairs in common
a.items() & b.items()

{('y', 2)}

In [124]:
# Make a new dictionary with selected keys removed
c = {key:a[key] for key in a.keys() - {'z', 'w'}}

In [125]:
c

{'x': 1, 'y': 2}
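The `keys()` and `items()` views support set operations directly, but `values()` does not, since values aren't guaranteed to be unique. Converting to sets first works. A sketch using the same `a` and `b`:

```python
a = {'x': 1, 'y': 2, 'z': 3}
b = {'w': 10, 'x': 11, 'y': 2}

# keys() and items() behave like sets; values() does not
try:
    a.values() & b.values()
    values_are_setlike = True
except TypeError:
    values_are_setlike = False

# Convert to sets first when you need to compare values
common_values = set(a.values()) & set(b.values())

print(values_are_setlike, common_values)  # False {2}
```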

### Problem
You want to eliminate the duplicate values in a sequence, but preserve the order of the remaining items.

In [126]:
# If the items in the sequence are hashable
def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

In [127]:
a = [1, 5, 2, 1, 9, 1, 5, 10]
list(dedupe(a))

[1, 5, 2, 9, 10]
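For hashable items where you just want the unique values in first-seen order, a shorter alternative on Python 3.7+ (a different technique from the generator above, and not lazy) is `dict.fromkeys()`, because dictionary keys are unique and keep insertion order:

```python
a = [1, 5, 2, 1, 9, 1, 5, 10]

# dict keys are unique and keep insertion order
unique_in_order = list(dict.fromkeys(a))
print(unique_in_order)  # [1, 5, 2, 9, 10]

# set() also removes duplicates, but makes no promise about order
print(sorted(set(a)))   # [1, 2, 5, 9, 10]
```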

In [128]:
# If the items are not hashable (e.g., dicts), pass a key function that converts each item into a hashable value
def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

In [130]:
a = [{'x':1, 'y':2}, {'x':1, 'y':3}, {'x':1, 'y':2}, {'x':2, 'y':4}]
list(dedupe(a, key=lambda d: (d['x'], d['y'])))

[{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]

In [131]:
list(dedupe(a, key=lambda d: d['x']))

[{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]

### Problem
Your program has become an unreadable mess of hardcoded slice indices and you want to clean it up.

In [134]:
record = '....................100.......513.25..........'
# Hardcoded slice indices: correct, but hard to read and maintain
cost = int(record[20:23]) * float(record[30:36])
cost

51325.0

In [136]:
# Using built-in slice
SHARES = slice(20, 23)
PRICE = slice(30, 36)

cost = int(record[SHARES]) * float(record[PRICE])
cost

51325.0

In [137]:
items = [0, 1, 2, 3, 4, 5, 6]
a = slice(2, 4)
items[2:4]

[2, 3]

In [138]:
items[a]

[2, 3]

In [140]:
items[a] = [10, 11]

In [141]:
items

[0, 1, 10, 11, 4, 5, 6]

In [142]:
del items[a]

In [143]:
items

[0, 1, 4, 5, 6]

In [144]:
a = slice(10, 50, 2)

In [145]:
a.start

10

In [146]:
a.stop

50

In [147]:
a.step

2

In [148]:
a = slice(5, 50, 2)
s = 'HelloWorld'
# indices() clips the slice to the sequence length, avoiding IndexError
a.indices(len(s))

(5, 10, 2)

In [151]:
for i in range(*a.indices(len(s))):
    print(s[i])

W
r
d
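`indices()` also normalizes `None` and negative values against the length, which is what makes its result safe to feed into `range()`. A few sketches against a length of 10:

```python
n = 10  # length of the sequence, e.g. len('HelloWorld')

print(slice(-3, None).indices(n))        # (7, 10, 1)
print(slice(None, None, -1).indices(n))  # (9, -1, -1)
print(slice(2, 100).indices(n))          # (2, 10, 1)

# The normalized triple drives range() directly, here walking backwards
s = 'HelloWorld'
reversed_chars = [s[i] for i in range(*slice(None, None, -1).indices(len(s)))]
print(''.join(reversed_chars))  # dlroWolleH
```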

### Problem
You have a sequence of items, and you'd like to determine the most frequently occurring items in the sequence.

In [152]:
# Using collections.Counter
words = [
    'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
    'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
    'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into',
    'my', 'eyes', "you're", 'under'
]

In [153]:
from collections import Counter
word_counts = Counter(words)
top_three = word_counts.most_common(3)
print(top_three)

[('eyes', 8), ('the', 5), ('look', 4)]


Under the covers, a Counter is a dictionary that maps the items to the number of occurrences.

In [154]:
word_counts['not']

1

In [155]:
word_counts['eyes']

8
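Because `Counter` handles missing keys itself, looking up a word that never appeared returns 0 rather than raising `KeyError`, and the lookup doesn't add the word. A sketch with a shortened word list:

```python
from collections import Counter

word_counts = Counter(['look', 'into', 'my', 'eyes', 'eyes'])

# A missing key reads as 0 instead of raising KeyError...
print(word_counts['python'])    # 0

# ...and, unlike defaultdict, the lookup does not insert the key
print('python' in word_counts)  # False
```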

In [156]:
morewords = ['why', 'are', 'you', 'not', 'looking', 'in', 'my', 'eyes']

In [157]:
for word in morewords:
    word_counts[word] += 1

In [158]:
word_counts['eyes']

9

In [159]:
word_counts.update(morewords)
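`update()` adds counts the same way the loop above does. `Counter` objects also support arithmetic: `+` adds counts key by key, and `-` subtracts them, keeping only positive results. A sketch with small hypothetical word lists:

```python
from collections import Counter

a = Counter(['eyes', 'eyes', 'the', 'look'])
b = Counter(['eyes', 'not', 'look'])

combined = a + b    # add counts key by key
difference = a - b  # subtract, dropping counts that end up zero or negative

print(combined)
print(difference)
```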