
In [None]:
Problem
You have an N-element tuple or sequence that you would like to unpack into a collection
of N variables.

In [None]:
p = (4,5,6)

In [None]:
x,y,z = p

In [None]:
x,y,z

(4, 5, 6)
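
Unpacking only works when the number of variables matches the number of elements. As a minimal sketch (the variable name `error_message` is only for illustration), a mismatch raises a `ValueError`:

```python
p = (4, 5, 6)

try:
    # Two targets on the left, three values on the right
    x, y = p
except ValueError as exc:
    error_message = str(exc)

# The exact wording of the message can vary between Python versions
print(error_message)
```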

In [None]:
data = [ 'ACME', 50, 91.1, (2012, 12, 21) ]
name, shares, price, date = data
print(name)
print(price)

ACME
91.1


In [None]:
name, shares, price, (year, mon, day) = data
print(name)
print(year)
print(mon)
print(day)


ACME
2012
12
21


In [None]:
#Unpacking actually works with any object that happens to be iterable, not just tuples or
#lists. This includes strings, files, iterators, and generators. For example:
s = "Hello"
a, b, c, d, e = s
a
b
e

'o'

In [None]:
data = [ 'ACME', 50, 91.1, (2012, 12, 21) ]



Unpacking Elements from Iterables of Arbitrary Length


In [None]:
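def drop_first_last(grades):
    # Drop the first and last grades and keep everything in between
    first, *middle, last = grades
    # Average the remaining grades (assumes at least three grades are given)
    return sum(middle) / len(middle)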

Suppose you have user records that consist of a name and email
address, followed by an arbitrary number of phone numbers. You could unpack the
records like this:

In [None]:
record = ('Dave', 'dave@example.com', '773-555-1212', '847-555-1212')
name, email, *phone_numbers = record
print(name)
print(email)
print(phone_numbers)


Dave
dave@example.com
['773-555-1212', '847-555-1212']


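The starred variable can also come first. Suppose sales_record holds sales figures for
the last eight quarters. If you want to see how the most recent quarter stacks up to the
average of the first seven, you could do something like this: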

In [None]:
def compare_to_trailing(sales_record):
    # Unpack the sales record into trailing quarters and the current quarter
    *trailing_qtrs, current_qtr = sales_record

    # Calculate the average of the trailing quarters
    trailing_avg = sum(trailing_qtrs) / len(trailing_qtrs)

    # avg_comparison() is not defined in this notebook; it stands in for whatever
    # comparison you want to run on the trailing average and the current quarter
    return avg_comparison(trailing_avg, current_qtr)

*trailing, current = [10, 8, 7, 1, 9, 5, 10, 3]
trailing


Sometimes you might want to unpack values and throw them away. You can’t just specify
a bare * when unpacking, but you could use a common throwaway variable name, such
as _ or ign (ignored). For example:

In [None]:
record = ('ACME', 50, 123.45, (12, 18, 2012))
name, *_, (*_, year) = record
print(name)
print(year)


ACME
2012


There is a certain similarity between star unpacking and list-processing features of various
functional languages. For example, if you have a list, you can easily split it into head
and tail components like this:

In [None]:
items = [1, 10, 7, 4, 5, 9]
*head, tail = items
print(head)
print(tail)


[1, 10, 7, 4, 5]
9


One could imagine writing functions that perform such splitting in order to carry out
some kind of clever recursive algorithm. For example:

In [None]:
# Define a recursive function 'recursive_sum' that calculates the sum of a list of items
# (named 'recursive_sum' rather than 'sum' so it does not shadow the built-in sum())
def recursive_sum(items):

    # Recursive case: unpack the list into the head and tail
    head, *tail = items

    # Return the head plus the result of the recursive call on the tail,
    # or just the head once the tail is empty
    return head + recursive_sum(tail) if tail else head

# Example usage: calculate the sum of items using the 'recursive_sum' function
sum_result = recursive_sum(items)
print(sum_result)


36


You want to keep a limited history of the last few items seen during iteration or during
some other kind of processing.

In [None]:
# Import the 'deque' class from the 'collections' module
from collections import deque

# Define a search function that yields lines containing a given pattern along with a specified number of previous lines
def search(lines, pattern, history=5):
    # Create a deque (double-ended queue) to store the previous lines, with a maximum length of 'history'
    previous_lines = deque(maxlen=history)

    # Iterate through each line in the input 'lines'
    for line in lines:
        # Check if the pattern is present in the current line
        if pattern in line:
            # Yield the current line along with the deque of previous lines
            yield line, previous_lines

        # Append the current line to the deque of previous lines
        previous_lines.append(line)

# Example use on a file
if __name__ == '__main__':
    # Open a file named 'somefile.txt'
    with open('somefile.txt') as f:
        # Iterate through the lines and previous lines returned by the search function
        for line, prevlines in search(f, 'python', 5):
            # Print the previous lines before the line containing the pattern
            for pline in prevlines:
                print(pline, end='')

            # Print the line containing the pattern
            print(line, end='')

            # Print a separator line
            print('-'*20)
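
The search function above relies on `deque(maxlen=N)` discarding the oldest item once the queue is full. A minimal sketch of that behavior on its own (the name `recent` is just for illustration):

```python
from collections import deque

# A fixed-size queue that keeps only the three most recent items
recent = deque(maxlen=3)

for n in range(1, 6):
    # Once three items are stored, each append drops the oldest one
    recent.append(n)

print(recent)  # deque([3, 4, 5], maxlen=3)
```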


Problem:
You want to make a list of the largest or smallest N items in a collection.

Solution:
The heapq module has two functions—nlargest() and nsmallest()—that do exactly
what you want. For example:

In [None]:
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
print(heapq.nlargest(5, nums)) # Prints [42, 37, 23, 23, 18]
print(heapq.nsmallest(5, nums)) # Prints [-4, 1, 2, 2, 7]

[42, 37, 23, 23, 18]
[-4, 1, 2, 2, 7]


Both functions also accept a key parameter that allows them to be used with more
complicated data structures. For example:

In [None]:
import heapq

# Define a portfolio list containing dictionaries representing stock information
portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'HPQ', 'shares': 35, 'price': 31.75},
    {'name': 'YHOO', 'shares': 45, 'price': 16.35},
    {'name': 'ACME', 'shares': 75, 'price': 115.65}
]

# Find the three cheapest stocks using heapq's nsmallest function
cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])

# Find the three most expensive stocks using heapq's nlargest function
expensive = heapq.nlargest(3, portfolio, key=lambda s: s['price'])

# Print the results
print("Cheapest stocks:")
print(cheap)

print("\nMost expensive stocks:")
print(expensive)


Cheapest stocks:
[{'name': 'YHOO', 'shares': 45, 'price': 16.35}, {'name': 'FB', 'shares': 200, 'price': 21.09}, {'name': 'HPQ', 'shares': 35, 'price': 31.75}]

Most expensive stocks:
[{'name': 'AAPL', 'shares': 50, 'price': 543.22}, {'name': 'ACME', 'shares': 75, 'price': 115.65}, {'name': 'IBM', 'shares': 100, 'price': 91.1}]
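
As a rough cross-check of what nlargest() and nsmallest() return, the sketch below compares them with a full sort followed by a slice. The Python documentation notes that min() and max() are the better choice when only one item is needed, and that sorting first tends to be more efficient when N is large; this example does not measure performance.

```python
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]

smallest_three = heapq.nsmallest(3, nums)
largest_three = heapq.nlargest(3, nums)

# The same answers can be obtained by sorting everything and slicing
assert smallest_three == sorted(nums)[:3]
assert largest_three == sorted(nums, reverse=True)[:3]

# For a single smallest or largest item, min() and max() are simpler
lowest, highest = min(nums), max(nums)

print(smallest_three)   # [-4, 1, 2]
print(largest_three)    # [42, 37, 23]
print(lowest, highest)  # -4 42
```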


**Discussion:**

If you are looking for the N smallest or largest items and N is small compared to the
overall size of the collection, these functions provide superior performance. Underneath
the covers, they work by first converting the data into a list where items are ordered as
a heap. For example:

In [None]:
# Import the heapq module, which provides heap-related functions
import heapq

# Create a list 'nums' containing integers
nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]

# Create a copy of the 'nums' list called 'heap'
# Note: This is done to demonstrate the heapq.heapify() function, as it modifies the list in-place
heap = list(nums)

# Use the heapq.heapify() function to convert the 'heap' list into a min-heap
heapq.heapify(heap)

# Print the resulting min-heap
heap




[-4, 2, 1, 23, 7, 2, 18, 23, 42, 37, 8]

The most important feature of a heap is that heap[0] is always the smallest item. Moreover,
subsequent items can be easily found using the heapq.heappop() function, which
pops off the first item and replaces it with the next smallest item (an operation that
takes O(log N) time, where N is the size of the heap). For example, to find the
three smallest items, you would do this:

In [None]:
print(heapq.heappop(heap))  # first smallest: -4
print(heapq.heappop(heap))  # second smallest: 1
print(heapq.heappop(heap))  # third smallest: 2

**Implementing a Priority Queue:**

You want to implement a queue that sorts items by a given priority and always returns
the item with the highest priority on each pop operation.

In [3]:
# Import the heapq module, which provides heap-related functions
import heapq

# Define a PriorityQueue class
class PriorityQueue:
    def __init__(self):
        # Initialize an empty list to store the elements in the priority queue
        self._queue = []
        # Initialize an index to track the order of elements pushed into the queue
        self._index = 0

    # Method to push an item into the priority queue with a specified priority
    def push(self, item, priority):
        # Use heapq.heappush() to insert a tuple into the heap
        # The tuple contains the negation of priority (to create a max-heap),
        # the index to maintain order, and the item itself
        heapq.heappush(self._queue, (-priority, self._index, item))
        # Increment the index for the next element
        self._index += 1

    # Method to pop and return the item with the highest priority
    def pop(self):
        # Use heapq.heappop() to remove and return the smallest element from the heap
        # Retrieve the item from the tuple and return it
        return heapq.heappop(self._queue)[-1]


Here is an example of how it might be used:

In [2]:
class Item:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return 'Item({!r})'.format(self.name)

In [7]:
q = PriorityQueue()
print(q)
q.push(Item('foo'), 1)


<__main__.PriorityQueue object at 0x7f78d18eb8e0>
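
To see the ordering in action, here is a self-contained sketch that repeats the PriorityQueue class from above and pushes a few items. Plain strings are used in place of Item instances to keep it short, and the priorities are made up for illustration.

```python
import heapq

class PriorityQueue:
    def __init__(self):
        self._queue = []
        self._index = 0

    def push(self, item, priority):
        # Negate the priority so the highest priority pops first;
        # the index breaks ties in insertion order
        heapq.heappush(self._queue, (-priority, self._index, item))
        self._index += 1

    def pop(self):
        return heapq.heappop(self._queue)[-1]

q = PriorityQueue()
q.push('foo', 1)
q.push('bar', 5)
q.push('spam', 4)
q.push('grok', 1)

# Highest priority first; 'foo' and 'grok' share a priority,
# so they come out in the order they were pushed
popped = [q.pop() for _ in range(4)]
print(popped)  # ['bar', 'spam', 'foo', 'grok']
```

Including the index in each tuple also means two items with equal priority are never compared directly, which matters for objects that do not support ordering.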


**Mapping Keys to Multiple Values in a Dictionary:**

A dictionary is a mapping where each key is mapped to a single value. If you want to
map keys to multiple values, you need to store the multiple values in another container
such as a list or set. For example, you might make dictionaries like this:

In [8]:
d = {
'a' : [1, 2, 3],
'b' : [4, 5]
}
e = {
'a' : {1, 2, 3},
'b' : {4, 5}
}
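
Building such a dictionary by hand means making sure the container exists before adding to it. A minimal sketch of two ways to do that with a plain dict (the `pairs` data and the names `manual` and `with_setdefault` are hypothetical):

```python
pairs = [('a', 1), ('a', 2), ('b', 4)]  # hypothetical sample data

# Option 1: check for the key and create the list on first use
manual = {}
for key, value in pairs:
    if key not in manual:
        manual[key] = []
    manual[key].append(value)

# Option 2: dict.setdefault() returns the existing list or inserts a new one
with_setdefault = {}
for key, value in pairs:
    with_setdefault.setdefault(key, []).append(value)

print(manual)           # {'a': [1, 2], 'b': [4]}
print(with_setdefault)  # {'a': [1, 2], 'b': [4]}
```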

In [11]:
from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)
d['a'].append(2)
d['b'].append(4)
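
# The same pattern with sets: duplicates are dropped and insertion order is not preserved
d = defaultdict(set)
d['a'].add(1)
d['a'].add(2)
d['b'].add(4)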

**Keeping Dictionaries in Order:**

You want to create a dictionary, and you also want to control the order of items when
iterating or serializing.

To control the order of items in a dictionary, you can use an OrderedDict from the
collections module. It exactly preserves the original insertion order of data when
iterating. (Since Python 3.7, the built-in dict also preserves insertion order.) For example:

In [12]:
from collections import OrderedDict

d = OrderedDict()
d['foo'] = 1
d['bar'] = 2
d['spam'] = 3
d['grok'] = 4

In [13]:
for key in d:
    print(key, d[key])

foo 1
bar 2
spam 3
grok 4


For example, if you want to precisely control the order of fields appearing in a JSON
encoding, first building the data in an OrderedDict will do the trick:

In [14]:
import json

json.dumps(d)

'{"foo": 1, "bar": 2, "spam": 3, "grok": 4}'

**Calculating with Dictionaries:**
**Problem:**

You want to perform various calculations (e.g., minimum value, maximum value, sorting,
etc.) on a dictionary of data.

In [15]:
prices = {
'ACME': 45.23,
'AAPL': 612.78,
'IBM': 205.55,
'HPQ': 37.20,
'FB': 10.75
}

In order to perform useful calculations on the dictionary contents, it is often useful to
invert the keys and values of the dictionary using zip(). For example, here is how to
find the minimum and maximum price and stock name:

In [18]:
min_price = min(zip(prices.values(), prices.keys()))
print(min_price)
# min_price is (10.75, 'FB')

max_price = max(zip(prices.values(), prices.keys()))
print(max_price)
# max_price is (612.78, 'AAPL')

(10.75, 'FB')
(612.78, 'AAPL')
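
One detail worth keeping in mind with this approach is that zip() returns an iterator that can only be consumed once, which is why the cell above calls zip() separately for min() and max(). A small sketch of what happens when the same iterator is reused (the variable names are illustrative):

```python
prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}

prices_and_names = zip(prices.values(), prices.keys())

# The first reduction consumes every pair from the iterator
lowest = min(prices_and_names)

try:
    # The iterator is now empty, so max() has nothing to work with
    highest = max(prices_and_names)
except ValueError:
    highest = None

print(lowest)   # (10.75, 'FB')
print(highest)  # None
```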


Similarly, to rank the data, use zip() with sorted(), as in the following:

In [19]:


# Use the zip() function to create pairs of (price, item) from the 'prices' dictionary
# Sort the pairs based on the price in ascending order
prices_sorted = sorted(zip(prices.values(), prices.keys()))

# Print the sorted list of (price, item) pairs
print(prices_sorted)


[(10.75, 'FB'), (37.2, 'HPQ'), (45.23, 'ACME'), (205.55, 'IBM'), (612.78, 'AAPL')]


**Discussion:**

If you try to perform common data reductions on a dictionary, you’ll find that they only
process the keys, not the values. For example:

In [21]:
min(prices) # Returns 'AAPL'
max(prices) # Returns 'IBM'

'IBM'

This is probably not what you want because you’re actually trying to perform a calculation
involving the dictionary values. You might try to fix this using the values()
method of a dictionary:

In [20]:
min(prices.values()) # Returns 10.75
max(prices.values()) # Returns 612.78

612.78

Unfortunately, this is often not exactly what you want either. For example, you may want
to know information about the corresponding keys (e.g., which stock has the lowest
price?).
You can get the key corresponding to the min or max value if you supply a key function
to min() and max(). For example:

In [22]:
min(prices, key=lambda k: prices[k]) # Returns 'FB'
max(prices, key=lambda k: prices[k]) # Returns 'AAPL'

'AAPL'

It should be noted that in calculations involving (value, key) pairs, the key will be
used to determine the result in instances where multiple entries happen to have the same
value. For instance, in calculations such as min() and max(), the entry with the smallest
or largest key will be returned if there happen to be duplicate values. For example:

In [23]:
prices = { 'AAA' : 45.23, 'ZZZ': 45.23 }
print(min(zip(prices.values(), prices.keys())))
print(max(zip(prices.values(), prices.keys())))

(45.23, 'AAA')
(45.23, 'ZZZ')


# 1.9. Finding Commonalities in Two Dictionaries

**Problem:**
You have two dictionaries and want to find out what they might have in common (same
keys, same values, etc.).

In [24]:
a = {
'x' : 1,
'y' : 2,
'z' : 3
}

b = {
'w' : 10,
'x' : 11,
'y' : 2
}

To find out what the two dictionaries have in common, simply perform common set
operations using the keys() or items() methods. For example:

In [31]:


# Use the '&' operator to find the intersection of the keys in dictionaries 'a' and 'b'
# This operation returns a set containing keys that are common to both dictionaries
common_keys = a.keys() & b.keys()

# Print the set of keys that are common to both dictionaries
print(common_keys)


{'y', 'x'}


In [30]:


# Use the '-' operator to find the set of keys in dictionary 'a' that are not present in dictionary 'b'
# This operation returns a set containing keys unique to dictionary 'a'
keys_only_in_a = a.keys() - b.keys()

# Print the set of keys that are in dictionary 'a' but not in dictionary 'b'
print(keys_only_in_a)


{'z'}


In [29]:


# Use the '&' operator to find the intersection of (key, value) pairs in dictionaries 'a' and 'b'
# This operation returns a set containing (key, value) pairs that are common to both dictionaries
common_items = a.items() & b.items()

# Print the set of (key, value) pairs that are common to both dictionaries
print(common_items)


{('y', 2)}


These kinds of operations can also be used to alter or filter dictionary contents. For
example, suppose you want to make a new dictionary with selected keys removed. Here
is some sample code using a dictionary comprehension:

In [32]:


# Use a dictionary comprehension to create a new dictionary 'c' by iterating through keys in 'a'
# Include only the key-value pairs where the key is not in the set {'z', 'w'}
c = {key: a[key] for key in a.keys() - {'z', 'w'}}

# Print the resulting dictionary 'c'
print(c)


{'y': 2, 'x': 1}
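
The keys() and items() views support these set operations directly, but the values() view does not, since values are not guaranteed to be unique. A minimal sketch of one workaround, converting the values to sets first (the name `common_values` is illustrative):

```python
a = {'x': 1, 'y': 2, 'z': 3}
b = {'w': 10, 'x': 11, 'y': 2}

# values() views do not support '&', so build sets from them first
common_values = set(a.values()) & set(b.values())

print(common_values)  # {2}
```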


# 1.10. Removing Duplicates from a Sequence while Maintaining Order

**Problem:**
You want to eliminate the duplicate values in a sequence, but preserve the order of the
remaining items.



In [33]:
# Define a function 'dedupe' that removes duplicates from a sequence while preserving the order
def dedupe(items):
    # Create an empty set 'seen' to keep track of unique items
    seen = set()

    # Iterate through each item in the input sequence
    for item in items:
        # Check if the item is not already in the set 'seen'
        if item not in seen:
            # If not, yield the item, indicating a unique occurrence
            yield item

            # Add the item to the set 'seen' to mark it as seen
            seen.add(item)


In [34]:
a = [1, 5, 2, 1, 9, 1, 5, 10]
list(dedupe(a))

[1, 5, 2, 9, 10]
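
For hashable items, a comparable result can also be sketched with dict.fromkeys(), assuming Python 3.7 or later, where dictionaries keep insertion order. Unlike the generator above, this builds the whole result at once rather than yielding items lazily.

```python
a = [1, 5, 2, 1, 9, 1, 5, 10]

# Dictionary keys are unique and (in Python 3.7+) keep insertion order,
# so the keys come back in first-seen order
unique_in_order = list(dict.fromkeys(a))

print(unique_in_order)  # [1, 5, 2, 9, 10]
```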

This only works if the items in the sequence are hashable. If you are trying to eliminate
duplicates in a sequence of unhashable types (such as dicts), you can make a slight
change to this recipe, as follows:

In [35]:
# Define a function 'dedupe' that removes duplicates from a sequence while preserving the order
# The 'key' parameter allows customization of the comparison based on a specific attribute or function
def dedupe(items, key=None):
    # Create an empty set 'seen' to keep track of unique items based on their 'key' values
    seen = set()

    # Iterate through each item in the input sequence
    for item in items:
        # Determine the value to be used for comparison based on the 'key' parameter
        val = item if key is None else key(item)

        # Check if the computed value is not already in the set 'seen'
        if val not in seen:
            # If not, yield the original item, indicating a unique occurrence
            yield item

            # Add the computed value to the set 'seen' to mark it as seen
            seen.add(val)


Here, the purpose of the key argument is to specify a function that converts sequence
items into a hashable type for the purposes of duplicate detection. Here’s how it works:

In [37]:


# Example usage with a list of dictionaries
a = [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]

# Dedupe based on both 'x' and 'y' values
print(list(dedupe(a, key=lambda d: (d['x'], d['y']))))

# Dedupe based on only the 'x' values
print(list(dedupe(a, key=lambda d: d['x'])))


[{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
[{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]


**Discussion:**

If all you want to do is eliminate duplicates, it is often easy enough to make a set. For
example:

In [None]:
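# 'a' currently holds dictionaries, which are unhashable, so rebuild the list of integers first
a = [1, 5, 2, 1, 9, 1, 5, 10]

# Note: a set removes duplicates but does not preserve the original order
set(a)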

# 1.12. Determining the Most Frequently Occurring Items in a Sequence

**Problem:**
You have a sequence of items, and you’d like to determine the most frequently occurring
items in the sequence.

**Solution:**
The collections.Counter class is designed for just such a problem. It even comes with
a handy most_common() method that will give you the answer.
To illustrate, let’s say you have a list of words and you want to find out which words
occur most often. Here’s how you would do it:



In [42]:
words = [
'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into',
'my', 'eyes', "you're", 'under'
]

In [43]:
from collections import Counter

word_counts = Counter(words)
top_three = word_counts.most_common(3)
print(top_three)

[('eyes', 8), ('the', 5), ('look', 4)]


**Discussion:**
As input, Counter objects can be fed any sequence of hashable input items. Under the
covers, a Counter is a dictionary that maps the items to the number of occurrences. For
example:

In [46]:
print(word_counts['not'])
print(word_counts['eyes'])

1
8


If you want to increment the count manually, simply use addition:

In [49]:
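morewords = ['why','are','you','not','looking','in','my','eyes']
for word in morewords:
    word_counts[word] += 1
print(word_counts['eyes'])  # 9 on a fresh run: 8 from words plus 1 from morewords

# Alternatively, use the update() method instead of the loop.
# Note: calling it after the loop above counts morewords a second time.
word_counts.update(morewords)

9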


A little-known feature of Counter instances is that they can be easily combined using
various mathematical operations.

In [53]:
a = Counter(words)
b = Counter(morewords)
print(a)
print(b)

# Combine counts
c = a + b
print(c)

# Subtract counts
d = a - b
print(d)

Counter({'eyes': 8, 'the': 5, 'look': 4, 'into': 3, 'my': 3, 'around': 2, 'not': 1, "don't": 1, "you're": 1, 'under': 1})
Counter({'why': 1, 'are': 1, 'you': 1, 'not': 1, 'looking': 1, 'in': 1, 'my': 1, 'eyes': 1})
Counter({'eyes': 9, 'the': 5, 'look': 4, 'my': 4, 'into': 3, 'not': 2, 'around': 2, "don't": 1, "you're": 1, 'under': 1, 'why': 1, 'are': 1, 'you': 1, 'looking': 1, 'in': 1})
Counter({'eyes': 7, 'the': 5, 'look': 4, 'into': 3, 'my': 2, 'around': 2, "don't": 1, "you're": 1, 'under': 1})


# 1.13. Sorting a List of Dictionaries by a Common Key

**Problem:**
You have a list of dictionaries and you would like to sort the entries according to one
or more of the dictionary values.

**Solution:**
Sorting this type of structure is easy using the operator module’s itemgetter function.
Let’s say you’ve queried a database table to get a listing of the members on your website,
and you receive the following data structure in return:

In [54]:
rows = [
{'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
{'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
{'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
]

It’s fairly easy to output these rows ordered by any of the fields common to all of the
dictionaries. For example:

In [57]:
# Import the 'itemgetter' function from the 'operator' module
from operator import itemgetter

# Assuming 'rows' is a list of dictionaries

# Sort the list of dictionaries 'rows' based on the 'fname' key using the 'itemgetter' function
rows_by_fname = sorted(rows, key=itemgetter('fname'))

# Sort the list of dictionaries 'rows' based on the 'uid' key using the 'itemgetter' function
rows_by_uid = sorted(rows, key=itemgetter('uid'))

# Print the sorted list of dictionaries based on 'fname'
print("Sorted by 'fname':")
print(rows_by_fname)

# Print the sorted list of dictionaries based on 'uid'
print("\nSorted by 'uid':")
print(rows_by_uid)

# The itemgetter() function can also accept multiple keys.
# For example, this code sorts by 'lname' first and then by 'fname':
rows_by_lfname = sorted(rows, key=itemgetter('lname','fname'))
print("\nSorted by 'lname','fname':")
print(rows_by_lfname)


Sorted by 'fname':
[{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}]

Sorted by 'uid':
[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]

Sorted by 'lname','fname':
[{'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}]


The functionality of itemgetter() is sometimes replaced by lambda expressions. For
example:

In [59]:
rows_by_fname = sorted(rows, key=lambda r: r['fname'])
rows_by_lfname = sorted(rows, key=lambda r: (r['lname'],r['fname']))
print("Sorted by 'fname':")
print(rows_by_fname)
print("\nSorted by 'lname','fname':")
print(rows_by_lfname)

Sorted by 'fname':
[{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}]

Sorted by 'lname','fname':
[{'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}]


Last, but not least, don’t forget that the technique shown in this recipe can be applied
to functions such as min() and max(). For example:

In [61]:
print(min(rows,key = itemgetter("uid")))
print(max(rows,key = itemgetter("uid")))

{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}
{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}


# 1.14. Sorting Objects Without Native Comparison Support

**Problem:**
You want to sort objects of the same class, but they don’t natively support comparison
operations.

**Solution:**
The built-in sorted() function takes a key argument that can be passed a callable that
will return some value in the object that sorted will use to compare the objects. For
example, if you have a sequence of User instances in your application, and you want to
sort them by their user_id attribute, you would supply a callable that takes a User
instance as input and returns the user_id. For example:

In [64]:
# Define a 'User' class with an '__init__' method to initialize user objects
# The '__repr__' method is defined to provide a string representation of the object for debugging and display
class User:
    def __init__(self, user_id):
        # Initialize a 'user_id' attribute for each user object
        self.user_id = user_id

    def __repr__(self):
        # Return a string representation of the user object, using the 'user_id'
        return 'User({})'.format(self.user_id)

# Create a list of 'User' objects with different user IDs
users = [User(23), User(3), User(99)]

sorted(users, key=lambda u: u.user_id)


[User(3), User(23), User(99)]

Instead of using lambda, an alternative approach is to use operator.attrgetter():

In [65]:
from operator import attrgetter

sorted(users, key=attrgetter('user_id'))


[User(3), User(23), User(99)]

**Discussion:**

The choice of whether or not to use lambda or attrgetter() may be one of personal
preference. However, attrgetter() is often a tad bit faster and also has the added
feature of allowing multiple fields to be extracted simultaneously. This is analogous to
the use of operator.itemgetter() for dictionaries (see Recipe 1.13). For example, if
User instances also had a first_name and last_name attribute, you could perform a
sort like this:

In [None]:
by_name = sorted(users, key=attrgetter('last_name', 'first_name'))
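
The User class defined earlier has no first_name or last_name attributes, so the line above is illustrative only. A runnable sketch with a hypothetical Person class (the class and sample names are assumptions made for this example):

```python
from operator import attrgetter

class Person:
    # Hypothetical class with the two attributes the sort needs
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def __repr__(self):
        return 'Person({!r}, {!r})'.format(self.first_name, self.last_name)

people = [
    Person('Brian', 'Jones'),
    Person('David', 'Beazley'),
    Person('Big', 'Jones'),
]

# attrgetter() with two names returns a (last_name, first_name) tuple,
# so ties on last_name are broken by first_name
people_by_name = sorted(people, key=attrgetter('last_name', 'first_name'))

print(people_by_name)
# [Person('David', 'Beazley'), Person('Big', 'Jones'), Person('Brian', 'Jones')]
```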

It is also worth noting that the technique used in this recipe can be applied to functions
such as min() and max(). For example:

In [69]:
print(min(users, key=attrgetter('user_id')))
print(max(users,key=attrgetter("user_id")))


User(3)
User(99)


# 1.15. Grouping Records Together Based on a Field

**Problem:**
You have a sequence of dictionaries or instances and you want to iterate over the data
in groups based on the value of a particular field, such as date.

**Solution:**
The itertools.groupby() function is particularly useful for grouping data together
like this. To illustrate, suppose you have the following list of dictionaries:

In [70]:
rows = [
{'address': '5412 N CLARK', 'date': '07/01/2012'},
{'address': '5148 N CLARK', 'date': '07/04/2012'},
{'address': '5800 E 58TH', 'date': '07/02/2012'},
{'address': '2122 N CLARK', 'date': '07/03/2012'},
{'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
{'address': '1060 W ADDISON', 'date': '07/02/2012'},
{'address': '4801 N BROADWAY', 'date': '07/01/2012'},
{'address': '1039 W GRANVILLE', 'date': '07/04/2012'},
]

Now suppose you want to iterate over the data in chunks grouped by date. To do it, first
sort by the desired field (in this case, date) and then use itertools.groupby():

In [71]:
# Import the 'itemgetter' function from the 'operator' module
# Import the 'groupby' function from the 'itertools' module
from operator import itemgetter
from itertools import groupby

# Assuming 'rows' is a list of dictionaries with a 'date' key

# Sort the list of dictionaries 'rows' based on the 'date' key using the 'itemgetter' function
rows.sort(key=itemgetter('date'))

# Iterate through the grouped items based on the 'date' key
for date, items in groupby(rows, key=itemgetter('date')):
    # Print the date, which is the common key for the group
    print(date)

    # Iterate through the items within the current group
    for i in items:
        # Print each item in the group
        print(i)


07/01/2012
{'address': '5412 N CLARK', 'date': '07/01/2012'}
{'address': '4801 N BROADWAY', 'date': '07/01/2012'}
07/02/2012
{'address': '5800 E 58TH', 'date': '07/02/2012'}
{'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}
{'address': '1060 W ADDISON', 'date': '07/02/2012'}
07/03/2012
{'address': '2122 N CLARK', 'date': '07/03/2012'}
07/04/2012
{'address': '5148 N CLARK', 'date': '07/04/2012'}
{'address': '1039 W GRANVILLE', 'date': '07/04/2012'}
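
The sort step matters because groupby() only groups consecutive items with the same key. A small sketch with hypothetical, unsorted date strings shows the difference:

```python
from itertools import groupby

dates = ['07/01/2012', '07/04/2012', '07/01/2012']  # hypothetical, unsorted

# Without sorting, the two '07/01/2012' entries are not adjacent,
# so they end up in separate groups
unsorted_groups = [(key, len(list(group))) for key, group in groupby(dates)]

# After sorting, identical dates sit next to each other and form one group
sorted_groups = [(key, len(list(group))) for key, group in groupby(sorted(dates))]

print(unsorted_groups)  # [('07/01/2012', 1), ('07/04/2012', 1), ('07/01/2012', 1)]
print(sorted_groups)    # [('07/01/2012', 2), ('07/04/2012', 1)]
```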


If your goal is to simply group the data together by dates into a large data structure that
allows random access, you may have better luck using defaultdict() to build a
multidict, as described in Recipe 1.6. For example:

In [73]:
# Import the 'defaultdict' class from the 'collections' module
from collections import defaultdict

# Assuming 'rows' is a list of dictionaries with a 'date' key

# Create a defaultdict with lists as default values to organize rows by date
rows_by_date = defaultdict(list)

# Iterate through each row in the list of dictionaries
for row in rows:
    # Append the current row to the list associated with its 'date' key in 'rows_by_date'
    rows_by_date[row['date']].append(row)

# This allows the records for each date to be accessed easily
# For example, printing records for the date '07/01/2012'
for r in rows_by_date['07/01/2012']:
    print(r)


{'address': '5412 N CLARK', 'date': '07/01/2012'}
{'address': '4801 N BROADWAY', 'date': '07/01/2012'}


# 1.16. Filtering Sequence Elements

**Problem:**
You have data inside of a sequence, and need to extract values or reduce the sequence
using some criteria.

**Solution:**
The easiest way to filter sequence data is often to use a list comprehension. For example:

In [76]:
mylist = [1, 4, -5, 10, -7, 2, 3, -1]
print([n for n in mylist if n > 0])
print([n for n in mylist if n < 0])


[1, 4, 10, 2, 3]
[-5, -7, -1]


One potential downside of using a list comprehension is that it might produce a large
result if the original input is large. If this is a concern, you can use generator expressions
to produce the filtered values iteratively. For example:

In [79]:
pos = (n for n in mylist if n > 0)
print(pos)

for x in pos:
    print(x)

<generator object <genexpr> at 0x7f789f5b6420>
1
4
10
2
3


Sometimes, the filtering criteria cannot be easily expressed in a list comprehension or
generator expression. For example, suppose that the filtering process involves exception
handling or some other complicated detail. For this, put the filtering code into its own
function and use the built-in filter() function. For example:

In [81]:
# Define a list of values, including strings representing integers, non-numeric characters, and 'N/A'
values = ['1', '2', '-3', '-', '4', 'N/A', '5']

# Define a function 'is_int' to check if a given value can be converted to an integer
def is_int(val):
    try:
        # Attempt to convert the value to an integer
        x = int(val)
        # If successful, return True
        return True
    except ValueError:
        # If a ValueError occurs, return False
        return False

# Use the 'filter' function to create a list of values that can be converted to integers
ivals = list(filter(is_int, values))
#filter() creates an iterator, so if you want to create a list of results, make sure you also
#use list() as shown.

# Print the resulting list of integer values
print(ivals)


['1', '2', '-3', '4', '5']


**Discussion:**

List comprehensions and generator expressions are often the easiest and most straightforward
ways to filter simple data. They also have the added power to transform the
data at the same time. For example:

In [82]:
import math

mylist = [1, 4, -5, 10, -7, 2, 3, -1]

# Use a list comprehension to create a new list containing the square root of positive numbers in 'mylist'
# The condition 'if n > 0' filters out non-positive numbers
sqrt_list = [math.sqrt(n) for n in mylist if n > 0]

# Print the resulting list of square roots
print(sqrt_list)




[1.0, 2.0, 3.1622776601683795, 1.4142135623730951, 1.7320508075688772]

One variation on filtering involves replacing the values that don’t meet the criteria with
a new value instead of discarding them. For example, perhaps instead of just finding
positive values, you want to also clip bad values to fit within a specified range. This is
often easily accomplished by moving the filter criterion into a conditional expression
like this:

In [84]:

# Use a list comprehension to create a new list ('clip_neg') that contains each element in 'mylist'
# If the element is greater than 0, keep it; otherwise, replace it with 0
clip_neg = [n if n > 0 else 0 for n in mylist]

# Print the resulting list after applying the condition
print(clip_neg)

# Use a list comprehension to create a new list ('clip_pos') that contains each element in 'mylist'
# If the element is less than 0, keep it; otherwise, replace it with 0
clip_pos = [n if n < 0 else 0 for n in mylist]
print(clip_pos)


[1, 4, 0, 10, 0, 2, 3, 0]
[0, 0, -5, 0, -7, 0, 0, -1]
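
The same idea extends to clipping values into a range with both a lower and an upper bound. The sketch below assumes hypothetical bounds `low` and `high`, which do not appear in the original text.

```python
mylist = [1, 4, -5, 10, -7, 2, 3, -1]

# Hypothetical bounds chosen only for illustration
low, high = 0, 5

# min() caps each value at 'high'; max() then raises anything below 'low'
clipped = [max(low, min(high, n)) for n in mylist]
print(clipped)
```

Every element is kept, so the result has the same length as the input; only the out-of-range values change.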


Another notable filtering tool is itertools.compress(), which takes an iterable and
an accompanying Boolean selector sequence as input. As output, it gives you all of the
items in the iterable where the corresponding element in the selector is True. This can
be useful if you’re trying to apply the results of filtering one sequence to another related
sequence. For example, suppose you have the following two columns of data:

In [85]:
addresses = [
    '5412 N CLARK',
    '5148 N CLARK',
    '5800 E 58TH',
    '2122 N CLARK',
    '5645 N RAVENSWOOD',
    '1060 W ADDISON',
    '4801 N BROADWAY',
    '1039 W GRANVILLE',
]

counts = [ 0, 3, 10, 4, 1, 7, 6, 1]

Now suppose you want to make a list of all addresses where the corresponding count
value was greater than 5. Here’s how you could do it:

In [87]:
from itertools import compress

# Use a list comprehension to create a boolean list ('more5') indicating whether each element in 'counts' is greater than 5
more5 = [n > 5 for n in counts]

# Print the boolean list indicating elements greater than 5
print("Boolean list indicating counts greater than 5:")
print(more5)

# Use the 'compress' function from the 'itertools' module to filter 'addresses' based on the boolean values in 'more5'
# compress() returns an iterator, so list() is used to collect the addresses whose selector is True
filtered_addresses = list(compress(addresses, more5))

# Print the resulting list of addresses after filtering
print("\nFiltered addresses with counts greater than 5:")
print(filtered_addresses)


Boolean list indicating counts greater than 5:
[False, False, True, False, False, True, True, False]

Filtered addresses with counts greater than 5:
['5800 E 58TH', '1060 W ADDISON', '4801 N BROADWAY']
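
To make the behavior of `compress()` concrete, the sketch below compares it with an equivalent `zip()`-based comprehension. The sample data (`names`, `scores`) is hypothetical and chosen only for illustration.

```python
from itertools import compress

# Hypothetical sample data: two related columns of equal length
names = ['a', 'b', 'c', 'd']
scores = [3, 9, 1, 7]

# Build the Boolean selector from one column...
selectors = [s > 5 for s in scores]

# ...and apply it to the other. compress() is lazy, so list() collects the results.
via_compress = list(compress(names, selectors))

# The same selection written with zip() and a filtering comprehension
via_zip = [item for item, keep in zip(names, selectors) if keep]

print(via_compress)
print(via_zip)
```

Both forms pair items with selectors positionally, so a missing or extra element in either sequence silently shifts or truncates the result.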


# 1.17. Extracting a Subset of a Dictionary

**Problem:**
You want to make a dictionary that is a subset of another dictionary.

**Solution:**
This is easily accomplished using a dictionary comprehension. For example:

In [88]:
prices = {
'ACME': 45.23,
'AAPL': 612.78,
'IBM': 205.55,
'HPQ': 37.20,
'FB': 10.75
}


# Make a dictionary ('p1') of all prices over 200 using a dictionary comprehension
# Only include key-value pairs where the value is greater than 200
p1 = {key: value for key, value in prices.items() if value > 200}

# Print the resulting dictionary of prices over 200
print("Dictionary of prices over 200:")
print(p1)

# Make a dictionary ('p2') of tech stocks using a dictionary comprehension
# Only include key-value pairs where the key is in the set of tech stock names
tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}
p2 = {key: value for key, value in prices.items() if key in tech_names}

# Print the resulting dictionary of tech stocks and their prices
print("\nDictionary of tech stocks and their prices:")
print(p2)


Dictionary of prices over 200:
{'AAPL': 612.78, 'IBM': 205.55}

Dictionary of tech stocks and their prices:
{'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.2}


**Discussion:**

Much of what can be accomplished with a dictionary comprehension might also be done
by creating a sequence of tuples and passing them to the dict() function. For example:

In [90]:


# Make a dictionary ('p1') of all prices over 200 using the 'dict' constructor with a generator expression
# Only include key-value pairs where the value is greater than 200
p1 = dict((key, value) for key, value in prices.items() if value > 200)

# Print the resulting dictionary of prices over 200
print("Dictionary of prices over 200:")
print(p1)

# Make a dictionary ('p2') of tech stocks by intersecting the keys of 'prices' with the set of tech stock names
# The intersection is a set, so the order of keys in the result is not guaranteed
tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}
p2 = {key: prices[key] for key in prices.keys() & tech_names}

# Print the resulting dictionary of tech stocks and their prices
print("\nDictionary of tech stocks and their prices:")
print(p2)


Dictionary of prices over 200:
{'AAPL': 612.78, 'IBM': 205.55}

Dictionary of tech stocks and their prices:
{'IBM': 205.55, 'AAPL': 612.78, 'HPQ': 37.2}
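
The two ways of building `p2` produce equal dictionaries but may order the keys differently, because `prices.keys() & tech_names` yields a plain set. The following sketch illustrates the difference; the variable names `by_intersection` and `by_scan` are introduced here for illustration only.

```python
prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}
tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}

# Intersecting a keys view with a set produces a set (unordered)
common = prices.keys() & tech_names
by_intersection = {key: prices[key] for key in common}

# Scanning the items keeps the insertion order of 'prices'
by_scan = {key: value for key, value in prices.items() if key in tech_names}

# Dictionary equality ignores key order, so the two results compare equal
print(by_intersection == by_scan)
print(list(by_scan))
```

If the order of the original dictionary matters, the item-scanning comprehension is the safer choice.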


# 1.18. Mapping Names to Sequence Elements

**Problem:**
You have code that accesses list or tuple elements by position, but this makes the code
somewhat difficult to read at times. You’d also like to be less dependent on position in
the structure, by accessing the elements by name.

**Solution:**
collections.namedtuple() provides these benefits, while adding minimal overhead
over using a normal tuple object. collections.namedtuple() is actually a factory
method that returns a subclass of the standard Python tuple type. You feed it a type
name, and the fields it should have, and it returns a class that you can instantiate, passing
in values for the fields you’ve defined, and so on. For example:

In [9]:
from collections import namedtuple

Subscriber = namedtuple("Subscriber",['addr', 'joined'])
sub = Subscriber('jonesy@example.com', '2012-10-19')
print(sub)

print(sub.addr)
print(sub.joined)

#Although an instance of a namedtuple looks like a normal class instance, it is interchangeable
#with a tuple and supports all of the usual tuple operations such as indexing
#and unpacking. For example:
len(sub)  # evaluates to 2

# Unpack the namedtuple like an ordinary tuple
addr, joined = sub
print(addr)
print(joined)


#A major use case for named tuples is decoupling your code from the position of the
#elements it manipulates. So, if you get back a large list of tuples from a database call,
#then manipulate them by accessing the positional elements, your code could break if,
#say, you added a new column to your table. Not so if you first cast the returned tuples
#to namedtuples.
#To illustrate, here is some code using ordinary tuples:
def compute_cost(records):
    total = 0.0
    for rec in records:
        total += rec[1]*rec[2]
    return total




Subscriber(addr='jonesy@example.com', joined='2012-10-19')
jonesy@example.com
2012-10-19
jonesy@example.com
2012-10-19


# **Discussion: Mapping Names to Sequence Elements**

One possible use of a namedtuple is as a replacement for a dictionary, which requires
more space to store. Thus, if you are building large data structures involving dictionaries,
use of a namedtuple will be more efficient. However, be aware that unlike a dictionary,
a namedtuple is immutable. For example:

In [11]:
#References to positional elements often make the code a bit less expressive and more
#dependent on the structure of the records. Here is a version that uses a namedtuple:

# Import the 'namedtuple' class from the 'collections' module
from collections import namedtuple

# Create a named tuple 'Stock' with fields 'name', 'shares', and 'price'
Stock = namedtuple("Stock", ["name", "shares", "price"])

# Define a function 'compute_cost' that calculates the total cost of stock records
# Each record is expected to be a tuple or iterable with values for 'name', 'shares', and 'price'
def compute_cost(records):
    # Initialize a variable 'total' to store the cumulative cost
    total = 0.0

    # Iterate through each record in the provided list of records
    for rec in records:
        # Create a 'Stock' named tuple from the record
        s = Stock(*rec)

        # Calculate the cost for the current record and add it to the total
        total += s.shares * s.price

    # Return the total cost
    return total


s = Stock('ACME', 100, 123.45)
print(s)

#A subtle use of the _replace() method is that it can be a convenient way to populate
#named tuples that have optional or missing fields. To do this, you make a prototype
#tuple containing the default values and then use _replace() to create new instances
#with values replaced. For example:
from collections import namedtuple

Stock = namedtuple('Stock', ['name', 'shares', 'price', 'date', 'time'])

stock_prototype = Stock('', 0, 0.0, None, None)

# Function to convert a dictionary to a Stock
def dict_to_stock(s):
    return stock_prototype._replace(**s)


#Here is an example of how this code would work:
a = {'name': 'ACME', 'shares': 100, 'price': 123.45}
dict_to_stock(a)
b = {'name': 'ACME', 'shares': 100, 'price': 123.45, 'date': '12/17/2012'}
dict_to_stock(b)


Stock(name='ACME', shares=100, price=123.45)


Stock(name='ACME', shares=100, price=123.45, date='12/17/2012', time=None)
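
The immutability mentioned above can be demonstrated directly. The sketch below is a minimal illustration using a three-field `Stock`: assigning to a field raises `AttributeError`, while `_replace()` returns a new instance and leaves the original untouched.

```python
from collections import namedtuple

Stock = namedtuple('Stock', ['name', 'shares', 'price'])
s = Stock('ACME', 100, 123.45)

# Direct attribute assignment is not allowed on a namedtuple
try:
    s.shares = 75
except AttributeError as exc:
    print("Cannot assign:", exc)

# _replace() builds a new namedtuple with the given fields changed
updated = s._replace(shares=75)
print(updated)

# The original instance is unchanged
print(s)
```

If fields need to change frequently, a regular class is likely a better fit than repeated calls to `_replace()`.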

# **1.19. Transforming and Reducing Data at the Same Time**

## **Problem:**

You need to execute a reduction function (e.g., sum(), min(), max()), but first need to
transform or filter the data.

## **Solution:**

A very elegant way to combine a data reduction and a transformation is to use a
generator-expression argument. For example, if you want to calculate the sum of
squares, do the following:

In [12]:
nums = [1, 2, 3, 4, 5]

s = sum(x*x for x in nums)

#Here are a few other examples:
# Determine if any .py files exist in a directory
import os

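# "dirname" is a placeholder; replace it with an existing directory,
# otherwise os.listdir() raises FileNotFoundError
files = os.listdir("dirname")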
if any(name.endswith(".py") for name in files):
    print("There be Python!")
else:
    print("Sorry no Python")

# Output a tuple as CSV
s = ('ACME', 50, 123.45)
print(','.join(str(x) for x in s))





FileNotFoundError: [Errno 2] No such file or directory: 'dirname'

# **Discussion: Transforming and Reducing Data at the Same Time**

The solution shows a subtle syntactic aspect of generator expressions when supplied as
the single argument to a function (i.e., you don’t need repeated parentheses). For example,
these statements are the same:

In [14]:
# Data reduction across fields of a data structure
portfolio = [
{'name':'GOOG', 'shares': 50},
{'name':'YHOO', 'shares': 75},
{'name':'AOL', 'shares': 20},
 {'name':'SCOX', 'shares': 65}
]

min_shares = min(s["shares"] for s in portfolio)
print(min_shares)

# Calculate the sum of squares using a generator expression passed as an argument to the 'sum' function
s = sum((x * x for x in nums))  # Pass generator-expr as an argument
# Alternatively, use a more elegant syntax without parentheses
s = sum(x * x for x in nums)  # More elegant syntax

# Print the calculated sum
print("Sum of squares:", s)

# Using a generator argument is often more efficient and elegant than creating a temporary list
# For example, if 'nums' was huge, creating a list might be memory-inefficient
# The generator expression transforms the data iteratively, saving memory
# Alternative implementation without a generator expression
s = sum([x * x for x in nums])

# Print the sum using the alternative implementation
print("\nSum of squares (alternative implementation):", s)

# Example using the 'min' function with a generator expression
# Original: Returns the minimum value of the "shares" key in the portfolio
min_shares = min(s["shares"] for s in portfolio)

# Alternative: Returns the dictionary with the minimum "shares" value in the portfolio
min_record = min(portfolio, key=lambda s: s["shares"])

# Print the result of the 'min' function using both approaches
print("\nMinimum shares (original):", min_shares)
print("Minimum shares (alternative):", min_record)


20
Sum of squares: 55

Sum of squares (alternative implementation): 55

Minimum shares (original): 20
Minimum shares (alternative): {'name': 'AOL', 'shares': 20}
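
The memory argument for generator expressions can be checked with a rough measurement. The sketch below assumes a hypothetical input of 100,000 numbers and uses `sys.getsizeof()`, which reports only the size of the container object itself, not the integers it refers to, so the figures are a lower bound for the list.

```python
import sys

# Hypothetical "large" input, used only for illustration
nums = range(100_000)

# A generator expression stores only its iteration state
squares_gen = (x * x for x in nums)

# A list comprehension stores a reference to every result up front
squares_list = [x * x for x in nums]

gen_size = sys.getsizeof(squares_gen)
list_size = sys.getsizeof(squares_list)
print("generator is smaller:", gen_size < list_size)

# Both forms give the same reduction result
total_gen = sum(squares_gen)
total_list = sum(squares_list)
print(total_gen == total_list)
```

Note that a generator can be consumed only once; after `sum(squares_gen)` it is exhausted, whereas the list can be reused.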


# **1.20. Combining Multiple Mappings into a Single Mapping**

## **Problem:**

You have multiple dictionaries or mappings that you want to logically combine into a
single mapping to perform certain operations, such as looking up values or checking
for the existence of keys.

## **Solution:**

Suppose you have two dictionaries:

In [16]:
a = {'x': 1, 'z': 3 }
b = {'y': 2, 'z': 4 }

#Now suppose you want to perform lookups where you have to check both dictionaries
#(e.g., first checking in a and then in b if not found). An easy way to do this is to use the
#ChainMap class from the collections module. For example:
from collections import ChainMap

c = ChainMap(a,b)
print(c['x']) # Outputs 1 (from a)
print(c['y']) # Outputs 2 (from b)
print(c['z']) # Outputs 3 (from a)

1
2
3


# **Discussion: Combining Multiple Mappings into a Single Mapping**

A ChainMap takes multiple mappings and makes them logically appear as one. However,
the mappings are not literally merged together. Instead, a ChainMap simply keeps a list
of the underlying mappings and redefines common dictionary operations to scan the
list. Most operations will work. For example:

In [None]:
print(len(c))
print(list(c.keys()))
print(list(c.values()))

#Operations that mutate the mapping always affect the first mapping listed. For example:
c['z'] = 10
c['w'] = 40
del c['x']
print(a)  # {'z': 10, 'w': 40}; only the first mapping was modified

#A ChainMap is particularly useful when working with scoped values such as variables in
#a programming language (i.e., globals, locals, etc.). In fact, there are methods that make
#this easy:
values = ChainMap()
values['x'] = 1
# Add a new mapping
values = values.new_child()

#As an alternative to ChainMap, you might consider merging dictionaries together using
#the update() method. For example:
a = {'x': 1, 'z': 3 }
b = {'y': 2, 'z': 4 }

merged = dict(b)
merged.update(a)

# Keys from 'a' take priority because update(a) is applied last
print(merged['x'])  # 1
print(merged['y'])  # 2
print(merged['z'])  # 3
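
Two points from the cell above are worth seeing end to end: how `new_child()` and `parents` model nested scopes, and how a `ChainMap` differs from a dictionary merged with `update()`. The following is a minimal sketch; names such as `scopes`, `inner_x`, and `chained` are introduced here for illustration.

```python
from collections import ChainMap

# --- Scoped values with new_child() and parents ---
scopes = ChainMap()
scopes['x'] = 1                # outer scope

scopes = scopes.new_child()    # push a new, empty inner scope
scopes['x'] = 2                # shadows the outer value
inner_x = scopes['x']

scopes = scopes.parents        # discard the inner scope
outer_x = scopes['x']

print(inner_x, outer_x)

# --- ChainMap versus a merged copy ---
a = {'x': 1, 'z': 3}
b = {'y': 2, 'z': 4}

merged = dict(b)
merged.update(a)               # a separate dictionary, copied at this point

chained = ChainMap(a, b)       # a view over the original dictionaries

a['x'] = 42                    # change one of the originals

print(merged['x'])             # the merged copy does not see the change
print(chained['x'])            # the ChainMap reflects it
```

A merged dictionary is a snapshot, so later changes to the originals are not reflected in it, while a `ChainMap` always reads through to the underlying mappings.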


