# Item 22: Prefer Helper Classes Over Bookkeeping with Dictionaries and Tuples

- Python's built-in dictionary type is wonderful for maintaining dynamic internal state over the lifetime of an object. By dynamic, I mean situations in which you need to do bookkeeping for an unexpected set of identifiers. For example, say you want to record the grades of a set of students whose names aren't known in advance. You can define a class to store the names in a dictionary instead of using a predefined attribute for each student.

In [3]:
class SimpleGradebook(object):
    def __init__(self):
        self._grades = {}
        
    def add_student(self, name):
        self._grades[name] = []
    
    def report_grade(self, name, score):
        self._grades[name].append(score)
    
    def average_grade(self, name):
        grades = self._grades[name]
        return sum(grades) / len(grades)

In [7]:
book = SimpleGradebook()
book.add_student('Isaac Newton')
book.report_grade('Isaac Newton', 90)
print(book.average_grade('Isaac Newton'))

90.0


- Dictionaries are so easy to use that there's a danger of overextending them to write brittle code. For example, say you want to extend the SimpleGradebook class to keep a list of grades by subject, not just overall. You can do this by changing the \_grades dictionary to map student names (the keys) to yet another dictionary (the values). The innermost dictionary will map subjects (the keys) to grades (the values).

In [11]:
class BySubjectGradebook(object):
    def __init__(self):
        self._grades = {}
        
    def add_student(self, name):
        self._grades[name] = {}
        
    def report_grade(self, name, subject, grade):
        by_subject = self._grades[name]
        grade_list = by_subject.setdefault(subject, [])
        grade_list.append(grade)
        
    def average_grade(self, name):
        by_subject = self._grades[name]
        total, count = 0, 0
        for grades in by_subject.values():
            total += sum(grades)
            count += len(grades)
        return total / count

- This seems straightforward enough. The report_grade and average_grade methods will gain quite a bit of complexity to deal with the multilevel dictionary, but it's manageable.

In [13]:
book = BySubjectGradebook()
book.add_student('Albert Einstein')
book.report_grade('Albert Einstein', 'Math', 75)
book.report_grade('Albert Einstein', 'Math', 65)
book.report_grade('Albert Einstein', 'Gym', 90)
book.report_grade('Albert Einstein', 'Gym', 95)
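
- To make the nesting concrete, here is a minimal sketch of the state that BySubjectGradebook ends up holding after the calls above. It uses plain dictionaries rather than the class itself, so the variable names are only illustrative:

```python
# Plain-dictionary sketch of the nested state: student -> subject -> scores.
grades = {'Albert Einstein': {}}
by_subject = grades['Albert Einstein']
for subject, score in [('Math', 75), ('Math', 65), ('Gym', 90), ('Gym', 95)]:
    by_subject.setdefault(subject, []).append(score)

print(grades)
# {'Albert Einstein': {'Math': [75, 65], 'Gym': [90, 95]}}

# The overall average has to walk both levels of the structure.
total = sum(sum(scores) for scores in by_subject.values())
count = sum(len(scores) for scores in by_subject.values())
print(total / count)  # 81.25
```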

- Now, imagine your requirements change again. You also want to track the weight of each score toward the overall grade in the class so midterms and finals are more important than pop quizzes. One way to implement this feature is to change the innermost dictionary; instead of mapping subjects (the keys) to grades (the values), I can use the tuple (score, weight) as values.

In [15]:
class WeightedGradebook(object):
    def __init__(self):
        self._grades = {}
        
    def add_student(self, name):
        self._grades[name] = {}
        
    def report_grade(self, name, subject, score, weight):
        by_subject = self._grades[name]
        grade_list = by_subject.setdefault(subject, [])
        grade_list.append((score, weight))
        
    def average_grade(self, name):
        by_subject = self._grades[name]
        score_sum, score_count = 0, 0
        for subject, scores in by_subject.items():
            subject_total_score, total_weight = 0, 0
            for score, weight in scores:
                subject_total_score += score * weight
                total_weight += weight
            # Average within the subject, then count one subject.
            subject_avg = subject_total_score / total_weight
            score_sum += subject_avg
            score_count += 1
        return score_sum / score_count

In [19]:
book = WeightedGradebook()
book.add_student('Albert Einstein')
book.report_grade('Albert Einstein', 'Math', 80, 0.10)
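
- The arithmetic behind a weighted gradebook is easier to check on a small data set. The following is a minimal sketch, assuming the same subject-to-list-of-(score, weight) layout; the helper name weighted_average and the sample scores are hypothetical and chosen only so the numbers work out exactly:

```python
def weighted_average(by_subject):
    """Average the per-subject weighted averages (hypothetical helper)."""
    subject_averages = []
    for scores in by_subject.values():
        weighted_total = sum(score * weight for score, weight in scores)
        total_weight = sum(weight for _, weight in scores)
        subject_averages.append(weighted_total / total_weight)
    return sum(subject_averages) / len(subject_averages)


sample = {
    'Math': [(80, 0.25), (90, 0.75)],  # weighted average 87.5
    'Gym': [(100, 0.5), (90, 0.5)],    # weighted average 95.0
}
print(weighted_average(sample))  # 91.25
```

Even in this compact form, every caller has to remember that position 0 of each tuple is the score and position 1 is the weight.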

- When you see complexity like this happen, it's time to make the leap from dictionaries and tuples to a hierarchy of classes.

- At first, you didn't know you'd need to support weighted grades, so the complexity of additional helper classes seemed unwarranted. Python's built-in dictionary and tuple types made it easy to keep going, adding layer after layer to the internal bookkeeping. But you should avoid doing this for more than one level of nesting (i.e., avoid dictionaries that contain dictionaries). It makes your code hard for other programmers to read and sets you up for a maintenance nightmare.

- As soon as you realize the bookkeeping is getting complicated, break it all out into classes. This lets you provide well-defined interfaces that better encapsulate your data. This also enables you to create a layer of abstraction between your interfaces and your concrete implementations.

### Refactoring to Classes

- You can start moving to classes at the bottom of the dependency tree: a single grade. A class seems too heavyweight for such simple information. A tuple, though, seems appropriate because grades are immutable. Here, I use the tuple (score, weight) to track grades in a list:

In [21]:
grades = []
grades.append((95, 0.45))
total = sum(score * weight for score, weight in grades)
total_weight = sum(weight for _, weight in grades)
average_grade = total / total_weight

In [22]:
average_grade

95.0

- The problem is that plain tuples are positional. When you want to associate more information with a grade, like a set of notes from the teacher, you'll need to rewrite every usage of the two-tuple to be aware that there are now three items present instead of two. Here, I use \_ (the underscore variable name, a Python convention for unused variables) to capture the third entry in the tuple and just ignore it:

In [23]:
grades = []
grades.append((95, 0.45, 'Great job'))
total = sum(score * weight for score, weight, _ in grades)
total_weight = sum(weight for _, weight, _ in grades)
average_grade = total / total_weight

- This pattern of extending tuples longer and longer is similar to deepening layers of dictionaries. As soon as you find yourself going longer than a two-tuple, it's time to consider another approach.

- The namedtuple type in the collections module does exactly what you need. It lets you easily define tiny, immutable data classes.

In [25]:
import collections 
Grade = collections.namedtuple('Grade', ('score', 'weight'))

- These classes can be constructed with positional or keyword arguments. The fields are accessible with named attributes. Having named attributes makes it easy to move from a namedtuple to your own class later if your requirements change again and you need to add behaviors to the simple data containers.
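
- As a short sketch of those properties, the following builds the same Grade both ways, reads its fields by name, and shows that assignment to a field is rejected. The variable names are only illustrative:

```python
import collections

Grade = collections.namedtuple('Grade', ('score', 'weight'))

# Positional and keyword construction produce equal values.
by_position = Grade(95, 0.45)
by_keyword = Grade(score=95, weight=0.45)
assert by_position == by_keyword

# Fields are read through named attributes.
print(by_position.score, by_position.weight)  # 95 0.45

# Instances are immutable: assigning to a field raises AttributeError.
try:
    by_position.score = 100
except AttributeError:
    print('Grade instances cannot be modified in place')
```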


**Limitations of namedtuple**

Although namedtuple is useful in many circumstances, it's important to understand when it can cause more harm than good.

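- Default argument values are awkward to specify for namedtuple classes: the defaults parameter (added in Python 3.7) applies only to the rightmost fields, and earlier versions have no direct support at all. This makes them unwieldy when your data may have many optional properties. If you find yourself using more than a handful of attributes, defining your own class may be a better choice.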

- The attribute values of namedtuple instances are still accessible using numerical indexes and iteration. Especially in externalized APIs, this can lead to unintentional usage that makes it harder to move to a real class later. If you're not in control of all of the usage of your namedtuple instances, it's better to define your own class.
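
- The second limitation can be sketched as follows. The GradeWithNotes type is a hypothetical later revision, introduced here only to show how positional usage breaks when a field is added:

```python
import collections

Grade = collections.namedtuple('Grade', ('score', 'weight'))
grade = Grade(95, 0.45)

# Index access and unpacking still work, just like a plain tuple.
print(grade[0])       # 95
score, weight = grade
print(score, weight)  # 95 0.45

# Hypothetical revision that adds a third field.
GradeWithNotes = collections.namedtuple(
    'GradeWithNotes', ('score', 'weight', 'notes'))

# Callers that unpacked two values now fail at runtime.
try:
    score, weight = GradeWithNotes(95, 0.45, 'Great job')
except ValueError as exc:
    print('Unpacking broke:', exc)
```

Callers that stuck to grade.score and grade.weight would be unaffected by the extra field, which is why attribute access is the safer habit.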

- Next, you can write a class to represent a single subject that contains a set of grades.

In [26]:
class Subject(object):
    def __init__(self):
        self._grades = []
        
    def report_grade(self, score, weight):
        self._grades.append(Grade(score, weight))
        
    def average_grade(self):
        total, total_weight = 0, 0
        for grade in self._grades:
            total += grade.score * grade.weight 
            total_weight += grade.weight
        return total / total_weight

- Then you would write a class to represent a set of subjects that are being studied by a single student.

In [27]:
class Student(object):
    def __init__(self):
        self._subjects = {}
        
    def subject(self, name):
        if name not in self._subjects:
            self._subjects[name] = Subject()
        return self._subjects[name]
    
    def average_grade(self):
        total, count = 0, 0
        for subject in self._subjects.values():
            total += subject.average_grade()
            count += 1
        return total / count

- Finally, you'd write a container for all of the students, keyed dynamically by their names.

In [28]:
class Gradebook(object):
    def __init__(self):
        self._students = {}
    
    def student(self, name):
        if name not in self._students:
            self._students[name] = Student()
        return self._students[name]

- The line count of these classes is almost double the previous implementation's size. But this code is much easier to read. The example driving the classes is also more clear and extensible.

In [29]:
book = Gradebook()
albert = book.student('Albert Einstein')
math = albert.subject('Math')
math.report_grade(80, 0.10)
print(albert.average_grade())

80.0


- If necessary, you can write backwards-compatible methods to help migrate usage of the old API style to the new hierarchy of objects.
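
- One possible shape for such a migration layer is sketched below. The CompatGradebook class and its method names are hypothetical; they mirror the older add_student / report_grade / average_grade calls and simply delegate to the new objects. The helper classes are restated in condensed form (using defaultdict) only to keep the sketch self-contained:

```python
import collections

Grade = collections.namedtuple('Grade', ('score', 'weight'))


class Subject:
    def __init__(self):
        self._grades = []

    def report_grade(self, score, weight):
        self._grades.append(Grade(score, weight))

    def average_grade(self):
        total = sum(g.score * g.weight for g in self._grades)
        total_weight = sum(g.weight for g in self._grades)
        return total / total_weight


class Student:
    def __init__(self):
        # Condensed: defaultdict creates a Subject on first access.
        self._subjects = collections.defaultdict(Subject)

    def subject(self, name):
        return self._subjects[name]

    def average_grade(self):
        averages = [s.average_grade() for s in self._subjects.values()]
        return sum(averages) / len(averages)


class Gradebook:
    def __init__(self):
        self._students = collections.defaultdict(Student)

    def student(self, name):
        return self._students[name]


class CompatGradebook(Gradebook):
    """Hypothetical adapter that keeps the old call style working."""

    def add_student(self, name):
        self.student(name)  # Creates the Student if it is missing

    def report_grade(self, name, subject, score, weight):
        self.student(name).subject(subject).report_grade(score, weight)

    def average_grade(self, name):
        return self.student(name).average_grade()


book = CompatGradebook()
book.add_student('Albert Einstein')
book.report_grade('Albert Einstein', 'Math', 80, 0.10)
print(book.average_grade('Albert Einstein'))
```

Old call sites keep working unchanged, while new code can use book.student(...).subject(...) directly.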

## Things to Remember

- Avoid making dictionaries with values that are other dictionaries or long tuples.
- Use namedtuple for lightweight, immutable data containers before you need the flexibility of a full class.
- Move your bookkeeping code to use multiple helper classes when your internal state dictionaries get complicated.

# Item 23: Accept Functions for Simple Interfaces Instead of Classes

- Many of Python's built-in APIs allow you to customize behavior by passing in a function. These hooks are used by APIs to call back your code while they execute. For example, the list type's sort method takes an optional key argument that's used to determine each index's value for sorting. Here, I sort a list of names based on their lengths by providing a lambda expression as the key hook:

In [30]:
names = ['Socrates', 'Archimedes', 'Plato', 'Aristotle']
names.sort(key=lambda x: len(x))
print(names)

['Plato', 'Socrates', 'Aristotle', 'Archimedes']


- In other languages, you might expect hooks to be defined by an abstract class. In Python, many hooks are just stateless functions with well-defined arguments and return values. Functions are ideal for hooks because they are easier to describe and simpler to define than classes. Functions work as hooks because Python has *first-class* functions: Functions and methods can be passed around and referenced like any other value in the language.
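
- A brief sketch of what "first-class" means in practice: a built-in function can be handed to a key hook directly, and functions or methods can be stored in a data structure and looked up later. The hooks dictionary is only an illustrative name:

```python
names = ['Socrates', 'Archimedes', 'Plato', 'Aristotle']

# A built-in function passed directly as the hook; no lambda wrapper needed.
print(sorted(names, key=len))
# ['Plato', 'Socrates', 'Aristotle', 'Archimedes']

# Functions and methods can be stored in containers like any other value.
hooks = {'length': len, 'lowercase': str.lower}
print(sorted(names, key=hooks['lowercase']))
# ['Archimedes', 'Aristotle', 'Plato', 'Socrates']
```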

- For example, say you want to customize the behavior of the defaultdict class. This data structure allows you to supply a function that will be called each time a missing key is accessed. The function must return the default value the missing key should have in the dictionary. Here, I define a hook that logs each time a key is missing and returns 0 for the default value:


In [31]:
def log_missing():
    print('key added')
    return 0

- Given an initial dictionary and a set of desired increments, I can cause the log_missing function to run and print twice (for 'red' and 'orange').

In [33]:
from collections import defaultdict

current = {'green': 12, 'blue': 3}
increments = [
    ('red', 5),
    ('blue', 17),
    ('orange', 9),
]
result = defaultdict(log_missing, current)
print('Before:', dict(result))
for key, amount in increments:
    result[key] += amount
print('After: ', dict(result))

Before: {'green': 12, 'blue': 3}
key added
key added
After:  {'green': 12, 'blue': 20, 'red': 5, 'orange': 9}


- Supplying functions like log_missing makes APIs easy to build and test because it separates effects from deterministic behavior. For example, say you now want the default value hook passed to defaultdict to count the total number of keys that were missing. One way to achieve this is using a stateful closure. Here, I define a helper function that uses such a closure as the default value hook:

In [35]:
def increment_with_report(current, increments):
    added_count = 0
    
    def missing():
        nonlocal added_count # Stateful closure
        added_count += 1
        return 0
    
    result = defaultdict(missing, current)
    for key, amount in increments:
        result[key] += amount
        
    return result, added_count

- Running this function produces the expected result (2), even though the defaultdict has no idea that the missing hook maintains state. This is another benefit of accepting simple functions for interfaces. It's easy to add functionality later by hiding state in a closure.

In [36]:
result, count = increment_with_report(current, increments)
assert count == 2

- The problem with defining a closure for stateful hooks is that it's harder to read than the stateless function example. Another approach is to define a small class that encapsulates the state you want to track.

In [37]:
class CountMissing(object):
    def __init__(self):
        self.added = 0
        
    def missing(self):
        self.added += 1
        return 0

- In other languages, you might expect that now defaultdict would have to be modified to accommodate the interface of CountMissing. But in Python, thanks to first-class functions, you can reference the CountMissing.missing method directly on an object and pass it to defaultdict as the default value hook. It's trivial to have a method satisfy a function interface.

In [38]:
counter = CountMissing()
result = defaultdict(counter.missing, current) # Method ref

for key, amount in increments:
    result[key] += amount
assert counter.added == 2

- Using a helper class like this to provide the behavior of a stateful closure is clearer than the increment_with_report function above. However, in isolation it's still not immediately obvious what the purpose of the CountMissing class is. Who constructs a CountMissing object? Who calls the missing method? Will the class need other public methods to be added in the future? Until you see its usage with defaultdict, the class is a mystery. To clarify this situation, a class can define the \_\_call\_\_ special method, which lets its instances be called just like functions and makes the callable built-in function return True for them.

In [39]:
class BetterCountMissing(object):
    def __init__(self):
        self.added = 0
        
    def __call__(self):
        self.added += 1
        return 0

counter = BetterCountMissing()
counter()
assert callable(counter)

- Here, I use a BetterCountMissing instance as the default value hook for a defaultdict to track the number of missing keys that were added:

In [47]:
counter = BetterCountMissing()
result = defaultdict(counter, current) # relies on __call__
for key, amount in increments:
    result[key] += amount
assert counter.added == 2

- This is much clearer than the CountMissing.missing example. The \_\_call\_\_ method indicates that a class's instances will be used somewhere a function argument would also be suitable (like API hooks). It directs new readers of the code to the entry point that's responsible for the class's primary behavior. It provides a strong hint that the goal of the class is to act as a stateful closure.


- Best of all, defaultdict still has no view into what's going on when you use \_\_call\_\_. All that defaultdict requires is a function for the default value hook. Python provides many different ways to satisfy a simple function interface depending on what you need to accomplish.
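
- To illustrate that last point, the sketch below passes four different kinds of callable to defaultdict as the default value hook. The names zero and CountingDefault are hypothetical, chosen for this example only; defaultdict treats all four hooks identically:

```python
from collections import defaultdict


def zero():
    return 0


class CountingDefault:
    """Hypothetical callable class that counts how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return 0


counting = CountingDefault()

# A plain function, a lambda, a built-in type, and a callable instance.
hooks = [zero, lambda: 0, int, counting]

for hook in hooks:
    assert callable(hook)
    result = defaultdict(hook)
    result['red'] += 5       # Missing key: the hook supplies the default 0
    assert result['red'] == 5

print(counting.calls)  # 1
```

Only the callable instance carries state, yet nothing about the defaultdict call site had to change to accommodate it.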

## Things to Remember

- Instead of defining and instantiating classes, functions are often all you need for simple interfaces between components in Python.
- References to functions and methods in Python are first class, meaning they can be used in expressions like any other type.
- The \_\_call\_\_ special method enables instances of a class to be called like plain Python functions.
- When you need a function to maintain state, consider defining a class that provides the \_\_call\_\_ method instead of defining a stateful closure.