## Item 24: Use @classmethod Polymorphism to Construct Objects Generically

* Polymorphism is a way for multiple classes in a hierarchy to implement their own versions of a method.
* This allows many classes to fulfill the same interface or abstract base class while providing different functionality.
    * See `Item 28`: Inherit from collections.abc for Custom Container Types
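* As a minimal sketch of instance-level polymorphism, the hypothetical `Shape`, `Square`, and `Circle` classes below (not part of this item's example) each supply their own `area`:

```python
import math


class Shape:
    def area(self):
        # Subclasses must provide their own implementation
        raise NotImplementedError


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


# The caller uses the shared interface without knowing the concrete class
shapes = [Square(2), Circle(1)]
areas = [shape.area() for shape in shapes]
```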

In [None]:
import os
import random

from threading import Thread

In [None]:
TMPDIR = 'test_inputs'

config = {'data_dir': TMPDIR}

### Common class

In [None]:
# replaced by GenericInputData
class InputData:
    def read(self):
        raise NotImplementedError

* Define a class with a read method that must be defined by subclasses.

In [None]:
# replaced by PathInputData(GenericInputData)
class PathInputData(InputData):
    def __init__(self, path):
        super().__init__()
        self.path = path

    def read(self):
        with open(self.path) as f:
            return f.read()

* You could have any number of InputData subclasses like PathInputData.
* Each of them could implement the standard interface for `read` to return the bytes of data to process.
* Other InputData subclasses could read from the network, decompress data transparently, etc.
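* For instance, a subclass that decompresses data transparently might look like the sketch below. `GzipInputData` is a hypothetical name, and the temporary file exists only to exercise `read`:

```python
import gzip
import os
import tempfile


class InputData:
    def read(self):
        raise NotImplementedError


class GzipInputData(InputData):
    """Hypothetical subclass that reads a gzip-compressed text file."""

    def __init__(self, path):
        super().__init__()
        self.path = path

    def read(self):
        # 'rt' opens the compressed file in text mode
        with gzip.open(self.path, 'rt') as f:
            return f.read()


# Round-trip a small compressed file to show that read() decompresses it
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.gz')
    with gzip.open(path, 'wt') as f:
        f.write('a\nb\n')
    text = GzipInputData(path).read()
```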

* Define a similar abstract interface for the MapReduce worker that consumes the input data in a standard way.

In [None]:
# replaced by GenericWorker
class Worker:
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError

* Define a concrete subclass of Worker to implement the specific MapReduce function I want to apply:
    * a simple newline counter

In [None]:
# replaced by LineCountWorker(GenericWorker)
class LineCountWorker(Worker):
    def map(self):
        data = self.input_data.read()
        self.result = data.count('\n')

    def reduce(self, other):
        self.result += other.result

### Review of `str` and `count`

In [None]:
numbers = [1, 2, 5, 4, 5, 6, 7, 10]
numbers.count(5)

In [None]:
# bad: the triple-quoted string contains both blanks and newlines
s = """This is a very
very
long
string."""
s

In [None]:
s.count("\n")

In [None]:
# one line with proper blanks
s2 = s.replace("\n", " ")
s2

In [None]:
s2.count("\n")

In [None]:
# bad: adjacent literals are joined without any blanks or newlines between them
s3 = (
    "This is a very"
    "very"
    "long"
    "string."
)
s3

In [None]:
s3.count("\n")

### Back to classes

* These classes provide reasonable interfaces and abstractions, but they are only useful once the objects are constructed.
* The simplest approach is to build and connect the objects manually with some helper functions.

In [None]:
def generate_inputs(data_dir):
    for name in os.listdir(data_dir):
        yield PathInputData(os.path.join(data_dir, name))

* Next, create the `LineCountWorker` instances using the `InputData` instances returned by `generate_inputs`.

In [None]:
def create_workers(input_list):
    workers = []
    for input_data in input_list:
        workers.append(LineCountWorker(input_data))
    return workers

* Execute these `Worker` instances by fanning out the `map` step to multiple threads.
* Then call `reduce` repeatedly to combine the results into one final value.

In [None]:
def execute(workers):
    threads = [Thread(target=w.map) for w in workers]
    for thread in threads: thread.start()
    for thread in threads: thread.join()

    first, *rest = workers
    for worker in rest:
        first.reduce(worker)
    return first.result

* Finally, connect all of the pieces together in a function to run each step.

In [None]:
# replaced by mapreduce(worker_class, input_class, config)
def mapreduce(data_dir):
    inputs = generate_inputs(data_dir)
    workers = create_workers(inputs)
    return execute(workers)

* Running this function on a set of test input files works great.

In [None]:
def write_test_files(tmpdir):
    try:
        os.makedirs(tmpdir)
        for i in range(100):
            with open(os.path.join(tmpdir, str(i)), 'w') as f:
                f.write('\n' * random.randint(0, 100))  # each file holds 0 to 100 newlines
    except OSError:
        # e.g. the directory already exists: keep whatever files are there
        pass

In [None]:
write_test_files(TMPDIR)

In [None]:
result = mapreduce(TMPDIR)
print(f'There are {result} lines')

### Generic way to construct objects.


* Python only allows for the single constructor method `__init__`.
    * It's unreasonable to require every `InputData` subclass to have a compatible constructor.

* Use `@classmethod` polymorphism (applies to whole classes instead of their constructed objects).

* `generate_inputs` takes a dictionary with a set of configuration parameters that are up to the `GenericInputData` concrete subclass to interpret.

In [None]:
class GenericInputData:
    def read(self):
        raise NotImplementedError
        
    @classmethod
    def generate_inputs(cls, config):
        raise NotImplementedError

* Use config to find the directory to list for input files.

In [None]:
class PathInputData(GenericInputData):
    def __init__(self, path):
        super().__init__()
        self.path = path
        
    def read(self):
        with open(self.path) as f:
            return f.read()
        
    @classmethod
    def generate_inputs(cls, config):
        data_dir = config['data_dir']
        for name in os.listdir(data_dir):
            yield cls(os.path.join(data_dir, name))   

* Make the `create_workers` helper part of the `GenericWorker` class.
* Use the `input_class` parameter, which must be a subclass of `GenericInputData`, to generate the necessary inputs.
* Construct instances of the `GenericWorker` concrete subclass using `cls()` as a generic constructor.

In [None]:
class GenericWorker:
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None
        
    def map(self):
        raise NotImplementedError
        
    def reduce(self, other):
        raise NotImplementedError
        
    @classmethod
    def create_workers(cls, input_class, config):
        workers = []
        for input_data in input_class.generate_inputs(config):
            workers.append(cls(input_data))
        return workers

In [None]:
def create_workers_2(cls, input_class, config):
    workers = []
    for input_data in input_class.generate_inputs(config):
        workers.append(cls(input_data))
    return workers

In [None]:
result_1 = GenericWorker.create_workers(PathInputData, config)

In [None]:
len(result_1), result_1[:3]

In [None]:
result_2 = create_workers_2(GenericWorker, PathInputData, config)

In [None]:
len(result_2), result_2[:3]

In [None]:
# will get AttributeError 
result_3 = str.create_workers(PathInputData, config)

In [None]:
# will get AttributeError
str.create_workers

In [None]:
GenericWorker.create_workers

In [None]:
result_4 = create_workers_2(str, PathInputData, config)

In [None]:
len(result_4), result_4[:3]

In [None]:
type(GenericWorker)

In [None]:
type(str)

In [None]:
class UnboundWorker(GenericWorker):
    pass

In [None]:
result_5 = UnboundWorker.create_workers(PathInputData, config)

In [None]:
len(result_5), result_5[:3]

* Note that the call to `input_class.generate_inputs` is the class polymorphism: which implementation runs depends on the class that is passed in.
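* To make this concrete, here is a minimal sketch with a second input class. `InMemoryInputData` and the `'texts'` config key are hypothetical, not part of the original example; the base classes are repeated only so the snippet runs on its own.

```python
class GenericInputData:
    def read(self):
        raise NotImplementedError

    @classmethod
    def generate_inputs(cls, config):
        raise NotImplementedError


class InMemoryInputData(GenericInputData):
    """Hypothetical input class that serves strings held in memory."""

    def __init__(self, text):
        super().__init__()
        self.text = text

    def read(self):
        return self.text

    @classmethod
    def generate_inputs(cls, config):
        # 'texts' is an assumed config key for this sketch
        for text in config['texts']:
            yield cls(text)


class GenericWorker:
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError

    @classmethod
    def create_workers(cls, input_class, config):
        workers = []
        # Dispatches to whichever generate_inputs the given class defines
        for input_data in input_class.generate_inputs(config):
            workers.append(cls(input_data))
        return workers


memory_config = {'texts': ['a\nb\n', 'c\n']}
memory_workers = GenericWorker.create_workers(InMemoryInputData, memory_config)
contents = [w.input_data.read() for w in memory_workers]
```

* `create_workers` did not change; only the class passed as `input_class` did.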

In [None]:
class LineCountWorker(GenericWorker):
    def map(self):
        data = self.input_data.read()
        self.result = data.count('\n')
        
    def reduce(self, other):
        self.result += other.result

* Rewrite the `mapreduce` function to be completely generic.

In [None]:
def mapreduce(worker_class, input_class, config):
    workers = worker_class.create_workers(input_class, config)
    return execute(workers)

In [None]:
result = mapreduce(LineCountWorker, PathInputData, config)
print(f'There are {result} lines')
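
* The same generic `mapreduce` can drive a different worker without modification. The sketch below assumes a hypothetical `WordCountWorker` and two small temporary files; the other definitions are repeated only so the snippet runs on its own.

```python
import os
import tempfile
from threading import Thread


class PathInputData:
    def __init__(self, path):
        self.path = path

    def read(self):
        with open(self.path) as f:
            return f.read()

    @classmethod
    def generate_inputs(cls, config):
        data_dir = config['data_dir']
        for name in os.listdir(data_dir):
            yield cls(os.path.join(data_dir, name))


class GenericWorker:
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError

    @classmethod
    def create_workers(cls, input_class, config):
        return [cls(data) for data in input_class.generate_inputs(config)]


class WordCountWorker(GenericWorker):
    """Hypothetical worker that counts whitespace-separated words."""

    def map(self):
        self.result = len(self.input_data.read().split())

    def reduce(self, other):
        self.result += other.result


def execute(workers):
    threads = [Thread(target=w.map) for w in workers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    first, *rest = workers
    for worker in rest:
        first.reduce(worker)
    return first.result


def mapreduce(worker_class, input_class, config):
    workers = worker_class.create_workers(input_class, config)
    return execute(workers)


with tempfile.TemporaryDirectory() as tmp:
    samples = {'a.txt': 'one two\nthree\n', 'b.txt': 'four five\n'}
    for name, text in samples.items():
        with open(os.path.join(tmp, name), 'w') as f:
            f.write(text)
    word_total = mapreduce(WordCountWorker, PathInputData, {'data_dir': tmp})
```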

### Things to Remember

* Python only supports a single constructor per class, the `__init__` method.
* Use `@classmethod` to define alternative constructors for your classes.
* Use class method polymorphism to provide generic ways to build and connect concrete subclasses.
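
* A minimal sketch of the alternative-constructor idea, using hypothetical `Temperature` and `LabTemperature` classes that are not part of the original example:

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # cls() constructs whichever class the method was called on
        return cls((fahrenheit - 32) * 5 / 9)


class LabTemperature(Temperature):
    pass


boiling = Temperature.from_fahrenheit(212)
sample = LabTemperature.from_fahrenheit(32)
```

* Because `from_fahrenheit` calls `cls(...)` rather than `Temperature(...)`, the subclass inherits a constructor that returns `LabTemperature` instances.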