# Functional Paradigms in Python

So you've heard about functional programming, and you want to take a stab at it, but you've mostly been trained in objects and really like thinking that way. It turns out the transition really isn't that complicated if you follow some basic rules. In this lesson we will talk about the primary ways to start taking advantage of functional programming paradigms by partially or completely converting your object-oriented code into functional implementations.

**Note:** this won't cover all cases, and not everything in this lesson is intended to be a "correct" functional implementation. The purpose is instead to show how much of what you already know translates well, and to give you a basis for refining the skill set further.

## What is functional programming?

A good place to start is in talking about what functional programming is, and how it differs from OOP.

In its most ideal form, functional programming is the removal of state from your application.

<img src="./images/wut.jpg">

Yes, that's what I said. So how do you remove state from your application? I mean, you need *some* state, surely, or your application doesn't actually *do* anything, right?

Well, not so fast. Yes, you need state. But how you manage state makes a big difference.

In the "good old days" (before all these fancy monitors and stereo sound and 30fps 3-D immersive games), state was held in basic variables. Early languages had only global variables, and later languages introduced scoped variables that were meaningful only inside their scope of visibility (such as "within a function" or "within this code block" or "within this file").

It turns out that getting all these variables right and knowing which ones you should be interacting with and which ones you shouldn't was a hard problem to solve. Many approaches were tried, but one that gained significant adoption was the creation of objects that allowed the developer to include state and related behaviors into a single entity that could then be interacted with on the fly in predefined ways.

And all was good with the programming world.

Except...

Well, except it turns out that not everyone was happy with this. There are **many** solid arguments against OOP (in fairness, there are also many arguments *for* it) and people began looking to new approaches. So they dug back into history and revived a concept that had, in fact, been used heavily in particular systems because it was so blindingly fast, easy to parallelize, and easier to understand. Functional Programming was (re-) born!

What systems needed massive parallelization dating back to pre-internet days? Telephone call routing! And phone companies had figured out that by following a particular set of "rules", they received almost everything they needed for parallelization for free. This meant that the computers these applications were running on could arbitrarily start and stop tasks based on priority, complexity, or any other metric without having to concern themselves much with challenges like inter-process communication.

**Note:** yes, that's an over-simplification. Making a massively-parallelized application work correctly does take IPC. But with a functional approach, the kinds of things that need to be communicated are much simpler than they are in non-functional paradigms, and many languages handle much of the IPC for you. So from a *practical* perspective you end up writing a lot less code and thinking about synchronization a lot less while still receiving massive acceleration in many cases.

So what are the rules?

The rules focus on making all of your functions atomic and isolated. Functional programming is a mindset: think about how your code behaves when it is treated as a collection of isolated, atomic tasks rather than as one cohesive application. The rules include not sharing state between different tasks (because that means you have to synchronize things), not modifying existing state from within functions (for the same reason), writing functions with no state dependency across time (functions that always return exactly the same result when called with the same parameters), and using variables only as constants (we'll address this later; it's not as bad as it sounds).

By following the rules, you get programs that are generally smaller than their OOP counterparts, often faster where the work can be parallelized, much easier to test and validate, and more readily understood by new contributors.

One good way to start thinking functionally is to re-write a simple application the new way.

## Converting object-oriented code to a functional implementation

**Note:** this is not doctrine, this is gleaned from experience as a "gut check" to writing code in a functional way. Many guides will show variants of these ideas and even altogether different examples. I highly recommend reading in depth from the resources at the bottom to take this to the next level.

### Sample application

For the remainder of this lesson, we will be looking at the following (overly-) simplistic module:


In [1]:
import csv

from io import BytesIO
from zipfile import ZipFile


class FileToRows():

    def __init__(self, filename):
        self._filename = filename
        parts = filename.split('.')
        self._name, self._extensions = parts[0], parts[1:]

    def __repr__(self):
        return f"FileToRows(filename='{self._filename}')"

    def filename(self):
        return self._filename

    def extensions(self):
        return self._extensions

    def rows(self):
        return list(self)

    def __iter__(self):
        with open(self._filename, 'rb') as fin:
            content = fin.read()
            if self._extensions[-1] == 'zip':
                content = self._zip_to_binary(content)
            content = content.decode('utf-8')
            for row in self._text_to_rows(content):
                yield row

    def _text_to_rows(self, content):
        # splitlines() breaks on line endings only; split() would also break on spaces
        return csv.reader(line for line in content.splitlines() if line)

    def _zip_to_binary(self, content):
        # Read the first member directly from the in-memory archive
        # rather than reopening a same-named file from disk.
        archive = ZipFile(BytesIO(content))
        return archive.read(archive.namelist()[0])

I have created a comma-separated file called `abc.csv` containing ASCII letters along with their integer, hexadecimal, and octal values:

```
a,97,0x61,0o141
b,98,0x62,0o142
c,99,0x63,0o143
d,100,0x64,0o144
e,101,0x65,0o145
f,102,0x66,0o146
```

The program will load the file, decompress it (if necessary), and print each row from the file as a list of values:

In [2]:
decoder = FileToRows('abc.csv.zip')
print(decoder)
print(decoder.filename())
print(decoder.extensions())
for row in decoder:
    print(row)

FileToRows(filename='abc.csv.zip')
abc.csv.zip
['csv', 'zip']
['a', '97', '0x61', '0o141']
['b', '98', '0x62', '0o142']
['c', '99', '0x63', '0o143']
['d', '100', '0x64', '0o144']
['e', '101', '0x65', '0o145']
['f', '102', '0x66', '0o146']


## Functions are First-Class Citizens

How do you create a list of all the even numbers? New developers will generally come up with some kind of loop:

In [3]:
def evens():
    result = []
    for i in range(1, 11):
        if i % 2 == 0:
            result.append(i)
    return result

evens()

[2, 4, 6, 8, 10]

The following functions all do the exact same thing as the one above:

```python
def evens_list_comprehension():
    return [i*2 for i in range(1, 6)]

def evens_list_comprehension_with_filter():
    return [i for i in range(1, 11) if i % 2 == 0]

def evens_range_params():
    return list(range(2, 11, 2))
```

In each case we are using `range(...)` (a lazy sequence, not a list) to supply the numbers and then selecting or transforming the values in a predictable way.

But there's another way to do it in Python that comes to mind less often: `filter(...)`.

In [4]:
list(filter(lambda x: x % 2 == 0, range(1, 11)))

[2, 4, 6, 8, 10]

At first glance that may seem like a strange way to do it, and in this instance I probably would use a comprehension instead, but the point here is that `filter(...)` is a function that takes a *function* as its first parameter. `filter` does not know or care about the data it receives; it just knows how to call a function once on each element of the input and include or exclude that element in the output based on the result of the function call. This means that `filter` is adaptable to an extraordinary number of purposes. If you know how `filter` works for one case, you know how it works for every case.

This kind of thinking is at the heart of functional programming: describing your algorithm in terms of the functions that transform your data rather than the data itself.
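To see that there is nothing magic about a function that takes a function, here is a minimal hand-written stand-in for `filter`. The names `keep_if`, `is_even`, and `is_short` are hypothetical, chosen just for this sketch.

```python
def keep_if(predicate, items):
    """Yield only the items for which predicate(item) is truthy."""
    for item in items:
        if predicate(item):
            yield item

def is_even(n):
    return n % 2 == 0

def is_short(word):
    return len(word) <= 3

# The same function handles numbers and strings; only the predicate changes.
evens = list(keep_if(is_even, range(1, 11)))
short_words = list(keep_if(is_short, ["map", "filter", "zip", "reduce"]))

print(evens)        # [2, 4, 6, 8, 10]
print(short_words)  # ['map', 'zip']
```

Notice that `keep_if` never inspects the data itself. All of the domain knowledge lives in the small functions passed to it.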

### Rule 1: Eliminate state everywhere you can

Take a look at our object definition. You'll notice that the object holds the following state members: `self._filename`, `self._name`, and `self._extensions`. We can access them anywhere in our class methods. But notice that `self._name` is never read, and the other two are used only in the `__iter__` method and in helper methods (`__repr__`, `filename`, and `extensions`) that have no purpose other than object introspection and nicely-formatted output. This means we are holding state when the object is created so that when we want to iterate over it, it already knows the name and metadata of the file to use.

But we could also just explicitly pass the necessary values directly to a function that did the same thing, something like this:

```python
def rows(filename, extensions):
    with open(filename, 'rb') as fin:
        content = fin.read()
        if extensions[-1] == 'zip':
            content = _zip_to_binary(content)
        content = content.decode('utf-8')
        for row in _text_to_rows(content):
            yield row
```

The core refactoring here is to move a method out of the class to the module level and change its internal references to attributes of `self` into input parameters.

**Note:** it is much easier to start with methods that have no references to `self` within the method body, as these are already independent of the object state.

#### Corollary: Eliminate side effects

Honestly, side effects are simply the effect of having state. If we eliminate all state, we eliminate all side effects. Good job, us!

Well, it's not actually that easy. Your code might interact with the filesystem, a database, or, in our case, `sys.stdout` (writing to any output device is a side-effect). But within the code, we are not executing any logic that depends on non-parameter variables that could change between runs.
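One common way to live with unavoidable side effects is to push them to the edges and keep the core logic pure. The sketch below assumes that approach; `text_to_rows` and `print_rows` are illustrative names, not functions from the sample module.

```python
import csv

def text_to_rows(text):
    """Pure: the same text always produces the same rows."""
    return tuple(tuple(row) for row in csv.reader(text.splitlines()) if row)

def print_rows(rows, write=print):
    """Impure edge: the only place that touches an output device."""
    for row in rows:
        write(row)

sample = "a,97,0x61,0o141\nb,98,0x62,0o142\n"
parsed = text_to_rows(sample)
print_rows(parsed)
```

Because the output function is passed in as a parameter, the pure part can be tested without capturing `sys.stdout`, and the impure part can be tested by handing it something harmless such as `some_list.append`.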

#### Corollary: Consolidate state that can't be eliminated

If you *must* have state in your module (database connections, for instance), then it can be very helpful to wrap it in some kind of manager. In the simplest case, this can be a `dataclasses.dataclass`, or maybe something more complicated in a module that makes the state read-only for consumers and provides helper functions to update it in predictable ways. Learn about the [Flux architecture](https://facebook.github.io/flux/) and one-way data flow for more.
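As a rough sketch of the simplest case, a frozen dataclass can hold the state and a helper function can produce updated copies. `AppState`, its fields, and `with_retries` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AppState:
    # Hypothetical fields, chosen only for illustration.
    database_url: str
    retries: int = 3

def with_retries(state, retries):
    """Return a new state; the original is left untouched."""
    return replace(state, retries=retries)

initial = AppState(database_url="sqlite:///:memory:")
updated = with_retries(initial, 5)

print(initial.retries)  # 3
print(updated.retries)  # 5
```

With `frozen=True`, assigning to `initial.retries` raises `dataclasses.FrozenInstanceError`, so consumers can read the state but can only change it through helpers that return a new object.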


#### Corollary: Everything is constant

We want to treat all parameters and globals as constants: once they have a value, that value doesn't ever change. In other words, we want all globals and parameters to be [immutable](https://realpython.com/courses/immutability-python/).

Within our functions, we're looking for **smells** like this one:

```python
def add_some_key(x: dict):
    x['some_key'] = 123
```

A dictionary was passed in, and the function modified it. This suddenly makes that object stateful. Instead, you should copy anything you want to change and return the result:

```python
def add_some_key(x: dict):
    updated = x.copy()
    updated['some_key'] = 123
    return updated
```

All of the references to the original dictionary are unaffected. We can shallow-copy the original dictionary quickly, and a shallow-copy is sufficient if we are replacing or adding a top-level key/value pair.
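The flip side is that a shallow copy still shares any nested objects with the original. The following sketch, using a made-up `original` dictionary, shows where a shallow copy is enough and two ways to handle a nested value.

```python
import copy

original = {"name": "abc.csv", "tags": ["csv"]}

# Shallow copy: safe for adding or replacing a top-level key.
renamed = original.copy()
renamed["name"] = "abc.tsv"

# A shallow copy shares the nested list, so mutate a deep copy instead...
deep = copy.deepcopy(original)
deep["tags"].append("zip")

# ...or build a new nested value without mutating anything at all.
rebuilt = {**original, "tags": original["tags"] + ["zip"]}

print(original)  # {'name': 'abc.csv', 'tags': ['csv']}
```

Rebuilding the nested value is usually cheaper than `copy.deepcopy` because everything that did not change is still shared.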

Some languages have keywords like `const` to make it easier to enforce this paradigm, but you still have to watch mutable objects like lists and dictionaries for improper use. If you can instead use immutable forms (`tuple` instead of `list` and `tuple` of pairs instead of `dict`), then Python will prevent you from changing them in improper ways and force you to copy them to make changes.

**Note:** there are legitimate times when having and modifying a mutable object makes sense, such as when using looping variables or within factory methods; however, it is far better to return an immutable if you can.
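Here is a short sketch of what "Python will prevent you" looks like in practice. The tuple part follows the suggestion above; `types.MappingProxyType` is an alternative I am swapping in for the "tuple of pairs" idea, since it gives a read-only view that still supports lookup by key. The variable names are illustrative only.

```python
from types import MappingProxyType

letters = ("a", "b", "c")
longer = letters + ("d",)  # builds a new tuple; `letters` is unchanged

codes = MappingProxyType({"a": 97, "b": 98})
more_codes = MappingProxyType({**codes, "c": 99})  # copy, then extend

try:
    codes["z"] = 122  # item assignment is not supported on the view
except TypeError:
    rejected = True
else:
    rejected = False

print(rejected)  # True
```

Keep in mind that `MappingProxyType` is only a view: anyone still holding a reference to the underlying `dict` can change it, so the pattern works best when that dictionary is created inline as shown.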

### Rule 2: Focus on functions

As you look at the behavior, ask yourself "what is the operation that is necessary to transform this data?". Then think about what is needed to perform that operation, and that becomes your function.
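One way to picture this is as a chain of small transformations, each taking the previous result as its only input. The sketch below assumes that framing; `apply_pipeline`, `decode`, `split_lines`, and `drop_blank` are hypothetical names used only here.

```python
from functools import reduce

def apply_pipeline(value, stages):
    """Feed value through each stage in order and return the final result."""
    return reduce(lambda current, stage: stage(current), stages, value)

def decode(raw):
    return raw.decode("utf-8")

def split_lines(text):
    return tuple(text.splitlines())

def drop_blank(lines):
    return tuple(line for line in lines if line)

# The "program" is just an ordered tuple of functions.
stages = (decode, split_lines, drop_blank)
result = apply_pipeline(b"a,97\n\nb,98\n", stages)

print(result)  # ('a,97', 'b,98')
```

Each stage can be tested on its own, and changing the behavior is a matter of adding, removing, or reordering entries in `stages`.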

### Completed refactoring

After making all the changes above, we are left with this version:

In [5]:
import csv

from io import BytesIO
from tarfile import open as tar_open
from zipfile import ZipFile

def rows(filename):
    with open(filename, 'rb') as fin:
        content = fin.read()
        for function in _build_pipeline(_extensions(filename)):
            content = function(content)
        yield from content

def _extensions(filename):
    return tuple(filename.split('.')[1:])

def _build_pipeline(extensions):
    return tuple(stage for extension in extensions[::-1] for stage in _stages_by_extension[extension])

def _raw_tgz_to_raw(raw):
    archive = tar_open(fileobj=BytesIO(raw), mode='r:gz')
    return archive.extractfile(archive.getmembers()[0]).read()

def _raw_zip_to_raw(raw):
    # Read the first member directly from the in-memory archive
    # rather than reopening a same-named file from disk.
    archive = ZipFile(BytesIO(raw))
    return archive.read(archive.namelist()[0])

def _raw_to_text(raw):
    return raw.decode('utf-8')

def _rows_to_filtered_rows(rows):
    return filter(lambda x: x, rows)

_text_to_rows = str.splitlines

def _rows_csv_to_values(rows):
    return csv.reader(rows)

def _rows_tsv_to_values(rows):
    return csv.reader(rows, delimiter='\t')

_common_text_stages = (_raw_to_text, _text_to_rows, _rows_to_filtered_rows)

_stages_by_extension = dict(
    csv=_common_text_stages + (_rows_csv_to_values,),
    tsv=_common_text_stages + (_rows_tsv_to_values,),
    tgz=(_raw_tgz_to_raw,),
    zip=(_raw_zip_to_raw,),
)

We can run through several files to make sure it's working correctly for each variant:

In [6]:
for filename in ('abc.csv', 'abc.csv.zip', 'abc.csv.tgz', 'abc.tsv', 'abc.tsv.zip'):
    print(filename, _extensions(filename))
    for row in rows(filename):
        print(row)

abc.csv ('csv',)
['a', '97', '0x61', '0o141']
['b', '98', '0x62', '0o142']
['c', '99', '0x63', '0o143']
['d', '100', '0x64', '0o144']
['e', '101', '0x65', '0o145']
['f', '102', '0x66', '0o146']
abc.csv.zip ('csv', 'zip')
['a', '97', '0x61', '0o141']
['b', '98', '0x62', '0o142']
['c', '99', '0x63', '0o143']
['d', '100', '0x64', '0o144']
['e', '101', '0x65', '0o145']
['f', '102', '0x66', '0o146']
abc.csv.tgz ('csv', 'tgz')
['a', '97', '0x61', '0o141']
['b', '98', '0x62', '0o142']
['c', '99', '0x63', '0o143']
['d', '100', '0x64', '0o144']
['e', '101', '0x65', '0o145']
['f', '102', '0x66', '0o146']
abc.tsv ('tsv',)
['a', '97', '0x61', '0o141']
['b', '98', '0x62', '0o142']
['c', '99', '0x63', '0o143']
['d', '100', '0x64', '0o144']
['e', '101', '0x65', '0o145']
['f', '102', '0x66', '0o146']
abc.tsv.zip ('tsv', 'zip')
['a', '97', '0x61', '0o141']
['b', '98', '0x62', '0o142']
['c', '99', '0x63', '0o143']
['d', '100', '0x64', '0o144']
['e', '101', '0x65', '0o145']
['f', '102', '0x66', '0o146']


## References

* [Functional programming](https://en.wikipedia.org/wiki/Functional_programming)
* [A Brief History of Programming (Why Functional Programming Matters)](https://becoming-functional.com/a-brief-history-of-programming-c13d87b79337)
* [Immutability in Python](https://realpython.com/courses/immutability-python/)
* [Flux architecture](https://facebook.github.io/flux/)
