# FP pearls

These are just collected thoughts on software engineering: a poor man's epigrams, if you will.

- How do you test a function that returns nothing?
- Prefer (real) functions over methods
- Function names should not be sentences
- Separation of concerns includes separating data from methods
- Composition is the essence of computation
- Pipelines are just another form of Composition
- Make your functions like water: as pure as possible
- Make your functions like water: they can take many forms
- Don't call us, we'll call you
- If you do have to call, we will leave you a message
- Prefer immutable data where you can, unless performance is critical
- Laziness is a virtue: Try to copy data as little as possible
- Laziness is a virtue: Why go to the heap when you can work on the stack?
- Modules should not download the internet with dependencies
- Open source is great, but comes with strings attached
- Decoupling is great...when you can afford it
- Does a number know how to add, or a bike know how to cycle?  So why insist on classes?
- Tests are like coins: they should have two sides
- The harder it is to mock out functionality, the more hardwiring you did
- Learn your IDE tricks to speed up your coding

JVM tidbits

- OOP is not the best model
- Use value classes once ready
- Know how the bytecode works
- Don't be afraid of recursion (if using Scala, Kotlin or Clojure)
- Minimize your dependencies

Rust tidbits

- Lifetimes are hard, but references beat copying data


## Prefer functions to methods

Methods, in most languages, are bound to a class (technically, to an instance of the class).  They implicitly have a
hidden first argument, which is a reference to the instance itself.  It is usually called `this` or `self`.

A true function does not have this implicit argument, and is not bound to an instance.  That being said, a function can
have free variables inside of it, which can be captured from an outer scope (e.g. closures).  Indeed, one can think of a
closure as a poor man's object.  But care is needed when a closure mutates the variables it captured.
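
To make the "poor man's object" idea concrete, here is a minimal sketch.  The name `make_counter` is hypothetical and
exists only for this illustration.  The captured `count` plays the role of a private field, and because the closure
mutates it, calling it twice gives two different answers.

```rust
// A closure that captures and mutates its own private state: a "poor man's object".
// `make_counter` is a hypothetical name used only for this illustration.
fn make_counter(start: u32) -> impl FnMut() -> u32 {
    let mut count = start; // captured by the closure, invisible to callers
    move || {
        count += 1;
        count
    }
}

fn counter_demo() -> (u32, u32, u32) {
    let mut next = make_counter(0);
    let mut other = make_counter(10); // independent state, like a second instance
    (next(), next(), other())
}
```

The return type is `FnMut`, not `Fn`, so the mutation is at least visible in the signature.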

But why are functions better than methods?  Firstly, you don't need to instantiate an object to call them.  Secondly,
all the arguments are explicit and under your control (unlike methods, which often use state in the instance to work
with).  Lastly, there are sometimes implementation differences between the two, as with Kotlin's method references or
Scala's eta expansion.

In [26]:
// Rust has no classes, but it does have structs, which are just data.  It also has a way to separately attach
// implementations of functions (including methods that take `self`) to the struct (or enum).
#[derive(Debug)]
pub struct Equipment {
    pub name: String,
    pub price: f64,
    pub weight: f64
}

#[derive(Debug)]
pub struct Character {
    name: String,
    equipment: Vec<Equipment>
}

// These look like methods, but there are some crucial differences.
// - They do not all implicitly take `self`: the receiver is declared explicitly, and `new` has none
// - The implementation can happen separately from the struct declaration (even in another file of the same crate!)
impl Character {
    pub fn new(name: String) -> Self {
        Character { name , equipment: vec![]}
    }

    pub fn add_equipment(&mut self, item: Equipment) -> &mut Self {
        self.equipment.push(item);
        self
    }

    pub fn remove_equipment(&mut self, name: &str) -> &mut Self {
        // Remove only the first item whose name matches, if there is one
        if let Some(at) = self.equipment.iter().position(|item| item.name == name) {
            self.equipment.remove(at);
        }
        self
    }
}

fn test() {
    let mut char = Character::new("Sean".into());
    char
        .add_equipment(Equipment {
            name: "sword".into(),
            price: 1000.00,
            weight: 10.0
        })
        .add_equipment(Equipment {
            name: "shield".into(),
            price: 500.00,
            weight: 7.00
        });
    println!("{char:#?}");
    char.remove_equipment("sword");
    println!("{char:#?}");
}

test()

Character {
    name: "Sean",
    equipment: [
        Equipment {
            name: "sword",
            price: 1000.0,
            weight: 10.0,
        },
        Equipment {
            name: "shield",
            price: 500.0,
            weight: 7.0,
        },
    ],
}
Character {
    name: "Sean",
    equipment: [
        Equipment {
            name: "shield",
            price: 500.0,
            weight: 7.0,
        },
    ],
}


()

## Long named functions are a code smell

This is probably going to be the most controversial epigram.  Think about what most functions do in FP programs.  They
are very concise in what they do.  For example, `map`, `flatMap`, `reduce`, `take`, `filter`.  Because they only do one
thing, it's easy to understand what they do.

A long named function is probably doing many things, or doing something hard to describe.  Why is it hard to describe?
A function's purpose is to transform data from one kind to another.  That is the ultimate essence of computation: "from
this data, give me this transformed data of some other representation".  Seriously.  That's what computation is.

Now, those short named functions describe the operation, but not its meaning.  For example, `map` means "convert this
container/context of X into a container/context of Y".  It does not tell you that the meaning is "convert this list of
numbers to strings".  In such a case, you could give your function a name like "intsToStrings".

Recall that a program is just the composition of functions.  This is different from the imperative style of "do this,
then do that".  When you compose functions, you don't need to explain every step in a name, just the composition from
the first function to the last.

In [27]:
// This name already indicates it is doing several things.
// It is getting data from s3, and calculating a difference.  So create 2 functions
fn takes_data_from_s3_finds_difference_between_json_files() {

}
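
One possible way to split the function above into two short-named ones is sketched here.  Everything in it is an
assumption made for illustration: `Doc` is a flat map standing in for parsed JSON, and the `load` closure stands in for
an S3 client call, so no real S3 or JSON API is implied.  `fetch` is the effectful half and `diff` is the pure half.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for parsed JSON: a flat map of keys to values.
type Doc = BTreeMap<String, String>;

// Effectful half: get a document.  The `load` closure is an assumption standing in
// for whatever client actually talks to S3.
fn fetch(load: impl Fn(&str) -> Doc, key: &str) -> Doc {
    load(key)
}

// Pure half: the keys whose values differ, or that exist in only one document.
fn diff(a: &Doc, b: &Doc) -> Vec<String> {
    let mut changed: Vec<String> = a
        .iter()
        .filter(|(k, v)| b.get(*k) != Some(*v))
        .map(|(k, _)| k.clone())
        .collect();
    changed.extend(b.keys().filter(|k| !a.contains_key(*k)).cloned());
    changed.sort();
    changed
}

fn diff_demo() -> Vec<String> {
    // An in-memory loader replaces the network for the example.
    let load = |key: &str| -> Doc {
        let version = if key == "old.json" { "1" } else { "2" };
        Doc::from([
            ("version".to_string(), version.to_string()),
            ("name".to_string(), "demo".to_string()),
        ])
    };
    diff(&fetch(&load, "old.json"), &fetch(&load, "new.json"))
}
```

`diff` can now be tested with hand-built maps and no network at all.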


## Objects as just data

Most engineers were taught OOP in school, and worked in OOP languages at work.  As such, they were trained to think in
terms of "Objects": blueprints for things that bundle data, state, and the functionality to work on that internal state
(or other data).

The problems with this are many.  While Objects feel intuitive, they lead to all kinds of decisions about how to model
the data.  The real problem, though, is conflating mutable internal state with immutable plain old data.

Think about it like this.  When you get a message from a websocket, is that just data or is it some class with methods
too?  We work with pure data all the time.  Databases, events, network data, etc.  When we get this data over the wire
or from a filesystem, guess what...it's JUST data.  But we are told in OOP that we must think of this data as an Object.
An instance of some blueprint that contains methods to act on this data.

Every time you write a POJO, or write a static method, you are witnessing the failure of the OOP paradigm.  Every time
you have to write `final` or `synchronized`, you are witnessing the failure to separate data from state.  Let me ask
you this.  When you have a number, is that an object?  Is it data, or does it have some state?  To get six, even if I
do `5.plus(1)` (which is valid Kotlin BTW; Scala spells it `5.+(1)`), did I mutate 5?

Even Java has admitted this in two respects.  The first is record classes (previewed in Java 14 and finalized in
Java 16), a way to easily create POJOs without all the boilerplate.  The second is the upcoming Project Valhalla and
its value classes (more on this later).

So, long story short: treat data as data, and create functions that work with that data.

In [28]:
pub struct Point {
    x: f64,
    y: f64,
    z: f64
}

// Even Java has `record` types now, which are _just_ data.  No need to add methods if you don't want to
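
As a small sketch of "functions that work with that data", the block below redeclares `Point` with public fields and a
few derives so that it stands alone.  The function names `translate` and `distance` are hypothetical.  Neither function
mutates its input; each takes data in and returns new data.

```rust
// Plain data: no methods, no hidden state.  Fields are public so any function can read them.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}

// Returns a new Point instead of mutating the one passed in.
pub fn translate(p: Point, dx: f64, dy: f64, dz: f64) -> Point {
    Point { x: p.x + dx, y: p.y + dy, z: p.z + dz }
}

// Euclidean distance between two points.
pub fn distance(a: Point, b: Point) -> f64 {
    ((a.x - b.x).powi(2) + (a.y - b.y).powi(2) + (a.z - b.z).powi(2)).sqrt()
}
```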

## Composition is the essence of computation

It took me a long time to figure this one out.  Computation is really just the processing of a graph.  We have been
blinded by imperative programming that you do a sequence of steps.

```
x = doThis() // either this is a side effectful function, or it's effectively a literal
y = doThatWithX(x)
doSomethingWithSideEffect(y) // what's going on here?
z = doSomethingElseWithX(x) // did doThatWithX mutate x?
```

We really have two kinds of processing: pure and impure.  Pure programming takes some input and returns some output.  It
also neither affects the outside world, nor is affected by it.  Impure programming can possibly take no input, and
possibly return no output, and either affects or is affected by the outside world.  We need to somehow keep these two
separate, and yet also let them interoperate.  If you have ever heard the term monad, that's one of the main things
monads are used for: a bridge between the two.  Monads allow you to wall off the impure from the pure, keeping it
compartmentalized, and carrying the "effect" along for the ride.
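
Rust does not use the word monad, but `Result` chained with `and_then` has much the same shape, so it can serve as a
rough sketch of "carrying the effect along for the ride".  Here the effect is possible failure.  The function names are
hypothetical.

```rust
// Each step may fail; the failure "effect" is carried in the Result.
fn parse(input: &str) -> Result<i64, String> {
    input.trim().parse::<i64>().map_err(|e| format!("not a number: {e}"))
}

fn percent_share(n: i64) -> Result<i64, String> {
    if n == 0 {
        Err("division by zero".to_string())
    } else {
        Ok(100 / n)
    }
}

// `and_then` chains the steps; the first error short-circuits the rest.
fn percent_from_text(input: &str) -> Result<i64, String> {
    parse(input).and_then(percent_share)
}
```

The caller decides what to do with the failure only once, at the edge, instead of checking after every step.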

We can then treat all computation, including the impure, as a pipeline of functions.  We can even see multithreaded or
async programs in this light, by thinking of them as branches in the graph (that usually converge back into the main
trunk).  In fact, several async implementations (e.g. kotlinx coroutines and Rust's async) are built on state machines,
and what are state machines?  Graphs.

What is the advantage of thinking this way?  There are many.  The first is that you should be able to test any section
of the program in isolation, as long as you can construct the data it should have at that point.  It's also easier to
visualize as a flow chart.

In [29]:
fn f(x: u32) -> u32 {
    x * 2
}

fn g(x: u32) -> u32 {
    x + 10
}

fn test() -> u32 {
    let g_ans = g(4);
    let f_ans = f(g_ans);
    f_ans
}

test();

// composing these functions vs imperatively building up the solution
f(g(4))

28

## Pipelines are composition 

The mathematical form of composition f of g is sometimes counter intuitive.  Even though f comes first in the alphabet,
we execute g first, and apply its output as the input to f.  So you have to execute right to left.  Pipelines are more
intuitive, as they execute from left to right.  f | g means execute f, and pump its output to the input of g.  And
isn't that what this imperative code is doing?

```
x = someFunc(1)
y = anotherFunc(x)
// equals anotherFunc(someFunc(1))
```

What if we could write this as 

```scala
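// Scala-style syntax.  `|>` is not built in, but it can be defined as an extension method
// (Scala 2.13+ ships the equivalent `pipe` in scala.util.chaining)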
someFunc(1) |> anotherFunc(_)
```

Unfortunately, Rust does not let us define new operators (we can only overload the existing ones).  But you could
create your own pipe function.

In [32]:
pub fn pipe<A, B, C>(f1: impl Fn(A) -> B, f2: impl Fn(B) -> C) -> impl Fn(A) -> C {
    move |a: A| f2(f1(a))
}

fn f(x: u32) -> u32 {
    x * 2
}

fn g(x: u32) -> u32 {
    x + 10
}

pipe(g, f)(4)

28
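
Another option, sketched here under the assumption that a method-call style is acceptable, is a small extension trait
that gives every value a `pipe` method.  The `Pipe` trait is hypothetical and not part of the standard library.  It
applies functions to a value from left to right rather than building a new composed function.

```rust
// A hypothetical extension trait that adds a left-to-right `pipe` method to every type.
pub trait Pipe: Sized {
    fn pipe<B>(self, f: impl FnOnce(Self) -> B) -> B {
        f(self)
    }
}

// Blanket implementation: every sized type gets `pipe` for free.
impl<T> Pipe for T {}

fn double(x: u32) -> u32 {
    x * 2
}

fn add_ten(x: u32) -> u32 {
    x + 10
}

fn pipe_demo() -> u32 {
    // Reads in execution order: start with 4, add ten, then double.
    4_u32.pipe(add_ten).pipe(double)
}
```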


## Make as many functions pure as you can

It's easy to mix in side effects into your code.  Often we do need to make a request to some service, or read from a
file, but try to compartmentalize and batch these up as much as you can.  Even logging can be done purely, if you are
willing to wait for a computation to complete (or error out).

For example, you can build up a string and pass it along.  I wouldn't do this for a large program, but you could for
small sections of code.
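
A minimal sketch of that idea follows, with hypothetical function names.  Each step returns its result together with
the log lines it produced, and the caller decides when (or whether) to print them.

```rust
// Each step returns its result together with the log lines it produced,
// instead of printing as a side effect.
fn add_ten_logged(x: u32) -> (u32, Vec<String>) {
    (x + 10, vec![format!("added 10 to {x}")])
}

fn double_logged(x: u32) -> (u32, Vec<String>) {
    (x * 2, vec![format!("doubled {x}")])
}

// Runs both steps and concatenates their logs; still a pure function.
fn run_logged(x: u32) -> (u32, Vec<String>) {
    let (a, mut log) = add_ten_logged(x);
    let (b, more) = double_logged(a);
    log.extend(more);
    (b, log)
}
```

Nothing is written anywhere until the caller chooses to, so the whole chain can be tested by comparing values.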

Logging is to some degree a side effect of side effectful programming (wow, meta).  While sometimes it is nice to know
what is happening regardless of purity, pure functions are simple to test, immune to time, immune to the outside world,
and do not affect the outside world.  In other words, you really shouldn't need logging information (unless you just
wanted to know what the input was).  Generally, logging is needed when you have to keep track of some intermediate
state, what time something executed, or the order of execution.

With pure functions, you shouldn't have to worry about any of that (as long as your program is built as a graph so that
out of order execution is not possible even in multi threaded programming, and every call in your function chain is
pure).

## Pushing is better than pulling

Ever heard the phrase, "Don't call us, we'll call you"?  If you have heard of it, it was probably in reference to the
Inversion of Control principle, which is often (incorrectly) thought to be synonymous with Dependency Injection.
We are also talking about inversion of control here, but not about dependency injection.

With IoC, we flip the notion of how data is received on its head.  Instead of asking for data, you are told what the
data is.  Developers with experience in event driven programs know this idea well.  You don't ask for (e.g. poll) the
data, because that is inefficient and may miss changes.  While event driven architectures are harder to implement,
they also solve many design problems which are next to impossible to handle correctly otherwise.  How many times have
you needed to know when data in a database changed?  Or a file changed?  Or information was available on a socket?  Or
some job that, even though it is on a cron schedule, fails often and has to be rerun at a later unknown time?

The frequency of the event occurring is not the issue; knowing _when_ data changed is.  Polling is not only expensive
for the client, as it has to sit and wait between intervals (probably consuming a precious thread), but it may also
miss an update.

The ubiquity of messaging systems like Kafka, ActiveMQ, MQTT, or SQS shows that there is a reason people want and need
to listen to a stream of data.
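
In-process, the same push idea can be sketched with a standard library channel.  This is only an illustration: the
event strings and the function name are made up, and a real system would push from a database trigger, file watcher, or
message broker rather than a loop.

```rust
use std::sync::mpsc;
use std::thread;

// The producer pushes each change as it happens; the consumer never polls.
fn collect_pushed_events() -> Vec<String> {
    let (tx, rx) = mpsc::channel::<String>();

    let producer = thread::spawn(move || {
        for change in ["row inserted", "row updated", "row deleted"] {
            tx.send(change.to_string()).expect("receiver dropped");
        }
        // `tx` is dropped here, which closes the channel.
    });

    // Iterating the receiver blocks until the next event arrives and ends when
    // the channel is closed, so no event is missed and no time is spent polling.
    let events: Vec<String> = rx.iter().collect();
    producer.join().expect("producer panicked");
    events
}
```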


## If you do have to pull, tell your clients

If you are going with a pull model, at least notify the consumers when data is ready.  That way they know when to
actually make the call.  That's basically how the `select` and `epoll` system calls work in Linux.  The caller blocks
(rather than spinning in a loop) until the kernel marks a file descriptor or socket as ready, at which point the
program makes the call to read the new data.

This solves two of the aforementioned problems.  You won't miss any data changes, and the client won't waste time on
meaningless calls to get data when none is available.  However, this setup isn't much easier to build than an event
based system, since it requires a similar two way asynchronous communication ability.  It is easier for more people to
reason about than reactive architectures, but it also lacks some of the functionality of reactive systems.
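
A rough in-process analogue of "pull, but only after being told" is a queue guarded by a `Mutex` plus a `Condvar`.
This sketch is not how `epoll` is implemented; it only shows the same division of labor.  The names are hypothetical.
The producer leaves a message (the notification), and the consumer sleeps until then and does the actual pull itself.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Shared mailbox: the queue holds the data, and the Condvar is the "message"
// telling the consumer that a pull will now succeed.
fn pull_after_notification() -> Vec<u32> {
    let mailbox = Arc::new((Mutex::new(VecDeque::<u32>::new()), Condvar::new()));
    let producer_side = Arc::clone(&mailbox);

    let producer = thread::spawn(move || {
        let (queue, ready) = &*producer_side;
        for n in 1..=3 {
            queue.lock().unwrap().push_back(n);
            ready.notify_one(); // tell the consumer there is something to pull
        }
    });

    let (queue, ready) = &*mailbox;
    let mut received = Vec::new();
    while received.len() < 3 {
        let mut guard = queue.lock().unwrap();
        // Sleep until notified; the loop guards against spurious wakeups.
        while guard.is_empty() {
            guard = ready.wait(guard).unwrap();
        }
        received.extend(guard.drain(..)); // the actual pull
    }
    producer.join().unwrap();
    received
}
```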

## How do you test functions that return nothing?

If you have a function returning Unit or Void, how do you test it?  Fundamentally, a void returning function is by its
very nature side-effectful.  The only way such a function can be useful is if it is writing to a file, sending a network
request, etc.  Eventually, all code _does_ need to do something effectful (even just printing to the screen), but such
effects can be encapsulated so that it affects nothing else.  

But even so, by not returning a value, how do you test it?  How do you know if the file was written or the network
request made it?  By the same token, a function that takes no args must either return a constant or depend on the
outside world, which makes it inherently side effectful too.

In [30]:
use std::fs::OpenOptions;
use std::io::Write;

// Another problem here is hard coding the path to answer.txt.  What if this changes?
fn write_to_answer(body: &str) {
    let file = OpenOptions::new()
        .append(true)
        .create(true)
        .open("answer.txt");
    if let Ok(mut f) = file {
        // The Result is deliberately discarded: a failed open or write vanishes silently
        let _ = f.write_all(body.as_bytes());
    }
}

// Ok, you wrote some text...now how do you make sure it actually got written to file?
write_to_answer("Some new answer\n");
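
One possible answer, sketched with hypothetical names, is to stop returning nothing.  The version below takes any
`Write` implementation instead of a hard coded path, and returns an `io::Result` so the caller (or a test) can see what
happened.  In a test, a `Vec<u8>` stands in for the file.

```rust
use std::io::{self, Write};

// Takes any writer instead of a hard coded path, and returns the outcome
// (the number of bytes written) instead of swallowing it.
fn write_answer(out: &mut impl Write, body: &str) -> io::Result<usize> {
    out.write_all(body.as_bytes())?;
    Ok(body.len())
}

fn write_answer_demo() -> io::Result<Vec<u8>> {
    let mut buffer: Vec<u8> = Vec::new(); // in-memory stand-in for the file
    write_answer(&mut buffer, "Some new answer\n")?;
    Ok(buffer)
}
```

In production code the same function can be handed the `File` opened with `OpenOptions`, so the effect stays at the
edge and the logic stays testable.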