# Part 1: Array-oriented programming

<br><br><br>

## Programming paradigms

Programming paradigms are rough classifications of styles of programming.

<br><br><br>

Some examples that you may have heard of:

| Paradigm | Description | Languages | Emphasizes |
|:--:|:---|:--:|:--:|
| Structured programming | Use `if`, `for`, and `while` instead of `goto` everywhere. | _(nearly all)_ | Program flow |
| Procedural programming | Define subprograms or "procedures" that are today called "functions". | _(nearly all)_  | Distinct tasks |
| Object-oriented programming | Data structures are bundled with the functions that act on them into "classes", which makes the boundaries between subsystems clearer. | Python, Java, C++ | Distinct subsystems |
| Actor-based programming | Objects can only be modified through methods, which are concurrent queues. Intended for parallel programming; object-oriented developed out of this as a simplification. | Erlang | Temporal locality |
| Imperative programming | Every statement is an instruction that changes the state of the computer. | Python, Java, C | Low-level algorithms |
| Functional programming | Every function and every expression has a return value. | Lisp, F#, OCaml | Data transformations |
| Strict functional programming | Doesn't allow data structures to change after they're created; all results of a program must be through return values. | Haskell, Elm | Data transformations and nothing else |
| Reactive programming | Relationships between variables are automatically maintained, often used to keep data displayed by a web app up-to-date. | React.js, Svelte, Elm | Model-view separation |
| Declarative programming | Specify what to do, rather than how to do it. | HTML, CSS, YAML, SQL | Stateless documents or queries |
| Logic programming | Programs are logical statements, constraints, and queries, whose answers are automatically deduced. | Prolog | Mathematical proofs |
| Literate programming | Interleave code with human-readable text for instruction. | Jupyter, Python docstrings | Communication between humans |

<br><br><br>

Most programming today is
* structured,
* procedural,
* object-oriented, and
* imperative.

Strict functional programming is often used as an example of a _different_ paradigm.

<br><br><br>

Here is a concrete example of **imperative** versus **functional**.

In [None]:
import numpy as np

<br><br><br>

**Imperatively** compute the square of each element of `input_data` and put it in `output_data`.

In [None]:
input_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
output_data = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0])

for i in range(len(input_data)):
    output_data[i] = input_data[i]**2

output_data

<br><br><br>

**Functionally** do the same:

In [None]:
input_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

def square(x):
    return x**2

output_data = np.fromiter(map(square, input_data), int)

output_data

<br><br><br>

Python has built-in syntax for some functional operations without making the functions explicit:

In [None]:
input_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

output_data = np.asarray([x**2 for x in input_data])

output_data

<br><br><br>

**Array-oriented programming** is like **functional**, but the operation is applied to a whole array at once rather than to one element at a time.

In [None]:
input_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

output_data = input_data**2

output_data

<br><br><br>

| Array-oriented programming | Functional programming |
|:--|:--|
| Variables are whole arrays, not elements of an array. | Function arguments or dummy variables are individual elements. |
| Sometimes encourages in-place operations (exceptions: JAX, Awkward Array). | In-place operations are either discouraged or forbidden. |
| Emphasizes changes in _distributions_ of data. | Emphasizes changes in components of a data structure. |
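
The in-place row of the table can be made concrete with a minimal sketch. The variable names below (`data`, `buffer`, `mapped`) are illustrative and not part of the notebook; the point is only that NumPy lets an operation overwrite an existing array, while the functional spelling builds a new result from per-element calls.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# Array-oriented, out-of-place: a new array is allocated for the result.
squared = data**2

# Array-oriented, in-place: the result overwrites the existing buffer.
buffer = data.copy()
np.square(buffer, out=buffer)

# Functional style: each element is an argument; the input is left untouched.
mapped = np.fromiter(map(lambda x: x**2, data), dtype=data.dtype)

print(squared, buffer, mapped)
```

All three hold the same values; only `buffer` was modified after it was created.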

<br><br><br>

## Larger example

Differences in programming styles are easier to see in larger programs.

Below is a calculation of gravitational forces between $n$ planets in 3 dimensions. Don't worry about the details; the point is to see the larger picture.

In [None]:
m = np.array([100, 1, 1])   # sun and a double-planet (a 3-body problem)

# initial position (x) and momentum (p)
x = np.array([[0, 0, 0], [0, 0.9, 0], [0, 1.1, 0]])
p = np.array([[0, 0, 0], [-13, 0, 0], [-10, 0, 0]])

G = 1

<br><br><br>

### Imperative

In [None]:
def imperative_forces(m, x, p):
    total_force = np.zeros_like(x)

    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if i != j:
                mi, mj = m[i], m[j]
                xi, xj = x[i], x[j]
                pi, pj = p[i], p[j]
                displacement = [
                    xj[0] - xi[0],
                    xj[1] - xi[1],
                    xj[2] - xi[2],
                ]
                distance = np.sqrt(displacement[0]**2 + displacement[1]**2 + displacement[2]**2)
                direction = [
                    displacement[0] / distance,
                    displacement[1] / distance,
                    displacement[2] / distance,
                ]
                force = [
                    G * mi * mj * direction[0] / distance**2,
                    G * mi * mj * direction[1] / distance**2,
                    G * mi * mj * direction[2] / distance**2,
                ]
                total_force[i, 0] += force[0]
                total_force[i, 1] += force[1]
                total_force[i, 2] += force[2]
                total_force[j, 0] += -force[0]
                total_force[j, 1] += -force[1]
                total_force[j, 2] += -force[2]

    return total_force

<br><br><br>

### Functional

In [None]:
from functools import reduce
from itertools import combinations

In [None]:
def functional_forces(m, x, p):
    def negate(vector):
        return [-a for a in vector]

    def add(*vectors):
        return [reduce(lambda a, b: a + b, components) for components in zip(*vectors)]

    def subtract(vectorA, vectorB):
        return add(vectorA, negate(vectorB))

    def magnitude(vector):
        return np.sqrt(reduce(lambda a, b: a + b, map(lambda a: a**2, vector)))

    def force(mi, mj, xi, xj, pi, pj):
        displacement = subtract(xi, xj)
        distance = magnitude(displacement)
        direction = [a / distance for a in displacement]
        return [G * mi * mj * a / distance**2 for a in direction]

    pairwise_forces = [
        ((i, j), force(mi, mj, xi, xj, pi, pj))
        for ((i, (mi, xi, pi)), (j, (mj, xj, pj))) in combinations(enumerate(zip(m, x, p)), 2)
    ]

    def partial_forces(pairwise_forces, i):
        return (
            [force for (_, check), force in pairwise_forces if i == check] +
            [negate(force) for (check, _), force in pairwise_forces if i == check]
        )

    return np.array([add(*partial_forces(pairwise_forces, i)) for i in range(len(m))])

<br><br><br>

### Array-oriented

In [None]:
def array_forces(m, x, p):
    i, j = np.triu_indices(len(x), k=1)
    pw_displacement = x[j] - x[i]
    pw_distance = np.sqrt(np.sum(pw_displacement**2, axis=-1))
    pw_direction = pw_displacement / pw_distance[:, np.newaxis]
    pw_force = G * m[i, np.newaxis] * m[j, np.newaxis] * pw_direction / pw_distance[:, np.newaxis]**2
    total_force = np.zeros_like(x)
    np.add.at(total_force, i, pw_force)
    np.add.at(total_force, j, -pw_force)
    return total_force
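
Two calls in `array_forces` do most of the bookkeeping: `np.triu_indices` enumerates each unordered pair once, and `np.add.at` accumulates contributions even when an index repeats. The sketch below isolates them on a 4-item toy case; the `contribution` array is made up for illustration.

```python
import numpy as np

# All unordered pairs (i < j) among 4 items: 4*3/2 = 6 pairs.
i, j = np.triu_indices(4, k=1)
print(i)  # [0 0 0 1 1 2]
print(j)  # [1 2 3 2 3 3]

# One made-up contribution per pair, to be accumulated by the first index.
contribution = np.ones(len(i))

# Fancy-index += applies only one update per repeated index...
buffered = np.zeros(4)
buffered[i] += contribution

# ...whereas np.add.at accumulates every occurrence.
accumulated = np.zeros(4)
np.add.at(accumulated, i, contribution)

print(buffered)     # [1. 1. 1. 0.]
print(accumulated)  # [3. 2. 1. 0.]
```

This is why the force totals are built with `np.add.at` rather than `total_force[i] += pw_force`: each planet appears in many pairs, and every pair has to count.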

<br><br><br>

In [None]:
imperative_forces(m, x, p)

In [None]:
functional_forces(m, x, p)

In [None]:
array_forces(m, x, p)

<br><br><br>

Let's take a minute and scroll over the three implementations.

What do you notice about them? _(Just shout out answers!)_

<br><br><br>

### For fun, let's see them run

In [None]:
import matplotlib.pyplot as plt
from matplotlib import animation
from IPython.display import HTML

In [None]:
def array_step(m, x, p, dt):
    # leapfrog (kick-drift-kick) integration: a numerically stable way of updating positions and momenta
    p += array_forces(m, x, p) * (dt/2)    # half kick
    x += p * dt / m[:, np.newaxis]         # full drift
    p += array_forces(m, x, p) * (dt/2)    # half kick
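
To see what "numerically stable" buys, here is a small sketch on a simpler system than the planets: a unit mass on a unit spring, so the force is just `-x`. The functions `leapfrog`, `euler`, and `energy` are hypothetical helpers written for this comparison, and the step size and step count are arbitrary choices.

```python
def leapfrog(x, p, dt, num_steps):
    # same half kick / full drift / half kick pattern, with force = -x
    for _ in range(num_steps):
        p += -x * (dt / 2)   # half kick
        x += p * dt          # full drift
        p += -x * (dt / 2)   # half kick
    return x, p

def euler(x, p, dt, num_steps):
    # naive update: position and momentum both use the old state
    for _ in range(num_steps):
        x, p = x + p * dt, p - x * dt
    return x, p

def energy(x, p):
    # kinetic plus potential energy for unit mass and unit spring constant
    return 0.5 * p**2 + 0.5 * x**2

initial_energy = energy(1.0, 0.0)
leapfrog_energy = energy(*leapfrog(1.0, 0.0, 0.01, 10_000))
euler_energy = energy(*euler(1.0, 0.0, 0.01, 10_000))

print(initial_energy, leapfrog_energy, euler_energy)
```

With these settings the leapfrog energy stays very close to its starting value, while the naive update lets the energy drift upward by more than a factor of two.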

In [None]:
def plot(m, x, p, dt, num_frames=100, steps_per_frame=10):
    num_particles = len(m)

    history = np.empty((num_frames, num_particles, 2))
    for i in range(num_frames):
        history[i, :, 0] = x[:, 0]
        history[i, :, 1] = x[:, 1]
        for _ in range(steps_per_frame):
            array_step(m, x, p, dt)

    fig, ax = plt.subplots(figsize=(5, 5))

    lines = []
    for j in range(num_particles):
        lines.append(ax.plot(history[:1, j, 0], history[:1, j, 1])[0])
    dots = ax.scatter(history[0, :, 0], history[0, :, 1])

    ax.set_xlim(-2, 2)
    ax.set_ylim(-2, 2)

    def update(i):
        for j, line in enumerate(lines):
            line.set_xdata(history[:i, j, 0])
            line.set_ydata(history[:i, j, 1])
        dots.set_offsets(history[i, :, :])
        return [*lines, dots]

    ani = animation.FuncAnimation(fig=fig, func=update, frames=num_frames, interval=50, blit=True)
    out = HTML(ani.to_jshtml())
    plt.close()
    return out

**Sun, Earth, Moon:**

In [None]:
m = np.array([100, 1, 1], np.float64)
x = np.array([[0, 0, 0], [0, 0.9, 0], [0, 1.1, 0]], np.float64)
p = np.array([[0, 0, 0], [-13, 0, 0], [-10, 0, 0]], np.float64)

plot(m, x, p, dt=0.001)

**The three-body problem:**

In [None]:
a = 0.347111
b = 0.532728
m = np.array([1, 1, 1], np.float64)
x = np.array([[-1, 0, 0], [1, 0, 0], [0, 0, 0]], np.float64)
p = np.array([[a, b, 0], [a, b, 0], [-2 * a, -2 * b, 0]], np.float64)

plot(m, x, p, dt=0.01)

**Chaos!**

In [None]:
m = np.ones(25)
x = np.random.normal(0, 1, (25, 3))
p = np.random.normal(0, 1, (25, 3))

plot(m, x, p, dt=0.0025)

<br><br><br>

### More seriously, let's time it

In [None]:
m = np.ones(500)
x = np.random.normal(0, 1, (500, 3))
p = np.random.normal(0, 1, (500, 3))

In [None]:
%%timeit -n1 -r3

imperative_forces(m, x, p)

In [None]:
%%timeit -n1 -r3

functional_forces(m, x, p)

In [None]:
%%timeit -n1 -r3

array_forces(m, x, p)

<br><br><br>

| Implementation | Lines of Python | Python byte-code instructions | Scaling |
|:--:|:--:|:--:|:--|
| `imperative_forces` | 25 | 214 | for each of the 500×499/2 pairs of planets |
| `functional_forces` | 12 | 100 | for each of the 500×499/2 pairs of planets |
| `array_forces` | 9 | 103 | only once, not once per pair of planets |

<br><br><br>

Python's virtual machine, dynamic data types, garbage collection, etc. make stepping through a line of Python code much more time-consuming than a bare-metal machine code instruction. Managing stack frames when calling functions is even worse.

The key point is that array-oriented programming avoids this overhead in the part of the problem that scales. One Python function call can invoke a billion machine code instructions:

In [None]:
def run(x):
    return x + x

In [None]:
# a billion float64 values is 8 GB; use a smaller size if memory is limited
big_array = np.empty(1_000_000_000)

In [None]:
run(big_array)

<br><br><br>

In [None]:
import dis

In [None]:
dis.disassemble(run.__code__)
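
The same contrast can be sketched at a size that fits comfortably in memory. The function names `add_with_loop` and `add_with_array` and the array length are assumptions made for this illustration; exact timings and instruction counts depend on your machine and Python version, so none are quoted here.

```python
import dis
import time

import numpy as np

def add_with_loop(x):
    # one trip through the Python interpreter per element
    out = np.empty_like(x)
    for k in range(len(x)):
        out[k] = x[k] + x[k]
    return out

def add_with_array(x):
    # one trip through the interpreter; the loop over elements runs in compiled code
    return x + x

values = np.arange(200_000, dtype=np.float64)

start = time.perf_counter()
loop_result = add_with_loop(values)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
array_result = add_with_array(values)
array_seconds = time.perf_counter() - start

# The number of bytecode instructions is fixed by the source, not by the array size.
num_instructions = len(list(dis.get_instructions(add_with_array)))

print(f"loop: {loop_seconds:.4f} s, array: {array_seconds:.6f} s")
print(f"add_with_array compiles to {num_instructions} bytecode instructions")
```

Both functions return the same array; what changes is how many times the interpreter is involved.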

<br><br><br>

You could see array-oriented programming as a work-around for Python being a "slow language," but it's more general than that. It's the same idea as vectorization at the hardware level, so if you're programming GPUs in any language, you need to be "thinking in arrays."

<br><br><br>

Array-oriented programming is also a good mental fit for data analysis.

Of all the array-oriented languages and libraries that have ever existed, most of them have been intended for data analysis:

![](../img/apl-timeline.svg)

<br><br><br>

Consider: which of the following is the most idiomatic Pandas?

In [None]:
import pandas as pd

df = pd.DataFrame({"x": np.random.normal(0, 1, 1000), "y": np.random.normal(0, 1, 1000)})

<br><br><br>

This one?

In [None]:
z = df[ df["x"] > df["y"] ]["x"]**2

Or this one?

In [None]:
z = df.query("x > y")["x"].apply(lambda x: x**2)

Or this one?

In [None]:
z = []
for row in df.itertuples():
    if row.x > row.y:
        z.append(row.x**2)
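
Whichever one you pick, the three spellings compute the same numbers. A quick sketch to check that, using a seeded generator so that it is reproducible (the seed and the names `z_array`, `z_apply`, and `z_loop` are arbitrary choices for this illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(12345)  # fixed seed, chosen arbitrarily
df = pd.DataFrame({"x": rng.normal(0, 1, 1000), "y": rng.normal(0, 1, 1000)})

# whole-column mask, whole-column power
z_array = df[df["x"] > df["y"]]["x"] ** 2

# whole-column selection, then a Python function called once per row
z_apply = df.query("x > y")["x"].apply(lambda x: x**2)

# explicit Python loop over rows
z_loop = []
for row in df.itertuples():
    if row.x > row.y:
        z_loop.append(row.x**2)

print(np.allclose(z_array, z_apply), np.allclose(z_array, z_loop))
```

The difference between them is in how much of the work happens one row at a time in Python.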

<br><br><br>

## How to think in arrays

The best way to learn is by solving problems, so we should get to the first challenge exercise soon.

Before that, let's do one together.

<br><br><br>

Compute the length of the curve whose $x$, $y$ positions are given by arrays `x` and `y`:

In [None]:
t = np.linspace(0, 2*np.pi, 10000)
x = np.sin(3*t)
y = np.sin(4*t)

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.plot(x, y);

<br><br><br>

Hint: the main steps of most array-oriented problems involve
* slicing arrays to align relevant parts
* mapping a single mathematical operation across all elements of equal-sized arrays
* reducing an array to a scalar.
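
As a sketch of those three steps on a different problem, so as not to give away the exercise, here is a trapezoid-rule integral of $\sin t$ from $0$ to $\pi$. The sample count and the variable names are arbitrary choices for this illustration.

```python
import numpy as np

t = np.linspace(0, np.pi, 1001)
f = np.sin(t)

# 1. slice: align each sample with its right-hand neighbor
left_t, right_t = t[:-1], t[1:]
left_f, right_f = f[:-1], f[1:]

# 2. map: area of every trapezoid at once
areas = 0.5 * (left_f + right_f) * (right_t - left_t)

# 3. reduce: collapse the array of areas to one number
integral = np.sum(areas)

print(integral)  # close to 2, the exact integral of sin from 0 to pi
```

Note that the sliced arrays are one element shorter than the originals: 1001 samples give 1000 trapezoids.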

<br><br><br>

The formula for the length of a line segment is

![](../img/length-by-segment.svg)

You need to find the length of the whole line given by `x` and `y`.

<br><br><br>

<br><br><br>

When you're done, move on to [project.ipynb](project.ipynb)!