# Part 1: Array-oriented programming

## What is "array-oriented programming"?

<br>

The way I'll be using the word, it's a programming paradigm, alongside paradigms like "imperative," "object-oriented," and "functional."

<br><br>

**In array-oriented programming, the primary data type is an array, and most functions perform one operation on all the elements of the array.**

<br><br>

```python
import numpy as np
```

For instance, this is _not_ array-oriented:

<br>

```python
input_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
output_data = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0])

for i in range(len(input_data)):             # explicitly specifies an order of execution
    output_data[i] = input_data[i]**2        # user says what happens to each element

output_data
```

```
array([ 1,  4,  9, 16, 25, 36, 49, 64, 81])
```

And this is _not_ array-oriented:

<br>

```python
input_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

output_data = np.fromiter(
    map(lambda x: x**2, input_data), int     # still focused on the individual element "x"
)

output_data
```

```
array([ 1,  4,  9, 16, 25, 36, 49, 64, 81])
```

_This_ is array-oriented:

<br>

```python
input_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

output_data = input_data**2                  # implicit indexes, no individual elements

output_data
```

```
array([ 1,  4,  9, 16, 25, 36, 49, 64, 81])
```
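The same contrast holds when the per-element work involves a condition. Here is a minimal sketch; the square-the-evens, negate-the-odds rule is made up for illustration, and `np.where` is just one NumPy way to express it. The imperative version branches inside the loop, while the array-oriented version evaluates the condition on the whole array at once.

```python
import numpy as np

input_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Imperative: an explicit loop with a branch for each element
looped = np.zeros_like(input_data)
for i in range(len(input_data)):
    if input_data[i] % 2 == 0:
        looped[i] = input_data[i] ** 2
    else:
        looped[i] = -input_data[i]

# Array-oriented: the condition is itself an array of booleans,
# and np.where picks, element by element, from two whole-array results
vectorized = np.where(input_data % 2 == 0, input_data**2, -input_data)

print(vectorized)  # [-1  4 -3 16 -5 36 -7 64 -9]
```

Note that `np.where` computes both `input_data**2` and `-input_data` for every element. That is a typical trade-off of the style: more total arithmetic, but none of it in a Python-level loop.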

As with all programming paradigms, there isn't a sharply defined rule to cleanly separate them, and a single codebase can use several paradigms.

<br>

They are styles, and they're useful because they each bring different programming concepts into the foreground:

| Paradigm | Emphasizes |
|:-:|:-:|
| imperative/procedural | low-level algorithms |
| object-oriented | large-scale program structure |
| actor-based | temporal locality |
| literate | human instruction |
| event-driven | causal structure |
| declarative | properties of desired result |
| symbolic | formula transformations |
| functional | data transformations |
| array-oriented | data distributions |

## What array-oriented programming is good for

<br>

Every language or major library I know of in which array-oriented programming is a major feature:

<br>

<img src="../img/apl-timeline.svg" width="100%">

Almost all of them are intended as _interactive data-analysis_ tools.

(Only Fortran 90 is not interactive.)

<br><br>

* interactive REPL (read-evaluate-print loop)
* concise notation
* unabashedly mathematical

### The grandfather: APL

<table><tr>
    <td width="25%"><img src="../img/apl-book.png" width="100%"></td>
    <td width="50%"><img src="../img/apl-keyboard.jpg" width="100%"></td>
</tr></table>

<div style="overflow: hidden;"><iframe src="https://tryapl.org/" width="100%" height="380" scrolling="no" style="border: none;"></iframe></div>

<div style="overflow: hidden;"><iframe src="https://app.sli.do/event/rbr8JR3hY4WEZ9CpWm94Xg/embed/polls/d92f941a-23fc-494d-a18b-8163205dc779" width="100%" height="280" scrolling="no" style="border: none;"></iframe></div>

**Answers:**

```apl
      (⍳10) - 1
0 1 2 3 4 5 6 7 8 9

      +/(⍳10) - 1
45

      +\(⍳10) - 1
0 1 3 6 10 15 21 28 36 45
```
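For comparison, here is a sketch of the same three expressions in NumPy. APL's `/` (reduce) and `\` (scan) are operators that take a function such as `+`; NumPy's closest analogues are the `.reduce` and `.accumulate` methods that every ufunc (like `np.add`) carries. The translation below is mine, not part of the original answers.

```python
import numpy as np

# APL's ⍳10 is 1 2 ... 10 (with the default index origin of 1),
# so this mirrors (⍳10) - 1 step by step
iota10 = np.arange(1, 11)
values = iota10 - 1
print(values)                     # [0 1 2 3 4 5 6 7 8 9]

# +/ ("plus reduce"): collapse the array with +
print(np.add.reduce(values))      # 45

# +\ ("plus scan"): all the running sums
print(np.add.accumulate(values))  # [ 0  1  3  6 10 15 21 28 36 45]
```

In everyday NumPy code the same results are usually spelled `np.sum(values)` and `np.cumsum(values)`.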

<br>

APL was too concise! Modern array-oriented programming looks for a better balance between concision and readability.

<center>
<img src="../img/tshirt.jpg" width="20%">
</center>

### Distributions and interactivity

Array-oriented languages bring data _distributions_ to the foreground.

<br>

```python
from hist import Hist  # Scikit-HEP histogram library
```

<br>

Given a large dataset...

```python
dataset = np.random.normal(0, 1, 1000000)  # one MILLION data points
```

<br>

How are the data _distributed_?

```python
Hist.new.Reg(100, -5, 5).Double().fill(dataset)  # 100 regular bins from -5 to 5
```
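If `hist` isn't installed, NumPy alone can give a rough, text-only view of the same distribution. This is a minimal sketch that swaps in `np.histogram` for `hist` (it is not what the notebook uses) and assumes a seeded generator so the counts are reproducible.

```python
import numpy as np

rng = np.random.default_rng(12345)          # seeded generator, for reproducibility
dataset = rng.normal(0, 1, 1_000_000)

# Same binning as the Hist call above: 100 regular bins from -5 to 5
counts, edges = np.histogram(dataset, bins=100, range=(-5, 5))

# A crude text plot of the 20 bins between -1 and +1: one '#' per 5000 entries
for lo, hi, n in zip(edges[40:60], edges[41:61], counts[40:60]):
    print(f"{lo:+.1f} to {hi:+.1f} | {'#' * (n // 5000)}")
```

The binning call is itself array-oriented: one function call sorts a million values into bins, with no Python-level loop over the data.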

What happens if we apply a function to _all values in the distribution_?

```python
dataset2 = dataset**2
```

<br>

```python
Hist.new.Reg(100, -1, 10).Double().fill(dataset2)
```

Can anyone guess what this distribution will look like?

```python
dataset3 = np.sin(1/dataset2)
```

<br>

(I can't.)

```python
Hist.new.Reg(100, -1, 1).Double().fill(dataset3)
```

**Human readability advantage:**

  * Mathematical expressions are concise, which makes them more convenient to type interactively.

<br>

**Computational advantage:**

  * The right _part_ of the computation is accelerated: the loop over all values in the distribution.

## NumPy

<center>
<img src="../img/Numpy_Python_Cheat_Sheet.svg" width="75%">
</center>

NumPy's version of

```apl
      10 20 30 + 1 2 3
11 22 33
```

is

```python
np.array([10, 20, 30]) + np.array([1, 2, 3])
```

```
array([11, 22, 33])
```
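APL also lets a scalar combine with an array (`10 20 30 + 1` gives `11 21 31`), and it has an outer product, `10 20 30 ∘.+ 1 2 3`. In NumPy both fall out of broadcasting. A short sketch (the pairing with APL is my own illustration):

```python
import numpy as np

a = np.array([10, 20, 30])

# A scalar is stretched ("broadcast") to match the array, like APL's 10 20 30 + 1
print(a + 1)                 # [11 21 31]

# A column (3x1) plus a row (3,) broadcasts to a 3x3 table of all pairwise sums,
# comparable to APL's 10 20 30 ∘.+ 1 2 3
table = a[:, np.newaxis] + np.array([1, 2, 3])
print(table)
# [[11 12 13]
#  [21 22 23]
#  [31 32 33]]
```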

<br><br><br>

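This one syntactic feature (arithmetic operators that act elementwise on arrays) makes arrays duck-typable in any [closed-form](https://en.wikipedia.org/wiki/Closed-form_expression) expression: a function written for scalars works unchanged on arrays.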

```python
def quadratic_formula(a, b, c):
    return (-b + np.sqrt(b**2 - 4*a*c)) / (2*a)
```

Compute the quadratic formula on one set of scalar values:

```python
a = 5
b = 10
c = -0.1

quadratic_formula(a, b, c)
```

```
0.009950493836207741
```

<br>

Compute the quadratic formula on a million values in arrays:

```python
a = np.random.uniform(5, 10, 1000000)
b = np.random.uniform(10, 20, 1000000)
c = np.random.uniform(-0.1, 0.1, 1000000)

quadratic_formula(a, b, c)
```

```
array([ 0.00088191,  0.00325183, -0.00205543, ...,  0.00078065,
       -0.00316134,  0.00641409])
```
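Because the formula is applied elementwise, each output should be a root of its own `(a, b, c)` triple. Here is a quick sanity check, sketched with a seeded generator (an assumption, so the check is reproducible). With these ranges, `b**2 - 4*a*c` is always positive, so no `nan`s appear.

```python
import numpy as np

def quadratic_formula(a, b, c):
    return (-b + np.sqrt(b**2 - 4*a*c)) / (2*a)

rng = np.random.default_rng(42)
a = rng.uniform(5, 10, 1_000_000)
b = rng.uniform(10, 20, 1_000_000)
c = rng.uniform(-0.1, 0.1, 1_000_000)

x = quadratic_formula(a, b, c)

# Plug every root back into its own polynomial, all at once
residual = a*x**2 + b*x + c
print(np.abs(residual).max())   # tiny, on the order of floating-point round-off

# Spot-check one element against the same function called on scalars
i = 123
print(np.isclose(x[i], quadratic_formula(a[i], b[i], c[i])))  # True
```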