## NumPy Times Table Example

Suppose you want to create a times-table. Here's how you do it with a loop:

In [25]:
import numpy as np
times_table = np.ones((12,12),dtype=int)
for a in range(1,13):
    for b in range(1,13):
        times_table[a-1,b-1] = a*b
times_table

array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44,  48],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77,  84],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99, 108],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120],
       [ 11,  22,  33,  44,  55,  66,  77,  88,  99, 110, 121, 132],
       [ 12,  24,  36,  48,  60,  72,  84,  96, 108, 120, 132, 144]])

Let's look at how to do this much more efficiently using array operations, letting the mathematical regularity
of the structure we're trying to build help us.

We're going to use broadcasting to get it done.
Recall that `numpy` applies arithmetic elementwise, so it interprets the following as multiplying every element of the array by 7.

In [24]:
7 * np.arange(1,13)

array([ 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84])

This, of course, is row 7 of our goal, a times table.  The idea is to do something
like this to a 2D array.  Here's how.

In [3]:
ones_12x12 = np.ones((12,12),dtype=int)
ones_12x12

array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])

In the next example, broadcasting naturally applies multiplication by the array on the right to each
of the **rows** of the array on the left.

In [34]:
times_table = ones_12x12 * np.arange(1,13)
print(times_table)

[[ 1  2  3  4  5  6  7  8  9 10 11 12]
 [ 1  2  3  4  5  6  7  8  9 10 11 12]
 [ 1  2  3  4  5  6  7  8  9 10 11 12]
 [ 1  2  3  4  5  6  7  8  9 10 11 12]
 [ 1  2  3  4  5  6  7  8  9 10 11 12]
 [ 1  2  3  4  5  6  7  8  9 10 11 12]
 [ 1  2  3  4  5  6  7  8  9 10 11 12]
 [ 1  2  3  4  5  6  7  8  9 10 11 12]
 [ 1  2  3  4  5  6  7  8  9 10 11 12]
 [ 1  2  3  4  5  6  7  8  9 10 11 12]
 [ 1  2  3  4  5  6  7  8  9 10 11 12]
 [ 1  2  3  4  5  6  7  8  9 10 11 12]]


We now have a table with 1's in the first column, 2's in  the second, and so on.
Transpose this and we have

In [35]:
times_table.T

array([[ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5],
       [ 6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6],
       [ 7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7],
       [ 8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8],
       [ 9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9],
       [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
       [11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11],
       [12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12]])

Apply row-wise multiplication by 1-12 again, and we have a times table, created without any loops.

In [36]:
times_table.T * np.arange(1,13)

array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44,  48],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77,  84],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99, 108],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120],
       [ 11,  22,  33,  44,  55,  66,  77,  88,  99, 110, 121, 132],
       [ 12,  24,  36,  48,  60,  72,  84,  96, 108, 120, 132, 144]])

Let's do it all in one step:

In [33]:
(np.ones((12,12),dtype=int) * np.arange(1,13)).T * np.arange(1,13)

array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44,  48],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77,  84],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99, 108],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120],
       [ 11,  22,  33,  44,  55,  66,  77,  88,  99, 110, 121, 132],
       [ 12,  24,  36,  48,  60,  72,  84,  96, 108, 120, 132, 144]])
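As a quick sanity check, here is a minimal sketch comparing the loop version with the one-step version. The names `loop_table` and `one_step` are illustrative and do not appear in the cells above.

```python
import numpy as np

# Loop version, as in the first cell of this section.
loop_table = np.ones((12, 12), dtype=int)
for a in range(1, 13):
    for b in range(1, 13):
        loop_table[a - 1, b - 1] = a * b

# Loop-free version: broadcast 1..12 across the rows, transpose, broadcast again.
one_step = (np.ones((12, 12), dtype=int) * np.arange(1, 13)).T * np.arange(1, 13)

print(np.array_equal(loop_table, one_step))  # True
```

If the two arrays ever disagreed, `np.array_equal` would print `False`, which makes this a cheap test to keep around while experimenting.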

Next we'll see how this can be done much more easily by exploiting the full power of broadcasting.

### Broadcasting in different dimensions; Two-way broadcasting

Here we use our times table example to try to clarify the distinction between row-wise
and column-wise broadcasting, and to better understand how to exploit it. In the process, we describe an alternative way of building the times table that is much simpler.

Broadcasting a 1D array to match a 2D array always works in a **row-wise** direction.  The general rule is, the 1D array has to match the last dimension of the 2D array:
(12,) can broadcast to match (12,12) or (11,12) but not (12,11).

In [51]:
np.ones((11,12)) * np.arange(1,13) 

array([[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.]])

In [36]:
# ValueError: operands could not be broadcast together with shapes (12,11) (12,) 
# np.ones((12,11)) * np.arange(1,13) 
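To see both cases run side by side without stopping the notebook, here is a small sketch that catches the error. The names `ok` and `failed` are illustrative only.

```python
import numpy as np

one_twelve = np.arange(1, 13)          # shape (12,)

# The trailing dimension is 12, so the 1D array lines up with each row.
ok = np.ones((11, 12)) * one_twelve
print(ok.shape)                        # (11, 12)

# The trailing dimension is 11, so the 12 elements have nothing to match.
try:
    np.ones((12, 11)) * one_twelve
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```

The second multiplication raises a `ValueError`, which the `except` clause reports instead of halting execution.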

Consider the version that worked.  To get 

```
ones_11x12 * np.arange(1,13)
```

to work, we broadcast the 1D vector

```
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
```

into a 2D array each of whose **rows** is a copy of
the 1D vector.

```
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]])
```
This has the shape we want, 11x12, enabling 
us to do element-wise multiplication with the 11x12 ones matrix.

In the case of the 12x11 matrix, row-wise broadcasting from (12,) to 12x12 is of 
no use, so this is where we want column-wise broadcasting. That is achieved by reshaping the 1D 
vector of shape (12,) into a 2D array of shape (12,1):

```
>>> col_vec = np.arange(1,13).reshape(12,-1)
>>> col_vec
array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12]])
```

And a 12x1 2D array **can** be broadcast to a 12x11 shape.  In general with 2D arrays,
if there is a dimension of size 1, broadcasting can happen along that dimension.
So 

```
MxN * Mx1
MxN * 1xN
MxN * 1x1
```

all work.  And of course the last case is just multiplication by a scalar.
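One way to check shape compatibility without building any arrays is `np.broadcast_shapes`. The following is a sketch that assumes NumPy 1.20 or later (where that function is available); the sizes `M` and `N` are arbitrary illustrative choices.

```python
import numpy as np

M, N = 12, 11   # illustrative sizes

# Each of these pairings broadcasts to the full (M, N) shape.
for other in [(M, 1), (1, N), (1, 1)]:
    print((M, N), "with", other, "->", np.broadcast_shapes((M, N), other))

# A 1D shape of length M does not line up with the trailing dimension N.
try:
    np.broadcast_shapes((M, N), (M,))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)   # True
```

The last case is the 12x11 failure seen earlier, this time detected from the shapes alone.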

Demonstrating.

In [43]:
col_vec = np.arange(1,13).reshape(12,-1);  print(col_vec.shape)
col_vec

(12, 1)


array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12]])

In [53]:
np.ones((12,12)) * col_vec

array([[ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.],
       [ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.],
       [ 7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.],
       [ 8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.],
       [ 9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.],
       [10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.],
       [11., 11., 11., 11., 11., 11., 11., 11., 11., 11., 11., 11.],
       [12., 12., 12., 12., 12., 12., 12., 12., 12., 12., 12., 12.]])

Now the surprise, if you're not used
to broadcasting rules: we only need to
multiply `col_vec` by the 1D vector `np.arange(1,13)` to get our times table.

The reason is that in this case **both** operands are broadcastable,
and in that situation, **both** operands can be broadcast: `col_vec` 
will be broadcast columnwise, giving us 12 1-12 columns;
the 1D vec will complete the product by broadcasting rowwise.


```python
array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12]])
*
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
```

becomes


```python
array([[ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5],
       [ 6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6],
       [ 7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7],
       [ 8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8],
       [ 9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9],
       [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
       [11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11],
       [12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12]])
*
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]])
```
and we have  our times table:

In [2]:
import numpy as np

col_vec = np.arange(1,13).reshape(12,-1);  print(col_vec.shape,end=" ")
one_twelve = np.arange(1,13);  print(one_twelve.shape)
col_vec *  one_twelve

(12, 1) (12,)


array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44,  48],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77,  84],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99, 108],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120],
       [ 11,  22,  33,  44,  55,  66,  77,  88,  99, 110, 121, 132],
       [ 12,  24,  36,  48,  60,  72,  84,  96, 108, 120, 132, 144]])
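The two-way expansion can also be made explicit. The sketch below uses `np.broadcast_to` to materialize what each operand looks like after broadcasting; the names `col_expanded`, `row_expanded`, and `explicit` are illustrative. Note that `np.broadcast_to` returns a read-only view rather than a copy, so no 12x12 data is actually duplicated.

```python
import numpy as np

col_vec = np.arange(1, 13).reshape(12, 1)   # shape (12, 1)
one_twelve = np.arange(1, 13)               # shape (12,)

# Materialize the two expansions that broadcasting performs implicitly.
col_expanded = np.broadcast_to(col_vec, (12, 12))     # row i holds only the value i+1
row_expanded = np.broadcast_to(one_twelve, (12, 12))  # every row is 1..12

# Multiplying the expanded operands elementwise gives the same table.
explicit = col_expanded * row_expanded
print(np.array_equal(explicit, col_vec * one_twelve))  # True
```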

Note: What's interesting about the above is that it is being done
as a multiplication with the help of **two way** broadcasting.  In this case, the same result
can be achieved by matrix multiplication of a column vector
with a row vector.

In [14]:
col_vec@col_vec.T

array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44,  48],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77,  84],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99, 108],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120],
       [ 11,  22,  33,  44,  55,  66,  77,  88,  99, 110, 121, 132],
       [ 12,  24,  36,  48,  60,  72,  84,  96, 108, 120, 132, 144]])

This is what's known as the **outer product**; mathematically, this is usually conceived as an operation
between two vectors (hence 1D), and numpy provides this as a function that will accept vectors.

```python
one_twelve = np.arange(1,13)
np.outer(one_twelve,one_twelve)
```

is equivalent to the above matrix product.
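A short sketch confirming that the three routes agree; the variable names `by_broadcast`, `by_matmul`, and `by_outer` are illustrative.

```python
import numpy as np

one_twelve = np.arange(1, 13)
col_vec = one_twelve.reshape(12, 1)

by_broadcast = col_vec * one_twelve           # two-way broadcasting
by_matmul = col_vec @ col_vec.T               # (12,1) @ (1,12) matrix product
by_outer = np.outer(one_twelve, one_twelve)   # outer product of two 1D vectors

print(np.array_equal(by_broadcast, by_matmul))  # True
print(np.array_equal(by_broadcast, by_outer))   # True
```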

In general, there aren't other operations that easily achieve the effects of broadcasting,
especially two-way broadcasting. For example, for creating an addition table:

In [47]:
add = np.arange(13)[:,np.newaxis] + np.arange(13)
add

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13],
       [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       [ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17],
       [ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
       [ 7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [ 8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
       [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]])

is hard to beat.  

In [48]:
add[3,7]

10
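As a side note, NumPy ufuncs have an `outer` method, so for this particular table `np.add.outer` gives the same result. The sketch below only cross-checks the two; broadcasting remains the more general tool, since it works for arbitrary compatible shapes and expressions. The names `zero_twelve` and `add_outer` are illustrative.

```python
import numpy as np

zero_twelve = np.arange(13)
add = zero_twelve[:, np.newaxis] + zero_twelve

# ufunc.outer applies the ufunc to every pair (i, j), here i + j.
add_outer = np.add.outer(zero_twelve, zero_twelve)

print(np.array_equal(add, add_outer))  # True
print(add_outer[3, 7])                 # 10
```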

Broadcasting is a very powerful tool.  The efficiency gains are
demonstrated at the end of the notebook.

Another example: a "times-table" with strings, used to create a sequence of unique ids of length `n*26`:

In [40]:
import numpy as np
import string

n=5
x=np.array(list(string.ascii_lowercase),dtype=object)
string_times_table = x[:,np.newaxis]*np.arange(1,n+1)
print(string_times_table)
# Flatten in column-major (=Fortran-style) order
# print(string_times_table.flatten.__doc__)
string_times_table.flatten(order="F")

[['a' 'aa' 'aaa' 'aaaa' 'aaaaa']
 ['b' 'bb' 'bbb' 'bbbb' 'bbbbb']
 ['c' 'cc' 'ccc' 'cccc' 'ccccc']
 ['d' 'dd' 'ddd' 'dddd' 'ddddd']
 ['e' 'ee' 'eee' 'eeee' 'eeeee']
 ['f' 'ff' 'fff' 'ffff' 'fffff']
 ['g' 'gg' 'ggg' 'gggg' 'ggggg']
 ['h' 'hh' 'hhh' 'hhhh' 'hhhhh']
 ['i' 'ii' 'iii' 'iiii' 'iiiii']
 ['j' 'jj' 'jjj' 'jjjj' 'jjjjj']
 ['k' 'kk' 'kkk' 'kkkk' 'kkkkk']
 ['l' 'll' 'lll' 'llll' 'lllll']
 ['m' 'mm' 'mmm' 'mmmm' 'mmmmm']
 ['n' 'nn' 'nnn' 'nnnn' 'nnnnn']
 ['o' 'oo' 'ooo' 'oooo' 'ooooo']
 ['p' 'pp' 'ppp' 'pppp' 'ppppp']
 ['q' 'qq' 'qqq' 'qqqq' 'qqqqq']
 ['r' 'rr' 'rrr' 'rrrr' 'rrrrr']
 ['s' 'ss' 'sss' 'ssss' 'sssss']
 ['t' 'tt' 'ttt' 'tttt' 'ttttt']
 ['u' 'uu' 'uuu' 'uuuu' 'uuuuu']
 ['v' 'vv' 'vvv' 'vvvv' 'vvvvv']
 ['w' 'ww' 'www' 'wwww' 'wwwww']
 ['x' 'xx' 'xxx' 'xxxx' 'xxxxx']
 ['y' 'yy' 'yyy' 'yyyy' 'yyyyy']
 ['z' 'zz' 'zzz' 'zzzz' 'zzzzz']]


array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
       'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
       'aa', 'bb', 'cc', 'dd', 'ee', 'ff', 'gg', 'hh', 'ii', 'jj', 'kk',
       'll', 'mm', 'nn', 'oo', 'pp', 'qq', 'rr', 'ss', 'tt', 'uu', 'vv',
       'ww', 'xx', 'yy', 'zz', 'aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff',
       'ggg', 'hhh', 'iii', 'jjj', 'kkk', 'lll', 'mmm', 'nnn', 'ooo',
       'ppp', 'qqq', 'rrr', 'sss', 'ttt', 'uuu', 'vvv', 'www', 'xxx',
       'yyy', 'zzz', 'aaaa', 'bbbb', 'cccc', 'dddd', 'eeee', 'ffff',
       'gggg', 'hhhh', 'iiii', 'jjjj', 'kkkk', 'llll', 'mmmm', 'nnnn',
       'oooo', 'pppp', 'qqqq', 'rrrr', 'ssss', 'tttt', 'uuuu', 'vvvv',
       'wwww', 'xxxx', 'yyyy', 'zzzz', 'aaaaa', 'bbbbb', 'ccccc', 'ddddd',
       'eeeee', 'fffff', 'ggggg', 'hhhhh', 'iiiii', 'jjjjj', 'kkkkk',
       'lllll', 'mmmmm', 'nnnnn', 'ooooo', 'ppppp', 'qqqqq', 'rrrrr',
       'sssss', 'ttttt', 'uuuuu', 'vvvvv', 'wwwww', 'xxxxx', 'yyyyy',
       'zzzzz'], dtype=object)
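A small sketch checking that the flattened ids really are unique and that there are `n*26` of them. The names `letters` and `ids` are illustrative.

```python
import string
import numpy as np

n = 5
letters = np.array(list(string.ascii_lowercase), dtype=object)

# (26, 1) object array times (n,) int array broadcasts to (26, n);
# each cell is Python string repetition, e.g. 'a' * 3 == 'aaa'.
ids = (letters[:, np.newaxis] * np.arange(1, n + 1)).flatten(order="F")

print(len(ids), len(set(ids)))      # 130 130
print(ids[0], ids[26], ids[-1])     # a aa zzzzz
```

Column-major flattening is what puts all the single letters first, then all the doubled letters, and so on.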
 

### Example Exercises

Convert to  percentages: 
Each cell reports the percentage of a column's total count
represented by that cell's count.

First sum the columns; then divide by the column totals.

In [23]:
import numpy as np
a = np.arange(24)
(rows,cols) = (4,6)
a2d = a.reshape((rows,cols))
print('a2d: ')
print(a2d)
col_sums = a2d.sum(axis=0)
print()
print('col sums: ')
print(col_sums)

a2d: 
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]

col sums: 
[36 40 44 48 52 56]


Now try to divide each row by  col_sums.

That is, do the equivalent of what the following for-loop does, but without
a for-loop.  You can use this output to check your results.

In [24]:
np.array([a2d[i,:]/col_sums for i in range(rows)])

array([[0.        , 0.025     , 0.04545455, 0.0625    , 0.07692308,
        0.08928571],
       [0.16666667, 0.175     , 0.18181818, 0.1875    , 0.19230769,
        0.19642857],
       [0.33333333, 0.325     , 0.31818182, 0.3125    , 0.30769231,
        0.30357143],
       [0.5       , 0.475     , 0.45454545, 0.4375    , 0.42307692,
        0.41071429]])

This can be done easily because the 1D col_sums 
vector computed in the last cell will broadcast rowwise.

Write the answer to this easy problem in the cell below.
The correct answer is a few cells below.

In [64]:

print(f'{a2d.shape} divided by {col_sums.shape}')

col_percentages = a2d/col_sums
col_percentages

(4, 6) divided by (6,)


array([[0.        , 0.025     , 0.04545455, 0.0625    , 0.07692308,
        0.08928571],
       [0.16666667, 0.175     , 0.18181818, 0.1875    , 0.19230769,
        0.19642857],
       [0.33333333, 0.325     , 0.31818182, 0.3125    , 0.30769231,
        0.30357143],
       [0.5       , 0.475     , 0.45454545, 0.4375    , 0.42307692,
        0.41071429]])

Confirm that these are indeed percentages (expressed as fractions, so each column should total 1) by summing the columns of `col_percentages`.

In [75]:
col_percentages.sum(axis=0)

array([1., 1., 1., 1., 1., 1.])

Now the more challenging problem.  The following cell computes
row sums for the 4 rows of `a2d`.  Use that row
sum vector to produce a 2d array `row_percentages` in which each cell represents
the percentage of a **row's** total count represented by that cell's count.

In [26]:
row_sums = a2d.sum(axis=1)
row_sums

array([ 15,  51,  87, 123])

So if we wrote a  for loop to compute `row_percentages`
it would be:

In [27]:
(rows,cols) = a2d.shape
np.array([a2d[:,i]/row_sums for i in range(cols)]).T

array([[0.        , 0.06666667, 0.13333333, 0.2       , 0.26666667,
        0.33333333],
       [0.11764706, 0.1372549 , 0.15686275, 0.17647059, 0.19607843,
        0.21568627],
       [0.13793103, 0.14942529, 0.16091954, 0.17241379, 0.18390805,
        0.1954023 ],
       [0.14634146, 0.15447154, 0.16260163, 0.17073171, 0.17886179,
        0.18699187]])

Your next challenge:
Define `row_percentages` in a different (more efficient) way,
through a single array-by-array division.

Confirm the correctness of your answer by showing each row
of `row_percentages` sums to 1.

In [66]:
# (4,6) divided by (4,1)
print(f'{a2d.shape} divided by {row_sums[:,np.newaxis].shape}')
row_percentages = a2d/row_sums[:,np.newaxis]
row_percentages

(4, 6) divided by (4, 1)


array([[0.        , 0.06666667, 0.13333333, 0.2       , 0.26666667,
        0.33333333],
       [0.11764706, 0.1372549 , 0.15686275, 0.17647059, 0.19607843,
        0.21568627],
       [0.13793103, 0.14942529, 0.16091954, 0.17241379, 0.18390805,
        0.1954023 ],
       [0.14634146, 0.15447154, 0.16260163, 0.17073171, 0.17886179,
        0.18699187]])

In [67]:
row_percentages.sum(axis=1)

array([1., 1., 1., 1.])
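An alternative to adding the axis afterwards with `np.newaxis` is to ask `sum` to keep it. The following sketch uses the `keepdims=True` argument; the name `row_sums_2d` is illustrative.

```python
import numpy as np

a2d = np.arange(24).reshape((4, 6))

# keepdims=True leaves the summed axis in place with size 1: shape (4, 1).
row_sums_2d = a2d.sum(axis=1, keepdims=True)
print(row_sums_2d.shape)                               # (4, 1)

# (4, 6) divided by (4, 1) broadcasts column-wise.
row_percentages = a2d / row_sums_2d
print(np.allclose(row_percentages.sum(axis=1), 1.0))   # True
```

Both spellings produce a (4, 1) divisor, so the choice between them is mostly a matter of taste.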

### Vector Quantization Example

For a slightly more computationally sophisticated example of broadcasting, have
a look at [this numpy tutorial on broadcasting, especially the section entitled "A practical Example: Vector Quantization".](https://numpy.org/doc/stable/user/basics.broadcasting.html)

This is essentially the same idea as van der Plas's KNN example, discussed in the Broadcasting slides.

### Timing results

Array computing is tricky, and efficiency issues don't always work out the way you think they will.

Even the most experienced programmers know that the best evidence comes from trying it, so below we try, and find
a significant speedup factor using broadcasting or matrix multiplication operations.  As always, mileage may vary on your individual machines.  

The one result that should be robust, though, is that the Python for-loop is the slowest,
by about an order of magnitude.

In [41]:
import timeit
import numpy as np

num_iters = 100_000

py_secs = timeit.timeit("""for a in range(1,13): 
                            for b in range(1,13): 
                              times_table[a-1,b-1] = a*b""",
                        setup="import numpy as np;times_table = np.ones((12,12),dtype=int)",                        
                        number=num_iters)

# Broadcasting Times
array_secs = timeit.timeit('one_twelve.reshape((12,1)) * one_twelve',
                            setup="import numpy as np;one_twelve=np.arange(1,13)",
                            number=num_iters)

# np.outer()
array_secs2 = timeit.timeit("""np.outer(one_twelve,one_twelve)""",
                            setup="import numpy as np;one_twelve=np.arange(1,13)",
                            number = num_iters)

# Matrix multiplication
array_secs3 = timeit.timeit("""one_twelve.reshape((12,1))@one_twelve.reshape((1,12))""",
                            setup="import numpy as np;one_twelve=np.arange(1,13)",
                            number = num_iters)


print (f"Normal Python: {py_secs:12.3f} sec")

print(f"NumPy Times: {array_secs:>14.3f} sec")

print(f"NumPy Outer Product: {array_secs2:>6.3f} sec")

print(f"NumPy Matrix Product: {array_secs3:.3f} sec")


Normal Python:        2.139 sec
NumPy Times:          0.207 sec
NumPy Outer Product:  0.288 sec
NumPy Matrix Product: 0.173 sec
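Single timing runs can be noisy. As a sketch of one way to reduce the noise, the block below uses `timeit.repeat` with callables and keeps the minimum of several runs. The function names `loop_table` and `broadcast_table` are illustrative, and unlike the cell above, the loop version here also allocates its array inside the timed function, so the numbers are not directly comparable to the ones printed above.

```python
import timeit
import numpy as np

one_twelve = np.arange(1, 13)

def loop_table():
    # Nested Python loops filling the table one cell at a time.
    table = np.ones((12, 12), dtype=int)
    for a in range(1, 13):
        for b in range(1, 13):
            table[a - 1, b - 1] = a * b
    return table

def broadcast_table():
    # Two-way broadcasting: (12, 1) times (12,).
    return one_twelve.reshape((12, 1)) * one_twelve

num_iters = 2_000   # kept small so the sketch runs quickly
best_loop = min(timeit.repeat(loop_table, number=num_iters, repeat=5))
best_broadcast = min(timeit.repeat(broadcast_table, number=num_iters, repeat=5))

print(f"loop:      {best_loop:.4f} sec")
print(f"broadcast: {best_broadcast:.4f} sec")
```

Taking the minimum over repeats is a common convention because slower runs usually reflect interference from other processes rather than the code itself.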
