In [2]:
import pandas as pd
import numpy as np

([Link to broadcasting docs](https://docs.scipy.org/doc/numpy-1.15.0/user/basics.broadcasting.html))

# What is broadcasting?

- A set of rules that lets NumPy do element-wise arithmetic between arrays of different shapes, without explicit loops

### Example

- Let's say we have three different starting values: [1,2,3]
    - Let's say we also have four different multipliers: [0.5, 0.75, 1.0, 1.25]
    
- If we want to know the value for each starting value and each multiplier, we could loop through them:

In [3]:
for starting_value in [1,2,3]:
    for multiplier in [0.5,0.75,1,1.25]:
        value = starting_value*multiplier
        print('Starting value = {}; Multiplier = {}; Value = {}'.format(starting_value, multiplier, value))

Starting value = 1; Multiplier = 0.5; Value = 0.5
Starting value = 1; Multiplier = 0.75; Value = 0.75
Starting value = 1; Multiplier = 1; Value = 1
Starting value = 1; Multiplier = 1.25; Value = 1.25
Starting value = 2; Multiplier = 0.5; Value = 1.0
Starting value = 2; Multiplier = 0.75; Value = 1.5
Starting value = 2; Multiplier = 1; Value = 2
Starting value = 2; Multiplier = 1.25; Value = 2.5
Starting value = 3; Multiplier = 0.5; Value = 1.5
Starting value = 3; Multiplier = 0.75; Value = 2.25
Starting value = 3; Multiplier = 1; Value = 3
Starting value = 3; Multiplier = 1.25; Value = 3.75


- If we wanted to calculate these values using matrix multiplication, we could multiply a column vector by a row vector (an outer product):

$$
\begin{bmatrix}1\\ 2\\ 3\end{bmatrix} \begin{bmatrix}0.5 & 0.75 & 1.0 & 1.25\end{bmatrix} = \begin{bmatrix}0.5 & 0.75 & 1.0 & 1.25\\ 1.0 & 1.5 & 2 & 2.5\\ 1.5 & 2.25 & 3.0 & 3.75\end{bmatrix}
$$

- In NumPy, we can get the same table by multiplying a column array by a row array with `*`:

In [4]:
array_starting_values = np.array([[1],[2],[3]])
array_multipliers = np.array([[0.5, 0.75, 1, 1.25]])
array_starting_values * array_multipliers

array([[0.5 , 0.75, 1.  , 1.25],
       [1.  , 1.5 , 2.  , 2.5 ],
       [1.5 , 2.25, 3.  , 3.75]])

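- **Recall**: in linear algebra, to multiply two matrices A and B, the number of columns in A must equal the number of rows in B. Thus, if A is an m x n matrix and B is an r x s matrix, we need n = r.
    - If we look at the shapes of our arrays, we can see that this condition is satisfied
    - Note, though, that `*` in NumPy is element-wise multiplication, not matrix multiplication (that is `@`). The result matches the matrix product here only because broadcasting stretches each array along its size-1 axis to a common (3, 4) shape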

In [5]:
array_starting_values.shape, array_multipliers.shape

((3, 1), (1, 4))
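
- As a rough check of how `*` relates to a true matrix product, the sketch below compares `*`, `@` and `np.outer` on the same arrays. The 2 x 2 array `a` is made up for illustration and is not part of the original example.

```python
import numpy as np

array_starting_values = np.array([[1], [2], [3]])      # shape (3, 1)
array_multipliers = np.array([[0.5, 0.75, 1, 1.25]])   # shape (1, 4)

elementwise = array_starting_values * array_multipliers     # element-wise, with broadcasting
matrix_product = array_starting_values @ array_multipliers  # true matrix product
outer = np.outer([1, 2, 3], [0.5, 0.75, 1, 1.25])           # outer product of two 1-D vectors

# For a (3, 1) column and a (1, 4) row, all three give the same (3, 4) table
print(np.allclose(elementwise, matrix_product))
print(np.allclose(elementwise, outer))

# For other shapes the two operators generally disagree
a = np.array([[1, 2], [3, 4]])
print(a * a)  # squares each entry
print(a @ a)  # row-by-column matrix product
```

- The agreement is a property of these particular shapes, not of `*` in general.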

- *But what if we flipped the order?*

In [6]:
array_multipliers*array_starting_values

array([[0.5 , 0.75, 1.  , 1.25],
       [1.  , 1.5 , 2.  , 2.5 ],
       [1.5 , 2.25, 3.  , 3.75]])

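- It still works
    - This would not be true of a matrix product: a (1, 4) matrix cannot be multiplied by a (3, 1) matrix
    - Element-wise arithmetic only needs the two shapes to be compatible: comparing dimensions from the last one backwards, each pair must be equal, or one of them must be 1
        - The logic is explained [here](https://docs.scipy.org/doc/numpy-1.15.0/user/basics.broadcasting.html#general-broadcasting-rules)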
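
- To make the compatibility rule concrete, here is a minimal sketch of it in plain Python. `broadcast_shape` is a hypothetical helper written for this note, not a NumPy function; it is compared against the shape that `np.broadcast` reports.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: shape of the broadcast result, or ValueError if incompatible."""
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both have the same number of axes
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)  # the size-1 axis is stretched to match
        else:
            raise ValueError('incompatible dimensions: {} and {}'.format(dim_a, dim_b))
    return tuple(result)

print(broadcast_shape((3, 1), (1, 4)))  # column with row
print(broadcast_shape((1, 4), (3, 1)))  # operand order does not matter
print(broadcast_shape((3,), (3, 1)))    # 1-D array with column

# NumPy reports the same shape for the first case
print(np.broadcast(np.empty((3, 1)), np.empty((1, 4))).shape)
```

- Shapes such as (3,) and (4,) fail the check, since 3 and 4 differ and neither is 1.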

- We can do any arithmetic on these arrays
    - Not just multiplication

In [7]:
array_starting_values + array_multipliers

array([[1.5 , 1.75, 2.  , 2.25],
       [2.5 , 2.75, 3.  , 3.25],
       [3.5 , 3.75, 4.  , 4.25]])

In [8]:
array_starting_values / array_multipliers

array([[2.        , 1.33333333, 1.        , 0.8       ],
       [4.        , 2.66666667, 2.        , 1.6       ],
       [6.        , 4.        , 3.        , 2.4       ]])
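
- Broadcasting is not limited to the arithmetic operators. As a small sketch (the choice of operations here is ours, not from the original notebook), comparisons, `np.maximum` and scalars follow the same rule:

```python
import numpy as np

array_starting_values = np.array([[1], [2], [3]])      # shape (3, 1)
array_multipliers = np.array([[0.5, 0.75, 1, 1.25]])   # shape (1, 4)

# Comparison: True where the starting value exceeds the multiplier
is_larger = array_starting_values > array_multipliers

# Element-wise maximum of every (starting value, multiplier) pair
larger_of_two = np.maximum(array_starting_values, array_multipliers)

# A scalar behaves like an array whose every axis has size 1
doubled = array_multipliers * 2

print(is_larger.shape, larger_of_two.shape, doubled.shape)
```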

- To be clear, let's take another example

In [9]:
array_1 = np.array([1,2,3])
array_2 = np.array([4,5,6])

- These arrays are both one dimensional
    - This is shown below

In [15]:
array_1.ndim, array_2.ndim

(1, 1)

- What happens if we add them together?

In [16]:
array_1 + array_2

array([5, 7, 9])

- It was just element-wise vector addition
    - But what if we increase their dimension from 1 to 2?
        - First, we'll increase the dimension of `array_1`

In [17]:
array_1 = array_1[np.newaxis, :]

In [19]:
array_1+array_2

array([[5, 7, 9]])

- Same values as last time, but the result is now two-dimensional, with shape (1, 3)
    - Now, we increase the dimension of `array_2`

In [20]:
array_2 = array_2[np.newaxis, :]

In [21]:
array_1+array_2

array([[5, 7, 9]])

- Again, same thing
    - Now, what if we pivot `array_2`?
        - i.e. we transpose it from a row vector to a column vector

In [23]:
array_2 = np.transpose(array_2)

In [24]:
array_1+array_2

array([[5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])

- Now the (1, 3) row and the (3, 1) column have been broadcast to a common (3, 3) shape: entry (i, j) is the i-th value of `array_2` plus the j-th value of `array_1`
    - What if we reverse the order?

In [25]:
array_2+array_1

array([[5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])

- The same thing
    - Again, we see that NumPy can perform arithmetic on arrays of different shapes, as long as each pair of dimensions either matches or has size 1 in one of the arrays
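
- The steps above can be condensed into one sketch. It shows a few equivalent ways to add an axis, makes the "stretching" explicit with `np.broadcast_to`, and ends with a pair of shapes that cannot be broadcast. The names `row`, `col` and the 4-element array are introduced here for illustration.

```python
import numpy as np

array_1 = np.array([1, 2, 3])
array_2 = np.array([4, 5, 6])

row = array_1[np.newaxis, :]   # shape (1, 3)
col = array_2[:, np.newaxis]   # shape (3, 1), same as transposing a (1, 3) array

# Two other ways to build the same column
assert np.array_equal(col, array_2.reshape(-1, 1))
assert np.array_equal(col, np.expand_dims(array_2, axis=1))

# What broadcasting does conceptually: repeat each array along its size-1 axis
row_stretched = np.broadcast_to(row, (3, 3))
col_stretched = np.broadcast_to(col, (3, 3))
assert np.array_equal(row + col, row_stretched + col_stretched)
print(row + col)

# Shapes (3,) and (4,) are incompatible: 3 != 4 and neither is 1
try:
    np.array([1, 2, 3]) + np.array([1, 2, 3, 4])
except ValueError as err:
    print('ValueError:', err)
```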

_____

# Using broadcasting to assign curves

- Let's say we have a lookup table that has the expected number of eggs per week laid by chickens of different breeds

In [90]:
df_chickens = pd.read_csv('data/chicken1.csv')

In [91]:
df_chickens.head()

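| | CHICKEN BREED NAME | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 | Week 8 | Week 9 | Week 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ameraucana | 3 | 6 | 1 | 3 | 9 | 6 | 9 | 3 | 9 | 1 |
| 1 | Ameriflower | 5 | 1 | 15 | 1 | 5 | 10 | 15 | 5 | 15 | 10 |
| 2 | Ancona | 5 | 10 | 10 | 15 | 1 | 15 | 15 | 1 | 5 | 15 |
| 3 | Andalusian | 3 | 9 | 3 | 3 | 1 | 1 | 3 | 3 | 3 | 1 |
| 4 | Antwerp Belgian Bantam | 2 | 4 | 2 | 4 | 4 | 6 | 1 | 2 | 2 | 4 |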


- Now let's say we have 10 chickens that are not all the same age
    - We'll assign their ages and breeds randomly

In [97]:
np.random.seed(0)
n_chickens = 10
df_coop = pd.DataFrame()
df_coop['Breed'] = np.random.choice(df_chickens['CHICKEN BREED NAME'], size=n_chickens)
df_coop['Week'] = np.random.randint(1,5,size=n_chickens)

In [98]:
df_coop

| | Breed | Week |
|---|---|---|
| 0 | Marans | 3 |
| 1 | Naked Neck (Turken) | 2 |
| 2 | Star | 3 |
| 3 | Super Blue Egg Layer | 4 |
| 4 | Super Blue Egg Layer | 4 |
| 5 | Ayam Cemani | 3 |
| 6 | Cubalaya | 1 |
| 7 | Jersey Giant | 2 |
| 8 | White Faced Black Spanish | 2 |
| 9 | Booted Bantam | 2 |


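- Now, we want to estimate the number of eggs produced by our coop for the next 10 weeks
    - We can do this by looking up, for each chicken, the entries of its breed's row in the lookup table, starting at the chicken's current week

- But first, we need to add a new column to the table for week 11, with all values equal to zero
    - It'll come in handy later: any week past the end of the table can point at this column and count as zero eggs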

In [99]:
df_chickens['Week 11'] = 0
array_eggs = df_chickens.iloc[:,1:].values

In [100]:
rows = np.array([list(df_chickens['CHICKEN BREED NAME']).index(x) for x in df_coop['Breed']])

# Column index for every chicken (axis 0) and each of the next 10 weeks (axis 1).
# Subtract 1 because 'Week 1' is column 0 of array_eggs.
cols = (df_coop['Week'].values[:, np.newaxis] + np.arange(10)) - 1

# Weeks past week 10 are clipped to index 10, the all-zero 'Week 11' column
cols = np.minimum(cols, 10)
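
- To see what the column index looks like, here is the same construction on a toy table. The numbers, `table`, `current_week` and `horizon` are all made up for illustration; only the pattern (broadcast a column of start weeks against a row of offsets, then clip to a zero padding column) mirrors the cell above.

```python
import numpy as np

# Hypothetical lookup table: 2 breeds, 3 real weeks, plus an all-zero padding column
table = np.array([[5, 6, 7, 0],
                  [1, 2, 3, 0]])

current_week = np.array([1, 3])  # 1-based current week of two chickens
horizon = 3                      # how many weeks ahead we want

# (2, 1) column of start weeks + (3,) row of offsets -> (2, 3) grid of column indices
cols = current_week[:, np.newaxis] + np.arange(horizon) - 1
print(cols)

# Indices past the last real week are clipped to the padding column
cols = np.minimum(cols, table.shape[1] - 1)
print(cols)
```

- The second chicken starts in week 3, so its second and third lookups fall off the end of the table and are redirected to the zero column.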

In [101]:
df_coop = df_coop.join(pd.DataFrame(array_eggs[rows, cols], columns = ['Eggs laid: {}'.format(x) for x in range(1,11)]))
df_coop

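| | Breed | Week | Eggs laid: 1 | Eggs laid: 2 | Eggs laid: 3 | Eggs laid: 4 | Eggs laid: 5 | Eggs laid: 6 | Eggs laid: 7 | Eggs laid: 8 | Eggs laid: 9 | Eggs laid: 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Marans | 3 | 1 | 4 | 1 | 5 | 15 | 2 | 4 | 9 | 0 | 0 |
| 1 | Naked Neck (Turken) | 2 | 6 | 1 | 5 | 1 | 5 | 6 | 8 | 3 | 3 | 0 |
| 2 | Star | 3 | 1 | 4 | 1 | 5 | 15 | 2 | 4 | 9 | 0 | 0 |
| 3 | Super Blue Egg Layer | 4 | 9 | 6 | 5 | 15 | 10 | 6 | 8 | 0 | 0 | 0 |
| 4 | Super Blue Egg Layer | 4 | 9 | 6 | 5 | 15 | 10 | 6 | 8 | 0 | 0 | 0 |
| 5 | Ayam Cemani | 3 | 1 | 4 | 1 | 5 | 15 | 2 | 4 | 9 | 0 | 0 |
| 6 | Cubalaya | 1 | 3 | 6 | 10 | 15 | 1 | 4 | 4 | 9 | 3 | 4 |
| 7 | Jersey Giant | 2 | 6 | 1 | 5 | 1 | 5 | 6 | 8 | 3 | 3 | 0 |
| 8 | White Faced Black Spanish | 2 | 6 | 1 | 5 | 1 | 5 | 6 | 8 | 3 | 3 | 0 |
| 9 | Booted Bantam | 2 | 6 | 1 | 5 | 1 | 5 | 6 | 8 | 3 | 3 | 0 |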


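- Look closely at the table above: chickens with the same `Week` have identical rows, whatever their breed
    - This is because `rows` has shape (10,) while `cols` has shape (10, 10), so broadcasting lines `rows` up with the last axis (the week offset) instead of with each chicken; it only runs at all because `n_chickens` happens to equal the number of weeks
    - Indexing with `rows[:, np.newaxis]`, which has shape (10, 1), pairs each chicken with its own breed, and that is what we use below
- With that fixed, we can change `n_chickens` to 100,000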

In [105]:
n_chickens = 100000
df_coop = pd.DataFrame()
df_coop['Breed'] = np.random.choice(df_chickens['CHICKEN BREED NAME'], size=n_chickens)
df_coop['Week'] = np.random.randint(1,5,size=n_chickens)

rows = np.array([list(df_chickens['CHICKEN BREED NAME']).index(x) for x in df_coop['Breed']])

cols = (df_coop['Week'].values[:,np.newaxis] + np.arange(10))-1
cols = np.minimum(cols, 10)
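
- With 100,000 chickens, calling `list.index` once per chicken scans the breed list every time. One possible alternative, sketched here on made-up breed names, is `pd.Index.get_indexer`, which looks up all positions in one call. This assumes the breed names in the lookup table are unique; names that are missing from the table come back as -1.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for df_chickens['CHICKEN BREED NAME'] and df_coop['Breed']
breed_names = pd.Series(['Ancona', 'Marans', 'Polish'])
coop_breeds = pd.Series(['Polish', 'Ancona', 'Polish', 'Marans'])

# Approach used in the notebook: one linear search per chicken
rows_loop = np.array([list(breed_names).index(x) for x in coop_breeds])

# Alternative: a single vectorized lookup
rows_indexer = pd.Index(breed_names).get_indexer(coop_breeds)

print(rows_loop)
print(rows_indexer)
```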

In [110]:
df_coop = df_coop.join(pd.DataFrame(array_eggs[rows[:,np.newaxis], cols],
                                    columns = ['Eggs laid: {}'.format(x) for x in range(1,11)]))
df_coop

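| | Breed | Week | Eggs laid: 1 | Eggs laid: 2 | Eggs laid: 3 | Eggs laid: 4 | Eggs laid: 5 | Eggs laid: 6 | Eggs laid: 7 | Eggs laid: 8 | Eggs laid: 9 | Eggs laid: 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Polish | 1 | 2 | 2 | 2 | 4 | 4 | 1 | 2 | 1 | 6 | 2 |
| 1 | Java | 2 | 9 | 1 | 3 | 6 | 3 | 1 | 6 | 1 | 3 | 0 |
| 2 | Sultan | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 0 |
| 3 | Olive Egger - MPC | 1 | 3 | 9 | 6 | 6 | 1 | 1 | 9 | 9 | 9 | 9 |
| 4 | Old English Game | 3 | 1 | 1 | 1 | 1 | 1 | 6 | 6 | 6 | 0 | 0 |
| 5 | Buckeye | 2 | 1 | 9 | 1 | 3 | 9 | 1 | 3 | 3 | 6 | 0 |
| 6 | Polish | 4 | 4 | 4 | 1 | 2 | 1 | 6 | 2 | 0 | 0 | 0 |
| 7 | Favaucana | 4 | 10 | 10 | 1 | 5 | 15 | 1 | 10 | 0 | 0 | 0 |
| 8 | Easter Eggers | 1 | 4 | 1 | 12 | 1 | 12 | 4 | 8 | 12 | 8 | 4 |
| 9 | Super Blue Egg Layer | 3 | 15 | 15 | 1 | 5 | 15 | 10 | 15 | 10 | 0 | 0 |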