# Exercises 


## 1. Regression factors
The formula for the regression coefficients is

$\beta = (X'X)^{-1}X'Y$

However, the data is stored in an awkward format: all observations have been flattened into a single array, i.e. a 1×N vector in which the five values of each observation follow one another. In other words, the data was changed from this:

![data before](../images/data_before.png)

to this:

![data after](../images/data_after.png)
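As a small illustration of how such a flat array can be turned back into a table, here is a minimal sketch. It assumes the values were flattened row by row (all five values of one observation are stored consecutively); the toy array `flat` is made up and is not the exercise data.

```python
import numpy as np

# Hypothetical toy data: 3 observations of 5 variables, flattened row by row
flat = np.arange(15)

# -1 lets NumPy infer the number of rows from the length of the array
table = flat.reshape(-1, 5)

print(table.shape)  # (3, 5)
print(table[1])     # second observation: [5 6 7 8 9]
```

Each row of `table` is one observation and each column is one variable, which is the layout the regression formula expects.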


The array contains the following variables: 

- Sale (in Dollars) - Amount of money received by the store
- Pack Size - Number of bottles per item
- State Bottle Cost - Cost of producing the bottle 
- Packs Sold - Number of packs sold
- Bottle Volume (in ml) - How many ml each bottle has



Question: Determine the regression coefficients of the following OLS regression:

$\text{Sale} = \beta_0 + \beta_1 \cdot \text{Pack Size} + \beta_2 \cdot \text{State Bottle Cost} + \beta_3 \cdot \text{Packs Sold} + \beta_4 \cdot \text{Bottle Volume} + \epsilon$

In [39]:
import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(seed=42))
data = rng.standard_normal(500000)
# The generator is pseudorandom and deterministic: with a fixed seed the values are always the same, so this holds:
assert int(np.sum(data)) == -253

# So far the values are independent random draws, so all regression coefficients would be close to 0.
# Introduce a dependency between entries that will later show up in the coefficient of column 4.
for i in range(4, data.shape[0], 4):
    data[i] += data[i - 4] * 0.8

### Numpy

In [38]:
# Numpy Way
import numpy as np

# Reshaping the flat array of 500000 values into 100000 rows of 5 columns
reshaped_data = data.reshape(100_000, -1)

# Separating the Sale variable (first column) from the rest
independent = reshaped_data[:, 1:]
Y = reshaped_data[:, 0]

# Creating a column with only ones and add that to the numpy array as a column (this is done for the intercept)
ones = np.ones(independent.shape[0])
X = np.c_[ones, independent]

# Applying regression coefficient formula
X_prime = np.transpose(X)

inverse_part = np.linalg.inv(np.dot(X_prime, X))
X_prime_Y = np.dot(X_prime, Y)
beta = np.dot(inverse_part, X_prime_Y)

# Printing the coefficients
# The last one is far from 0: it belongs to the column we "biased" earlier
print(beta)

[-0.0036857   0.00221297  0.00262742 -0.00061504  0.38359158]
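As a cross-check of the explicit formula, the same coefficients can be obtained with `np.linalg.lstsq`, which solves the least-squares problem without forming the inverse of $X'X$ and is usually the numerically safer choice. The sketch below is self-contained and uses made-up data: the seed, the sample size and the values in `true_beta` are assumptions for illustration only, not the exercise data.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, unrelated to the exercise
n = 1_000

# Made-up regressors and coefficients (intercept first)
features = rng.standard_normal((n, 4))
true_beta = np.array([1.0, 2.0, -1.0, 0.5, 3.0])

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), features])
Y = X @ true_beta + 0.1 * rng.standard_normal(n)

# Explicit normal-equation formula: beta = (X'X)^{-1} X'Y
beta_normal = np.linalg.inv(X.T @ X) @ X.T @ Y

# Least-squares solver; rcond=None selects NumPy's current default cutoff
beta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, Y, rcond=None)

print(beta_normal)
print(beta_lstsq)
print(np.allclose(beta_normal, beta_lstsq))
```

On well-conditioned data like this the two results agree to floating-point precision. The difference matters mainly when columns of `X` are nearly collinear.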


## 2. Matrix multiplication

You are given the 20×20 grid below

```
08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48
```

What is the greatest product of four adjacent numbers in the same direction (up, down, left, right, or diagonally) in the 20×20 grid?

### Hints

* Get the data in a workable format first
* Remember you can use a numpy array as index, e.g. `myarray[tuple(myindex)]`
* If you represent a position as an `[x, y]` array, a direction is simply another array that you add to it to reach the next cell
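The last two hints can be sketched on a toy example. The 3×3 array `small` and the variable names below are made up for illustration; the same idea carries over to the 20×20 grid.

```python
import numpy as np

# Hypothetical 3x3 grid, just to show the indexing trick
small = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

position = np.array([0, 0])   # row 0, column 0
direction = np.array([1, 1])  # one row down, one column to the right

values = []
for _ in range(3):
    # tuple(position) turns the array into a (row, column) index
    values.append(int(small[tuple(position)]))
    # Adding the direction moves to the next cell along the diagonal
    position = position + direction

print(values)  # [1, 5, 9]
```

Indexing with `small[position]` (without `tuple`) would instead select whole rows, which is why the conversion is needed.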

In [1]:
num = ("""
08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48""".replace("\n", " "))

In [10]:
import numpy as np
grid = np.array([int(n) for n in num.split()]).reshape(20, 20)
print(grid.shape)
print(grid[19][19])

(20, 20)
48


In [28]:
# Four directions suffice: left, up and the reversed diagonals give the same products
offsets = [np.array(o) for o in ((0, 1), (1, 0), (1, 1), (1, -1))]
SEARCH_LENGTH = 4
# Products are never negative here, so -1 is fine as a dummy value. Otherwise use None and add a check for it
max_so_far = -1
max_position_so_far = ""
best_candidate_so_far = []
for x in range(grid.shape[0]):
    for y in range(grid.shape[1]):
        for offset in offsets:
            # Skip runs whose last cell would fall outside the grid
            end_x, end_y = np.array([x, y]) + offset * (SEARCH_LENGTH - 1)
            if not (0 <= end_x < grid.shape[0] and 0 <= end_y < grid.shape[1]):
                continue
            candidate = []
            cur_pos = np.array([x, y])
            for _ in range(SEARCH_LENGTH):
                candidate.append(grid[tuple(cur_pos)])
                cur_pos += offset
            product = np.prod(candidate)
            if product > max_so_far:
                max_so_far = product
                max_position_so_far = f"Start from {x}, {y} in direction {offset}"
                best_candidate_so_far = candidate

# Note that the coordinates are (row, column): x is the line number and y is the position in the line
print("Max found", max_so_far)
print(best_candidate_so_far)
print(max_position_so_far)

Max found 70600674
[np.int64(89), np.int64(94), np.int64(97), np.int64(87)]
Start from 12, 6 in direction [ 1 -1]
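The same search can also be written with array slicing instead of an explicit loop over every cell. The following is a sketch, not the reference solution: `max_adjacent_product` is a hypothetical helper, it assumes an integer grid with non-negative entries, and it is shown on a made-up 5×5 grid rather than the exercise grid. It covers runs to the right, downwards and along both diagonals.

```python
import numpy as np


def max_adjacent_product(grid, length=4):
    """Hypothetical helper: largest product of `length` adjacent cells.

    Assumes an integer grid with non-negative entries.
    """
    rows, cols = grid.shape
    best = 0
    # (row step, column step): right, down, down-right, down-left
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        # Number of valid starting rows and columns for this direction
        n_rows = rows - dr * (length - 1)
        n_cols = cols - abs(dc) * (length - 1)
        if n_rows <= 0 or n_cols <= 0:
            continue
        # Down-left runs must start far enough from the left edge
        col_start = (length - 1) if dc < 0 else 0
        product = np.ones((n_rows, n_cols), dtype=np.int64)
        for k in range(length):
            r0 = k * dr
            c0 = col_start + k * dc
            # Shifted view holding the k-th cell of every run
            product *= grid[r0:r0 + n_rows, c0:c0 + n_cols]
        best = max(best, int(product.max()))
    return best


# Made-up grid whose best run (9, 9, 9, 9) lies on a down-left diagonal
toy = np.array([[1, 2, 3, 4, 5],
                [2, 1, 1, 9, 1],
                [1, 1, 9, 1, 1],
                [1, 9, 1, 1, 1],
                [9, 1, 1, 1, 2]])

print(max_adjacent_product(toy))  # 6561
```

Each direction multiplies four shifted views of the grid element-wise, so all runs in that direction are evaluated at once. The trade-off is that this version only returns the maximum, not the position where it was found.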
