![](./images/Matrix.svg.png)

### Numpy and Matrices
The numpy package will provide you a great toolbox of optimized mathematical operations including the numpy array, the most convenient way to store vector and matrix data for computation. Here, we'll look at some basic operations in numpy.

In [6]:
A = np.array([np.random.randint(0,10) for i in range(12)]).reshape((3,4))
B = np.array([np.random.randint(0,10) for i in range(12)]).reshape((3,4))

x = np.random.randn(4)
y = np.random.randn(5)

print('A: {}'.format(A))
print('\n')
print('B: {}'.format(B))

print('\n\n')
print('x: {}'.format(x))
print('\n\n')
print('y: {}'.format(y))

A: [[7 8 4 3]
 [4 1 1 6]
 [0 5 2 6]]


B: [[6 6 7 9]
 [5 8 6 3]
 [9 5 5 3]]



x: [-0.62973874  0.48290411 -0.03709649 -0.35250661]



y: [-2.42623614 -1.36089824  1.36234004  0.56839141  0.24492674]


In [7]:
print('A+B:\n', A+B, '\n\n') # matrix addition
print('A-B:\n', A-B, '\n\n') # matrix subtraction
print('A*B:\n', A*B, '\n\n') # ELEMENTWISE multiplication
print('A/B:\n', A/B, '\n\n') # ELEMENTWISE division


print('A*x:\n', A*x, '\n\n') # multiply columns by x
print('A.T:\n', A.T, '\n\n') # transpose (just changes row/column ordering)
print('x.T:\n', x.T, '\n\n') # does nothing (can't transpose 1D array)

A+B:
 [[13 14 11 12]
 [ 9  9  7  9]
 [ 9 10  7  9]] 


A-B:
 [[ 1  2 -3 -6]
 [-1 -7 -5  3]
 [-9  0 -3  3]] 


A*B:
 [[42 48 28 27]
 [20  8  6 18]
 [ 0 25 10 18]] 


A/B:
 [[1.16666667 1.33333333 0.57142857 0.33333333]
 [0.8        0.125      0.16666667 2.        ]
 [0.         1.         0.4        2.        ]] 


A*x:
 [[-4.40817115  3.86323289 -0.14838595 -1.05751982]
 [-2.51895494  0.48290411 -0.03709649 -2.11503964]
 [-0.          2.41452056 -0.07419298 -2.11503964]] 


A.T:
 [[7 4 0]
 [8 1 5]
 [4 1 2]
 [3 6 6]] 


x.T:
 [-0.62973874  0.48290411 -0.03709649 -0.35250661] 




### 1. Generating Test Data
Generate two matrices of random data, A and B. Make matrix A a 3x4 matrix, and make B 4x 

### Polynomial Functions
Soon, we're going to expand our simple linear regression into the more generalized linear regression involving multiple variables. Instead of looking at the Gross Domestic Sales of a movie in terms of its budget alone, we'll consider more variables such as ratings and reviews to improve our predictions. 

When doing this, we will have a matrix of data where each column is a specific feature such as the budget, or the imdb review score, while each row will be an observance, one of the movies in our dataset.

$x_1\bullet w_1 + x_2\bullet w_2 + x_3\bullet w_3 + ... = y$

In [11]:
pwd

'/Users/matthew.mitchell/Documents/Learn_CO/Data_Science_Immersive/Mod3/python_linear_algebra_matrices'

In [12]:
cd '/Users/matthew.mitchell/Documents/Learn_CO/Data_Science_Immersive/Mod3/python_linear_regression_lab'

/Users/matthew.mitchell/Documents/Learn_CO/Data_Science_Immersive/Mod3/python_linear_regression_lab


In [25]:
import pandas as pd
x = pd.read_excel('movie_data_detailed_with_ols.xlsx')
x = x[['budget', 'imdbRating','Metascore', 'imdbVotes']]
x.head()

Unnamed: 0,budget,imdbRating,Metascore,imdbVotes
0,13000000,6.8,48,206513
1,45658735,0.0,0,0
2,20000000,8.1,96,537525
3,61000000,6.7,55,173726
4,40000000,7.5,62,74170


In [26]:
x = np.array(x)
x

array([[1.3000000e+07, 6.8000000e+00, 4.8000000e+01, 2.0651300e+05],
       [4.5658735e+07, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00],
       [2.0000000e+07, 8.1000000e+00, 9.6000000e+01, 5.3752500e+05],
       [6.1000000e+07, 6.7000000e+00, 5.5000000e+01, 1.7372600e+05],
       [4.0000000e+07, 7.5000000e+00, 6.2000000e+01, 7.4170000e+04],
       [2.2500000e+08, 6.3000000e+00, 2.8000000e+01, 1.2876600e+05],
       [9.2000000e+07, 5.3000000e+00, 2.8000000e+01, 1.8058500e+05],
       [1.2000000e+07, 7.8000000e+00, 5.5000000e+01, 2.4008700e+05],
       [1.3000000e+07, 5.7000000e+00, 4.8000000e+01, 3.0576000e+04],
       [1.3000000e+08, 4.9000000e+00, 3.3000000e+01, 1.7436500e+05],
       [4.0000000e+07, 7.3000000e+00, 9.0000000e+01, 3.9839000e+05],
       [2.5000000e+07, 7.2000000e+00, 5.8000000e+01, 7.5884000e+04],
       [5.0000000e+07, 6.2000000e+00, 5.2000000e+01, 7.6001000e+04],
       [1.8000000e+07, 7.3000000e+00, 7.8000000e+01, 1.7098600e+05],
       [5.5000000e+07, 7.8000000e+

### 1. Write a function that predicts a vector of model predictions $\hat{y}$ given a matrix of data x, and a vector of coefficient weights w.   
Mathematically:   
$x_1\bullet w_1 + x_2\bullet w_2 + x_3\bullet w_3 + ... = y$

In [None]:
def poly_regress_predict(x,w):
    return y_hat

### 2. Systems of Equations
If you recall from your earlier life as a algebra student:

$2x +10 = 18$ has a unique solution; one variable, one equation, one solution

Similarly, two variables with two equations has one solution*   
$x+y=4$  
$2x+2y=10$

However, if we allow 2 variables with only 1 equation, we can have infinite solutions.
$x+y=4$

*(An inconsistent system will have no solution and a system where the second equation is a multiple of the first will have infinite solutions)

### Representing Data as Matrices

1. Write a Matrix to represent this system:   
$x+y=4$  
$2x+2y=10$

2. Multiply your matrix by 3. What is the resulting system of equations? 

3. Solve both your original Matrix and the new Matrix that was the orginal, multiplied by 3.