## Dot Product

The dot product is a mathematical operation that many of the algorithms ahead rely on.  
It is defined as the sum of the products of the corresponding elements of two vectors of equal length.  

Mathematically:  
$a = [a_1, a_2, \dots, a_n]$  
$b = [b_1, b_2, \dots, b_n]$  
   
$a \bullet b = \sum_{i=1}^{n} a_ib_i = a_1b_1 + a_2b_2 + \dots + a_nb_n$

In [1]:
import numpy as np
a = np.array(range(5))
b = np.array(range(5,10))
print('a :', a)
print('b :', b)

a : [0 1 2 3 4]
b : [5 6 7 8 9]
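
As a check on the definition, the dot product of these two example vectors can be worked out by hand, pairing each element of a with the element of b in the same position:

$a \bullet b = 0\cdot5 + 1\cdot6 + 2\cdot7 + 3\cdot8 + 4\cdot9 = 0 + 6 + 14 + 24 + 36 = 80$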


### 1. Write a function to calculate the dot product.

In [2]:
def dot_product(a,b):
    #Your code goes here
    # a*b multiplies the arrays element-wise; sum() then adds up the products
    return sum(a*b)
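
One way to sanity-check a dot product function is to compare it against an explicit loop and against NumPy's built-in `np.dot`. The sketch below assumes the same `a` and `b` as in the first cell; `dot_product_loop` is a hypothetical helper name used only for illustration.

```python
import numpy as np

def dot_product_loop(a, b):
    """Hypothetical helper: dot product written as an explicit loop."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0
    # Add up the product of each pair of corresponding elements
    for a_i, b_i in zip(a, b):
        total += a_i * b_i
    return total

a = np.array(range(5))
b = np.array(range(5, 10))

result_loop = dot_product_loop(a, b)
result_np = np.dot(a, b)  # NumPy's built-in dot product
print(result_loop, result_np)
```

Both values should agree with the result of `sum(a*b)`; a mismatch would point to a bug in one of the implementations.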

### 2. Dot Product 2
Great! The dot product of a and b can also be calculated by:

$a\bullet b = a^Tb$ 

Recall that $a^T$ is the transpose of a.

Write a second function that calculates the dot product of a and b using this alternative calculation.

In [3]:
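def dot_product2(a,b):
    #Your code goes here
    # For 1-D arrays transpose() leaves the shape unchanged,
    # and matmul of two 1-D arrays returns their inner product
    return np.matmul(a.transpose(), b)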
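
The transpose only changes anything once the vectors are stored as 2-D column vectors. The sketch below assumes the same `a` and `b` as above and reshapes them into columns to show $a^Tb$ as a literal matrix product; `a_col` and `b_col` are illustrative names, not part of the notebook.

```python
import numpy as np

a = np.array(range(5))
b = np.array(range(5, 10))

# Transposing a 1-D array returns an array of the same shape
same_shape = a.transpose().shape == a.shape

# Reshape into explicit column vectors of shape (5, 1)
a_col = a.reshape(-1, 1)
b_col = b.reshape(-1, 1)

# (1, 5) @ (5, 1) gives a 1x1 matrix holding the dot product
product = a_col.T @ b_col
scalar = product.item()  # extract the single value as a Python scalar
print(same_shape, product.shape, scalar)
```

With column vectors the result is a 1x1 matrix rather than a scalar, which is why the final `.item()` call is needed here but not in the 1-D version.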

### Polynomial Functions
Soon, we're going to extend simple linear regression to multiple linear regression, which uses several input variables. Instead of modeling a movie's gross domestic sales in terms of its budget alone, we'll consider more variables, such as ratings and reviews, to improve our predictions. 

When doing this, we will have a matrix of data where each column is a specific feature, such as the budget or the IMDb review score, and each row is an observation: one of the movies in our dataset. The prediction for a movie is then a weighted sum of its feature values, that is, the dot product of its row with a vector of weights:

$x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots = \hat{y}$

For example:

In [4]:
import pandas as pd
x = pd.read_excel('movie_data_detailed_with_ols.xlsx')
x = x[['budget', 'imdbRating','Metascore', 'imdbVotes']]
x.head()

| | budget | imdbRating | Metascore | imdbVotes |
|---|---|---|---|---|
| 0 | 13000000 | 6.8 | 48 | 206513 |
| 1 | 45658735 | 0.0 | 0 | 0 |
| 2 | 20000000 | 8.1 | 96 | 537525 |
| 3 | 61000000 | 6.7 | 55 | 173726 |
| 4 | 40000000 | 7.5 | 62 | 74170 |


In [5]:
x = np.array(x)
x

array([[1.3000000e+07, 6.8000000e+00, 4.8000000e+01, 2.0651300e+05],
       [4.5658735e+07, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00],
       [2.0000000e+07, 8.1000000e+00, 9.6000000e+01, 5.3752500e+05],
       [6.1000000e+07, 6.7000000e+00, 5.5000000e+01, 1.7372600e+05],
       [4.0000000e+07, 7.5000000e+00, 6.2000000e+01, 7.4170000e+04],
       [2.2500000e+08, 6.3000000e+00, 2.8000000e+01, 1.2876600e+05],
       [9.2000000e+07, 5.3000000e+00, 2.8000000e+01, 1.8058500e+05],
       [1.2000000e+07, 7.8000000e+00, 5.5000000e+01, 2.4008700e+05],
       [1.3000000e+07, 5.7000000e+00, 4.8000000e+01, 3.0576000e+04],
       [1.3000000e+08, 4.9000000e+00, 3.3000000e+01, 1.7436500e+05],
       [4.0000000e+07, 7.3000000e+00, 9.0000000e+01, 3.9839000e+05],
       [2.5000000e+07, 7.2000000e+00, 5.8000000e+01, 7.5884000e+04],
       [5.0000000e+07, 6.2000000e+00, 5.2000000e+01, 7.6001000e+04],
       [1.8000000e+07, 7.3000000e+00, 7.8000000e+01, 1.7098600e+05],
       [5.5000000e+07, 7.8000000e+

### 3. Write a function that returns a vector of model predictions $\hat{y}$, given a matrix of data x and a vector of coefficient weights w.   
Mathematically, for each row of x:   
$x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots = \hat{y}$

In [6]:
def poly_regress_predict(x,w):
    # The dot product of each row of x with w gives one prediction per observation
    y_hat = np.dot(x,w)
    return y_hat
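
To see what the function returns, it can be run on a small made-up example. The sketch below is self-contained, so it repeats the function definition; `x_toy` and `w_toy` are hypothetical values chosen for easy arithmetic, not fitted coefficients or rows from the movie dataset.

```python
import numpy as np

def poly_regress_predict(x, w):
    # The dot product of each row of x with w gives one prediction per observation
    y_hat = np.dot(x, w)
    return y_hat

# Hypothetical data: 3 observations (rows) with 2 features (columns)
x_toy = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
# Hypothetical weights, one per feature
w_toy = np.array([0.5, -1.0])

y_hat = poly_regress_predict(x_toy, w_toy)

# The same predictions computed one row at a time
y_hat_rows = np.array([np.dot(row, w_toy) for row in x_toy])
print(y_hat, y_hat_rows)
```

The first prediction is $1 \cdot 0.5 + 2 \cdot (-1) = -1.5$, and the matrix version returns one such value per row, so the output has as many entries as x has rows.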