The goal of this notebook is to gather many of the tools used for regression analysis in one place. I include code implementing pieces of them, along with accompanying plots. This header will probably be replaced with something nicer eventually.

### Ordinary Least Squares and Linear Regression
These two concepts almost always go hand in hand. Linear regression is a method for explaining one variable (typically called y) in terms of other variables. The name is somewhat confusing: "linear" refers to the parameters, not to the regressors, so the model isn't limited to expressions of the form y = mx + b. We can feed in non-linear transformations of x (for example y = m*x^2) and it is still a linear regression, because the prediction remains a weighted sum of the columns we supply. Ordinary least squares (OLS), by contrast, describes a fitting method rather than a form of analysis: simply minimize the sum of squared differences between your predictions and your observations. Doing this gives the set of parameters that best explains, in the least-squares sense, the relationship between your regressors (the variables you measure about the real world) and the observation you are interested in. Note that it only minimizes the distance between observations and predictions along one axis, the one along which the variable you are trying to predict lies (vertical distances, when y is plotted on the vertical axis). Let's take a look at some code that runs ordinary least squares regression:
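Written out, with the n observations stacked into a vector $y$ and the regressors into an $n \times p$ matrix $X$ (one row $x_i^\top$ per observation), OLS picks the coefficient vector $\beta$ that minimizes the sum of squared residuals. Setting the gradient to zero gives the normal equations and, assuming $X^\top X$ is invertible (no perfectly collinear regressors), the closed-form estimator that the `ordinaryLeastSquares` function below computes:

$$
\hat\beta = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2
$$

$$
\nabla_\beta \lVert y - X\beta \rVert_2^2 = -2X^\top (y - X\beta) = 0 \;\Longrightarrow\; X^\top X \hat\beta = X^\top y \;\Longrightarrow\; \hat\beta = (X^\top X)^{-1} X^\top y
$$

Each residual $y_i - x_i^\top\beta$ is measured along the y axis only, which is the one-axis point made above. Nothing in the derivation requires the columns of $X$ to be the raw measurements: a column holding $x_i^2$ is just another column.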

In [6]:
import numpy as np

from bokeh.plotting import figure
from bokeh.io import output_notebook, push_notebook, show
import bokeh.layouts as layout
from bokeh.models import widgets as wid
from ipywidgets import interact
from numpy import pi

output_notebook()

In [5]:
def ordinaryLeastSquares(y, x):
    '''
    :param y: the vector of observations of the variable of interest
    :param x: the matrix of observed regressors (one row per observation)
    :return betas: the coefficients that minimize the squared error between x @ betas and y
    '''
    # Solve the normal equations (X^T X) betas = X^T y. np.linalg.solve is more
    # numerically stable than explicitly inverting the Gram matrix X^T X.
    gramian = np.matmul(np.transpose(x), x)
    betas = np.linalg.solve(gramian, np.matmul(np.transpose(x), y))
    
    return betas
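A minimal sketch of the "linear in the parameters" point, assuming only NumPy. The helper name `fit_ols` and the synthetic data are illustrative, not part of the notebook. The design matrix gets a column of ones (the intercept b) and a column of x^2, so the model y = b + m*x^2 is curved in x but is still solved by the same normal equations. The sketch also cross-checks against `np.linalg.lstsq`, which avoids forming the Gram matrix at all and is generally the more numerically stable choice.

```python
import numpy as np


def fit_ols(y, X):
    """Hypothetical helper: solve the normal equations X^T X beta = X^T y."""
    # np.linalg.solve avoids explicitly inverting the Gram matrix X^T X.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Synthetic, noise-free data generated from y = 2 + 0.5 * x^2 (illustrative only).
x = np.linspace(-3.0, 3.0, 13)
y = 2.0 + 0.5 * x**2

# Design matrix: an intercept column plus a non-linear transform of x.
X = np.column_stack((np.ones_like(x), x**2))

beta_normal = fit_ols(y, X)
# lstsq returns (solution, residuals, rank, singular_values); keep only the solution.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)  # approximately [2.0, 0.5]
print(np.allclose(beta_normal, beta_lstsq))  # True
```

Both solvers recover the generating coefficients here because the data are noise-free and the two columns are not collinear; with noisy data they would agree with each other but only approximate the true values.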

Linear regression uses ordinary least squares to fit its coefficients. Next I write a script that fits the data using OLS and plots the result. It plots with bokeh so that you can switch between the different regressors and see how each one relates to y.

In [38]:
def linRegressionPlot(y_data, x_data, y_name, x_names):
    '''
    :param y_data: the observations of the variable of interest
    :param x_data: the matrix of regressors (one column per variable)
    :param y_name: the name of the y variable
    :param x_names: the names of the x variables, in column order
    :return var_plot: the plot showing the relationship between x[0] and y, with the fitted values
    :return y_hat: the fitted values x_data @ betas
    :return c: the scatter renderer for the observations
    :return l: the line renderer for the fitted values
    '''
    #fit data:
    betas = ordinaryLeastSquares(y_data, x_data)
    y_hat = np.matmul(x_data, betas)
    
    #flatten to 1-D arrays: bokeh columns must be 1-D, but np.matrix slices stay 2-D
    x0 = np.asarray(x_data[:,0]).ravel()
    y_flat = np.asarray(y_data).ravel()
    y_hat_flat = np.asarray(y_hat).ravel()
    
    #set up plot looks (width/height replace plot_width/plot_height, removed in bokeh 3):
    var_plot = figure(title = 'Fit and Relationship Between ' + y_name + ' and ' + x_names[0],
           width = 950, height = 400)
    var_plot.xgrid.grid_line_color = None
    var_plot.ygrid.grid_line_color = None
    var_plot.xaxis.major_label_orientation = pi/3
    var_plot.axis.major_label_text_font_size = "10pt"
    var_plot.title.text_font_size = "15px"
    var_plot.title.align = 'center'
    
    #scatterplot of data (scatter draws circles by default):
    data_source = dict(x = x0, y = y_flat)
    c = var_plot.scatter(x = 'x', y = 'y', color = 'navy', size = 15, source = data_source)
    
    #fit line, sorted by x so the line is drawn left to right:
    order = np.argsort(x0)
    line_data = dict(x = x0[order], y = y_hat_flat[order])
    l = var_plot.line(x = 'x', y = 'y', color = 'tomato', source = line_data)
    
    return var_plot, y_hat, c, l

def linPlotUpdate(pass_dict, selected_x):
    '''
    :param pass_dict: a dictionary holding the data, names, plot, and renderers from linRegressionPlot
    :param selected_x: the name of the x variable to plot against y
    '''
    x_names = pass_dict['x_nam']; y_name = pass_dict['y_name']; x_data = pass_dict['x']; y_data = pass_dict['y']
    p = pass_dict['p']; c = pass_dict['c']; l = pass_dict['l']; y_hat = pass_dict['y_h']
    
    x_num = x_names.index(selected_x)
    p.title.text = 'Fit and Relationship Between ' + y_name + ' and ' + x_names[x_num]
    
    #flatten to 1-D arrays and sort the fit line by the selected x.
    #y_hat comes from the full multi-variable fit, so against a single x it need not be straight.
    x_sel = np.asarray(x_data[:,x_num]).ravel()
    y_hat_flat = np.asarray(y_hat).ravel()
    order = np.argsort(x_sel)
    c.data_source.data = dict(x = x_sel, y = np.asarray(y_data).ravel())
    l.data_source.data = dict(x = x_sel[order], y = y_hat_flat[order])
    
    push_notebook()

In [45]:
#plain NumPy arrays (np.matrix is discouraged and keeps column slices 2-D)
y_data = np.array([1, 2, 3, 4, 5], dtype = float)
x_data = np.array([[1, 3.2], [2, 6], [3, 9], [4, 12], [5, 15]])
y_name = 'y'
x_names = ['x1', 'x2']
p, y_hat, c, l = linRegressionPlot(y_data, x_data, y_name, x_names)
pass_dict = {'x_nam': x_names, 'y_name': y_name, 'x': x_data, 'y': y_data,
             'p':p, 'c':c, 'l':l, 'y_h': y_hat}
show(p, notebook_handle = True)
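As a plot-free sanity check on the same toy data, the sketch below (NumPy only, plain arrays rather than `np.matrix`) solves the same normal equations. Because y equals the x1 column exactly and the two columns are not proportional (the 3.2 breaks the 3x pattern), the unique solution is betas = (1, 0), so the fitted line passes through every observed point. The sketch also shows that the notebook's `x_data` has no column of ones, so its fit is forced through the origin; adding an intercept is a one-line change to the design matrix.

```python
import numpy as np

# Same toy data as the cell above, stored as plain arrays.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.array([[1, 3.2], [2, 6], [3, 9], [4, 12], [5, 15]], dtype=float)

# Normal equations without an intercept column: the fit passes through the origin.
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Same fit with an intercept: prepend a column of ones to the design matrix.
X1 = np.column_stack((np.ones(len(y)), X))
beta1 = np.linalg.solve(X1.T @ X1, X1.T @ y)

# y equals the x1 column exactly, so the fitted values reproduce y up to rounding.
max_residual = np.max(np.abs(y - X @ beta))

print(beta)   # close to [1, 0]
print(beta1)  # close to [0, 1, 0]
print(max_residual)
```

The two columns are nearly collinear, so X^T X is poorly conditioned; the answer is still accurate here, but on real data with strongly correlated regressors `np.linalg.lstsq` is the safer solver.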

In [46]:
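from ipywidgets import fixed
#fixed() passes pass_dict through unchanged instead of building a dropdown for it
interact(linPlotUpdate, pass_dict = fixed(pass_dict), selected_x = x_names)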
