what are the parts that go into building a linear model? 
use the concept of a Taylor Series to focus on the parameters slope and intercept, how they define the model, and how to interpret them in several applied contexts
also learn how to find the model that best fits the data by computing the optimal values of slope and intercept, using least-squares, numpy, statsmodels, and scikit-learn

#### What Makes a Model Linear?

build models from linear components and then see how to find the optimal parameters for those components so that our model best fits our data 

to see the model in terms of components, we can use the concept of the Taylor Series, Taylor series are extremely useful and examples can be found in math, science, and engineering 

the 3 things to know about Taylor Series:
- they approximate any curve
- they are expressed as polynomials, a series of terms summed together with each term being a product of a coefficient and a power of x, the terms are referred to by order, "the nth order" 
- in many applications you only need the 0th and first order to obtain a very good approximation, linear models can be thought of as "first order" models where the nonlinear terms (second order terms and higher) have been ignored 
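
as a quick illustration of the three points above, here's a minimal sketch (not from the course) that approximates exp(x) near x=0 with its 0th, 1st, and 2nd order Taylor polynomials, the choice of exp(x) and the helper name `taylor_exp` are assumptions made just for this example

```python
import math
import numpy as np

def taylor_exp(x, order):
    """Taylor polynomial of exp(x) about x=0, truncated at the given order."""
    # each term is a coefficient (1/n!) multiplied by a power of x
    return sum((x ** n) / math.factorial(n) for n in range(order + 1))

x = 0.1  # a point close to the expansion point x=0
exact = np.exp(x)

# approximation error for the 0th, 1st, and 2nd order polynomials
errors = [abs(exact - taylor_exp(x, order)) for order in (0, 1, 2)]
for order, err in zip((0, 1, 2), errors):
    print("order {}: approximation error = {:.6f}".format(order, err))
```

close to the expansion point the first order (linear) approximation is already much better than the zeroth order one, and adding the quadratic term only improves it a little further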

zeroth order term is a0 (the intercept)
first order term is a1*x (a1 is the slope)
quadratic (second order) term is a2*x^2
y = a~0~ + a~1~x + a~2~x^2^

first order is linear so we'll set the defaults for this general linear model with a2=0 

remember that adding more terms can lead to overfitting, the model may not fit new data that wasn't used to train the model
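
a minimal sketch of the overfitting idea, assuming made-up data (a straight line plus noise) and using `np.polyfit` as a stand-in for whatever fitting method you prefer, it compares a first order fit with a seventh order fit on the training data and on new data from the same line

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# hypothetical data: the line y = 3 + 2x plus random noise
x_train = np.linspace(-1, 1, 15)
y_train = 3 + 2 * x_train + rng.normal(scale=0.5, size=x_train.size)
x_new = np.linspace(-0.95, 0.95, 15)
y_new = 3 + 2 * x_new + rng.normal(scale=0.5, size=x_new.size)

def rss(y_data, y_model):
    """Sum of squared differences between data and model."""
    return np.sum(np.square(y_data - y_model))

results = {}
for degree in (1, 7):
    coeffs = np.polyfit(x_train, y_train, deg=degree)  # least-squares polynomial fit
    rss_train = rss(y_train, np.polyval(coeffs, x_train))
    rss_new = rss(y_new, np.polyval(coeffs, x_new))
    results[degree] = (rss_train, rss_new)
    print("degree {}: RSS on training data = {:.2f}, RSS on new data = {:.2f}".format(
        degree, rss_train, rss_new))
```

the higher order fit always matches the training data at least as well as the line does, but nothing guarantees it does better on the new data, which is the point of the warning above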

In [None]:
# exercise example, this is a general model
import numpy as np

# Define the general model as a function
def model(x, a0=3, a1=2, a2=0):
    return a0 + (a1*x) + (a2*x*x)

# Generate array x, then predict y values using the default a0=3 and a1=2
x = np.linspace(-10, 10, 21)
y = model(x)

# Plot the results, y versus x (plot_prediction is a plotting helper that is not defined in these notes)
fig = plot_prediction(x, y)

In [None]:
# exercise example, optimize the model or fit it to a new measured data set, xd and yd
# find the specific values for model parameters a0, a1 where the model data and the measured data line up on a plot
# this is an iterative visualization strategy, you start with a good guess for model parameters, pass them to the model, 
# over-plot the resulting modeled data on the measured data, and then check that the line passes through the points
# if the line doesn't pass through the points then change the model parameters and try again

import matplotlib.pyplot as plt

# Complete the plotting function definition
def plot_data_with_model(xd, yd, ym):
    fig = plot_data(xd, yd)  # plot measured data
    fig.axes[0].plot(xd, ym, color='red')  # over-plot modeled data
    plt.show()
    return fig

# Select new model parameters a0, a1, and generate modeled `ym` from them.
a0 = 150
a1 = 25
ym = model(xd, a0, a1)

# Plot the resulting model to see whether it fits the data
fig = plot_data_with_model(xd, yd, ym)

#### Interpreting Slope and Intercept 

so far we've seen how linear models are built from zeroth and first order terms/components but
what about the coefficients of those terms (the parameters of the model)?
in this lesson you'll learn how to interpret the model parameters, intercept, and slope in specific contexts to see what they can tell us about the linear relationship being modeled 

- the **model** is: `y = a0 + a1*x`
- **x** is an independent variable, like time, which changes on its own
- **y** is a dependent variable, like distance traveled, which changes in response to x changing
- a **model prediction** is the result of using a specific value like x=10 with the model to compute a specific value for y
- the **a0** parameter is the intercept, it gives the value of y where the line crosses the vertical axis at x=0
- the **a1** parameter is the slope, it measures how sensitively y depends on x, or how the two variables co-vary: the average amount that the dependent variable changes when the independent variable changes by one unit

for a given x and model parameters a0 and a1 we compute a prediction for the value of y as a0+a1x
the slope values could vary if the real data has variations, this variation in slope is the **spread** or **uncertainty** in our value for a1
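
a small sketch of these ideas, assuming hypothetical numbers (a trip that starts 5 km out and moves at 60 km per hour) and a helper named `predict` that isn't part of the course code, it shows a prediction, what the slope means, and how noise in the data shows up as spread in the slope

```python
import numpy as np

# hypothetical model parameters
a0 = 5.0   # intercept: the value of y at x=0 (km)
a1 = 60.0  # slope: the change in y per unit change in x (km per hour)

def predict(x, a0, a1):
    """Linear model prediction y = a0 + a1*x."""
    return a0 + a1 * x

# a model prediction: the distance at x = 2 hours
y_pred = predict(2.0, a0, a1)
print("predicted distance at 2 hours = {} km".format(y_pred))

# increasing x by one unit changes the prediction by exactly the slope
change = predict(3.0, a0, a1) - predict(2.0, a0, a1)
print("change in y for a one-unit change in x = {}".format(change))

# with noisy measurements, the point-to-point slopes vary around a1
rng = np.random.default_rng(1)
times = np.linspace(0, 5, 11)
distances = predict(times, a0, a1) + rng.normal(scale=2.0, size=times.size)
slopes = np.diff(distances) / np.diff(times)
print("mean slope = {:.1f}, spread (std) = {:.1f}".format(np.mean(slopes), np.std(slopes)))
```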

sometimes two things are defined as having a linear relationship for the convenience of "rescaling the rule", this is the idea behind unit conversion, one example is the relationship between the Celsius and Fahrenheit temperature scales, it's a special case where both dimensions (what are normally the independent and dependent variables) measure the exact same thing, they're just different ways of measuring a single physical variable: the temperature, so it's not always clear which variable is dependent and which is independent 

In [None]:
# exercise example
# this explores the conversion between the Fahrenheit and Celsius temperature scales 
# as a demonstration of interpreting slope and intercept of a linear relationship within a 
# physical context
# Complete the function to convert C to F
def convert_scale(temps_C):
    (freeze_C, boil_C) = (0, 100)
    (freeze_F, boil_F) = (32, 212)
    change_in_C = boil_C - freeze_C
    change_in_F = boil_F - freeze_F
    slope = change_in_F / change_in_C
    intercept = freeze_F - (slope * freeze_C)  # value of F where C = 0
    temps_F = intercept + (slope * temps_C)
    return temps_F

# Use the convert function to compute values of F and plot them
temps_C = np.linspace(0, 100, 101)
temps_F = convert_scale(temps_C)
fig = plot_temperatures(temps_C, temps_F)

In [None]:
# Compute an array of velocities as the slope between each point
diff_distances = np.diff(distances)
diff_times = np.diff(times)
velocities = diff_distances / diff_times

# Characterize the center and spread of the velocities
v_avg = np.mean(velocities)
v_max = np.max(velocities)
v_min = np.min(velocities)
v_range = v_max - v_min

# Plot the distribution of velocities
fig = plot_velocity_timeseries(times[1:], velocities)

In [None]:
# Import ols from statsmodels, and fit a model to the data
from statsmodels.formula.api import ols
model_fit = ols(formula="masses ~ volumes", data=df)
model_fit = model_fit.fit()

# Extract the model parameter values, and assign them to a0, a1
a0 = model_fit.params['Intercept']
a1 = model_fit.params['volumes']

# Print model parameter values with meaningful names, and compare to summary()
print( "container_mass   = {:0.4f}".format(a0) )
print( "solution_density = {:0.4f}".format(a1) )
print( model_fit.summary() )

#### Model Optimization

this will be a quantitative method for finding the optimal parameter values that result in the one model that fits the data better than all others of the same form 

we'll express the fitting in terms of an optimization problem and show that optimization minimizes the errors, expressed in the form of a cost function of the residuals

a Taylor Series (such as a first order linear model) is always an approximation but the difference between the model and data can be quantified with something called a residual
to quantify the overall difference you might want to sum the residuals, but the positive and negative residuals cancel each other out (for a best-fit line with an intercept they sum to zero), so we'll use the squared residuals instead
squaring the residuals has two benefits, the first is that they don't cancel each other out since they're all positive, and the second is that they penalize larger residuals disproportionately more than smaller residuals, this is good for trying to find a quantity or cost function to constrain our optimization of model parameters 
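
a minimal sketch of both benefits, assuming made-up noisy data and using `np.polyfit` only as a convenient way to get a best-fit line for the demonstration

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical measured data: a line plus noise
x_data = np.linspace(0, 10, 21)
y_data = 150 + 25 * x_data + rng.normal(scale=10.0, size=x_data.size)

# least-squares straight-line fit, np.polyfit returns [slope, intercept] for deg=1
a1, a0 = np.polyfit(x_data, y_data, deg=1)
residuals = (a0 + a1 * x_data) - y_data

sum_resid = np.sum(residuals)        # positive and negative residuals cancel
rss = np.sum(np.square(residuals))   # squared residuals are all >= 0, so they cannot cancel

print("sum of residuals         = {:.6f}".format(sum_resid))
print("sum of squared residuals = {:.2f}".format(rss))

# squaring penalizes large residuals disproportionately:
# doubling a residual quadruples its contribution to the cost
small, large = 2.0, 4.0
penalty_ratio = np.square(large) / np.square(small)
print("a residual twice as large costs {:.0f} times as much".format(penalty_ratio))
```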

the single aggregate quantity we choose to guide how we optimize the model will be the sum of the squared residuals, **RSS**, it'll be used to find the optimal model parameters a0 and a1
the goal is to vary the model parameters (a0 and a1) until the measure of model fit (RSS) is the smallest
once you do that, minimization of RSS will have given you the optimal values for the model parameters
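
one way to picture "vary the parameters until RSS is smallest" is a brute-force grid search over both a0 and a1, this is only a sketch under assumed data and assumed trial ranges, not an efficient method

```python
import numpy as np

rng = np.random.default_rng(7)

# hypothetical measured data: a line plus noise
x_data = np.linspace(0, 10, 21)
y_data = 150 + 25 * x_data + rng.normal(scale=10.0, size=x_data.size)

# trial values for the intercept and the slope (ranges are assumptions)
a0_trials = np.linspace(100, 200, 101)
a1_trials = np.linspace(15, 35, 101)

# use broadcasting to evaluate the model for every (a0, a1) pair at once
A0 = a0_trials[:, np.newaxis, np.newaxis]   # shape (101, 1, 1)
A1 = a1_trials[np.newaxis, :, np.newaxis]   # shape (1, 101, 1)
y_model = A0 + A1 * x_data                  # shape (101, 101, 21)
rss_grid = np.sum(np.square(y_model - y_data), axis=2)  # one RSS per (a0, a1) pair

# locate the smallest RSS on the grid and the parameters that produced it
i0, i1 = np.unravel_index(np.argmin(rss_grid), rss_grid.shape)
best_a0, best_a1 = a0_trials[i0], a1_trials[i1]
best_rss = rss_grid[i0, i1]
print("grid minimum: a0 = {:.1f}, a1 = {:.1f}, RSS = {:.2f}".format(best_a0, best_a1, best_rss))
```

the answer is only as precise as the grid spacing, which is one reason a formula or a proper optimizer is preferable in practice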

In [None]:
# residuals, one per data point
residuals = y_model - y_data
print(len(residuals) == len(y_data))

# summing residuals, the positive and negative values cancel out (for a best-fit line they sum to zero)
print(np.sum(residuals))

# rss
resid_squared = np.square(y_model - y_data)
RSS = np.sum(resid_squared)

In [None]:
# exercise example 
# Load the data
x_data, y_data = load_data()

# Model the data with specified values for parameters a0, a1
y_model = model(x_data, a0=150, a1=25)

# Compute the RSS value for this parameterization of the model
rss = np.sum(np.square(y_data - y_model))
print("RSS = {}".format(rss))

In [None]:
# exercise example, create a function to visually compare model and data, also compute and print the rss
def compute_rss_and_plot_fit(a0, a1):
    xd, yd = load_data()
    ym = model(xd, a0, a1)
    residuals = ym - yd
    rss = np.sum(np.square(residuals))
    summary = "Parameters a0={}, a1={} yield RSS={:0.2f}".format(a0, a1, rss)
    fig = plot_data_with_model(xd, yd, ym, summary)
    return rss, summary

# Choose model parameter values and pass them into RSS function
rss, summary = compute_rss_and_plot_fit(a0=150, a1=25)
print(summary)

In [None]:
# exercise example, compute and visualize how RSS varies for different values of model parameters
# hold the intercept constant while you vary the slope
# for each slope value compute the model values and the resulting RSS
# you'll then have an array of RSS values so determine the minimal RSS value, in code, 
# and from that determine the slope that resulted in that minimal RSS

# we started with rss_list to make it easy to .append() but then later converted to numpy.array() 
# to gain access to all the numpy methods

# Loop over all trial values in a1_array, computing rss for each
a1_array = np.linspace(15, 35, 101)
rss_list = []  # start with an empty list so .append() works
for a1_trial in a1_array:
    y_model = model(x_data, a0=150, a1=a1_trial)
    rss_value = np.sum(np.square(y_data - y_model))  # RSS for this trial slope
    rss_list.append(rss_value)

# Find the minimum RSS and the a1 value from whence it came
# convert rss_list to a np.array()
rss_array = np.array(rss_list)
# find the minimum value of the array
best_rss = np.min(rss_array) 
# find the corresponding trial value (np.argmin gives the index of the first minimum)
best_a1 = a1_array[np.argmin(rss_array)]
print('The minimum RSS = {}, came from a1 = {}'.format(best_rss, best_a1))

# Plot your rss and a1 values to visually confirm answer, do the values found agree with the figure?
fig = plot_rss_vs_a1(a1_array, rss_array)

#### Least-Squares Optimization

minimization is a type of optimization problem
for linear models there's an analytic formula that provides the quantitative solution to this optimization problem, with analytic meaning a formula that generates an exact numeric value by direct substitution instead of approximation 
more complex models generally don't have an analytic solution like this, they have to be solved with numerical approximation, such as an iterative optimizer 

this lesson will teach several ways to solve this optimization with code, these tools work for linear models but can also be adapted to solve more complex models beyond this course 
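
as a rough sketch of what "more complex models" can look like, here's `scipy.optimize.curve_fit` applied to a hypothetical exponential model, the model form, the noise-free synthetic data, and the starting guess `p0` are all assumptions chosen for illustration

```python
import numpy as np
from scipy import optimize

# hypothetical nonlinear model: exponential growth, which has no simple
# closed-form least-squares solution for its parameters
def exp_model(x, a, b):
    return a * np.exp(b * x)

# noise-free synthetic data generated from known parameters a=2.0, b=1.5
x_data = np.linspace(0, 2, 25)
y_data = exp_model(x_data, 2.0, 1.5)

# curve_fit searches numerically, starting from the initial guess p0
param_opt, param_cov = optimize.curve_fit(exp_model, x_data, y_data, p0=[1.0, 1.0])
a_fit, b_fit = param_opt
print("fitted a = {:.3f}, fitted b = {:.3f}".format(a_fit, b_fit))
```

unlike the linear case, the result can depend on the starting guess, so it's worth checking the fit against the data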

by setting the slope (derivative) of RSS with respect to each parameter to 0 and using some calculus, you'll see that: 
* a1=covariance(x,y)/variance(x) 
* a0=mean(y)-a1*mean(x)
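
the calculus behind those two formulas can be sketched as follows, writing the means of x and y as $\bar{x}$ and $\bar{y}$ (notation assumed here, it isn't used elsewhere in these notes):

$$
\begin{aligned}
\mathrm{RSS}(a_0, a_1) &= \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2 \\
\frac{\partial\,\mathrm{RSS}}{\partial a_0} &= -2 \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right) = 0
\;\;\Rightarrow\;\; a_0 = \bar{y} - a_1 \bar{x} \\
\frac{\partial\,\mathrm{RSS}}{\partial a_1} &= -2 \sum_{i=1}^{n} x_i \left( y_i - a_0 - a_1 x_i \right) = 0
\;\;\Rightarrow\;\; a_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
= \frac{\mathrm{covariance}(x, y)}{\mathrm{variance}(x)}
\end{aligned}
$$

the last step substitutes $a_0 = \bar{y} - a_1 \bar{x}$ into the second condition, and the 1/n factors in the covariance and the variance cancel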

least-squares doesn't guarantee zero error (there's no perfect solution) but in certain cases (like a linear model) it's the best you can do, in the sense that no other a0 and a1 give a smaller RSS
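
a quick check of the covariance/variance formulas, assuming made-up data and using `np.polyfit` as an independent reference, it also confirms that nudging the parameters away from the formula values makes RSS larger

```python
import numpy as np

rng = np.random.default_rng(3)

# hypothetical data: a line plus noise
x = np.linspace(0, 10, 21)
y = 150 + 25 * x + rng.normal(scale=10.0, size=x.size)

# np.cov returns a 2x2 matrix, element [0, 1] is covariance(x, y)
# ddof=0 matches the default normalization used by np.var
a1 = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
a0 = np.mean(y) - a1 * np.mean(x)

# independent reference: np.polyfit returns [slope, intercept] for deg=1
a1_ref, a0_ref = np.polyfit(x, y, deg=1)
print("formula : a0 = {:.4f}, a1 = {:.4f}".format(a0, a1))
print("polyfit : a0 = {:.4f}, a1 = {:.4f}".format(a0_ref, a1_ref))

def rss(a0, a1):
    """RSS of the line a0 + a1*x against the data y."""
    return np.sum(np.square(y - (a0 + a1 * x)))

# the error is not zero, but moving away from the optimum only increases it
rss_opt = rss(a0, a1)
rss_nudged = rss(a0 + 1.0, a1 - 0.5)
print("RSS at optimum = {:.2f}, RSS after nudging = {:.2f}".format(rss_opt, rss_nudged))
```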

In [None]:
# optimized by NumPy
# first, compute the means of x and y
x_mean = np.mean(x)
y_mean = np.mean(y)

# compute the deviations by subtracting the mean for all values in each array
x_dev = x - x_mean
y_dev = y - y_mean

# compute the slope a1 as the covariance of x and y divided by the variance of x (the 1/n factors cancel)
a1 = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2) # squaring could also be np.square(x_dev)

# use the slope a1 and the means of x and y to compute the intercept a0
a0 = y_mean - (a1 * x_mean)

# exercise example
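# Use those optimal model parameters a0, a1 to build a model
y_model = model(x, a0, a1)

# plot to verify that the resulting y_model best fits the data y
# (this call expects a version of compute_rss_and_plot_fit that returns (fig, rss),
# the version defined earlier in these notes returns (rss, summary))
fig, rss = compute_rss_and_plot_fit(a0, a1)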

In [None]:
# optimized by SciPy
# linear models are a special case because algebraic formulas like the NumPy one above do not exist for more complex models 
# the scipy optimization module can solve more general optimization problems, not just least-squares

from scipy import optimize

# load the data and define the model form as a function
x_data, y_data = load_data()

def model_func(x, a0, a1):
    return a0 + (a1 * x)

# use curve_fit and pass in the model function and data 
param_opt, param_cov = optimize.curve_fit(model_func, x_data, y_data)

# the param_opt output gives the model parameter values that minimize RSS, they can be indexed like a list
# this code unpacks the results 
a0 = param_opt[0] # a0 is the intercept in y = a0 + a1 * x
a1 = param_opt[1] # a1 is the slope in y = a0 + a1 * x

# test and verify the answer
fig, rss = compute_rss_and_plot_fit(a0, a1)

In [None]:
# optimized by Statsmodels
# ordinary least squares, ols, will solve the same optimization problem 

import pandas as pd
from statsmodels.formula.api import ols

# it's easier to use a Pandas DataFrame with statsmodels so repack the data before passing to the ols method
x_data, y_data = load_data()
df = pd.DataFrame(dict(x_name=x_data, y_name=y_data))

# use a formula string in ols(), "y_name ~ x_name" reads as "y_name is modeled as a function of x_name" (an intercept is included by default), and fit 
model_fit = ols(formula="y_name ~ x_name", data=df).fit()

# the fitted model can now be used to make predictions 
y_model = model_fit.predict(df)
# over-plot y_data with y_model
fig = plot_data_with_model(x_data, y_data, y_model)
x_model = x_data

# ...or to just extract the optimal parameter values
a0 = model_fit.params['Intercept']
a1 = model_fit.params['x_name']

# Visually verify that these parameters a0, a1 give the minimum RSS
fig, rss = compute_rss_and_plot_fit(a0, a1)