Curve Fitting
=============

NumPy and SciPy provide several ways to fit curves to data points. A fitted curve summarizes noisy data with a few parameters, which is often a much cleaner representation than the raw points.

Question 1
----------
Plot the following noisy sinusoidal data, produced for you by a random generator.

In [None]:
import numpy as np
from numpy import pi
from scipy.stats import norm
from matplotlib.pyplot import plot, title
%matplotlib inline

In [None]:
def noisy_data(x):
    phase = pi / 4
    frequency = 0.85
    noise = norm.rvs(size=x.shape) * 0.5
    return np.sin(x * frequency + phase) + noise

x = np.linspace(-pi, pi, 100)
y = noisy_data(x)
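
Because `norm.rvs` draws fresh noise on every call, each run of the cell above produces a different data set, so your fitted numbers will not match anyone else's exactly. A minimal sketch for making the data repeatable, assuming you are free to change the generator: the `seed` argument below is a hypothetical addition, passed through to the `random_state` parameter of `norm.rvs`. With the default `seed=None` it behaves like the original.

```python
import numpy as np
from numpy import pi
from scipy.stats import norm

def noisy_data(x, seed=None):
    """Same generator as above, plus a hypothetical `seed` argument."""
    phase = pi / 4
    frequency = 0.85
    # random_state fixes the noise, so repeated calls with the same seed match
    noise = norm.rvs(size=x.shape, random_state=seed) * 0.5
    return np.sin(x * frequency + phase) + noise

x = np.linspace(-pi, pi, 100)
y1 = noisy_data(x, seed=0)
y2 = noisy_data(x, seed=0)
print(np.array_equal(y1, y2))  # True: same seed, same noise
```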

In [None]:
# your code goes here

Solution:

In [None]:
plot(x, y, "b.")

Question 2
----------

Take a look at your data. Could you fit it with a polynomial, and if so, what polynomial order is appropriate?
Plot your fit against the original data.
How good is the fit? Can you improve it by increasing the polynomial order?

Hint:

`1 - R ** 2` can be calculated by dividing the sum of the squared model errors (residuals) by the overall variation, that is, the sum of the squared deviations of the data from their mean.
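
The hint above is the usual definition of the coefficient of determination, R² = 1 - SS_res / SS_tot. A small sketch of it as a reusable function; the name `r_squared` is hypothetical and not part of NumPy or SciPy.

```python
import numpy as np

def r_squared(y, y_model):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    y_model = np.asarray(y_model, dtype=float)
    ss_res = np.sum((y - y_model) ** 2)   # sum of squared model errors
    ss_tot = np.sum((y - y.mean()) ** 2)  # overall variation about the mean
    return 1 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))                     # 1.0: the model reproduces every point
print(r_squared(y, np.full(4, y.mean())))  # 0.0: the model only predicts the mean
```

A model that does worse than simply predicting the mean gets a negative R².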

In [None]:
from numpy import polyfit, poly1d

In [None]:
# your code goes here

Solution:

In [None]:
fit = polyfit(x, y, 3)      # least-squares coefficients of a cubic

fit_func = poly1d(fit)      # callable polynomial built from those coefficients

plot(x, y, 'b.')
plot(x, fit_func(x), 'r-')

# Standard error (root-mean-square error): the square root of the mean
# squared distance between the data points and the model
model_sq_err = (y - fit_func(x)) ** 2
std_err = np.sqrt(model_sq_err.mean())

# Coefficient of determination (R2): the fraction of the variation
# about the mean that the model explains
total_sq_err = (y - y.mean()) ** 2
r2 = 1 - model_sq_err.sum() / total_sq_err.sum()

print("std err = {}".format(std_err))
print("r2 = {}".format(r2))
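
One way to explore the last question is to repeat the fit over a range of orders. A minimal sketch, assuming data regenerated like `noisy_data()` with a fixed seed (the seed value is arbitrary). Because every lower-order polynomial is a special case of a higher-order one, the least-squares R² on the same data can only stay level or rise as the order grows. A higher R² here therefore does not by itself show a better model; past some order the extra terms mostly fit the noise.

```python
import numpy as np
from numpy import pi, polyfit, poly1d
from scipy.stats import norm

# Data shaped like noisy_data() above; the seed is an assumption for repeatability
x = np.linspace(-pi, pi, 100)
y = np.sin(0.85 * x + pi / 4) + 0.5 * norm.rvs(size=x.shape, random_state=1)

def r_squared(y, y_model):
    """1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_model) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

r2_by_order = {}
for order in range(1, 10):
    poly = poly1d(polyfit(x, y, order))  # least-squares polynomial of this order
    r2_by_order[order] = r_squared(y, poly(x))
    print("order {}: r2 = {:.4f}".format(order, r2_by_order[order]))
```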

Question 3
----------

Suppose you learn that your data has the form `y = sin(ωx + ϕ)`. Define a model function of this form that can be fitted to the data.

In [2]:
# your code goes here

Solution:

In [3]:
def fit_func(x, w, phi):
    return np.sin(x * w + phi)
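
`curve_fit()` calls the model as `fit_func(x, *params)`: the independent variable comes first, followed by one argument per parameter to be fitted, and the function must accept a whole NumPy array of `x` values. A quick sanity check, sketched under the assumption that noise-free data built with the generator's own values (ω = 0.85, ϕ = π/4) is a fair test: the fit should recover those values.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_func(x, w, phi):
    # x first, then one argument per fitted parameter
    return np.sin(x * w + phi)

x = np.linspace(-np.pi, np.pi, 100)
y_clean = fit_func(x, 0.85, np.pi / 4)  # noise-free data with known parameters

params, _ = curve_fit(fit_func, x, y_clean, p0=[1.0, 0.0])
print(np.round(params, 4))
```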

Question 4
----------

Use `curve_fit()` to find the best-fit values of the frequency `ω` and the phase shift `ϕ` for the data.
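
`curve_fit()` performs local nonlinear least squares (Levenberg-Marquardt by default when no bounds are given), so it refines whatever starting point you pass as `p0` and can settle in a poor local minimum; sinusoidal models are prone to this in the frequency parameter. A minimal multi-start sketch: the grid of starting values and the seed are arbitrary assumptions, and the fit with the smallest sum of squared errors (SSE) is kept.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_func(x, w, phi):
    return np.sin(x * w + phi)

# Synthetic data shaped like noisy_data(); the seed is an assumption
rng = np.random.default_rng(2)
x = np.linspace(-np.pi, np.pi, 100)
y = fit_func(x, 0.85, np.pi / 4) + rng.normal(scale=0.5, size=x.shape)

best_params, best_sse = None, np.inf
for w0 in [0.5, 1.0, 2.0, 4.0]:           # hypothetical starting frequencies
    for phi0 in [0.0, np.pi / 2, np.pi]:  # hypothetical starting phases
        try:
            params, _ = curve_fit(fit_func, x, y, p0=[w0, phi0])
        except RuntimeError:
            continue                       # this start did not converge
        sse = np.sum((y - fit_func(x, *params)) ** 2)
        if sse < best_sse:                 # keep the lowest-error fit
            best_params, best_sse = params, sse

print("best w, phi:", best_params, "SSE:", best_sse)
```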

In [4]:
from scipy.optimize import curve_fit

In [5]:
# your code goes here

Solution:

In [6]:
coeffs, cov_matrix = curve_fit(fit_func, x, y, p0=[0, 1])  # p0: initial guess for (w, phi)

plot(x, y, 'b.')
plot(x, fit_func(x, *coeffs), 'r-')
title('y = sin({:.3}x + {:.3})'.format(*coeffs))

# Standard error (root-mean-square error): the square root of the mean
# squared distance between the data points and the model
model_sq_err = (y - fit_func(x, *coeffs)) ** 2
std_err = np.sqrt(model_sq_err.mean())

# Coefficient of determination (R2): the fraction of the variation
# about the mean that the model explains
overall_sq_err = (y - y.mean()) ** 2
r2 = 1 - model_sq_err.sum() / overall_sq_err.sum()

print("std err = {}".format(std_err))
print("r2 = {}".format(r2))

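
The solution above unpacks `cov_matrix` but never uses it. The SciPy documentation for `curve_fit()` notes that the square roots of its diagonal give one-standard-deviation errors on the fitted parameters. A sketch on synthetic data shaped like `noisy_data()` (the seed is an assumption). With the default `absolute_sigma=False` the covariance is scaled by the residual variance, and the estimate is only meaningful if the model form is correct.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_func(x, w, phi):
    return np.sin(x * w + phi)

# Synthetic data shaped like noisy_data(); the seed is an assumption
rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 100)
y = fit_func(x, 0.85, np.pi / 4) + rng.normal(scale=0.5, size=x.shape)

coeffs, cov_matrix = curve_fit(fit_func, x, y, p0=[1.0, 0.0])

# One-standard-deviation uncertainty of each parameter, from the covariance diagonal
perr = np.sqrt(np.diag(cov_matrix))
for name, value, err in zip(["w", "phi"], coeffs, perr):
    print("{} = {:.3f} +/- {:.3f}".format(name, value, err))
```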

Copyright 2008-2016, Enthought, Inc.  
Use only permitted under license.  Copying, sharing, redistributing or other unauthorized use strictly prohibited.  
http://www.enthought.com