<hr style="height: 1px;">
<i>This notebook was authored by the 8.S50x Course Team, Copyright 2022 MIT All Rights Reserved.</i>
<hr style="height: 1px;">
<br>

<h1>Guided Problem Set 6: Matched Filtering Part I - Time Domain </h1>


<a name='section_6_0'></a>
<hr style="height: 1px;">


## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">P6.0 Overview</h2>


<h3>Navigation</h3>

<table style="width:100%">
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_6_1">P6.1 What is Matched Filtering?</a>
        </td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#problems_6_1">P6.1 Problems</a>
        </td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_6_2">P6.2 Fitting in the Time Domain: Part I</a>
        </td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#problems_6_2">P6.2 Problems</a>
        </td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_6_3">P6.3 Fitting in the Time Domain: Part II</a>
        </td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#problems_6_3">P6.3 Problems</a>
        </td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_6_4">P6.4 Sweeping the Time Window</a>
        </td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#problems_6_4">P6.4 Problems</a></td>
    </tr>
</table>

<h3>Summary</h3>

**P6.1 What is Matched Filtering**
<ul>
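    <li>We introduce matched filtering as a way to scan data for a signal, build a toy black-hole-like waveform hidden in sinusoidal noise, and test a crude estimate of the signal time based on the maximum of the SNR.</li>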
</ul>

**P6.2 Fitting in the Time Domain: Part I**
<ul>
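    <li>We fit the waveform model to a window of the data at a fixed merger time, using lmfit with randomly seeded starting parameters, and examine the $\chi^{2}$ of the fit.</li>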
</ul>

**P6.3 Fitting in the Time Domain: Part II**
<ul>
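    <li>We repeat the fit with weights, estimate the uncertainty from signal-free noise, and consider how correlations between neighboring points affect the $\chi^{2}$.</li>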
</ul>


**P6.4 Sweeping the Time Window**
<ul>
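    <li>We run several randomly seeded fits in parallel with multiprocessing, sweep the assumed merger time across the data, and use the fitted amplitude to locate the signal.</li>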
</ul>

<h3>Importing Libraries and Data</h3>

Before beginning, run the cell below to import the relevant libraries for this notebook. 
Optionally, set the plot resolution and default figure size.


In [None]:
#>>>RUN

!pip install lmfit

In [None]:
#>>>RUN

import numpy as np
import matplotlib.pyplot as plt
from scipy.io.wavfile import write

from lmfit import Model, Parameters
import scipy.stats as stats
from scipy.stats import chisquare
from multiprocessing import Pool


#set plot resolution
%config InlineBackend.figure_format = 'retina'

#set default figure size
plt.rcParams['figure.figsize'] = (9,6)

<a name='section_6_1'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">P6.1 What is Matched Filtering?</h2>    

| [Top](#section_6_0) | [Previous Section](#section_6_0) | [Problems](#problems_6_1) | [Next Section](#section_6_2) |


<h3>Overview</h3>

The purpose of matched filtering is to scan large data sets for a signal of known shape. LIGO uses it to search for gravitational waves in its strain data, and the technique is used in many other fields as well.

The output is usually a map of the signal-to-noise ratio (SNR) over your data. For LIGO, this is a 2D plot with time on the x axis and SNR on the y axis. If you're looking for point sources in astrophysical telescope data, for example, it is an image with right ascension and declination as the axes and the SNR shown as the image value.

Signals look like large spikes in the SNR.

To make this exercise useful to you in the LIGO project, we'll make a model signal that looks roughly like a black hole merger waveform. Run the code below to define the waveform and plot an example. (This is nearly the same function as the one used in Recitation 3.)

In [None]:
#>>>RUN

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0x98a09fe)

def complicated_model_fn(x, time, lambda_plus, lambda_minus, max_amp, omega_0, omega_max, omega_sigma):
    omega = (omega_max - omega_0) * (np.exp(-np.minimum(x - time, 0)**2 / omega_sigma)) + omega_0
    lambdas = np.array([lambda_plus if xvalue > time else lambda_minus for xvalue in x])
    amplitude = max_amp * np.exp(-abs(x - time) / lambdas)
    return amplitude * np.cos(omega * (x-time))

LAMBDA_PLUS_TRUE = 1.0
LAMBDA_MINUS_TRUE = 4
MAX_AMP_TRUE = 1.2
OMEGA_0_TRUE = 3.0
OMEGA_MAX_TRUE = 6.0
OMEGA_SIGMA_TRUE = 4.0
TIME_TRUE = 50.0

xi = np.linspace(TIME_TRUE-15, TIME_TRUE+5, 200)
true_yi = complicated_model_fn(xi, TIME_TRUE, LAMBDA_PLUS_TRUE, LAMBDA_MINUS_TRUE, MAX_AMP_TRUE,
                               OMEGA_0_TRUE, OMEGA_MAX_TRUE, OMEGA_SIGMA_TRUE)

plt.plot(xi, true_yi)
plt.xlabel("Time (s)")
plt.ylabel("Strain");

Let's make some fake data. We'll simulate "noise" as ten sinusoids of varying frequency, phase, and amplitude added together, and superimpose a merger signal at $t=50$ (the value of `TIME_TRUE`). Take your time to read the code and make sure you understand it.

In [None]:
#>>>RUN

np.random.seed(908)

NUMBER_SINES_TO_ADD = 10

noise_frequencies = 0.5 + 7 * np.random.random(NUMBER_SINES_TO_ADD)
noise_phases = 2 * np.pi * np.random.random(NUMBER_SINES_TO_ADD)
noise_amplitudes = 2 * MAX_AMP_TRUE / NUMBER_SINES_TO_ADD * np.random.random(NUMBER_SINES_TO_ADD)
    # The above line sets noise amplitudes so that the sum of all the noise amplitudes is on average
    # equal to the maximum amplitude of the signal.

sample_spacing = 0.1
xi = np.arange(-128, 128, sample_spacing)#times
yi = np.zeros_like(xi)#data

#Adding Noise
for freq, phase, amplitude in zip(noise_frequencies, noise_phases, noise_amplitudes):
    yi += amplitude * np.sin(phase + freq * xi)
   
#Adding Data
yi += complicated_model_fn(xi, TIME_TRUE, LAMBDA_PLUS_TRUE, LAMBDA_MINUS_TRUE, MAX_AMP_TRUE,
                               OMEGA_0_TRUE, OMEGA_MAX_TRUE, OMEGA_SIGMA_TRUE)

plt.figure(figsize=(16, 5))
plt.plot(xi, yi)
plt.xlabel("Time (s)")
plt.ylabel("Strain")
plt.show()

Our goal is to find the signal in this data.
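
The comment in the noise cell says that the noise amplitudes sum, on average, to the maximum amplitude of the signal. The reasoning is that each amplitude is drawn uniformly between 0 and $2A/N$, so its mean is $A/N$, and the expected sum over $N$ sines is $A$. The short check below confirms this numerically. It is only a sketch: the names `rng`, `n_trials`, and `amplitude_sums` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

MAX_AMP_TRUE = 1.2
NUMBER_SINES_TO_ADD = 10

# Illustrative check (not part of the original notebook): draw many independent
# sets of noise amplitudes and look at the average of their sums.
rng = np.random.default_rng(0)
n_trials = 20000
amplitudes = (2 * MAX_AMP_TRUE / NUMBER_SINES_TO_ADD
              * rng.random((n_trials, NUMBER_SINES_TO_ADD)))
amplitude_sums = amplitudes.sum(axis=1)

print("mean of summed amplitudes:", amplitude_sums.mean())
print("signal maximum amplitude: ", MAX_AMP_TRUE)
```

Keep in mind that the summed amplitude is an upper bound on the instantaneous noise. Because the phases are random, the sines rarely peak at the same moment, so the noise is usually smaller than this sum.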

<a name='problems_6_1'></a>     

| [Top](#section_6_0) | [Restart Section](#section_6_1) | [Next Section](#section_6_2) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 6.1.1</span>

In this problem we will generate some noise and create SNR (signal-to-noise ratio) plots in order to identify the time at which a signal occurs. Since we already know the signal and the noise separately, we can take a naive approach: simply use the time of the maximum of the SNR as the time of the signal event. Your goal is to explore how well this crude method estimates the time of the event.

First, generate noise composed of 1,000 sines with frequencies drawn from a normal distribution with mean 0.8 and standard deviation 5, phases drawn from a uniform distribution ranging from 0 to $2\pi$, and amplitudes set so that the sum of all the noise amplitudes is on average equal to the maximum amplitude of the signal.


In [None]:
#>>>PROBLEM
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

MAX_AMP_TRUE = 1.2
SAMPLE_SPACING = 0.1
NUMBER_SINES_TO_ADD = 1000

xi = np.arange(0, 128, SAMPLE_SPACING)#times

def generate_noise(xi):
  np.random.seed(908)
  yi_noise = np.zeros_like(xi)

  noise_frequencies = 0 #YOUR CODE HERE
  noise_phases = 0 #YOUR CODE HERE
  noise_amplitudes = 0 #YOUR CODE HERE

  #Adding Noise
  for freq, phase, amplitude in zip(noise_frequencies, noise_phases, noise_amplitudes):
      yi_noise += amplitude * np.sin( phase + freq * xi)

  return yi_noise

plt.figure(figsize=(16, 5))
plt.xlabel("Time (s)")
plt.ylabel("Strain")
plt.plot(xi, generate_noise(xi))
plt.show()


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 6.1.2</span>

Now we would like to show that we can find the signal by looking for the region of maximum strain. As a first check, we inject a signal into the noise and confirm that taking the maximum over the full time range picks it out. We also want to check that the time of that maximum is consistent with the injected time. This tests how the method responds to an injected wave.

Create a set of 50 signals of the form shown earlier with the parameters below. There should be one signal for each whole second in the range [50, 100), where that second is the true time of the signal (the role played by `TIME_TRUE` earlier), i.e., one signal occurring at t=50, one at t=51, one at t=52, etc.

For each signal, inject it into the noise and try to find the time at which the injection happens by taking the time of maximum deviation. Lastly, make a plot of the true (injected) time versus the time of maximum SNR. You will notice the `scale` option on the function `get_max_times`; it can be used to rescale the size of the signal before it is injected.

HINT: take a look at `np.argmax()`.
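
As a reminder of how `np.argmax()` behaves, here is a toy example on made-up numbers (the arrays `times` and `values` are illustrative only and are not the notebook's data). It returns the index of the largest entry, which can then be used to look up the matching time.

```python
import numpy as np

# Toy example (values are made up for illustration)
times = np.arange(0.0, 1.0, 0.1)
values = np.array([0.1, 0.3, 0.2, 0.9, 0.4, 0.2, 0.1, 0.5, 0.3, 0.0])

index_of_max = np.argmax(values)   # position of the largest entry
time_of_max = times[index_of_max]  # look up the time at that position

print(index_of_max, time_of_max)
```

If the maximum value appears more than once, `np.argmax()` returns the index of its first occurrence.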

In [None]:
#>>>PROBLEM
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

LAMBDA_PLUS_TRUE = 1.0
LAMBDA_MINUS_TRUE = 4
MAX_AMP_TRUE = 1.2
OMEGA_0_TRUE = 3.0
OMEGA_MAX_TRUE = 6.0
OMEGA_SIGMA_TRUE = 4.0

def complicated_model_fn(x, time, lambda_plus, lambda_minus, max_amp, omega_0, omega_max, omega_sigma):
    omega = (omega_max - omega_0) * (np.exp(-np.minimum(x - time, 0)**2 / omega_sigma)) + omega_0
    lambdas = np.array([lambda_plus if xvalue > time else lambda_minus for xvalue in x])
    amplitude = max_amp * np.exp(-abs(x - time) / lambdas)
    return amplitude * np.cos(omega * (x-time))

def get_max_times(xi, yi_noise, true_times,scale=1.0,iCheck=False):
    time_of_maximums = []

    for t in true_times:

        yi_signal = #YOUR CODE HERE
        yi_test_noise = #YOUR CODE HERE
        SNR = #YOUR CODE HERE
        
        time_of_maximums.append(xi[np.argmax(SNR)])
        
        if int(t) == 75 and iCheck:
            plt.xlabel("Time (s)")
            plt.ylabel("Strain")
            plt.plot(xi,yi_test_noise)
            plt.plot(xi,yi_signal)
            plt.show()
        
    return time_of_maximums

true_times = np.arange(50, 100, 1)  # one true time per whole second in [50, 100)
xi = np.arange(0, 128, SAMPLE_SPACING)
yi_noise = generate_noise(xi)

plt.plot(true_times, get_max_times(xi, yi_noise, true_times), label = 'naive model')
plt.xlabel('True Times (s)')
plt.ylabel('Predicted Times (s)')
plt.legend()
plt.show()

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 6.1.3</span>

For each of the 50 signals, plot the time at which you estimated the signal to have occurred against the time at which the signal actually occurred (true time on x-axis, estimated time on y-axis). Also plot the line y=x (what a perfect algorithm would produce). How does the crude method compare to a perfect algorithm? Select the best answer below:

- Crude method fits exactly to ideal algorithm

- Crude method mostly gets the time at which the signal event occurs and occasionally overestimates/underestimates.

- Crude method largely underestimates the time at which the signal event occurs and occasionally overestimates.

- Crude method largely overestimates the time at which the signal event occurs and occasionally underestimates.

- Crude method always underestimates the time at which the signal event occurs.

- Crude method always overestimates the time at which the signal event occurs.

In [None]:
#>>>PROBLEM
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 6.1.4</span>

Finally, scale the signal down by a factor of 20 (using the `scale` argument of `get_max_times`). What happens to the crude method? Select the best answer below:

- Crude method fits exactly to ideal algorithm

- Crude method largely underestimates the time at which the signal event occurs and occasionally overestimates.

- Crude method largely overestimates the time at which the signal event occurs and occasionally underestimates.

- Crude method always underestimates the time at which the signal event occurs.

- Crude method almost always doesn't work. 

- Crude method doesn't work.

In [None]:
#>>>PROBLEM
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

<a name='section_6_2'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">P6.2 Fitting in the Time Domain: Part I</h2>    

| [Top](#section_6_0) | [Previous Section](#section_6_1) | [Problems](#problems_6_2) | [Next Section](#section_6_3) |


<h3>Overview</h3>

In this section you will solve a series of problems that use a more sophisticated algorithm, called matched filtering, to find the time at which the signal event occurred. This is one of the more difficult problems of the course.

Matched filtering in the time domain is probably the conceptually easiest approach to matched filtering. We will fit the model function to the data while forcing the model to assume a merger time of $t$, then plot the quality of the fit as a function of $t$. We expect a very good fit quality when $t$ is close to the true time $t=50$ (the value of `TIME_TRUE`), and poor fits otherwise.
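
The idea of scanning the assumed time and recording a fit quality can be sketched on a much simpler toy problem. The code below is not the notebook's method: it assumes a hypothetical `toy_template` (a Gaussian-windowed cosine), independent Gaussian noise, and a model with a single free amplitude, which can be fit in closed form instead of with `lmfit`. As its quality measure it uses the improvement in $\chi^{2}$ over a fit with no signal at all, which is a choice made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def toy_template(x, t):
    # Hypothetical stand-in for the waveform: a Gaussian-windowed cosine centered at t
    return np.exp(-0.5 * (x - t) ** 2) * np.cos(4.0 * (x - t))

# Toy data: one template injected at t_true plus independent Gaussian noise
x = np.arange(0.0, 40.0, 0.1)
t_true = 25.0
sigma = 0.3
data = 1.5 * toy_template(x, t_true) + rng.normal(0.0, sigma, x.size)

trial_times = np.arange(5.0, 35.0, 0.5)
delta_chi2 = []
for t in trial_times:
    window = np.abs(x - t) < 3.0           # only fit data near the trial time
    h = toy_template(x[window], t)
    d = data[window]
    amp = np.dot(h, d) / np.dot(h, h)      # best-fit amplitude (linear least squares)
    chi2_no_signal = np.sum(d**2) / sigma**2
    chi2_best_fit = np.sum((d - amp * h) ** 2) / sigma**2
    delta_chi2.append(chi2_no_signal - chi2_best_fit)

delta_chi2 = np.array(delta_chi2)
best_t = trial_times[np.argmax(delta_chi2)]
print("trial time with the largest fit improvement:", best_t)
```

Plotting `delta_chi2` against `trial_times` gives a curve that spikes near the injected time. The problems below follow the same scan-and-compare logic, but with the full waveform model, several free parameters, and correlated noise.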

First, let's regenerate the data (the code follows the first section, but with a different random seed, so the noise realization is not identical).

In [None]:
np.random.seed(0x98a09fe)

def complicated_model_fn(x, time, lambda_plus, lambda_minus, max_amp, omega_0, omega_max, omega_sigma):
    omega = (omega_max - omega_0) * (np.exp(-np.minimum(x - time, 0)**2 / omega_sigma)) + omega_0
    lambdas = np.array([lambda_plus if xvalue > time else lambda_minus for xvalue in x])
    amplitude = max_amp * np.exp(-abs(x - time) / lambdas)
    return amplitude * np.cos(omega * (x-time))

def simple_fn(x,decay,constant,amplitude):
    return amplitude*np.exp(-1*x*decay)+constant

Model(simple_fn)

LAMBDA_PLUS_TRUE = 1.0
LAMBDA_MINUS_TRUE = 4
MAX_AMP_TRUE = 1.2
OMEGA_0_TRUE = 3.0
OMEGA_MAX_TRUE = 6.0
OMEGA_SIGMA_TRUE = 4.0
TIME_TRUE = 50.0

xi = np.linspace(TIME_TRUE-15, TIME_TRUE+5, 200)
true_yi = complicated_model_fn(xi, TIME_TRUE, LAMBDA_PLUS_TRUE, LAMBDA_MINUS_TRUE, MAX_AMP_TRUE,
                               OMEGA_0_TRUE, OMEGA_MAX_TRUE, OMEGA_SIGMA_TRUE)

NUMBER_SINES_TO_ADD = 10

noise_frequencies = 0.5 + 7 * np.random.random(NUMBER_SINES_TO_ADD)
noise_phases = 2 * np.pi * np.random.random(NUMBER_SINES_TO_ADD)
noise_amplitudes = 2 * MAX_AMP_TRUE / NUMBER_SINES_TO_ADD * np.random.random(NUMBER_SINES_TO_ADD)
    # The above line sets noise amplitudes so that the sum of all the noise amplitudes is on average
    # equal to the maximum amplitude of the signal.

plt.plot(xi, true_yi)
plt.title("True Signal")
plt.xlabel("Time (s)")
plt.ylabel("Strain")
plt.show()

sample_spacing = 0.1
xi = np.arange(-128, 128, sample_spacing)#times
yi = np.zeros_like(xi)#data

#Adding Noise
for freq, phase, amplitude in zip(noise_frequencies, noise_phases, noise_amplitudes):
    yi += amplitude * np.sin(phase + freq * xi)

#Adding Data
signal= complicated_model_fn(xi, TIME_TRUE, LAMBDA_PLUS_TRUE, LAMBDA_MINUS_TRUE, MAX_AMP_TRUE,
                               OMEGA_0_TRUE, OMEGA_MAX_TRUE, OMEGA_SIGMA_TRUE)
yi+=signal

plt.plot(xi, yi)
plt.plot(xi, signal)
plt.title("Signal plus noise")
plt.xlabel("Time (s)")
plt.ylabel("Strain")
plt.show()

plt.plot(xi, yi)
plt.plot(xi, signal)
plt.title("Signal plus noise")
plt.xlim(35,55)
plt.xlabel("Time (s)")
plt.ylabel("Strain")
plt.show()

<a name='problems_6_2'></a>     

| [Top](#section_6_0) | [Restart Section](#section_6_2) | [Next Section](#section_6_3) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 6.2.1</span>

We'll need to cut the data to perform the fit. How much time before and after $t$ would you like to fit over? We really only need to consider the region where the signal is larger than the noise. In practice this is something we could systematically calculate on the fly, given the noise and signal size, for instance by analyzing the plot that we generated. Since this is a little subjective, read the guidance below to choose an appropriate window.

In what follows, only consider a 7-10 second window, as this will include enough data to make our fits converge while keeping the amount of data small enough that the fits converge quickly. Furthermore, this window need not be symmetric, as much of the signal lies before the true time, with only a little signal left after the time of the event. Therefore, `t_before` > `t_after`.

With these conditions, one possible choice for `[t_before, t_after]` is `[5,2]`. What is another acceptable answer, given the constraints that we outlined?

Enter your answer as a list, formatted as `[t_before, t_after]`.

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 6.2.2</span>

We will make a function that creates an `LMFIT` `Model` and `Parameters` instance for the complicated model `complicated_model_fn` with starting time `t` and parameters constrained by the dictionary `params_min_max`. Specifically, this function will force the time of the signal to appear at time `t` and will randomly seed the starting points of the parameters that are used, within the given parameter ranges. Call this function `model_and_random_parameters(t)`.

Your task is to write a function that seeds the starting points of parameters randomly, such that each parameter `p` takes on a uniformly distributed random value within the range defined by `p_min` and `p_max`.

In [None]:
#>>>PROBLEM
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

from lmfit import Model, Parameters
    
    
def get_param_random_value(p_min,p_max):
    #get a uniformly distributed random value between p_min and p_max
    #return a float
    return #YOUR CODE HERE


params_min_max = {
    'lambda_plus': (0.1, 5),
    'lambda_minus': (0.1, 5),
    'max_amp': (0, 2),
    'omega_0': (0, 5),
    'omega_max': (0, 10),
    'omega_sigma': (0, 5)
}

def model_and_random_parameters(t):
    model = Model(complicated_model_fn)
    params = Parameters()
    params.add('time', value=t, vary=False)
    for p, (p_min, p_max) in params_min_max.items():
        value = get_param_random_value(p_min,p_max)
        params.add(p, min=p_min, max=p_max, value=value)
    return model, params


#TEST EXAMPLE: SHOULD = 1.51166
t=0.1
np.random.seed(1)
print(model_and_random_parameters(t)[1].get('omega_0').value)

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 6.2.3</span>

Now, make a function that fits the model created in the previous problem and outputs the fit result. Remember that we only want to look at a specific part of the data when we fit, namely the range `(t-t_before, t+t_after)`, where `t` is the specific time at which we want to look for the signal. Use the values `t_before = 5` and `t_after = 2`.

HINT: to do this, the function `np.where()` may be useful.
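
As a reminder of how `np.where()` behaves when given a single condition, here is a toy example on made-up numbers (the array `times` and the cut values are illustrative only). Note that the result is a tuple containing one index array per dimension, and that the tuple can be used directly to index the original array.

```python
import numpy as np

# Toy example (values are made up for illustration)
times = np.arange(0, 10)                        # 0, 1, ..., 9
selected = np.where((times > 2) & (times < 6))  # combine conditions with &

print(type(selected))   # a tuple holding one index array per dimension
print(selected[0])      # the index array itself
print(times[selected])  # indexing with the tuple selects the matching entries
```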

In [None]:
#>>>PROBLEM
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.
import lmfit

#THE WINDOW MUST BE [5,2] FOR YOUR ANSWER TO MATCH EXPECTED VALUES
t_before = 5
t_after = 2


def get_signal_indices(xi, t, t_before, t_after):
    #use np.where() to return the relevant indices
    #note, the result of np.where() will be a tuple
    return #YOUR CODE HERE

def fit_once(xi, yi, t, t_before, t_after):
    data_indices = get_signal_indices(xi, t, t_before, t_after)
    data_x = xi[data_indices]
    data_y = yi[data_indices]
    model, params = model_and_random_parameters(t)    
    result = model.fit(data_y, params, x=data_x)
    return result

result = fit_once(xi, yi, TIME_TRUE, t_before, t_after)
result.plot();

#print("Fit chi2 value: ", result.chisqr)
#print("Fit chi2 probability: ",1-stats.chi2.cdf(result.chisqr,result.nfree))

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 6.2.4</span>

Run the fit multiple times and print the $\chi^{2}$ value and $\chi^{2}$ probability using the following lines of code.

<pre>
print("Fit chi2 value: ", result.chisqr)
print("Fit chi2 probability: ",1-stats.chi2.cdf(result.chisqr,result.nfree))
</pre>

What is the lowest $\chi^{2}$ value that you obtain, and corresponding $\chi^{2}$ probability?

Enter your answer as a list of numbers `[chi2, chi2_prob]` with precision 1e-3.
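
The quantity `1-stats.chi2.cdf(chisqr, nfree)` is the probability of obtaining a $\chi^{2}$ at least as large as the observed one, given `nfree` degrees of freedom and assuming that the model and the uncertainties are correct. The sketch below evaluates it for made-up numbers (they are not results from this notebook) and compares it with the survival function `stats.chi2.sf`, which computes the same quantity directly.

```python
from scipy import stats

# Illustrative numbers, not results from this notebook
chisqr = 10.0
nfree = 10

p_from_cdf = 1 - stats.chi2.cdf(chisqr, nfree)
p_from_sf = stats.chi2.sf(chisqr, nfree)  # survival function, the same quantity

print(p_from_cdf, p_from_sf)
```

The two agree here. The survival function is the numerically safer choice when the probability is very small, because it avoids subtracting a number close to 1 from 1.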

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 6.2.5</span>

Let's consider whether the $\chi^{2}$ of the fit is a reasonable number. What does the $\chi^{2}$ probability say about the fit? Choose the best answer from the following options:

- The fit is perfect! This is because our model is perfect. Our job is done.
- The fit is perfect, which means we should consider carefully the assumptions we have made.
- The fit is okay, and we can do no better.
- The fit is terrible, so we should adjust our model or the range of data that we are fitting.

In [None]:
#>>>PROBLEM
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.
import scipy.stats as stats

print("Fit chi2 probability: ",1-stats.chi2.cdf(result.chisqr,result.nfree))

<a name='section_6_3'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">P6.3 Fitting in the Time Domain: Part II</h2>    

| [Top](#section_6_0) | [Previous Section](#section_6_2) | [Problems](#problems_6_3) | [Next Section](#section_6_4) |


<h3>Weighted Fitting</h3>

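The uncertainties are overestimated, but why? The real issue is that our fit so far has not taken the uncertainties into account correctly. Without weights, `lmfit` computes the $\chi^{2}$ from the raw residuals, which is equivalent to assuming $\sigma=1$ for every point. To fix this, we need to do a weighted $\chi^{2}$ fit.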

<a name='problems_6_3'></a>     

| [Top](#section_6_0) | [Restart Section](#section_6_3) | [Next Section](#section_6_4) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 6.3.1</span>

Take the above fit and repeat it, this time setting the uncertainty on each point to $\sigma=0.2$. To do this, you will have to run a weighted fit with `lmfit` by passing an array of weights. Note that the weights in `lmfit` are defined so that $w_i=1/\sigma_i$, leading to the following:

$$\chi^{2} = \sum_{i}\frac{(y_{i}-f(x_{i}))^{2}}{\sigma_{i}^{2}}$$

where $y_i$ is the measured value at $x_i$ and $f(x_i)$ is the model prediction.

With this change, what is the $\chi^{2}$ probability corresponding to the lowest $\chi^{2}$ value? Enter your answer as a number with precision 1e-3.
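
To see what the weights do to the $\chi^{2}$, the sketch below computes it by hand with plain NumPy, assuming the residual is multiplied by the weight before squaring (which is how the relation $w=1/\sigma$ above enters). The data and model values are made up for illustration and are unrelated to the notebook's fit.

```python
import numpy as np

# Illustrative data and model values (made up for this sketch)
y = np.array([1.0, 2.1, 2.9, 4.2])
model = np.array([1.1, 2.0, 3.0, 4.0])
sigma = 0.2

residuals = y - model
chi2_unweighted = np.sum(residuals**2)              # same as assuming sigma = 1
weights = np.full(y.size, 1.0 / sigma)              # one weight per data point, w = 1/sigma
chi2_weighted = np.sum((weights * residuals) ** 2)  # equals chi2_unweighted / sigma**2

print(chi2_unweighted, chi2_weighted)
```

With a constant $\sigma$, the weighted $\chi^{2}$ is simply the unweighted one divided by $\sigma^{2}$, so choosing $\sigma$ sets the scale against which the residuals are judged.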

In [None]:
#>>>PROBLEM
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.
import lmfit

def fit_once_weighted(xi, yi, t, t_before, t_after, weight=1.0):
    data_indices = get_signal_indices(xi, t, t_before, t_after)
    data_x = xi[data_indices]
    data_y = yi[data_indices]
    
    weights = #YOUR CODE HERE
    
    model, params = model_and_random_parameters(t)
    result = model.fit(data_y, params, x=data_x,weights=weights)
    return result


unc=0.2
result = fit_once_weighted(xi, yi, TIME_TRUE, t_before, t_after,1./unc)

result.plot();
print("Fit chi2 value: ", result.chisqr)
print("Fit chi2 probability: ",1-stats.chi2.cdf(result.chisqr,result.nfree))

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 6.3.2</span>

Finally, we should come up with a strategy to compute the uncertainty of our points. The uncertainty is often set by how accurately you are trying to model your dataset. With LIGO data, this is a difficult question, since many of the wiggles in the "noise" are actually understood as oscillations at certain frequencies.

In this problem, we will state that we do not wish to model the noise, and that our uncertainty should reflect the RMS of our noisy, signal-free data.

Take the dataset above, compute the standard deviation of the signal-free noise, and then repeat the fit. What $\chi^{2}$ probability do you get? Enter your answer as a number with precision 1e-3.


In [None]:
#>>>PROBLEM
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.
import lmfit

def get_noise_indices(xi, t, t_before, t_after):
    return #YOUR CODE HERE
    

def noise(xi, yi, t, t_before, t_after):
    data_indices = get_noise_indices(xi, t, t_before, t_after)
    data_y = yi[data_indices]
    return np.std(data_y)

unc=noise(xi, yi, TIME_TRUE, t_before, t_after)
result = fit_once_weighted(xi, yi, TIME_TRUE, t_before, t_after,1./unc)
result.plot();
print("unc value: ", unc)
print("Fit chi2 value: ", result.chisqr)
print("Fit chi2 probability: ",1-stats.chi2.cdf(result.chisqr,result.nfree))

<h3>Correlations</h3>

Our fit is, in some sense, still too good! Why is this the case? What is happening is that our fit function is fitting the noise as well. This is partly a feature of fitting time series data, where the points are correlated with one another. The $\chi^{2}$ test assumes that each point fluctuates randomly and independently of the previous one, whereas here consecutive points are correlated (i.e., if the noise fluctuated low at one point, the next point will tend to be low as well).

To get a better estimate of the quality of the fit, we can imagine taking every other point, or computing the point-to-point variation by taking the difference between consecutive points, or even points that are a little farther apart. The larger the delta-t of our RMS, the fewer assumptions we are making about our ability to model the background noise.
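
The effect of correlations on point-to-point differences can be illustrated with a toy comparison. The sketch below builds smooth noise from a few slow sinusoids (similar in spirit to this notebook's noise, but with made-up frequencies and amplitudes) and independent Gaussian noise with the same standard deviation, then compares the spread of differences between points `dt` samples apart. The helper `std_of_differences` is a hypothetical name introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.arange(0.0, 200.0, 0.1)

# Correlated noise: a sum of slow sinusoids with random frequencies and phases
freqs = 0.5 + 2.0 * rng.random(10)
phases = 2 * np.pi * rng.random(10)
correlated = sum(0.1 * np.sin(p + f * x) for f, p in zip(freqs, phases))

# Independent noise with the same overall standard deviation
independent = rng.normal(0.0, np.std(correlated), x.size)

def std_of_differences(y, dt):
    # Spread of the differences between points that are dt samples apart
    return np.std(y[dt:] - y[:-dt])

for dt in (1, 2, 10):
    print(dt, std_of_differences(correlated, dt), std_of_differences(independent, dt))
```

For independent points the spread of the differences is about $\sqrt{2}$ times the standard deviation for any `dt`. For the smooth noise it is much smaller at small `dt` and grows as the points move farther apart, which is why the choice of `dt` changes the uncertainty estimate.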


>#### Follow-up 6.3.2a (ungraded)
>
>Try running the cells below. In the first case, nearest-neighbor data are averaged and the data are fit again. In the second case, the noise is estimated from differences in points that are 2 time-steps away. Do you think these are reasonable attempts to account for the correlated nature of the data?

In [None]:
#>>>RUN

#Computing uncertainty: merging bins

xi_old = xi.copy()
yi_old = yi.copy()
xi_new = np.array([ 0.5*(xi_old[2*i]+xi_old[2*i+1]) for i in range(len(xi_old)//2) ])
yi_new = np.array([ 0.5*(yi_old[2*i]+yi_old[2*i+1]) for i in range(len(yi_old)//2) ])


uncout=noise(xi_new, yi_new, TIME_TRUE, t_before, t_after)
result = fit_once_weighted(xi_new, yi_new, TIME_TRUE, t_before, t_after,1./uncout)
result.plot();
print("unc value: ", uncout)
print("Fit chi2 value: ", result.chisqr)
print("Fit chi2 probability: ",1-stats.chi2.cdf(result.chisqr,result.nfree))

In [None]:
#>>>RUN

#Computing uncertainty: points 2 samples away

def noise_deltat(xi, yi, t, t_before, t_after, dt=2):#dt is the separation, in samples, between the points being differenced
    data_indices = get_noise_indices(xi, t, t_before, t_after)
    #print(data_indices[0][:-dt],data_indices[0][dt:])
    data_y = yi[data_indices[0][:-dt]]-yi[data_indices[0][dt:]]
    return np.std(data_y)

uncout=noise_deltat(xi_old, yi_old, TIME_TRUE, t_before, t_after)
result = fit_once_weighted(xi_old, yi_old, TIME_TRUE, t_before, t_after,1./uncout)
result.plot();
print("unc value: ", uncout)
print("Fit chi2 value: ", result.chisqr)
print("Fit chi2 probability: ",1-stats.chi2.cdf(result.chisqr,result.nfree))

<a name='section_6_4'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">P6.4 Sweeping the Time Window</h2>   

| [Top](#section_6_0) | [Previous Section](#section_6_3) | [Problems](#problems_6_4) | [Next Section](#section_6_5) |


<h3>Fitting Using Multiprocessing</h3>

From the above analysis, we see that an uncertainty of 0.18 is more reasonable. This was found for `dt=2`, which should limit the assumptions we are making about the nature of the background noise. Go back and try the follow-up exercises if you have not done so already!

Let's redefine `fit_once` using this uncertainty. Run the code below several times. Does it always find the lowest $\chi^2$ value?

In [None]:
#From the above analysis, we see that an uncertainty of 0.18 is more reasonable.
#This was found for dt = 2, which should limit the assumptions we are making
#about the nature of the background noise.

#Let's try using this uncertainty


def fit_once_new(t, weight=1./0.18):
    data_indices = get_signal_indices(xi, t, t_before, t_after)
    data_x = xi[data_indices]
    data_y = yi[data_indices]
    weights = np.ones(len(data_x))*weight
    model, params = model_and_random_parameters(t)
    result = model.fit(data_y, params, x=data_x,weights=weights)
    return result


result = fit_once_new(TIME_TRUE)
result.plot();
print("Fit chi2 value: ", result.chisqr)
print("Fit chi2 probability: ",1-stats.chi2.cdf(result.chisqr,result.nfree))

Now, instead of `fit_once_new`, we'll use a new function called `fit`, which runs `fit_once_new` multiple times and outputs the best (lowest $\chi^2$) result.

To do this, we will use a package you may not have been exposed to yet: `multiprocessing`. The idea is to run these fits simultaneously so that our code runs faster. We do this by using the `pool.map` function to build a list of fit results called `results`.
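
If `pool.map` is new to you, the sketch below shows the pattern on a toy function. Here `toy_fit` is a hypothetical stand-in for a randomly seeded fit (it just returns a random number), and `run_in_parallel` is an illustrative helper; neither is part of this notebook.

```python
import numpy as np
from multiprocessing import Pool

def toy_fit(seed):
    # Hypothetical stand-in for one randomly seeded fit: returns a fake "chi2" value
    rng = np.random.default_rng(seed)
    return float(rng.random())

def run_in_parallel(seeds, n_workers=2):
    # pool.map applies toy_fit to every element of seeds, spreading the calls
    # over n_workers processes, and returns the results in input order
    with Pool(n_workers) as pool:
        return pool.map(toy_fit, seeds)

if __name__ == "__main__":
    print(run_in_parallel([0, 1, 2, 3]))
```

The sketch passes an explicit seed to each task. On platforms that start worker processes by forking, each worker begins with a copy of the parent's global NumPy random state, so tasks that rely on that global state may not draw independent random numbers.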

<a name='problems_6_4'></a>     

| [Top](#section_6_0) | [Restart Section](#section_6_4) | [Next Section](#section_6_5) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 6.4.1</span>

Consider the code below, where multiprocessing is already implemented. All you have to do is find the best result in the list and return it.

Code the rest of the `fit` function. What is the lowest $\chi^2$ value and corresponding probability? Enter your answer as a list of numbers `[chi2, chi2_prob]` with precision 1e-3.

In [None]:
#>>>PROBLEM
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

def get_min_result(results):
    min_result = None
    min_chisq = None
    #for each result in results, set a new min_result and min_chisq
    #if result.chisqr is less than the currently stored value
    
    #YOUR CODE HERE
    
    return min_result

NUM_FITS = 6

def fit(t, pool):
    results = pool.map(fit_once_new, np.full(NUM_FITS, t))
    min_result = get_min_result(results)
    return min_result

with Pool(6) as pool:   #'6' here refers to the number of multiprocessing jobs
    result = fit(TIME_TRUE, pool)

result.plot();
print("Fit chi2 value: ", result.chisqr)
print("Fit chi2 probability: ",1-stats.chi2.cdf(result.chisqr,result.nfree))

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 6.4.2</span>

Next, we want to see what the fits look like for different `t` values. Call `fit` for values of $t \in [-100, 100)$, where the $t$ values are separated by $\Delta t = 1 \text{ s}$. Store the results in a list named `results`. The multiprocessing parts have been done for you; just run the code (this could take a little while).

How long does it take to do this? (pick the closest answer)

A. 0.01 seconds

B. 1 second

C. 5 minutes

D. 5 hours (if this is the answer you pick **something is wrong**)

E. 10 days (if this is the answer you pick **something is wrong**)

In [None]:
#>>>RUN
%%time

results = []
delta_t = 1
ts = np.arange(-100, 100, delta_t)

with Pool(NUM_FITS) as pool:
    for t in ts:
        if t % 10 == 0: print(t)
        results.append(fit(t, pool))

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 6.4.3</span>

Now we need to find out for which $t$ value we get a fit that is most likely to be our signal. One way of figuring this out is to look for the fit with the largest `max_amp` parameter, as the signal will have a higher maximum amplitude than the surrounding noise.

Plot `max_amp` as a function of $t$ given the `results` you just calculated. Find the value of $t$ which has the largest `max_amp` and plot the corresponding fit result.

Does the fit look like it could be the signal we're looking for? If yes, then enter below the value of $t$ at which this occurred. If not, keep searching through the next-highest `max_amp` values until you get something that may be the signal, and enter that $t$ value below.

In [None]:
#>>>PROBLEM
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

amps = #LIST OF MAXIMUM AMPLITUDES
result_max_amp = #RESULT CORRESPONDING TO MAX AMP

result_max_amp.plot()
print("Time of best fit result: ", ts[np.argmax(amps)])
plt.show()

plt.plot(ts, amps)
plt.xlabel("Time (s)")
plt.ylabel("Wave amplitudes")
plt.show()

>#### Follow-up 6.4.3a (ungraded)
>    
>Try the above exercise, instead sorting by chi-sq values. Does the smallest chi-sq value give you the same t value that you found previously? Why or why not? What other criteria could you use to search for the signal?


>#### Follow-up 6.4.3b (ungraded)
>    
>This took a while to run. But in practice, it's nice to have searches like this run quickly, so that if a wave event is detected, an alert can be sent out to telescopes all over the world and they can look at the correct area of the sky with minimal delay. How could you make this process faster, aside from running it on better hardware?
