# Significance exercise 1

We'll create a dataset with a signal added to it, but this time the noise will be louder relative to the signal than in the data used in earlier lectures.

In [1]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import scipy
import scipy.signal

In [2]:
# Some settings for the data
sample_rate = 32 # 32 samples per second
num_data_samples = 128*sample_rate # 128 seconds worth of data
times = np.arange(num_data_samples) / sample_rate

def make_signal(gaussian_width, chirpiness):
    signal_inst_frequency = 2. + chirpiness*np.sin(2 * np.pi * 0.1 * times)
    phases = [0]
    for i in range(1,len(times)):
        phases.append(phases[-1] + 2 * np.pi * signal_inst_frequency[i] * 1./sample_rate)
    # Gaussian envelope centred on t = 64 s.
    # Note: gaussian_width enters as a variance (sigma**2), not a standard deviation.
    gaussian = np.exp( - (times - 64)**2 / (2 * gaussian_width))
    signal = gaussian * np.sin(phases)
    return signal[48*sample_rate:80*sample_rate]

# Make the signals
signal_1 = make_signal(10., 0.)
signal_2 = make_signal(1., 1.)
signal_3 = make_signal(8., 4.)
signal_4 = make_signal(10., 1.)

# Make the noise, and add a signal to the noise at an unknown spot
# Set seed so we get the same dataset!
np.random.seed(21)
noise = np.random.normal(size=[num_data_samples])
rndi = np.random.randint(0,sample_rate*96)
data_21 = noise.copy() # copy, so that adding the signal does not also modify `noise`
data_21[rndi:rndi+len(signal_2)] += signal_2*0.9

## Part 1

* Compute the cross-correlation of `data_21` with `signal_2`. Can you clearly see the signal in the noise?
* Create a 128-second-long stretch of random noise (in the same way as we did for `data_21`, but now without a signal added). Cross-correlate it with `signal_2`. What is the loudest value that you see?
* Repeat the process 1000 times. How many times is the maximum cross-correlation louder than it was for `data_21`?
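The three steps above can be sketched as follows. This is only one possible approach, under some assumptions: `cross_correlate` is a hypothetical stand-in for the sliding-window code from the earlier lectures (which is not reproduced here), `count_louder_trials` is a helper name made up for this sketch, "loudest" is taken to mean the maximum value (not the maximum absolute value), and `make_signal` is rewritten with a cumulative sum in place of the phase loop so that the block runs on its own.

```python
import numpy as np

sample_rate = 32
num_data_samples = 128 * sample_rate
times = np.arange(num_data_samples) / sample_rate

def make_signal(gaussian_width, chirpiness):
    # Same construction as in the notebook, with the phase loop written as a cumulative sum
    inst_frequency = 2. + chirpiness * np.sin(2 * np.pi * 0.1 * times)
    phase_steps = 2 * np.pi * inst_frequency[1:] / sample_rate
    phases = np.concatenate(([0.], np.cumsum(phase_steps)))
    gaussian = np.exp(-(times - 64)**2 / (2 * gaussian_width))
    return (gaussian * np.sin(phases))[48 * sample_rate:80 * sample_rate]

def cross_correlate(data, template):
    # Assumed form of the lecture code: slide the template along the data
    # and sum the element-wise product at every offset where it fully fits.
    num_offsets = len(data) - len(template) + 1
    result = np.zeros(num_offsets)
    for i in range(num_offsets):
        result[i] = np.sum(data[i:i + len(template)] * template)
    return result

def count_louder_trials(template, threshold, num_trials):
    # Hypothetical helper: count noise-only segments whose peak exceeds the threshold
    num_louder = 0
    for _ in range(num_trials):
        noise_only = np.random.normal(size=num_data_samples)
        if np.max(cross_correlate(noise_only, template)) > threshold:
            num_louder += 1
    return num_louder

# Rebuild the dataset exactly as in the cell above
signal_2 = make_signal(1., 1.)
np.random.seed(21)
data_21 = np.random.normal(size=[num_data_samples])
rndi = np.random.randint(0, sample_rate * 96)
data_21[rndi:rndi + len(signal_2)] += signal_2 * 0.9

# Step 1: peak cross-correlation of the data with the template
cc_data = cross_correlate(data_21, signal_2)
peak_data = np.max(cc_data)

# Steps 2 and 3: noise-only trials (20 here to keep it quick; the exercise asks for 1000)
num_trials = 20
num_louder = count_louder_trials(signal_2, peak_data, num_trials)
print(f"Peak for data_21: {peak_data:.2f}")
print(f"{num_louder} of {num_trials} noise-only trials were louder")
```

Plotting `cc_data` with `plt.plot` is a quick way to judge by eye whether the peak stands out from the noise.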

Note: The peak of the signal is just barely above the noise. You can increase the $0.9$ in the code that generates the data to make the signal a bit stronger.

You will not get exactly the same numbers, but in my tests I saw 108 examples of noise where the cross-correlation with `signal_2` produced a larger value than for `data_21`. We know that each of those times, there actually was no signal in the data - because we generated the noise ourselves, and we know that we did not add a signal.

So in 108 out of 1000 cases, we got a peak cross-correlation larger than what we had in `data_21`, even though no signal was present. This means that if we had received `data_21` from a detector, we could not be fully confident that it contained a real signal. We can, however, make a statistical statement about how confident we are.

With no signal present, we would see a cross-correlation peak equal to or louder than the one in `data_21` in 108 out of 1000 cases. We can express the significance as a 10.8% chance that noise alone produces a peak at least this loud, that is, that the candidate is just a "false alarm". We can also express this as a rate. Each segment of data is 128 seconds long, so 1000 segments amount to 128000 seconds of data. There are 108 false alarms in those 128000 seconds, which is roughly one per 1200 seconds (128000/108 ≈ 1185), or about 3 per hour.
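The arithmetic behind these numbers can be checked with a few lines. The variable names are made up for this sketch, and the counts are the ones quoted above, so your own run will give slightly different values.

```python
# Counts quoted in the text; replace them with the result of your own run
num_false_alarms = 108
num_trials = 1000
segment_duration = 128.0  # seconds per noise segment

# Fraction of noise-only segments that were at least as loud as data_21
false_alarm_probability = num_false_alarms / num_trials

# Total amount of noise-only data that was searched, in seconds
total_time = num_trials * segment_duration

# Average time between false alarms, and the corresponding hourly rate
seconds_per_false_alarm = total_time / num_false_alarms
false_alarms_per_hour = 3600.0 * num_false_alarms / total_time

print(f"False alarm probability: {false_alarm_probability:.1%}")
print(f"One false alarm every {seconds_per_false_alarm:.0f} seconds")
print(f"About {false_alarms_per_hour:.1f} false alarms per hour")
```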

### Making things faster

It may take a minute or two to calculate 1000 cross-correlations. You may want to spend a few minutes to find out if either numpy or scipy has a correlation function that does the same thing as the cross-correlation code we've written. 

If you can find a matching function, you should test the output to make sure it does the same thing as the code we've written. Make sure you understand what the inputs and outputs of the function are.

Numpy and scipy functions are often much faster than code we write ourselves. You can run an example with `%timeit` placed in front to see how fast it runs. If you switch to using such a function for the following cross-correlation calculations, things will go much quicker.
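One way to run such a check is sketched below. It assumes the lecture code has the sliding-window form of the hypothetical `cross_correlate` function defined here, and it uses `np.correlate` and `scipy.signal.correlate` with `mode='valid'` as the candidates; whether these match your own code depends on how that code treats the edges. The standard-library `timeit` module stands in for the `%timeit` magic so that the block also runs outside a notebook.

```python
import timeit

import numpy as np
import scipy.signal

def cross_correlate(data, template):
    # Assumed form of the lecture code: only offsets where the template fully fits
    num_offsets = len(data) - len(template) + 1
    result = np.zeros(num_offsets)
    for i in range(num_offsets):
        result[i] = np.sum(data[i:i + len(template)] * template)
    return result

# Random test inputs with the same lengths as the data and template in this exercise
rng = np.random.default_rng(0)
data = rng.normal(size=4096)
template = rng.normal(size=1024)

loop_result = cross_correlate(data, template)
numpy_result = np.correlate(data, template, mode='valid')
scipy_valid = scipy.signal.correlate(data, template, mode='valid')
scipy_full = scipy.signal.correlate(data, template)  # default mode is 'full'

# Check that the library results agree with the loop, up to rounding error
print("numpy matches loop:", np.allclose(loop_result, numpy_result))
print("scipy (valid) matches loop:", np.allclose(loop_result, scipy_valid))
print("lengths (loop, scipy valid, scipy full):",
      len(loop_result), len(scipy_valid), len(scipy_full))

# Rough timings in seconds per call
loop_time = timeit.timeit(lambda: cross_correlate(data, template), number=3) / 3
numpy_time = timeit.timeit(lambda: np.correlate(data, template, mode='valid'), number=3) / 3
scipy_time = timeit.timeit(lambda: scipy.signal.correlate(data, template, mode='valid'), number=3) / 3
print(f"loop: {loop_time:.4f} s, numpy: {numpy_time:.4f} s, scipy: {scipy_time:.4f} s")
```

Comparing with `np.allclose` instead of `==` allows for the small rounding differences that appear when a library computes the same sums in a different order or via an FFT.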

Note: The `scipy` function is fastest, but it gives a different-length result. This is because of the way it handles the edges, where the template runs off the ends of the data. See the plot: