***NOTE:*** This assumes that [RISE](https://github.com/damianavila/RISE) is installed.  If so, just click the bar chart in the toolbar above, and the presentation should begin.  Note that in edit mode, you can easily change the slide type of the cells below with

 * `shift-i` : toggle slide
 * `shift-b` : toggle subslide
 * `shift-g` : toggle fragment


Once the slideshow is running, the comma key "," will show or hide the buttons for you to press to close the slideshow, get help, or navigate.  You can still navigate using the arrow keys, and so on.

1. Open a command prompt (aka command line, terminal, shell, powershell)

2. If you haven't already, run
```bash
conda env create moble/gw_matched_filtering_demo
```

3. Once you've done that, run
```bash
conda activate gw_matched_filtering_demo
```

4. Leave the command prompt open for now

# Gravitational-wave astronomy with real data

## Matched filtering

# Introduction

  - Gravitational waves (GWs) are like sound waves, except...
    - GW medium is spacetime itself (no matter needed)
    - Transverse rather than longitudinal

  - LIGO is an enormous microphone
    - We could actually hear (very loud) GWs with our own ears
    - We can (and will!) hear LIGO data

  - But there’s a lot of noise
    - Earthquakes, storms, logging, traffic, shotguns

  - We need a good way of digging signal out of the noise

# Outline

Overtly:
  - Sounds of gravitational waves
  - Sounds of LIGO
  - Fourier transforms (FFTs)
  - Matched filtering

Covertly:
  - Data analysis
  - Python
  - Jupyter notebook
  - GitHub

The ostensible purpose of this talk is to introduce you to matched filtering, which is the basic method that GW detectors use in searching for and measuring GW signals.  But that's a pretty narrow purpose, and most of you will not get involved in GWs.  So I also want to give you some exposure to a few other ideas that hopefully will have broader application for all of you when you go into other fields.  And we'll use matched filtering as a way into those other ideas.

So ostensibly, the outline of this activity starts off with introducing you to the sounds of GWs.  I'll make this analogy that LIGO is just an extraordinary microphone, and we'll listen to the sounds a GW makes, and the sounds of the LIGO instrument itself.  Then, we'll see that FFTs are a really powerful way of analyzing these sounds, and matched filtering is a really sensitive way of measuring those FFTs.

But of course, while we're doing that, I also want to give you a little flavor of data analysis.  Pretty much all of you either are working on or will work on data analysis at some point, and there are some very general rules and ideas that can be applied to basically any type of data analysis.  So I'll want to use this stuff as a sort of analogy for other types of data analysis, so hopefully you can apply these principles to your own work.

Whenever you do any data analysis, you'll probably be writing code to do it.  As scientists especially, that code should be open-source, because that's crucial to reproducibility and lets others use your code to build on your work, which is good for science and good for you personally.  When working on open-source code, the de facto meeting place is GitHub, which is home to most of the major open-source scientific projects (among many others).  In fact, many employers look at your GitHub presence as if it were part of your resume, as evidence of your ability to write code in various languages and interact well with others.  You can start out just by making an account and downloading a package that you want to use.  Then you can open issues (bug reports) if you find any problems or have any questions.  As you get experience, you can fork other projects and create pull requests for fixes or new features you add to other people's code, and even create your own projects.  One easy way to get started with pull requests is just to improve the documentation for some project that you've used.

On a more immediate level, I also hope to give you the impression that python is a useful language for fast prototyping, and investigating your data, and nothing enables that interactivity better than Jupyter.

# Jupyter notebooks

<br /><br />

  - Run a live session of python (or basically any other language)
  - Manipulate files, write code, interact with data, make plots, take notes, give presentations, ...


  - You don't need to know python
  - Put cursor in grey boxes and hit Shift-Enter

So first, I just want to introduce how we're working here.  Who here has used python before?

Python is really dynamic, and powerful, but also a lot simpler than most other languages.  It's not always the fastest at any computation, but since most of your time is spent writing programs (rather than running them), that's not usually a big problem.  And new developments are making python just as fast as even C/C++ in a lot of cases.

Now, we throw in the Jupyter notebook.  Who here has used Mathematica before?

Well the Jupyter notebook looks and acts like a nice version of Mathematica.  The notebook is connected to a live session of python.  It has these code cells that you run, and you can see the results.  So click on the first cell, and hit Shift+Enter.

Mathematica is better at symbolic math (for now).  But otherwise, python is more useful and general.  And the Jupyter notebook makes it better at interactive stuff.  So here's my unsolicited advice: if you're deciding what programming language to learn, go with python.  There are nerdier options out there, but not many more broadly useful options.  And if you're using python interactively, you'll want to use Jupyter (which is just a different interface) or -- better yet -- the Jupyter notebook.

# Basic idea of matched filtering

  * We have some raw data $d(t)$
  * Does it agree with some possible signal $s(t)$?
  

Take the product: $d(t)\, s(t)$

  * When $d$ and $s$ agree, their product is large and positive
  * When $d$ and $s$ disagree, their product is small or negative

Integrate over all times:
  \begin{equation*}
    \text{"Overlap"} = \int_{-\infty}^{\infty} d(t) \, s(t)\, dt
  \end{equation*}

This is like constructive and destructive interference.  (Normally, we think about the *sum* of waves producing interference; a *product* like this comes up if you look at changes in the "energy" of the combined wave.)  If $d$ and $s$ are in phase, they add up to a big result; if they are out of phase, they cancel out.

\begin{equation*}
    \text{Overlap} = \sum_{t_i} d(t_i) \, s(t_i)\, \Delta t
\end{equation*}
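Here's a minimal sketch of that discrete sum in python.  The sample rate, the frequencies, and the `overlap` helper are made-up illustration values, not anything taken from the LIGO data.

```python
import numpy as np

# Hypothetical example: one second of data sampled at 4096 Hz
sample_rate = 4096
dt = 1.0 / sample_rate
t = np.arange(0.0, 1.0, dt)

# A 100 Hz sine plays the role of the signal s(t)
s = np.sin(2 * np.pi * 100 * t)

# Three possible "data" streams d(t)
d_in_phase = s.copy()                      # agrees with s everywhere
d_out_of_phase = -s                        # disagrees with s everywhere
d_other = np.sin(2 * np.pi * 250 * t)      # unrelated frequency

def overlap(d, s, dt):
    """Discrete overlap: sum over samples of d(t_i) * s(t_i) * dt."""
    return np.sum(d * s) * dt

print(overlap(d_in_phase, s, dt))      # large and positive (about 0.5)
print(overlap(d_out_of_phase, s, dt))  # large and negative (about -0.5)
print(overlap(d_other, s, dt))         # essentially zero
```

The in-phase and out-of-phase cases are the constructive and destructive interference described above; the unrelated frequency averages away to nothing.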

# Improvements

<br /><br />

  * Not sure when $s$ should arrive.  Can we compute overlap with time offsets?
  * This is slow.  Can we do it faster?
  * There's a lot of noise.  Can we reduce its effect?

Yes, if we use Fourier transforms!

# (Discrete) Fourier transforms

<br /><br />

$$
s(t) = \sum_{i} \left[ \tilde{s}_i\, \sin (2\,\pi\,f_i\,t + \phi_i) \right]
$$

Discrete frequencies: $f_i$

FT amplitude: $\tilde{s}_i$

FT phase: $\phi_i$
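As a rough illustration, `numpy.fft.rfft` can recover these amplitudes and phases from a sampled signal.  The two-tone signal below is invented for the example, and the factors of `2 / n` and `pi / 2` just convert numpy's complex-exponential convention into the sine convention used on this slide (they apply to bins other than zero frequency and Nyquist).

```python
import numpy as np

# Hypothetical signal: two sine waves with known amplitudes and phases
sample_rate = 1024
n = 1024
t = np.arange(n) / sample_rate
s = 2.0 * np.sin(2 * np.pi * 50 * t + 0.3) + 0.5 * np.sin(2 * np.pi * 120 * t + 1.1)

# Discrete Fourier transform of the real-valued signal
s_fft = np.fft.rfft(s)
freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)   # the discrete frequencies f_i

# Convert complex coefficients to the sine-series amplitude and phase
amplitudes = 2.0 * np.abs(s_fft) / n
phases = np.angle(s_fft) + np.pi / 2

# Only the two frequencies we put in have non-negligible amplitude
for k in np.flatnonzero(amplitudes > 0.1):
    print(f"f = {freqs[k]:.0f} Hz, amplitude = {amplitudes[k]:.3f}, phase = {phases[k]:.3f}")
```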

# Equivalence by Parseval's theorem

<br /><br /><br />

  \begin{equation*}
    \sum_{t_i} d(t_i) \, s(t_i)\, \Delta t
    \quad = \quad
    \sum_{i} \tilde{d}_i \, \tilde{s}_i\, \cos \left( \phi_{d,i} - \phi_{s,i} \right) \Delta f
  \end{equation*}
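You don't have to take this on faith; a quick numerical check is easy.  This sketch uses random numbers as stand-ins for $d$ and $s$, and assumes the usual normalization in which the FFT is multiplied by $\Delta t$ and the sum runs over both positive and negative frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
dt = 1.0 / 1024
d = rng.standard_normal(n)   # stand-in for the data
s = rng.standard_normal(n)   # stand-in for the signal

# Left-hand side: sum over times
time_side = np.sum(d * s) * dt

# Right-hand side: sum over frequencies of amplitude * amplitude * cos(phase difference)
d_f = np.fft.fft(d) * dt     # multiply by dt to approximate the continuous transform
s_f = np.fft.fft(s) * dt
df = 1.0 / (n * dt)          # spacing between the discrete frequencies
freq_side = np.sum(np.abs(d_f) * np.abs(s_f) * np.cos(np.angle(d_f) - np.angle(s_f))) * df

print(time_side, freq_side)  # the two numbers agree to rounding error
```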

# Time offsets

<br /><br />

Shift $s(t)$ by some $\delta t$:

<br />

\begin{align*}
    \text{Overlap}(\delta t)
    \quad &= \quad
    \sum_{t_i} d(t_i) \, s(t_i - \delta t)\, \Delta t
    \\
    \quad &= \quad
    \sum_{i} \tilde{d}_i \, \tilde{s}_i\, \cos \left( \phi_{d,i} - \phi_{s,i} + 2\,\pi\,f_i\,\delta t \right) \Delta f
  \end{align*}

So far, there's no reason to prefer doing this calculation as a function of time or of frequency.  Naively, if we look at these formulas, the sum over frequencies looks like it would be slower because we have to do two FTs in the first place just to get the data.

The real problem is that we don't just care about the overlap for a particular time offset, we care about the *best* overlap for any time offset.
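For a single offset, the two calculations can be compared directly.  This is only a sketch with random stand-in data; it treats the data as periodic (so the shift wraps around, via `np.roll`), and uses the fact that delaying the template by $\delta t$ multiplies its complex transform by $e^{-2\pi i f \delta t}$, which is where the extra phase in the cosine comes from.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 512
dt = 1.0 / 512
d = rng.standard_normal(n)   # stand-in for the data
s = rng.standard_normal(n)   # stand-in for the template

shift = 37                   # offset in samples (arbitrary choice)
delta_t = shift * dt

# Time domain: multiply d(t_i) by s(t_i - delta_t) and sum
time_side = np.sum(d * np.roll(s, shift)) * dt

# Frequency domain: apply the delay as a phase factor on the template's transform
freqs = np.fft.fftfreq(n, d=dt)
d_f = np.fft.fft(d) * dt
s_f = np.fft.fft(s) * dt
df = 1.0 / (n * dt)
s_delayed_f = s_f * np.exp(-2j * np.pi * freqs * delta_t)
freq_side = np.sum(d_f * np.conj(s_delayed_f)).real * df

print(time_side, freq_side)  # the two numbers agree to rounding error
```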

# Best time offset

<br />

  \begin{equation*}
    \text{"Match"} = 
    \max_{\delta t} \left[ \text{Overlap}(\delta t) \right]
  \end{equation*}

<br />

  \begin{align*}
    \text{Match}
    \quad &= \quad
    \max_{\delta t} \left[ \sum_{t_i} d(t_i) \, s(t_i - \delta t)\, \Delta t \right]
    \\
    \quad &= \quad
    \max_{\delta t} \left[ \sum_{i} \tilde{d}_i \, \tilde{s}_i\, \cos \left( \phi_{d,i} - \phi_{s,i} + 2\,\pi\,f_i\,\delta t \right) \Delta f \right]
  \end{align*}

So it looks like we'll have to do one of these sums for all the different values of $\delta t$, and pick which one is the best.  But this is where we find the big difference.  If you look at the sum over frequencies, it turns out that it's actually an "inverse" Fourier transform, which only has to be done once to get the overlap for *every* $\delta t$, and can be done *extremely* efficiently.

# Best time offset

<br /> <br /> <br /> <br />

  \begin{equation*}
    \text{Match}
    =\max \left[ \mathrm{IFT}\left\{ \tilde{d}\, \tilde{s} \right\} \right]
  \end{equation*}
  
<br /> <br />

This is a bit simplistic, but the basic idea is that we can just do 3 Fourier transforms (one an inverse), and get the answer.
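Here's a sketch of those three transforms with numpy.  The chirp template, the noise level, and the offset of 700 samples are all invented for illustration.  One detail hidden by the "simplistic" formula: with complex FFTs, the template's transform gets a complex conjugate.  The loop at the end does the same job the slow way, one offset at a time, just to check the answer.

```python
import numpy as np

rng = np.random.default_rng(2)
sample_rate = 2048
n = 2048
dt = 1.0 / sample_rate
t = np.arange(n) * dt

# Hypothetical template: a short windowed chirp sweeping from 50 Hz to 250 Hz
burst = 512
tb = t[:burst]
template = np.zeros(n)
template[:burst] = np.hanning(burst) * np.sin(2 * np.pi * (50 * tb + 400 * tb**2))

# Fake data: the template delayed by a known number of samples, plus white noise
true_offset = 700
data = np.roll(template, true_offset) + 0.5 * rng.standard_normal(n)

# Two forward transforms and one inverse give the overlap at every offset
d_f = np.fft.rfft(data)
s_f = np.fft.rfft(template)
overlaps_fft = np.fft.irfft(d_f * np.conj(s_f), n=n) * dt

# The slow way: one time-domain sum per offset (periodic shifts)
overlaps_slow = np.array([np.sum(data * np.roll(template, m)) * dt for m in range(n)])

best_offset = int(np.argmax(overlaps_fft))
match = overlaps_fft[best_offset]
print("best offset (samples):", best_offset, " match:", match)
print("fast and slow agree:", np.allclose(overlaps_fft, overlaps_slow))
```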

# Making it *fast*

<br /><br />

The Fast Fourier Transform (FFT) makes this *tens of thousands* of times faster

The FFT is one of the more remarkable algorithms in all of computing, because it can make things *so* much faster.
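To get a feel for where a number like that comes from, here's a back-of-the-envelope estimate.  It assumes that computing the overlap directly at all $N$ offsets takes about $N^2$ multiplications, while an FFT takes about $N \log_2 N$; constant factors are ignored, so treat the results as rough scalings rather than timings.

```python
import math

def rough_speedup(n):
    """Ratio of ~n**2 operations (direct sums) to ~n*log2(n) operations (FFT)."""
    return n**2 / (n * math.log2(n))

# Hypothetical data lengths: about a thousand, 65 thousand, and a million samples
for n in (2**10, 2**16, 2**20):
    print(f"N = {n:>8d}: roughly {rough_speedup(n):,.0f} times fewer operations")
```

For about a million samples, the ratio is already in the tens of thousands.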

So now we've accomplished our first two improvements: including time offsets and making it faster.  Now, we have to deal with all that noise.

# Handling noise

<img src="files/70sEqualizer.jpg" width="1200px">

# Handling noise

<br /> <br /> <br /> <br />

  \begin{equation*}
    \text{Match}
    =\max \left[ \mathrm{IFT}\left\{ \tilde{d}\, \tilde{s} \right\} \right]
    \longrightarrow\max \left[ \mathrm{IFT}\left\{ \frac{\tilde{d}\, \tilde{s}} {\tilde{n}^2} \right\} \right]
  \end{equation*}
  
<br /> <br />

Fourier series are vectors; Fourier space is a vector space.

We've provided the vector space with a "dot product" (which makes it into a Hilbert space).

The dot product accounts for different amounts of noise in the different vector components.

Matched filtering is taking a signal vector and measuring its projection along a template vector.
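Here's a sketch of that noise-weighted projection.  Everything in it is made up for illustration: the chirp template, the offset, and especially the noise spectrum `noise_amp`, which is known exactly here only because we build the noise ourselves.  With real detector data, $\tilde{n}$ would have to be estimated from the data.

```python
import numpy as np

rng = np.random.default_rng(3)
sample_rate = 2048
n = 2048
dt = 1.0 / sample_rate
t = np.arange(n) * dt
freqs = np.fft.rfftfreq(n, d=dt)

# Hypothetical template: a short windowed chirp sweeping from 50 Hz to 250 Hz
burst = 512
tb = t[:burst]
template = np.zeros(n)
template[:burst] = np.hanning(burst) * np.sin(2 * np.pi * (50 * tb + 400 * tb**2))

# Assumed noise amplitude per frequency: flat at high f, far louder at low f
noise_amp = 0.5 * (1.0 + (30.0 / np.maximum(freqs, 1.0)) ** 2)

# Build noise with that spectrum by shaping white noise in the frequency domain
white_f = np.fft.rfft(rng.standard_normal(n))
noise = np.fft.irfft(white_f * noise_amp, n=n)

# Fake data: delayed template plus the colored noise
true_offset = 900
data = np.roll(template, true_offset) + noise

# Weighted overlap at every offset: divide by the noise power at each frequency
d_f = np.fft.rfft(data)
s_f = np.fft.rfft(template)
weighted = np.fft.irfft(d_f * np.conj(s_f) / noise_amp**2, n=n)

best_offset = int(np.argmax(weighted))
print("true offset:", true_offset, " recovered offset:", best_offset)
```

Dividing by `noise_amp**2` is the equalizer from the previous slide: frequencies where the noise is loud get turned down, so they contribute less to the dot product.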

# Conclusions

Matched filtering:

  - Current GW detectors are like giant microphones
  - There's lots of noise
  - So we filter the data and test for signals

# Conclusions

Data analysis:

  - FFTs are great for time series (any periodic signal)
  - Python and Jupyter notebooks are really useful
  - Look at your data in as many ways as possible
  - Don't blindly trust hand-me-down algorithms
  - Don't blindly trust your results
    - Think about whether they make sense
    - Understand all the features
    - Things you don't understand may lead to discovery

# Notebook outline

2. Listening to gravitational waves
  - What does a GW sound (and look) like?
3. Listening to detector data
  - Our data has a *lot* of noise
4. Digging signal out of noise manually
  - Manipulate the signal to hear the parts that *you think* matter
5. Digging signal out of noise automatically
  - Manipulate the signal to eliminate the parts that are always there
6. Digging signal out of noise with a model waveform
  - Manipulate the signal to find what you want to find...
  - Time-domain stuff here!
7. Speeding up the process for LIGO searches
  - How LIGO really does it
8. More detections!
  - Try your hand at data analysis with newer detections

# github.com/moble/MatchedFiltering

Now, run
```bash
gw_matched_filtering_demo
```

<br />

***NOTE:*** Be careful when listening to any sounds.  Many of them get very loud very suddenly.  Be prepared to move your headphones away from your head or reduce the volume very quickly.

`python $Anaconda_Home\envs\gw_matched_filtering_demo\Scripts\gw_matched_filtering_demo`