<div style="background-color:#009440; padding: 0px; background-size:cover; background-opacity:50%; border-radius:5px; height:300px">
    <div style="margin: 5px; padding: 10px;">
    <h1 style="color:#000000">Geophysical Data Acquisition and Analysis</h1>
    <h5 style="color:#C0C0C0">LMU, summer 2016</h5>
    <h4 style="color:rgba(0,0,0,0.6)">Stefanie Donner, Céline Hadziioannou, Ceri Nunn</h4>
    </div>
    <div style="float:right; margin: 20px; padding: 20px; background:rgba(255,255,255,0.7); width: 70%; height: 100px">
        <div style="position:relative; top:40%; transform: translateY(-50%)">
        <div style="font-size: x-large; font-weight:900; color:rgba(0,0,0,0.8); line-height:100%">P2.4 - Final exam report</div>
        </div>
    </div>
   
</div>

## Rules + deadline

In the following you will find five exercises plus some basic code. Adapt the code as needed to answer the questions, and provide your answers in separate markdown cells below each exercise. Please do not forget to label axes, lines, titles, etc. in your plots.  
Make your answers as detailed as necessary to be clear, but concentrate on the essentials. 

If you refer to literature or sources outside the course material, acknowledge or cite them properly. You may also include images from outside the notebook if they help your explanation; in that case, send us the image files together with the notebook. This is how you import figures in markdown: 

`<img style="float: left; height: 350px; padding: 10px" src="DATA/figure.jpg"/>`

For help with coding, please consult the official [Python](http://docs.python.org/) and [ObsPy](http://docs.obspy.org) documentation. For help with formatting the markdown cells, see e.g. [Wikipedia](https://en.wikipedia.org/wiki/Markdown) or this [cheat sheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet). In case of severe problems, please contact us in good time.

Submit your final notebook via mail to Stefanie Donner (donner@geophysik.uni-muenchen.de) at the latest 

### August 5th, 23:55 !!!

Notebooks received after that time will not be considered.

*Please, do not forget to execute Cell 0 first!*

<br>
<br>

In [None]:
# Cell 0: Preparation for programming
%pylab inline
from __future__ import print_function
from scipy import interpolate, signal
from time import *
from obspy import *
from obspy.core import read, UTCDateTime
from obspy.clients.fdsn import Client
from obspy.signal.cross_correlation import xcorr_pick_correction
import numpy as np
import matplotlib.pylab as plt
import os
import glob
import wave
import struct
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = 10, 10
plt.rcParams['lines.linewidth'] = 1

______________

## Exercise 1

In this exercise you will work on broadband data from the Mw 8.2 Chile earthquake of 1 April 2014 at 23:46:47 UTC, recorded at the Black Forest Observatory (BFO) in SW Germany. The goal of this exercise is to show your understanding of the basic principles of signal processing. Answer the following questions and perform the necessary steps in the corresponding code cells.

a) In Cell 1a you fetch the waveforms via the FDSN client of IRIS and remove the instrument response from the data (no further coding necessary here). Explain what "removing the instrument response" means and why it is necessary. Which mathematical method do you connect with this step? What needs to be considered during instrument response removal? A hint is given by the options used for removing the instrument response. Comment on all of them.

b) Now that you have downloaded the data and removed the instrument response, which further pre-processing steps do you need to perform? Assume that for your scientific task the highest frequency you want to analyse is 5 Hz. Describe the general pre-processing chain, point out the possible pitfalls, and explain how to avoid them. Among others, some steps are high-pass filtering, demeaning, and detrending the trace. Why are these steps performed?

c) Bonus _(Optional, for extra points)_: Try to plot the ray paths for this special earthquake receiver pair. Identify which phases should be theoretically visible in the seismogram. Try to identify them. 
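
As a starting point for b), the first pre-processing steps can be sketched on a synthetic trace. This is only an illustration with plain NumPy, not the expected solution: the sampling rate, the synthetic signal and the 5 % taper length are assumptions made here. On real data, the corresponding ObsPy methods (`Trace.detrend`, `Trace.taper`, `Trace.filter`, `Trace.decimate`) would be the natural choice.

```python
import numpy as np

# Synthetic trace: offset + linear trend + 0.2 Hz oscillation (all assumed)
fs = 20.0                                  # assumed sampling rate [Hz]
t = np.arange(12000) / fs                  # 600 s of data
x = 3.0 + 0.01 * t + np.sin(2 * np.pi * 0.2 * t)

# 1) remove mean and linear trend with a least-squares line fit
slope, intercept = np.polyfit(t, x, 1)
x_detr = x - (slope * t + intercept)

# 2) cosine taper over 5 % of the trace at each end
n_tap = int(0.05 * x.size)
ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_tap) / n_tap))
taper = np.ones(x.size)
taper[:n_tap] = ramp
taper[-n_tap:] = ramp[::-1]
x_tap = x_detr * taper

# 3) sampling rate implied by the highest frequency of interest
f_max = 5.0                                # highest frequency to analyse [Hz]
fs_min = 2.0 * f_max                       # Nyquist criterion
print("mean after detrending: %.2e" % x_detr.mean())
print("sampling rate must stay above %.1f Hz" % fs_min)
```

Any low-pass filtering and downsampling would come after these steps, and the new sampling rate has to stay above `fs_min`.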

In [None]:
# Cell 1a : getting the waveforms

client = Client("IRIS")
t = UTCDateTime("2014-04-01T23:30:0.0")

st = client.get_waveforms("II", "BFO", "*", "BH?", t, t+(180*60), attach_response=True)
print(st)
st[0].remove_response(output="VEL", pre_filt=None, water_level=30, zero_mean=True, taper=True, taper_fraction=0.05, 
                   plot=False)
st[1].remove_response(output="VEL", pre_filt=None, water_level=30, zero_mean=True, taper=True, taper_fraction=0.05, 
                   plot=False)
st[2].remove_response(output="VEL", pre_filt=None, water_level=30, zero_mean=True, taper=True, taper_fraction=0.05, 
                   plot=True)

In [None]:
# Cell 1b: pre-processing the data

# take a copy of the stream to avoid overwriting the original data
bfo = st.copy()




In [None]:
# Cell 1c - part 1: plot of ray paths for this EQ-receiver pair

# loading the necessary package as a hint ...
from obspy.taup import TauPyModel




In [None]:
# Cell 1c - part 2: plot seismogram with theoretical travel times

# loading the necessary package as a hint ...
from matplotlib.pyplot import cm




### Answer to exercise 1

...

_____________
## Exercise 2 : Marienplatz Glockenspiel - spectral analysis

At the very top of the tower of the New Town Hall (Rathaus) on Marienplatz, there are 43 bells, which chime different tunes daily at 11 am and 12 noon. The tunes accompany a spectacle of figurines that move and illustrate local stories.  
In this exercise, we will analyze a recording of the Glockenspiel bells. 

You can read more about the Glockenspiel here: https://en.wikipedia.org/wiki/Rathaus-Glockenspiel

You can listen to the first tune of the Glockenspiel recording by playing `data/Glockenspiel_Marienplatz_track1.wav`  


---

<img style="float: left; height: 350px; padding: 10px" src="data/rathaus_munchen.jpg"  />
<img style="float: right; height: 350px; padding: 10px" src="data/glockenspiel_inside.jpg"  />


###### Acknowledgement
_photo Rathaus (left):_ https://commons.wikimedia.org/wiki/File:Altes_Rathaus_und_Mariens%C3%A4ule_in_M%C3%BCnchen.jpg  
_photo inside Glockenspiel (right): Thies Heidecke_  
_We are grateful to the personnel at the Rathaus for providing access to the Glockenspiel tower._  
_We thank Thies Heidecke for his help with the recording._

In [None]:
# Cell 2a-1 - Read in the .wav file

# read in the file 
dataDir = './data/'
fileName = 'Glockenspiel_Marienplatz_track1.wav'
# if your computer cannot handle the complete tune, use this file instead (only first 15 seconds)
#fileName = 'Glockenspiel_Marienplatz_track1_short.wav'

stream = wave.open(dataDir + fileName,'r')

# get the details about the .wav file
num_channels = stream.getnchannels()
frame_rate = stream.getframerate()     # sampling rate
sample_width = stream.getsampwidth()   
num_frames = stream.getnframes()      # number of points
total_samples = num_frames * num_channels
endtime = float(num_frames) / float(frame_rate)   # duration in seconds (frames, not frames * channels)

# read the byte data
raw_data = stream.readframes( num_frames )
stream.close()

# check the type of audio track
if sample_width == 1: 
    fmt = "%iB" % total_samples # read unsigned chars
elif sample_width == 2:
    fmt = "%ih" % total_samples # read signed 2 byte shorts
else:
    raise ValueError("Only supports 8 and 16 bit audio formats.")

# unpack the byte data to integers
integer_data = struct.unpack(fmt, raw_data)

# Keep memory tidy
del raw_data

# set up the channel
channels = [ [] for time in range(num_channels) ]

# read the integers to channels
for index, value in enumerate(integer_data):
    bucket = index % num_channels    
    channels[bucket].append(value)
    
# signal and timeseries arrays:
gsignal = np.array(channels[0])                       # first channel as numpy array
time = np.arange(len(gsignal)) / float(frame_rate)   # in seconds, one value per sample

print('Number of samples in the signal:', len(gsignal))    

In [None]:
# Cell 2a-2 - read .wav into obspy Stream object

st = read(dataDir + fileName)
print(st)
st.plot()

# just the signal array and time vector
gsignal_st = st[0].data
time_st = st[0].times()

In the previous two cells, the glockenspiel tune has been read in two different ways:

+ `gsignal` is a numpy array with the signal values. The associated time vector is `time`; the sampling rate is stored in the variable `frame_rate`.
+ `st` is an obspy stream object with the same glockenspiel tune. You can manipulate it in the same way as other stream objects in previous practicals. 

Both contain the same signal, so you can use whichever one is easiest in the following exercise. 

### Questions - part 1

**Note:** if the signal is too large for your computer to handle, use the alternative, shorter signal, which contains only the first 15 seconds of the tune. See Cell 2a-1, where the filename is defined. 

Create several subplots: plot the signal on top, and below that create two subplots: in the first, plot the spectrogram using a window length of 256 samples. 
In the second, plot the spectrogram with a window length of 4096 samples. In both, use an overlap of 50. 

a) Explain what a spectrogram (in general) represents. Through which mathematical operation is the y-axis (frequency axis) created? What controls the value of the upper limit of the y-axis?

b) Compare the spectrograms with `NFFT=256` and `NFFT=4096`. What difference between the two do you notice? Think in terms of time and frequency resolution. Explain why this happens.  
Remember you can zoom in to specific parts of the signal by manipulating the plot limits 
(`plt.xlim((value1, value2))` or `plt.ylim((value1, value2))`) 

c) On both spectrograms, zoom in to frequencies around 6780 Hz using `plt.ylim((value1, value2))`. This frequency corresponds to one of the bells. Isolate the signal of this bell as much as you can using a filter.  
Go to Cell 2c. In a new plot, plot the original (unfiltered) signal and the filtered signal on top of each other in different colors. How often does the 6780 Hz bell ring in the first 10 seconds of the song? (You may want to trim the signal to the first 10 seconds first.) 

d) If you plot a spectrogram of the filtered signal, you will see that the energy of the signal outside your filter band is not exactly zero. What is this effect called?  
Explain how you could achieve a narrower filter (you do not need to execute it for this signal, just describe). What kind of tradeoff will you encounter as you use a narrower filter window? 
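
For question b), the relation between window length and resolution can be checked with a few lines of arithmetic. The sketch below assumes a sampling rate of 44100 Hz, which is common for `.wav` files; the actual value for this recording is stored in `frame_rate` and may differ.

```python
# Assumed sampling rate in Hz; the real value is stored in `frame_rate`.
fs = 44100.0

resolution = {}
for nfft in (256, 4096):
    df = fs / nfft      # spacing between frequency bins [Hz]
    dt = nfft / fs      # duration of one analysis window [s]
    resolution[nfft] = (df, dt)
    print("NFFT = %4d: df = %7.2f Hz, window length = %6.2f ms"
          % (nfft, df, 1e3 * dt))
```

The product of `df` and `dt` is always 1, so a finer frequency spacing can only be obtained with a longer time window.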



In [None]:
# Cell 2b - compare spectrograms





### Answer to exercise 2 - part 1/1

a) ...


b) ...

In [None]:
# Cell 2c - isolate single bell

f_bell = 6780
stb = st.copy()





### Answer to exercise 2 - part 1/2

c) ...


d) ...

### Questions - part 2

In the following Cell 2d, we consider part of the spectral content of a single bell chime. The spectrum is calculated for several different lengths of time window:  
1. the complete bell chime (approx. 0.8 seconds)
2. a short time window (500 samples)
3. a longer time window (3000 samples)
4. the longer time window, with the rest of the signal set to zero (so 3000 samples + zero padding to complete signal length)

**Note:** in the first plot, the blue signal is offset downwards by 500 counts. This is just to make the difference between the green and blue lines more visible.

Explain the difference between the spectra obtained with the different time window lengths. 
In particular, consider why the peak around f=3750Hz does not show up when using the shortest time window (red line). Why is the blue spectrum smoother than the green one? 
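
The effect of zero padding on the frequency axis can be explored on a small synthetic example before looking at the bell. The sketch below uses plain NumPy instead of `signal.periodogram`, and the sampling rate, window length and the two test frequencies are assumptions chosen only for illustration.

```python
import numpy as np

fs = 1000.0                                # assumed sampling rate [Hz]
n = 200                                    # samples in the short window
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 100.0 * t) + np.sin(2 * np.pi * 103.0 * t)

spec = np.fft.rfft(x)                      # window only
spec_pad = np.fft.rfft(x, n=4 * n)         # window + zero padding to 4x length

df = fs / n                                # bin spacing without padding [Hz]
df_pad = fs / (4 * n)                      # bin spacing with padding [Hz]

# every 4th bin of the padded spectrum coincides with the unpadded spectrum
same_values = np.allclose(spec_pad[::4], spec)
print("bin spacing: %.2f Hz (window), %.2f Hz (zero padded)" % (df, df_pad))
print("padded spectrum agrees with unpadded one at shared bins:", same_values)
```

Zero padding samples the same underlying spectrum on a denser frequency grid; the length of the non-zero part of the signal is unchanged.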



In [None]:
# Cell 2d - single bell chime

NFFT = 256
noverlap = 50   # overlap between neighbouring windows in samples (value assumed from part 1)

# time limits to trim signal to, in seconds 
ts1 = st[0].stats.starttime + 9.1
ts2 = ts1 + 0.8

# subwindow lengths, in samples
lwin1 = 500
lwin2 = 3000

# time limits for subwindow in signal, in samples
lim1 = 17000
lim2 = lim1 + lwin1
lim3 = lim1 + lwin2

stsh = st.copy()
stsh.trim(ts1, ts2)
print(stsh)

stz = st.copy()
stz.trim(ts1, ts2)
# set signal outside subwindow to zero
stz[0].data[0:lim1] = 0
stz[0].data[lim3:] = 0

# calculate spectrum for different window lengths:
# whole signal (black)
f, Pxx_den = signal.periodogram(stsh[0].data, stsh[0].stats.sampling_rate)
# short time window (red)
f2, Pxx_den2 = signal.periodogram(stsh[0].data[lim1:lim2], stsh[0].stats.sampling_rate)
# longer time window (green)
f3, Pxx_den3 = signal.periodogram(stz[0].data[lim1:lim3], stz[0].stats.sampling_rate)
# longer time window, with zero padding around it (blue)
f4, Pxx_den4 = signal.periodogram(stz[0].data, stz[0].stats.sampling_rate)

# plot signal, spectrogram and spectrum (zoomed in to 2 peaks, as indicated by black box in spectrogram)
plt.figure(figsize=(15,12))

# time series
ax1 = plt.subplot(311)
plt.plot(stsh[0].times(), stsh[0].data, 'k', label='whole signal')
plt.plot(stz[0].times(), stz[0].data - 500, 'b', label='longer timewindow, zero padded')
plt.plot(stz[0].times()[lim1:lim3], stz[0].data[lim1:lim3], 'g', label='longer timewindow')
plt.plot(stsh[0].times()[lim1:lim2], stsh[0].data[lim1:lim2], 'r', label='short timewindow')
plt.legend()
plt.xlabel('time [s]')
plt.title('Glockenspiel, different subwindows')

plt.subplot(312, sharex=ax1)
plt.title('spectrogram, window length %s pts' % NFFT)
# Pxx is the segments x freqs array of instantaneous power, freqs is
# the frequency vector, bins are the centers of the time bins in which
# the power is computed, and im is the matplotlib.image.AxesImage instance
Pxx, freqs, bins, im = plt.specgram(stsh[0].data, NFFT=NFFT, Fs=frame_rate, noverlap=noverlap,
                                cmap=cm.jet,sides='onesided')
# draw box (float() avoids integer division under Python 2)
box_t1 = lim1 / float(frame_rate)
box_t2 = (lim2 + 2000) / float(frame_rate)
plt.plot((box_t1, box_t2), (3000, 3000), 'k-', lw=2)
plt.plot((box_t1, box_t2), (4000, 4000), 'k-', lw=2)
plt.plot((box_t1, box_t1), (3000, 4000), 'k-', lw=2)
plt.plot((box_t2, box_t2), (3000, 4000), 'k-', lw=2)

#plt.ylim((0,5000))
plt.ylabel('frequency [Hz]')
plt.xlabel('time [s]')

# show spectra calculated with different timewindows
plt.subplot(313)
plt.plot(f, Pxx_den, 'k', label='whole signal')
plt.plot(f2, Pxx_den2, 'r', label='short timewindow')
plt.plot(f3, Pxx_den3, 'g', label='longer timewindow')
plt.plot(f4, Pxx_den4, 'b', label='longer timewindow, zero padded')
plt.legend(loc='upper left')
plt.xlabel('frequency [Hz]')
plt.title("spectrum, zoom on box")
plt.xlim((3000,4000))
plt.ylim((0,100))
# prevent subplots overlapping
plt.tight_layout() 
plt.show()

### Answer to exercise 2 - part 2

...

_______________________

## Exercise 3

This question is about convolution of signals. You are provided with two signals. 

a) What is a convolution? Convolution is closely connected to a special kind of system. Which one? Describe the connection. <br> 
Using [signal.convolve](http://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.convolve.html) convolve the two signals in Cell 3a. Set the mode='same'. Explain the meaning of the mode parameter. <br>
Plot the original signals and the convolved signal. Be careful to plot the entire signal on both the x and y axes. Include labels.  

b) In Cell 3b, replace one of the signals with a new signal that has a different shape. Convolve the signals and plot them again. Are convolutions commutative? 

c) In Cell 3c, convolve the signals `win` and `sig` in the reverse order, continuing to use mode='same'. Replot. <br>
Are the results the same as or different from those in Cell 3a? Explain this result and any inconsistencies.

d) Bonus _(optional for extra points)_: Amend the signals so that convolving them in the reverse order gives the same result as convolving them in original order. 
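
To get a feeling for the `mode` parameter before working on the exercise signals, it can help to look at a tiny example where every output value can be checked by hand. The sketch below uses `np.convolve` as a stand-in and two short arrays chosen for illustration; how `signal.convolve` sizes its `'same'` output for inputs of different lengths is described on the linked documentation page and may differ from NumPy.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 0.5])

full = np.convolve(a, b, mode="full")      # every point of overlap, length N + M - 1
same = np.convolve(a, b, mode="same")      # central part of the full output
valid = np.convolve(a, b, mode="valid")    # only points where the inputs overlap completely

print("full :", full)
print("same :", same)
print("valid:", valid)

# swapping the inputs does not change the full convolution
full_swapped = np.convolve(b, a, mode="full")
print("full output identical after swapping:", np.allclose(full, full_swapped))
```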

In [None]:
#Cell 3a - convolving two signals + plotting

# make a box car function
sig = np.repeat([0., 1., 0.], 100)
# make a Hann window 
win = signal.windows.hann(50)   # signal.hann is no longer available in recent SciPy versions

# convolve the signals




In [None]:
# Cell 3b - change one signal, convolve both signals + plotting




In [None]:
# Cell 3c - convolution in reverse order + plotting

# make a box car function
sig = np.repeat([0., 1., 0.], 100)
# make a Hann window 
win = signal.windows.hann(50)   # signal.hann is no longer available in recent SciPy versions




In [None]:
# Cell 3d - amend signals and convolve again (bonus)






### Answer to exercise 3

...

__________________

## Exercise 4


In Cell 4a, theoretical gravity data, modeled for the ring laser location in Wettzell, are loaded. The data show a superposition of tidal effects due to different celestial bodies. Here is a short overview of the most important ones, sorted according to the amplitude of their influence on Earth:
+ tides with a period of half a day from sun, moon, Mars, Jupiter, etc. ..., period: 0.5 day 
+ tides with a period of one day from sun, moon, Mars, Jupiter, etc. ..., period: 1 day
+ cycle of the orbit of the moon, period: 28 days
+ equinox - sun and moon passing the equator plane, period: 186 days (about every six months)
+ Chandler wobble (deviation of Earth's axis of rotation relative to the solid Earth), period: 433 days
+ effects due to further planets ....


a) In Cell 4b, calculate the spectrum of the data and plot it (as a log-log plot). Try to identify the different tidal effects and name the frequencies of their peak positions. Why is the peak for the Chandler wobble not really visible in the spectrum? 
Hint: To calculate the spectrum, you can use the function [periodogram](http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.signal.periodogram.html) of the scipy.signal package.

b) In Cell 4c, try to separate the tides with periods of half and full day from the other tidal effects by filtering. Which filter do you choose and why? Plot the filtered signal for about the first 28 days.  
Explain what happens behind the scenes when applying a filter to data (in a mathematical sense). 

c) Now, isolate the half-day and full-day tides separately from each other and from the rest of the data. Which filter do you choose this time and why? Plot the filtered signal for about the first 25 days on top of the filtered data from exercise 4b.  
How can a filter be defined, in terms of responses? There are mainly three different keywords. Describe them. Why do we need three instead of only one definition?
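
Before computing the spectrum of the gravity data, the approach can be tried on a synthetic signal whose content is known. The sketch below is an illustration with plain NumPy rather than `signal.periodogram`; the record length of 60 days and the two amplitudes are assumptions, and only the hourly sampling is taken from Cell 4a.

```python
import numpy as np

dt_days = 1.0 / 24.0                       # hourly sampling, expressed in days
n = 24 * 60                                # 60 days of synthetic data (assumed)
t = np.arange(n) * dt_days

# synthetic "tides": half-day and full-day periods with assumed amplitudes
x = 1.0 * np.sin(2 * np.pi * t / 0.5) + 0.6 * np.sin(2 * np.pi * t / 1.0)

freqs = np.fft.rfftfreq(n, d=dt_days)      # frequency axis in cycles per day
power = np.abs(np.fft.rfft(x)) ** 2 / n    # simple (unnormalised) power spectrum

top2 = np.sort(freqs[np.argsort(power)[-2:]])   # frequencies of the two largest peaks
df = freqs[1] - freqs[0]                        # frequency spacing = 1 / record length
print("peak frequencies [cycles/day]:", top2)
print("frequency spacing [cycles/day]: %.4f" % df)
```

With the frequency axis in cycles per day, a peak at frequency `f` corresponds to a period of `1 / f` days, and the spacing `df` is set by the total record length.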


In [None]:
# Cell 4a - reading and plotting gravity data

# read in gravity data, modeled for Wettzell, Germany. Units are nm/s^2 vertical acceleration. 
filename = 'data/gravity.dat'

# prepare to input data into obspy Stream
data = np.loadtxt(filename, dtype='float32', comments="#")
stats = {'network': 'XX', 'station': 'WETZ', 'location': '',
         'channel': 'XZ', 'npts': len(data), 'delta': 3600}

stats['starttime'] = UTCDateTime("2015,01,01,00,00,00")
s = Stream([Trace(data=data, header=stats)])

# write as MSEED file
s.write("gravity.mseed", format='MSEED')

# test that it worked, read stream from file and plot
st = read("gravity.mseed")
print(st)
print(st[0].stats)
#st.plot()

plt.plot(st[0].times()/(3600*24), st[0].data, color='b')
plt.xlim(0,365)
plt.xlabel("time [days]")
plt.ylabel("gravity [nm/s^2]")
plt.show()

# zoom into the first 800 hourly samples (about 33 days)
plt.plot(st[0].times()[0:800]/(3600*24), st[0].data[0:800], color='b')
plt.xlabel("time [days]")
plt.ylabel("gravity [nm/s^2]")
plt.show()

In [None]:
# Cell 4b - calculate spectrum





In [None]:
# Cell 4c - filtering the data




___________________

## Exercise 5

In this question we are going to use a cross correlation technique to make a differential pick time. You are provided with two signals in the data directory: 

`data/seismogram_1.MSEED` <br>
`data/seismogram_2.MSEED`

seismogram_2.MSEED is noisy, and arrives later than seismogram_1.MSEED. We will use a cross-correlation with the better seismogram to make a more accurate pick of the arrival time on the noisy seismogram. 

For this question you should use the ObsPy function xcorr_pick_correction(). It is well documented and has good default plotting options. 

You are given these initial pick times. <br>
t1 = UTCDateTime(0.335)<br>
t2 = UTCDateTime(0.55)

a) Read in the seismograms in Cell 5a. Use the function [xcorr_pick_correction](https://docs.obspy.org/packages/autogen/obspy.signal.cross_correlation.xcorr_pick_correction.html) to cross-correlate them. There is no need to filter the seismograms.  <br>
Plot the cross correlation. Display the time correction and the correlation coefficient.

b) What was the length of the time window you cross-correlated over? Why is this a good choice?  

c) Calculate the corrected differential pick time in Cell 5b. This is the time lag between the first arrival on seismogram 1 and on seismogram 2.

d) Write a short paragraph on what has been done here, and why it could be useful. 
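
The basic idea of measuring a time lag by cross-correlation can be sketched with two synthetic pulses. This is a plain NumPy illustration under assumed values (sampling rate, pulse shape, noise level and a true delay of 0.3 s); it is not how `xcorr_pick_correction` is implemented, and that function works on windows around the given picks as described in its documentation.

```python
import numpy as np

fs = 100.0                                 # assumed sampling rate [Hz]
t = np.arange(200) / fs                    # 2 s of data


def pulse(t0):
    """Gaussian pulse centred at time t0 (seconds)."""
    return np.exp(-((t - t0) / 0.05) ** 2)


rng = np.random.RandomState(0)
a = pulse(0.5)                                     # clean reference trace
b = pulse(0.8) + 0.01 * rng.randn(t.size)          # delayed and slightly noisy trace

# full cross-correlation and the lag (in samples) of each output value
cc = np.correlate(b, a, mode="full")
lags = np.arange(-(a.size - 1), b.size)

lag_samples = lags[np.argmax(cc)]          # lag with maximum similarity
lag_seconds = lag_samples / fs
print("estimated delay of b relative to a: %.2f s" % lag_seconds)
```

A positive lag means that the second trace arrives later than the first; the resolution of this simple estimate is limited to one sample.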


In [None]:
# Cell 5a - crosscorrelation





In [None]:
# Cell 5b - differential pick times




________________

## Final bonus question

Between sound, tidal, and seismic/seismological data, what do you prefer to work on considering computational effort? Explain why.

### Answer to bonus question

...