# Sound notebook
This notebook takes you through some basic aspects of analyzing sound. Before we start, we need to make sure that you have the required Python packages. We assume you have Anaconda installed. If not, follow the instructions [here](https://www.anaconda.com/distribution/).

### Importing and installing packages
Below are all the packages we need for this exercise. If you get an error like:

**ModuleNotFoundError                       Traceback (most recent call last)**
**<ipython-input-8-9625e36b3edc> in <module>()**
**----> 1 import pyaudio**
**      2 import wave**

**ModuleNotFoundError: No module named 'pyaudio'**

this is because you don't have the package *pyaudio* on your computer, and you need to run *pip install pyaudio* in your command prompt (Windows) or terminal (macOS). If you are on Linux and have trouble accessing your microphone, either team up with a Windows/Mac user or use the "sofa.wav" file that comes with this .zip file and skip to the second step in Exercise 1, **Plot the time series**. When you have installed the missing packages, you should be able to import them all:

In [None]:
%matplotlib notebook
import matplotlib.pyplot as plt
import pyaudio
import wave
import numpy as np
from scipy.io.wavfile import read
from scipy.signal import butter, filtfilt, freqz
from IPython.display import Audio

# Session 1
## Exercise 0: Check that you can record and save wave files from your laptop microphone
The pyaudio package lets you record sound from your laptop microphone. We can define some of the recording properties, such as the recording duration, the output file name and the sampling frequency, in the following cell:

In [None]:
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "test_file.wav"
SAMPLING_FREQUENCY = 44100

- Discuss choice of sampling frequency.
- What is the highest frequency that can be represented with this sampling frequency?
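As a worked example for the second question: by the sampling theorem, the highest frequency that can be represented without aliasing is half the sampling frequency (the Nyquist frequency). A minimal sketch; `nyquist_frequency` is a hypothetical helper, not part of any library.

```python
def nyquist_frequency(sampling_frequency):
    """Highest frequency (Hz) representable without aliasing: fs / 2."""
    return sampling_frequency / 2

# 44.1 kHz sampling covers the roughly 20 kHz upper limit of human hearing
print(nyquist_frequency(44100))  # 22050.0
```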

Now, test that your recording device works by executing the code below (don't worry about the code itself). Remember to check that your microphone is **on**. While it is running, say something to make sure it captures your beautiful voice :-) When it is done, a file named "test_file.wav" should appear in the same folder as this notebook. Check that you have this file when finished!

In [None]:
def record_audio(WAVE_OUTPUT_FILENAME, RECORD_SECONDS, SAMPLING_FREQUENCY):
    """Record RECORD_SECONDS of stereo 16-bit audio and save it as a .wav file."""
    CHUNK = 1024                # number of frames read per buffer
    RATE = SAMPLING_FREQUENCY   # samples per second
    CHANNELS = 2                # stereo
    FORMAT = pyaudio.paInt16    # 16-bit signed integer samples
    audio = pyaudio.PyAudio()
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    print("recording...")
    frames = []
    # read whole chunks until (approximately) RECORD_SECONDS have been captured
    for _ in range(int(RATE / CHUNK * RECORD_SECONDS)):
        frames.append(stream.read(CHUNK))
    print("finished recording")
    stream.stop_stream()
    stream.close()
    audio.terminate()
    # write the raw frames to a .wav file with matching header fields
    with wave.open(WAVE_OUTPUT_FILENAME, 'wb') as waveFile:
        waveFile.setnchannels(CHANNELS)
        waveFile.setsampwidth(pyaudio.get_sample_size(FORMAT))
        waveFile.setframerate(RATE)
        waveFile.writeframes(b''.join(frames))
    
record_audio(WAVE_OUTPUT_FILENAME, RECORD_SECONDS, SAMPLING_FREQUENCY)

## Exercise 1: Record yourself
Record yourself or someone around you saying the English single-letter words **s o f a** in sequence, i.e., pronounced as "s$\cdot$o$\cdot$f$\cdot$a". Say the letters clearly and slowly. Afterwards, open the created file (which we call "sofa.wav") and make sure you can clearly hear what is being said and that it is not too loud. We have allocated 6 seconds to say "s$\cdot$o$\cdot$f$\cdot$a", but you can increase this if necessary by changing RECORD_SECONDS.

In [None]:
WAVE_OUTPUT_FILENAME = "sofa.wav"
RECORD_SECONDS = 6
record_audio(WAVE_OUTPUT_FILENAME,RECORD_SECONDS, SAMPLING_FREQUENCY)

### Plot the time series
We can now plot the time signal that has been recorded. You should be able to zoom in/out on the signal to explore its nature. 

- Recognize and describe the components of the individual sounds recorded. 
- Discuss the duration and the number of samples in relation to the sampling frequency and the duration of the recording.
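To make the second question concrete: `record_audio` reads whole chunks of `CHUNK` frames, so the number of samples stored per channel is `int(RATE / CHUNK * RECORD_SECONDS) * CHUNK`, which is slightly less than `RATE * RECORD_SECONDS`. A small arithmetic sketch, assuming the settings used in `record_audio` above; for a file recorded with that function, `len(signal)` should match `n_samples`.

```python
RATE = 44100          # SAMPLING_FREQUENCY used above
CHUNK = 1024          # frames per buffer in record_audio
RECORD_SECONDS = 6

n_chunks = int(RATE / CHUNK * RECORD_SECONDS)  # whole buffers read by the loop
n_samples = n_chunks * CHUNK                   # samples per channel actually stored
duration = n_samples / RATE                    # seconds actually recorded

print(n_chunks, n_samples, round(duration, 3))  # 258 264192 5.991
```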

In [None]:
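fs_file, data = read("sofa.wav")  # scipy returns (sampling rate, sample array)
SAMPLING_FREQUENCY = fs_file      # use the rate stored in the file
# keep the first channel of a stereo recording; a mono file is already 1-D
signal = np.array(data[:, 0] if data.ndim > 1 else data, dtype=float)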
fig = plt.figure(1)
plt.plot(signal)
plt.xlabel('Sample');
plt.ylabel('Amplitude');
plt.tight_layout();

### Plot the time series and spectrogram
Let's plot the time signal along with its spectrogram (the frequency content over time).

For the experiments we use an optimized spectrogram estimator. The operation is similar to projecting onto periodic basis functions, but optimized for speed and performance: it uses the fast Fourier transform (FFT) to estimate the frequency content in short, overlapping windows (https://www.dspguide.com/ch9/1.htm).
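The idea can be sketched in a few lines of NumPy: cut the signal into overlapping windows, taper each with a Hann window, and project each window onto sinusoids with the real FFT. This is a minimal sketch, not matplotlib's implementation; `simple_spectrogram` is a hypothetical helper, and its scaling differs from `plt.specgram` (which also uses a Hann window by default). It shows where the matrix dimensions come from: `NFFT // 2 + 1` frequency rows and one column per complete window.

```python
import numpy as np

def simple_spectrogram(x, nfft=256, noverlap=128, fs=44100):
    """Power of overlapping Hann-windowed frames, computed with the real FFT.

    Returns (power, freqs, times); power has shape (nfft // 2 + 1, n_frames).
    Assumes len(x) >= nfft.
    """
    x = np.asarray(x, dtype=float)
    step = nfft - noverlap                      # hop between window starts
    n_frames = (len(x) - noverlap) // step      # number of complete windows
    window = np.hanning(nfft)
    frames = np.stack([x[i * step:i * step + nfft] * window
                       for i in range(n_frames)], axis=1)
    spectrum = np.fft.rfft(frames, axis=0)      # project each frame onto sinusoids
    power = np.abs(spectrum) ** 2
    freqs = np.fft.rfftfreq(nfft, d=1 / fs)     # 0 ... fs/2 Hz
    times = (np.arange(n_frames) * step + nfft / 2) / fs  # window centres [s]
    return power, freqs, times

fs = 44100
t = np.arange(fs) / fs                          # 1 s test tone
power, freqs, times = simple_spectrogram(np.sin(2 * np.pi * 1000 * t), fs=fs)
print(power.shape)                              # (129, 343)
print(freqs[np.argmax(power[:, 0])])            # frequency bin closest to 1000 Hz
```

You can compare `power.shape` with the shape of the first value returned by `plt.specgram(signal, NFFT=256, noverlap=128, Fs=SAMPLING_FREQUENCY)`.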

- What information does the spectrogram convey?
- For which time windows and frequencies is the spectrogram estimated? What are the (matrix) dimensions of the spectrogram? Compare them with the size of the window and the overlap.
- Recognize and describe the sound components (phonemes) appearing in the spectrogram of the recording "s$\cdot$o$\cdot$f$\cdot$a".

In [None]:
fig = plt.figure(2)
plt.subplot(2, 1, 1)
plt.plot(signal);
plt.xlabel('Sample');
plt.ylabel('Amplitude');
plt.subplot(2, 1, 2)
plt.specgram(signal,NFFT=256,noverlap=128, Fs=SAMPLING_FREQUENCY, cmap='viridis');
# NFFT:  size of window in samples. noverlap: size of overlap between windows in samples. 
plt.xlabel('Time [sec]');
plt.ylabel('Frequency [Hz]');
plt.tight_layout()

## Exercise 2: (Frequency) Filtering
Next, we filter signals to enhance certain frequency content. Typical applications involve low-pass, high-pass, band-stop and band-pass filtering, where a "band" refers to a certain frequency range. Low-pass filters can reduce high-frequency noise, and high-pass filters can eliminate slowly varying "trends" in signals. Band-stop filters can be used to eliminate specific unwanted frequencies, such as the 50 Hz hum from the mains power supply. Band-pass filters can be used to detect specific frequencies, such as the $\alpha$-rhythm in brain waves. Our filtering experiments will be based on your own sound recordings.

We will use professional-grade filters in the experiments. Conceptually, the operation is similar to projecting onto periodic basis functions, but it is optimized for speed and performance. In particular, we will use so-called Butterworth filters (see https://www.dspguide.com/ch20/1.htm and https://en.wikipedia.org/wiki/Butterworth_filter for additional references), which are designed to have a maximally flat response in the pass band while attenuating the stop band.
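The text mentions band-stop filtering of 50 Hz mains hum without showing it. A minimal sketch on a synthetic signal, assuming SciPy 1.2 or later (for the `fs=` argument of `butter`). The names `butter_bandstop_filter` and `amplitude_at`, the 40 to 60 Hz stop band and the test tones are illustrative choices, not part of the original exercise. It uses second-order sections with `sosfiltfilt`, rather than the `(b, a)` form with `filtfilt` used in the cells below, because sections are numerically safer for narrow bands at a high sampling rate.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def butter_bandstop_filter(data, low, high, fs, order=2):
    """Zero-phase Butterworth band-stop filter that removes low..high Hz."""
    sos = butter(order, [low, high], btype='bandstop', fs=fs, output='sos')
    return sosfiltfilt(sos, data)

def amplitude_at(x, f, fs):
    """Amplitude of the f-Hz component (assumes a whole number of cycles in x)."""
    n = np.arange(len(x))
    return 2 * np.abs(np.sum(x * np.exp(-2j * np.pi * f * n / fs))) / len(x)

fs = 44100
t = np.arange(4 * fs) / fs                      # 4 s time axis
tone = np.sin(2 * np.pi * 440 * t)              # wanted signal
hum = 0.5 * np.sin(2 * np.pi * 50 * t)          # synthetic mains hum
cleaned = butter_bandstop_filter(tone + hum, 40, 60, fs)

middle = slice(fs, 3 * fs)                      # ignore filter start-up at the edges
print(amplitude_at(cleaned[middle], 50, fs))    # hum amplitude after filtering
print(amplitude_at(cleaned[middle], 440, fs))   # tone amplitude after filtering
```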

#### Low pass filtering: Let the $low$ frequencies pass!
For low-pass filtering, we set a "cutoff frequency", which defines the boundary between the frequencies being filtered away and the frequencies being kept in the signal. We use a Butterworth filter of order 8. Start with $cutoff=1000$ Hz and look at the corresponding spectrogram. Change the cutoff and discuss the changes to the spectrogram.
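To see what "cutoff" means for a Butterworth filter: its magnitude response is $1/\sqrt{2}$ (about -3 dB) at the cutoff frequency and falls off steeply above it. Because `filtfilt` applies the filter forwards and backwards, the effective gain is squared, giving 0.5 (about -6 dB) at the cutoff. A minimal sketch, assuming SciPy 1.2 or later for `butter(..., fs=...)`; `butterworth_gain` is a hypothetical helper that evaluates the transfer function directly on the unit circle (SciPy's `freqz`, imported at the top of the notebook, can do the same for the `(b, a)` form).

```python
import numpy as np
from scipy.signal import butter

def butterworth_gain(sos, f, fs):
    """|H(f)| of a filter in second-order-section form, evaluated at f Hz."""
    z_inv = np.exp(-2j * np.pi * f / fs)        # z^-1 on the unit circle
    h = 1.0 + 0j
    for b0, b1, b2, a0, a1, a2 in sos:
        h *= (b0 + b1 * z_inv + b2 * z_inv**2) / (a0 + a1 * z_inv + a2 * z_inv**2)
    return abs(h)

fs, cutoff, order = 44100, 1000, 8
sos = butter(order, cutoff, btype='low', fs=fs, output='sos')

for f in (500, 1000, 2000):
    g = butterworth_gain(sos, f, fs)
    # filtfilt runs the filter twice, so the effective gain is g**2
    print(f, round(g, 4), round(g**2, 4))
```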


In [None]:
def butter_lowpass(cutoff, fs, order=5):
    nyq = 0.5 * fs
    normal_cutoff = cutoff / nyq
    b, a = butter(order, normal_cutoff, btype='low', analog=False)
    return b, a
def butter_lowpass_filter(data, cutoff, fs, order=5):
    b, a = butter_lowpass(cutoff, fs, order=order)
    y = filtfilt(b, a, data)
    return y

cutoff = 1000  
signal_lp_filt = butter_lowpass_filter(signal, cutoff, SAMPLING_FREQUENCY, 8)

fig = plt.figure(4)
plt.subplot(2, 1, 1)
plt.specgram(signal, Fs=SAMPLING_FREQUENCY);
plt.title("Before low pass")
plt.xlabel('Time [sec]');
plt.ylabel('Frequency [Hz]');
plt.subplot(2, 1, 2)
plt.specgram(signal_lp_filt, Fs=SAMPLING_FREQUENCY);
plt.title("After low pass")
plt.xlabel('Time [sec]');
plt.ylabel('Frequency [Hz]');
plt.tight_layout()

#### High pass filtering: Let the $high$ frequencies pass!
For high-pass filtering we again set a cutoff frequency, which defines the boundary between the frequencies being filtered away and the ones being kept. As before, we use a Butterworth filter of order 8. Start with $cutoff=2500$ Hz and look at the corresponding spectrogram. Change the cutoff and see how this changes the spectrogram.

- Can you explain what is going on in the spectrogram when changing the cutoff frequency?
- How is this different from the low pass filter?

In [None]:
def butter_highpass(cutoff, fs, order=5):
    nyq = 0.5 * fs
    normal_cutoff = cutoff / nyq
    b, a = butter(order, normal_cutoff, btype='high', analog=False)
    return b, a
def butter_highpass_filter(data, cutoff, fs, order=5):
    b, a = butter_highpass(cutoff, fs, order=order)
    y = filtfilt(b, a, data)
    return y

cutoff = 2500
signal_hp_filt = butter_highpass_filter(signal, cutoff, SAMPLING_FREQUENCY, 8)

fig = plt.figure(5)
plt.subplot(2, 1, 1)
plt.specgram(signal, Fs=SAMPLING_FREQUENCY);
plt.title("Before high pass")
plt.xlabel('Time [sec]');
plt.ylabel('Frequency [Hz]');
plt.subplot(2, 1, 2)
plt.specgram(signal_hp_filt, Fs=SAMPLING_FREQUENCY);
plt.title("After high pass")
plt.xlabel('Time [sec]');
plt.ylabel('Frequency [Hz]');
plt.tight_layout()

### Play the filtered signals
Let us listen to the low-pass filtered and the high-pass filtered signals, respectively. Run the next three code cells.

- Recognize the filtered versions of the signal and explain how they differ from the original recording.

In [None]:
lp_cutoff = 1000  
signal_lp_filt = butter_lowpass_filter(signal, lp_cutoff, SAMPLING_FREQUENCY, 8)
hp_cutoff = 2500
signal_hp_filt = butter_highpass_filter(signal, hp_cutoff, SAMPLING_FREQUENCY, 8)

In [None]:
Audio(signal_lp_filt, rate=SAMPLING_FREQUENCY)

In [None]:
Audio(signal_hp_filt, rate=SAMPLING_FREQUENCY)

# Session 2
## Exercise 3: Extract each letter
Go to Fig. 1 (or the time series in Fig. 2) and find the start and end samples of each letter by moving the mouse over the plot; the x-value is displayed in the bottom right corner. Remember(!) that x is measured in **samples**, which are whole numbers (no decimals). Store the 8 values in the variables below.
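Python and NumPy slicing silently clips indices that run past the end of an array, so a typo in these values can go unnoticed. A minimal sanity check, where `segment_info` is a hypothetical helper (not part of NumPy) that also converts a segment length to seconds:

```python
def segment_info(start, end, n_total, fs):
    """Validate a [start, end) sample range; return its length in samples and seconds."""
    start, end = int(start), int(end)           # sample indices must be integers
    if not 0 <= start < end <= n_total:
        raise ValueError(f"invalid segment [{start}, {end}) for {n_total} samples")
    n = end - start
    return n, n / fs

# after running the next cell, for example:
# for name, (a, b) in {"s": (start_s, end_s), "o": (start_o, end_o),
#                      "f": (start_f, end_f), "a": (start_a, end_a)}.items():
#     print(name, segment_info(a, b, len(signal), SAMPLING_FREQUENCY))
```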

In [None]:
# example values: replace them with the sample indices read off your own plot
start_s,end_s = 57000,93910
start_o,end_o = 103871,135510
start_f,end_f = 147814,181211
start_a,end_a = 190000,219295

sound_s = signal[start_s:end_s]
sound_o = signal[start_o:end_o]
sound_f = signal[start_f:end_f]
sound_a = signal[start_a:end_a]

### Plot each letter
Let's plot the time series for each letter.
- Recognize the sounds making up these single-letter words.
- Discuss the frequency content of the two components of the $s$-sound.

In [None]:
fig = plt.figure(3)
plt.subplot(4, 1, 1)
plt.title("s")
plt.plot(sound_s);
plt.subplot(4, 1, 2)
plt.title("o")
plt.plot(sound_o);
plt.subplot(4, 1, 3)
plt.title("f")
plt.plot(sound_f);
plt.subplot(4, 1, 4)
plt.title("a")
plt.plot(sound_a);

plt.xlabel('Samples');
plt.tight_layout()

### Play each sound
Play the sounds to check that you separated them successfully.

In [None]:
Audio(sound_s, rate=SAMPLING_FREQUENCY)

In [None]:
Audio(sound_o, rate=SAMPLING_FREQUENCY)

In [None]:
Audio(sound_f, rate=SAMPLING_FREQUENCY)

In [None]:
Audio(sound_a, rate=SAMPLING_FREQUENCY)

## Exercise 4: Find your fundamental frequency (grundtone)
Speech is a sequence of voiced and unvoiced sounds produced by the speech organs. Voiced sounds are produced by the vibrating vocal cords and therefore contain both a fundamental frequency and its higher harmonics. The fundamental frequency is also referred to as the pitch. What creates each specific sound is the combination of harmonics and the distribution of power among them. This can also be seen in the spectrograms of the different sounds in "sofa": each letter has a characteristic frequency composition, and by putting sounds together we form words and sentences. However, imagine you recorded a deep voice saying "sofa" and a high voice saying "sofa": they have the same meaning, yet sound very different.

From the time series of your own recording, we will next measure your pitch in hertz (Hz), i.e., the number of cycles per second. Estimate your pitch by visual inspection of the time series in Fig. 3: zoom in on a voiced part of the signal and measure the length of one or more periods. Estimate the uncertainty of the measured frequency, for example by repeating the measurement.
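The arithmetic behind the visual estimate: if $n$ whole periods span the samples from `start` to `end`, the period is $(end - start)/n$ samples and the pitch is $f_0 = f_s \cdot n / (end - start)$. If each endpoint may be off by one sample, the relative uncertainty is roughly $2/(end - start)$ to first order, so counting more periods helps. A minimal sketch; the helper names are hypothetical, and `pitch_autocorr` is a swapped-in autocorrelation cross-check (not the notebook's method) that assumes the pitch lies between 60 and 400 Hz.

```python
import numpy as np

def pitch_from_periods(start_sample, end_sample, n_periods, fs):
    """Pitch in Hz from n_periods whole cycles counted between two sample indices."""
    period = (end_sample - start_sample) / n_periods   # samples per cycle
    return fs / period

def pitch_uncertainty(start_sample, end_sample, n_periods, fs, sample_error=1):
    """Approximate (first-order) pitch error in Hz if each endpoint is off by sample_error."""
    f0 = pitch_from_periods(start_sample, end_sample, n_periods, fs)
    return f0 * 2 * sample_error / (end_sample - start_sample)

def pitch_autocorr(x, fs, fmin=60.0, fmax=400.0):
    """Cross-check: lag of the highest autocorrelation peak within [fmin, fmax] Hz."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                   # remove DC offset
    r = np.correlate(x, x, mode='full')[len(x) - 1:]   # lags 0, 1, 2, ...
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(r) - 1)
    lag = lag_min + np.argmax(r[lag_min:lag_max + 1])
    return fs / lag

fs = 44100
t = np.arange(int(0.1 * fs)) / fs                      # 100 ms synthetic "vowel"
x = (np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
     + 0.25 * np.sin(2 * np.pi * 660 * t))
print(pitch_from_periods(1000, 3000, 10, fs), pitch_uncertainty(1000, 3000, 10, fs))
print(pitch_autocorr(x, fs))
```

On your own data, `x` would be a short voiced slice such as `sound_o[...]`.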

In [None]:
est_pitch = [] # put your answer here in Hz
# Solution
#print(str(est_pitch) + " Hz")

Pitch is highly variable in the population, with some differences between genders and additional variability related to differences in body height.

## Exercise 5: Extract phonemes
The pronunciations of the single-letter words $s$ and $f$ have one phoneme (sound component) in common. This phoneme is usually written with the symbol $/\epsilon/$.
Extract this first phoneme from $s$ and from $f$ using the same method as before: visual inspection to find the start and end samples. Use Fig. 3 for this; note that the sample indices in Fig. 3 are relative to the start of each letter.

In [None]:
# indices are relative to each letter segment (as read off Fig. 3)
start_s_e,end_s_e = 4210,14224
start_f_e,end_f_e = 3800,13081

sound_s_e = sound_s[start_s_e:end_s_e]
sound_f_e = sound_f[start_f_e:end_f_e]

Let's hear if they sound the same:

In [None]:
Audio(sound_s_e, rate=SAMPLING_FREQUENCY)

In [None]:
Audio(sound_f_e, rate=SAMPLING_FREQUENCY)

## Exercise 6 
Next, we manipulate the sounds to illustrate their component structure. Separate the two phonemes of the $s$ and $f$ words and recombine them with the $/\epsilon/$ sounds swapped. Listen to the synthesized sounds and judge how successful the editing is.

**Hints**:
- The numpy function *arange(a,b)* creates an array [a,a+1,a+2,...,b-1]
- The numpy function *delete* removes the elements at the given indices from an array
- The numpy function *insert* can be used to insert values into an array
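A toy illustration of these hints on a small integer array instead of audio samples (the values are arbitrary): delete a block of indices, then insert another block at the same position.

```python
import numpy as np

x = np.arange(10)                         # [0 1 2 3 4 5 6 7 8 9]
segment = np.array([100, 101, 102])       # stand-in for a phoneme cut from another sound

removed = np.delete(x, np.arange(3, 6))   # drop indices 3, 4, 5
swapped = np.insert(removed, 3, segment)  # insert the whole block before index 3
print(removed.tolist())                   # [0, 1, 2, 6, 7, 8, 9]
print(swapped.tolist())                   # [0, 1, 2, 100, 101, 102, 6, 7, 8, 9]
```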

In [None]:
# YOUR CODE HERE
#
# Solution:
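# remove /ɛ/ from "s" and insert the /ɛ/ cut from "f" at the same position
sound_s_new = np.insert(np.delete(sound_s, np.arange(start_s_e, end_s_e)), start_s_e, sound_f_e)
# remove /ɛ/ from "f" and insert the /ɛ/ cut from "s" at the same position
sound_f_new = np.insert(np.delete(sound_f, np.arange(start_f_e, end_f_e)), start_f_e, sound_s_e)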

In [None]:
Audio(sound_s_new, rate=SAMPLING_FREQUENCY)

In [None]:
Audio(sound_f_new, rate=SAMPLING_FREQUENCY)

## Exercise 7 (optional)
Challenge: can you synthesize the sound of the *word* sofa from the components of the single-letter sounds?

In [None]:
# YOUR CODE HERE
#

# Solution:
# drop the leading /ɛ/ (and anything before it) from "s" and "f",
# keeping only the /s/ and /f/ consonants, then join s + o + f + a
sound_sofa = np.concatenate([sound_s[end_s_e:], sound_o,
                             sound_f[end_f_e:], sound_a])

Audio(sound_sofa, rate=SAMPLING_FREQUENCY)