In [13]:
%%capture
# Installs
!pip3 install bokeh

In [136]:
# Imports
import IPython.display as ipd
import numpy as np
import bokeh 
from bokeh.io import output_notebook
from bokeh.plotting import figure, output_file, show
from scipy.io import wavfile
import scipy.signal as signal
import matplotlib.pyplot as plt
import sounddevice as sd

output_notebook()

# Question 1

Randy Jones 20:50 - In this segment Randy and George talk about Don Norman's principles of design and the idea that the human brain only has the mental capacity for about seven knobs in a given area, plus or minus two. This reminded me of our class discussion about the Yamaha DX7 and why its algorithms only have 6 nodes (operators). I still think the most plausible reason is computational cost, but the same design principle could also be a factor: any more nodes would be too hard to comprehend and use.

# Question 2

Perry Cook 49:25 - In this segment Perry and George discuss how a tambourine is made and how it is then recreated digitally. I thought it was really cool how the randomness of noise mirrors the randomness in making the cymbals of the tambourine. When we create percussive sounds digitally, we usually use a short burst of random frequencies (noise), keeping the lower frequencies for kick drum sounds (e.g. with a lowpass filter) and the higher frequencies for things like a hi-hat or tambourine (e.g. with a highpass filter). Perry described the same thing happening in the physical world: you take metal, hammer it out until it is close enough to a thin disc, and with enough of those discs together you get a tambourine. So in both the physical and digital worlds, tambourines are created from random frequencies!
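To make the lowpass/highpass idea concrete, here is a minimal sketch, assuming a Butterworth filter and an exponential decay (neither comes from the interview, and `noise_hit` and `spectral_centroid` are hypothetical helper names). It shapes the same kind of noise burst into a kick-like and a hi-hat-like hit and compares where their energy sits.

```python
import numpy as np
from scipy import signal

def noise_hit(kind, cutoff_hz, srate=44100, dur=0.25, seed=0):
    """Filtered, exponentially decaying noise burst (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = int(srate * dur)
    noise = rng.standard_normal(n)
    # 4th-order Butterworth filter; kind is "lowpass" or "highpass"
    sos = signal.butter(4, cutoff_hz, btype=kind, fs=srate, output="sos")
    shaped = signal.sosfilt(sos, noise)
    # Fast exponential decay (assumed 50 ms time constant) gives a struck contour
    decay = np.exp(-np.arange(n) / (0.05 * srate))
    return shaped * decay

def spectral_centroid(x, srate=44100):
    """Magnitude-weighted mean frequency in Hz."""
    mags = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / srate)
    return float(np.sum(freqs * mags) / np.sum(mags))

kick_like = noise_hit("lowpass", 200)     # keeps the low end of the noise
hat_like = noise_hit("highpass", 6000)    # keeps the high end of the noise
print(spectral_centroid(kick_like), spectral_centroid(hat_like))
```

The cutoff values are guesses for illustration; the point is only that the same random source lands in very different registers depending on the filter.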

# Question 3

I recorded two different mugs and plotted their magnitude spectra as shown in the FM synthesis notebook. The higher-sounding mug (Mug 1) does indeed have higher frequencies in its spectrum, whereas Mug 2, which is lower and more percussive, has lower frequencies with fewer peaks (i.e. fewer harmonics).
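One way to back up the "higher" versus "lower" comparison with a number is to read off the strongest peak in Hz. This is a minimal sketch on synthetic stand-ins, not the actual recordings: the decaying sinusoids and their pitches (1800 Hz and 700 Hz) are assumptions, and `dominant_frequency` is a hypothetical helper.

```python
import numpy as np

def dominant_frequency(samples, srate):
    """Frequency in Hz of the largest non-DC peak (hypothetical helper)."""
    mags = np.abs(np.fft.rfft(samples))
    # rfftfreq maps each bin index to its frequency in Hz
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / srate)
    peak_bin = np.argmax(mags[1:]) + 1    # skip bin 0 (DC offset)
    return float(freqs[peak_bin])

# Synthetic stand-ins for the recordings: decaying sinusoids at assumed pitches
srate = 44100
t = np.arange(srate // 2) / srate         # 0.5 seconds
high_mug = np.exp(-8 * t) * np.sin(2 * np.pi * 1800 * t)
low_mug = np.exp(-20 * t) * np.sin(2 * np.pi * 700 * t)

print(dominant_frequency(high_mug, srate), dominant_frequency(low_mug, srate))
```

With real recordings the same function could be called on the mono sample array returned by `wavfile.read`.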

In [6]:
# Question 3
# Resource: https://github.com/gtzan/synthesizers_cs_perspective/blob/main/src/notebooks/fm_synthcsp.ipynb


def plot_mag_spectra(output, srate, title):
    mag_spectrum = abs(np.fft.rfft(output))
    p = figure()
    freqs = np.linspace(0, 0.5 * srate, len(mag_spectrum))
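    # Index of the bin closest to 5000 Hz (freqs spans 0 .. srate/2 over len(mag_spectrum) bins)
    max_freq_bin = int(5000 * len(mag_spectrum) / (0.5 * srate))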
    p.line(freqs[0:max_freq_bin],mag_spectrum[0:max_freq_bin] * 2 * (1.0 / srate))
    p.title.text_font_size = '20pt'
    p.title.text = title
    show(p)

srate1, mug1 = wavfile.read('./mug1.wav')
srate2, mug2 = wavfile.read('./mug2.wav')
ipd.Audio(mug1, rate=srate1)


In [7]:
ipd.Audio(mug2, rate=srate2)

In [9]:
plot_mag_spectra(mug1, srate1, "Mug 1 Magnitude Spectra")
plot_mag_spectra(mug2, srate2, "Mug 2 Magnitude Spectra")

# Question 4

Listening to the two additive synthesis creations, I can tell which mug is which based on their respective frequencies. However, each one loses a lot of its percussive "mug-tapping" quality and sounds more like a briefly played organ note.
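A likely reason for the organ-like quality is the amplitude envelope: a Gaussian window peaks in the middle of the note, so the sound swells in rather than starting with a hit. The sketch below compares that with an exponential decay. The 60 ms time constant and the partial frequencies are assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import windows

srate = 44100
n = srate // 2
t = np.arange(n) / srate
# Four harmonic partials of an assumed 900 Hz fundamental
partials = sum(np.sin(2 * np.pi * f * t) for f in (900, 1800, 2700, 3600))

# Gaussian window: loudest mid-note, so the attack is slow
gauss_env = windows.gaussian(n, std=n / 10)
# Exponential decay (assumed 60 ms time constant): loudest on the first sample
exp_env = np.exp(-t / 0.06)

organ_like = partials * gauss_env
tap_like = partials * exp_env

def early_energy_fraction(x, srate, window_s=0.05):
    """Share of total energy that falls in the first window_s seconds."""
    k = int(window_s * srate)
    return float(np.sum(x[:k] ** 2) / np.sum(x ** 2))

print(early_energy_fraction(organ_like, srate), early_energy_fraction(tap_like, srate))
```

Swapping the envelope does not change which frequencies are present, only how the energy is distributed over time, which is what the ear reads as "struck" versus "held".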

In [39]:
# Question 4
# Resource: https://github.com/gtzan/synthesizers_cs_perspective/blob/main/src/notebooks/additive_synthcsp.ipyn

def sinusoid(freq, t): 
    data = np.sin(2*np.pi*freq *t)
    return data

def additive_synth(data, srate):
    # Real FFT keeps only non-negative frequencies, so the peak search
    # cannot land on a mirrored (negative-frequency) bin
    magnitudes = np.abs(np.fft.rfft(data))
    bin_freqs = np.fft.rfftfreq(len(data), d=1.0 / srate)

    # Strongest non-DC bin, converted from a bin index to Hz
    fundamental_freq = bin_freqs[np.argmax(magnitudes[1:]) + 1]
    freqs = [fundamental_freq * k for k in (1, 2, 3, 4)]

    # Create the additive synthesis model
    time = np.arange(len(data)) / srate

    # One Gaussian amplitude envelope shared by all partials
    # (the window lives in signal.windows in current SciPy)
    envelope = signal.windows.gaussian(len(data), std=len(data) / 10)

    model = np.zeros_like(time)
    for freq in freqs:
        model += sinusoid(freq, time) * envelope

    return model, fundamental_freq

add_mug1, mug1_fundamental = additive_synth(mug1, srate1)
ipd.Audio(add_mug1, rate=srate1)


In [40]:
add_mug2, mug2_fundamental = additive_synth(mug2, srate2)
ipd.Audio(add_mug2, rate=srate2)

# Question 5

As in question 4, I can tell the two mug recordings apart because one mug has higher frequencies than the other. This time, however, the modal synthesis creates a more bell-like sound, with the initial impulse acting as the percussive hit followed by a resonance. So while it is closer to the mugs in terms of percussiveness, it also makes the sound much more pitched and musical.
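The length of that resonance is set by the pole radius of each mode. Here is a minimal sketch, assuming a single two-pole resonator excited by a unit impulse; `resonator_coeffs` and `t60_seconds` are hypothetical helpers. For poles at radius r the impulse response decays as r**n, so the time to fall by 60 dB follows directly from r.

```python
import numpy as np
from scipy import signal

def resonator_coeffs(freq, radius, srate):
    """Two-pole resonator with poles at radius * exp(+/- j*theta)."""
    theta = 2 * np.pi * freq / srate
    b = np.ones(1)
    a = np.array([1.0, -2 * radius * np.cos(theta), radius ** 2])
    return b, a

def t60_seconds(radius, srate):
    """Time for the r**n envelope to fall by 60 dB (a factor of 1000)."""
    return np.log(1000.0) / (-np.log(radius)) / srate

srate = 44100
impulse = np.zeros(srate)
impulse[0] = 1.0
b, a = resonator_coeffs(1000.0, 0.999, srate)
ring = signal.lfilter(b, a, impulse)      # one second of ringing at about 1000 Hz

for r in (0.999, 0.9995, 0.9999):
    print(r, round(t60_seconds(r, srate), 3))
```

Radii closer to 1 ring far longer, which is why small changes in the fourth decimal place are audible as a more or less sustained, bell-like tail.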

In [41]:
# Question 5
# Resource: https://github.com/gtzan/synthesizers_cs_perspective/blob/main/src/notebooks/modal_synthesis.ipynb


def make_impulse(srate):
    N = int(1.5*srate)  
    impulse = np.zeros(N)
    impulse[0] = 1 
    return impulse, N

def modal_resonance(audio, amp, freq, radius, srate): 
    b = np.ones(1)
    a = np.zeros(3)
    a[0] = 1.0 
    a[1] = -2*radius * np.cos(2*np.pi*freq*(1.0/srate))
    a[2] = radius * radius 
    # apply filter
    filtered_audio = amp * signal.lfilter(b, a, audio)
    return filtered_audio 

def modal_note(fundamental, excitation, N, srate=44100): 
    # srate: sample rate in Hz used to place the modal frequencies
    # (44100 is an assumed default; pass the recording's rate explicitly)
    # Mode parameters
    nModes = 4
    freqs = fundamental * np.array([1.0, 1.81,2.27, 4.54]) # modal center frequencies
    radii = [ 0.999, 0.9999, 0.9995, 0.9995] # modal radii
    amps = [40, 60, 50, 70]    
    
    modes = [] 
    mix = np.zeros(N)
    for m in np.arange(0, nModes): 
        modes.append(modal_resonance(excitation, amps[m], freqs[m], radii[m], srate))
        mix += modes[m]
        # normalize modes after mixing for individual playback 
        modes[m] = 0.5 * (modes[m] / np.max(modes[m]))
    mix = 0.5 * (mix / np.max(mix))
    return mix, modes 
    
mug1_impulse, mug1_N = make_impulse(srate1)
(mix1,modes1) = modal_note(mug1_fundamental, mug1_impulse, mug1_N, srate1)
mug2_impulse, mug2_N = make_impulse(srate2)
(mix2,modes2) = modal_note(mug2_fundamental, mug2_impulse, mug2_N, srate2)

ipd.Audio(mix1, rate=srate1, normalize=False)

In [42]:
ipd.Audio(mix2, rate=srate2, normalize=False)

# Question 6

Plotting all of the magnitude spectra is really interesting: we can see that the additive synthesis and modal synthesis both have peaks in similar ranges to the original magnitude spectrum. However, the additive synthesis has more peaks than the modal, which you would think would make it sound more like the original mug tap, yet it's actually the modal with its single peak that sounds more like the mug. I think the way the modal sound is generated (i.e. with an impulse) might help create that more percussive sound of tapping the mug.
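Counting peaks by eye can be replaced with a rough automatic count. This is a minimal sketch, assuming that a "peak" means a local maximum above 10% of the largest magnitude (that threshold is an arbitrary choice, and `count_spectral_peaks` is a hypothetical helper). It uses clean synthetic tones, so real recordings would need a more careful threshold.

```python
import numpy as np
from scipy.signal import find_peaks

def count_spectral_peaks(samples, rel_height=0.1):
    """Count local maxima above rel_height * the largest magnitude."""
    mags = np.abs(np.fft.rfft(samples))
    peaks, _ = find_peaks(mags, height=rel_height * mags.max())
    return len(peaks)

srate = 8000
t = np.arange(srate) / srate              # exactly one second
one_mode = np.sin(2 * np.pi * 440 * t)
four_partials = sum(np.sin(2 * np.pi * 440 * k * t) for k in (1, 2, 3, 4))

print(count_spectral_peaks(one_mode), count_spectral_peaks(four_partials))
```

A count like this only describes the spectrum; it says nothing about how quickly the sound starts, which is the part the impulse excitation contributes.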

In [43]:
# Question 6

plot_mag_spectra(mug1, srate1, "Mug 1 Magnitude Spectra")
plot_mag_spectra(mug2, srate2, "Mug 2 Magnitude Spectra")
plot_mag_spectra(add_mug1, srate1, "Mug 1 Additive Mag Spec")
plot_mag_spectra(add_mug2, srate2, "Mug 2 Additive Mag Spec")
plot_mag_spectra(mix1, srate1, "Mug 1 Modal Mag Spec")
plot_mag_spectra(mix2, srate2, "Mug 2 Modal Mag Spec")

# Question 7

Playing around with the index and ratio, it seems that a low mc_ratio results in a more kick-drum-like sound, whereas a higher mc_ratio results in a sound more like a snare drum (or, with a higher freq, like a hi-hat). I found that too high an index started adding some pitch/harmonics and made it sound melodic, so I kept the index at 3 to keep it more percussive.
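To see why the ratio and index change the timbre, it helps to look at where the sidebands fall. The sketch below uses the textbook phase-modulation form of FM, sin(wc*t + index*sin(wm*t)), which is an assumption made for clarity and is not the sample-by-sample loop used in this notebook. `simple_fm` and `strongest_frequencies` are hypothetical helpers.

```python
import numpy as np

def simple_fm(carrier_hz, mc_ratio, index, srate=8000, dur=1.0):
    """Textbook FM in phase-modulation form (no amplitude envelope)."""
    t = np.arange(int(srate * dur)) / srate
    modulator_hz = mc_ratio * carrier_hz
    phase = 2 * np.pi * carrier_hz * t + index * np.sin(2 * np.pi * modulator_hz * t)
    return np.sin(phase)

def strongest_frequencies(x, srate, count):
    """Frequencies (Hz) of the `count` largest magnitude-spectrum bins, sorted."""
    mags = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / srate)
    top = np.argsort(mags)[-count:]
    return sorted(float(f) for f in freqs[top])

# Carrier at 1000 Hz, modulator at 0.2 * 1000 = 200 Hz, modest index
tone = simple_fm(carrier_hz=1000, mc_ratio=0.2, index=1.0)
print(strongest_frequencies(tone, 8000, 3))
```

The energy sits at the carrier plus and minus multiples of the modulator frequency, so the mc_ratio sets the spacing of the sidebands and the index sets how many of them are strong.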

In [90]:
# Question 7
# Resource: https://github.com/gtzan/synthesizers_cs_perspective/blob/main/src/notebooks/fm_synthcsp.ipynb

def hz2radians(f, srate):
    return 2 * np.pi * f / srate

def envelope(segments,srate,duration): 
    nsamples = int(srate*duration)
    value = 0.0
    segment_index = 0 
    data = np.zeros(nsamples)
    segment_sample = 0 
    prev_target = 0.0

    for i in np.arange(nsamples): 
        if (segment_index < len(segments)): 
            target = segments[segment_index][0]
            ramp_time = segments[segment_index][1]
            delay_time = segments[segment_index][2]
            
            ramp_samples = (ramp_time / 1000.0) * srate 
            delay_samples = (delay_time / 1000.0) * srate
            
            if i < segment_sample + ramp_samples: 
                incr = (target-prev_target) / ramp_samples 
            elif i < segment_sample + ramp_samples + delay_samples: 
                incr = 0.0 
            else: 
                if ramp_samples != 0.0: 
                    incr = (target-prev_target) / ramp_samples 
                else: 
                    incr = 0.0 
                segment_sample = i 
                segment_index = segment_index+1 
                prev_target = target 
            value = value + incr 
        data[i] = value
    return data

def frequency_modulation(start, end, freq, mc_ratio, index, srate,env): 
    output = np.zeros(end-start)
    carrier_phase = 0.0 
    carrier_phase_incr = hz2radians(freq,srate)
    modulator_phase_incr = hz2radians(mc_ratio * freq,srate)
    
    # get centered sin after integration 
    modulator_phase = 0.5 * (np.pi + modulator_phase_incr) 
    fm_index = hz2radians((mc_ratio * freq * index), srate)
    
    ind_env = fm_index * env
    
    for t in np.arange(start, end): 
        modulation = ind_env[t] * np.sin(modulator_phase)
        
        output[t] = env[t] * np.sin(carrier_phase)
        
        carrier_phase += (modulation + carrier_phase_incr)
        modulator_phase += modulator_phase_incr
    return output 

srate7 = 48000
s7 = [(1, 20, 0), (0,10,0), (0, 70,0), (0.0, 900, 0)]
env = envelope(s7, srate7, 1)
output = frequency_modulation(0, srate7, 50, 3, 1000, srate7, env)
ipd.Audio(output, rate=srate7)

# Question 8

I used a set-up similar to the bandpass filter in the digital filter notebook. When the center frequency (cf) was a lot lower than the input's frequency, the output was much quieter/dampened. With a lower quality factor (q), the output replicated the input sine wave more closely, but a high q resulted in a school-lunch-alarm sound between the two frequencies.
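The "quieter when cf is below the input" observation can be checked by evaluating the filter's gain at a few frequencies. This is a minimal sketch that copies the coefficient formulas from the `two_pole` cell into a hypothetical helper, `two_pole_coeffs`, and uses `scipy.signal.freqz`; the probe frequencies are arbitrary choices.

```python
import numpy as np
from scipy import signal

def two_pole_coeffs(cf, q, srate):
    """Same coefficient formulas as the two_pole cell, returned as (b, a)."""
    w0 = 2 * np.pi * cf / srate
    alpha = np.sin(w0) / (2 * q)
    b = np.array([(1 - np.cos(w0)) / 2, 1 - np.cos(w0), (1 - np.cos(w0)) / 2])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    return b, a

def gain_at(freqs_hz, cf, q, srate=44100):
    """Magnitude response at the given frequencies in Hz."""
    b, a = two_pole_coeffs(cf, q, srate)
    _, h = signal.freqz(b, a, worN=np.asarray(freqs_hz, dtype=float), fs=srate)
    return np.abs(h)

# Probe DC, the center frequency, the 880 Hz test tone, and a high frequency
gains = gain_at([0.0, 500.0, 880.0, 20000.0], cf=500, q=5)
print(gains)
```

With these formulas the gain is about 1 at DC, about q at cf, and falls away above cf, which matches the 880 Hz sine coming out quieter through a filter centered at 500 Hz.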

In [171]:
# Question 8
# Resource: https://github.com/gtzan/synthesizers_cs_perspective/blob/main/src/notebooks/digital_filters.ipynb
# and https://ccrma.stanford.edu/~jos/smac03maxjos/smac03maxjos.pdf

def sine(freq=440, amp=1.0, dur=1.0, srate=44100):
    t = np.linspace(0, dur, int(srate * dur))
    return amp * np.sin(2 * np.pi * freq * t)

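def two_pole(audio, cf, q, srate):
    b = np.zeros(3)
    a = np.zeros(3)

    # Center frequency as a normalized angular frequency (radians per sample)
    frad = 2 * np.pi * cf / srate 
    # Bandwidth term: a larger q gives a narrower, taller resonance at cf
    alpha_ = np.sin(frad)/(2*q)
    # Feedback coefficients place the resonant pole pair at cf
    a[0] = 1 + alpha_
    a[1] = -2 * np.cos(frad)
    a[2] = 1 - alpha_
    # Feed-forward coefficients are proportional to [1, 2, 1]; together with
    # a[] this is the low-pass form of the RBJ audio EQ cookbook biquad
    b[0] = (1 - np.cos(frad)) / 2
    b[1] = 1 - np.cos(frad)
    b[2] = (1 - np.cos(frad)) / 2

    # lfilter normalizes both a and b by a[0] when a[0] is not 1
    filtered_audio = signal.lfilter(b, a, audio)
    return filtered_audio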

audio8_1 = sine()
data8_1 = two_pole(audio8_1, 400, 1000, 44100)
ipd.Audio(data8_1, rate=44100)

In [127]:
audio8_2 = sine(880, 1, 2, 44100)
data8_2 = two_pole(audio8_2, 500, 5, 44100)
ipd.Audio(data8_2, rate=44100)

# Question 9

I used the two_pole filter from question 8 on the two mug recordings and looked at the magnitude spectra. It appears that the lower frequencies of both were gently and broadly boosted, whereas the higher frequencies were more suppressed and have no significant peaks. If it's similar to what we learned in class, this could be due to the averaging in the filter's formula, which evens out the richer low frequencies and attenuates the sparser higher frequencies/harmonics.
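The averaging idea can be shown in isolation. The `b` coefficients in `two_pole` are proportional to [1, 2, 1], so the feed-forward half of the filter is a weighted three-point average. This is a minimal sketch of that half only, with the kernel normalized to sum to 1; the feedback half, which adds the resonance near cf, is deliberately left out.

```python
import numpy as np

# Feed-forward part of the filter, up to a constant gain: a weighted 3-point average
kernel = np.array([0.25, 0.5, 0.25])

slow = np.ones(16)                        # slowest possible signal (constant)
fast = np.array([1.0, -1.0] * 8)          # fastest possible signal (flips every sample)

# mode="valid" keeps only outputs where the kernel fully overlaps the input
slow_out = np.convolve(slow, kernel, mode="valid")
fast_out = np.convolve(fast, kernel, mode="valid")

print(slow_out[:4], fast_out[:4])
```

Averaging neighbouring samples leaves slow changes untouched and cancels sample-to-sample flips, which is consistent with the high end of the mug spectra being suppressed.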

In [133]:
# Question 9
data9_1 = two_pole(mug1, mug1_fundamental, 10, srate1)
ipd.Audio(data9_1, rate=srate1)

In [134]:
data9_2 = two_pole(mug2, mug2_fundamental, 10, srate2)
ipd.Audio(data9_2, rate=srate2)

In [135]:
plot_mag_spectra(data9_1, srate1, "Mug 1 Two Pole Filter")
plot_mag_spectra(data9_2, srate2, "Mug 2 Two Pole Filter")

# Question 10

I made a real-time modal synthesis wavetable with an impulse from Mug 1. Since it's in a notebook, I am just generating two seconds of audio here, but in practice you could continuously cycle through the wavetable to play a note for as long as you want!
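The "cycle through the wavetable for as long as you want" idea can be sketched with a table that holds exactly one cycle. This is a minimal sketch, assuming a single-cycle sine table as a stand-in for the modal wavetable; `make_sine_table`, `lookup`, and `play` are hypothetical helpers, and the lookup wraps at the table end so interpolation never reads past it.

```python
import numpy as np

def make_sine_table(length):
    """One cycle of a sine wave (a stand-in for the modal wavetable)."""
    return np.sin(2 * np.pi * np.arange(length) / length)

def lookup(table, phase):
    """Linear interpolation between neighbours, wrapping at the table end."""
    i0 = int(phase)
    i1 = (i0 + 1) % len(table)
    frac = phase - i0
    return table[i0] * (1.0 - frac) + table[i1] * frac

def play(table, freq, srate, nsamples):
    """Read the table repeatedly at a rate that yields `freq` Hz."""
    out = np.zeros(nsamples)
    phase = 0.0
    incr = len(table) * freq / srate      # table positions advanced per output sample
    for n in range(nsamples):
        out[n] = lookup(table, phase)
        phase = (phase + incr) % len(table)
    return out

srate = 44100
note = play(make_sine_table(1000), freq=220, srate=srate, nsamples=srate)
peak_hz = np.argmax(np.abs(np.fft.rfft(note))) * srate / len(note)
print(peak_hz)
```

Because the phase wraps with a modulo, `nsamples` can be as large as needed, and the pitch depends only on the phase increment rather than on the table's length.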

In [169]:
# Question 10
# Resource: https://github.com/gtzan/synthesizers_cs_perspective/blob/main/src/notebooks/wavetables_synthcsp.ipynb

def create_modal_wavetable(fundamental, srate):
    impulse, N = make_impulse(srate)
    (mix,modes) = modal_note(fundamental, impulse, N)
    return mix
    
def wavetable_lookup(phase_index, wavetable): 
    x  = phase_index
    x0 = int(phase_index)
    x1 = x0+1
    y0 = wavetable[x0]
    y1 = wavetable[x1]
    return y0 * (x1-x) + y1 * (x - x0)


table_length = 1000
phase_index = 0
wavetable = create_modal_wavetable(mug1_fundamental, srate1)
duration_samples = 2 * srate1 
samples = np.arange(0, duration_samples)
data = np.zeros(duration_samples)

freq = 220 
phase_increment = (table_length * freq) / srate1 
for s in samples: 
    phase_index = (phase_index + phase_increment) % table_length
    data[s] = wavetable_lookup(phase_index, wavetable)
 
ipd.Audio(data, rate=srate1)
