# Introduction

This notebook is intended to walk a beginner through the basic concepts of audio signal processing needed to get started with audio synthesis and analysis. Basic familiarity with the Julia programming language is assumed. A number of tutorials are available at [Julia Tutorials](https://julialang.org/learning/tutorials/). For the basics, you may want to refer to the [Julia Language - a concise tutorial](https://syl1.gitbook.io/julia-language-a-concise-tutorial/).

## Basic concepts and terminology

A digital audio signal is represented as a sequence of numbers that describe some physical aspect of the sound, such as the air pressure around a microphone over time. The pressure is an analog quantity, i.e. it has a value at every instant of time, not only at specific times. Since a computer cannot work with such a continuous quantity directly, we approximate it by **sampling** the analog quantity at regular intervals of time and digitizing each pressure value into a number, usually in the range `[-1.0, 1.0]`. The number of samples taken per second is called the **sampling rate** and is measured in Hertz (Hz), i.e. "samples per second". For our purposes, we'll use 32-bit floating point numbers (`Float32` in Julia) to represent these "samples".

For a **stereo** audio signal (an audio signal with two **channels**), each sample is made of two such numbers, one for the left pickup and one for the right pickup. Such a pair of numbers may be referred to in the literature as a **sample frame**, a **frame**, or just a **sample** if the context is clear. Similarly, for more complex signals like 8-channel surround sound, 8 different numbers are packed into each **sample frame**.
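
One possible in-memory layout, assumed here purely for illustration, stores a stereo signal as a 2×N `Float32` matrix with one column per sample frame. Because Julia arrays are column-major, flattening such a matrix with `vec` yields the interleaved order left, right, left, right, ..., which is how many raw multi-channel files lay out their frames (check your tool's settings rather than relying on this). The variable names below are hypothetical.

```julia
# A minimal sketch: 3 stereo sample frames stored as a 2×3 matrix.
# Row 1 holds the left channel, row 2 the right channel; each column is one frame.
frames = Float32[0.1 0.2 0.3; -0.1 -0.2 -0.3]

# The number of sample frames is the number of columns.
number_of_frames = size(frames, 2)

# Julia stores matrices column by column, so vec() interleaves the channels:
# [L1, R1, L2, R2, L3, R3].
interleaved = vec(frames)

# Selecting a single row gives one channel as a plain mono Vector{Float32}.
left_channel = frames[1, :]
```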

For our purposes, we'll mostly be dealing with **mono** sound signals, i.e. signals that have only one **channel**. A mono signal therefore has a very simple representation: a one-dimensional `Float32` array (a `Vector{Float32}` in Julia).

Common "sampling rates" used in audio processing are 22.05kHz, 44.1kHz, 48kHz and 96kHz (kHz stands for "kilohertz", and 1kHz = 1000Hz). **We will use 48kHz unless otherwise noted**. This is a good-quality sampling rate and is usually the default on today's desktop computers and operating systems.
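
Since the sampling rate counts samples per second, the number of samples in a clip is simply the rate multiplied by the duration, and consecutive samples at 48kHz are 1/48000 s (about 20.8 microseconds) apart. A small sketch, using a hypothetical helper `samples_needed`, computes the sample count for each rate listed above, along with the raw storage size at 4 bytes per `Float32` sample:

```julia
# Hypothetical helper: number of mono samples needed for `duration` seconds.
samples_needed(sampling_rate, duration) = trunc(Int, sampling_rate * duration)

for rate in (22050, 44100, 48000, 96000)
    n = samples_needed(rate, 10.0)
    bytes = n * sizeof(Float32)   # 4 bytes per Float32 sample
    println(rate, " Hz: ", n, " samples, ", bytes, " bytes for 10 s of mono audio")
end
```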

## A note on volume levels

Earlier, we noted that sound signals are "digitized" into numbers in the range `[-1.0, 1.0]`. The maximum value of a sample is therefore `1.0`. However, we do not want to use the full range, since a computer's audio system may play this as a very loud sound at default volume settings. Furthermore, we need some "head room" when we work with audio signals, for example when combining multiple signals or running algorithms on them. If we add two identical, in-phase sine waves of **amplitude** 1.0 each, the sum has an amplitude of 2.0, which exceeds the range. When the range is exceeded and you play the audio anyway, you will hear what is called **clipping**: the audio system on your computer effectively takes `min(max(signal, -1.0), 1.0)` and plays the result. **We do not want our audio to clip**. So we will either rescale the audio to reduce its amplitude after applying any processing steps, or simply use lower amplitude values to start with.
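
The paragraph above can be illustrated with a short sketch (variable names are hypothetical): adding a full-scale 440Hz tone to itself produces peaks near 2.0, `clamp.` mimics the hard clipping described above, and dividing by the peak value is one simple way, among others, to bring the mix back into range with some head room.

```julia
# One second of a full-scale (amplitude 1.0) 440 Hz sine tone at 48 kHz.
sampling_rate = 48000
t = (0:sampling_rate-1) ./ sampling_rate       # sample times in seconds
tone = Float32.(sin.(2π * 440 .* t))

# Two identical, in-phase tones add up to peaks near 2.0: out of range.
mixed = tone .+ tone

# What the audio system effectively does when the range is exceeded.
clipped = clamp.(mixed, -1.0f0, 1.0f0)

# One way to avoid clipping: rescale the mix so that its peak is 0.5.
headroom_peak = 0.5f0
normalized = mixed .* (headroom_peak / maximum(abs, mixed))
```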

## Making a sine tone

A pure tone has the shape of the mathematical `sin(x)` function. The formula for such a pure sine tone can therefore be written as $y(t) = a \sin(2\pi f t)$, where $a$ is the "amplitude" of the sine wave, i.e. the maximum value it takes on, $f$ is the frequency in Hz and $t$ is the time in seconds. This is the mathematical formula for an analog audio signal. In order to "digitize" it, we convert it into a discrete series by calculating values at the discrete times $t = i / f_s$, where $i = 0, 1, 2, \ldots$ is the sample index and $f_s$ is the **sampling rate** (called the **sampling frequency** in some literature).
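
Substituting $t = i / f_s$ into the analog formula gives each digitized sample directly in terms of its index $i$. With $f_s = 48000$ and $f = 440$, one cycle of the tone spans $f_s / f = 48000 / 440 \approx 109.09$ samples, so a cycle does not need to contain a whole number of samples.

$$
y[i] = a \sin\left(\frac{2\pi f\, i}{f_s}\right), \qquad i = 0, 1, 2, \ldots
$$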

Let us make a 10-second long sine tone which has a frequency of 440Hz (the pitch known as "A440") and try to play it back.

```julia
# Returns a mathematical function of time t (in seconds) for a pure sine tone.
# We can sample this function at regular intervals to get a digitized audio signal.
function sine_wave(frequency, amplitude=0.25)
    wave(t) = amplitude * sin(2π * frequency * t)
    return wave
end

# Given any function of time and a duration (in seconds), "digitize" samples
# the function's values at the regular rate given by "sampling_rate" and
# returns a Float32 array of the appropriate length.
function digitize(func, duration, sampling_rate=48000)
    number_of_samples = trunc(Int, sampling_rate * duration)
    signal = Vector{Float32}(undef, number_of_samples)
    for i in 1:number_of_samples
        # Julia arrays are 1-based, so element i holds the sample taken at
        # time (i - 1) / sampling_rate; the first sample is at t = 0.
        signal[i] = func((i - 1) / sampling_rate)
    end
    return signal
end

# Given a file name and samples, writes the Float32 samples as raw binary data
# (native byte order, no header) to the given output file. The sampling rate
# is not stored anywhere in the file.
function write_sound_file(file_name, samples)
    open(file_name, "w") do out
        write(out, samples)
    end
end
```

Now let's write out a simple sine tone with a frequency of 440Hz, a duration of 10 seconds and an amplitude of 0.25, and play it back.

```julia
write_sound_file("/tmp/10sec_A440.float32", digitize(sine_wave(440.0, 0.25), 10.0))
```

Once the above line is executed, you'll find a file named "10sec_A440.float32" in the "/tmp" directory. Most audio players won't know how to play it, because the file contains only the raw sample values and none of the information required to interpret them, such as the sampling rate, the number of channels and the sample format.
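
To see that the file really holds nothing but samples, the following self-contained sketch writes a short tone to a temporary file and compares the file size with the sample count. It assumes Julia's `tempname()` for the path instead of `/tmp`, and computes the samples inline rather than calling the notebook's functions; the variable names are hypothetical.

```julia
sampling_rate = 48000
duration = 0.5                                  # seconds
n = trunc(Int, sampling_rate * duration)

# A 440 Hz tone at amplitude 0.25, stored as Float32 samples.
samples = Float32[0.25 * sin(2π * 440 * (i - 1) / sampling_rate) for i in 1:n]

path = tempname() * ".float32"
write(path, samples)                            # raw bytes only, no header

# The file size is exactly n samples × 4 bytes: no sampling rate,
# no channel count and no sample format are recorded.
bytes_on_disk = filesize(path)
println(bytes_on_disk, " bytes = ", n, " samples × ", sizeof(Float32), " bytes")
rm(path)
```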

1. Open the Audacity application
2. Choose "File -> Import -> Raw data"
3. Choose the "/tmp/10sec_A440.float32" file
4. You'll see a dialog box with settings. Change the settings to the following -
    a. Set "Encoding" to "32-bit float"
    b. Set "Byte order" to "Default endianness"
    c. Set "Channels" to "1 (mono)"
    d. Set "Sample rate" to "48000" Hz.
    e. Leave the other boxes at their defaults.
5. Audacity will now open the file and display a waveform. Don't worry if you see a solid blue block instead of a "wave form": 10 seconds of a 440Hz tone contains 4400 cycles ("wiggles"), and at the default zoom level they all get squashed into a block of colour.
6. Hit "play" to play the sound. Do you hear a tone?
7. Change the zoom settings - click on the magnifying glass icon and then click roughly in the middle of the waveform multiple times to "zoom into the audio signal". After clicking about 4 times, you should be able to see the sine wave shape emerge.

If you've come this far, **congratulations**! You've synthesized your first sound. Note that you can also go the other way around, i.e. record a mono sound using Audacity, export it in raw 32-bit float format at 48000Hz, and use a little Julia code to read the file back in and work with the clip.
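
A minimal sketch of the reading side, assuming the file is headerless 32-bit float data in the machine's native byte order (the same layout `write` produces). The helper name `read_sound_file` is hypothetical; the usage lines round-trip a few samples through a temporary file.

```julia
# Hypothetical helper: read a headerless file of native-endian Float32
# samples back into a Vector{Float32}.
function read_sound_file(file_name)
    number_of_samples = filesize(file_name) ÷ sizeof(Float32)
    samples = Vector{Float32}(undef, number_of_samples)
    open(file_name, "r") do io
        read!(io, samples)                      # fill the array from the file
    end
    return samples
end

# Usage: write a few samples, then read them back.
original = Float32[0.0, 0.25, -0.25, 0.125]
path = tempname()
write(path, original)
restored = read_sound_file(path)
rm(path)
```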

## Questions

When importing the sound file into Audacity, what do you get if you choose a different value for the sample rate, say 24000Hz or 44100Hz? Does the clip sound different? What happens to the duration that Audacity displays? Can you explain what you observe?


# The relationship between "frequency" and "phase" of a pure tone

When we described our pure sine tone, we wrote it out as a closed-form mathematical expression, $y(t) = a \sin(2\pi f t)$, where $f$ is the frequency. Suppose we want to make a sine wave whose frequency itself changes over time (as in much of music). How would we do it?

Let's say you want the frequency of the sound to vary from 440Hz to 660Hz over a duration of 2 seconds. We can write that frequency curve as a mathematical function of time, $f(t) = 440 + t \cdot (660 - 440) / 2.0$, where $t$ is assumed to be in the range $0.0$ to $2.0$ seconds.
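
The frequency curve itself translates directly into Julia. In this sketch the name `frequency_curve` is hypothetical; it only evaluates $f(t)$ at a few times and deliberately does not build the gliding tone, which is the task below.

```julia
# The frequency curve f(t) from the text: a linear glide from 440 Hz at t = 0
# to 660 Hz at t = 2 seconds. Only meaningful for t in [0.0, 2.0].
frequency_curve(t) = 440 + t * (660 - 440) / 2.0

# Evaluate the curve every half second to see the glide.
for t in 0.0:0.5:2.0
    println("t = ", t, " s  ->  f = ", frequency_curve(t), " Hz")
end
```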

**Task:** Modify the Julia code above to make a sine wave whose frequency varies according to the function $f(t)$ we just wrote. Digitize the result, write it out to a file and listen to it in Audacity. How does it sound? Does it sound correct to you? If yes, explain what you just did. If not, what sounds wrong about it?
