# How to write WAVE Files in Python

This section covers how to work with [WAVE format files](https://ccrma.stanford.edu/courses/422-winter-2014/projects/WaveFormat) in Python.

Modules already exist to do a lot of this work, but the purpose here is to demonstrate the principles of creating a wav file from scratch, regardless of the programming language used.

This resource is presented as a guide for an instructor to create a lightweight utility for students that can be deployed easily. This is useful when internet connections break or administrative privileges don't allow for easy installation of additional software.

It is also a resource for the curious student who would like to know a little more about how audio is packaged up for playback in its simplest form.

The following is written in Python using just the standard library. Alongside this article are some examples in other programming languages. Python also comes with the [`wave` module](https://docs.python.org/3/library/wave.html), and an example of using it is given in the section [Using the `wave` Module](#Using-the-wave-Module). This is helpful if your focus is simply on achieving a quick method for audio playback. Much of the same philosophy from the rest of this article is applicable, although some components may be a little more opaque when designing a library.


## Benefits to Making Your Own Library

- agile: you can change it when you need to
- reusable: it should be something simple enough you can pass it on to students to modify as they please.
- flexible: it should be something you can drop into a project for when administrative privileges, internet connections and a myriad of other IT disasters conspire against you to derail a lesson plan.
- personal: it's _your_ library and you can customise it as you please. As an instructor you may want to adapt it for creating coding challenges; as a sound designer or musician you might want to add functionality specific to your needs.

## Jupyter Notebook Shortcuts

|                   Function |           macOS           |              Windows              |
| -------------------------: | :-----------------------: | :-------------------------------: |
|                  Run Cells | <kbd>⌘</kbd>+<kbd>⏎</kbd> | <kbd>CTRL</kbd>+<kbd>ENTER</kbd>  |
| Run Cells and Select Below | <kbd>⇧</kbd>+<kbd>⏎</kbd> | <kbd>SHIFT</kbd>+<kbd>ENTER</kbd> |
| Run Cells and Insert Below | <kbd>⌥</kbd>+<kbd>⏎</kbd> |  <kbd>ALT</kbd>+<kbd>ENTER</kbd>  |
|        Toggle Line Numbers | <kbd>⇧</kbd>+<kbd>L</kbd> |   <kbd>SHIFT</kbd>+<kbd>L</kbd>   |
|             New Cell Above |       <kbd>a</kbd>        |           <kbd>a</kbd>            |
|             New Cell Below |       <kbd>b</kbd>        |           <kbd>b</kbd>            |
|                Delete Cell | <kbd>d</kbd> <kbd>d</kbd> |     <kbd>d</kbd> <kbd>d</kbd>     |


## WAVE Format Header

The elegant part of the WAVE format is that the header is actually quite simple to understand. It is this header that defines a WAVE file; the `.wav` extension is just a hint to software as to what the file contains.

There are 44 bytes of header data to understand, and most of those are fixed values, so there isn't a lot to remember.

Below is a table of the header in order. Offset and Size columns contain values in bytes:

| Endian | Offset | Field Name       | Size |
| -----: | :----: | ---------------- | :--: |
|    big |   0    | Chunk ID         |  4   |
| little |   4    | Chunk Size       |  4   |
|    big |   8    | Format           |  4   |
|    big |   12   | Sub-Chunk 1 ID   |  4   |
| little |   16   | Sub-Chunk 1 Size |  4   |
| little |   20   | Audio Format     |  2   |
| little |   22   | Num Channels     |  2   |
| little |   24   | Sample Rate      |  4   |
| little |   28   | Byte Rate        |  4   |
| little |   32   | Block Align      |  2   |
| little |   34   | Bits Per Sample  |  2   |
|    big |   36   | Sub-Chunk 2 ID   |  4   |
| little |   40   | Sub-Chunk 2 Size |  4   |
| little |   44   | PCM Audio Data   |  ?   |

The only unknown size is the PCM Audio Data as that will be determined by the number of samples and the byte depth.
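
As a quick sanity check on the table, the 44-byte total can be confirmed with the standard `struct` module (introduced properly later in this article). This is a minimal sketch; the name `HEADER_FORMAT` is an illustrative assumption and not part of any library.

```python
import struct

# One format code per header field, in table order.
# '<' selects little-endian with no padding; '4s' is a 4-byte string,
# 'I' an unsigned 32-bit integer and 'H' an unsigned 16-bit integer.
HEADER_FORMAT = '<4sI4s4sIHHIIHH4sI'

header_size = struct.calcsize(HEADER_FORMAT)
print(header_size)
```

The result matches the offset of the PCM Audio Data row, which is where the audio begins.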

### Notes on Terminology

A couple of notes on terminology before going forward.

- Pulse Code Modulation (PCM): The format of digital audio typically found in a WAVE file.
- Byte Depth vs. Bit Depth: Refer to fundamentally the same thing. Digital PCM audio resolution is generally talked of in bit depth (or bits per sample). Byte depth is simply Bit Depth ÷ 8, or bit depth in units of bytes.
- WAVE format vs. wav vs. `.wav`: all refer to a file in the WAVE format and are used interchangeably
- samples vs. frames: In PCM audio a _sample_ is a single digital amplitude value for a single channel. A _frame_ is a collection of samples for all channels in a single moment in time. This means for monophonic audio 'sample' and 'frame' are _technically_ the same thing, but this resource will attempt to make the distinction clear in situations where increasing the number of channels may cause confusion.
- Endian: The order of bytes in a multi-byte number. You could describe the number `234` as starting from the 'big end' with "2 hundreds, 3 tens, 4 units"  or the 'little end' as "4 units, 3 tens, 2 hundreds". The same applies for data with multiple bytes. What this comes down to is when you have to write a value one byte at a time, what order do you do it in. Don't worry, it will be referred to explicitly when the time comes.
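
To make the endian idea concrete, here is a small sketch that packs the same 2-byte number both ways using the standard `struct` module. The value `0x1234` is an arbitrary choice for illustration.

```python
import struct

value = 0x1234  # two bytes: 0x12 (the big end) and 0x34 (the little end)

little = struct.pack('<H', value)  # least significant byte first
big    = struct.pack('>H', value)  # most significant byte first

print(little.hex(), big.hex())
```

The bytes are identical; only the order in which they are written differs.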

### Header Values

Below is a table of the header fields, their types and the value or calculation needed for each.

|               Data Type | C Type     | Field Name       | Value                                           | Description                                      |
| ----------------------: | :--------- | ---------------- | :---------------------------------------------- | :----------------------------------------------- |
|      4 ASCII characters | `char[4]`  | Chunk ID         | `"RIFF"`                                        | The characters `R`,`I`,`F`,`F`                   |
| unsigned 4-byte integer | `uint32_t` | Chunk Size       | `4 + (8 + SubChunk1Size) + (8 + SubChunk2Size)` | Size of the file in bytes, minus the 8 bytes of Chunk ID and Chunk Size |
|      4 ASCII characters | `char[4]`  | Format           | `"WAVE"`                                         | The characters `W`,`A`,`V`,`E`                   |
|      4 ASCII characters | `char[4]`  | Sub-Chunk 1 ID   | `"fmt "`                                        | The characters `f`,`m`,`t`,` `                   |
| unsigned 4-byte integer | `uint32_t` | Sub-Chunk 1 Size | `16`                                            | The size of the fields up to Bits per Sample     |
| unsigned 2-byte integer | `uint16_t` | Audio Format     | `1`                                             | `1` for PCM. See Appendix on Audio Format        |
| unsigned 2-byte integer | `uint16_t` | Num Channels     | `1`                                             | Number of channels: 1 = mono, 2 = stereo         |
| unsigned 4-byte integer | `uint32_t` | Sample Rate      | `44100`                                         | Sampling Rate in Hz                              |
| unsigned 4-byte integer | `uint32_t` | Byte Rate        | `SampleRate * NumChannels * BitsPerSample/8`    | Bytes per second transferred by audio            |
| unsigned 2-byte integer | `uint16_t` | Block Align      | `NumChannels * BitsPerSample/8`                 | Size of a single frame                           |
| unsigned 2-byte integer | `uint16_t` | Bits Per Sample  | `16`                                            | Size of a single sample                          |
|      4 ASCII characters | `char[4]`  | Sub-Chunk 2 ID   | `"data"`                                        | The characters `d`,`a`,`t`,`a`                   |
| unsigned 4-byte integer | `uint32_t` | Sub-Chunk 2 Size | `NumSamples * NumChannels * BitsPerSample/8`    | size of audio data (without the header) in bytes |

The audio data type is dictated by the `Bits Per Sample` field and the `Audio Format`. For the rest of this article we will work exclusively in 16-bit audio. If you want to start experimenting or would like to set a challenge based on resolution, those are the two fields which will require your attention.
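
The calculated fields in the table can be worked through for a concrete case. The sketch below is one possible arrangement, not a fixed recipe: the function name `derived_header_fields` and the parameter `num_frames` (samples per channel, the `NumSamples` of the table) are assumptions made for illustration.

```python
def derived_header_fields(num_frames, num_channels=1,
                          sample_rate=44100, bits_per_sample=16):
    """Return the calculated header fields as a dictionary."""
    block_align = num_channels * bits_per_sample // 8    # bytes per frame
    byte_rate = sample_rate * block_align                # bytes per second
    sub_chunk_2_size = num_frames * block_align          # bytes of audio data
    chunk_size = 4 + (8 + 16) + (8 + sub_chunk_2_size)   # Sub-Chunk 1 Size is 16
    return {
        'block_align': block_align,
        'byte_rate': byte_rate,
        'sub_chunk_2_size': sub_chunk_2_size,
        'chunk_size': chunk_size,
    }

# One second of mono, 16-bit audio at 44.1 kHz
fields = derived_header_fields(num_frames=44100)
print(fields)
```

Changing `num_channels` to `2` doubles `block_align`, `byte_rate` and `sub_chunk_2_size`, which is a useful way to see how the fields depend on each other.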

#### Header Value Notes

There are some common stumbling blocks which are worthwhile being aware of as they might provide nice learning opportunities for students.

- Sub-Chunk 1 ID: there is a space character ` ` in the `"fmt "` character string that is very easy to miss. In fact, `Chunk ID`, `Format`, `Sub-Chunk 1 ID` and `Sub-Chunk 2 ID` are all just ASCII strings. Ask your students to open a wave file in a text editor and see if they can find them. See if the students can also find wav files with other text like `bext`, which would indicate a [broadcast extension audio file](https://tech.ebu.ch/docs/tech/tech3285.pdf).
- PCM audio data is always signed EXCEPT for 8-bit (1 byte) data, which is unsigned and centred on `128`.
- There is a hard limit to the size of an audio file, given that `Chunk Size` is a 4-byte integer, i.e. $2^{32}$ bytes (or ~4.3 gigabytes).
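
The size limit in the last note can be turned into a maximum duration. This is a rough sketch under the assumption of a plain 44-byte header with no extra chunks; the names `MAX_CHUNK_SIZE`, `HEADER_OVERHEAD` and `max_duration_seconds` are made up for this example.

```python
MAX_CHUNK_SIZE = 2**32 - 1   # largest value an unsigned 4-byte integer can hold
HEADER_OVERHEAD = 36         # 4 + (8 + 16) + 8, from the Chunk Size formula

def max_duration_seconds(sample_rate=44100, num_channels=2, bits_per_sample=16):
    """Longest audio, in seconds, that fits under the Chunk Size limit."""
    byte_rate = sample_rate * num_channels * bits_per_sample // 8
    return (MAX_CHUNK_SIZE - HEADER_OVERHEAD) / byte_rate

hours = max_duration_seconds() / 3600
print(f"{hours:.2f} hours")
```

For stereo, 16-bit audio at 44.1 kHz this comes out at a little under seven hours, which makes a nice discussion point about why long recordings need other formats.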


## Creating a WAVE file

So far there has been a lot of explanation on the structure of the WAVE format file, but it is all theory and abstract until you actually have to use that information to create a file. This section contains step-by-step instructions on how to create a WAVE file alongside some notes on potential problems you or your students may stumble across.

Like any recipe, let's first assemble our ingredients before we start cooking. From the table above we can simply take the field names and turn those into more Pythonic variable names:

| Field Name       | Python variable    |
| ---------------- | ------------------ |
| Chunk ID         | `chunk_id`         |
| Chunk Size       | `chunk_size`       |
| Format           | `format`           |
| Sub-Chunk 1 ID   | `sub_chunk_1_id`   |
| Sub-Chunk 1 Size | `sub_chunk_1_size` |
| Audio Format     | `audio_format`     |
| Num Channels     | `num_channels`     |
| Sample Rate      | `sample_rate`      |
| Byte Rate        | `byte_rate`        |
| Block Align      | `block_align`      |
| Bits Per Sample  | `bits_per_sample`  |
| Sub-Chunk 2 ID   | `sub_chunk_2_id`   |
| Sub-Chunk 2 Size | `sub_chunk_2_size` |

We know the value that most of these have to be, so we can also fill in those blanks:

| Field Name       | Python variable    | Value                                                 |
| ---------------- | ------------------ | :---------------------------------------------------- |
| Chunk ID         | `chunk_id`         | `b"RIFF"`                                             |
| Chunk Size       | `chunk_size`       | `4 + (8 + sub_chunk_1_size) + (8 + sub_chunk_2_size)` |
| Format           | `format`           | `b"WAVE"`                                             |
| Sub-Chunk 1 ID   | `sub_chunk_1_id`   | `b"fmt "`                                             |
| Sub-Chunk 1 Size | `sub_chunk_1_size` | `16`                                                  |
| Audio Format     | `audio_format`     | `1`                                                   |
| Num Channels     | `num_channels`     | `1`                                                   |
| Sample Rate      | `sample_rate`      | `44100`                                               |
| Byte Rate        | `byte_rate`        | `sample_rate * num_channels * bits_per_sample/8`      |
| Block Align      | `block_align`      | `num_channels * bits_per_sample/8`                    |
| Bits Per Sample  | `bits_per_sample`  | `16`                                                  |
| Sub-Chunk 2 ID   | `sub_chunk_2_id`   | `b"data"`                                             |
| Sub-Chunk 2 Size | `sub_chunk_2_size` | `num_samples * num_channels * bits_per_sample/8`      |

which gives us:

```python
chunk_id         = b"RIFF"                                             
chunk_size       = 4 + (8 + sub_chunk_1_size) + (8 + sub_chunk_2_size)
format           = b"WAVE"                                             
sub_chunk_1_id   = b"fmt "                                             
sub_chunk_1_size = 16                                                 
audio_format     = 1                                                  
num_channels     = 1                                                  
sample_rate      = 44100                                              
byte_rate        = sample_rate * num_channels * bits_per_sample/8     
block_align      = num_channels * bits_per_sample/8                   
bits_per_sample  = 16                                                 
sub_chunk_2_id   = b"data"                                             
sub_chunk_2_size = num_samples * num_channels * bits_per_sample/8     
```

For the character strings, we have to use strings of bytes which [requires the prefix `b`](https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals) e.g. `b"RIFF"`

We have a problem: we need the audio data before we know how long it is going to be. We will also have to change the order above, as we are currently using variables before we even create them.

Let's begin by making some audio. To do so, we import some functions and variables from the math library.

In [None]:
from math import sin, pi

For the audio we create a 440 Hz sine tone, 1 second long, at a sampling rate of 44.1 kHz.

In [None]:
fs       = 44100.0   # Sampling Rate 
f0       = 440.0     # Fundamental frequency
duration = 1.0       # in seconds

delta    = 2.0 * pi * f0 / fs # how much does the phase change between samples
sine_wave = [sin(delta * i) for i in range(int(duration*fs))]


The number of samples is simply the length of the `sine_wave` list.

In [None]:
num_samples = len(sine_wave)

Now we can start populating the fields of the WAVE header

In [None]:
chunk_id         = b'RIFF'                                             
format           = b'WAVE'                                              
sub_chunk_1_id   = b'fmt '                                             
sub_chunk_1_size = 16                                                 
audio_format     = 1                                                  
num_channels     = 1                                                  
sample_rate      = 44100                                              
bits_per_sample  = 16                                                 
sub_chunk_2_id   = b'data'                                             
byte_rate        = sample_rate * num_channels * bits_per_sample // 8     
block_align      = num_channels * bits_per_sample // 8                   
sub_chunk_2_size = num_samples * num_channels * bits_per_sample // 8     
chunk_size       = 4 + (8 + sub_chunk_1_size) + (8 + sub_chunk_2_size)

Note: the `//` floor division operator is used here to ensure the result is an integer.
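
The difference matters because the `struct` module, which this article uses later to write the header, refuses to pack a float into an integer field. The sketch below illustrates this with made-up variable names.

```python
import struct

bits_per_sample = 16
num_samples = 44100

true_division = num_samples * bits_per_sample / 8    # 88200.0, a float
floor_division = num_samples * bits_per_sample // 8  # 88200, an int

try:
    struct.pack('<I', true_division)   # integer format, float value
    packed_float = True
except struct.error:
    packed_float = False               # struct rejects the float

packed_int = struct.pack('<I', floor_division)
print(packed_float, len(packed_int))
```

This is a good error for students to meet deliberately, as the message from `struct` is not especially friendly the first time around.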

Though we have assigned the fields out of order, it is very important that we write them in order of the tables above.

Before we start writing we need a name for our file.

In [None]:
filename = "a440Hz.wav"

To write a file in Python 3 we can use the built-in `open` function, usually as part of the `with open(filename, 'wb')` construct. Here the file is opened directly so that it stays open across the following cells.
The [`wb` option in `open`](https://docs.python.org/3/library/functions.html#open) signifies we want to open a file and write to it in binary mode, which we do as we are not writing text, but rather byte data.





In [None]:
wav_file =  open(filename, 'wb')

To write the data in the correct byte order (remember the Endian column from the first table) we can use the [`struct` module](https://docs.python.org/3/library/struct.html). The `struct` module's `pack` and `unpack` functions allow us to control [what order the data is written](https://docs.python.org/3/library/struct.html#byte-order-size-and-alignment) and [its type](https://docs.python.org/3/library/struct.html#format-characters).

In [None]:
import struct

Taking the values from the tables in the `struct` documentation that are most relevant to us:

| Format | Type                      |
| :----: | ------------------------: |
| `H`    | unsigned 16-bit integer   |
| `h`    | signed 16-bit integer     |
| `I`    | unsigned 32-bit integer   |
| `s`    | byte string of characters |


| Character | Endian   |
| --------- | -------- |
| `<`       | `little` |
| `>`       | `big`    |


For example, `chunk_id` is a string with 4 characters and the `struct` format would be `4s`. `sub_chunk_1_size` is a little-endian 32-bit unsigned integer, which would require `<I`
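
Those two examples can be tried directly. This short sketch shows the bytes that `struct.pack` produces for each, with a big-endian version of the integer included purely for comparison.

```python
import struct

tag = struct.pack('4s', b'RIFF')      # four characters, written in order
size_little = struct.pack('<I', 16)   # least significant byte first
size_big = struct.pack('>I', 16)      # for comparison only; not used in WAVE

print(tag, size_little, size_big)
```

Byte strings have no byte order to swap, which is why the `4s` format does not need a `<` or `>` prefix here.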


| Endian | C Type     | `struct` format | python variable    |
| -----: | :--------- | :-------------: | ------------------ |
|    big | `char[4]`  |     `'4s'`      | `chunk_id`         |
| little | `uint32_t` |     `'<I'`      | `chunk_size`       |
|    big | `char[4]`  |     `'4s'`      | `format`           |
|    big | `char[4]`  |     `'4s'`      | `sub_chunk_1_id`   |
| little | `uint32_t` |     `'<I'`      | `sub_chunk_1_size` |
| little | `uint16_t` |     `'<H'`      | `audio_format`     |
| little | `uint16_t` |     `'<H'`      | `num_channels`     |
| little | `uint32_t` |     `'<I'`      | `sample_rate`      |
| little | `uint32_t` |     `'<I'`      | `byte_rate`        |
| little | `uint16_t` |     `'<H'`      | `block_align`      |
| little | `uint16_t` |     `'<H'`      | `bits_per_sample`  |
|    big | `char[4]`  |     `'4s'`      | `sub_chunk_2_id`   |
| little | `uint32_t` |     `'<I'`      | `sub_chunk_2_size` |

Putting that all together gives us

In [None]:
wav_file.write(struct.pack('4s', chunk_id))
wav_file.write(struct.pack('<I', chunk_size))
wav_file.write(struct.pack('4s', format))
wav_file.write(struct.pack('4s', sub_chunk_1_id))
wav_file.write(struct.pack('<I', sub_chunk_1_size))
wav_file.write(struct.pack('<H', audio_format))
wav_file.write(struct.pack('<H', num_channels))
wav_file.write(struct.pack('<I', sample_rate))
wav_file.write(struct.pack('<I', byte_rate))
wav_file.write(struct.pack('<H', block_align))
wav_file.write(struct.pack('<H', bits_per_sample))
wav_file.write(struct.pack('4s', sub_chunk_2_id))
wav_file.write(struct.pack('<I', sub_chunk_2_size))

If we close our file now, it _should_ be 44 bytes in size.

In [None]:
wav_file.close()
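
One way to confirm the size, sketched here with a throwaway file so nothing in the working folder is touched, is `os.path.getsize`. The header values are the ones for an empty mono, 16-bit, 44.1 kHz file, and the file name `header-only.wav` is an arbitrary choice for this example.

```python
import os
import struct
import tempfile

# The whole header packed in one call, fields in table order
header = struct.pack('<4sI4s4sIHHIIHH4sI',
                     b'RIFF', 36, b'WAVE',
                     b'fmt ', 16, 1, 1, 44100, 88200, 2, 16,
                     b'data', 0)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'header-only.wav')
    with open(path, 'wb') as f:
        f.write(header)
    size_on_disk = os.path.getsize(path)

print(size_on_disk)
# The ASCII tags sit at the offsets given in the first table
print(header[0:4], header[8:12], header[12:16], header[36:40])
```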

### PCM Audio Bit Depth

To write the audio data, we first have to transform it from floating point to a 16-bit value.

Signed integer PCM data has a quirk where the magnitude of the most negative value is one larger than the most positive value.

For unsigned 16-bit values the number of values that can be represented is $2^{16}$. With signed values, we lose one bit to tell us if the sign is positive or negative (the sign bit). That means we have $2^{15}$ values on each side. One of _those_ values is zero, which sits on the non-negative side. This gives us $2^{15}$ negative values ($-2^{15}$ to $-1$) and $2^{15} - 1$ positive values ($1$ to $2^{15} - 1$).

We could simply clamp the amplitude to $2^{15} - 1$ to avoid any trouble. However, this is a really nice example to talk about [two's complement](https://www.cs.cornell.edu/~tomf/notes/cps104/twoscomp.html) binary representation. The choice is up to you. To give the greatest freedom, we will deal with the problem in this example.
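
The asymmetry is easy to see by packing the extreme values. This is a small illustrative sketch using the `struct` module; the variable names are arbitrary.

```python
import struct

most_negative = struct.pack('<h', -2**15)      # bytes 00 80
most_positive = struct.pack('<h', 2**15 - 1)   # bytes ff 7f

try:
    struct.pack('<h', 2**15)   # one step beyond the positive limit
    overflowed = False
except struct.error:
    overflowed = True          # the value does not fit in 16 signed bits

print(most_negative.hex(), most_positive.hex(), overflowed)
```

A full-scale sample of `+1.0` scaled by $2^{15}$ lands exactly on the value that does not fit, which is why some form of clamping is needed.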

The magnitude of the most negative 16-bit signed value is $2^{15}$. Using the power operator `**` that looks like

In [None]:
max_value = 2**15

We can generate pcm data in the correct range using a combination of the [`max`](https://docs.python.org/3/library/functions.html#max) and [`min`](https://docs.python.org/3/library/functions.html#min) functions

In [None]:
pcm = [max(-max_value, min(max_value-1, int(sample * max_value))) for sample in sine_wave]           

Breaking that down:

**A**. `int(sample * max_value)`: Scale samples to $2^{15}$

**B**. `min(max_value-1,A)`: which value is smaller, **A** (`int(sample * max_value)`) or $2^{15}-1$

**C**. `max(-max_value, B)`: which value is bigger, **B** (the output from `min`) or $-2^{15}$

This should stop any accidental overflows.
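
The three steps can be wrapped in a small helper to see what happens at the edges. The function name `float_to_pcm16` is hypothetical and used here only for illustration.

```python
def float_to_pcm16(sample, max_value=2**15):
    """Scale a float sample and clamp it to the signed 16-bit range."""
    scaled = int(sample * max_value)                    # step A
    return max(-max_value, min(max_value - 1, scaled))  # steps B and C

for sample in (0.0, 0.5, 1.0, -1.0, 2.0, -3.0):
    print(sample, float_to_pcm16(sample))
```

Note that `1.0` and anything louder all end up at the same value, so the clamp prevents overflow but does not prevent clipping.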

We can then write out this data to a file.

We previously closed the file so we will have to open it again. This time we are appending data so we use the `'ab'` option

In [None]:
wav_file = open(filename, 'ab')

The data is 16-bit signed, little endian. Using the tables above that means our format string is `<h`

In [None]:
for sample in pcm:
    wav_file.write(struct.pack('<h', sample))
    
wav_file.close()
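
Writing one sample per call is easy to follow, but `struct.pack` can also take a repeat count and pack every sample in one go. The sketch below, using a tiny made-up list of values, shows that both approaches produce the same bytes.

```python
import struct

pcm = [0, 16384, 32767, -32768]   # a few example 16-bit values

# One call per sample, as in the loop above
one_at_a_time = b''.join(struct.pack('<h', sample) for sample in pcm)

# A repeat count in the format string packs the whole list at once
all_at_once = struct.pack(f'<{len(pcm)}h', *pcm)

print(one_at_a_time == all_at_once, len(all_at_once))
```

The loop is probably the better teaching version; the single call is an option once students are comfortable with format strings.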

Let's compile that all together in a function called `write_wav_file` so that we can reuse it and modify it later.

In [None]:
def write_wav_file(audio,
                   filename,
                   num_channels=1,
                   bits_per_sample=16,
                   sample_rate=44100):

    # Header fields, calculated from the audio passed in
    num_samples      = len(audio)
    chunk_id         = b'RIFF'
    format           = b'WAVE'
    sub_chunk_1_id   = b'fmt '
    sub_chunk_1_size = 16
    audio_format     = 1
    sub_chunk_2_id   = b'data'
    byte_rate        = sample_rate * num_channels * bits_per_sample // 8
    block_align      = num_channels * bits_per_sample // 8
    sub_chunk_2_size = num_samples * num_channels * bits_per_sample // 8
    chunk_size       = 4 + (8 + sub_chunk_1_size) + (8 + sub_chunk_2_size)

    with open(filename, 'wb') as wav_file:
        # Header, written in the order of the tables above
        wav_file.write(struct.pack('4s', chunk_id))
        wav_file.write(struct.pack('<I', chunk_size))
        wav_file.write(struct.pack('4s', format))
        wav_file.write(struct.pack('4s', sub_chunk_1_id))
        wav_file.write(struct.pack('<I', sub_chunk_1_size))
        wav_file.write(struct.pack('<H', audio_format))
        wav_file.write(struct.pack('<H', num_channels))
        wav_file.write(struct.pack('<I', sample_rate))
        wav_file.write(struct.pack('<I', byte_rate))
        wav_file.write(struct.pack('<H', block_align))
        wav_file.write(struct.pack('<H', bits_per_sample))
        wav_file.write(struct.pack('4s', sub_chunk_2_id))
        wav_file.write(struct.pack('<I', sub_chunk_2_size))

        # Audio data: scale each float sample to 16-bit and clamp
        max_val = 2**15
        for sample in audio:
            pcm_sample = int(sample * max_val)
            pcm_sample = max(-max_val, min((max_val-1), pcm_sample))  # Clamp to valid range
            wav_file.write(struct.pack('<h', pcm_sample))

In [None]:
write_wav_file(sine_wave, "a440hz-from-function.wav")
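
One way to check a hand-written file is to let Python's own `wave` module parse it. The sketch below builds a tiny file in memory with `io.BytesIO` rather than on disk, so it stands alone; the four sample values are arbitrary.

```python
import io
import struct
import wave

pcm = [0, 1000, -1000, 0]                  # four 16-bit samples
data = struct.pack(f'<{len(pcm)}h', *pcm)
header = struct.pack('<4sI4s4sIHHIIHH4sI',
                     b'RIFF', 36 + len(data), b'WAVE',
                     b'fmt ', 16, 1, 1, 44100, 88200, 2, 16,
                     b'data', len(data))

# wave.open accepts a file-like object as well as a file name
with wave.open(io.BytesIO(header + data), 'rb') as check:
    params = check.getparams()

print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
```

If the header were malformed, `wave.open` would raise a `wave.Error`, which makes this a handy automated test for student submissions.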

## Reading a WAVE file

A good test after creating a function to write a WAVE file is to then read the data and write it back out to another file. This should ensure that both the read and write functions are working correctly.

We have just created a file so let's try and read it

In [None]:
wav_file = open(filename, 'rb')

For reading we need to perform the reverse operation for writing. For the `struct` module this means we use the `unpack` function.

`struct.unpack` always returns a tuple, even if it only has one element in it.
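
The reason becomes clearer when a format string describes more than one value. A short sketch with hand-written bytes:

```python
import struct

one_value = struct.unpack('<H', b'\x01\x00')             # one field
two_values = struct.unpack('<HH', b'\x01\x00\x02\x00')   # two fields

print(one_value, two_values)
```

Because a format can hold any number of fields, `unpack` returns a tuple every time, so the single-value case needs one extra step to get at the number itself.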


We could add an index after the function call like

```python
chunk_id        = struct.unpack('4s', wav_file.read(4))[0]
chunk_size      = struct.unpack('<I', wav_file.read(4))[0]
format          = struct.unpack('4s', wav_file.read(4))[0]
subchunk1_id    = struct.unpack('4s', wav_file.read(4))[0]
subchunk1_size  = struct.unpack('<I', wav_file.read(4))[0]
audio_format    = struct.unpack('<H', wav_file.read(2))[0]
num_channels    = struct.unpack('<H', wav_file.read(2))[0]
sample_rate     = struct.unpack('<I', wav_file.read(4))[0]
byte_rate       = struct.unpack('<I', wav_file.read(4))[0]
block_align     = struct.unpack('<H', wav_file.read(2))[0]
bits_per_sample = struct.unpack('<H', wav_file.read(2))[0]
subchunk2_id    = struct.unpack('4s', wav_file.read(4))[0]
subchunk2_size  = struct.unpack('<I', wav_file.read(4))[0]
```

With Python, a cleaner approach is perhaps to add a `,` after the variable name, which unpacks the single-element tuple directly.

In [None]:
chunk_id,        = struct.unpack('4s', wav_file.read(4))
chunk_size,      = struct.unpack('<I', wav_file.read(4))
format,          = struct.unpack('4s', wav_file.read(4))
subchunk1_id,    = struct.unpack('4s', wav_file.read(4))
subchunk1_size,  = struct.unpack('<I', wav_file.read(4))
audio_format,    = struct.unpack('<H', wav_file.read(2))
num_channels,    = struct.unpack('<H', wav_file.read(2))
sample_rate,     = struct.unpack('<I', wav_file.read(4))
byte_rate,       = struct.unpack('<I', wav_file.read(4))
block_align,     = struct.unpack('<H', wav_file.read(2))
bits_per_sample, = struct.unpack('<H', wav_file.read(2))
subchunk2_id,    = struct.unpack('4s', wav_file.read(4))
subchunk2_size,  = struct.unpack('<I', wav_file.read(4))

Remembering the maximum absolute value is `2**15`

In [None]:
max_value = 2**15

`subchunk2_size` tells us the size of the audio in bytes. The number of samples is `subchunk2_size / 2` as there are 2 bytes per sample value.

`wav_file.read()` will read the remainder of the file.

In [None]:
pcm = struct.unpack(f'<{subchunk2_size//2}h', wav_file.read())

These values are then scaled to floating point ±1 by dividing by our maximum value

In [None]:
audio = [sample / max_value for sample in pcm]
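
The samples read back will not be bit-for-bit identical to the original floats, because converting to 16-bit throws away precision. The sketch below, which regenerates a short sine tone so it can run on its own, estimates how large that error can be.

```python
from math import sin, pi

max_value = 2**15
original = [sin(2.0 * pi * 440.0 * i / 44100.0) for i in range(1000)]

# Float -> 16-bit integer (with clamping) -> float again
pcm = [max(-max_value, min(max_value - 1, int(s * max_value))) for s in original]
restored = [p / max_value for p in pcm]

worst_error = max(abs(a - b) for a, b in zip(original, restored))
print(worst_error <= 1 / max_value)
```

The worst-case error is no more than one quantisation step, $1 / 2^{15}$, which is a natural lead-in to a discussion of bit depth and noise floor.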

Finally we can write the audio back out again and check that it works.

In [None]:
write_wav_file(audio, "a440hz-from-read-file.wav")

Added together our `read_wav_file` function might look like

In [None]:
def read_wav_file(filename):
    with open(filename, 'rb') as wav_file:
        (chunk_id,)        = struct.unpack('4s', wav_file.read(4))
        (chunk_size,)      = struct.unpack('<I', wav_file.read(4))
        (format,)          = struct.unpack('4s', wav_file.read(4))
        (subchunk1_id,)    = struct.unpack('4s', wav_file.read(4))
        (subchunk1_size,)  = struct.unpack('<I', wav_file.read(4))
        (audio_format,)    = struct.unpack('<H', wav_file.read(2))
        (num_channels,)    = struct.unpack('<H', wav_file.read(2))
        (sample_rate,)     = struct.unpack('<I', wav_file.read(4))
        (byte_rate,)       = struct.unpack('<I', wav_file.read(4))
        (block_align,)     = struct.unpack('<H', wav_file.read(2))
        (bits_per_sample,) = struct.unpack('<H', wav_file.read(2))
        (subchunk2_id,)    = struct.unpack('4s', wav_file.read(4))
        (subchunk2_size,)  = struct.unpack('<I', wav_file.read(4))

        # Number of sample values in the data chunk (assumes 16-bit audio)
        num_samples = subchunk2_size // (bits_per_sample // 8)

        max_value = 2**15
        pcm = struct.unpack(f'<{num_samples}h', wav_file.read(subchunk2_size))
        audio = [sample / max_value for sample in pcm]

    return audio

## Ideas for expansion


What we have covered is how to successfully write a monophonic, 44.1 kHz, 16-bit WAVE file. There are a lot of potential ways of building on these two very simple functions that would provide an interesting challenge for students.

For some ideas on how to expand this we might consider:

- Get audio Metadata: Change the read function to not just return audio, but the audio metadata. Maybe using a dictionary or maybe these functions should be methods of a class?
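
As a starting point for the metadata idea, here is one possible sketch that returns the header as a dictionary. It assumes the plain 44-byte header described in this article (no extra chunks), and the names `FIELD_NAMES` and `read_wav_header` are hypothetical.

```python
import io
import struct

FIELD_NAMES = ('chunk_id', 'chunk_size', 'format',
               'sub_chunk_1_id', 'sub_chunk_1_size', 'audio_format',
               'num_channels', 'sample_rate', 'byte_rate', 'block_align',
               'bits_per_sample', 'sub_chunk_2_id', 'sub_chunk_2_size')

def read_wav_header(wav_file):
    """Read the 44-byte header from an open binary file into a dictionary."""
    values = struct.unpack('<4sI4s4sIHHIIHH4sI', wav_file.read(44))
    return dict(zip(FIELD_NAMES, values))

# Usage, with an in-memory header standing in for a real file
example = struct.pack('<4sI4s4sIHHIIHH4sI',
                      b'RIFF', 36, b'WAVE', b'fmt ', 16,
                      1, 1, 44100, 88200, 2, 16, b'data', 0)
metadata = read_wav_header(io.BytesIO(example))
print(metadata['sample_rate'], metadata['num_channels'])
```

A dictionary keeps things simple; a class would suit better if students go on to add methods such as a duration calculation.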


## Using the `wave` Module

The [`wave` module](https://docs.python.org/3/library/wave.html) is one of Python's oldest standard libraries, meaning it can be expected to be available in any standard Python install.

The wave library provides some handy functions to deal with parsing the header of a wav file so you don't have to do the work.

Begin first by importing `wave`: 

In [None]:
import wave

As always, we import some functions and variables from the math library

In [None]:
from math import sin, pi

Here we can create our sine wave to be written to a file

In [None]:
fs = 44100.0   # Sampling Rate 
f0 = 440.0     # Fundamental frequency
duration = 1.0 # in seconds

delta = 2.0 * pi * f0 / fs # how much does the phase change between samples

sine_wave = [sin(delta * i) for i in range(int(duration*fs))]

The `wave` module expects audio data to be in a byte format. We need to import the `struct` module which provides the ability to transform from one data type to another.

We currently have a list of sample values and before we can write them to a file we need to turn them into a byte-string.

This is typical data wrangling and will be an expected stage whenever writing data to a file.

It can be a little daunting for this to be one of the first things introduced to students. Therefore, abstracting the process away in a library is one of the benefits. At first they can use the library you provide, but after a while you can invite them to open up the file and explore a little further.


In [None]:
import struct

For explanation on the bit depth and scaling the audio see the [PCM-Audio-Bit-Depth section above](#PCM-Audio-Bit-Depth)

This time we will take the simpler approach and scale to $2^{15} - 1$

In [None]:
bit_depth = 16
max_amplitude = (2 ** (bit_depth - 1)) - 1
byte_data = b''.join([struct.pack('<h', int(sample * max_amplitude)) for sample in sine_wave])

We open the file with [`wave.open`](https://docs.python.org/3/library/wave.html#wave.open)

In [None]:
filename = 'A440Hz-wave-module.wav'
wave_file = wave.open(filename, 'wb')

Before we can write the audio data to a file there are a few pieces of metadata to configure, namely

- number of channels
- byte depth
- sample rate

In [None]:
wave_file.setnchannels(1)  # mono
wave_file.setsampwidth(bit_depth // 8)  # 16-bit depth i.e. 2 bytes
wave_file.setframerate(int(fs))

After setting those we can write the byte data

In [None]:
wave_file.writeframesraw(byte_data)

In [None]:
wave_file.close()
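
The same steps can be collected into a `with` block so the file is closed automatically. The sketch below writes to an in-memory `io.BytesIO` buffer instead of a file on disk, purely so that it can run anywhere; it also uses `writeframes`, which the `wave` documentation describes as making sure the frame count in the header is correct.

```python
import io
import struct
import wave
from math import sin, pi

# 441 samples (10 ms) of a 440 Hz sine tone
samples = [sin(2.0 * pi * 440.0 * i / 44100.0) for i in range(441)]
max_amplitude = 2**15 - 1
byte_data = b''.join(struct.pack('<h', int(s * max_amplitude)) for s in samples)

buffer = io.BytesIO()                  # stands in for a file on disk
with wave.open(buffer, 'wb') as wave_file:
    wave_file.setnchannels(1)          # mono
    wave_file.setsampwidth(2)          # 2 bytes, i.e. 16-bit
    wave_file.setframerate(44100)
    wave_file.writeframes(byte_data)

print(len(buffer.getvalue()))          # header plus 2 bytes per sample
```

Replacing `buffer` with a file name such as the `filename` used above writes the same bytes to disk.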

In [None]:
import IPython.display as ipd

ipd.Audio(data=sine_wave, rate=fs)

### Reading a wav file

Reading is a lot simpler than writing as a lot of decisions have been made for you.

In general, to read a wav file you should expect to deal with three elements

1. opening a file object in read mode
2. reading byte data
3. transforming byte data into floating point format

For python it will be easiest to keep the audio sample format to a list of float type numbers.
This is closest to what you will find in other programming languages.

This assumes you are enforcing wav files with

- 1 channel (mono)
- 16-bit depth

In [None]:
import wave
import struct

wave_file = wave.open(filename, 'rb')
p = wave_file.getparams()
frames = wave_file.readframes(p.nframes)
audio_samples = [sample[0] / max_amplitude for sample in struct.iter_unpack('<h',frames)]

The `struct` library's `unpack` functions always return a tuple, even when there is only 1 element, which is the sort of design detail that can stall students as they try to work out what went wrong. This is another good reason to remove this kind of operation from view.
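
If you do want to enforce the mono, 16-bit assumption rather than just hope for it, the parameters returned by `getparams` can be checked before unpacking. This is a sketch of one way to do so; the function name `read_mono_16bit` is hypothetical and the usage example builds its own tiny file in memory.

```python
import io
import struct
import wave

def read_mono_16bit(source):
    """Return float samples from a mono, 16-bit wav file (name or file object)."""
    with wave.open(source, 'rb') as wave_file:
        params = wave_file.getparams()
        if params.nchannels != 1 or params.sampwidth != 2:
            raise ValueError(
                f"expected mono 16-bit audio, got {params.nchannels} channel(s) "
                f"at {8 * params.sampwidth}-bit")
        frames = wave_file.readframes(params.nframes)
    max_amplitude = 2**15 - 1
    return [sample / max_amplitude for (sample,) in struct.iter_unpack('<h', frames)]

# Usage: write three samples to an in-memory file, then read them back
buffer = io.BytesIO()
with wave.open(buffer, 'wb') as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(44100)
    writer.writeframes(struct.pack('<3h', 0, 32767, -32767))
buffer.seek(0)

audio = read_mono_16bit(buffer)
print(audio)
```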

## An example library

Below is an example of a possible simple library you could provide.

Modify it based on what the focus of the lesson is. If it is teaching DSP, now might not be the time to punish students for getting the file extension wrong.

If the focus _is_ to teach holistic programming skills, like how to read errors, think perhaps of changing the contents of `if not filename.endswith('.wav'):` to throw a helpful error instead.
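
A helpful error might look something like the following sketch. The function name `require_wav_extension` and the wording of the message are only suggestions.

```python
def require_wav_extension(filename):
    """Raise a descriptive error instead of silently fixing the file name."""
    if not filename.endswith('.wav'):
        raise ValueError(
            f"'{filename}' does not end in '.wav'. Did you mean '{filename}.wav'?")
    return filename

try:
    require_wav_extension('tone')
    message = None
except ValueError as error:
    message = str(error)

print(message)
```

The message names the problem and offers a fix, which gives students something concrete to act on when they read the traceback.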

The library is incredibly limited, but to an extent that is the point.

It should only be a handful of lines, in this case under 30 lines.

It treads some middle ground by enforcing parameters like sample rate and bit depth.

Don't be afraid to admit that you have pitched the library incorrectly.

If students are tripping up at the same point, then you can be agile and alter the library accordingly.


To import, all that students should have to type is

```py
from wav_library import *
```

after which they will have access to the `write_wav_file` and `read_wav_file` functions


In [None]:
# wav_library
#
# To import, all students should have to type is 
#
# ```py
# from wav_library import *
# ```
#
# After which they will have access to the `write_wav_file` and `read_wav_file` functions
#
import struct
import wave
from math import sin, pi

def write_wav_file(float_data, filename, nchannels=1, bit_depth=16, sample_rate=44100):
    
    normalisation = 1 / max([abs(x) for x in float_data])
    
    float_data = [sample * normalisation for sample in float_data]
    
    if not filename.endswith('.wav'):
        filename += '.wav'
        
    with wave.open(filename, 'wb') as wave_file:
        wave_file.setnchannels(nchannels)
        wave_file.setsampwidth(bit_depth // 8)
        wave_file.setframerate(sample_rate)
                
        max_amplitude = (2 ** (bit_depth - 1) - 1)
        byte_data = b''.join([struct.pack('<h', int(sample * max_amplitude)) for sample in float_data])
        
        wave_file.writeframesraw(byte_data)

def read_wav_file(filename):

    if not filename.endswith('.wav'):
        filename += '.wav'

    with wave.open(filename, 'rb') as wave_file:
        p = wave_file.getparams()
        frames = wave_file.readframes(p.nframes)
        # Same scaling as write_wav_file; '<h' assumes 16-bit (2-byte) samples
        max_amplitude = (2 ** (8 * p.sampwidth - 1) - 1)
        audio_samples = [sample[0] / max_amplitude for sample in struct.iter_unpack('<h', frames)]
        return audio_samples

### Limitations with this approach

There are a lot of drawbacks and stumbling blocks which, rather than pretending they don't exist, are worth being aware of.

- guessing file names could cause confusion down the line
- this doesn't support stereo
- not transferable to other languages, as this makes heavy use of Python list comprehensions. Given the prevalence of C-family languages in audio programming, it may be better to follow a standard for loop structure
- messy returns: to be useful beyond this narrow case, the read function would have to return a lot of variables (sample rate, number of channels, bit depth). This could be done with an object-oriented approach, but there is a trade-off between the complexity that would be removed and the complexity put in its place.
- this assumes parameters that would likely change, especially if students wish to use their own samples.
- the audio is always normalised using `normalisation = 1 / max([abs(x) for x in float_data])`, which also raises an error for silent (all-zero) input
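
To give a flavour of what stereo support would involve, the sketch below interleaves two channels into frames using a plain for loop, in keeping with the point about other languages. The function name `interleave` and the sample values are made up for this example.

```python
import struct

def interleave(left, right):
    """Merge two equal-length channels into frames: L0, R0, L1, R1, ..."""
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of samples")
    frames = []
    for left_sample, right_sample in zip(left, right):
        frames.append(left_sample)
        frames.append(right_sample)
    return frames

left = [1, 2, 3]
right = [-1, -2, -3]
stereo = interleave(left, right)

# Three frames of two 16-bit samples each
byte_data = struct.pack(f'<{len(stereo)}h', *stereo)
print(stereo, len(byte_data))
```

With `num_channels` set to `2` in the header, data laid out this way would be read as three stereo frames rather than six mono samples.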

In [None]:
import IPython.display as ipd
ipd.Audio(data=read_wav_file(filename), rate=fs)


## Appendices


### Audio Format Field Values

| Hex Value | Format     |
| --------: | ---------- |
|    0x0001 | PCM        |
|    0x0003 | IEEE Float |
|    0x0006 | A-law      |
|    0x0007 | µ-law      |
