# Data sonification with Python

## Instructor

- Walt Gurley

## Workshop description

Data visualization is the process of using graphical elements to represent data. This workshop introduces the concept of data sonification, using characteristics of sound to represent information. Sonification can provide an alternate mode for communicating data with implications for accessibility, engagement, and discovery. Participants in this workshop will get an overview of sonification techniques and tools and learn basic processes for mapping data to sound using the Python programming language.

## Learning objectives

- A basic understanding of the properties of sound

- A basic understanding of using properties of sound to represent data and different sonification methods

- Experience using `pandas` data types and methods to manipulate data

- An awareness of the `music21` library and its utility for producing audio from structured data

- Experience applying different methods to map numerical data to audio frequency/pitch using Python

## Questions during the workshop

Please feel free to ask questions throughout the workshop.

We have a second instructor who will be available during the workshop. They will answer questions as they are able and will collect questions whose answers might help everyone, so those can be answered at the end of the workshop.

## Using Jupyter Notebooks and Google Colaboratory

Jupyter notebooks are a way to write and run Python code in an interactive way. They're quickly becoming a standard way of putting together data, code, and written explanations or visualizations into a single document and sharing that. There are a lot of ways that you can run Jupyter notebooks, including just locally on your computer, but we've decided to use Google's Colaboratory notebook platform for this workshop. Colaboratory is “a Google research project created to help disseminate machine learning education and research.” If you would like to know more about Colaboratory in general, you can visit the [Welcome Notebook](https://colab.research.google.com/notebooks/welcome.ipynb).

## Notebook setup

The next cell loads the necessary Python libraries and dependencies. If this notebook is running in Google Colaboratory, the cell also installs the extra dependencies needed to create and play audio files.

Libraries imported into this notebook:
- [music21](https://web.mit.edu/music21/) - toolkit for computer-aided musicology
- [pandas](https://pandas.pydata.org/) - a data analysis and manipulation tool
- [NumPy](https://numpy.org/) - a library supporting numerical analysis
- [matplotlib](https://matplotlib.org/) - a plotting library
- [midi2audio](https://pypi.org/project/midi2audio/) - a tool for synthesizing or playing MIDI audio
- [IPython Audio controls](https://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html?highlight=audio#IPython.display.Audio) - a tool for playing audio and generating audio controls in a Jupyter notebook
- [os](https://docs.python.org/3/library/os.html) - a Python module providing functions for manipulating files and directories

Additional dependencies:
- [Fluidsynth](http://www.fluidsynth.org/) - a synthesizer for processing MIDI files. This can be installed in Google Colaboratory and can also be installed on your local machine
- [Fluid (R3) Mono General MIDI SoundFont (GM) (FluidR3Mono_GM.sf3)](https://github.com/musescore/MuseScore/tree/master/share/sound) - a soundfont file containing the synthesized instruments used to play back MIDI audio. This is an open source soundfont file obtained from the [MuseScore GitHub repository](https://github.com/musescore/MuseScore)

In [None]:
# Test if this notebook is running in Colab
try:
  import google.colab
  IN_COLAB = True
except ImportError:
  IN_COLAB = False
print("I am in Colab: " + str(IN_COLAB))

# If running in Colab install additional dependencies to create audio files from
# MIDI files
if IN_COLAB:
  # Install synthesizer to process MIDI files
  !apt install fluidsynth
  # Copy the soundfonts file to our session storage space (No longer using installed soundfont. Using soundfont stored in GitHub repository)
  # !cp /usr/share/sounds/sf2/FluidR3_GM.sf2 ./FluidR3_GM.sf2
  # Install midi2audio module to convert MIDI files to audio files
  !pip install midi2audio

# Load midi2audio module to convert MIDI files to audio files
from midi2audio import FluidSynth

# Load modules from the music21 library for sonifying data
from music21 import instrument, note, scale, stream, midi, tempo

# Load data processing and visualization libraries
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
plt.style.use('ggplot')

# Load the Audio display tool to play and control audio
from IPython.display import Audio

# Import os library to test if "FluidR3Mono_GM.sf3" file already exists in
# current directory, fetch the soundfont file from GitHub if not
import os
if not os.path.exists('FluidR3Mono_GM.sf3'):
  # Fetch the soundfont file from GitHub
  !curl https://raw.githubusercontent.com/ncsu-libraries-data-vis/data-sonification-with-python/main/FluidR3Mono_GM.sf3 -o FluidR3Mono_GM.sf3

## A brief overview of audio properties and sonification

Sound travels through air like a wave as particles are compressed together and then stretched apart. By measuring how these particles change we can represent sound as a series of waves called a waveform.

An audio waveform has two basic properties: **amplitude** and **wavelength**. Amplitude is the magnitude of a particle's displacement from its resting position and can be thought of as loudness. Wavelength is the distance over which the waveform repeats, and it determines frequency, the number of times the waveform repeats per second: for a given speed of sound, a shorter wavelength means a higher frequency. Frequency is directly related to pitch: lower frequencies have a lower pitch and higher frequencies a higher pitch. Humans can generally hear frequencies from about 20 Hz to 20,000 Hz (1 Hz = one cycle per second).
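Wavelength (λ), frequency (f), and the speed of sound (v) are linked by λ = v / f, and one cycle lasts T = 1 / f seconds. The short sketch below applies both formulas to the ends of the hearing range and to A4 (440 Hz). The speed of sound of 343 m/s (dry air at about 20 °C) is an assumption, not a value from this workshop, and the helper names are illustrative.

```python
# Assumed speed of sound in dry air at about 20 °C (metres per second)
SPEED_OF_SOUND_M_PER_S = 343.0


def wavelength_m(frequency_hz: float, speed: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Wavelength in metres of a sound wave: lambda = v / f."""
    return speed / frequency_hz


def period_s(frequency_hz: float) -> float:
    """Duration in seconds of one cycle: T = 1 / f."""
    return 1.0 / frequency_hz


# Lower and upper limits of typical human hearing, plus A4 (440 Hz)
for f in (20, 440, 20_000):
    print(f"{f:>6} Hz: wavelength = {wavelength_m(f):.4f} m, "
          f"period = {period_s(f) * 1000:.3f} ms")
```

Higher frequencies therefore mean both shorter wavelengths and shorter cycles.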

Here is a wonderful [interactive guide to audio waveforms](https://pudding.cool/2018/02/waveforms/) by Josh Comeau.

![An image representing a sound wave traveling through the air and as a waveform.](https://github.com/WaltGurley/rem-rem-cur/blob/gh-pages/MicIn/particleSound.png?raw=true "As a sound wave travels through the air particles are compressed and stretched apart. This can be modeled as a waveform")

Beyond the basic properties of sound waves we can also consider the audio properties of timbre (the perceived quality of a sound, e.g., how a guitar sounds different from a trumpet), tempo (the speed at which a sequence of sounds is played, e.g., beats per minute), and spatial positioning (where a sound is coming from in space, e.g., panning left or right in stereo sound).

Just as we may map a data value to color, position, or size on a graph, we can use these properties of sound to represent data sonically. Sonification can complement visualization and has applications in accessibility, engagement, and discovery:

- Accessibility: [SAS Graphics Accelerator](https://support.sas.com/software/products/graphics-accelerator/samples/index.html)

- Engagement: [Sounds from Around the Milky Way](https://www.nasa.gov/mission_pages/chandra/news/data-sonification-sounds-from-around-the-milky-way.html)

- Discovery: [Sonification of Cyclone Sidr](https://youtu.be/RRluA1r3rTk)


## Load and observe/clean the dataset

We will be working with a dataset of monthly atmospheric carbon dioxide (CO<sub>2</sub>) concentrations measured at Mauna Loa Observatory, Hawaii, from the Scripps CO<sub>2</sub> program website (visit the [Scripps CO<sub>2</sub> program website](https://scrippsco2.ucsd.edu/data/atmospheric_co2/primary_mlo_co2_record.html) for more information about the dataset we are using).

We load the raw dataset directly from the Scripps website using the pandas function `read_csv()`. The raw data file requires some initial parsing and manipulation through arguments passed to `read_csv()`: identifying the row that contains the column headers (the keyword argument `header=56`), selecting only the columns of interest (the keyword argument `usecols=[0, 1, 2, 4, 5]`), and parsing dates (the keyword argument `parse_dates=[2]` together with the `date_parser` function). Comments in the code below describe each argument.
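Recent pandas releases (2.0 and later) deprecate the `date_parser` argument. As a hedged sketch, the date column can instead be read as plain numbers and converted afterwards with `pd.to_datetime(..., unit='D', origin=...)`, using the same unit and origin as the `date_parser` lambda. The sketch also shows the cleanup steps named in the cell after the load: replacing the `-99.99` missing-value flag with NaN and trimming leading and trailing rows without a measurement. The toy DataFrame and its values are made up for illustration; only the column name `CO2ppm` comes from this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table standing in for the Scripps CSV: "date" holds day
# counts and -99.99 marks a missing measurement (values are made up)
raw = pd.DataFrame(
    {
        "date": [0, 31, 59, 90, 120],
        "CO2ppm": [-99.99, 1.0, -99.99, 3.0, -99.99],
    }
)

# Convert day counts to timestamps in one vectorized call, with the same
# unit and origin as the date_parser lambda in the notebook
raw["date"] = pd.to_datetime(raw["date"], unit="D", origin=pd.Timestamp("1900-01-01"))
co2 = raw.set_index("date")

# Replace the -99.99 missing-value flag with NaN
co2 = co2.replace(-99.99, np.nan)

# Trim leading and trailing rows that have no CO2 measurement
first, last = co2["CO2ppm"].first_valid_index(), co2["CO2ppm"].last_valid_index()
co2 = co2.loc[first:last]

print(co2)
```

Gaps in the middle of the record stay as NaN after trimming; they are filled later by interpolation.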

In [None]:
# Read and format the csv file located at the provided URL into a Pandas
# DataFrame
co2_raw = pd.read_csv(
  # The URL of the csv file
  "https://scrippsco2.ucsd.edu/assets/data/atmospheric/stations/in_situ_co2/monthly/monthly_in_situ_co2_mlo.csv",
  # The row position of the column headers for the dataset
  header=56,
  # Create new column header names to rename given headers
  names=["year", "month", "date", "CO2ppm", "season_adj"],
  # Only take the specified columns from the csv file
  usecols=[0,1,2,4,5],
  # Parse dates from the data given in column 2
  parse_dates=[2],
  # How to parse the date values
  date_parser=lambda x: pd.to_datetime(int(x), unit='D', origin=pd.Timestamp('01-01-1900')),
  # Set the column index of the DataFrame
  index_col=2
)

# Print the data


In [None]:
# Replace missing values (-99.99) with NaN


# Trim dataset to remove leading and trailing missing CO2 data


# Print the data


In [None]:
# Configure the plot
plt.figure(figsize=[8, 5])
plt.xlabel("Time (months)")
plt.ylabel("CO2 ppm")
plt.title("Monthly atmospheric CO2 values measured at Mauna Loa Observatory, Hawaii")

# Plot the data


**Question:** What properties of sound could we use to sonify this dataset (e.g., frequency, amplitude, timbre, tempo, and spatial position)?

## Audification
Audification is a type of sonification in which a data series is directly translated into an audio waveform. This sonification process is applied in research ranging from medicine to seismology and astronomy.

Audification is typically suitable for large datasets that have a cyclical component.

Examples:

- [Vibrations of the Sun](https://soundcloud.com/nasa/sun-sonification)
- [Audification of brainwave data](https://youtu.be/y1Nl3De_frM)

### Audification of a sine wave
We will first demonstrate the process of audification by generating a sine wave over time and then converting that data into audio.

When creating audio we need to consider two things: the sample rate (how many amplitude values are played per second) and the shape of the sound wave.
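One possible way to fill in the next two cells is sketched below. It uses the variable names expected by the plotting cell (`sample_rate`, `time_samples`, `frequency`, `sine_wave`) plus a hypothetical `duration` variable for the length in seconds. The 44100 Hz rate and 2-second length come from the cell comments, and 220 Hz from the question that follows; this is a sketch, not the instructor's solution.

```python
import numpy as np

sample_rate = 44100  # samples per second (Hz), from the cell comment
duration = 2         # seconds (hypothetical variable name)
frequency = 220      # Hz, the tone referred to in the question below

# Evenly spaced sample times: sample_rate * duration points starting at 0
time_samples = np.arange(sample_rate * duration) / sample_rate

# Sine wave: amplitude(t) = sin(2 * pi * frequency * t)
sine_wave = np.sin(2 * np.pi * frequency * time_samples)

print(len(time_samples))
```

In the notebook, `Audio(sine_wave, rate=sample_rate)` from `IPython.display` would then play the tone.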

In [None]:
# How many times per second (Hz) are we sampling our data? (44100 Hz is a
# common sampling frequency)


# Over how many seconds are we sampling our data?


# Generate a series of time samples at a rate of 'sample_rate' Hz over 'time'
# seconds (44100 Hz * 2 seconds = 88200 samples)


# Print out time samples


In [None]:
# Generate a sine wave with a frequency of 'frequency' Hz over the
# 'time_samples' samples


In [None]:
# Configure the sine wave plot
plt.figure(figsize=[15, 5])
plt.xlabel('Time (seconds)')
plt.ylabel('Amplitude')
plt.title(f'{frequency} Hz sine wave')

# Only plot the first 0.25 seconds of the data (sample_rate // 4 samples)
time_max = sample_rate // 4
plt.plot(time_samples[:time_max], sine_wave[:time_max])

In [None]:
# Generate audio from sine wave data


**Question**: Would a 440 Hz sine wave have a higher or lower pitch than the 220 Hz sine wave?

### Audification of CO<sub>2</sub> concentration data

We need to modify our data in order to create an audification. First, our data has a sample rate of 12 samples per year, which is approximately 0.0000004 Hz. Compared with the lower limit of human hearing (about 20 Hz), this frequency is far too low for us to hear.

To bring our dataset into an audible frequency range we must speed it up considerably. We will compress the time scale of our data to a sample rate of 3000 Hz (the lowest sample rate at which we can play back audio with the IPython Audio tool). This speeds the data up by a factor of roughly 10<sup>10</sup>.
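The rates quoted here can be checked with a little arithmetic. The sketch assumes an average year of 365.25 days; the variable names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # average (Julian) year: 31,557,600 s

# Monthly data: 12 samples per year, expressed in samples per second (Hz)
data_rate_hz = 12 / SECONDS_PER_YEAR

# Playback rate used for the audification
playback_rate_hz = 3000

# How many times faster the data is played back than it was sampled
speedup = playback_rate_hz / data_rate_hz

print(f"Data sample rate: {data_rate_hz:.2e} Hz")
print(f"Speed-up factor: {speedup:.2e} (about 10^{math.log10(speedup):.1f})")
```

The speed-up comes out near 7.9 × 10<sup>9</sup>, which is on the order of 10<sup>10</sup>.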

In [None]:
# Set our sample rate at 3000 Hz


# Generate a series of time samples at a rate of 'data_sample_rate' Hz over 1 second


In [None]:
# Use linear interpolation to fill NaN values. The audio player WILL NOT WORK
# with data that contains any NaN values


In [None]:
# Configure the plot
plt.figure(figsize=[8, 5])
plt.xlabel('Time (seconds)')
plt.ylabel('Amplitude')
plt.title('CO2 concentration compressed to a sample rate of 3000 Hz')

# Plot the time compressed waveform of CO2 concentration data


In [None]:
# The sample rate of our data is approximately 0.0000004 Hz (12 samples / year),
# we speed it up about 10^10 times (3000 Hz)


**Question:** Why is our audification so short?

### Audification of normalized CO<sub>2</sub> concentration data

Even when resampling our dataset at a higher frequency, we still don't get a useful audio representation of our waveform. Measured CO<sub>2</sub> concentrations rise steadily over time, so the waveform drifts in one direction instead of oscillating around a central value. We need to modify our dataset further to establish a central value about which we can measure displacement. To do this we will normalize our dataset by removing the long-term trend from the data (i.e., subtracting the seasonally adjusted CO<sub>2</sub> concentration values from the measured CO<sub>2</sub> concentration values).
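A minimal sketch of the detrending and gap-filling steps in the next two cells, run on a synthetic monthly series rather than the Scripps data. It assumes the notebook's column names `CO2ppm` and `season_adj`; the intermediate name `co2_fit` is hypothetical, while `co2_fit_int` matches the name used by the plotting cell. `Series.interpolate(method="linear")` treats the points as evenly spaced, which suits monthly data.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic monthly series (not the Scripps data): a rising
# trend plus a yearly cycle, with one missing month
months = pd.date_range("2000-01-01", periods=24, freq="MS")
trend = np.linspace(0.0, 4.0, 24)
seasonal = 3.0 * np.sin(2 * np.pi * np.arange(24) / 12)
toy = pd.DataFrame({"CO2ppm": trend + seasonal, "season_adj": trend}, index=months)
toy.iloc[5, 0] = np.nan  # simulate a missing CO2ppm measurement

# Remove the long-term trend: measured values minus seasonally adjusted values
co2_fit = toy["CO2ppm"] - toy["season_adj"]

# Fill the gap by linear interpolation so the audio player receives no NaN values
co2_fit_int = co2_fit.interpolate(method="linear")

print(co2_fit_int.isna().any())
```

What remains is the seasonal cycle centred on zero, which is the displacement the audification needs.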

In [None]:
# Remove the long-term trend of increasing CO2 concentration


In [None]:
# Use a linear interpolation to fill NaN values–the audio player WILL NOT WORK
# with data that contains any NaN values


# Print the data


In [None]:
# Plot CO2 concentration data
plt.figure(figsize=[15, 10])
plt.subplot(2, 2, 1)
plt.xlabel('Time (months)')
plt.ylabel('CO2 ppm')
plt.title('CO2 concentration')
plt.plot(co2["CO2ppm"])

# Plot seasonally adjusted CO2 concentration data
plt.subplot(2, 2, 2)
plt.xlabel('Time (months)')
plt.ylabel('CO2 ppm')
plt.title('Seasonally adjusted CO2 concentration')
plt.plot(co2["season_adj"])

# Plot normalized CO2 concentration data over 3000 Hz sample frequency
plt.subplot(2, 2, 3)
plt.xlabel('Time (seconds)')
plt.ylabel('Amplitude')
plt.title('Normalized CO2 concentration artificially sampled at 3000 Hz')
plt.plot(data_time_samples[:len(co2_fit_int)], co2_fit_int)

In [None]:
# The sample rate of our data is approximately 0.0000004 Hz (12 samples / year),
# we speed it up about 10^10 times (3000 Hz)


**Question:** Does this audification provide you with any insight into the data or help you pick out any discernible patterns?

## Parameter mapping

Parameter mapping is the process of mapping data to properties of sound. As previously mentioned, these properties can include pitch (frequency), amplitude (loudness), tempo (beat speed), and timbre (quality). The spatial position of sound (for example, panning audio to the left or right channel in a stereo mix) can also be used as a mapping property.
 
As opposed to audification, mapping data to sound properties provides more options and opportunities to represent the features of a dataset.

In this demonstration we will only explore using variations in pitch, tempo, and timbre to represent monthly atmospheric CO<sub>2</sub> values.

### Subsample CO<sub>2</sub> concentration data for sonification

We will subsample our data to create a smaller dataset that will be appropriate for generating shorter demo sonifications. Additionally, due to some constraints with audio playback in Google Colab we have to create audio files of our sonifications and then load them into our notebook for playback. In this case, smaller files are preferable for shorter load times.
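One way to take the subsample, assuming `co2` is the cleaned DataFrame with a DatetimeIndex (as produced by `index_col=2` and `parse_dates` when loading): pandas partial-string indexing selects every row from a given year onward. The synthetic frame below only stands in for the real data; `co2_modern` and `co2_ppm` match the names used in the cells that follow.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic monthly data indexed by date (not the Scripps record)
dates = pd.date_range("1998-01-01", "2003-12-01", freq="MS")
co2 = pd.DataFrame({"CO2ppm": np.arange(len(dates), dtype=float)}, index=dates)

# Keep every row from January 2000 onward using partial-string indexing
# on the DatetimeIndex
co2_modern = co2.loc["2000":]

# Keep only the measured concentration column, as a Series
co2_ppm = co2_modern["CO2ppm"]

print(co2_ppm.index.min())
```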

In [None]:
# Create a subsample of the CO2 data from the year 2000 to present (co2_modern)


# We are only going to be working with the measured CO2 concentration values
# moving forward, so we will store it in a variable (co2_ppm)


# Print the data


In [None]:
# Plot the subsampled CO2 concentration data


In [None]:
# Check out some basic statistics of the subsampled data


**Question:** Given the values of the descriptive statistics, could we directly use the CO<sub>2</sub> concentration data as frequencies (in Hz) in a sonification?

### Functions for generating audio streams and creating audio files

Three functions are provided for processing numerical data to create a sonification in which data values are represented as musical pitch (frequency). The three functions are `create_audio_stream`, `create_audio_file`, and `data_to_audio`. In this workshop we will mainly use the `data_to_audio` function. Each function is briefly described here and documented in the code cell below.

The function `create_audio_stream` creates a music21 stream that can be played in a local Jupyter Notebook using the call `[stream_name].show('midi')`. This functionality is not available in Google Colab or in other notebook editors such as VS Code and JupyterLab, so in these environments we must also create an audio file for playback using the function `create_audio_file`.

#### The `data_to_audio` function

The two functions above, `create_audio_stream` and `create_audio_file` have been combined into the function `data_to_audio` to streamline the process of creating playable audio in environments outside of Jupyter Notebook. However, each of the previous two functions can still be called individually.

`data_to_audio` has four parameters:

- `notes` - a sequence of music21 Note objects

- `file_name` - the name of the file to write. The name must include an audio file extension; valid types include .flac and .wav

- `bpm` - beats per minute, or the speed of the sonification (default value is 120; each note is treated as a sixteenth note, so four notes make up one beat)

- `instrument_name` - the name of a synthesized instrument to play the notes (default value is "Piano"). A list of instrument names is available in the [music21 documentation](http://web.mit.edu/music21/doc/moduleReference/moduleInstrument.html#module-music21.instrument). (These synthesized instruments are provided through the soundfont file.)

Three of these arguments provide the data mappings to frequency (the `notes` argument), tempo (the `bpm` argument), and timbre (the `instrument_name` argument).
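Because every note is a sixteenth note, the length of a sonification follows directly from the number of notes and `bpm`: n notes make n / 4 beats, which last (n / 4) × 60 / bpm seconds. This is the same arithmetic the later plotting cell uses to build `time_scale`. The helper `sonification_seconds` is hypothetical and not one of the provided functions.

```python
def sonification_seconds(n_notes: int, bpm: float = 120, notes_per_beat: int = 4) -> float:
    """Length in seconds of a sonification where every note is a sixteenth note.

    Four sixteenth notes make one beat, so n_notes / notes_per_beat beats are
    played at bpm beats per minute.
    """
    beats = n_notes / notes_per_beat
    return beats * 60 / bpm


print(sonification_seconds(240))           # 240 notes at the default 120 bpm
print(sonification_seconds(240, bpm=180))  # the same notes played faster
```

Raising `bpm` shortens the sonification proportionally, which is useful when a long series would otherwise take minutes to play.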

**Run the cell below to define these three functions**

In [None]:
def create_audio_stream(notes, bpm=120, instrument_name='Piano'):
    """
    Creates and returns a music21 Stream object of notes.

    Parameters
    ----------
    notes : Sequence[music21.note.Note]
        A sequence of music21 Note objects.
    bpm : int, optional
        The beats per minute of the stream, the tempo (speed) of the sonification. The default value is 120 bpm, each note is treated as a 16th note and four notes constitute one beat.
    instrument_name : str, optional
        The name of a synthesized instrument to play the notes. If no value is provided, or the name is not a valid instrument, the default "piano" instrument is used. For a list of valid instrument names see the [music21 Instrument module documentation](https://web.mit.edu/music21/doc/moduleReference/moduleInstrument.html).

    Returns
    -------
    music21.stream.base.Stream
        A music21 Stream object

    """
    # Create a new music21 stream object to add notes to
    new_stream = stream.Stream()

    # Set the tempo string of the stream
    new_stream.append(tempo.MetronomeMark(number=bpm))

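    # Set the instrument to play the stream notes, falling back to Piano if
    # the name is not a valid music21 instrument class (as documented above)
    new_stream.append(getattr(instrument, instrument_name, instrument.Piano)())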

    # Iterate over the notes provided in the series
    for this_note in notes:
        # Append the note to the new stream
        new_stream.append(this_note)
        # Set the duration of the note as a sixteenth note (0.25 of quarter note)
        this_note.duration.quarterLength = 0.25
    # Return a music21 stream object
    return new_stream

def create_audio_file(stream, file_name, keep_midi=False):
    """
    Create an audio file using the given music21 stream and the given file_name.
    
    It writes a MIDI file using music21 and then uses the newly written MIDI file to create the specified audio file using FluidSynth midi to audio conversion.

    Parameters
    ----------
    stream : music21.stream.base.Stream
        A music 21 Stream object containing notes.
    file_name : str
        The name of the file to write. Name must include audio file extension type. Valid types include .flac and .wav.
    keep_midi : bool, optional
        Keep the intermediate MIDI file if True. Default is False.

    Returns
    -------
    None

    """
    # If the MIDI file is not needed, set the MIDI file name to None so the
    # music21 "stream.write()" method creates a temporary file. If keeping the
    # MIDI file, create a string for the MIDI file path.
    if not keep_midi:
        midi_file_name = None
    else:
        # File name with MIDI extension
        midi_file_name = os.path.splitext(file_name)[0] + '.mid'
    
    # Use the music21 stream object write function to create a MIDI file and
    # store the path to the new file
    midi_file_path = stream.write('midi', midi_file_name)

    # Use the FluidSynth module to call the sound font and convert the newly 
    # created MIDI file to the specified file_name using the midi_to_audio function
    FluidSynth('FluidR3Mono_GM.sf3').midi_to_audio(
        midi_file_path, file_name
    )

    # Delete the intermediate MIDI file unless "keep_midi" is True
    if not keep_midi:
        os.remove(midi_file_path)

def data_to_audio(notes, file_name, bpm=120, instrument_name='Piano'):
    '''
    Processes a sequence of music21 Note objects to create a music21 stream and generate an audio file with optional musical style parameters.

    Combines create_audio_stream and create_audio_file into one function.

    Parameters
    ----------
    notes : Sequence[music21.note.Note]
        A sequence of music21 Note objects
    file_name : str
        The name of the file to write. Name must include audio file extension type. Valid types include .flac and .wav
    bpm : int, optional
        The beats per minute of the stream, the tempo (speed) of the sonification. The default value is 120 bpm, each note is treated as a 16th note and four notes constitute one beat.
    instrument_name : str, optional
        The name of a synthesized instrument to play the notes. If no value is provided, or the name is not a valid instrument, the default "piano" instrument is used.
    
    Returns
    -------
    None

    '''

    # Generate the music21 Stream object from the list of notes
    notes_stream = create_audio_stream(
        notes, bpm=bpm, instrument_name=instrument_name
    )
    # Generate an audio file from the music21 Stream object
    create_audio_file(notes_stream, file_name=file_name)

In [None]:
# Uncomment the line below to print out a list of instruments contained in the
# provided soundfont

# !echo "inst 1" | fluidsynth ./FluidR3Mono_GM.sf3

### Parameter mapping using CO<sub>2</sub> concentration data values as frequency (pitch)

In this demonstration we will use a function that directly maps a numeric value to a frequency and the corresponding scientific pitch notation value.

For example, a data value of 440 would be interpreted as the frequency 440 Hz. This frequency would then be converted to scientific pitch notation as the pitch A4 (A in octave 4).

Moving forward we will be thinking of frequency in terms of Hz and scientific pitch notation. It might be helpful to reference this [table of piano key pitches and frequencies](https://en.wikipedia.org/wiki/Piano_key_frequencies#List).

We will use the defined function `value_to_pitch` to translate data values, interpreted as frequencies, into pitches. This function takes one argument:

- `value` - a data value interpreted as a frequency
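music21 performs the frequency-to-pitch conversion for us when we set `pitch.frequency`. The sketch below shows the standard equal-temperament arithmetic behind scientific pitch notation, assuming A4 = 440 Hz and naming black keys with sharps. It is an illustration, not music21's code, and the helper `frequency_to_pitch_name` is hypothetical.

```python
import math

# Pitch-class names using sharps, indexed by MIDI note number % 12
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_pitch_name(frequency_hz: float, a4_hz: float = 440.0) -> str:
    """Return the nearest equal-tempered pitch in scientific pitch notation.

    Uses the MIDI relation midi = 69 + 12 * log2(f / a4_hz), where MIDI note
    69 is A4 and each semitone multiplies the frequency by 2 ** (1 / 12).
    """
    midi = round(69 + 12 * math.log2(frequency_hz / a4_hz))
    octave = midi // 12 - 1  # MIDI 60 is C4
    return f"{PITCH_NAMES[midi % 12]}{octave}"


for f in (27.5, 261.63, 440, 4186.01):
    print(f, frequency_to_pitch_name(f))
```

Frequencies between two keys are rounded to the nearest semitone here; music21 instead keeps any leftover deviation on the pitch as microtones.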

In [None]:
def value_to_pitch(value):
    """
    Generate a music21 Note object with a set pitch based on the provided value interpreted as a frequency or music21 Rest object of silence.
    
    It returns a Note with the pitch of the provided frequency value if the value is greater than zero. If the value is zero or less, it returns a rest (i.e., silence).

    Parameters
    ----------
    value : int | float
        A numerical data value to be directly mapped to frequency and used to generate a pitched musical note. 

    Returns
    -------
    music21.note.Note | music21.note.Rest
        A music21 Note object with a set pitch based on the provided value interpreted as a frequency. Values less than or equal to zero return a music21 Rest object.

    """
    # Test if the frequency value is greater than zero
    if value > 0:
        # Create a music21 Note object
        converted_note = note.Note()
        # Set the pitch of the Note object based on the supplied frequency value
        converted_note.pitch.frequency = value
        # Return the Note object with the assigned pitch
        return converted_note
    # Return a Rest object (silence)
    return note.Rest()

In [None]:
# Test the value_to_pitch function


The `value_to_pitch` function has returned a [music21 Note object](http://web.mit.edu/music21/doc/moduleReference/moduleNote.html#module-music21.note). A Note object has many properties. For this workshop, the most important properties are the note name and pitch frequency. We can get a full descriptive name by accessing the `fullName` property of a Note object. We can get the frequency of a note by accessing the property `pitch.frequency` on a Note object.

In [None]:
# Print out full note name information by calling "fullName"


In [None]:
# Print out the frequency information by calling "pitch.frequency"


In [None]:
# Map CO2 concentration values directly to pitch


#Print the data


The following cell contains the code for generating a music21 Stream object and playing the audio in the notebook without having to generate any files. This process uses the music21 Stream functionality to play back MIDI audio.

I have only found this to work in a Jupyter Notebook environment. As such, we will skip this cell, but note this ability if you are using a Jupyter Notebook environment.

In [None]:
# Create an audio stream of the raw_pitch values using the create_audio_stream
# function. Pass in the series of notes (raw_pitch), a tempo in bpm, and an
# instrument name from the list located in the right-hand column of this
# music21 documentation page: https://web.mit.edu/music21/doc/moduleReference/moduleInstrument.html
raw_pitch_stream = create_audio_stream(raw_pitch, 180)

# If running in Jupyter Notebook on a local machine, uncomment the line below to
# play the MIDI data in the Jupyter notebook without having to create a file

# raw_pitch_stream.show('midi')

In the next cell we generate an audio file from the data. We will create a FLAC audio file, but other formats supported by FluidSynth, such as WAV, can be created by changing the file extension.

When running this notebook in Colab I found that FLAC audio files tended to be smaller and loaded faster than other audio file types when using the Audio tool.

In [None]:
# Create an audio file of the pitch data


# Load the newly created audio file for playback


In [None]:
# Plot the data for visual reference


#### Activity: Sonify seasonally adjusted CO<sub>2</sub> concentrations using `value_to_pitch`

Create a sonification of the seasonally adjusted CO<sub>2</sub> concentration data from the `co2_modern` dataset using the `value_to_pitch` function.

In [None]:
# Store the seasonally adjusted concentration data from "co2_modern" in a variable


# Use the apply() method to call value_to_pitch on each row of the
# seasonally adjusted concentration data


# Create an audio file of the pitch data


# Load the audio file for playback


**Question:** Are there any shortcomings with the method we used to create this sonification?

### Sonification using values mapped to linear pitch range

In this demonstration we will use a function that maps a data value from the dataset range (dataMin - dataMax) to the pitch range of a standard 88 key piano (~27 Hz – ~4186 Hz).

    (dataMin)         (value)         (dataMax)
        |----------------o----------------| Data scale
                          \
                           \
                            \       
     (27 Hz)           (mapped value)           (4186 Hz)
        |---------------------o---------------------| Frequency range scale (88 key piano)

We will use the defined function `map_value_to_pitch_range` to map data values from the data domain to a pitch range. This function takes two arguments:
- `value` - the data value to be mapped to the new pitch range
- `data_min_max` - a list containing the min and max of the dataset ([min, max])
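The linear formula can be checked by hand, and doing so exposes a quirk. Because pitch is heard in octaves (each octave doubles the frequency), a mapping that is linear in Hz is very uneven in pitch: the lower half of the data range is spread over about six octaves, while the upper half is squeezed into about one. The helper `linear_map` and the 0 to 100 data range are hypothetical; the piano limits match the constants in `map_value_to_pitch_range`.

```python
import math

MIN_HZ = 27.50    # lowest key on an 88-key piano (A0)
MAX_HZ = 4186.01  # highest key on an 88-key piano (C8)


def linear_map(value, data_min, data_max, out_min=MIN_HZ, out_max=MAX_HZ):
    """Map value from [data_min, data_max] onto [out_min, out_max] linearly."""
    return out_min + (value - data_min) * (out_max - out_min) / (data_max - data_min)


# Hypothetical data range: any min and max behave the same way
data_min, data_max = 0.0, 100.0
mid_hz = linear_map(50.0, data_min, data_max)

# Octaves spanned by each half of the data range (an octave doubles frequency)
lower_octaves = math.log2(mid_hz / MIN_HZ)
upper_octaves = math.log2(MAX_HZ / mid_hz)

print(f"Midpoint maps to {mid_hz:.2f} Hz")
print(f"Lower half of the data spans {lower_octaves:.2f} octaves")
print(f"Upper half of the data spans {upper_octaves:.2f} octaves")
```

This imbalance is the motivation for the exponential pitch range introduced later in this section.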

In [None]:
def map_value_to_pitch_range(value, data_min_max):
    """
    Map a data value from the dataset domain (data min - data max) to the frequency range of a standard 88 key piano (~27 Hz - ~4186 Hz).
    
    It returns a music21 Note object with the pitch of the data value mapped to the frequency range.

    Parameters
    ----------
    value : int | float
        A numerical data value to be mapped to the frequency range based on the minimum and maximum values of the whole dataset.
    data_min_max : List[int | float]
        A list containing the minimum and maximum of the dataset in the form [min, max].

    Returns
    -------
    music21.note.Note
        A music21 Note object with a specified pitch based on the mapping of the provided value to the new pitch range based on the min and max of the dataset.

    """
    # Create the music21 Note object
    converted_note = note.Note()

    # Set the data range using supplied min and max
    data_range = data_min_max[1] - data_min_max[0]

    # Frequency of the lowest note on a standard 88 key piano
    MIN_HZ = 27.50
    # Frequency of the highest note on a standard 88 key piano
    MAX_HZ = 4186.01

    # Set the pitch of each Note object mapping from the data range to the
    # frequency range of a standard 88 key piano
    converted_note.pitch.frequency = (
        ((value - data_min_max[0]) * (MAX_HZ - MIN_HZ)) / data_range
    ) + MIN_HZ
    return converted_note

In [None]:
# Test the map_value_to_pitch_range function


In [None]:
# Calculate the minimum and maximum values of the CO2 concentration data
co2_range = co2_ppm.agg(['min', 'max'])

# Convert values to pitch range
pitch_range = co2_ppm.apply(
  map_value_to_pitch_range, data_min_max=co2_range.values
)

In [None]:
# Create an audio file from the pitch_range notes
data_to_audio(pitch_range, 'pitch_range.flac', instrument_name='Glockenspiel')

# Load the newly created audio file for playback
Audio('pitch_range.flac')

In [None]:
# Plot the actual concentration data and the mapped frequency data for visual reference
plt.figure(figsize=[12, 5])
plt.subplot(1, 2, 1)
plt.title('CO2 concentration over time')
plt.plot(co2_ppm)

plt.subplot(1, 2, 2)
plt.title('CO2 concentration mapped to\n frequency range over time (seconds)')
# Create the time scale for the audio file using the bpm (the bpm
# value passed into the data_to_audio function for this sonification)
bpm = 120
time_scale = np.linspace(0, (len(pitch_range) / 4) / bpm * 60, len(pitch_range))
plt.plot(time_scale, pitch_range.apply(lambda x: x.pitch.frequency))

#### Activity: Sonify seasonally adjusted CO<sub>2</sub> concentrations using `map_value_to_pitch_range`

Create a sonification of the seasonally adjusted CO<sub>2</sub> concentration data using the `map_value_to_pitch_range` function.

In [None]:
# Calculate the min and max values of the seasonally adjusted CO2 concentration
# data


# Use the apply() method to call map_value_to_pitch_range on each row of the
# seasonally adjusted concentration data


# Create the audio file from the audio stream


# Load the audio file for playback


### Sonification using values mapped to an exponential pitch range

Frequency does not increase linearly from one pitch to the next. For example, the frequency of note A4 is 440 Hz. To get to the next octave of this note we must double the frequency of A4, giving A5 at 880 Hz (likewise, A6 = 1760 Hz, A3 = 220 Hz, and so on).

      A3        A4              A5                                A6                                                              A7
    (220 Hz) (440 Hz)         (880 Hz)                         (1760 Hz)                                                       (3520 Hz)
       |--------|----------------|--------------------------------|----------------------------------------------------------------|
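The doubling rule can be checked numerically. The plain-Python sketch below assumes twelve-tone equal temperament with A4 = 440 Hz; `octave_shift` and `piano_key_frequency` are illustrative helpers, not part of the workshop code. The same formula also accounts for the 27.50 Hz and 4186.01 Hz piano limits used later in this notebook.

```python
# Standard concert pitch: A4 = 440 Hz (assumption: twelve-tone equal temperament)
A4_HZ = 440.0


def octave_shift(frequency, octaves):
    """Return the frequency shifted up (positive) or down (negative) by whole octaves."""
    return frequency * 2 ** octaves


def piano_key_frequency(key_number):
    """Equal-temperament frequency of an 88-key piano key, where key 49 is A4.

    Each semitone multiplies the frequency by 2 ** (1 / 12), so 12 semitones
    (one octave) exactly double it.
    """
    return A4_HZ * 2 ** ((key_number - 49) / 12)


# A3 through A7, one octave apart
a_notes = {f"A{octave}": octave_shift(A4_HZ, octave - 4) for octave in range(3, 8)}
print(a_notes)
# {'A3': 220.0, 'A4': 440.0, 'A5': 880.0, 'A6': 1760.0, 'A7': 3520.0}

# Lowest (key 1, A0) and highest (key 88, C8) keys on a standard piano
print(round(piano_key_frequency(1), 2), round(piano_key_frequency(88), 2))
# 27.5 4186.01
```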

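To map our values onto a perceptually even pitch scale, we can apply a log transform to the pitch range, so that equal steps in the data produce equal musical intervals (equal frequency ratios) rather than equal steps in hertz. This keeps the lower frequencies from being squeezed into a small part of the range, which makes those pitches easier to tell apart and produces a more natural and pleasing sound.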
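A quick way to see the difference is to map five evenly spaced data positions both ways and compare the ratios between consecutive frequencies. This is a plain-Python sketch; `linear_map` and `log_map` are illustrative helpers that use the same piano limits as the function below.

```python
import math

# Piano frequency limits used by map_value_to_pitch_range_exp_scale
MIN_HZ = 27.50
MAX_HZ = 4186.01


def linear_map(t):
    """Map a normalized value t in [0, 1] linearly onto [MIN_HZ, MAX_HZ]."""
    return MIN_HZ + t * (MAX_HZ - MIN_HZ)


def log_map(t):
    """Map t in [0, 1] linearly in log2-frequency space, then convert back to Hz."""
    log_min, log_max = math.log2(MIN_HZ), math.log2(MAX_HZ)
    return 2 ** (log_min + t * (log_max - log_min))


# Five evenly spaced data positions: 0, 0.25, 0.5, 0.75, 1.0
steps = [i / 4 for i in range(5)]
linear = [linear_map(t) for t in steps]
logged = [log_map(t) for t in steps]

# The ear hears the ratio between two frequencies as the interval,
# so compare ratios of consecutive frequencies rather than differences in Hz
linear_ratios = [b / a for a, b in zip(linear, linear[1:])]
log_ratios = [b / a for a, b in zip(logged, logged[1:])]

print([round(r, 2) for r in linear_ratios])  # [38.8, 1.97, 1.49, 1.33]
print([round(r, 2) for r in log_ratios])     # [3.51, 3.51, 3.51, 3.51]
```

With the linear mapping, the first quarter of the data jumps by a factor of almost 39 (more than five octaves), while the last quarter covers less than half an octave. The log mapping gives every quarter the same interval.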

In this demonstration we will use a function that maps a data value from the dataset range (dataMin - dataMax) to the frequency range of a standard 88 key piano (\~27 Hz – \~4186 Hz) over an exponential scale.
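Written out, this is the same formula as in the function's code comment, with $x_0$ and $x_1$ the data minimum and maximum, $f_{\min} = 27.50$ Hz, and $f_{\max} = 4186.01$ Hz:

$$
f(x) = 2^{\frac{x - x_0}{x_1 - x_0}\left(\log_2 f_{\max} - \log_2 f_{\min}\right) + \log_2 f_{\min}} = f_{\min}\left(\frac{f_{\max}}{f_{\min}}\right)^{\frac{x - x_0}{x_1 - x_0}}
$$

So the midpoint of the data maps to the geometric mean $\sqrt{f_{\min} f_{\max}} \approx 339$ Hz, not to the arithmetic mean of about 2107 Hz.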

We will use the defined function `map_value_to_pitch_range_exp_scale` to map data values from the data domain to a log scale pitch range. This function takes two arguments:
- `value` - the data value to be mapped to the new pitch range
- `data_min_max` - a list containing the min and max of the dataset ([min, max])

In [None]:

def map_value_to_pitch_range_exp_scale(value, data_min_max):
    """
    Map a data value from the dataset domain (data min - data
    max) to the log2 frequency range of a standard 88 key piano (~27 Hz – ~4186
    Hz). It returns a music21 Note object with the pitch of the mapped frequency
    value.

    Parameters
    ----------
    value : int | float
        A numerical data value to be mapped to the frequency range based on the minimum and maximum values of the whole dataset.
    data_min_max : List[int | float]
        A list containing the minimum and maximum of the dataset in the form [min, max].

    Returns
    -------
    music21.note.Note
        A music21 Note object with a specified pitch based on the mapping of the provided value to the new pitch range generated by the min and max of the dataset.

    """
    # Create the music21 Note object
    converted_note = note.Note()

    # Set the data range using supplied min and max
    data_range = data_min_max[1] - data_min_max[0]

    # Frequency of the lowest note on a standard 88 key piano
    MIN_HZ = 27.50
    # Frequency of the highest note on a standard 88 key piano
    MAX_HZ = 4186.01

    # Set the pitch of each Note object, mapping from the data range to a
    # log base 2 frequency range of a standard 88 key piano
    #
    # https://stackoverflow.com/questions/19472747/convert-linear-scale-to-logarithmic
    #           x - x0
    # log(y) = ------- * (log(y1) - log(y0)) + log(y0)
    #          x1 - x0
    converted_note.pitch.frequency = np.exp2(
        (value - data_min_max[0]) / data_range *
        (np.log2(MAX_HZ) - np.log2(MIN_HZ)) +
        np.log2(MIN_HZ)
    )
    return converted_note

In [None]:
# Test the map_value_to_pitch_range_exp_scale function


In [None]:
# Convert values to pitch range


In [None]:
# Create an audio file of the pitch data


# Load the newly created audio file for playback


In [None]:
# Plot the actual concentration data and the mapped frequency data for visual reference
plt.figure(figsize=[12, 5])
plt.subplot(1, 2, 1)
plt.title('CO2 concentration over time')
plt.plot(co2_ppm)

plt.subplot(1, 2, 2)
plt.title('CO2 concentration mapped to frequency over time')
# Calculate the time scale for the audio we generated using the bpm (the bpm
# value passed into the data_to_audio function for this sonification)
bpm = 120
time_scale = np.linspace(0, len(pitch_range_exp) / 4 / bpm * 60, len(pitch_range_exp))

plt.plot(time_scale, pitch_range_exp.apply(lambda x: x.pitch.frequency))

#### Activity: Sonify seasonally adjusted CO<sub>2</sub> concentrations using `map_value_to_pitch_range_exp_scale`

Create a sonification of the seasonally adjusted CO<sub>2</sub> concentration data using the `map_value_to_pitch_range_exp_scale` function.

In [None]:
# Use the apply() method to call map_value_to_pitch_range_exp_scale on each row of the
# seasonally adjusted concentration data


# Create the audio file from the audio stream


# Load the audio file for playback


### Sonification using values mapped to a musical scale

In this demonstration we will use a function that maps a data value from the dataset range (dataMin - dataMax) to a specified [musical scale](https://en.wikipedia.org/wiki/Scale_(music)) (i.e., an ordered group of pitches), given a starting note (i.e., the tonic) and an octave range (i.e., the span of octaves over which to map your data).

Using a musical scale gives us some options to make our sonification more "musical". Previously, we mapped values to frequencies and pitches without regard to whether those pitches belong to a common scale. Predefined musical scales can evoke culturally influenced emotions that give our sonification an aesthetic character. For example, in Western music, major scales are often interpreted as happy, while minor scales are often interpreted as sad.
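The difference between major and minor scales comes down to the pattern of whole steps (two semitones) and half steps (one semitone) between notes. The plain-Python sketch below builds both from a tonic. `build_scale` is an illustrative helper, not part of the workshop code, and it spells every note with sharps for simplicity (C minor is conventionally written with E-flat, A-flat, and B-flat). In the workshop code, music21's scale classes do this work, including correct spelling.

```python
# Semitone steps between consecutive scale notes: whole step = 2, half step = 1
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]          # W W H W W W H
NATURAL_MINOR_STEPS = [2, 1, 2, 2, 1, 2, 2]  # W H W W H W W

# The 12 pitch classes, spelled with sharps only, starting from C
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def build_scale(tonic, steps):
    """Return the eight note names of a scale (tonic up to tonic) from a step pattern."""
    index = CHROMATIC.index(tonic)
    names = [tonic]
    for step in steps:
        # Wrap around after B back to C
        index = (index + step) % 12
        names.append(CHROMATIC[index])
    return names


print(build_scale("C", MAJOR_STEPS))          # ['C', 'D', 'E', 'F', 'G', 'A', 'B', 'C']
print(build_scale("A", NATURAL_MINOR_STEPS))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'A']
print(build_scale("C", NATURAL_MINOR_STEPS))  # ['C', 'D', 'D#', 'F', 'G', 'G#', 'A#', 'C']
```

Compared with C major, C minor lowers the third, sixth, and seventh notes by a half step, and that change is largely what gives minor scales their darker character.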

We will use the defined function `map_value_to_scale` to map data values from the data domain to a musical scale. This function takes five arguments:
- `value` - the data value to be mapped to the new pitch range
- `data_min_max` - a list containing the min and max of the dataset ([min, max])
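- `tonic` - a string containing the pitch name of the first note of the scale, without an octave number (e.g., `"c"` or `"g"`; default value is "c"). The function appends the octave numbers from `octave_range` to this name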
- `scale_name` - a string containing a valid scale subclass name from the music21 scale class (default value is "MajorScale"). A list of scale names is available in the [music21 documentation](http://web.mit.edu/music21/doc/moduleReference/moduleScale.html#module-music21.scale)
- `octave_range` - a list containing the min and max octave numbers over which to map the pitches (default value is [3, 5])

In [None]:
def map_value_to_scale(value, data_min_max, tonic='c',
                       scale_name='MajorScale', octave_range=[3, 5]):
    """
    Map a data value from the dataset domain (data min - data max) to a musical scale with a provided tonic note over a provided octave range.
    
    It returns a music21 Note object with the pitch of the mapped data value.

    Parameters
    ----------
    value : int | float
        A numerical data value to be mapped to the scale based on the minimum and maximum values of the whole dataset.
    data_min_max : List[int | float]
        A list containing the minimum and maximum of the dataset in the form [min, max].
    tonic : str, optional
        The pitch name of the first note of the scale without an octave number (e.g., "c" or "g"). The octave numbers from octave_range are appended to it.
    scale_name : str, optional
        A string containing a valid scale subclass name from the music21 scale class. A list of scale names is available in the [music21 documentation](http://web.mit.edu/music21/doc/moduleReference/moduleScale.html#module-music21.scale)
    octave_range : List[int], optional
        A list of two integers containing the lower and upper bounds of the octaves over which to map the pitches.

    Returns
    -------
    music21.note.Note
        A music21 Note object with a specified pitch based on the mapping of the provided value to the new pitch range generated by the min and max of the dataset.
    """
    # Create the music21 Note object
    converted_note = note.Note()

    # Set the scale using the supplied tonic value
    this_scale = getattr(scale, scale_name)(tonic)
    # Get the pitches of the scale based on the octave range
    scale_pitches = this_scale.getPitches(
        tonic + str(octave_range[0]),
        tonic + str(octave_range[1]))
    
    # Get the data range using supplied min and max
    data_range = data_min_max[1] - data_min_max[0]

    # Get the position of the note on the scale given the range of the scale
    note_position = round(
        ((value - data_min_max[0]) * (len(scale_pitches) - 1)) / data_range)
    # Set the pitch of the note
    converted_note.pitch = scale_pitches[note_position]
    return converted_note

The C major scale has a base note (tonic) of C and consists of the notes: C, D, E, F, G, A, B, C.
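To see what `map_value_to_scale` does with its inputs, the sketch below reproduces its index arithmetic in plain Python, without music21. It hard-codes C major note names instead of calling `getPitches()`, and it assumes the pitch list runs from the tonic in the lowest octave up to the tonic in the highest octave, inclusive (C3 to C5 for the default `octave_range=[3, 5]`). `scale_pitch_names` and `value_to_scale_index` are illustrative helpers, and the data range is hypothetical.

```python
# One octave of C major note names (the tonic C plus the six notes above it)
C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]


def scale_pitch_names(octave_low, octave_high, names=C_MAJOR):
    """List scale pitches from the tonic in octave_low up to the tonic in octave_high."""
    pitches = [f"{name}{octave}"
               for octave in range(octave_low, octave_high)
               for name in names]
    pitches.append(f"{names[0]}{octave_high}")  # end on the top tonic
    return pitches


def value_to_scale_index(value, data_min, data_max, n_pitches):
    """Index arithmetic from map_value_to_scale: rescale value onto 0..n_pitches - 1."""
    return round((value - data_min) * (n_pitches - 1) / (data_max - data_min))


pitches = scale_pitch_names(3, 5)
print(len(pitches), pitches[0], pitches[-1])  # 15 C3 C5

# Hypothetical data range, for illustration only
data_min, data_max = 310.0, 420.0
for value in [310.0, 365.0, 420.0]:
    index = value_to_scale_index(value, data_min, data_max, len(pitches))
    print(value, pitches[index])
# 310.0 C3
# 365.0 C4
# 420.0 C5
```

Because only 15 pitches are available, the data is quantized into 15 steps and nearby values share a note; widening `octave_range` adds more steps. Note also that Python's built-in `round()` rounds exact halves to the nearest even integer (for example, `round(2.5)` is `2`).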

In [None]:
# Test the map_value_to_scale function


In [None]:
# Convert values to musical scale


In [None]:
# Create an audio file of the pitch data


# Load the newly created audio file for playback


In [None]:
# Plot the actual concentration data and the mapped frequency data for visual
# reference
plt.figure(figsize=[12, 5])
plt.subplot(1, 2, 1)
plt.title('CO2 concentration over time')
plt.plot(co2_ppm)

plt.subplot(1, 2, 2)
plt.title('CO2 concentration mapped to frequency over time')
# Create the time scale for the audio file using the bpm (the bpm
# value passed into the data_to_audio function for this sonification)
bpm = 120
time_scale = np.linspace(0, (len(musical_notes) / 4) / bpm * 60, len(musical_notes))

plt.plot(time_scale, musical_notes.apply(lambda x: x.pitch.frequency))

#### Activity: Sonify seasonally adjusted CO<sub>2</sub> concentrations using `map_value_to_scale`

Create a sonification of the seasonally adjusted CO<sub>2</sub> concentration data using the `map_value_to_scale` function.

In [None]:
# Use the apply() method to call map_value_to_scale on each row of the
# seasonally adjusted concentration data


# Create the audio file from the audio stream


# Load the audio file for playback


### Sonification playground

Use the cells below to generate your own sonifications of the CO<sub>2</sub> concentration data using the provided functions.

You can call the following functions on a pandas Series using the `apply()` method:

- `value_to_pitch(value_as_frequency)`

- `map_value_to_pitch_range(value, data_min_max)`

- `map_value_to_pitch_range_exp_scale(value, data_min_max)`

- `map_value_to_scale(value, data_min_max, tonic="c", scale_name="MajorScale",
  octave_range=[3, 5])`

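For example, to create a sonification of the seasonally adjusted modern CO2 data you could do the following:

1. Calculate the min and max of the seasonally adjusted data, then map the data to a log scale pitch range to return a series of music21 Note objects
      ```python
      season_adj_range = co2_modern['season_adj'].agg(['min', 'max'])
      pitch_range = co2_modern['season_adj'].apply(
            map_value_to_pitch_range_exp_scale, data_min_max=season_adj_range.values
      )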
      
      ```

1. Create an audio file from the data at 100 bpm using a Guitar sound and play it back
      ```python
      data_to_audio(pitch_range, 'demo_notes.flac', 100, 'Guitar')
      Audio('demo_notes.flac')
      
      ```

In [None]:
# Create your own sonifications here


## Other resources

### Filled version of this notebook

[Data Sonification with Python filled notebook](https://colab.research.google.com/github/ncsu-libraries-data-vis/data-sonification-with-python/blob/main/filled-data-sonification-with-python.ipynb) - a version of this notebook with all code filled in. Use this version of the notebook as a reference to check work.

### Sonification development tools and applications

- [Sonic Pi](https://sonic-pi.net/) - a code-based music creation and performance tool based on the Ruby programming language

- [p5.js](https://p5js.org/) - a JavaScript library for creative coding that includes a library for interfacing with web-based audio.

- [Web audio API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API) - base API for controlling audio on the web

- [SAS Graphics Accelerator](https://support.sas.com/software/products/graphics-accelerator/samples/index.html) - a free Chrome extension that allows you to sonify data captured from web tables and data from web-based SAS visualizations

- [TwoTone](https://app.twotone.io/) - an interactive web application for easily creating sonifications with uploaded data

### Sonification theory and practice

- [The Sonification Handbook](https://sonification.de/handbook/) - an open access textbook published in 2011 providing an introduction to sonification and sonification research and techniques

- [An Overview of Auditory Displays and Sonification](https://sonification.de/) - the website of Thomas Hermann, one of the editors of The Sonification Handbook, providing an overview of sonification

- [DataSonificationBlog](https://www.saralenzi.com/datasonificationblog) - blog of Sara Lenzi, sonification researcher

- [Intentionality and design in the data sonification of social issues](https://www.researchgate.net/publication/343692618_Intentionality_and_design_in_the_data_sonification_of_social_issues) - a journal article analyzing the role of intentionality when designing sonifications that communicate social issues to public audiences

- [Thirteen Years of Reflection on Auditory Graphing: Promises, Pitfalls, and Potential New Directions](https://digitalcommons.unl.edu/cgi/viewcontent.cgi?referer=http://playitbyr.org/&httpsredir=1&article=1429&context=psychfacpub) - a 2005 journal article covering sonification methods and discussing the success of these methods

- [Loud Numbers](https://www.loudnumbers.net/) - a data sonification podcast and mailing list



## Credits

This workshop was created by Walt Gurley at the NC State University Libraries.