# Data Exploration

This notebook explores MIDI data from the POP909 dataset, a collection of music tracks. It first plays a MIDI file with the mido library so the music can be checked by ear. It then uses the pretty_midi library to inspect a single file: tempo changes, time signature changes, and the notes (pitch, duration, velocity) of each pitched instrument. Next, it plots a histogram of pitch-class counts and a scatter plot of pitch class over time to check whether the pitch content stays consistent through the piece. Finally, it runs key and time signature analysis across every MIDI file in the dataset and reports the most common time signature and key.

In [11]:
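# Install the required libraries
!pip install mido pretty_midi # Install mido and pretty_midi, which are imported below
!pip install python-rtmidi # Install python-rtmidi, the default backend mido uses for MIDI port I/O
!pip install music21 # Install music21 library for computer-aided musicology tasks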

import mido # Import the mido library for working with MIDI files and messages
from mido import MidiFile # Import the MidiFile class from mido for parsing MIDI files
import pretty_midi # Import the pretty_midi library for creating and modifying MIDI files

# Import specific modules from music21 library for music analysis and manipulation
from music21 import converter  # For converting between different music file formats
from music21 import key       # For analyzing key signatures in music
from music21 import meter     # For analyzing meter (time signature) in music

import os # Import the os module for interacting with the operating system (e.g., file operations)
import statistics # Import the statistics module for statistical computations (e.g., mean, median)
import matplotlib.pyplot as plt # Import the matplotlib.pyplot module for creating plots and visualizations


## Play One of the Songs

In [5]:
# Comment this cell out when re-running the notebook; otherwise the song plays every time
midi_file = MidiFile('POP909/909/909.mid')

# Open output port with default backend
with mido.open_output() as port:
    for message in midi_file.play():
        port.send(message)

^C


KeyboardInterrupt: 

## Analyzing & Processing the MIDI Data

Use one MIDI file as a test case to see what information the files contain.

Let's check whether our MIDI file has any information on tempo, pitch, duration, and velocity.

In [6]:
# Load MIDI file
midi_data = pretty_midi.PrettyMIDI('POP909/909/909.mid')

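# Check if tempo information is available.
# get_tempo_changes() returns a (times, tempi) tuple, which is always truthy,
# so test the length of the tempo array instead
_, tempi = midi_data.get_tempo_changes()
if len(tempi) > 0: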
    print("Tempo information exists.")
else:
    print("Tempo information does not exist.")

# Check if time signature changes are available
if midi_data.time_signature_changes:
    print("Time signature changes exist.")
else:
    print("Time signature changes do not exist.")

# Check if there are any pitched instruments (containing pitch, duration, velocity)
pitched_instruments = [i for i in midi_data.instruments if not i.is_drum and len(i.notes) > 0]
if pitched_instruments:
    print("Pitched instruments exist.")
    
    # Check that every note in the first pitched instrument has pitch, start, end, and velocity
    first_pitched_instrument = pitched_instruments[0]
    if all(note.pitch is not None and note.start is not None and note.end is not None and note.velocity is not None
           for note in first_pitched_instrument.notes):
        print("Pitch, duration, and velocity information exist in the first pitched instrument.")
    else:
        print("Pitch, duration, or velocity information is missing in the first pitched instrument.")
else:
    print("No pitched instruments found.")


Tempo information exists.
Time signature changes exist.
Pitched instruments exist.
Pitch, duration, and velocity information exist in the first pitched instrument.


In [7]:
# Load MIDI file
midi_data = pretty_midi.PrettyMIDI('POP909/909/909.mid')

# Print time signature changes
print("Time Signatures:")
for ts in midi_data.time_signature_changes:
    print(ts)

# Print tempo changes
print("\nTempo Changes:")
for tempo_change in midi_data.get_tempo_changes()[1]:
    print(tempo_change)

# Print instrument information
print("\nInstruments:")
for instrument in midi_data.instruments:
    print(f'Name: {instrument.name}, Program: {instrument.program}, Is drum: {instrument.is_drum}')

    # Print pitch, duration, and velocity for each note in the instrument
    if not instrument.is_drum and len(instrument.notes) > 0:
        print(f"\nNotes for {instrument.name}:")
        for note in instrument.notes:
            print(f'Pitch: {note.pitch}, Duration: {note.end - note.start:.4f}, Velocity: {note.velocity}')


Time Signatures:
1/4 at 0.00 seconds

Tempo Changes:
104.00001386666852
103.75022479215373
104.00001386666852
103.75022479215373
104.00001386666852
103.75022479215373
104.00001386666852

Instruments:
Name: MELODY, Program: 0, Is drum: False

Notes for MELODY:
Pitch: 76, Duration: 0.0829, Velocity: 85
Pitch: 77, Duration: 0.0913, Velocity: 78
Pitch: 79, Duration: 0.0913, Velocity: 74
Pitch: 77, Duration: 0.0889, Velocity: 78
Pitch: 76, Duration: 0.0962, Velocity: 81
Pitch: 77, Duration: 0.0865, Velocity: 80
Pitch: 79, Duration: 0.0913, Velocity: 78
Pitch: 77, Duration: 0.0962, Velocity: 80
Pitch: 76, Duration: 0.1118, Velocity: 81
Pitch: 77, Duration: 0.1118, Velocity: 80
Pitch: 79, Duration: 0.1214, Velocity: 76
Pitch: 79, Duration: 0.0745, Velocity: 76
Pitch: 79, Duration: 0.0649, Velocity: 71
Pitch: 79, Duration: 0.1418, Velocity: 74
Pitch: 88, Duration: 0.1851, Velocity: 80
Pitch: 84, Duration: 0.5769, Velocity: 75
Pitch: 74, Duration: 0.0793, Velocity: 81
Pitch: 76, Duration: 0.173
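
The tempo values printed above are not round numbers. A likely explanation is that a standard MIDI file stores tempo as an integer number of microseconds per quarter note, so a nominal 104 BPM is rounded before it is written. The sketch below reproduces that arithmetic. The microsecond values 576923 and 578312 are assumptions inferred from the printed tempos, not values read from the file, and both helper functions are hypothetical names used only for illustration.

```python
MICROSECONDS_PER_MINUTE = 60_000_000


def bpm_from_microseconds(us_per_beat: int) -> float:
    """Convert a MIDI tempo (microseconds per quarter note) to beats per minute."""
    return MICROSECONDS_PER_MINUTE / us_per_beat


def microseconds_from_bpm(bpm: float) -> int:
    """Convert BPM to the integer microseconds-per-beat value a MIDI file stores."""
    return round(MICROSECONDS_PER_MINUTE / bpm)


# Assumed tempo values, inferred from the tempos printed in the cell output
for us_per_beat in (576_923, 578_312):
    print(us_per_beat, bpm_from_microseconds(us_per_beat))

# A nominal 104 BPM cannot be stored exactly, so it is rounded to whole microseconds
print(microseconds_from_bpm(104))
```

Under this reading, the small alternation in the tempo list reflects two stored integer tempos rather than measurement noise.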

In [8]:
# Print the first 5 notes of each of the first three instruments
# (MELODY, BRIDGE, and PIANO in this file), with their durations
for instrument in midi_data.instruments[:3]:
    print(f"\nNotes for {instrument.name}:")
    for note in instrument.notes[:5]:  # Just the first 5 notes
        print(note)
        print(f"Duration: {note.end - note.start}")  # Duration in seconds


Notes for MELODY:
Note(start=12.655046, end=12.737979, pitch=76, velocity=85)
Duration: 0.08293268124999997
Note(start=12.799277, end=12.890623, pitch=77, velocity=78)
Duration: 0.09134614166666566
Note(start=12.943508, end=13.034854, pitch=79, velocity=74)
Duration: 0.09134614166666566
Note(start=13.231969, end=13.320912, pitch=77, velocity=78)
Duration: 0.08894229583333413
Note(start=13.520431, end=13.616585, pitch=76, velocity=81)
Duration: 0.09615383333333405

Notes for BRIDGE:
Note(start=1.116586, end=1.201923, pitch=76, velocity=80)
Duration: 0.0853365270833335
Note(start=1.260817, end=1.383413, pitch=77, velocity=74)
Duration: 0.1225961375000002
Note(start=1.405048, end=1.560096, pitch=79, velocity=81)
Duration: 0.15504805624999984
Note(start=1.693509, end=1.825721, pitch=88, velocity=79)
Duration: 0.1322115208333332
Note(start=1.981971, end=2.098557, pitch=84, velocity=83)
Duration: 0.11658652291666671

Notes for PIANO:
Note(start=1.405048, end=1.724759, pitch=48, velocity=85)

In [9]:
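# PrettyMIDI objects have no plot() method, so re-parse the file with music21,
# whose Stream objects provide plot(), analyze(), and getTimeSignatures()
midi_data = converter.parse('POP909/909/909.mid')
midi_data.plot('histogram', 'pitchClass', 'count');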
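
A pitch-class histogram counts how often each of the 12 pitch classes occurs, regardless of octave. As a library-independent sketch of what such a histogram contains, the code below tallies pitch classes with `collections.Counter` and prints a text bar chart. The sample pitches are the MELODY note numbers listed in the earlier cell output; `pitch_class_counts` is a hypothetical helper name.

```python
from collections import Counter

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class_counts(pitches):
    """Map each pitch-class name to the number of notes in that class."""
    counts = Counter(p % 12 for p in pitches)   # fold octaves together
    return {NOTE_NAMES[pc]: counts.get(pc, 0) for pc in range(12)}


# MELODY pitches taken from the note listing printed earlier
sample_pitches = [76, 77, 79, 77, 76, 77, 79, 77, 76,
                  77, 79, 79, 79, 79, 88, 84, 74, 76]

counts = pitch_class_counts(sample_pitches)
for name, count in counts.items():
    print(f"{name:<2} {'#' * count} {count}")
```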

The scatter plot below plots pitch class against time. The notes in use look consistent throughout, which suggests there are no key changes in this piece.

In [None]:
midi_data.plot('scatter', 'offset', 'pitchClass');

In [None]:
timeSignature = midi_data.getTimeSignatures()[0]
music_analysis = midi_data.analyze('key')
print("Music time signature: {0}/{1}".format(timeSignature.numerator, timeSignature.denominator))
print("Expected music key: {0}".format(music_analysis))
print("Music key confidence: {0}".format(music_analysis.correlationCoefficient))
print("Other music key alternatives:")
for analysis in music_analysis.alternateInterpretations:
    if (analysis.correlationCoefficient > 0.5):
        print(analysis)
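
The `correlationCoefficient` printed above comes from comparing the piece's pitch-class distribution against key profiles. The sketch below is a simplified illustration of that general idea, not music21's implementation: it assumes the Krumhansl-Kessler profile weights and a plain Pearson correlation, and music21's own weights and details may differ. `rank_keys` and the sample `weights` are hypothetical.

```python
from math import sqrt

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl-Kessler key profiles (an assumption for this sketch)
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (inputs must not be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_keys(pitch_class_weights):
    """Return (correlation, key name) pairs sorted from best to worst match."""
    results = []
    for tonic in range(12):
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            # Rotate the profile so its first entry lines up with the candidate tonic
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            results.append((pearson(pitch_class_weights, rotated),
                            f"{NOTE_NAMES[tonic]} {mode}"))
    return sorted(results, reverse=True)


# Hypothetical pitch-class weights (C, C#, ..., B) for a passage centred on C, E, G
weights = [4, 0, 1, 0, 2, 1, 0, 3, 0, 1, 0, 1]
ranking = rank_keys(weights)
for r, name in ranking[:3]:
    print(f"{name}: {r:.3f}")
```

The lower-ranked entries play the same role as the alternate interpretations in the cell above: candidate keys whose profiles also correlate with the pitch content, only less strongly.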

In [None]:
def analyze_midi_file(file_path):
    try:
        # Load MIDI file
        midi_data = converter.parse(file_path)

        # Extract key and time signature
        key_signature = midi_data.analyze('key')
        time_signature = midi_data.getTimeSignatures()[0]

        # Print results
        print(f"Music time signature: {time_signature.numerator}/{time_signature.denominator}")
        print(f"Expected music key: {key_signature}")
        print(f"Music key confidence: {key_signature.correlationCoefficient}")

#         # Print other music key alternatives
#         print("Other music key alternatives:")
#         for analysis in key_signature.alternateInterpretations:
#             if analysis.correlationCoefficient > 0.5:
#                 print(analysis)

    except Exception as e:
        print(f"Error analyzing {file_path}: {e}")

def analyze_midi_folder(root_folder):
    for root, dirs, files in os.walk(root_folder):
        for file in files:
            if file.lower().endswith((".mid", ".midi")):
                file_path = os.path.join(root, file)
                print(f"\nAnalyzing MIDI file: {file_path}")
                analyze_midi_file(file_path)

# 'POP909' is the root folder whose subfolders contain the MIDI files; change it if your data lives elsewhere
analyze_midi_folder('POP909')
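
Separating file discovery from analysis makes it easier to count the files or test on a small subset before running the full pass. Below is a minimal sketch using `pathlib`; `find_midi_files` is a hypothetical helper, and the demo builds a throwaway directory tree rather than touching the real dataset.

```python
import tempfile
from pathlib import Path

MIDI_SUFFIXES = {".mid", ".midi"}


def find_midi_files(root_folder):
    """Return a sorted list of MIDI file paths found anywhere under root_folder."""
    root = Path(root_folder)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in MIDI_SUFFIXES
    )


# Demo on a temporary folder laid out like a dataset with one subfolder per song
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("001/001.mid", "002/002.MIDI", "002/notes.txt"):
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()  # empty placeholder file, not real MIDI data
    found = [p.relative_to(tmp).as_posix() for p in find_midi_files(tmp)]
    print(found)
```

The returned paths could then be passed one at a time to `analyze_midi_file`, or sliced (for example, the first ten) for a quick trial run.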


The block below takes a long time to run.

In [None]:
# Find the mode (most common value) of the time signatures and keys across the dataset
def analyze_midi_file(file_path):
    try:
        # Load MIDI file
        midi_data = converter.parse(file_path)

        # Extract key and time signature
        key_signature = midi_data.analyze('key')
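        # Reduce each time signature to numerator / denominator (e.g. 4/4 -> 1.0);
        # recurse() is used in place of the deprecated .flat property
        time_signatures = [event.numerator / event.denominator for event in midi_data.recurse().getElementsByClass(meter.TimeSignature)]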
        
        return time_signatures, key_signature

    except Exception as e:
        print(f"Error analyzing {file_path}: {e}")
        return None, None

def analyze_midi_folder(root_folder):
    results = {'TimeSignatures': [], 'ExpectedKey': []}

    for root, dirs, files in os.walk(root_folder):
        for file in files:
            if file.lower().endswith((".mid", ".midi")):
                file_path = os.path.join(root, file)
                print(f"\nAnalyzing MIDI file: {file_path}")
                time_signatures, key_signature = analyze_midi_file(file_path)

                if time_signatures is not None and key_signature is not None:
                    results['TimeSignatures'].extend(time_signatures)
                    results['ExpectedKey'].append(str(key_signature))

    return results

def plot_results(results):
    # Plot time signatures
    plt.figure(figsize=(10, 6))
    plt.hist(results['TimeSignatures'], bins=20, edgecolor='black')
    plt.xlabel('Time Signature')
    plt.ylabel('Frequency')
    plt.title('Distribution of Time Signatures')
    plt.show()

    # Find mode time signature and expected music key
    mode_time_signature = statistics.mode(results['TimeSignatures'])
    expected_music_key = statistics.mode(results['ExpectedKey'])
    print(f"\nMode Time Signature: {mode_time_signature}")
    print(f"Expected Music Key: {expected_music_key}")

# 'POP909' is the root folder whose subfolders contain the MIDI files; change it if your data lives elsewhere
analysis_results = analyze_midi_folder('POP909')
plot_results(analysis_results)
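
The block above reduces each time signature to `numerator / denominator`, so distinct signatures such as 3/4 and 6/8 share the value 0.75, and 4/4 and 2/2 share 1.0. One alternative is to tally them as text labels instead. The sketch below compares the two approaches on hypothetical data; the `signatures` list is made up for illustration and is not taken from POP909.

```python
from collections import Counter

# Hypothetical (numerator, denominator) pairs, one per time signature event
signatures = [(4, 4), (4, 4), (2, 2), (3, 4), (6, 8), (4, 4)]

# Tally by ratio: different signatures with the same ratio are merged
ratios = Counter(n / d for n, d in signatures)

# Tally by label: every distinct signature keeps its own count
labels = Counter(f"{n}/{d}" for n, d in signatures)

print("By ratio:", dict(ratios))
print("By label:", dict(labels))

# most_common(1) plays the same role as statistics.mode in the block above
mode_label, mode_count = labels.most_common(1)[0]
print(f"Mode time signature: {mode_label} ({mode_count} occurrences)")
```

With labels, the counts can be drawn with `plt.bar` instead of `plt.hist`, which gives one bar per time signature.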


Reference: https://www.kaggle.com/code/wfaria/midi-music-data-extraction-using-music21