# Music Generation using RNN (Recurrent Neural Network)

## Dependencies

FluidSynth is required to play audio in the notebook. Install it with the following commands:
* `sudo apt install -y fluidsynth`
* `pip install --upgrade pyfluidsynth`

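Install the remaining dependencies from the `requirements.txt` file (for example, with `pip install -r requirements.txt`). The imports used in the notebook are grouped as follows: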

1. File and Path Operations
    * `import pathlib`: Provides object-oriented filesystem paths.
    * `import glob`: Matches filenames against patterns.
    * `import zipfile`: Provides tools for working with ZIP archives.
    * `import datetime`: Provides date and time operations.
2. Data Handling and Analysis
    * `import pandas as pd`: Provides data structures and data analysis tools (e.g., DataFrames).
    * `import numpy as np`: Supports large, multi-dimensional arrays and matrices, along with a large collection of mathematical functions.
    * `import collections`: Implements specialized container datatypes (e.g., Counter, defaultdict).
3. MIDI and Audio Processing
    * `import pretty_midi`: Facilitates MIDI file manipulation and audio synthesis.
    * `import fluidsynth`: Synthesizes audio from MIDI files using SoundFonts.
4. Data Visualization
    * `import matplotlib.pyplot as plt`: Plots and visualizes data.
    * `import seaborn as sns`: Draws statistical graphics.
5. HTTP Requests
    * `import requests`: Simplifies making HTTP requests.
6. Machine Learning and PyTorch
    * `import torch`: Provides tools for deep learning and tensor computations.
7. IPython Display
    * `from IPython import display`: Provides display functions for Jupyter Notebooks.
8. Type Hinting
    * `from typing import Optional`: Supports type hints and annotations.

In [None]:
import pathlib
import glob
import zipfile
import datetime

import pandas as pd
import numpy as np
import collections

import pretty_midi
import fluidsynth

import matplotlib.pyplot as plt
import seaborn as sns

import requests

import torch

from IPython import display

from typing import Optional

In [None]:
seed = 42

torch.manual_seed(seed)
np.random.seed(seed)

# Sampling rate for the audio playback
SAMPLING_RATE = 16_000

## Download the Maestro V3.0.0 Dataset

The dataset archive is downloaded and extracted into the `data` folder. Console messages report the progress of the download and the extraction.
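
Before extracting, it can be useful to confirm that the downloaded file really is a readable ZIP archive. The following is a minimal sketch of such a check, not part of the original notebook: the helper name `is_valid_zip` and the temporary files are hypothetical and only serve the illustration.

```python
import pathlib
import tempfile
import zipfile


def is_valid_zip(path: pathlib.Path) -> bool:
    """Return True if `path` exists and looks like a readable ZIP archive.

    Hypothetical helper for illustration only.
    """
    if not path.is_file() or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt member, or None
        return archive.testzip() is None


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = pathlib.Path(tmp)

    # A small, valid archive
    good = tmp_dir / "good.zip"
    with zipfile.ZipFile(good, "w") as archive:
        archive.writestr("hello.txt", "hello")

    # A file with a .zip name but arbitrary contents
    bad = tmp_dir / "bad.zip"
    bad.write_bytes(b"this is not a zip archive")

    good_ok = is_valid_zip(good)
    bad_ok = is_valid_zip(bad)

print(good_ok, bad_ok)
```

A check like this could run before `extractall`, so that a truncated download is detected and re-fetched instead of only being reported.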

In [None]:
data_dir = pathlib.Path("data/maestro-v3.0.0")
zip_file_path = pathlib.Path("data/maestro-v3.0.0-midi.zip")
URL = "https://storage.googleapis.com/magentadata/datasets/maestro/v3.0.0/maestro-v3.0.0-midi.zip"

# If the dataset folder is missing, make sure its parent folder ("data") exists
if not data_dir.exists():
    data_dir.parent.mkdir(parents=True, exist_ok=True)

if not zip_file_path.exists():
    print(f"Downloading {URL}...")
    
    response = requests.get(URL, stream=True)
    # Check for request errors
    response.raise_for_status()
    
    # Save the .zip file to the disk
    with open(zip_file_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    
    print("Download Completed.")

# Extract the archive; a corrupted download is reported as an invalid ZIP file
if zip_file_path.exists():
    try:
        print(f"Extracting {zip_file_path}...")
        with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
            zip_ref.extractall('data')
        print("Extraction completed.")
    except zipfile.BadZipFile:
        print("Error: The file is not a valid ZIP file or it is corrupted.")
else:
    print("Error: ZIP file does not exist.")

In [None]:
# Count the MIDI files in the Maestro dataset (recursive=True lets ** descend into subfolders)
filenames = glob.glob(str(data_dir/'**/*.mid*'), recursive=True)
print('Number of files:', len(filenames))
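
The `**` wildcard in `glob` only descends into nested folders when `recursive=True` is passed; otherwise it behaves like a single `*` and matches exactly one folder level. The sketch below illustrates the difference on a throwaway directory tree. The folder and file names are made up for the example and do not come from the dataset.

```python
import glob
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)

    # Hypothetical layout: files at three different depths
    (root / "2004" / "nested").mkdir(parents=True)
    (root / "top.midi").touch()
    (root / "2004" / "one_level.midi").touch()
    (root / "2004" / "nested" / "two_levels.mid").touch()

    pattern = str(root / "**" / "*.mid*")

    # Without recursive=True, ** matches exactly one directory level
    flat = sorted(pathlib.Path(p).name for p in glob.glob(pattern))
    # With recursive=True, ** matches zero or more directory levels
    deep = sorted(pathlib.Path(p).name for p in glob.glob(pattern, recursive=True))

print(flat)  # ['one_level.midi']
print(deep)  # ['one_level.midi', 'top.midi', 'two_levels.mid']
```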

## Process a Single MIDI File

A single MIDI file is selected and its path is printed for inspection.

In [None]:
sample_file = filenames[0]
print(sample_file)

A *`PrettyMIDI`* object is created from the selected MIDI file.

In [None]:
pm_object = pretty_midi.PrettyMIDI(sample_file)

The selected MIDI file is played directly in the Notebook.
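
Playback is limited to the first seconds of the synthesized audio. A minimal sketch of the sample arithmetic follows, assuming the 16 kHz sampling rate set earlier and using a plain Python list as a stand-in for the synthesized waveform. The helper name `first_seconds` is hypothetical.

```python
SAMPLING_RATE = 16_000  # samples per second, as set earlier in the notebook


def first_seconds(waveform: list, seconds: int, rate: int = SAMPLING_RATE) -> list:
    """Keep at most `seconds` of audio from the start of `waveform`."""
    return waveform[: seconds * rate]


long_clip = [0.0] * (45 * SAMPLING_RATE)   # stand-in for 45 s of audio
short_clip = [0.0] * (10 * SAMPLING_RATE)  # stand-in for 10 s of audio

print(len(first_seconds(long_clip, 30)))   # 480000 samples = 30 s at 16 kHz
print(len(first_seconds(short_clip, 30)))  # 160000 samples; shorter clips are kept whole
```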

In [None]:
def display_audio(pm: pretty_midi.PrettyMIDI, seconds: int = 30) -> display.Audio:
    """Synthesize `pm` with FluidSynth and return a player for the first `seconds` seconds."""
    waveform = pm.fluidsynth(fs=SAMPLING_RATE)
    # Keep only the first `seconds` of the waveform to mitigate kernel resets
    waveform_short = waveform[:seconds * SAMPLING_RATE]
    return display.Audio(waveform_short, rate=SAMPLING_RATE)

In [None]:
display_audio(pm_object)

Information about the selected MIDI file is printed to the console: the number of instruments and the name of the first instrument.

In [None]:
print('Number of instruments:', len(pm_object.instruments))
instrument = pm_object.instruments[0]
instrument_name = pretty_midi.program_to_instrument_name(instrument.program)
print('Instrument name:', instrument_name)

## Note Extraction

The first 10 notes of the selected MIDI file are inspected. For each note, the pitch, name, and duration are printed to the console.
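
To make the relationship between a MIDI pitch number and its note name concrete, here is a small sketch of the common convention in which pitch 60 is named C4 and pitch 69 (A4) is tuned to 440 Hz. The helper names `pitch_to_name` and `pitch_to_frequency` are hypothetical; the notebook itself relies on `pretty_midi.note_number_to_name` for the name.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_to_name(pitch: int) -> str:
    """Map a MIDI pitch (0-127) to a name such as 'C4' (assumes 60 -> C4)."""
    if not 0 <= pitch <= 127:
        raise ValueError(f"MIDI pitch out of range: {pitch}")
    octave = pitch // 12 - 1
    return f"{NOTE_NAMES[pitch % 12]}{octave}"


def pitch_to_frequency(pitch: int) -> float:
    """Equal-tempered frequency in Hz, assuming A4 (pitch 69) = 440 Hz."""
    return 440.0 * 2 ** ((pitch - 69) / 12)


for pitch in (21, 60, 69, 108):
    print(pitch, pitch_to_name(pitch), round(pitch_to_frequency(pitch), 2))
```

Pitches 21 and 108 correspond to A0 and C8, the lowest and highest keys of a standard 88-key piano.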

In [None]:
for i, note in enumerate(instrument.notes[:10]):
    note_name = pretty_midi.note_number_to_name(note.pitch)
    duration = note.end - note.start
    print(f'{i}: pitch={note.pitch}, note_name={note_name}, duration={duration:.4f}')

### Function to Extract Notes from a MIDI File
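
The function in the next cell records, for every note, its `step` (time elapsed since the previous note started) and its `duration` (end time minus start time), after sorting the notes by start time. As a small worked example of that arithmetic, the sketch below uses three made-up notes rather than data from the dataset. The `Note` tuple and the `steps_and_durations` helper are hypothetical stand-ins for the `pretty_midi` note objects and the notebook's function.

```python
from typing import NamedTuple


class Note(NamedTuple):
    pitch: int
    start: float  # seconds
    end: float    # seconds


def steps_and_durations(notes: list) -> list:
    """Return one dict per note, ordered by start time."""
    ordered = sorted(notes, key=lambda n: n.start)
    rows = []
    # The first note has step 0 because there is no earlier note
    prev_start = ordered[0].start if ordered else 0.0
    for note in ordered:
        rows.append({
            "pitch": note.pitch,
            "step": note.start - prev_start,
            "duration": note.end - note.start,
        })
        prev_start = note.start
    return rows


# Made-up notes, deliberately out of order; two of them start together (a chord)
example = [Note(64, 0.5, 1.0), Note(60, 0.0, 0.75), Note(67, 0.5, 2.0)]
rows = steps_and_durations(example)

for row in rows:
    print(row)
```

Notes that start at the same time get a `step` of 0, which is how chords appear in this representation.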

In [None]:
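def midi_to_notes(midi_file: str) -> pd.DataFrame:
    """Return one row per note of the first instrument, ordered by start time.

    `step` is the time since the previous note's start; `duration` is end minus start.
    """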
    pm = pretty_midi.PrettyMIDI(midi_file)
    instrument = pm.instruments[0]
    notes = collections.defaultdict(list)
    
    # Sort notes by Start Time
    sorted_notes = sorted(instrument.notes, key=lambda n: n.start)
    prev_start = sorted_notes[0].start
    
    for current_note in sorted_notes:
        start_time = current_note.start
        end_time = current_note.end
        
        notes["pitch"].append(current_note.pitch)
        notes["start_time"].append(start_time)
        notes["end_time"].append(end_time)
        notes["step"].append(start_time - prev_start)
        notes["duration"].append(end_time - start_time)
        
        prev_start = start_time
    
    return pd.DataFrame({name: np.array(value) for name, value in notes.items()})

In [None]:
raw_notes = midi_to_notes(sample_file)
raw_notes.head()