# Custom GPT2

Below we show how to interact with the trained Custom GPT2 model.

Note that this model did not converge due to hardware limitations, and therefore **was not used in the blind test of this research.**

First, make sure that the model is already downloaded. To download it, follow the instructions in the **Download model weights** section of the [README file](../README.md).

> Note: Ensure that you are running this code in the right conda environment.

In [1]:
import sys
# imports function from src directory
sys.path.append('..')

# noinspection PyUnresolvedReferences
from token_sequence_to_midi import convert_token_sequence_to_notes_sequence

import json
import note_seq
from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast, pipeline



# Model and tokenizer paths
# Keep the trailing slash: TOKENIZER_PATH is built by string concatenation
MODEL_PATH = '../bin/checkpoints/checkpoint-595000/'
TOKENIZER_PATH = MODEL_PATH + 'tokenizer.json'

# Soundfont path
SF2_PATH = '../bin/soundfont/Yamaha-C5-Salamander-JNv5.1.sf2'
SAMPLE_RATE = 16000

# MIDI Primers path
MIDI_PRIMERS_PATH = '../resources/midi/'
TOKEN_SEQUENCE_PRIMERS_PATH = '../resources/tokens/primers_token_sequences.json'

PRIMERS = [
    'c_major_arpeggio.mid',    # 0
    'c_major_scale.mid',       # 1
    'clair_de_lune.mid',       # 2
    'fur_elise.mid',           # 3
    'moonlight_sonata.mid',    # 4
    'prelude_in_c_major.mid',  # 5
]

PRIMER_MIDI = PRIMERS[5]



## Set Primers location and render function

In [2]:
# Defines model and tokenizer from downloaded files
model = GPT2LMHeadModel.from_pretrained(MODEL_PATH)
tokenizer = PreTrainedTokenizerFast(tokenizer_file=TOKENIZER_PATH)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

primers_token_dict = {}

# Read the primers token-sequence JSON file (TOKEN_SEQUENCE_PRIMERS_PATH) as a dictionary
with open(TOKEN_SEQUENCE_PRIMERS_PATH) as json_file:
    primers_token_dict = json.load(json_file)

# Render MIDI in an audio display and plot its sequence
def render_note_sequence(notes):
    note_seq.play_sequence(notes, synth=note_seq.fluidsynth, sf2_path=SF2_PATH)
    note_seq.plot_sequence(notes)


# Render primer sequence from MIDI file name
def render_primer_sequence(midi_file):
    midi_file_path = MIDI_PRIMERS_PATH + midi_file
    print('File: ' + midi_file_path)

    notes = note_seq.midi_file_to_note_sequence(midi_file_path)

    render_note_sequence(notes)

    primer_token_sequence = primers_token_dict[midi_file]

    # removes piece_end from sequence so the model can generate after that
    primer_token_sequence = primer_token_sequence.replace('piece_end', '')

    return primer_token_sequence

# Render the primer as audio, plot, and token sequence

In [3]:
# Render the selected primer and print its token sequence
token_sequence = render_primer_sequence(PRIMER_MIDI)
print(token_sequence)

File: ../resources/midi/prelude_in_c_major.mid


fluidsynth: error: Unknown integer parameter 'synth.sample-rate'


piece_start time_shift=22 velocity=12 note_on=60 time_shift=29 note_on=64 time_shift=23 velocity=15 note_on=67 time_shift=22 note_on=72 time_shift=22 velocity=16 note_on=76 time_shift=23 note_off=67 velocity=15 note_on=67 time_shift=22 note_off=72 note_on=72 time_shift=23 note_off=76 note_on=76 time_shift=19 note_off=60 note_on=60 time_shift=22 note_off=64 note_on=64 time_shift=23 note_off=67 note_on=67 time_shift=19 note_off=72 velocity=16 note_on=72 time_shift=22 note_off=76 velocity=15 note_on=76 time_shift=19 note_off=67 velocity=16 note_on=67 time_shift=20 note_off=72 velocity=17 note_on=72 time_shift=22 note_off=76 velocity=16 note_on=76 time_shift=18 note_off=60 note_off=64 note_off=67 note_off=72 note_off=76 


# Convert tokens into token ids

The cell below illustrates how the token sequence is tokenized. Each musical event encoded in the token sequence is assigned a token id. The model uses these token ids to predict the next token in the sequence.

In [4]:
token_ids = tokenizer.encode(token_sequence, return_tensors="pt")
print(token_ids)

tensor([[295, 147,  19,  45, 176,  33, 152,  14,  39, 147,  60, 147,  11,  77,
         152,  38,  14,  39, 147,  59,  60, 152,  76,  77, 128,  44,  45, 147,
          32,  33, 152,  38,  39, 128,  59,  11,  60, 147,  76,  14,  77, 128,
          38,  11,  39, 134,  59,   9,  60, 147,  76,  11,  77, 122,  44,  32,
          38,  59,  76]])
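
The mapping between event tokens and ids can be illustrated with a minimal sketch. The vocabulary below is a made-up toy example, not the contents of the model's `tokenizer.json`, and `encode`/`decode` are hypothetical helpers that only mimic the one-id-per-event behavior described above.

```python
# Toy vocabulary: these ids are made up for illustration and do NOT match
# the ids stored in the model's tokenizer.json.
toy_vocab = {"piece_start": 0, "time_shift=22": 1, "velocity=12": 2, "note_on=60": 3}
id_to_token = {idx: tok for tok, idx in toy_vocab.items()}


def encode(token_sequence):
    """Map a whitespace-separated token sequence to a list of ids."""
    return [toy_vocab[token] for token in token_sequence.split()]


def decode(token_ids):
    """Map a list of ids back to a whitespace-separated token sequence."""
    return " ".join(id_to_token[idx] for idx in token_ids)


ids = encode("piece_start time_shift=22 velocity=12 note_on=60")
print(ids)          # [0, 1, 2, 3]
print(decode(ids))  # piece_start time_shift=22 velocity=12 note_on=60
```

Because every event maps to exactly one id, encoding followed by decoding returns the original sequence.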


# Pipeline definition
The cell below defines a text-generation pipeline, which is used to generate the token sequence. The pipeline combines the model and the tokenizer for a specific task (in this case, text generation).

In [5]:
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer)

# Generate and decode new tokens from the sequence

Below we can see the raw predictions of the model: the primer token ids followed by the newly generated token ids. Each token id maps back to a token, which is the musical event encoded in the token sequence. The `pipeline` defined in the previous cell automates this whole process of generating and decoding the token sequence.

In [6]:
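# Generate model token output.
# Note: temperature only takes effect when sampling is enabled (do_sample=True);
# unless the checkpoint's generation config enables it, generate() decodes
# greedily and ignores temperature.
predictions = model.generate(token_ids, max_length=100, temperature=0.5)

# More elaborate generation with beam search and other parameters
# (a repetition_penalty below 1.0 encourages repetition; values above 1.0 penalize it):
# predictions = model.generate(token_ids, max_length=1000, num_beams=5, no_repeat_ngram_size=10, early_stopping=True, repetition_penalty=0.5)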

print(predictions)

tensor([[295, 147,  19,  45, 176,  33, 152,  14,  39, 147,  60, 147,  11,  77,
         152,  38,  14,  39, 147,  59,  60, 152,  76,  77, 128,  44,  45, 147,
          32,  33, 152,  38,  39, 128,  59,  11,  60, 147,  76,  14,  77, 128,
          38,  11,  39, 134,  59,   9,  60, 147,  76,  11,  77, 122,  44,  32,
          38,  59,  76,  14,  39,  14,  45,  14,  77, 147,  32,  14,  33,  14,
          33,  14,  39,  14,  33,  14,  60, 147,  32,  14,  33,  14,  33,  14,
          45,  14,  77, 147,  44,  14,  45,  14,  45,  14,  45,  14, 102,  14,
         102,   4]])
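
Since the returned ids start with the primer, the newly generated part can be isolated by slicing off the primer length. The sketch below uses short plain-Python lists with illustrative ids rather than the tensors above.

```python
# Illustrative ids only: a 3-token primer followed by 4 generated tokens.
primer_ids = [295, 147, 19]
prediction_ids = [295, 147, 19, 45, 176, 33, 152]

# The output repeats the primer and then appends the continuation,
# so the new tokens are everything after the primer length.
new_ids = prediction_ids[len(primer_ids):]
print(new_ids)  # [45, 176, 33, 152]
```

With the tensors from the cells above, the equivalent slice would be `predictions[0, token_ids.shape[1]:]`, which can then be passed to `tokenizer.decode`.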


# Generate music with the model

Note that in the cell below the new tokens are generated from the primer tokens encoded in the `token_sequence` variable. The primer supplies the first tokens of the sequence, which the model then continues. A default Hugging Face 🤗 text-generation pipeline generates the new tokens in an auto-regressive manner: each new token is predicted from all the tokens before it.
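
Auto-regressive generation with a temperature can be sketched in a few lines. This is a toy illustration, not the pipeline's implementation: `toy_next_token_logits` is a hypothetical stand-in for the model, its scores are invented, and stopping at `piece_end` is an assumption of this sketch.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Sample a token from {token: logit} after dividing logits by temperature."""
    tokens = list(logits)
    scaled = [logits[tok] / temperature for tok in tokens]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - max_scaled) for s in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]


def toy_next_token_logits(sequence):
    """Hypothetical stand-in for the model: scores depend only on the last token."""
    if sequence[-1].startswith("time_shift"):
        return {"note_on=60": 2.0, "note_on=64": 1.0, "piece_end": -1.0}
    return {"time_shift=22": 2.0, "time_shift=23": 1.5, "piece_end": -1.0}


def generate(primer, max_length, temperature, seed=0):
    """Append one sampled token at a time until max_length or piece_end."""
    rng = random.Random(seed)
    sequence = list(primer)
    while len(sequence) < max_length:
        logits = toy_next_token_logits(sequence)
        next_token = sample_with_temperature(logits, temperature, rng)
        sequence.append(next_token)
        if next_token == "piece_end":
            break
    return sequence


generated = generate(["piece_start", "time_shift=22"], max_length=10, temperature=0.9)
print(" ".join(generated))
```

Lower temperatures sharpen the distribution toward the highest-scoring token, while higher temperatures make the less likely tokens more probable.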

# Convert token sequence to midi

With the token sequence generated, we can convert it into a MIDI file. The cell below shows how to render a token sequence into MIDI, displaying the audio player and the plot of the generated music, as well as the token sequence.

> Note: The audio might not work when running the environment in Docker. In any case, you can always download the generated MIDI file and listen to it in your favorite music player.
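
To give an intuition for the conversion, here is a minimal sketch of how event tokens could be turned into notes. It is not the implementation of `convert_token_sequence_to_notes_sequence`: `tokens_to_notes` is a hypothetical helper, times are kept in raw `time_shift` units because the real unit is not stated here, and the example string is a shortened, hand-edited variant of the primer.

```python
def tokens_to_notes(token_sequence):
    """Turn event tokens into (pitch, start, end, velocity) tuples.

    Times are expressed in raw time_shift units (an assumption of this sketch).
    """
    notes = []
    active = {}  # pitch -> (start_time, velocity) for notes still sounding
    current_time = 0
    current_velocity = 0
    for token in token_sequence.split():
        if "=" not in token:
            continue  # skip markers such as piece_start / piece_end
        event, value = token.split("=")
        value = int(value)
        if event == "time_shift":
            current_time += value
        elif event == "velocity":
            current_velocity = value
        elif event == "note_on":
            active[value] = (current_time, current_velocity)
        elif event == "note_off" and value in active:
            start, velocity = active.pop(value)
            notes.append((value, start, current_time, velocity))
    return notes


example = (
    "piece_start time_shift=22 velocity=12 note_on=60 "
    "time_shift=29 note_on=64 time_shift=23 note_off=60 note_off=64"
)
print(tokens_to_notes(example))
# [(60, 22, 74, 12), (64, 51, 74, 12)]
```

A `note_off` without a matching `note_on` is silently ignored here, which keeps the sketch tolerant of malformed generated sequences.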

In [7]:
# Generate model output; the pipeline already decodes token ids back into tokens
model_output = pipe(token_sequence, max_length=1024, temperature=0.9)
generated_tokens = model_output[0]['generated_text']

# Converts generated tokens to note sequence
generated_notes_sequence = convert_token_sequence_to_notes_sequence(generated_tokens)

# Render note sequence by displaying MIDI play and plot
render_note_sequence(generated_notes_sequence)
first_100_tokens = generated_tokens.split(' ')[:100]
print('First 100 tokens of generated sequence: ', first_100_tokens)

fluidsynth: error: Unknown integer parameter 'synth.sample-rate'


First 100 tokens of generated sequence:  ['piece_start', 'time_shift=22', 'velocity=12', 'note_on=60', 'time_shift=29', 'note_on=64', 'time_shift=23', 'velocity=15', 'note_on=67', 'time_shift=22', 'note_on=72', 'time_shift=22', 'velocity=16', 'note_on=76', 'time_shift=23', 'note_off=67', 'velocity=15', 'note_on=67', 'time_shift=22', 'note_off=72', 'note_on=72', 'time_shift=23', 'note_off=76', 'note_on=76', 'time_shift=19', 'note_off=60', 'note_on=60', 'time_shift=22', 'note_off=64', 'note_on=64', 'time_shift=23', 'note_off=67', 'note_on=67', 'time_shift=19', 'note_off=72', 'velocity=16', 'note_on=72', 'time_shift=22', 'note_off=76', 'velocity=15', 'note_on=76', 'time_shift=19', 'note_off=67', 'velocity=16', 'note_on=67', 'time_shift=20', 'note_off=72', 'velocity=17', 'note_on=72', 'time_shift=22', 'note_off=76', 'velocity=16', 'note_on=76', 'time_shift=18', 'note_off=60', 'note_off=64', 'note_off=67', 'note_off=72', 'note_off=76', '', 'velocity=15', 'note_on=67', 'velocity=15', 'note_o

# Unconditional generation

Unconditional generation is the process of generating music without a primer, or in other words, "from scratch". Uncomment the code cell below to perform unconditional generation.

In [8]:
# unconditional_tokens = pipe('piece_start', max_length=100, temperature=0.3)[0]['generated_text']
# unconditional_notes_sequence = convert_token_sequence_to_notes_sequence(unconditional_tokens)
#
# render_note_sequence(unconditional_notes_sequence)
# print(unconditional_tokens)