## The model definition is in another file.

In [40]:
###
#   Model Definition
###
import torch 
import torch.nn as nn
import math
import numpy as np

## Positional encoding: is it worth it?

According to a recent paper, transformers can infer positional information on their own for music generation. We could compare no positional encoding, sinusoidal positional encoding, and the book's positional encoding. SinusoidalPosEmbedding is more dynamic than PositionalEncoding, which has a fixed size.

Implemented in model_architecture.py, lines 13 to 45.

## Co-Pilot
## Key Differences
#### Precomputation vs. On-the-fly Calculation:

PositionalEncoding precomputes the positional encodings for a fixed maximum length (max_len) and stores them in self.P. This can be more efficient if the sequence lengths are known and fixed.
SinusoidalPosEmbedding computes the positional encodings on-the-fly based on the input tensor x. This can be more flexible for varying sequence lengths.
#### Usage:

PositionalEncoding is typically used by adding the precomputed positional encodings to the input embeddings.
SinusoidalPosEmbedding generates the positional encodings directly from the input tensor and can be used to concatenate or add to the input embeddings.
#### Implementation Details:

PositionalEncoding uses a fixed maximum length and creates a large tensor self.P to store the positional encodings.
SinusoidalPosEmbedding calculates the positional encodings dynamically using the input tensor x and does not require a fixed maximum length.
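A minimal PyTorch sketch of the two approaches described above, assuming the standard sine/cosine formula with base 10000 and an even d_model. The class names match the ones discussed, but the bodies are our own illustration; the real classes in model_architecture.py may differ (for example, they may add dropout or scaling).

```python
import math

import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Precomputes sinusoidal encodings up to max_len and stores them in self.P."""

    def __init__(self, d_model, max_len=1000):
        super().__init__()
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / d_model))                  # (d_model/2,)
        P = torch.zeros(1, max_len, d_model)
        P[0, :, 0::2] = torch.sin(position * div)
        P[0, :, 1::2] = torch.cos(position * div)
        self.register_buffer("P", P)  # follows .to(device) but is not trained

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        return x + self.P[:, : x.shape[1], :]  # only valid while seq_len <= max_len


class SinusoidalPosEmbedding(nn.Module):
    """Computes the same kind of encoding on the fly from the length of x."""

    def __init__(self, d_model):
        super().__init__()
        self.d_model = d_model

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        seq_len, device = x.shape[1], x.device
        position = torch.arange(seq_len, dtype=torch.float32, device=device).unsqueeze(1)
        div = torch.exp(torch.arange(0, self.d_model, 2, dtype=torch.float32, device=device)
                        * (-math.log(10000.0) / self.d_model))
        pe = torch.zeros(seq_len, self.d_model, device=device)
        pe[:, 0::2] = torch.sin(position * div)
        pe[:, 1::2] = torch.cos(position * div)
        return pe.unsqueeze(0)  # (1, seq_len, d_model): add to or concatenate with x
```

The trade-off shows up in the shapes: the precomputed table cannot serve sequences longer than max_len, while the on-the-fly version recomputes the table on every call but accepts any length.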

### Attention Block

In model_architecture.py, lines 59-44, we define the attention block, which contains:
- Multi-head attention: the queries, keys, and values are split across heads so that each head can learn different features. One paper recommended 4 heads for music generation.
- Causal masking: masks every position after the token being predicted, so the model cannot look forward in the sequence to predict a token.
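A minimal sketch of causal multi-head self-attention, assuming PyTorch's built-in nn.MultiheadAttention instead of a hand-written layer; `causal_mask` and `CausalSelfAttention` are illustrative names. The 4 heads follow the recommendation above, and d_model=64 is an arbitrary assumption.

```python
import torch
import torch.nn as nn


def causal_mask(seq_len, device=None):
    # True above the diagonal means "may not attend": position i only sees 0..i
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool, device=device), diagonal=1)


class CausalSelfAttention(nn.Module):
    def __init__(self, d_model=64, num_heads=4, dropout=0.1):
        super().__init__()
        # nn.MultiheadAttention splits d_model across num_heads (d_model % num_heads == 0)
        self.attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)

    def forward(self, x, key_padding_mask=None):  # x: (batch, seq_len, d_model)
        mask = causal_mask(x.shape[1], x.device)
        out, _ = self.attn(x, x, x, attn_mask=mask, key_padding_mask=key_padding_mask)
        return out
```

In a boolean attn_mask, True marks positions that may not be attended to, so the upper triangle blocks every future token: changing tokens 3 and 4 leaves the outputs at positions 0 to 2 untouched.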

### Transformer Block

In model_architecture.py, lines 99-160, we define our transformer blocks. Each block consists of:
- one multi-head attention layer,
- followed by normalization and dropout layers,
- a parameter that adds a second multi-head attention layer, with its own normalization and dropout, when the block is used as a decoder; this implements cross-attention,
- and finally an MLP layer with residual addition and normalization/dropout.
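One possible layout of such a block, sketched with PyTorch modules: post-norm residual connections (add, then LayerNorm), a ReLU MLP, and `is_decoder` as the flag that enables cross-attention. The class name, the flag name, d_ff=256 and the post-norm choice are assumptions, not necessarily what lines 99-160 do.

```python
import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    def __init__(self, d_model=64, num_heads=4, d_ff=256, dropout=0.1, is_decoder=False):
        super().__init__()
        self.is_decoder = is_decoder
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        if is_decoder:
            # Extra attention + norm that attends over the encoder output
            self.cross_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
            self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, memory=None, self_mask=None):
        # Self-attention -> dropout -> residual add -> norm
        h, _ = self.self_attn(x, x, x, attn_mask=self_mask)
        x = self.norm1(x + self.dropout(h))
        if self.is_decoder:
            # Cross-attention: queries from the decoder, keys/values from the encoder
            h, _ = self.cross_attn(x, memory, memory)
            x = self.norm2(x + self.dropout(h))
        # MLP -> dropout -> residual add -> norm
        return self.norm3(x + self.dropout(self.mlp(x)))
```

An encoder block ignores `memory`; a decoder block needs the encoder output passed as `memory` and usually a causal `self_mask`.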


        

### MusicalEmbeder
To process the embeddings we define a custom embedder that contains:
- an embedding layer for the token type, given the number of token classes (6) and the embedding dimension,
- a linear layer that extracts embeddings from the individual features,
- a concatenation of the feature embeddings with the token embeddings,
- a final linear layer that reduces the concatenated vector to a single embedding vector, acting as a sort of bottleneck.
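A minimal sketch of such an embedder, assuming each token is the 7-value vector defined in the data preparation below, with the token type in column 0 and six numeric features after it. We size the type embedding for 7 classes so that the padding type 6 used later also fits; all names and dimensions are assumptions, and the raw features (e.g. MIDI pitch) are fed to the linear layer unnormalized for simplicity.

```python
import torch
import torch.nn as nn


class MusicalEmbedder(nn.Module):
    def __init__(self, num_token_types=7, num_features=6, type_dim=32, feat_dim=32, d_model=64):
        super().__init__()
        self.type_embedding = nn.Embedding(num_token_types, type_dim)  # column 0: token type
        self.feature_proj = nn.Linear(num_features, feat_dim)           # columns 1..6: features
        self.bottleneck = nn.Linear(type_dim + feat_dim, d_model)       # concatenation -> one vector

    def forward(self, tokens):  # tokens: (batch, seq_len, 7)
        token_type = tokens[..., 0].long()   # nn.Embedding needs integer indices
        features = tokens[..., 1:].float()
        combined = torch.cat([self.type_embedding(token_type), self.feature_proj(features)], dim=-1)
        return self.bottleneck(combined)     # (batch, seq_len, d_model)
```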

### Encoder Block

In model_architecture.py, lines 165-212, we define our encoder blocks. Each consists of:
- an embedding layer,
- followed by an optional positional encoding, either the book-based or the sinusoidal one,
- then N transformer blocks.



        



### Decoder Block


In lines 214-250 we define the decoder blocks, which are similar to the encoder:
- positional embedding, with extra layers in the transformer blocks for cross-attention,
- followed by a fully connected layer.

### Class Encoder Decoder (ver1)


Ablation study of architectures. The first version is an encoder-decoder with one encoder made of two stacked encoder blocks, and two (and three) output blocks. These numbers were chosen based on the data: two types of input tokens and three outputs. We will compare it with later versions; it is expected to perform poorly. Evaluation will use token error rate.

This version is in lines 252 to 276: a simple encoder-decoder training setup.


### Class encoder Decoder v2

The best model from this comparison will be compared with another encoder-decoder that has two encoders and three decoders: one encoder per input and one decoder per output. The plan is to train each encoder with embeddings for its own token type, add the two encoded embeddings, and feed the sum into the three decoders, one decoder per voice.
The open question is how to manage the outputs and compute the loss. Probably the total loss will be the sum of the per-decoder losses.
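A minimal sketch of the "add the losses" idea, assuming each of the three decoders outputs logits over one categorical field per position (for example the pitch slot) and that padding positions can be excluded through a known index. `multi_voice_loss` is a hypothetical name.

```python
import torch
import torch.nn.functional as F


def multi_voice_loss(logits_per_voice, targets_per_voice, pad_index=None):
    """Sum of per-voice cross-entropy losses.

    logits_per_voice:  list of (batch, seq_len, vocab) tensors, one per decoder
    targets_per_voice: list of (batch, seq_len) integer tensors, one per voice
    """
    # -100 is F.cross_entropy's default ignore_index, i.e. "ignore nothing" here
    ignore = -100 if pad_index is None else pad_index
    total = 0.0
    for logits, target in zip(logits_per_voice, targets_per_voice):
        total = total + F.cross_entropy(
            logits.reshape(-1, logits.shape[-1]),  # (batch*seq_len, vocab)
            target.reshape(-1),                    # (batch*seq_len,)
            ignore_index=ignore,
        )
    return total
```

Summing gives each voice equal weight; a weighted sum would be the obvious variant if one voice turns out to be harder to learn.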

### Now, on to data preparation

In [41]:

## Code adapted from the DeepChoir GitHub repository and modified accordingly.
## Quantize the score so that every event sits on the sixteenth-note grid.
## Normalize all notes and scores.

import os
import numpy as np
from tqdm import trange
# from config import *
import music21 as m21
from copy import deepcopy
import pickle



# 'loader.py'
EXTENSION = ['.musicxml', '.xml', '.mxl']
DATASET_PATH = "gaby_deeplearning/dataset"

In [42]:
!pwd

/mnt/c/Users/gabri/OneDrive/Documents/multihead_attention_music


### Quantize score

In [43]:



def quant_score(score):
    # Snap every onset, and the end of every note/rest/chord, up to the next sixteenth (0.25 quarter lengths)
    for element in score.flatten():
        onset = np.ceil(element.offset/0.25)*0.25

        if isinstance(element, (m21.note.Note, m21.note.Rest, m21.chord.Chord)):
            offset = np.ceil((element.offset+element.quarterLength)/0.25)*0.25
            element.quarterLength = offset - onset

        element.offset = onset

    return score
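The arithmetic of quant_score on a single event, as a small pure-Python sketch (`quantize_event` is a hypothetical helper; it works on plain numbers instead of music21 elements):

```python
import math

GRID = 0.25  # one sixteenth note, in quarter lengths


def quantize_event(offset, quarter_length):
    """Round the onset and the end of an event up to the next sixteenth, like quant_score."""
    onset = math.ceil(offset / GRID) * GRID
    end = math.ceil((offset + quarter_length) / GRID) * GRID
    return onset, end - onset


print(quantize_event(0.3, 0.4))  # onset moves to 0.5, end to 0.75
```

Because both ends are rounded up, very short events can collapse to length 0 (`quantize_event(0.3, 0.1)` gives `(0.5, 0.0)`); melody_reader later skips elements whose quarterLength is 0.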



### Traverse the input directory (custom implementation of glob)

In [44]:


## Get the file names of all usable scores.

def get_filenames(input_dir):
    
    filenames = []

    # Traverse the path
    for dirpath, dirlist, filelist in os.walk(input_dir):
        # Traverse the list of files
        for this_file in filelist:
            # Ensure that suffixes in the training set are valid
            if input_dir==DATASET_PATH and os.path.splitext(this_file)[-1] not in EXTENSION:
                continue
            filename = os.path.join(dirpath, this_file)


            score = m21.converter.parse(filename)
            skippable=False
            # Added: skip scores in meters whose numerator is divisible by 3
            # (e.g. 3/4, 6/8) or equal to 2 (e.g. 2/4, 2/2).
            for part in score.parts:
                if skippable:
                    break
                for element in part.flatten():

                    if isinstance(element, m21.meter.TimeSignature):
                        if element.numerator%3 == 0 or element.numerator==2:

                            print('skipping %s time signature' % element.ratioString)
                            skippable=True
                            break
                        

            if skippable:
                continue

            filenames.append(filename)


    return filenames
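The time-signature filter in get_filenames drops more than 3/4. A hypothetical predicate with the same condition makes the kept meters easy to list:

```python
def is_skipped_meter(numerator):
    """Same condition as get_filenames: skip numerators divisible by 3 or equal to 2."""
    return numerator % 3 == 0 or numerator == 2


kept = [n for n in range(1, 13) if not is_skipped_meter(n)]
print(kept)
```

Among numerators 1 to 12 only 1, 4, 5, 7, 8, 10 and 11 survive: 4/4 scores are kept, while 3/4, 6/8, 2/4 and 2/2 are all dropped.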


### Normalize key signature.

In [45]:

# Key signature -> transposition gap, in semitones, that moves the tonic to C
def ks2gap(ks):

    if isinstance(ks, m21.key.KeySignature):
        ks = ks.asKey()

    try:
        # Identify the tonic; for minor keys, use the tonic of the parallel major
        if ks.mode == 'major':
            tonic = ks.tonic
        else:
            tonic = ks.parallel.tonic

    except Exception:
        # No usable key: no transposition (an int, like the normal return value)
        return 0

    # Interval from the tonic to C, expressed in semitones
    gap = m21.interval.Interval(tonic, m21.pitch.Pitch('C'))

    return gap.semitones
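ks2gap delegates the arithmetic to m21.interval.Interval. As a rough illustration of the number it returns, the hypothetical gap_to_c below computes the semitones from a tonic to C when both sit in the same octave. We assume this matches music21's behaviour for octave-less pitches (which we believe default to octave 4), but we have not checked every spelling, e.g. 'C-'.

```python
# Semitones of each natural note above C (hypothetical lookup table)
STEP_TO_PC = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}
# music21 spells sharps as '#' and flats as '-' in pitch names
ALTER = {'': 0, '#': 1, '-': -1}


def gap_to_c(tonic_name, octave=4):
    """Semitones from tonic_name to C in the same octave (negative = transpose down)."""
    step, accidental = tonic_name[0].upper(), tonic_name[1:]
    c_midi = 12 * (octave + 1)  # MIDI number of C in that octave (C4 = 60)
    tonic_midi = c_midi + STEP_TO_PC[step] + ALTER[accidental]
    return c_midi - tonic_midi


print(gap_to_c('G'))  # a piece in G would be moved down a perfect fifth
```

Under that assumption, a piece in G major gets a gap of -7 and is transposed down a perfect fifth to C.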



### Split a score at every key signature change (each section becomes a separate stream)

In [46]:

## Split score by key signature

def split_by_key(score):

    scores = []
    score_part = []
    ks_list = []
    ks = None
    ts = m21.meter.TimeSignature('c')
    pre_offset = 0

    for element in score.flatten():

        # If is key signature
        if isinstance(element, m21.key.KeySignature) or isinstance(element, m21.key.Key):

            # If is not the first key signature
            if ks!=None:

                scores.append(m21.stream.Stream(score_part))
                ks = element
                ks_list.append(ks)
                pre_offset = ks.offset
                ks.offset = 0
                new_ts = m21.meter.TimeSignature(ts.ratioString)
                score_part = [ks, new_ts]
            
            else:

                ks = element
                ks_list.append(ks)
                score_part.append(ks)

        # If is time signature
        elif isinstance(element, m21.meter.TimeSignature):

            element.offset -= pre_offset
            ts = element
            score_part.append(element)
        
        else:

            element.offset -= pre_offset
            score_part.append(element)

    scores.append(m21.stream.Stream(score_part))
    if ks_list==[]:
        ks_list = [m21.key.KeySignature(0)]
        
    gap_list = [ks2gap(ks) for ks in ks_list]

    return scores, gap_list
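The logic of split_by_key without music21, as a sketch on hypothetical (offset, kind, value) tuples: every key change after the first closes the current section, and the next section starts with that key, a copy of the last time signature (4/4 by default, like the 'c' default above), and offsets rebased to 0.

```python
def split_events_by_key(events):
    """events: (offset, kind, value) tuples sorted by offset; kind is 'key', 'ts' or 'note'."""
    sections, current = [], []
    seen_key, start, last_ts = False, 0.0, '4/4'
    for offset, kind, value in events:
        if kind == 'key' and seen_key:
            # Close the running section; the new one starts at this key change
            sections.append(current)
            start = offset
            current = [(0.0, 'key', value), (0.0, 'ts', last_ts)]
            continue
        if kind == 'key':
            seen_key = True
        elif kind == 'ts':
            last_ts = value
        current.append((offset - start, kind, value))  # rebase to the section start
    sections.append(current)
    return sections


events = [(0, 'key', 'C'), (0, 'ts', '3/4'), (0, 'note', 60),
          (4, 'key', 'G'), (4, 'note', 67), (5, 'note', 69)]
for section in split_events_by_key(events):
    print(section)
```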




### Calculating beat strengths

In [47]:

### Needs modification? How should beats be represented in REMI?
def beat_seq(ts):

    # Read time signature
    beatCount = ts.numerator
    beatDuration = 4/ts.denominator

    # Create beat sequence
    beat_sequence = [0]*beatCount*int(beatDuration/0.25)
    beat_sequence[0] += 1

    # Check if the numerator is divisible by 3 or 2
    medium = 0 

    if (ts.numerator%3)==0:
        medium = 3

    elif (ts.numerator%2)==0:
        medium = 2

    ##  debugging
    for idx in range(len(beat_sequence)):

        # print('time idx', idx)

        # Add 1 to each beat
        if idx%((beatDuration/0.25))==0:
            # print('adding 1 to beat sequence')
            beat_sequence[idx] += 1

        
        # Mark medium-weight beat (at every second or third beat)
        if (medium==3 and idx%((3*beatDuration/0.25))==0) or \
            (medium==2 and idx%((2*beatDuration/0.25))==0):
            # print('adding 1 to beat sequence because medium')	
            beat_sequence[idx] += 1
            
    return beat_sequence
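A worked check of beat_seq: a hypothetical variant that takes the time signature as two integers instead of a music21 TimeSignature, with the same arithmetic.

```python
def beat_seq_from_ratio(numerator, denominator):
    """Same logic as beat_seq, for a time signature given as numerator/denominator."""
    steps_per_beat = int((4 / denominator) / 0.25)  # sixteenth-note steps per beat
    seq = [0] * (numerator * steps_per_beat)
    seq[0] += 1                                     # extra weight on the downbeat
    medium = 3 if numerator % 3 == 0 else (2 if numerator % 2 == 0 else 0)
    for idx in range(len(seq)):
        if idx % steps_per_beat == 0:               # every beat
            seq[idx] += 1
        if medium and idx % (medium * steps_per_beat) == 0:  # every 2nd or 3rd beat
            seq[idx] += 1
    return seq


print(beat_seq_from_ratio(4, 4))
```

For 4/4 the beats get strengths 3, 1, 2, 1, which is the same pattern melody_reader hard-codes for beat_in_16ths values 4, 8, 12 and 16.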



### Melody Reader

##### Definition of tokens
- start_of_score_token = [0,0,0,0,0,0,0]
- Instrument_tokens =    [1,0,0,0,0,0,SATB]
- Start_sequence =       [2,0,0,0,0,0,0]
- Note_on =              [3,beat=0-15,beat_str=0-3,position=number_pos_measure,pitch=midi_range (128 = rest),duration=0-16,SATB]
- Chord_on =             [4,beat=0-15,beat_str=0-3,Chord_degree=1-7?,root=0-11,mode=1-7 (see Chord_modes),extension=5-10 (see Chord Extensions)]
- end_score =            [5,0,0,0,0,0,0]
- padding =              [6,0,0,0,0,0,0]

Chord_modes:
- 0 = unknown chord kind (the chord is skipped)
- 1 = major/dominant
- 2 = minor
- 3 = diminished
- 4 = augmented
- 5 = suspended-fourth
- 6 = suspended-second
- 7 = power 

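Chord Extensions:
- 5 = regular (triad)
- 6 = diminished seventh
- 7 = minor seventh above the root (dominant seventh, minor seventh, half-diminished seventh)
- 8 = major seventh above the root (major seventh, minor-major seventh, augmented-major seventh)
- 9 = 7+2: ninth on top of a minor seventh (dominant ninth, minor ninth)
- 10 = 8+2: ninth on top of a major seventh (major ninth, minor-major ninth)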
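convert_files frames every sequence with the tokens defined above through repeated insert/append calls. A sketch of a helper that does the same framing (wrap_sequence and the constant names are hypothetical):

```python
START_OF_SCORE = [0, 0, 0, 0, 0, 0, 0]
START_SEQUENCE = [2, 0, 0, 0, 0, 0, 0]
END_OF_SCORE = [5, 0, 0, 0, 0, 0, 0]


def instrument_token(voice):
    return [1, 0, 0, 0, 0, 0, voice]


def wrap_sequence(tokens, voices):
    """Start of score, one instrument token per voice, start of sequence,
    the tokens themselves (copied), then end of score."""
    return ([START_OF_SCORE.copy()]
            + [instrument_token(v) for v in voices]
            + [START_SEQUENCE.copy()]
            + [t.copy() for t in tokens]
            + [END_OF_SCORE.copy()])


print(wrap_sequence([[3, 0, 4, 3, 60, 4, 1]], voices=[1]))
```

With this helper, voices=[1] reproduces the alto framing, voices=[2, 3, 4] the three-voice mix, and voices=[] the chord sequence, which has no instrument token.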

In [48]:
###
## Different strings encountered as chord.chordKind.split('-')[0]:
# major, minor, diminished, augmented, suspended, half, power, dominant

def extract_chord_mode_extension(chord: m21.harmony.ChordSymbol):
    """Map a chord symbol's chordKind (e.g. 'minor-seventh') to (mode, extension).

    See Chord_modes and Chord Extensions above; (0, 0) means an unknown kind.
    """
    chord_kind = chord.chordKind.split('-')
    chord_kind_first = chord_kind[0]
    rest = chord_kind[1:]

    # Default to a plain triad, so that unexpected qualifiers
    # (e.g. 'major-sixth') still yield a defined extension
    extension = 5

    if chord_kind_first == 'major':
        mode = 1
        if rest[:1] == ['seventh']:
            extension = 8
        elif rest[:1] == ['ninth']:
            extension = 10

    elif chord_kind_first == 'dominant':
        mode = 1
        if rest[:1] == ['seventh']:
            extension = 7
        elif rest[:1] == ['ninth']:
            extension = 9

    elif chord_kind_first == 'minor':
        mode = 2
        if rest[:1] == ['seventh']:
            extension = 7
        elif rest[:2] == ['major', 'seventh']:
            extension = 8
        elif rest[:1] == ['ninth']:
            extension = 9
        elif rest[:2] == ['major', 'ninth']:
            extension = 10

    elif chord_kind_first == 'diminished':
        mode = 3
        if rest[:1] == ['seventh']:
            extension = 6

    elif chord_kind_first == 'augmented':
        mode = 4
        if rest[:1] == ['major']:
            extension = 8

    elif chord_kind_first == 'suspended':
        # sus4 vs sus2 (slicing is safe even without a qualifier)
        mode = 5 if rest[:1] == ['fourth'] else 6

    elif chord_kind_first == 'half':
        # half-diminished seventh
        mode = 3
        extension = 7

    elif chord_kind_first == 'power':
        mode = 7

    else:
        mode, extension = 0, 0

    return mode, extension
        


### Melody Reader
Input:
- music21 parts with melody and chords

Output:
- token lists: all tokens in order of appearance, melody tokens, and chord tokens

In [49]:

def melody_reader(melody_part, gap,intrument,scale,chord_counter=None, chord_degree_counter=None):

    # # Initialization
    # melody_txt = []
    # ts_seq = []
    # beat_txt = []
    # fermata_txt = []
    # chord_txt = []
    # chord_token = [0.]*12
    # fermata_flag = False
    melody_tokens = []

    ### new Tokens
    ## array of tokens as encountered
    aot_encounter =[]


    ## chord tokens
    chord_tokens = []

 

    ## Beat strength definition (4/4):
    ## strong beat (first beat) -> 3
    ## medium beat (third beat) -> 2
    ## weak beats (second and fourth) -> 1
    ## anything else (off-beat) -> 0

    # Read note and meta information from melody part
    for element in melody_part.flatten():
       

        ### Definition of tokens
        token_encountered = [0] * 7
        ## note: token_encounter[0]=3
        ## chord:  token_encounter[0]=4


        if isinstance(element, m21.note.Note):
            # midi pitch as note onset
            ## normalize to C
            token_encountered[0]=3
            midi_note = element.transpose(gap).pitch.midi

            
            beat_in_16ths = int(element.beat*4)
            ## first beat
            if beat_in_16ths == 4:
                beat_strength = 3
            elif beat_in_16ths == 8:
                beat_strength = 1
            elif beat_in_16ths == 12:
                beat_strength = 2
            elif beat_in_16ths == 16:
                beat_strength = 1
            else:
                beat_strength = 0
        
            ## is offset position?
            position = int(element.offset/0.25)
            
            duration = int(element.quarterLength*4)

            token_encountered[1]=position
            
            token_encountered[2]=beat_in_16ths
            token_encountered[3]=beat_strength
            token_encountered[4]=midi_note
            token_encountered[5]=duration
            token_encountered[6]=intrument







            # for f in element.expressions:
            #     if isinstance(f, m21.expressions.Fermata):
            #         fermata_flag = True
            #         break

        elif isinstance(element, m21.note.Rest):
            # Rests are note tokens with 128 in the pitch slot
            token_encountered[0]=3
            duration = int(element.quarterLength*4)
            position = int(element.offset/0.25)
            beat_in_16ths = int(element.beat*4)
            if beat_in_16ths == 4:
                beat_strength = 3
            elif beat_in_16ths == 8:
                beat_strength = 1
            elif beat_in_16ths == 12:
                beat_strength = 2
            elif beat_in_16ths == 16:
                beat_strength = 1
            else:
                beat_strength = 0
            token_encountered[1]=position
            
            token_encountered[2]=beat_in_16ths
            token_encountered[3]=beat_strength
            token_encountered[4]=128
            token_encountered[5]=duration
            token_encountered[6]=intrument
            
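        ## if chord (but not a chord symbol), take the highest note
        elif isinstance(element, m21.chord.Chord) and not isinstance(element, m21.harmony.ChordSymbol):
            # The highest note is emitted as a regular note token (type 3)
            token_encountered[0]=3
            notes = [n.transpose(gap).pitch.midi for n in element.notes]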
            notes.sort()
            midi_note = notes[-1]
                        
            beat_in_16ths = int(element.beat*4)
            ## first beat
            if beat_in_16ths == 4:
                beat_strength = 3
            elif beat_in_16ths == 8:
                beat_strength = 1
            elif beat_in_16ths == 12:
                beat_strength = 2
            elif beat_in_16ths == 16:
                beat_strength = 1
            else:
                beat_strength = 0
        
            ## is offset position?
            position = int(element.offset/0.25)
            
            duration = int(element.quarterLength*4)

            token_encountered[1]=position
            token_encountered[2]=beat_in_16ths
            token_encountered[3]=beat_strength
            token_encountered[4]=midi_note
            token_encountered[5]=duration
            token_encountered[6]=intrument


            
        elif isinstance(element, m21.harmony.ChordSymbol):
            
            
            # m21.harmony.ChordSymbol.roo
            ## STUDY HOW MANY DIFFERENT CHORD kinds
            token_encountered[0]=4
            element = element.transpose(gap)

            degree = scale.getScaleDegreeAndAccidentalFromPitch(element.root())
            root = element.root().midi %12
            beat_in_16ths = int(element.beat*4)

            ## paper extracts chords on strong beats, maybe not worth and better to have position
            ## as a token

            position= int(element.offset/0.25)
            if beat_in_16ths == 4:
                beat_strength = 3
            elif beat_in_16ths == 8:
                beat_strength = 1
            elif beat_in_16ths == 12:
                beat_strength = 2
            elif beat_in_16ths == 16:
                beat_strength = 1
            else:
                beat_strength = 0
            
            mode, extension = extract_chord_mode_extension(element)
            if mode==0:
                # Unknown chord kind: report it and skip the token
                print('unknown chord kind, skipping:', element.fullName)
                continue
            token_encountered[1]=beat_strength
                                
            token_encountered[2]=beat_in_16ths
            token_encountered[3]=degree[0]
            token_encountered[4]=root
            token_encountered[5]=mode
            token_encountered[6]=extension

            aot_encounter.append(token_encountered)
            chord_tokens.append(token_encountered)
            # position = int(element.offset/0.25)
            ### Collect statistics relevant for designing the tokenization.
            if(chord_counter is not None):
                if element.chordKind not in chord_counter:
                    chord_counter[element.chordKind] = 0
                chord_counter[element.chordKind] += 1
            if(chord_degree_counter is not None):
                ## getting chord degree

                degree = scale.getScaleDegreeFromPitch(element.root(), comparisonAttribute='pitchClass')
                degree2 = scale.getScaleDegreeAndAccidentalFromPitch(element.root())
                if degree is not None:
                    if degree not in chord_degree_counter['scale_degree_without']:
                        chord_degree_counter['scale_degree_without'][degree] = 0
                    chord_degree_counter['scale_degree_without'][degree] += 1
                else:
                    if 'None' not in chord_degree_counter['scale_degree_without']:
                        chord_degree_counter['scale_degree_without']['None'] = {}
                    if element.root() not in chord_degree_counter['scale_degree_without']['None']:
                        chord_degree_counter['scale_degree_without']['None'][element.root()] = 0
                    chord_degree_counter['scale_degree_without']['None'][element.root()] +=1
                if degree2 is not None:
                    if degree2[0] not in chord_degree_counter['scale_degree_acc']:
                        chord_degree_counter['scale_degree_acc'][degree2[0]] = {}
                    
                    
                    if degree2[1] is not None:
                        if degree2[1].name not in chord_degree_counter['scale_degree_acc'][degree2[0]]:

                            chord_degree_counter['scale_degree_acc'][degree2[0]][degree2[1].name]=0
                        chord_degree_counter['scale_degree_acc'][degree2[0]][degree2[1].name] += 1
                    else:
                        if 'None' not in chord_degree_counter['scale_degree_acc'][degree2[0]]:
                            chord_degree_counter['scale_degree_acc'][degree2[0]]['None'] = 0
                        chord_degree_counter['scale_degree_acc'][degree2[0]]['None'] += 1
                else:
                    print('degree2 is none')
            

 
            continue

        # Read the current time signature
        elif isinstance(element, m21.meter.TimeSignature):

            # ts_seq.append(element)
            continue

        else:
            continue
        
        if element.quarterLength==0:
            continue
        
        aot_encounter.append(token_encountered)
        
        melody_tokens.append(token_encountered)


    return aot_encounter,melody_tokens,chord_tokens
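The four identical if/elif chains in melody_reader implement one mapping from music21's 1-based beat number to a strength. A sketch of that mapping as a single lookup (beat_strength is a hypothetical helper name):

```python
# Strength per position in sixteenths, as hard-coded in melody_reader for 4/4
STRENGTH_BY_16TH = {4: 3, 8: 1, 12: 2, 16: 1}


def beat_strength(beat):
    """music21's beat is 1-based, so beats 1-4 of a 4/4 bar map to 4, 8, 12, 16."""
    beat_in_16ths = int(beat * 4)
    return STRENGTH_BY_16TH.get(beat_in_16ths, 0)  # off-beats get 0


print([beat_strength(b) for b in (1, 1.5, 2, 3, 4)])
```

Only the four on-beat positions of a 4/4 bar get a non-zero strength; other meters would need a different table.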


### Testing the code above / creating token sequences

Things to check: is chord position important? Should I encode chord position instead of beat strength, or is positional encoding enough?



In [50]:
from contextlib import redirect_stdout
import pickle



def convert_files(filenames, fromDataset=True):

    print('\nConverting %d files...' %(len(filenames)))

    ## EACH SEQUENCE IS A SONG

    complete_melody_with_chords_sequences= []
    complete_soprano_melody_sequences=[]
    complete_chords_sequences = []

    ## what is data corpus
    data_corpus = []

    complete_alto_sequences_mel= []
    complete_tenor_sequences_mel = []
    complete_bass_sequences_mel = []

    complete_three_sequences_voice_mix = []

    # study_chord_kind = {}
    scale = m21.scale.MajorScale('C')

    # study_chord_degrees = {}

    # if isinstance(study_chord_degrees, dict):
    #     study_chord_degrees['scale_degree_acc']={}
    #     study_chord_degrees['scale_degree_without']={}


    for filename_idx in trange(len(filenames)):

        # Read this music file
        filename = filenames[filename_idx]

        # Ensure that suffixes are valid
        if os.path.splitext(filename)[-1] not in EXTENSION:
            continue

        # try:
        # Read this music file
        score = m21.converter.parse(filename)

        # Read each part
        ## Loop over the score parts (4 parts in our case).
        # print('song # %d' %filename_idx)


        ## for each song we are creating an array that contains
        ## token arrays. 

        this_song_soprano_w_chords = []
        this_song_soprano_melody =[]
        this_song_alto_melody = []
        this_song_tenor_melody = []
        this_song_bass_melody = []

        this_three_voice_mix = []
        




                        


        for idx, part in enumerate(score.parts):
            

                
            part = quant_score(part)
            # print('before splitting. part for idx %d' %idx)
            ## No extra splits occur in practice; each part stays in one piece.
            splited_score, gap_list = split_by_key(part)
            # print("splited_score")
            # print(splited_score)
            # print('gap_list')
            # print(gap_list)

            # print(len(splited_score))
            # continue
            if idx==0:

                

                # Convert soprano 
                ## Note: even though this is a loop, len(splited_score) is always 1.
                for s_idx in range(len(splited_score)):
                    melody_part = splited_score[s_idx]

                    ## changed all codes
                    this_song_soprano_w_chords, this_song_soprano_melody, this_song_chords = melody_reader(melody_part, gap_list[s_idx], 0,scale,chord_counter=None,chord_degree_counter=None)
                    
                    ## Add the start-of-score, instrument and start-of-sequence tokens, plus the end-of-score token
                    this_song_soprano_w_chords.insert(0,[0]*7)
                    this_song_soprano_w_chords.insert(1,[1,0,0,0,0,0,1])
                    this_song_soprano_w_chords.insert(2, [2,0,0,0,0,0,0])
                    this_song_soprano_w_chords.append([5,0,0,0,0,0,0])

            
                    ## adding to mel
                    this_song_soprano_melody.insert(0,[0]*7)
                    this_song_soprano_melody.insert(1,[1,0,0,0,0,0,1])
                    this_song_soprano_melody.insert(2, [2,0,0,0,0,0,0])
                    this_song_soprano_melody.append([5,0,0,0,0,0,0])

                    ## adding to chord sequence
                    this_song_chords.insert(0,[0]*7)
                    this_song_chords.insert(1,[2,0,0,0,0,0,0])
                    this_song_chords.append([5,0,0,0,0,0,0])

               
                    
                    complete_melody_with_chords_sequences.append(this_song_soprano_w_chords.copy())
                    complete_soprano_melody_sequences.append(this_song_soprano_melody.copy())
                    complete_chords_sequences.append(this_song_chords.copy())
                    
                    
            else:
                    
                    # Convert alto, tenor and bass
                for s_idx in range(len(splited_score)):
                    melody_part = splited_score[s_idx]
                    ## TODO: change this code. Interestingly, will different instruments be trained at the same time?
                    complete,melody_part,_ = melody_reader(melody_part, gap_list[s_idx],idx,scale,None,None)


                if idx==1:
                    temp_alto_mel =melody_part.copy()
                    temp_alto_mel.insert(0,[0]*7)
                    temp_alto_mel.insert(1,[1,0,0,0,0,0,idx])
                    temp_alto_mel.insert(2, [2,0,0,0,0,0,0])
                    temp_alto_mel.append([5,0,0,0,0,0,0])
                    complete_alto_sequences_mel.append(temp_alto_mel.copy())


                elif idx==2:
                    temp_tenor_mel =melody_part.copy()
                    temp_tenor_mel.insert(0,[0]*7)
                    temp_tenor_mel.insert(1,[1,0,0,0,0,0,idx])
                    temp_tenor_mel.insert(2, [2,0,0,0,0,0,0])
                    temp_tenor_mel.append([5,0,0,0,0,0,0])
                    complete_tenor_sequences_mel.append(temp_tenor_mel.copy())


                elif idx==3:
                    temp_bass_mel = melody_part.copy()
                    temp_bass_mel.insert(0,[0]*7)
                    temp_bass_mel.insert(1,[1,0,0,0,0,0,idx])
                    temp_bass_mel.insert(2, [2,0,0,0,0,0,0])
                    temp_bass_mel.append([5,0,0,0,0,0,0])
                    complete_bass_sequences_mel.append(temp_bass_mel.copy())


                this_three_voice_mix+=melody_part

        
        # ## After all_parts have been read, add the end and start three voice
            
        # temp_three_voice_mix.sort(key=lambda x: x[1])


        this_three_voice_mix.insert(0,[0]*7)
        this_three_voice_mix.insert(1,[1,0,0,0,0,0,2])
        this_three_voice_mix.insert(2,[1,0,0,0,0,0,3])
        this_three_voice_mix.insert(3,[1,0,0,0,0,0,4])
        this_three_voice_mix.insert(4, [2,0,0,0,0,0,0])
        this_three_voice_mix.append([5,0,0,0,0,0,0])

        complete_three_sequences_voice_mix.append(this_three_voice_mix.copy())




    input_sequence_dict ={
        'all_parts': complete_melody_with_chords_sequences,
        'soprano': complete_soprano_melody_sequences,
        'chords': complete_chords_sequences
    } 
    ##output sequence is

    output_sequence_dict ={
        'alto': complete_alto_sequences_mel,
        'tenor': complete_tenor_sequences_mel,
        'bass': complete_bass_sequences_mel,
        'all_parts': complete_three_sequences_voice_mix
    }
    
 
    return input_sequence_dict, output_sequence_dict


    # data_corpus.append((input_sequence, output_sequence))
with open('output_convert.txt', 'w') as file:
    with redirect_stdout(file):
        

        ##amount of files

        files = get_filenames(DATASET_PATH)

        print('Amount of files extracted: %d' %len(files))

        # Split 80,10,10
        train_files = files[:int(len(files)*0.8)]
        val_files = files[int(len(files)*0.8):int(len(files)*0.9)]
        test_files = files[int(len(files)*0.9):]

        
        input_seq,output_seq = convert_files(train_files, fromDataset=False)
        with open('train_input_sequence.pkl', 'wb') as f:
            pickle.dump(input_seq, f)
        with open('train_output_sequence.pkl', 'wb') as f:
            pickle.dump(output_seq, f)

        input_seq,output_seq = convert_files(test_files, fromDataset=False)
        with open('test_input_sequence.pkl', 'wb') as f:
            pickle.dump(input_seq, f)
        with open('test_output_sequence.pkl', 'wb') as f:
            pickle.dump(output_seq, f)
    
        input_seq,output_seq = convert_files(val_files, fromDataset=False)
        with open('val_input_sequence.pkl', 'wb') as f:
            pickle.dump(input_seq, f)
        with open('val_output_sequence.pkl', 'wb') as f:
            pickle.dump(output_seq, f)
    
    


### Padding sequences, ready for dataloader

In [9]:
# padding sequences for transformer

import pickle
import numpy as np

def return_padded_sequences(song):
    
    # Determine the maximum sequence length
    max_length = max(len(seq) for seq in song)

    # Pad sequences to the maximum length
    padded_sequences = [
        seq + [[6, 0,0,0,0,0,0]] * (max_length - len(seq))  # Pad with the padding token [6, 0, 0, 0, 0, 0, 0]
        for seq in song
    ]

    return padded_sequences

type_of_data = ['train_', 'val_', 'test_']
all_songs_desafinau ={
    'input':{},
    'output':{}
}
for t in type_of_data:
    with open(t+'input_sequence.pkl', 'rb') as f:
        input_seq = pickle.load(f)
    with open(t+'output_sequence.pkl', 'rb') as f:
        output_seq = pickle.load(f)

    all_songs_desafinau['input'][t]={}
    all_songs_desafinau['output'][t]={}
    for key, i in input_seq.items():
        sequences_of_songs =return_padded_sequences(i)
        sos = np.array(sequences_of_songs)
        all_songs_desafinau['input'][t][key]=sos
        # print(song_counter)
    for key, i in output_seq.items():
        sequences_of_songs =return_padded_sequences(i)
        sos = np.array(sequences_of_songs)
        # print(sos.shape)
        all_songs_desafinau['output'][t][key]=sos
        # print(song_counter)


with open('all_songs_split.pkl', 'wb') as f:
    pickle.dump(all_songs_desafinau, f)
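Because every padded position has token type 6, a boolean padding mask can be derived from column 0. A minimal NumPy sketch (pad_and_mask is a hypothetical helper); in PyTorch attention, a key_padding_mask uses True for positions to ignore, which is the convention used here.

```python
import numpy as np

PAD = [6, 0, 0, 0, 0, 0, 0]  # padding token used by return_padded_sequences


def pad_and_mask(songs):
    """Pad token sequences to a common length; mask is True where a position is padding."""
    max_len = max(len(s) for s in songs)
    padded = np.array([s + [PAD] * (max_len - len(s)) for s in songs])  # (num_songs, max_len, 7)
    mask = padded[:, :, 0] == PAD[0]
    return padded, mask


songs = [[[0] * 7, [5, 0, 0, 0, 0, 0, 0]],
         [[0] * 7, [3, 1, 4, 3, 60, 4, 0], [5, 0, 0, 0, 0, 0, 0]]]
padded, mask = pad_and_mask(songs)
print(padded.shape, mask)
```

Note that return_padded_sequences pads each list to its own maximum length, so the train, validation and test arrays can end up with different sequence lengths.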


### Relevant data?

In [None]:
different_kind_chords ={'major': 8898,
 'minor': 4559,
 'minor-seventh': 604, 
 'diminished': 501, 
 'dominant-seventh': 902, 
 'diminished-seventh': 190, 
 'augmented': 45, 
 'suspended-fourth': 680, 
 'half-diminished-seventh': 412, 
 'minor-ninth': 14, 
 'augmented-major-seventh': 18,
 'power': 48, 
 'major-seventh': 126, 
 'suspended-second': 115, 
 'major-ninth': 16, 
 'minor-major-seventh': 2, 
 'dominant-ninth': 4, 
 'minor-major-ninth': 1}

## Legend


scale_degree_acc= {
    1: {
        'None': 3866, 
        'sharp': 126}, 
    6: {'None': 3227}, 
    4: {'None': 1507, 'sharp': 132}, 
    2: {'None': 2505}, 
    5: {'None': 2908, 'sharp': 180}, 
    7: {'None': 695, 'flat': 173}, 
    3: {'None': 1753, 'flat': 63}
    }


### Total
input_seq_pk = {
    'all_parts': 38275,
    'soprano': 21140,
    'chords': 18110,
}

output_seq_pk = {
    'alto': 24303,
    'tenor': 25068,
    'bass': 25331,
    'all_parts': 72752,
}


### Dataloader for complete sequence
- Input: [soprano, chord]
- Output: [alto, tenor, bass]


In [1]:
import torch
from torch.utils.data import Dataset, DataLoader
import pickle

## Input is mixed (soprano and chords)
## output is mixed alto, tenor and bass

class MusicDatasetComplete(Dataset):
    def __init__(self, input_sequences, output_sequences):
        self.input_sequences = input_sequences
        self.output_sequences = output_sequences

    def __len__(self):
        return len(self.input_sequences)

    def __getitem__(self, idx):
        input_seq = self.input_sequences[idx]
        output_seq = self.output_sequences[idx]
        return torch.tensor(input_seq, dtype=torch.float), torch.tensor(output_seq, dtype=torch.float)

## Testing dataset
# input_seq = pickle.load(open('train_input_sequence.pkl', 'rb'))
# output_seq = pickle.load(open('train_output_sequence.pkl', 'rb'))
# dataset = MusicDatasetComplete(input_seq['all_parts'], output_seq['all_parts'])


# dataloader = DataLoader(dataset, batch_size=3, shuffle=False)
# count=0
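A quick shape check of `MusicDatasetComplete` with random arrays. The class is repeated so the block runs on its own, and the sizes (4 sequences of length 5, with 7 input and 6 output features per step) are made up for illustration:

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class MusicDatasetComplete(Dataset):
    # Same class as above, repeated so this check is self-contained
    def __init__(self, input_sequences, output_sequences):
        self.input_sequences = input_sequences
        self.output_sequences = output_sequences

    def __len__(self):
        return len(self.input_sequences)

    def __getitem__(self, idx):
        return (torch.tensor(self.input_sequences[idx], dtype=torch.float),
                torch.tensor(self.output_sequences[idx], dtype=torch.float))

# Dummy padded data (made-up sizes)
inputs = np.random.randint(0, 10, size=(4, 5, 7))
outputs = np.random.randint(0, 10, size=(4, 5, 6))

loader = DataLoader(MusicDatasetComplete(inputs, outputs), batch_size=2, shuffle=False)
x, y = next(iter(loader))
print(x.shape, y.shape)  # torch.Size([2, 5, 7]) torch.Size([2, 5, 6])
```

Because the outputs are cast to float, they cannot be passed to `CrossEntropyLoss` as class indices; that would require `dtype=torch.long`.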



### Need to transform data to sequences...
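One way to turn each song into model-ready sequences is to cut its token array into fixed-length windows and pad the last one. A minimal sketch; the helper name `split_into_windows`, the window length, and the pad value 0 are assumptions for illustration, not the notebook's `return_padded_sequences`:

```python
import numpy as np

def split_into_windows(song_tokens, window_len, pad_value=0):
    """Hypothetical helper: cut a (num_tokens, num_features) array into
    fixed-length windows, padding the final window with pad_value."""
    song_tokens = np.asarray(song_tokens)
    num_tokens, num_features = song_tokens.shape
    # Ceiling division so no token is dropped
    num_windows = -(-num_tokens // window_len)
    padded = np.full((num_windows * window_len, num_features), pad_value, dtype=song_tokens.dtype)
    padded[:num_tokens] = song_tokens
    # Result shape: (num_windows, window_len, num_features)
    return padded.reshape(num_windows, window_len, num_features)

# Toy song: 10 tokens with 6 features each
song = np.arange(60).reshape(10, 6)
windows = split_into_windows(song, window_len=4)
print(windows.shape)  # (3, 4, 6)
```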

### Evaluation: token error rate (there has to be another way to evaluate objectively...)

In [2]:
# def calculate_token_error_rate(predicted, target):
#     """
#     Calculate the token error rate between predicted and target sequences.
    
#     Args:
#     predicted (torch.Tensor): The predicted token sequences.
#     target (torch.Tensor): The target token sequences.
    
#     Returns:
#     float: The token error rate.
#     """
#     # Ensure the predicted and target sequences have the same shape
#     assert predicted.shape == target.shape, "Shape mismatch between predicted and target sequences"
    
#     # Calculate the number of errors
#     errors = (predicted != target).sum().item()
    
#     # Calculate the total number of tokens
#     total_tokens = target.numel()
    
#     # Calculate the token error rate
#     token_error_rate = errors / total_tokens
    
#     return token_error_rate

# # Example usage
# predicted = torch.tensor([
#     [[18, 19, 20], [21, 22, 23], [24, 25, 26]],
#     [[27, 28, 29], [30, 31, 32], [33, 34, 35]]
# ])

# target = torch.tensor([
#     [[18, 19, 20], [21, 22, 23], [24, 25, 26]],
#     [[27, 28, 29], [30, 31, 32], [33, 34, 36]]  # Note the last token is different
# ])

# token_error_rate = calculate_token_error_rate(predicted, target)
# print(f"Token Error Rate: {token_error_rate:.4f}")
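A runnable variant of the commented-out token error rate, extended with an optional `ignore_index` so padded positions are not counted. The function name and the padding value 0 in the example are assumptions; use whatever value the padding step actually writes.

```python
import torch

def token_error_rate(predicted, target, ignore_index=None):
    """Fraction of target tokens the prediction gets wrong.

    Positions where target == ignore_index (e.g. padding) are excluded.
    """
    if predicted.shape != target.shape:
        raise ValueError("Shape mismatch between predicted and target sequences")
    if ignore_index is None:
        mask = torch.ones_like(target, dtype=torch.bool)
    else:
        mask = target != ignore_index
    total = mask.sum().item()
    if total == 0:
        return 0.0
    errors = ((predicted != target) & mask).sum().item()
    return errors / total

# Last column is padding (0) in the target
predicted = torch.tensor([[5, 6, 7, 3], [8, 9, 1, 0]])
target = torch.tensor([[5, 6, 7, 0], [8, 9, 2, 0]])

print(token_error_rate(predicted, target))                  # 0.25 (2 errors / 8 tokens)
print(token_error_rate(predicted, target, ignore_index=0))  # 1 error / 6 real tokens
```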

### Well then, after the model definition: model creation and training?

In [3]:
from gaby_deeplearning.model_architecture import *
# import wandb
import pickle

device = torch.device("cpu")

## Embedding size: the paper uses 128 and 256
hidden_size = [128, 256]

## Number of transformer blocks (hyperparameter to tune)
num_layers = [(1, 1), (2, 3)]

## Number of heads: paper 2 uses 4
num_heads = 4

## According to the paper, iterations = 110
num_iterations = 110

# Learning rate according to paper 2
learning_rate = 0.0005

# num_embeddings: an open question (the paper recommends 2048)
num_embeddings = [1024, 2048, 4096]

# "libro" (Spanish for "book") selects the textbook positional encoding; None disables it
positional_encodings = ["sinusoidal", "libro", None]

## Loading the pickle with inputs and outputs
all_inputs = pickle.load(open('all_songs_split.pkl', 'rb'))
all_inputs.keys()

dict_keys(['input', 'output'])
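The nested loops in the training cell sweep every combination of these lists. A minimal sketch with `itertools.product` that enumerates the same grid (values copied from this cell) and counts the runs; `num_embeddings` is defined but not swept in that loop, so it is left out here too:

```python
from itertools import product

# Values copied from the hyperparameter cell
hidden_size = [128, 256]
num_layers = [(1, 1), (2, 3)]
positional_encodings = ["sinusoidal", "libro", None]

grid = list(product(hidden_size, num_layers, positional_encodings))
print(len(grid))  # 12 = 2 hidden sizes x 2 layer settings x 3 encodings

for h_s, n_l, p_enc in grid:
    print(f"hidden_size={h_s}, num_layers={n_l}, positional_encoding={p_enc}")
```

At 100 epochs per configuration, that is 1200 epochs in total, which is worth keeping in mind on a CPU device.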


### Creating an embedder for multiple token types

In [4]:
train_dataset = MusicDatasetComplete(all_inputs['input']['train_']['all_parts'], all_inputs['output']['train_']['all_parts'])

NUM_INPUT_TOKENS = 7
NUM_OUTPUT_TOKENS = 6
FEATURES_INSIDE_ARRAY = 6

test_dataset = MusicDatasetComplete(all_inputs['input']['test_']['all_parts'], all_inputs['output']['test_']['all_parts'])
# NOTE: the validation set currently reuses the test split
val_dataset = MusicDatasetComplete(all_inputs['input']['test_']['all_parts'], all_inputs['output']['test_']['all_parts'])
train_dataloader = DataLoader(train_dataset, batch_size=48, shuffle=False)
test_dataloader = DataLoader(test_dataset, batch_size=48, shuffle=False)
val_dataloader = DataLoader(val_dataset, batch_size=48, shuffle=False)

# print(next(iter(train_dataloader)))
epochs = 100
learning_rate = 0.001  # overrides the 0.0005 from paper 2 set above

for h_s in hidden_size:
    for n_l in num_layers:
        for p_enc in positional_encodings:
            model = EncoderDecoderv1(
                number_of_tokens_input=NUM_INPUT_TOKENS,
                number_of_tokens_output=NUM_OUTPUT_TOKENS,
                feature_size=FEATURES_INSIDE_ARRAY,
                hidden_size=h_s,
                num_heads=num_heads,
                num_layers=n_l,
                positional_encoding=p_enc,
            )
            model.to(device)
            print(model)

            ## wandb
            # wandb.init(
            #     # set the wandb project where this run will be logged
            #     project="model_music_train",
            #     config={
            #         "learning_rate": learning_rate,
            #         "architecture": "CNN",
            #         "hidden_size": h_s,
            #         "num_layers": n_l,
            #         "num_heads": num_heads,
            #         "position_encoding": p_enc,
            #     },
            # )

            ## Cross-entropy loss, following the chord model
            criterion = torch.nn.CrossEntropyLoss()

            ## Optimizer
            optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

            for epoch in range(epochs):
                model.train()
                running_loss = 0.0
                for batch_indx, (input_seq, output_seq) in enumerate(train_dataloader):
                    input_seq, output_seq = input_seq.to(device), output_seq.to(device)

                    optimizer.zero_grad()

                    # Forward pass (the decoder also receives the target sequence)
                    decoded_seq = model(input_seq, output_seq)

                    # Flatten batch and time so the loss sees one row per position
                    decoded_seq = decoded_seq.view(-1, decoded_seq.size(-1))  # [batch_size * sequence_length, num_classes]
                    output_seq = output_seq.view(-1, output_seq.size(-1))  # [batch_size * sequence_length, feature_size]

                    # Compute the loss
                    # NOTE: CrossEntropyLoss expects class indices ([N], long) or class
                    # probabilities ([N, num_classes]); float feature vectors only fit
                    # the second form, and only if feature_size == num_classes
                    loss = criterion(decoded_seq, output_seq)

                    # Backward pass and optimize
                    loss.backward()
                    optimizer.step()

                    running_loss += loss.item()

                # Log the average loss for this epoch
                avg_loss = running_loss / len(train_dataloader)
                # wandb.log({"epoch": epoch + 1, "loss": avg_loss})
                print(f"Epoch [{epoch + 1}/{epochs}], Loss: {avg_loss:.4f}")

            # wandb.finish()

EncoderDecoderv1(
  (encoder): Encoder(
    (musical_embedding): MusicalEmbeddings(
      (feature_extractor_embeddings): Embedding(7, 128)
      (linear_reg_features): Linear(in_features=6, out_features=128, bias=True)
      (concat_layer): Linear(in_features=256, out_features=128, bias=True)
    )
    (pos_embedding): SinusoidalPosEmbedding()
    (blocks): ModuleList(
      (0): TransformerBlock(
        (norm1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
        (dropout1): Dropout(p=0.2, inplace=False)
        (attn1): AttentionBlock(
          (multihead_attention): MultiheadAttention(
            (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)
          )
        )
        (norm_mlp): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
        (dropout_mlp): Dropout(p=0.2, inplace=False)
        (mlp): Sequential(
          (0): Linear(in_features=128, out_features=512, bias=True)
          (1): ReLU()
          (2): Linear(i
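For reference, `torch.nn.CrossEntropyLoss` takes logits of shape `[N, num_classes]` (or `[N, num_classes, d1, ...]`) and targets that are either integer class indices of shape `[N]` or class probabilities with the same shape as the logits. A minimal sketch of the usual flattening for sequence outputs, assuming the model emits per-position logits and the targets are class indices; the sizes are made up:

```python
import torch

batch_size, seq_len, num_classes = 4, 16, 10

# Per-position logits, as a sequence model would output them
logits = torch.randn(batch_size, seq_len, num_classes)
# Class-index targets: one integer per position, dtype long
targets = torch.randint(0, num_classes, (batch_size, seq_len))

criterion = torch.nn.CrossEntropyLoss()

# Option 1: flatten batch and time -> logits [B*T, C], targets [B*T]
loss = criterion(logits.reshape(-1, num_classes), targets.reshape(-1))

# Option 2: keep the sequence axis and move classes to dim 1 -> [B, C, T]
loss_alt = criterion(logits.permute(0, 2, 1), targets)

print(loss.item(), loss_alt.item())  # same value, up to float rounding
```

With the default `reduction="mean"` both options average over all `B*T` positions, so they give the same loss.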


The statement "we apply different embeddings for the different features sharing the same axis" means that for each type of token (e.g., note_token and chord_token), you use separate embedding layers to transform the tokens into their respective embeddings. This is useful when different types of tokens have different meanings and should be embedded differently.

In your example, each token is characterized by the first index, and the other indices contain different information. You can use separate embedding layers for note_token and chord_token to ensure that each type of token is embedded appropriately.

#### Example implementation
Let's assume there are two types of tokens: note_token and chord_token. You can create a separate embedding layer for each type and apply them accordingly.

### Example code we might use

In [None]:
import torch
import torch.nn as nn

class MultiFeatureEmbedding(nn.Module):
    def __init__(self, num_note_embeddings, num_chord_embeddings, hidden_size):
        super(MultiFeatureEmbedding, self).__init__()
        self.note_embedding = nn.Embedding(num_note_embeddings, hidden_size)
        self.chord_embedding = nn.Embedding(num_chord_embeddings, hidden_size)

    def forward(self, tokens):
        # tokens: (batch_size, seq_len, feature_dim)
        batch_size, seq_len, feature_dim = tokens.shape

        # Split the feature axis: column 0 holds the note id, column 1 the chord id
        # (assumed layout for this example; the remaining columns are ignored)
        note_tokens = tokens[:, :, 0]
        chord_tokens = tokens[:, :, 1]

        # Apply the respective embeddings
        note_emb = self.note_embedding(note_tokens)
        chord_emb = self.chord_embedding(chord_tokens)

        # Combine the embeddings (e.g., concatenate or add)
        combined_emb = note_emb + chord_emb  # Example: element-wise addition

        return combined_emb

# Example usage
num_note_embeddings = 10
num_chord_embeddings = 10
hidden_size = 128

model = MultiFeatureEmbedding(num_note_embeddings, num_chord_embeddings, hidden_size)

# Example input tensor (batch_size, seq_len, feature_dim)
input_tokens = torch.tensor([
    [[1, 2, 3, 4], [2, 2, 3, 4]],  # Sample 1
    [[1, 2, 3, 4], [2, 2, 3, 4]]   # Sample 2
])

output = model(input_tokens)
print(output.shape)  # Output shape: (batch_size, seq_len, hidden_size)
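The example above embeds two id columns and adds them, whereas the explanation describes tokens whose first index says what type they are. A minimal sketch of that second reading, assuming column 0 is a type flag (0 = note, 1 = chord) and column 1 is the id within that type; the class name `TypeRoutedEmbedding` and this layout are hypothetical:

```python
import torch
import torch.nn as nn

class TypeRoutedEmbedding(nn.Module):
    """Hypothetical layout: tokens[..., 0] is the type (0 = note, 1 = chord),
    tokens[..., 1] is the id within that type."""

    def __init__(self, num_note_embeddings, num_chord_embeddings, hidden_size):
        super().__init__()
        self.note_embedding = nn.Embedding(num_note_embeddings, hidden_size)
        self.chord_embedding = nn.Embedding(num_chord_embeddings, hidden_size)

    def forward(self, tokens):
        token_type = tokens[..., 0]
        token_id = tokens[..., 1]
        is_chord = token_type == 1
        # Replace ids of the other type with 0 so each lookup stays in range
        note_ids = torch.where(is_chord, torch.zeros_like(token_id), token_id)
        chord_ids = torch.where(is_chord, token_id, torch.zeros_like(token_id))
        note_emb = self.note_embedding(note_ids)
        chord_emb = self.chord_embedding(chord_ids)
        # Keep the embedding from the table that matches each token's type
        return torch.where(is_chord.unsqueeze(-1), chord_emb, note_emb)

torch.manual_seed(0)
embed = TypeRoutedEmbedding(num_note_embeddings=10, num_chord_embeddings=5, hidden_size=8)
# (batch=2, seq_len=2, 2): [type, id] per token
tokens = torch.tensor([[[0, 3], [1, 2]], [[1, 4], [0, 7]]])
out = embed(tokens)
print(out.shape)  # torch.Size([2, 2, 8])
```

Here notes and chords live in separate embedding tables, so the same id (e.g. 2) maps to unrelated vectors depending on its type.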

In [None]:
## Interesting observation: different token types are embedded in different spaces.



<a href='https://youtu.be/Kujr55fNbGo?si=tw16yQxev55ezIDx'>Fundamental</a>

<small>The description has the lyrics.</small>