
[MusicBERT]: Need help understanding loop in preprocess.F method #47

Closed

aspil opened this issue Apr 11, 2022 · 2 comments

aspil commented Apr 11, 2022

Hello!

I'm trying to fine-tune the pretrained model on another dataset, but I'm stuck at the loop below.
I understand the final format of output_str_list, but I can't work out what this code actually does, so I was hoping you could provide an explanation.

output_str_list = []
# Stride between window start positions; a higher sample_overlap_rate
# means more overlap between consecutive windows.
sample_step = max(round(sample_len_max / sample_overlap_rate), 1)
# Start from a random negative offset so window boundaries differ between
# calls, then slide a window of up to sample_len_max octuples over e.
for p in range(0 - random.randint(0, sample_len_max - 1), len(e), sample_step):
    # Clip the window [p, p + sample_len_max) to the bounds of e.
    L = max(p, 0)
    R = min(p + sample_len_max, len(e)) - 1
    # Collect the bar indices (field 0 of each octuple) inside the window.
    bar_index_list = [e[i][0] for i in range(L, R + 1) if e[i][0] is not None]
    bar_index_min = 0
    bar_index_max = 0
    if len(bar_index_list) > 0:
        bar_index_min = min(bar_index_list)
        bar_index_max = max(bar_index_list)
    # Pick a random shift for the bar indices so the segment's bars are
    # re-based; this also augments the data across epochs.
    offset_lower_bound = -bar_index_min
    offset_upper_bound = bar_max - 1 - bar_index_max
    # to make bar index distribute in [0, bar_max)
    bar_index_offset = random.randint(
        offset_lower_bound, offset_upper_bound) if offset_lower_bound <= offset_upper_bound else offset_lower_bound
    # Keep octuples only while the shifted bar index stays below bar_max.
    e_segment = []
    for i in e[L: R + 1]:
        if i[0] is None or i[0] + bar_index_offset < bar_max:
            e_segment.append(i)
        else:
            break
    tokens_per_note = 8
    # Emit 8 <s> tokens, then one '<j-k>' token per octuple field (adding the
    # bar offset to field 0 only), then 7 </s> tokens; fairseq's binarizer
    # appends the final eos itself.
    output_words = (['<s>'] * tokens_per_note) \
        + [('<{}-{}>'.format(j, k if j > 0 else k + bar_index_offset) if k is not None else '<unk>') for i in e_segment for j, k in enumerate(i)] \
        + (['</s>'] * (tokens_per_note - 1)
           )  # tokens_per_note - 1 for append_eos functionality of binarizer in fairseq
    output_str_list.append(' '.join(output_words))

Also, in gen_genre.py, why do we want to sample the train set multiple times? Why do we need output_str_list four times?

Thanks in advance!

mlzeng (Collaborator) commented Apr 12, 2022

Hello @aspil,

  1. Some octuple token sequences from the LMD dataset are very long (more than 1024 octuple tokens), and the Transformer model cannot handle such long sequences because of GPU memory constraints. So we use a sliding-window random-sampling method to crop very long sequences into multiple shorter, possibly overlapping segments for pre-training (see the sketch after this list).
  2. We randomly select multiple segments to avoid overfitting and to avoid wasting training data. Randomly cropping long sequences on the fly during training could work even better, but it would require additional code.
  3. The model's performance won't degrade significantly if only one segment is used per sequence (n_time = 1).
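
For intuition, here is a minimal sketch of the sliding-window cropping idea. It is not MusicBERT's actual implementation (the names crop_segments, seq, max_len, and overlap_rate are illustrative); it only shows how a random negative start plus a fixed stride produces overlapping, clipped windows, which is what the loop in preprocess does before the bar-index shifting and tokenization:

import random

def crop_segments(seq, max_len, overlap_rate):
    """Crop a long sequence into overlapping windows of at most max_len items."""
    # Stride between window starts; overlap_rate > 1 makes windows overlap.
    step = max(round(max_len / overlap_rate), 1)
    # Random negative start so window boundaries differ between calls.
    start = -random.randint(0, max_len - 1)
    segments = []
    for p in range(start, len(seq), step):
        # Clip the window [p, p + max_len) to the sequence bounds.
        left = max(p, 0)
        right = min(p + max_len, len(seq))
        if right > left:
            segments.append(seq[left:right])
    return segments

# Example: a 10-item sequence cropped into windows of at most 4 items,
# stepping 2 items at a time (overlap_rate = 2).
random.seed(0)
print(crop_segments(list(range(10)), max_len=4, overlap_rate=2))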

Thanks for using MusicBERT!

aspil (Author) commented Apr 15, 2022

Thanks for the reply!

aspil closed this as completed Apr 15, 2022