
[MusicBERT]: Need help understanding loop in preprocess.F method #47

Closed

aspil opened this issue Apr 11, 2022 · 2 comments

aspil commented Apr 11, 2022

Hello!

I'm trying to fine-tune the pretrained model on another dataset, but I'm stuck at the loop below.
I understand the final format of output_str_list, but I can't work out what this code actually does, so I was hoping you could provide an explanation.

output_str_list = []
# Stride between window start positions; a higher sample_overlap_rate
# means more overlap between consecutive windows.
sample_step = max(round(sample_len_max / sample_overlap_rate), 1)
# Start from a random negative offset so window boundaries differ between
# calls, then slide a window of up to sample_len_max octuples over e.
for p in range(0 - random.randint(0, sample_len_max - 1), len(e), sample_step):
    # Clip the window [p, p + sample_len_max) to the bounds of e.
    L = max(p, 0)
    R = min(p + sample_len_max, len(e)) - 1
    # Collect the bar indices (field 0 of each octuple) inside the window.
    bar_index_list = [e[i][0] for i in range(L, R + 1) if e[i][0] is not None]
    bar_index_min = 0
    bar_index_max = 0
    if len(bar_index_list) > 0:
        bar_index_min = min(bar_index_list)
        bar_index_max = max(bar_index_list)
    # Pick a random shift for the bar indices so the segment's bars are
    # re-based; this also augments the data across epochs.
    offset_lower_bound = -bar_index_min
    offset_upper_bound = bar_max - 1 - bar_index_max
    # to make bar index distribute in [0, bar_max)
    bar_index_offset = random.randint(
        offset_lower_bound, offset_upper_bound) if offset_lower_bound <= offset_upper_bound else offset_lower_bound
    # Keep octuples only while the shifted bar index stays below bar_max.
    e_segment = []
    for i in e[L: R + 1]:
        if i[0] is None or i[0] + bar_index_offset < bar_max:
            e_segment.append(i)
        else:
            break
    tokens_per_note = 8
    # Emit 8 <s> tokens, then one '<j-k>' token per octuple field (adding the
    # bar offset to field 0 only), then 7 </s> tokens; fairseq's binarizer
    # appends the final eos itself.
    output_words = (['<s>'] * tokens_per_note) \
        + [('<{}-{}>'.format(j, k if j > 0 else k + bar_index_offset) if k is not None else '<unk>') for i in e_segment for j, k in enumerate(i)] \
        + (['</s>'] * (tokens_per_note - 1)
           )  # tokens_per_note - 1 for append_eos functionality of binarizer in fairseq
    output_str_list.append(' '.join(output_words))

Also, in gen_genre.py, why do we want to sample the train set multiple times? Why do we need output_str_list four times?

Thanks in advance!

mlzeng (Collaborator) commented Apr 12, 2022

Hello @aspil,

  1. Some octuple token sequences from the LMD dataset are very long (more than 1024 octuple tokens), and the Transformer model cannot handle such long sequences because of GPU memory constraints. So we use a sliding-window random-sampling method to crop very long sequences into multiple shorter, possibly overlapping segments for pre-training (see the sketch after this list).
  2. We randomly select multiple segments to avoid overfitting and to avoid wasting training data. Randomly cropping long sequences on the fly during training could work even better, but it would require additional code.
  3. The model's performance won't degrade significantly if only one segment is used per sequence (n_time = 1).
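
For intuition, here is a minimal sketch of the sliding-window cropping idea. It is not MusicBERT's actual implementation (the names crop_segments, seq, max_len, and overlap_rate are illustrative); it only shows how a random negative start plus a fixed stride produces overlapping, clipped windows, which is what the loop in preprocess does before the bar-index shifting and tokenization:

import random

def crop_segments(seq, max_len, overlap_rate):
    """Crop a long sequence into overlapping windows of at most max_len items."""
    # Stride between window starts; overlap_rate > 1 makes windows overlap.
    step = max(round(max_len / overlap_rate), 1)
    # Random negative start so window boundaries differ between calls.
    start = -random.randint(0, max_len - 1)
    segments = []
    for p in range(start, len(seq), step):
        # Clip the window [p, p + max_len) to the sequence bounds.
        left = max(p, 0)
        right = min(p + max_len, len(seq))
        if right > left:
            segments.append(seq[left:right])
    return segments

# Example: a 10-item sequence cropped into windows of at most 4 items,
# stepping 2 items at a time (overlap_rate = 2).
random.seed(0)
print(crop_segments(list(range(10)), max_len=4, overlap_rate=2))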

Thanks for using MusicBERT!

aspil (Author) commented Apr 15, 2022

Thanks for the reply!

aspil closed this as completed Apr 15, 2022