Inquiry on Audio Prompts Implementation in musicgen Model #451

Open
LiuZH-19 opened this issue Apr 22, 2024 · 0 comments

I am exploring the musicgen model and have two questions about how audio prompts are used in its architecture, particularly in relation to the cross_attention layers:

  1. Role of Audio Prompts: Is the audio prompt used as a cross-attention signal within the cross_attention layers of the musicgen model?
Printed module structure of musicgen:
  (transformer): StreamingTransformer(
    (layers): ModuleList(
      (0-47): 48 x StreamingTransformerLayer(
        (self_attn): StreamingMultiheadAttention(
          (out_proj): Linear(in_features=2048, out_features=2048, bias=False)
        )
        (linear1): Linear(in_features=2048, out_features=8192, bias=False)
        (dropout): Dropout(p=0.0, inplace=False)
        (linear2): Linear(in_features=8192, out_features=2048, bias=False)
        (norm1): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (dropout1): Dropout(p=0.0, inplace=False)
        (dropout2): Dropout(p=0.0, inplace=False)
        (layer_scale_1): Identity()
        (layer_scale_2): Identity()
        (cross_attention): StreamingMultiheadAttention(
          (out_proj): Linear(in_features=2048, out_features=2048, bias=False)
        )
        (dropout_cross): Dropout(p=0.0, inplace=False)
        (norm_cross): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (layer_scale_cross): Identity()
      )
    )
  )
  2. Request for Training Code: Could you provide examples or documentation on how to properly use audio prompts as model inputs during training?
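
To make the first question concrete, here is a toy sketch of the two ways I can imagine an audio prompt entering the decoder. It is not audiocraft code: `toy_next_token`, `generate_with_prefix`, and `generate_with_cross_attention` are hypothetical names, and the "model" is a deterministic stand-in. It only illustrates the distinction I am asking about, namely whether prompt tokens are a prefix of the generated sequence or a separate conditioning signal read by cross_attention.

```python
from typing import List, Sequence


def toy_next_token(context: Sequence[int], conditioning: Sequence[int] = ()) -> int:
    """Hypothetical stand-in for one decoder step (not the real musicgen LM).

    `context` plays the role of what self_attn sees; `conditioning` plays the
    role of what cross_attention would see.
    """
    return (sum(context) + 7 * sum(conditioning)) % 1024


def generate_with_prefix(prompt_tokens: Sequence[int], steps: int) -> List[int]:
    """Option A: the prompt tokens are the start of the sequence being continued."""
    seq = list(prompt_tokens)
    for _ in range(steps):
        seq.append(toy_next_token(seq))
    return seq


def generate_with_cross_attention(prompt_tokens: Sequence[int], steps: int) -> List[int]:
    """Option B: the prompt tokens are a separate signal; the output starts empty."""
    seq: List[int] = []
    for _ in range(steps):
        seq.append(toy_next_token(seq, conditioning=prompt_tokens))
    return seq


if __name__ == "__main__":
    prompt = [3, 5]  # made-up token ids standing in for an encoded audio prompt
    print(generate_with_prefix(prompt, steps=2))
    print(generate_with_cross_attention(prompt, steps=2))
```

In option A the output contains the prompt tokens followed by the continuation; in option B the output contains only newly generated tokens. Which of these (if either) matches what musicgen does with audio prompts is exactly what I would like to confirm.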

Thank you for your time and assistance.
