Clarification Regarding "All Patch Representations" in the Pre-training Diagram

I hope this message finds you well. I am working through the pre-training procedure for masked audio prediction described in your paper, and I have a question about the "All Patch Representations" shown in the pre-training figure.
Specifically, I am referring to the notation {M} in the "Label Predictor" block. Could you clarify which of the following readings is correct?
1. {M} means the masked feature is used directly, as is. In that case, I am concerned that the dimensions may not align properly.
2. Following the description in the paper, {M} is all zeros as a result of the masking process.
I am unsure whether my understanding is correct, and I would greatly appreciate your confirmation.
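
To make the two readings concrete, here is a minimal NumPy sketch of what I have in mind. Every shape and name in it (`NUM_PATCHES`, `PATCH_DIM`, `ENC_DIM`, the random linear map standing in for the encoder) is my own assumption for illustration and is not taken from the paper or the released code.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration (not from the paper).
NUM_PATCHES, PATCH_DIM, ENC_DIM = 8, 256, 768

rng = np.random.default_rng(0)
patches = rng.standard_normal((NUM_PATCHES, PATCH_DIM))  # raw patch features
mask = np.zeros(NUM_PATCHES, dtype=bool)
mask[[2, 5, 6]] = True  # positions that were masked out

# Stand-in for the encoder: a random linear map applied to unmasked patches only.
W = rng.standard_normal((PATCH_DIM, ENC_DIM))
encoded_unmasked = patches[~mask] @ W  # shape (5, ENC_DIM)


def fill_with_zeros(encoded, mask, enc_dim):
    """Zero reading: every {M} slot is an all-zero vector of encoder width."""
    out = np.zeros((mask.size, enc_dim))
    out[~mask] = encoded
    return out


def fill_with_raw_features(encoded, mask, patches):
    """Raw-feature reading: every {M} slot reuses the masked feature unchanged."""
    out = np.zeros((mask.size, encoded.shape[1]))
    out[~mask] = encoded
    out[mask] = patches[mask]  # raises ValueError when PATCH_DIM != ENC_DIM
    return out


all_repr = fill_with_zeros(encoded_unmasked, mask, ENC_DIM)
print(all_repr.shape)  # (8, 768)

try:
    fill_with_raw_features(encoded_unmasked, mask, patches)
except ValueError as err:
    print("shape mismatch:", err)
```

Under these assumed sizes, the zero-filled version yields a consistent (8, 768) input for the label predictor, while reusing the raw masked features fails because 256 does not match 768. This is the dimension concern I describe above.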

Thank you for your time, and I look forward to your response.

Clarification Regarding "All Patch Representations" in the Pre-training Diagram #1689
