a question about image mask #1

aaronma2020 · 2021-12-16T07:32:24Z

In train.py (lines 103-110):

103: # for image
104: _visual_mask = torch.zeros((batch_size, visual_token_num), dtype=torch.float32, device=device)
105: # need to mask token content in selected_idx for prediction/generation
106: num_masks = random.randint(max(1, int(0.1 * visual_token_num)), visual_token_num)
107: selected_idx = random.sample(range(visual_token_num), num_masks)
108: _visual_mask[:, selected_idx] = 1
109: mask_position = (_visual_mask == 1).to(torch.long).view(-1)
110: mask_position = mask_position.nonzero().squeeze()

My understanding is that '_visual_mask = 1' means the model can see that position, and '_visual_mask = 0' means it cannot. The code above randomly samples positions and sets them to 1, so it is choosing which positions of the grid (8*8) the model can see. The positions that actually need to be masked are the ones where _visual_mask equals 0. So I think line 109 should be changed to
mask_position = (_visual_mask == 0).to(torch.long).view(-1)
Is this right?
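To make the two readings concrete, here is a minimal sketch in plain Python that mirrors what lines 104-110 compute, so it runs without PyTorch. The helper name `flat_positions` and the small sizes are assumptions for illustration only; they are not taken from train.py, and the sketch does not settle which convention the repository intends.

```python
import random

def flat_positions(batch_size, visual_token_num, selected_idx, value):
    """Return flattened indices where the (batch_size x visual_token_num) mask equals `value`.

    Hypothetical helper mirroring
    `(_visual_mask == value).to(torch.long).view(-1).nonzero().squeeze()`.
    """
    selected = set(selected_idx)
    # Every row gets the same columns set to 1, as in `_visual_mask[:, selected_idx] = 1`.
    row = [1 if j in selected else 0 for j in range(visual_token_num)]
    flat = row * batch_size  # row-major flattening, like .view(-1)
    return [i for i, m in enumerate(flat) if m == value]

batch_size, visual_token_num = 2, 8  # small illustrative sizes, not the repository's values
random.seed(0)
num_masks = random.randint(max(1, int(0.1 * visual_token_num)), visual_token_num)
selected_idx = random.sample(range(visual_token_num), num_masks)

ones = flat_positions(batch_size, visual_token_num, selected_idx, 1)   # line 109 as written
zeros = flat_positions(batch_size, visual_token_num, selected_idx, 0)  # the proposed change

print("selected_idx:", sorted(selected_idx))
print("positions where mask == 1:", ones)
print("positions where mask == 0:", zeros)
```

The two index sets are complements of each other, so the change only swaps which group is treated as masked. One detail worth noting: `random.randint` includes its upper bound, so `num_masks` can equal `visual_token_num`. In that case the `== 0` set is empty, while the `== 1` set always contains at least one position per sample.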

aaronma2020 closed this as completed Dec 17, 2021
