input_mask behavior #27

Open
AliOskooeiTR opened this issue May 19, 2021 · 0 comments

I have a question about how input_mask works in RoutingTransformerLM. I have been using a random mask (with causal=False), as in MLM, and playing with the masking ratio, but the ratio does not seem to affect how the model learns. I even went to extremes and masked 90% of the inputs, yet the model continued to learn rapidly. I am training the LM with the HuggingFace Trainer. My compute_loss method is copied below for reference. I have checked the mask itself and the input data, and they are fine.

# imports assumed at module level of the Trainer subclass
import random

import torch
import torch.nn.functional as F

def compute_loss(self, model, inputs):

    model_dim = self.args.model_dim
    model_seq_len = self.args.model_seq_len

    source = inputs["input_ids"].to(self.args.device)

    # start with an all-True mask, then hide a random subset of sequence positions
    input_mask = torch.ones_like(source).bool().to(self.args.device)
    masked_tokens = random.sample(
        range(source.shape[1]),
        int(self.args.mask_ratio * source.shape[1])
    )
    # note: this only masks the chosen positions in the first batch row
    input_mask[0, masked_tokens] = False

    output, aux_loss = model(
        source,
        input_mask=input_mask,
        return_loss=True
    )

    # cross-entropy over the full sequence, plus the auxiliary routing loss
    loss = F.cross_entropy(
        output.transpose(1, 2),
        source
    ) + aux_loss

    return loss.squeeze()
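For completeness, here is a minimal self-contained sketch of the setup I am describing, but with the masked positions applied to every row of the batch rather than only row 0, and with the loss restricted to the hidden positions. The standalone function shape and the ignore_index handling are my own assumptions for illustration; only the model call itself mirrors the snippet above, and I am not claiming this is how the library interprets input_mask internally.

import random

import torch
import torch.nn.functional as F

def mlm_style_loss(model, source, mask_ratio, device):
    # hide the same randomly chosen sequence positions in every batch row
    input_mask = torch.ones_like(source, dtype=torch.bool, device=device)
    masked_positions = random.sample(
        range(source.shape[1]),
        int(mask_ratio * source.shape[1])
    )
    input_mask[:, masked_positions] = False

    # same call pattern as in compute_loss above
    output, aux_loss = model(
        source,
        input_mask=input_mask,
        return_loss=True
    )

    # score only the hidden positions; visible positions are ignored by the loss
    targets = source.clone()
    targets[input_mask] = -100  # -100 is the default ignore_index of F.cross_entropy
    loss = F.cross_entropy(
        output.transpose(1, 2),
        targets,
        ignore_index=-100
    ) + aux_loss
    return loss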