Understanding adaptive-span loss #13
Comments
It's a model parameter, so it will be updated by `optimizer.step()` like any other parameter.
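A minimal sketch of that mechanism (toy shapes and hyperparameter values, not the repo's actual classes): a zero-initialized `nn.Parameter` still receives a nonzero gradient from any loss that depends on it, so `optimizer.step()` moves it away from its initial value.

```python
import torch
import torch.nn as nn

# Zero-initialized parameter, as with init_val = 0 in the config.
current_val = nn.Parameter(torch.zeros(4))
optimizer = torch.optim.SGD([current_val], lr=0.1)

loss_coeff, max_span = 2e-6, 1024          # illustrative values only
loss = loss_coeff * max_span * current_val.mean()

loss.backward()
print(current_val.grad)   # loss_coeff * max_span / 4 per element: nonzero
optimizer.step()
print(current_val)        # no longer all zeros
```

In the full model, the main task loss also reaches `current_val` through the soft attention mask built from it, which is what can push the spans above zero despite the span penalty.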
Thanks for your reply. I also wanted to ask how the FLOPS reported in the paper were measured.
Hi,
We just counted all the FLOPS in the model. For example, a linear layer has roughly `2 * in_features * out_features` FLOPS per input vector.
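A minimal sketch of that kind of shape-based count (the function name and the multiply-add factor of 2 are illustrative assumptions, not the repo's code):

```python
def linear_flops(in_features: int, out_features: int, tokens: int = 1) -> int:
    # One multiply + one add per weight entry, per token (bias ignored).
    return 2 * in_features * out_features * tokens

# e.g. a 512 -> 2048 projection applied over a 256-token sequence:
print(linear_flops(512, 2048, tokens=256))  # 536,870,912 FLOPS
```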
Thanks for your reply.
So in this case, I don't think memory consumption is being reduced, since the dimensions have now risen many-fold and more FLOPS are required. Am I right, or am I missing something? For now, I've removed this operation.
These results were noted during inference. Did you measure FLOPS (as in the paper) during training, since spans only change during training? My spans are changing during training, but the FLOPS stay the same. Is it because the trimming operations alone are responsible for reducing FLOPS?
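One way to see why a measured count would not change unless the computation is trimmed: a toy per-query estimate (the function and the count structure are my own illustration, under the assumption that FLOPS are derived from the tensor shapes actually used in the attention matmuls).

```python
def attention_flops_per_query(span: int, head_dim: int) -> int:
    # QK^T over `span` keys plus the weighted sum over `span` values:
    # roughly 2 * span * head_dim multiply-adds for each of the two matmuls.
    return 2 * (2 * span * head_dim)

# If the key/value cache stays padded to max_span regardless of the learned
# span, this count never changes; it only drops once the computation is
# actually trimmed to the current span.
print(attention_flops_per_query(span=1024, head_dim=64))  # untrimmed
print(attention_flops_per_query(span=130, head_dim=64))   # trimmed
```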
As noted in the paper, the FLOPS depend on the attention span, so they only drop when the computation is actually limited to the current span.
Hi,
Sorry to bother you. I have gone through the paper several times, and I've also looked at the code many times.
I just had one query about the adaptive-span loss. Here's what I interpreted:
This parameter

```python
self.current_val = nn.Parameter(torch.zeros(*shape) + init_val)
```

is responsible for calculating the loss, the mask, and the span. In this case, the parameter will be initialized with all zeros, since `init_val` is kept as 0 in your config (so the mean of the parameter's values is also 0). My question is: how does this parameter get updated?
When I call `adaptive_span.get_loss()`, it in turn computes

```python
self._loss_coeff * self._max_span * self._mask.current_val.mean()
```

which will also return 0. When I call `adaptive_span.clamp_param()`, nothing happens, since all the values inside the parameter were initialized to 0. These are the only two function calls happening inside the train method. Can you please point out what I am missing?
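Putting the exchange together, here is a condensed, runnable sketch (class, shapes, and hyperparameters are simplified from the repo; the fake `task_loss` stands in for the gradient that, in the real model, reaches `current_val` through the soft attention mask):

```python
import torch
import torch.nn as nn

class ToyAdaptiveSpan(nn.Module):
    def __init__(self, max_span=1024, loss_coeff=2e-6, init_val=0.0, shape=(8, 1, 1)):
        super().__init__()
        self._max_span = max_span
        self._loss_coeff = loss_coeff
        self.current_val = nn.Parameter(torch.zeros(*shape) + init_val)

    def get_loss(self):
        # 0 at initialization, but differentiable w.r.t. current_val.
        return self._loss_coeff * self._max_span * self.current_val.mean()

    def clamp_param(self):
        # Keeps the soft-span fraction in [0, 1]; a no-op while it is 0.
        self.current_val.data.clamp_(0, 1)

span = ToyAdaptiveSpan()
opt = torch.optim.SGD(span.parameters(), lr=0.1)

# Stand-in for the task loss: pretend longer spans help, so the gradient
# pushes current_val upward, against the span penalty from get_loss().
task_loss = -10.0 * span.current_val.mean()
(task_loss + span.get_loss()).backward()
opt.step()
span.clamp_param()
print(span.current_val.mean())   # > 0: the parameter has moved off zero
```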