Weights of the Sinusoidal Position Embedding should be static.
But after model.from_pretrained, the weights are changed, even though the weights of the position embedding do not appear in the state dict.
Notes
I have the following Sinusoidal Position Embedding implementation
```python
# MultiMolecule
# Copyright (C) 2024-Present MultiMolecule
# Copyright (C) 2020 The Facebook AI Research Team Authors
# Copyright (C) 2020 The HuggingFace Inc. team.

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.

# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

from __future__ import annotations

import math

import torch
import torch.onnx.operators
from torch import Tensor, nn


class SinusoidalEmbedding(nn.Embedding):
    """
    This module produces sinusoidal positional embeddings of any length.

    We don't want to save the weight of this embedding since it's not trained
    (deterministic) and it can be huge.

    Padding symbols are ignored.

    These embeddings get automatically extended in forward if more positions
    are needed.
    """

    # _is_hf_initialized = True

    def __init__(self, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None):
        weight = self.get_embedding(num_embeddings, embedding_dim, padding_idx)
        super().__init__(num_embeddings, embedding_dim, padding_idx, _weight=weight.detach(), _freeze=True)

    def update_weight(self, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None):
        weight = self.get_embedding(num_embeddings, embedding_dim, padding_idx).to(
            dtype=self.weight.dtype, device=self.weight.device  # type: ignore[has-type]
        )
        self.weight = nn.Parameter(weight.detach(), requires_grad=False)

    @staticmethod
    def get_embedding(num_embeddings: int, embedding_dim: int, padding_idx: int | None = None) -> Tensor:
        """
        Build sinusoidal embeddings.

        This matches the implementation in tensor2tensor, but differs slightly
        from the description in Section 3.5 of "Attention Is All You Need".
        """
        half_dim = embedding_dim // 2
        emb = torch.exp(torch.arange(half_dim, dtype=torch.float) * -(math.log(10000) / (half_dim - 1)))
        emb = torch.arange(num_embeddings, dtype=torch.float).unsqueeze(1) * emb.unsqueeze(0)
        emb = torch.cat([torch.sin(emb), torch.cos(emb)], dim=1).view(num_embeddings, -1)
        if embedding_dim % 2 == 1:
            # zero pad
            emb = torch.cat([emb, torch.zeros(num_embeddings, 1)], dim=1)
        if padding_idx is not None:
            emb[padding_idx, :] = 0
        return emb

    @staticmethod
    def make_positions(tensor, padding_idx: int):
        """
        Replace non-padding symbols with their position numbers.

        Position numbers begin at padding_idx+1. Padding symbols are ignored.
        """
        # The series of casts and type-conversions here are carefully
        # balanced to both work with ONNX export and XLA. In particular XLA
        # prefers ints, cumsum defaults to output longs, and ONNX doesn't know
        # how to handle the dtype kwarg in cumsum.
        mask = tensor.ne(padding_idx).int()
        return (torch.cumsum(mask, dim=1).type_as(mask) * mask).long() + padding_idx

    def forward(self, input: Tensor):
        _, seq_len = input.shape[:2]
        max_pos = seq_len
        if self.padding_idx is not None:
            max_pos += self.padding_idx + 1
        if max_pos > self.weight.size(0):
            # expand embeddings if needed
            self.update_weight(max_pos, self.embedding_dim, self.padding_idx)
        positions = self.make_positions(input, self.padding_idx)
        return super().forward(positions)
```
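As a sanity check, the position logic in make_positions can be exercised on its own (a minimal sketch; the token values below are made up for illustration):

```python
# Standalone check of the make_positions logic: non-padding tokens receive
# positions starting at padding_idx + 1, padding slots stay at padding_idx.
import torch

padding_idx = 1
tokens = torch.tensor([[5, 7, 9, 1, 1]])  # the trailing 1s are padding
mask = tokens.ne(padding_idx).int()
positions = (torch.cumsum(mask, dim=1).type_as(mask) * mask).long() + padding_idx
print(positions.tolist())  # [[2, 3, 4, 1, 1]]
```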
The current workaround is to override _load_from_state_dict, since from_pretrained does not call load_state_dict.
I'm still inspecting where the weights are changed.
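A minimal sketch of that workaround (the class name here is hypothetical): drop the incoming checkpoint entry in _load_from_state_dict so the deterministic weight created at construction time is never overwritten.

```python
# Sketch: ignore any checkpoint weight for this embedding so the weight
# built in __init__ stays untouched during loading.
import torch
from torch import nn

class StaticWeightEmbedding(nn.Embedding):
    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        # Remove the weight entry (if present) before the parent loader runs.
        state_dict.pop(prefix + "weight", None)
        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,
                                      missing_keys, unexpected_keys, error_msgs)

emb = StaticWeightEmbedding(4, 8)
before = emb.weight.clone()
emb.load_state_dict({"weight": torch.zeros(4, 8)}, strict=False)
print(torch.equal(emb.weight, before))  # True: the loaded zeros were ignored
```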
System Info
transformers version: 4.41.2

Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
model.from_pretrained
Expected behavior
Weights of the Sinusoidal Position Embedding should be static.
But after model.from_pretrained, the weights are changed, even though the weights of the position embedding do not appear in the state dict.

Notes
I have the Sinusoidal Position Embedding implementation shown above.
It should be equivalent to the one used in msft, so this issue should also apply to methods in the transformers library.
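A simplified, standalone simulation of the suspected cause (this is not transformers itself): parameters that are missing from the checkpoint get re-initialized through the model's _init_weights during from_pretrained, unless the module is flagged as already initialized via _is_hf_initialized (the flag commented out in the class above).

```python
# Simulate from_pretrained re-initializing a module whose parameters are
# missing from the checkpoint. The normal_ call is a stand-in for what a
# model's _init_weights typically does; it clobbers the "static" weight.
import torch
from torch import nn

emb = nn.Embedding(4, 8)
emb.weight.requires_grad_(False)  # static, never trained
before = emb.weight.clone()

if not getattr(emb, "_is_hf_initialized", False):
    nn.init.normal_(emb.weight)  # stand-in for PreTrainedModel._init_weights

print(torch.equal(emb.weight, before))  # False: the static weight changed
```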