Positional embedding weight #5

Open
StevenLau6 opened this issue Sep 30, 2022 · 0 comments
StevenLau6 commented Sep 30, 2022

Hi @luyang-huang96, thanks so much for posting the code. I noticed that the function `align_embed_position` keeps the pretrained positional embedding weights for the first 1026 positions and repeats the last position's weight for every position after 1026:

```python
def align_embed_position(self):
    # Copy the pretrained weights for the first 1026 positions.
    self.embed_positions_new.weight.data[:1026, :] = self.embed_positions.weight.data
    # Fill every position after 1026 with the last pretrained position's weight.
    self.embed_positions_new.weight.data[1026:, :] = self.embed_positions.weight.data[-1][None, :].repeat(self.max_source_positions - 1024, 1)
    if self.section:
        self.embed_section.weight.data[4:1028, :] = self.embed_positions.weight.data[2:, :]
        self.embed_section.weight.data[0:2, :] = self.embed_positions.weight.data[0:2, :]
        self.embed_section.weight.data[1028:, :] = self.embed_positions.weight.data[-1][None, :].repeat(self.max_source_positions - 1026, 1)
    # self.embed_positions = self.embed_positions_new
```

I have two questions:

  1. Considering that the 1026th token can be the eos token, should the function keep only the first 1025 tokens' positional embedding weights instead?
  2. Why not copy the first 1026 tokens' positional embedding weights repeatedly for the positions after 1026, as discussed in "Does BART support more than 1024 tokens in inference of summarization task?" (facebookresearch/fairseq#1685 (comment))? A rough sketch of that alternative follows below.
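For illustration, here is a minimal PyTorch sketch of that tiling alternative. The function and variable names (`extend_positions_by_tiling`, `max_source_positions`) are only illustrative, not the repo's actual API, and it assumes BART's learned positional embedding layout (1026 rows = 1024 positions plus a 2-slot offset):

```python
import torch.nn as nn

def extend_positions_by_tiling(embed_positions: nn.Embedding,
                               max_source_positions: int) -> nn.Embedding:
    # embed_positions: pretrained positional embedding, assumed shape (1026, dim).
    old_weight = embed_positions.weight.data
    num_old, dim = old_weight.shape
    new_num = max_source_positions + 2  # keep the same +2 offset convention

    new_embed = nn.Embedding(new_num, dim, padding_idx=embed_positions.padding_idx)
    # Tile the whole pretrained block as many times as needed, then truncate,
    # instead of repeating only the last row.
    repeats = (new_num + num_old - 1) // num_old
    new_embed.weight.data.copy_(old_weight.repeat(repeats, 1)[:new_num, :])
    return new_embed
```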