How to convert Tokens to ids correctly #140

nuistZPZ · 2024-05-27T07:15:54Z

I am using self.tokenizer.convert_tokens_to_ids to try to turn the model's text_feat output into input_ids, and then decode those ids into text. The code is:

text_output = self.text_encoder.bert(text.input_ids, attention_mask=text.attention_mask,
                                     return_dict=True, mode='text')

text_embeds = text_output.last_hidden_state
text_feat = F.normalize(self.text_proj(text_embeds[:, 0, :]), dim=-1)

input_ids = self.tokenizer.convert_tokens_to_ids(text_feat[0])
# Convert `input_ids` to text
decoded_text = self.tokenizer.decode(input_ids, skip_special_tokens=True)
print('decoded_text', decoded_text)

The output is always wrong, though: I get either all [PAD] or [100, 100]. I checked the token values and found that they are not the same, so I think the problem is in my code. What is the correct way to do this?
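A likely cause, sketched below with a toy vocabulary rather than the real tokenizer: convert_tokens_to_ids looks up token strings in the vocabulary, and anything that is not a vocabulary key falls back to the unknown-token id. text_feat[0] is a vector of floats (the normalized projection of the first-position embedding), so none of its entries are token strings. In the standard BERT vocabularies [UNK] has id 100, which would match the [100, 100] output if this tokenizer is BERT-based (an assumption, since the issue does not say). TOY_VOCAB and the two helper functions are hypothetical stand-ins, not the Hugging Face implementation.

```python
# Toy stand-in for a BERT-style vocabulary (hypothetical, five entries only).
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "a": 2, "person": 3, "walking": 4}
ID_TO_TOKEN = {idx: tok for tok, idx in TOY_VOCAB.items()}


def convert_tokens_to_ids(tokens):
    """Look up each token; anything that is not a vocab key maps to [UNK]."""
    unk_id = TOY_VOCAB["[UNK]"]
    return [TOY_VOCAB.get(tok, unk_id) for tok in tokens]


def decode(ids):
    """Map ids back to token strings and join them with spaces."""
    return " ".join(ID_TO_TOKEN[i] for i in ids)


# Intended use: token strings in, ids out, and the round trip works.
token_ids = convert_tokens_to_ids(["a", "person", "walking"])
round_trip = decode(token_ids)

# Misuse: entries of a feature vector are floats, not token strings,
# so every lookup misses and falls back to the [UNK] id.
feature_vector = [0.12, -0.53, 0.07]
feature_ids = convert_tokens_to_ids(feature_vector)

print(token_ids, round_trip, feature_ids)
```

If the goal is only to see the text that went into the encoder, decoding the ids the tokenizer already produced should be enough, for example self.tokenizer.decode(text.input_ids[0], skip_special_tokens=True). A pooled sentence feature like text_feat is not a sequence of tokens, so the tokenizer cannot map it back to text.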
