Question about ITG #714

jhkwag970 · 2024-06-18T19:41:39Z

    ##================= Image Captioning ========================##
    # Replace the first text token with BOS to mark the start of decoding.
    decoder_input_ids = text_tokens.input_ids.clone()
    decoder_input_ids[:, 0] = self.tokenizer.bos_token_id
    # Mask padding positions with -100 so the LM loss ignores them.
    labels = decoder_input_ids.masked_fill(
        decoder_input_ids == self.tokenizer.pad_token_id, -100
    )

    # Prepend an all-ones mask for the query tokens to the text mask.
    query_atts = torch.ones(query_tokens.size()[:-1], dtype=torch.long).to(
        image.device
    )
    attention_mask = torch.cat([query_atts, text_tokens.attention_mask], dim=1)
    # The query states are passed in through the cached key/values.
    lm_output = self.Qformer(
        decoder_input_ids,
        attention_mask=attention_mask,
        past_key_values=query_output.past_key_values,
        return_dict=True,
        labels=labels,
    )

    loss_lm = lm_output.loss

Hello, thank you for your great work!

While reviewing the implementation, I came up with a question about ITG.

Is the image captioning loss above what the paper calls ITG (image-grounded text generation)?

If so, is it possible to further improve the LLM's results by using the captions generated by ITG?
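
For reference, this is how I understand the label masking and the loss in the snippet above. It is a minimal pure-Python sketch, not the LAVIS code: `make_labels` and `lm_loss` are hypothetical helpers, and I am assuming the usual shift-by-one next-token objective and that -100 is the ignored label value (the PyTorch cross-entropy default).

```python
import math

IGNORE_INDEX = -100  # label value assumed to be skipped by the loss


def make_labels(input_ids, pad_token_id, bos_token_id):
    """Swap the first token for BOS, then mask padding in the labels."""
    decoder_input_ids = [bos_token_id] + list(input_ids[1:])
    labels = [
        IGNORE_INDEX if tok == pad_token_id else tok
        for tok in decoder_input_ids
    ]
    return decoder_input_ids, labels


def lm_loss(logits, labels):
    """Mean cross-entropy where position t predicts labels[t + 1]."""
    total, count = 0.0, 0
    for step_logits, target in zip(logits[:-1], labels[1:]):
        if target == IGNORE_INDEX:
            continue  # padded positions contribute nothing
        log_norm = math.log(sum(math.exp(x) for x in step_logits))
        total += log_norm - step_logits[target]
        count += 1
    return total / count if count else 0.0


# Toy example: vocabulary of 8 tokens, pad id 0, BOS id 2 (made-up values).
ids, labels = make_labels([7, 5, 6, 0, 0], pad_token_id=0, bos_token_id=2)
uniform_logits = [[0.0] * 8 for _ in ids]
loss = lm_loss(uniform_logits, labels)
```

With uniform logits, only the two non-padded targets count, so each contributes the same cross-entropy and the padded tail has no effect on `loss`.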
