So confused about the return value of the get_loss_img2text_image function in the Trainer file. #1

CCaoWWei · 2024-07-03T12:09:07Z

Thank you for open-sourcing the code. This article has been very insightful and inspiring to me. However, I have some questions while reviewing the code.

Q1: The get_loss_img2text_image function currently appears to compute only the Lc loss from the paper and does not include the Lr loss.

Q2: In the get_loss_img2text function, the loss and extra_loss computed in the if branch do not correspond to those in the else branch.

suoych · 2024-07-03T12:30:12Z

Hi, thank you for your interest!

First, I apologize for the raw state of the code in the repo. I promise I will reformat it and add more docstrings.

For the first question: in our actual implementation, we train the two branches separately (each with 4 GPUs). The get_loss_img2text_image function is the image-only contrastive branch, and get_loss_img2text is the textual alignment branch.

As for the second question: we only train the model on 4 GPUs, so you can always refer to the if branch. The else branch is only used for debugging and can be ignored.
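
For readers unfamiliar with the term, here is a minimal sketch of what a generic contrastive loss looks like. It is not the code from this repository: the function names `info_nce` and `symmetric_contrastive_loss`, the temperature value, and the plain-list inputs are all assumptions chosen for illustration, and the real get_loss_img2text_image function may differ in every detail.

```python
import math


def info_nce(queries, keys, temperature=0.07):
    """Hypothetical helper: mean cross-entropy where queries[i] should match keys[i].

    Assumes every vector is non-zero (normalization divides by the norm).
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]

    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i against every key, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching key (index i) as the target.
        total += log_z - logits[i]
    return total / len(q)


def symmetric_contrastive_loss(a, b, temperature=0.07):
    """Hypothetical helper: average of the a-to-b and b-to-a losses."""
    return 0.5 * (info_nce(a, b, temperature) + info_nce(b, a, temperature))
```

With matched pairs the loss is close to zero, and with mismatched pairs it grows, which is the behavior a contrastive branch relies on. For example, `symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])` is far smaller than the same call with the second list reversed.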

CCaoWWei · 2024-07-04T05:25:39Z

I understand now. Thank you for your response. ^_^

suoych closed this as completed Jul 4, 2024
