
A question about the x dimension. #26

Closed

tianjunyu0871 opened this issue Feb 14, 2022 · 7 comments

@tianjunyu0871

When I train only the transformer mapping network, I find that the dimension of x is (40, 512), but prefix_dim = 640. I don't know why this is happening. Is it caused by the extraction of the CLIP features? Hope to get your help, thank you.
[screenshot]

@rmokady
Owner

rmokady commented Feb 14, 2022

Hi @tianjunyu0871,
There are two versions of CLIP (ResNet and ViT).
Their encoding sizes are different: 512 and 640.
I assume this is your issue.

It should be solvable with different command-line arguments.

Does that help?
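
For reference, a minimal sketch (not from the repo) of checking the two embedding widths with the official clip package; the blank dummy input is only there to probe the output shape:

```python
import clip
import torch

# Compare the image-embedding width of the two CLIP backbones:
# ViT-B/32 produces 512-dim features, RN50x4 produces 640-dim features.
for name in ["ViT-B/32", "RN50x4"]:
    model, _preprocess = clip.load(name, device="cpu")
    res = model.visual.input_resolution
    dummy = torch.zeros(1, 3, res, res)  # blank probe image
    with torch.no_grad():
        feat = model.encode_image(dummy)
    print(name, tuple(feat.shape))  # (1, 512) then (1, 640)
```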

@tianjunyu0871
Author

Thanks for your reply.
Does the parameter is_rn stand for ResNet? If so, why does the following command also include is_rn? Is it a clerical error?
[screenshot]
In addition, could you share the pre-trained MLP weights and the evaluation code? Thank you so much!

@rmokady
Owner

rmokady commented Feb 15, 2022

Yes, this is an error.
Thank you very much for pointing it out.
I will fix it ASAP.

We use the evaluation code from the OSCAR repository, just replacing the JSON files with our JSONs.

We already shared the MLP weights; see the "Inference Notebooks" section in the readme.

@tianjunyu0871
Author

I tried to modify the prediction code, and the following error occurred while loading the pre-trained Transformer weights.
[screenshot]
I don't know if there is a problem with my code. Can you share your code for prediction with the Transformer? Thank you very much!
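
A generic way to pinpoint this kind of state_dict mismatch, assuming the checkpoint is a plain state dict and the model was built with the same mapping type and prefix sizes as in training (file and variable names here are illustrative):

```python
import torch
import torch.nn as nn

def diff_state_dict(model: nn.Module, ckpt_path: str) -> None:
    """Print which parameter names differ between a model and a checkpoint."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    model_keys = set(model.state_dict().keys())
    ckpt_keys = set(ckpt.keys())
    print("missing from checkpoint:", sorted(model_keys - ckpt_keys)[:10])
    print("unexpected in checkpoint:", sorted(ckpt_keys - model_keys)[:10])

# Usage (illustrative): build the model with mapping_type=transformer and the
# training-time prefix sizes, then compare it against the downloaded weights.
# diff_state_dict(model, "transformer_weights.pt")
```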

@rmokady
Owner

rmokady commented Feb 19, 2022

Prediction with the transformer is available in this notebook.

@tianjunyu0871
Author

I have gained a lot from your work, but I still have a few questions, and I hope you can answer them.
First question: I tried to remove the stop token, but the results were not good. Is there a good way to generate more than one sentence?
Second question: Have you tried using different GPT models, such as GPT2-medium or GPT2-large? Is the difference significant?
Third question: What does the prefix_length_clip parameter mean in training?
Looking forward to your reply, thank you very much!

@rmokady
Owner

rmokady commented Mar 6, 2022

To generate more than one sentence, you should replace the inference algorithm (e.g. with beam search).
Using a variant of beam search, you can produce different captions.
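
A minimal sketch of this idea using the Hugging Face generate API rather than the repo's own generation loop; the plain-text prompt stands in for the CLIP prefix, which in the actual model is an embedding:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Plain-text prompt as a stand-in for the projected CLIP prefix.
inputs = tokenizer("A photo of", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    num_beams=5,
    num_return_sequences=5,   # one caption per surviving beam
    early_stopping=True,
    pad_token_id=tokenizer.eos_token_id,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```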

We haven't tried different GPT models.

prefix_length_clip controls the transformer mapping network: it sets the size (in tokens) of the CLIP embedding part of the prefix, since the rest of the prefix is a learned constant.
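
A hypothetical simplification of how that prefix is assembled (layer sizes and names are assumptions, not the repo's exact code): the CLIP embedding is projected into prefix_length_clip tokens, prefix_length learned constant tokens attend to them through the transformer, and the refined constant tokens become the GPT-2 prefix:

```python
import torch
import torch.nn as nn

class MapperSketch(nn.Module):
    """Hypothetical simplification of the transformer mapping network."""
    def __init__(self, clip_dim=640, gpt_dim=768,
                 prefix_length=40, prefix_length_clip=40):
        super().__init__()
        self.prefix_length_clip = prefix_length_clip
        # The CLIP embedding is projected into `prefix_length_clip` tokens ...
        self.to_clip_tokens = nn.Linear(clip_dim, prefix_length_clip * gpt_dim)
        # ... and `prefix_length` learned constant tokens attend to them.
        self.learned_const = nn.Parameter(torch.randn(prefix_length, gpt_dim))
        layer = nn.TransformerEncoderLayer(d_model=gpt_dim, nhead=8,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=8)

    def forward(self, clip_embed):  # clip_embed: (batch, clip_dim)
        b = clip_embed.shape[0]
        clip_tokens = self.to_clip_tokens(clip_embed)
        clip_tokens = clip_tokens.view(b, self.prefix_length_clip, -1)
        const = self.learned_const.unsqueeze(0).expand(b, -1, -1)
        x = self.transformer(torch.cat((clip_tokens, const), dim=1))
        # Only the refined constant tokens are handed to GPT-2 as the prefix.
        return x[:, self.prefix_length_clip:]

mapper = MapperSketch()
print(mapper(torch.randn(2, 640)).shape)  # torch.Size([2, 40, 768])
```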

rmokady closed this as completed May 11, 2022