Problem in running the evaluation script #10
What's the error message here? (If there is none, what does it say when you exit with Ctrl+C?)
There aren't any error messages. And if I do Ctrl+C, this is displayed: 1%|▉ | 16/2064 [5:48:32<743:32:32, 1307.01s/it]
I see, it looks like it's having trouble reading one of the URLs for image retrieval. Could you try pulling from HEAD (c7de07a) and seeing if it works? I've disabled loading of retrieval embeddings by default since we don't need them for evals.
For VisDial I am getting the error: Traceback (most recent call last):
Sorry, this should be fixed with d85ad06. Not sure why I didn't catch it when I ran the eval earlier. |
Thank you for the help @kohjingyu. What is the maximum number of epochs you trained for to get the final results reported in the paper? I am not able to reproduce the numbers given in the table. Alternatively, is there some other issue that could cause this? Also, what image size did you use for calculating the LPIPS score, and which resize operation did you use (cv2, F.interpolate, or PIL resize)? My reproduced scores on VIST: LPIPS 0.7314, CLIP Score 0.64018.
Was this a model you trained yourself? The models we released were trained for 20k iterations with a batch size of 200.
Since the CLIP scores you have are similar to those of the paper, it seems like the issue might be with resizing for LPIPS. We have to resize the images to 256x256, since the model being used is AlexNet. We used the torchvision resize for this (lines 35-36 at commit 232eb02).
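Roughly, the preprocessing looks like the sketch below. This is a minimal sketch assuming the standard `lpips` package; the file paths are placeholders, and the exact normalization in the repo may differ from what is shown here.

```python
import lpips
import torch
from PIL import Image
from torchvision import transforms

# LPIPS with the AlexNet backbone, as used for the paper's evaluation.
loss_fn = lpips.LPIPS(net="alex")

# Resize to 256x256 with torchvision, then map pixels to [-1, 1],
# which is the range the lpips package expects.
preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# "generated.png" / "reference.png" are placeholder paths.
img0 = preprocess(Image.open("generated.png").convert("RGB")).unsqueeze(0)
img1 = preprocess(Image.open("reference.png").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    score = loss_fn(img0, img1).item()
print(f"LPIPS: {score:.4f}")
```

Using a different resize (e.g., cv2 with a different interpolation mode) can shift LPIPS noticeably, which would explain the discrepancy.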
Thank you for helping me out. I have obtained numbers equivalent to those reported in the paper.
Those tables are mostly ablation results and we probably won't be releasing the scripts for those. For the contextual image retrieval eval, you can refer to the FROMAGe repo for instructions. |
@kohjingyu How many iterations did you have per epoch? (you highlighted 20k iterations with a batch size of 200) Was it 200 iterations/epoch for a total of 100 epochs? |
The epoch count doesn't really matter, since the data is randomly shuffled; I think it only affects how often the evals are run. I used 2000 iterations/epoch for 10 epochs, but in principle iterations × batch_size is the only quantity that affects the final results (i.e., the model should see ~4M image-text pairs). Hope that makes sense!
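Concretely, working out those numbers (all taken from this thread, not from the repo):

```python
# Schedule discussed above: 2,000 iterations/epoch, 10 epochs, batch size 200.
iterations_per_epoch = 2_000
epochs = 10
batch_size = 200

pairs_per_epoch = iterations_per_epoch * batch_size  # 400,000 pairs per epoch
total_pairs = pairs_per_epoch * epochs               # 4,000,000 pairs (~4M) total
print(pairs_per_epoch, total_pairs)
```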
Thanks for your answer, I was also trying to figure out the number of image-text pairs you used for each epoch in your training. In this setup, your model saw 400k randomly selected image-text pairs from the training set in each epoch, right? |
That’s correct. |
I still have a problem running the evaluation script: after a certain point in the iterations, the code gets stuck.
Saving to /home/mbzuaiser/gill/gill_vist_outputs/514809043.png  50/50 [00:02<00:00, 17.43it/s]
Saving to /home/mbzuaiser/gill/gill_vist_outputs/514808431.png  49/50 [00:02<00:00, 17.76it/s]
6%|█████▍ | 279/4990 [51:08<10:47:19, 8.24s/it]
This happens in both the VIST and VisDial cases.
Following the solution you gave earlier, I have added an except for OSError, but I am still facing the same issue. It would be great if you could help me with this.
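For reference, a hypothetical sketch of the kind of OSError guard discussed above — the helper name, the use of requests, and the timeout are illustrative assumptions, not the repo's actual code:

```python
from io import BytesIO

import requests
from PIL import Image


def fetch_image(url: str, timeout: float = 10.0):
    """Return a PIL image for `url`, or None if it cannot be retrieved."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        return Image.open(BytesIO(response.content)).convert("RGB")
    except (OSError, requests.RequestException):
        # OSError covers truncated/corrupt image data; RequestException
        # covers network failures. The explicit timeout keeps a dead URL
        # from hanging the eval loop indefinitely.
        return None
```

The eval loop would then skip any example for which fetch_image returns None; without a read timeout somewhere, a bare except OSError alone cannot stop a stalled connection from hanging.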