Low accuracy re-train with the provided config files of the pretrained models #8
Comments
Hi, thank you for your interest. Were you able to evaluate the pretrained model? If so, could you share the evaluation results? Thanks. |
I have already evaluated the pretrained models provided by your project, and the results match those in your paper.
But when I train with the same hps.json provided with the pretrained models, the results are very poor. |
The pretrained models were trained with 4 GPUs (16GB each), so the effective batch size is 64x4 = 256. Since you are using 2 GPUs, your effective batch size is 64x2 = 128. My assumption is that the learning rate may be too large for your batch size. |
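The batch-size arithmetic above can be sketched as follows. This is a minimal illustration, assuming the common linear scaling heuristic (learning rate scaled in proportion to the effective batch size); the thread does not say this is the rule used in the repo, and `HYPOTHETICAL_BASE_LR` is a placeholder value, not the one from hps.json.

```python
def effective_batch_size(per_gpu_batch: int, num_gpus: int) -> int:
    """Effective batch size when each GPU processes per_gpu_batch samples per step."""
    return per_gpu_batch * num_gpus


def linearly_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate in proportion to the batch size (a heuristic, not a guarantee)."""
    return base_lr * new_batch / base_batch


# Numbers from the thread: 64 samples per GPU, 4 GPUs for the released models, 2 GPUs locally.
reference_batch = effective_batch_size(64, 4)
local_batch = effective_batch_size(64, 2)

# Placeholder only: substitute the learning rate from your own config.
HYPOTHETICAL_BASE_LR = 1e-3
scaled_lr = linearly_scaled_lr(HYPOTHETICAL_BASE_LR, reference_batch, local_batch)

print(f"reference batch: {reference_batch}, local batch: {local_batch}")
print(f"suggested starting lr: {scaled_lr}")
```

Under this heuristic, halving the number of GPUs halves the learning rate. Treat the result as a starting point for tuning rather than a value known to reproduce the paper's accuracy.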
FYI, if you run into errors similar to the following during evaluation, please pull the repo again.
Sorry about the inconvenience. |
I sincerely appreciate your reply.
No errors were encountered in the evaluation phase. But during training, one error (which occurs only at epoch 0) is shown as:
Then the model continues to train as normal. I am not sure whether it affects performance. |
Thank you for your patience. I have never seen this error before. It seems to happen inside the tqdm package, so I don't think it would affect model performance. One thing I do notice is that the train_loss and norm are extremely large at epoch 0. Usually they are around 10, not 100,000. Can you share the exact config file and command you used for training? I will try to run it on my end to replicate the error. Thanks! |
Here is my detailed config (because *.json files cannot be uploaded, I have renamed the file to hps.txt): Did you use any pretrained models to initialize the network parameters? |
I am running the same code, but I get better results than yours. I ran 20 epochs, but I think more epochs should be run. |
The lr should probably be adjusted. |
Closed due to inactivity. The aforementioned error is not reproducible on my end. |
How is the relation type encoded in the explicit encoder? I could not find it represented in the code. If you have an answer, please help me. I am sorry to bother you. I guess the relation type labels come from the datasets, and the code does not include the auxiliary classifier for the 15 semantic types and 11 geo types. |
To replicate the results from our paper, please follow the instructions to download the exact data. For the spatial adjacency matrix, please refer to #9. |
Thank you for sharing your great work.
I use the
pretrained_models/regat_implicit/ban_1_implicit_vqa_196/hps.json
for the training phase with 2 GPUs, each with 10GB. The CUDA version is 10.0 and Python is 3.6. All datasets are downloaded. The training dataset is VQA 2.0.
Here is my detailed config:
The results are very poor. After 20 epochs, the log.txt shows:
I also trained on the dataset using
pretrained_models/regat_implicit/butd_implicit_vqa_6371
, and it reaches 58. Can you give me some advice on reproducing the accuracy reported in the paper?