Question on how to evaluate inherited SubTransformers. #7
Hi ihish52, Thanks for your question! You can download the SuperTransformer checkpoints and then test with the following command:
`bash configs/wmt14.en-de/test.sh ./downloaded_models/HAT_wmt14ende_super_space0.pt configs/wmt14.en-de/subtransformer/HAT_wmt14ende_raspberrypi@6.0s_bleu@28.2.yml`
Note that the models used in Figure 5 are not exactly the models we released, so the BLEU numbers may have some small differences. Best,
Hi Hanrui-Wang, Thanks for the quick reply on this. I have run the exact example you provided above (testing the inherited SubTransformer for `HAT_wmt14ende_raspberrypi@6.0s_bleu@28.2.yml`). The code runs into two problems:
I am using the downloaded SuperTransformer checkpoint and the provided config files, so nothing has been changed from the project. Do you have any idea what may be causing this? Below is the full terminal output for the example command you provided: TransformerSuperModel(
Hi Hanrui-Wang, Thanks for getting back to me. The system I am running this on does have a GPU, and PyTorch recognizes it (`torch.cuda.current_device()`). It's a Jetson Nano with an NVIDIA Tegra X1 GPU. However, I still run into those errors. I have also tested the above example command on a Linux server with four 1080Ti GPUs, and the same RuntimeError and ZeroDivisionError pop up. I have also changed `generate.py` as you mentioned before running the code. Is there something else that could be causing this? Looking forward to your reply. Thanks!
Hi ihish52, Can you try setting `args.fp16` always to False and see whether the issue still exists? The ZeroDivisionError occurs because, after the RuntimeError, there are no generated translations. So if the RuntimeError is fixed, the ZeroDivisionError will disappear. Best,
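As an aside, the cascade described here can be sketched in a few lines of plain Python. This is an illustrative toy, not the repository's actual scoring code: a corpus-level score averages over generated translations, so if generation aborts with a RuntimeError and the hypothesis list is empty, the averaging step itself divides by zero.

```python
def corpus_score(hypotheses):
    """Toy stand-in for a corpus-level metric: average hypothesis length.
    Hypothetical helper for illustration, not HAT's actual BLEU code."""
    return sum(len(h) for h in hypotheses) / len(hypotheses)

# Normal case: some translations were generated.
print(corpus_score([["a", "b"], ["c"]]))  # 1.5

# If the earlier RuntimeError aborts generation, the hypothesis list is
# empty and the division by len(hypotheses) raises ZeroDivisionError.
try:
    corpus_score([])
except ZeroDivisionError:
    print("no translations generated")
```

This is why fixing the upstream RuntimeError makes the ZeroDivisionError disappear as well.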
Hi Hanrui-Wang, As suggested, I set `args.fp16` always to False in `generate.py`, but the same error persists. Would you have any other suggestions? Would changing the tensor type for the `torch.div()` line in `search.py` be necessary to fix this? Thanks for your help.
Hi ihish52, I cannot reproduce the error, but you may try replacing `search.py` line 81 with this: `self.beams_buf = torch.div(self.indices_buf, vocab_size).type_as(self.beams_buf)` Best,
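For context, the likely root cause (assuming this matches the behavior of the PyTorch version in use) is that recent PyTorch versions make `torch.div` perform true division even on integer tensors, returning floats; the beam index recovered from a flattened `(beam * vocab)` top-k therefore stops being a valid tensor index until it is cast back, which is what the `.type_as(...)` in the suggested fix does. The arithmetic can be illustrated with plain Python integers; the names below are illustrative, not from the repository:

```python
# Beam search takes topk over flattened (beam * vocab) scores; the beam
# index is flat_index // vocab_size and the token id is flat_index % vocab_size.
def unflatten(flat_index, vocab_size):
    beam = flat_index // vocab_size   # must stay an integer to index tensors
    token = flat_index % vocab_size
    return beam, token

flat, vocab = 75003, 30000
assert unflatten(flat, vocab) == (2, 15003)

# True division (what newer torch.div does on integer inputs) yields a
# float such as 2.5001, which cannot index a tensor; truncating it back to
# an integer recovers the floor-division result.
assert int(flat / vocab) == flat // vocab
```

On PyTorch versions that support it, `torch.div(..., rounding_mode='floor')` expresses the same intent more directly than casting after the fact.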
Hi Hanrui-Wang, Made the change to line 81 as suggested. Indeed the code did run, and I was able to obtain a reasonable BLEU score! As the tensor type was changed, this warning pops up several times for each iteration of the script: UserWarning: An output with one or more elements was resized since it had shape [1], which does not match the required output shape [7]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /opt/conda/conda-bld/pytorch_1603729021865/work/aten/src/ATen/native/Resize.cpp:19.) Do you think this would affect the output of the script (BLEU score calculation)? Thanks for the advice.
Hi ihish52, Great! For reference, the command is:
`bash configs/wmt14.en-de/test.sh ./downloaded_models/HAT_wmt14ende_super_space0.pt configs/wmt14.en-de/subtransformer/HAT_wmt14ende_raspberrypi@6.0s_bleu@28.2.yml`
Best,
Hi Hanrui-Wang, Yes, I can confirm a BLEU score of 25.99 when I run the above command. Thanks for your help with this. Regards, |
Hi Hishan, I will close the issue for now. Feel free to reopen if you have any further questions! Best, |
Hi,
Table 5 in the paper mentions that the "Inherited" BLEU score is similar to the "From-Scratch" BLEU score.
Can you please specify which part of the code can be used to run inference/test the inherited SubTransformers (without training from scratch) or how to use the code to perform this task?
In other words, I would like to know how to test translations from a specific SubTransformer in the SuperTransformer design space without training the model again from scratch.
Hope you can point me in the right direction on this. Thanks for your help.