Updated code with load_state_dict #9
Hi, I wanted to know more about the codebase. Why does it have cloning and copying of data instead of just having one Transformer + Residual layer that is iteratively queried?
Also, do you have a more up-to-date codebase that uses load_state_dict instead of cloning and copying?
Thank you so much!
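For context, the question contrasts keeping duplicated parameter tensors in sync by hand with synchronizing modules through load_state_dict, and then querying a single weight-tied layer in a loop. Here is a minimal sketch of that pattern with a toy residual block; the module names and sizes are illustrative, not the DEQ codebase's actual modules:

```python
import torch
import torch.nn as nn

# Toy weight-tied setup: `layer` is the single residual block a DEQ queries
# iteratively; `layer_copy` is a second instance whose parameters are kept
# in sync via load_state_dict instead of manual clone()/copy_() bookkeeping.
layer = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
layer_copy = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
layer_copy.load_state_dict(layer.state_dict())  # one call syncs all weights

# "One Transformer + Residual layer that is iteratively queried":
x = torch.randn(8, 64)    # input injection
z = torch.zeros_like(x)   # hidden state, iterated toward a fixed point
for _ in range(30):       # fixed iteration budget, for illustration only
    z = layer(z) + x      # residual-style update z <- f(z) + x
```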
Hi, the cloning and copying was actually a workaround for older versions of PyTorch, which did not allow calling backward within a custom backward function (recall that a DEQ's backward pass requires vector-Jacobian products). With the latest versions of PyTorch, this code can be significantly simplified, and several of those files are no longer needed. That being said, a major renovation of the codebase is on the way, and I can share what the new implementation would look like (with hooks) if you are interested.
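To make that concrete: in recent PyTorch, the vector-Jacobian products needed for the DEQ backward pass can be computed inside a tensor hook, with no duplicated modules. Below is a rough sketch of the hook-based pattern, not the repo's actual implementation; the `solver` interface, the zero initialization, and the assumption that z and x share a shape are all placeholders:

```python
import torch

class DEQ(torch.nn.Module):
    """Sketch of a hook-based DEQ layer. `f(z, x)` is the weight-tied cell and
    `solver(g, z0, max_iter=...)` returns an approximate fixed point of g."""

    def __init__(self, f, solver, max_iter=30):
        super().__init__()
        self.f = f
        self.solver = solver
        self.max_iter = max_iter

    def forward(self, x):
        # Forward: find z* = f(z*, x) without building a graph through the solver.
        with torch.no_grad():
            z_star = self.solver(lambda z: self.f(z, x),
                                 torch.zeros_like(x),  # assumes z and x share a shape
                                 max_iter=self.max_iter)
        # One extra function evaluation re-attaches z* to the autograd graph.
        z_star = self.f(z_star.requires_grad_(), x)

        if torch.is_grad_enabled():
            # Backward: the hook solves g = grad + g * J_f(z*) with the same
            # solver, where each step is a vector-Jacobian product -- exactly
            # the backward-inside-backward call that old PyTorch disallowed.
            z0 = z_star.detach().requires_grad_()
            f0 = self.f(z0, x)

            def backward_hook(grad):
                return self.solver(
                    lambda y: torch.autograd.grad(f0, z0, y, retain_graph=True)[0] + grad,
                    grad, max_iter=self.max_iter)

            z_star.register_hook(backward_hook)
        return z_star
```

Here `solver` could be plain fixed-point iteration, Anderson acceleration, or Broyden's method; the hook runs it once per backward pass on the incoming gradient.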
Thanks for the quick response! I will wait for the updated code then; I want to use the WikiText-103 Transformer code, and the tutorial code has only some basic image classification examples. Looking forward to when the major renovation comes, it would be super helpful :) Thank you!
Hi @sarthmit, I have updated the code with the new implementation. After you check out the updated version, run:

```bash
bash run_wt103_deq_transformer.sh train --debug --data ../data/wikitext-103 --f_thres 30 --eval --load [PRETRAINED_FILE].pkl --mem_len 300 --pretrain_step 0
```

It should give you something like 23.2 ppl on WT103. Please let me know if you have any issues running this new implementation!
@sarthmit I'm closing this issue, but if you have trouble with the cleaner version of the code, feel free to re-open it!