
Updated code with load_state_dict #9

Closed
sarthmit opened this issue Feb 11, 2021 · 4 comments

Comments

@sarthmit

Hi, I wanted to know more about the codebase. Why does it clone and copy data instead of just having one Transformer + residual layer that is queried iteratively?

Also, do you have a more up-to-date codebase that uses load_state_dict instead of cloning and copying?

Thank you so much!

@jerrybai1995
Member

Hi, the cloning and copying were actually a workaround for older versions of PyTorch, where calling backward within a custom backward function was not allowed (recall that a DEQ's backward pass requires vector-Jacobian products).

With the latest versions of PyTorch, this code can be significantly simplified: we no longer need files such as deq.py or deq_transformer_module.py, and there is no need to clone or copy. Instead, we can use a backward hook and autograd.grad. I have been planning a major renovation of the repo but never got the chance to do it. Maybe I'll do that in the next few days ;-)

That being said, if you are interested in what the new implementation would be like (with hook and autograd.grad), you can take a look at the tutorial code from NeurIPS 2020: http://implicit-layers-tutorial.org/deep_equilibrium_models/
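
For reference, here is a minimal sketch of that hook + autograd.grad pattern, in the spirit of the tutorial code; the f module and the solver interface below are assumptions for illustration, not this repo's API:

import torch
import torch.nn as nn
from torch import autograd

class DEQFixedPoint(nn.Module):
    # Sketch of a DEQ layer: f is the transformation whose equilibrium we want
    # (z* = f(z*, x)), and solver(g, z0) is any fixed-point solver (e.g. Anderson
    # or Broyden) that returns an approximate z satisfying z = g(z).
    def __init__(self, f, solver, **solver_kwargs):
        super().__init__()
        self.f = f
        self.solver = solver
        self.solver_kwargs = solver_kwargs

    def forward(self, x):
        # Forward: find the equilibrium without backpropagating through the solver
        # iterations. (Assumes the equilibrium state has the same shape as the input.)
        with torch.no_grad():
            z = self.solver(lambda z: self.f(z, x), torch.zeros_like(x), **self.solver_kwargs)
        # One extra application of f re-attaches z to the autograd graph of f's parameters.
        z = self.f(z, x)

        if self.training:
            # Backward: the incoming gradient g must be mapped to u solving
            # u = u J_f(z*) + g, i.e. u (I - J_f(z*)) = g. We reuse the same solver,
            # where each iteration is a vector-Jacobian product via autograd.grad.
            z0 = z.clone().detach().requires_grad_()
            f0 = self.f(z0, x)

            def backward_hook(grad):
                return self.solver(
                    lambda u: autograd.grad(f0, z0, u, retain_graph=True)[0] + grad,
                    grad, **self.solver_kwargs)

            z.register_hook(backward_hook)
        return z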

@sarthmit
Author

Thanks for the quick response! I will wait for the updated code then, since I want to use the WikiText-103 Transformer code and the tutorial only covers some basic image classification examples. Looking forward to the major renovation; it would be super helpful :)

Thank you!

@jerrybai1995
Member

Hi @sarthmit,

I have pushed the updated code to the beta branch of this repo. Since only the Transformer instantiation is implemented there for now, and I haven't been able to fully test the cleaner implementation on all experimental settings, it will probably be merged into the master branch later this year.

After you check out the beta branch of the repo (i.e., git pull followed by git checkout beta), you can download the pretrained DEQ-Transformer model (use the link in the beta branch README!) and run:

bash run_wt103_deq_transformer.sh train --debug --data ../data/wikitext-103 --f_thres 30 --eval --load [PRETRAINED_FILE].pkl --mem_len 300 --pretrain_step 0
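
(As one consolidated sequence, assuming you start from an existing clone of this repo and have already downloaded the pretrained checkpoint; [PRETRAINED_FILE] stays a placeholder:)

git pull
git checkout beta
bash run_wt103_deq_transformer.sh train --debug --data ../data/wikitext-103 --f_thres 30 --eval --load [PRETRAINED_FILE].pkl --mem_len 300 --pretrain_step 0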

It should give you something like 23.2 ppl on WT103.

Please let me know if you have any issues running this new implementation!

@jerrybai1995
Member

@sarthmit I'm closing this issue, but if you have trouble with the cleaner version of the code, feel free to re-open it!
