
Updated code with load_state_dict #9

Closed
sarthmit opened this issue Feb 11, 2021 · 4 comments

Comments

@sarthmit

Hi, I wanted to know more about the codebase. Why does it clone and copy data instead of just having one Transformer + residual layer that is queried iteratively?

Also, do you have a more up-to-date codebase that uses load_state_dict instead of cloning and copying?

Thank you so much!

@jerrybai1995
Member

Hi, the cloning and copying were actually a workaround for older versions of PyTorch, where calling backward within a custom backward function was not allowed (recall that a DEQ's backward pass requires vector-Jacobian products).

With the latest versions of PyTorch, this code can be significantly simplified: we no longer need files such as deq.py or deq_transformer_module.py, and there is no need to clone or copy. Instead, we can use a backward hook and autograd.grad. I have been planning a major renovation of the repo but never got the chance to do it. Maybe I'll do that in the next few days ;-)

That being said, if you are interested in what the new implementation would be like (with hook and autograd.grad), you can take a look at the tutorial code from NeurIPS 2020: http://implicit-layers-tutorial.org/deep_equilibrium_models/
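
For reference, here is a minimal sketch of that hook + autograd.grad pattern, in the spirit of the tutorial code; the f module and the solver interface below are assumptions for illustration, not this repo's API:

import torch
import torch.nn as nn
from torch import autograd

class DEQFixedPoint(nn.Module):
    # Sketch of a DEQ layer: f is the transformation whose equilibrium we want
    # (z* = f(z*, x)), and solver(g, z0) is any fixed-point solver (e.g. Anderson
    # or Broyden) that returns an approximate z satisfying z = g(z).
    def __init__(self, f, solver, **solver_kwargs):
        super().__init__()
        self.f = f
        self.solver = solver
        self.solver_kwargs = solver_kwargs

    def forward(self, x):
        # Forward: find the equilibrium without backpropagating through the solver
        # iterations. (Assumes the equilibrium state has the same shape as the input.)
        with torch.no_grad():
            z = self.solver(lambda z: self.f(z, x), torch.zeros_like(x), **self.solver_kwargs)
        # One extra application of f re-attaches z to the autograd graph of f's parameters.
        z = self.f(z, x)

        if self.training:
            # Backward: the incoming gradient g must be mapped to u solving
            # u = u J_f(z*) + g, i.e. u (I - J_f(z*)) = g. We reuse the same solver,
            # where each iteration is a vector-Jacobian product via autograd.grad.
            z0 = z.clone().detach().requires_grad_()
            f0 = self.f(z0, x)

            def backward_hook(grad):
                return self.solver(
                    lambda u: autograd.grad(f0, z0, u, retain_graph=True)[0] + grad,
                    grad, **self.solver_kwargs)

            z.register_hook(backward_hook)
        return z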

@sarthmit
Author

Thanks for the quick response! I will wait for the updated code then, since I want to use the WikiText-103 Transformer code and the tutorial only covers some basic image classification examples. Looking forward to the major renovation; it would be super helpful :)

Thank you!

@jerrybai1995
Member

Hi @sarthmit,

I have pushed the updated code to the beta branch of this repo. Since only the Transformer instantiation is implemented there for now, and I haven't been able to fully test the cleaner implementation on all experimental settings, it will probably be merged into the master branch later this year.

After you check out the beta branch of the repo (i.e., git pull followed by git checkout beta), you can download the pretrained DEQ-Transformer model (use the link in the beta branch README!) and run:

bash run_wt103_deq_transformer.sh train --debug --data ../data/wikitext-103 --f_thres 30 --eval --load [PRETRAINED_FILE].pkl --mem_len 300 --pretrain_step 0
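
(As one consolidated sequence, assuming you start from an existing clone of this repo and have already downloaded the pretrained checkpoint; [PRETRAINED_FILE] stays a placeholder:)

git pull
git checkout beta
bash run_wt103_deq_transformer.sh train --debug --data ../data/wikitext-103 --f_thres 30 --eval --load [PRETRAINED_FILE].pkl --mem_len 300 --pretrain_step 0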

It should give you something like 23.2 ppl on WT103.

Please let me know if you have any issues running this new implementation!

@jerrybai1995
Member

@sarthmit I'm closing this issue, but if you have trouble with the cleaner version of the code, feel free to re-open it!
