This repository has been archived by the owner on Oct 31, 2022. It is now read-only.

Need some help getting Tensor Rematerialization to work #81

Closed
WhereAmO opened this issue Mar 31, 2021 · 7 comments

@WhereAmO

Resolving dependencies...
cabal.exe: Could not resolve dependencies:
[__0] trying: twremat-0.1.0.0 (user goal)
[__1] next goal: base (dependency of twremat)
[__1] rejecting: base-4.15.0.0/installed-4.15.0.0 (conflict: twremat =>
base>=4.12.0.0 && <4.15.0.0)
[__1] skipping: base-4.15.0.0 (has the same characteristics that caused the
previous version to fail: excluded by constraint '>=4.12.0.0 && <4.15.0.0'
from 'twremat')
[__1] rejecting: base-4.14.1.0, base-4.14.0.0, base-4.13.0.0, base-4.12.0.0,
base-4.11.1.0, base-4.11.0.0, base-4.10.1.0, base-4.10.0.0, base-4.9.1.0,
base-4.9.0.0, base-4.8.2.0, base-4.8.1.0, base-4.8.0.0, base-4.7.0.2,
base-4.7.0.1, base-4.7.0.0, base-4.6.0.1, base-4.6.0.0, base-4.5.1.0,
base-4.5.0.0, base-4.4.1.0, base-4.4.0.0, base-4.3.1.0, base-4.3.0.0,
base-4.2.0.2, base-4.2.0.1, base-4.2.0.0, base-4.1.0.0, base-4.0.0.0,
base-3.0.3.2, base-3.0.3.1 (constraint from non-upgradeable package requires
installed instance)
[__1] fail (backjumping, conflict set: base, twremat)
After searching the rest of the dependency tree exhaustively, these were the
goals I've had most trouble fulfilling: base, twremat

I've got absolutely no clue how to get the cabal command to work. Is there a specific version of everything that I need to install? I'm on Windows, if that helps.

@nshepperd
Owner

Ah, that's because you have a newer version of GHC than the one I used. Not to worry: I've updated twremat to work with your version (GHC 9.0, which corresponds to base-4.15). Pull the repo and try again.
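To see why the solver gave up, the bound quoted in the log above can be checked by hand. The sketch below compares dotted version numbers as tuples of integers against the old twremat constraint on base, >=4.12.0.0 && <4.15.0.0. The helpers parse_version and satisfies are hypothetical and model only this one range, not cabal's full constraint language.

```python
# Minimal sketch: evaluate the old twremat bound on `base` from the cabal log,
# ">=4.12.0.0 && <4.15.0.0", by comparing versions as tuples of integers.
# `parse_version` and `satisfies` are hypothetical helpers for illustration only.


def parse_version(v: str) -> tuple:
    """Turn '4.15.0.0' into (4, 15, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def satisfies(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper (inclusive lower, exclusive upper)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


LOWER, UPPER = "4.12.0.0", "4.15.0.0"  # bound taken from the error log

for base in ["4.14.1.0", "4.15.0.0"]:
    ok = satisfies(base, LOWER, UPPER)
    print(f"base-{base}: {'accepted' if ok else 'rejected'}")
```

The installed base-4.15.0.0 fails only the exclusive upper bound. The log also rejects every other base release because base is a non-upgradeable package that must match the installed GHC, so relaxing the bound in twremat was the way out.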

@WhereAmO
Author

WhereAmO commented Apr 1, 2021

That worked perfectly, as far as I can tell. However, now when I try to train with Tensor Rematerialization it throws an error, seemingly about not being able to find the twremat binary. Is there a step I'm missing? I ran the command cabal v2-install --installdir=../bin and then added --twremat and --twremat_memlimit, but then I get this:

bin\twremat C:\Users\natha\AppData\Local\Temp\tmpsc906qwm C:\Users\natha\AppData\Local\Temp\tmp5lue0uvk
Traceback (most recent call last):
  File "train.py", line 314, in <module>
    main()
  File "train.py", line 146, in main
    (train_loss, opt_grads) = tfremat.tf_remat((train_loss, opt_grads), memlimit=args.twremat_memlimit)
  File "D:\Hobbies\AI\gpt-2-finetuning\src\tfremat.py", line 163, in tf_remat
    steps = twremat.runtwremat(node_info, memlimit, {from_op[c] for c in compute_ops})
  File "D:\Hobbies\AI\gpt-2-finetuning\src\twremat.py", line 47, in runtwremat
    proc = Popen([TWREMAT, fname, outname])
  File "C:\Users\natha\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "C:\Users\natha\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 1207, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

@nshepperd
Owner

Hmm... That looks like some sort of Windows-specific difference. I can only guess at the cause, because I don't do any development on Windows. Maybe it's because the binary is called twremat.exe there? Try editing src/twremat.py and overriding TWREMAT with the full path of twremat.exe, which is probably D:\Hobbies\AI\gpt-2-finetuning\bin\twremat.exe in your case.
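A minimal sketch of that override that avoids hard-coding one machine's drive letter. It assumes, as the BINDIR variable in src/twremat.py suggests, that the location of the bin directory is already known. twremat_path is a hypothetical helper, not part of the repository: it appends .exe when os.name == 'nt' and makes the path absolute, so the result no longer depends on the current working directory.

```python
import os


def twremat_path(bindir: str, name: str = "twremat") -> str:
    """Hypothetical helper: absolute path to the twremat binary.

    On Windows (os.name == 'nt') the built executable is assumed to be named
    'twremat.exe', so the extension is appended if missing. `bindir` is
    resolved against the current working directory to give an absolute path.
    """
    if os.name == "nt" and not name.lower().endswith(".exe"):
        name += ".exe"
    return os.path.abspath(os.path.join(bindir, name))
```

With it, the line TWREMAT=os.path.join(BINDIR, 'twremat') in src/twremat.py could become TWREMAT = twremat_path(BINDIR). Hard-coding the full path, as suggested above, is simpler when only one machine is involved.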

@nshepperd
Owner

Also verify that the file is actually there, I guess.
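Checking this from Python before launching the process turns the bare WinError 2 into a message that names the path it was looking for. A sketch, with check_binary as a hypothetical helper (not part of the repository) that uses only os.path.isfile and os.access:

```python
import os


def check_binary(path: str) -> str:
    """Hypothetical helper: fail early, naming the path, if the binary is missing.

    Returns `path` unchanged when it points at an existing file that the
    current user is allowed to execute (as reported by os.access).
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"twremat binary not found at {path!r}; check that cabal v2-install "
            "finished and where --installdir pointed"
        )
    if not os.access(path, os.X_OK):
        raise PermissionError(f"{path!r} exists but is not executable")
    return path


# Example (assumed path from the suggestion above):
# TWREMAT = check_binary(r'D:\Hobbies\AI\gpt-2-finetuning\bin\twremat.exe')
```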

@WhereAmO
Author

WhereAmO commented Apr 1, 2021

How would I do that? Would I simply replace TWREMAT=os.path.join(BINDIR, 'twremat') with TWREMAT=D:\Hobbies\AI\gpt-2-finetuning\bin\twremat.exe'), or am I thinking about this totally wrong? Sorry for all the questions; I'm completely new to all of this (aside from bashing my head against it several times over the years and this week). Also, the file is indeed there, so that's good.

@nshepperd
Owner

TWREMAT=r'D:\Hobbies\AI\gpt-2-finetuning\bin\twremat.exe'
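The r prefix matters here. In an ordinary Python string literal, backslash sequences are escapes, and two of them occur in this very path: \b (backspace) in \bin and \t (tab) in \twremat. The sketch below shows the difference on the tail of the path, and also checks that the earlier attempt, which lacks an opening quote, is not valid Python at all. is_valid_python is a hypothetical helper built on the built-in compile().

```python
# Sketch: why the suggested assignment uses a raw string literal (r'...').
# Only the tail of the path is used so that every backslash sequence in the
# non-raw literal below is a valid escape.

cooked = 'gpt-2-finetuning\bin\twremat.exe'   # '\b' -> backspace, '\t' -> tab
raw = r'gpt-2-finetuning\bin\twremat.exe'     # backslashes kept literally

print(repr(cooked))  # 'gpt-2-finetuning\x08in\twremat.exe'
print(repr(raw))     # 'gpt-2-finetuning\\bin\\twremat.exe'


def is_valid_python(src: str) -> bool:
    """Return True if `src` compiles as a Python module, False on SyntaxError."""
    try:
        compile(src, "<twremat.py>", "exec")
    except SyntaxError:
        return False
    return True


# The earlier attempt has no opening quote, so the path is parsed as code.
attempt = r"TWREMAT=D:\Hobbies\AI\gpt-2-finetuning\bin\twremat.exe')"
fixed = r"TWREMAT=r'D:\Hobbies\AI\gpt-2-finetuning\bin\twremat.exe'"

print(is_valid_python(attempt))  # False
print(is_valid_python(fixed))    # True
```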

@WhereAmO
Author

WhereAmO commented Apr 1, 2021

Ah, that worked! Thank you for all the help!
