Running the API #26
Comments
Hmm, that checkpoint change you made makes me think model parallel isn't being picked up. It should be set to 2 for that model.
I couldn't figure out what combination of settings in …
Since #31 has the 350M model working but needs to use …
The 125M checkpoint seems to work on a single node.
I've encountered the same problem and fixed it by forcing `utils.py` to …
Hi,
Following up on #19 and #23 in a separate issue.
So far I've made the following changes to `constants.py`:

My `/home/hlang/opt_models` looks like:

`dict.txt` is from Stephen's link in #19, and `reshard-model_part-0.pt` and `reshard-model_part-1.pt` are from the OPT-125M links.

I found that I also had to modify `checkpoint_utils.py` because `get_paths_to_load` wasn't actually finding those .pt files, so I just returned them directly (maybe this is not the right thing?):

Now when I run `metaseq-api-local`, I get:

I tried Stephen's advice from #19 of setting `--model-parallel N` for N=0, N=1, and N=2, but none worked.
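For anyone hitting the same path-discovery problem, one quick sanity check is whether the shard files on disk line up with the model-parallel size. Below is a minimal sketch, not metaseq's actual `get_paths_to_load`: `find_shard_paths` is a hypothetical helper, and it assumes (based on the `reshard-model_part-0.pt` / `reshard-model_part-1.pt` naming and the comment that this model should run with model parallel 2) that each model-parallel rank loads exactly the shard whose suffix matches its rank.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical helper for illustration only; this is NOT metaseq's get_paths_to_load.
# Assumption: shard files are named reshard-model_part-<rank>.pt and each
# model-parallel rank loads exactly the shard with its own rank number.
SHARD_PATTERN = re.compile(r"^reshard-model_part-(\d+)\.pt$")


def find_shard_paths(checkpoint_dir: str, model_parallel: int) -> list[str]:
    """Return shard paths ordered by rank; raise ValueError if they don't match model_parallel."""
    if model_parallel < 1:
        # A model-parallel size of 0 is meaningless: there must be at least one rank.
        raise ValueError(f"model_parallel must be >= 1, got {model_parallel}")

    # Map rank number -> file path for every file matching the shard naming scheme.
    shards = {}
    for path in Path(checkpoint_dir).iterdir():
        match = SHARD_PATTERN.match(path.name)
        if match:
            shards[int(match.group(1))] = str(path)

    # Require exactly one shard per rank 0..model_parallel-1, no more and no fewer.
    expected = set(range(model_parallel))
    if set(shards) != expected:
        raise ValueError(
            f"found shards for ranks {sorted(shards)} in {checkpoint_dir}, "
            f"but model_parallel={model_parallel} expects ranks {sorted(expected)}"
        )
    return [shards[rank] for rank in sorted(expected)]


if __name__ == "__main__":
    # Recreate the layout from the post (dict.txt plus two shards) in a temp directory.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("dict.txt", "reshard-model_part-0.pt", "reshard-model_part-1.pt"):
            Path(tmp, name).touch()
        for size in (0, 1, 2, 3):
            try:
                paths = find_shard_paths(tmp, size)
                print(size, "ok:", [Path(p).name for p in paths])
            except ValueError as err:
                print(size, "error:", err)
```

With the two shards described above, only `model_parallel=2` passes the check; 0 is rejected outright, and 1 or 3 report a rank mismatch.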