Loading a checkpoint for MP=0 but world size is 1 #40
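The error in the title comes from a sanity check in the repo's checkpoint-loading code: it globs the `*.pth` shard files in `--ckpt_dir` and asserts that the count equals the distributed world size. `MP=0` therefore means the glob matched zero files, i.e. the checkpoint directory is wrong or the download is incomplete. A minimal paraphrase of that check (not the exact repo code; the function name here is mine):

```python
from pathlib import Path

def check_shards(ckpt_dir: str, world_size: int) -> list:
    """Mimic the repo's sanity check: one .pth shard per model-parallel rank."""
    checkpoints = sorted(Path(ckpt_dir).glob("*.pth"))
    # MP=0 in the error message means this glob found nothing, so the
    # ckpt_dir path is usually the first thing to verify.
    assert world_size == len(checkpoints), (
        f"Loading a checkpoint for MP={len(checkpoints)} "
        f"but world size is {world_size}"
    )
    return checkpoints
```

So before touching the model code, check that `--ckpt_dir` points at the directory that actually contains the `consolidated.*.pth` files.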
Trying to load 7B with MP=1 too, but I got a memory error. What size of GPU did you use? I couldn't load it with 24GB.
Hi, change
At least 32GB. Maybe you can try fp16 for a 24GB card.
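The rough arithmetic behind that advice, counting only the weights of a 7B-parameter model (ignoring activations and the KV cache; the helper name below is mine, not from the repo). Halving the bytes per parameter is what switching to fp16 buys you; on CUDA the official example script does this by setting the default tensor type to half before building the model.

```python
def weight_gib(n_params: float, bytes_per_param: int) -> float:
    """Rough size of the model weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30

# 7B model: ~26 GiB of weights in fp32 (cannot fit on a 24 GB card),
# ~13 GiB in fp16 (fits, with headroom left for activations).
print(f"fp32: {weight_gib(7e9, 4):.1f} GiB, fp16: {weight_gib(7e9, 2):.1f} GiB")
```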
Change the line
It works! Thanks!
@Deep1994 May I ask how large your VRAM is?
Yes, you guys figured it out! |
Please help me. I can't run the 33B model on 4× RTX 3090:
export TARGET_FOLDER=./models/llama
torchrun --nproc_per_node 4 example.py --ckpt_dir $TARGET_FOLDER/33B --tokenizer_path $TARGET_FOLDER/tokenizer.model
The error message is:
Please help me fix this problem. Thank you~
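When launching with `torchrun`, `--nproc_per_node` must match the number of `.pth` shards in the checkpoint directory (the 33B weights ship as 4 shards), or the MP assertion fires. A quick check before launching, assuming the same `TARGET_FOLDER` layout as in the command above:

```shell
# Count the .pth shards; --nproc_per_node must equal this number.
TARGET_FOLDER=${TARGET_FOLDER:-./models/llama}
NUM_SHARDS=$(ls "$TARGET_FOLDER"/33B/*.pth 2>/dev/null | wc -l)
echo "found $NUM_SHARDS shard(s); launch with --nproc_per_node $NUM_SHARDS"
```

If this prints 0, the path is wrong or the download is incomplete; fix that before debugging anything else.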
Sweet Jesus. I applied someone's PR patch for enabling compatibility with M2 MacBooks (I have 16 GB of RAM), and it immediately locked up my system and everything crashed, lol. This was on the 34B model; I also changed some variables to make the world size 1. I'm not visiting llama land again, I'm happy with GPT-4. Aloha!
Hello everyone! I have also encountered this error, and I modified my code following the suggestions above, but the error message remains unchanged. Can someone help me, please? I set TARGET_FOLDER=./ but the result is:
root@autodl-container-07e5119850-d5a71bd1:~/llama-llama_v1# ./bingo.sh
Failures:
It doesn't seem to work. Help!