how to run it, need more details #7

Open

SeekPoint opened this issue Jun 1, 2023 · 2 comments

@SeekPoint

And how do I install alpaca-rlhf?

@l294265421
Owner

l294265421 commented Jun 2, 2023

And how do I install alpaca-rlhf?

  1. Download this repo.
  2. Enter the ./alpaca_rlhf directory.
  3. Run the step1, step2 and step3 commands from the Step by Step section of the README (a rough sketch follows below).
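
For concreteness, a minimal sketch of those steps, assuming the repository URL for this project and reusing (in abbreviated form) the step1 invocation that appears later in this thread; the step2 and step3 scripts and their exact flags are listed in the README and are not reproduced here:

  # clone and enter the repo (URL assumed from the project/owner names)
  git clone https://github.com/l294265421/alpaca-rlhf.git
  cd alpaca-rlhf
  # step1: supervised fine-tuning via the run.sh wrapper; the model path is a placeholder,
  # and most hyper-parameter flags from the full command are omitted for brevity
  sh run.sh --num_gpus 1 ./alpaca_rlhf/deepspeed_chat/training/step1_supervised_finetuning/main.py \
      --sft_only_data_path MultiTurnAlpaca --model_name_or_path /path/to/llama-7b-hf \
      --output_dir ./rlhf/actor --deepspeed --zero_stage 2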

@SeekPoint
Copy link
Author

(gh_alpaca-rlhf) amd00@asus00:~/llm_dev/alpaca-rlhf$
(gh_alpaca-rlhf) amd00@asus00:~/llm_dev/alpaca-rlhf$ sh run.sh --num_gpus 1 ./alpaca_rlhf/deepspeed_chat/training/step1_supervised_finetuning/main.py --sft_only_data_path MultiTurnAlpaca --data_output_path ./rlhf-tmp/ --model_name_or_path /hf_model/llama-7b-hf --per_device_train_batch_size 2 --per_device_eval_batch_size 2 --max_seq_len 128 --learning_rate 3e-4 --num_train_epochs 1 --gradient_accumulation_steps 8 --num_warmup_steps 100 --output_dir ./rlhf/actor --lora_dim 8 --lora_module_name q_proj,k_proj --only_optimize_lora --deepspeed --zero_stage 2
start 20230602162350--------------------------------------------------
[2023-06-02 16:23:51,869] [WARNING] [runner.py:191:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/amd00/anaconda3/envs/gh_alpaca-rlhf/bin/deepspeed:6 in <module> │
│ │
│ 3 from deepspeed.launcher.runner import main │
│ 4 │
│ 5 if __name__ == '__main__': │
│ ❱ 6 │ main() │
│ 7 │
│ │
│ /home/amd00/anaconda3/envs/gh_alpaca-rlhf/lib/python3.8/site-packages/deepspeed/launcher/runner. │
│ py:407 in main │
│ │
│ 404 │ │ resource_pool = {} │
│ 405 │ │ device_count = get_accelerator().device_count() │
│ 406 │ │ if device_count == 0: │
│ ❱ 407 │ │ │ raise RuntimeError("Unable to proceed, no GPU resources available") │
│ 408 │ │ resource_pool['localhost'] = device_count │
│ 409 │ │ args.master_addr = "127.0.0.1" │
│ 410 │ │ multi_node_exec = False │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Unable to proceed, no GPU resources available
20230602162352
(gh_alpaca-rlhf) amd00@asus00:~/llm_dev/alpaca-rlhf$ nvidia-smi
Fri Jun 2 16:24:04 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
|  0%   45C    P8    18W / 350W |    768MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1085      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      1967      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A    259783      C   ...Speed-Chat/bin/python3.10      755MiB |
+-----------------------------------------------------------------------------+
(gh_alpaca-rlhf) amd00@asus00:~/llm_dev/alpaca-rlhf$

I have one 3090, and I changed --num_gpus to 1.
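
Not part of the original thread, but for context on the traceback above: the DeepSpeed launcher raises "Unable to proceed, no GPU resources available" when get_accelerator().device_count() returns 0, so a quick sanity check is whether PyTorch inside the same conda environment can see the card at all:

  # run inside the gh_alpaca-rlhf environment; prints torch version, CUDA build, visibility, device count
  python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count())"

If this prints False / 0, the environment's torch build (or a CUDA_VISIBLE_DEVICES setting) is the likely culprit rather than the run.sh arguments.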
