
When will chatglm with lora support multi-GPU fine-tuning? #15

Open
Zarc98 opened this issue Apr 11, 2023 · 6 comments
Labels: enhancement (New feature or request), wontfix (This will not be worked on)

Comments


Zarc98 commented Apr 11, 2023

No description provided.

Zarc98 added the enhancement (New feature or request) label Apr 11, 2023
shibing624 (Owner)

You can parallelize by placing the model and the data on different devices; you can refer to this implementation: https://github.com/yuanzhoulvpi2017/zero_nlp/tree/main/Chatglm6b_ModelParallel
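
A minimal sketch of that idea, assuming accelerate is installed and using transformers' built-in device_map sharding rather than the linked repository's code; the model name and dtype below are only illustrative:

import torch
from transformers import AutoModel, AutoTokenizer

# Shard the model's layers across all visible GPUs (naive model parallelism):
# each batch flows through cuda:0, cuda:1, ... in sequence, so a model that
# does not fit on one card can still be fine-tuned.
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    device_map="auto",          # let accelerate spread layers over the GPUs
    torch_dtype=torch.float16,  # keep weights in half precision
)
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

print(model.hf_device_map)  # inspect which module landed on which GPU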


bash99 commented Apr 21, 2023

You can parallelize by placing the model and the data on different devices; you can refer to this implementation: https://github.com/yuanzhoulvpi2017/zero_nlp/tree/main/Chatglm6b_ModelParallel

My card is a model where half precision is much faster than single precision, but with fp16=true the training speed doesn't seem to improve. Do I need to add other parameters?

training_chatglm_csc_demo.py: 102
model.train_model(args.train_file, args={'fp16': True})

shibing624 (Owner)

I'm still working on this issue; fp16 training currently only reduces GPU memory usage and does not provide any speedup.


bash99 commented Apr 21, 2023

I'm still working on this issue; fp16 training currently only reduces GPU memory usage and does not provide any speedup.

Nice. I also tried playing with int8; after a few small changes, I'm still stuck at:
AttributeError: 'CastOutputToFloat' object has no attribute 'weight'

mambaforge/lib/python3.10/site-packages/peft/utils/other.py:75 in prepare_model_for_int8_training

    72     if hasattr(model, output_embedding_layer_name):
    73         output_embedding_layer = getattr(model, output_embedding_layer_name)
    74         print(f"debug: {output_embedding_layer}")
 ❱  75         input_dtype = output_embedding_layer.weight.dtype
    76
    77         class CastOutputToFloat(torch.nn.Sequential):
    78             r"""
This is what output_embedding_layer prints as:
debug: CastOutputToFloat(
  (0): Linear(in_features=4096, out_features=130528, bias=False)
)

Below are some small changes; only after making them does the run get far enough to hit the error above.

diff --git a/textgen/chatglm/chatglm_model.py b/textgen/chatglm/chatglm_model.py
index fab945b..dff559e 100644
--- a/textgen/chatglm/chatglm_model.py
+++ b/textgen/chatglm/chatglm_model.py
@@ -103,11 +103,13 @@ class ChatGlmModel:
             model_name,
             config=config,
             trust_remote_code=True,
+            device_map='auto',
             load_in_8bit=self.args.int8,
         )
-        if self.args.fp16:
-            self.model.half()
-        self.model.to(self.device)
+        if not self.args.int8:
+            if self.args.fp16:
+                self.model.half()
+            self.model.to(self.device)

         if self.args.quantization_bit:
             logger.debug(f"Quantized to {self.args.quantization_bit} bit")
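
For reference, a hedged workaround sketch for the AttributeError above (not something from the maintainers): the traceback suggests lm_head has already been wrapped in CastOutputToFloat (an nn.Sequential) by the time peft's prepare_model_for_int8_training runs, so the function cannot find .weight on the wrapper. Unwrapping back to the inner Linear before the call avoids the crash; the lm_head name assumes the ChatGLM-6B module layout.

import torch
from peft import prepare_model_for_int8_training

# If an earlier call already wrapped the output head, recover the bare Linear
# so prepare_model_for_int8_training can read its weight dtype again.
if isinstance(model.lm_head, torch.nn.Sequential):
    model.lm_head = model.lm_head[0]
model = prepare_model_for_int8_training(model)  # re-applies the fp32 output cast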


bash99 commented May 4, 2023

I'm still working on this issue; fp16 training currently only reduces GPU memory usage and does not provide any speedup.

Looking at the transformers docs, they also seem to say that fp16 mainly saves memory at large batch sizes, and that getting a speedup places strict requirements on the model.
https://huggingface.co/docs/transformers/v4.13.0/en/performance
"So there is only a real memory saving if we train at a high batch size (and it's not half) and at batch sizes lower than 8, you actually get a bigger memory footprint (because of the overhead mentioned above). The gain for FP16 training is that in each of those cases, the training with the flag --fp16 is twice as fast, which does require every tensor to have every dimension be a multiple of 8 (examples pad the tensors to a sequence length that is a multiple of 8)."

Also, I tried changing the batch size to 4, which gave roughly a 20% speedup (the V100 has 32 GB of memory); changing it to 8 gave another 5%, but then training produced no useful results.
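
A rough illustration of that combination, sketched with plain transformers TrainingArguments rather than the textgen wrapper used earlier in this thread; the values are just the ones mentioned above:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",
    fp16=True,                      # half-precision compute
    per_device_train_batch_size=4,  # the setting that gave ~20% speedup on a 32 GB V100
    gradient_accumulation_steps=4,  # keep the effective batch size if memory is tight
)

Per the docs quoted above, padding sequences to a length that is a multiple of 8 (for example with DataCollatorWithPadding(pad_to_multiple_of=8)) is also needed for the fp16 tensor-core speedup to kick in.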


stale bot commented Dec 27, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. (Closed automatically by the bot due to prolonged inactivity; feel free to ask again if needed.)

@stale stale bot added the wontfix This will not be worked on label Dec 27, 2023