
How should I load a checkpoint saved with save_checkpoint in the finetune script, after fine-tuning the model with accelerate and DeepSpeed ZeRO stage-3? #18

Closed
Salierioo opened this issue Apr 21, 2023 · 12 comments

Comments

@Salierioo

No description provided.

@ChawDoe

ChawDoe commented Apr 26, 2023

No description provided.

Have you managed to solve this problem?

@ChawDoe

ChawDoe commented Apr 26, 2023

No description provided.

I've run into the same problem.

@rurubaobao

I have this problem too. How did you solve it?

@Salierioo
Author

Salierioo commented May 5, 2023 via email

@Salierioo
Author

No description provided.

Have you managed to solve this problem?

No description provided.

I've run into the same problem.

I've posted my solution below; I'm not sure whether you've solved it already.

@rurubaobao

Hi, after merging I end up with a model of roughly 60+ GB, but the official model is only about 30 GB. Do you load the 60+ GB model directly?
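The size gap is most likely fp32 vs fp16: zero_to_fp32.py writes full-precision weights, roughly twice the size of an fp16 release. A minimal sketch of casting the merged state dict back to half precision, assuming that is indeed the cause (the path "merged/pytorch_model.bin" is a placeholder, not something from this thread):

```python
# Hedged sketch: assumes the 60+ GB file is the fp32 output of zero_to_fp32.py and that
# the ~30 GB official weights are fp16; casting to half roughly halves the file size.
import torch

state_dict = torch.load("merged/pytorch_model.bin", map_location="cpu")
state_dict = {
    k: (v.half() if torch.is_tensor(v) and v.is_floating_point() else v)
    for k, v in state_dict.items()
}
torch.save(state_dict, "merged/pytorch_model_fp16.bin")
```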

@Salierioo
Author

Salierioo commented May 6, 2023 via email

@Salierioo
Author

Salierioo commented May 14, 2023 via email

@usun1997

I'm stuck at "where expected condition to be a boolean tensor, but got a tensor with dtype Half" and have no idea what to do. I used zero_to_fp32.py to convert the checkpoint .pt files into a single 60+ GB pytorch_model.bin, then changed every shard name in pytorch_model.bin.index to pytorch_model.bin, and this error shows up when I run inference.

@mayurou

mayurou commented Jun 6, 2023

"I used zero_to_fp32.py to convert the checkpoint .pt files into a single 60+ GB pytorch_model.bin, then changed every shard name in pytorch_model.bin.index to pytorch_model.bin"

That doesn't run for me! I used zero_to_fp32.py to convert the checkpoint .pt files into a 60+ GB pytorch_model.bin and changed every model name in pytorch_model.bin.index to pytorch_model.bin, and it fails with TypeError: expected str, bytes or os.PathLike object, not NoneType.
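As a side note (not a confirmed fix for the TypeError above): a single merged pytorch_model.bin should not need an index file at all, since transformers only consults pytorch_model.bin.index.json for sharded checkpoints. A minimal sketch, assuming the directory holds the model's config.json, tokenizer files, and the one merged weight file ("merged_ckpt" is a placeholder path):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# With no *.index.json in the directory, from_pretrained falls back to loading the
# single pytorch_model.bin instead of resolving shards through an index.
tokenizer = AutoTokenizer.from_pretrained("merged_ckpt", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("merged_ckpt", trust_remote_code=True)
```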

@Salierioo
Author

It has been quite a while since I did the fine-tuning, so I may not be remembering it clearly.

But I really have no recollection of hand-writing an .index.json; I just followed the documentation and could load the checkpoint directly, without running into any problems.

@lmc8133

lmc8133 commented Jun 30, 2023

"I found the solution in the DeepSpeed-related documentation in the accelerate GitHub repo; here is the link: https://github.com/huggingface/accelerate/blob/e60f3cab7a54a5519bf8f200fa1c998ce46e75bb/docs/source/usage_guides/deepspeed.mdx . The method is in the Saving and Loading section. In short, you use the .py script generated alongside the checkpoint saved by save_checkpoint to merge the checkpoint shards (this step has requirements on both disk space and RAM); once pytorch_model.bin has been produced, you can load it with from_pretrained."


This link is a bit easier to read:
https://huggingface.co/docs/accelerate/usage_guides/deepspeed#saving-and-loading
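For completeness, a minimal sketch of the flow those docs describe: merge the ZeRO stage-3 shards with DeepSpeed's conversion helper, then save the result in the standard Hugging Face layout so from_pretrained works afterwards. The paths "ckpt_dir" and "moss-merged" and the base model name "fnlp/moss-moon-003-sft" are illustrative assumptions, not values taken from this thread:

```python
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint
from transformers import AutoConfig, AutoModelForCausalLM

# 1) Merge the ZeRO stage-3 shards written by save_checkpoint into one fp32 state dict
#    (the zero_to_fp32.py script that DeepSpeed drops into the checkpoint dir does the same).
#    This step needs enough CPU RAM and disk space to hold the full fp32 weights.
state_dict = get_fp32_state_dict_from_zero_checkpoint("ckpt_dir")

# 2) Rebuild the model from its config, load the merged weights, and save them in the
#    usual Hugging Face format.
config = AutoConfig.from_pretrained("fnlp/moss-moon-003-sft", trust_remote_code=True)
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
model.load_state_dict(state_dict)
model.save_pretrained("moss-merged")

# 3) Reload like any other pretrained checkpoint (cast to fp16 for inference if desired).
model = AutoModelForCausalLM.from_pretrained("moss-merged", trust_remote_code=True).half()
```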
