Improve _replace_image_tags so that <img></img> tags containing no actual image do not interrupt training #3683

zsxm1998 · 2025-03-26T21:24:55Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

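During a GRPO training run, the program crashed unexpectedly here. The cause was that the model generated text such as <img>some garbage content</img>; the _replace_image_tags function extracted that content without any validation, and the result then conflicted with the original images in the data.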

This PR updates _replace_image_tags, mainly to implement two pieces of logic.

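Check the content wrapped in <img></img>: only when it is a valid image reference (e.g. a URL, local path, or base64) is it extracted and the <img></img> span converted to <image> for compatibility; otherwise the span is left untouched.
When the text already contains <image> tags and inputs.images holds the corresponding images, make sure that images extracted from <img></img> are inserted at the correct position in inputs.images.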
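The two points above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the names replace_image_tags and looks_like_image are hypothetical, and the validity check is a simplified assumption (http/https URLs, data:image/ URIs, and existing local files only; raw base64 strings are not detected).

```python
import os
import re
from typing import List, Tuple

# Matches either an existing <image> placeholder or an <img>...</img> span.
TAG_PATTERN = re.compile(r'<image>|<img>(.*?)</img>', re.DOTALL)


def looks_like_image(ref: str) -> bool:
    """Hypothetical heuristic: accept URLs, data URIs, and existing files."""
    ref = ref.strip()
    if not ref:
        return False
    if ref.startswith(('http://', 'https://', 'data:image/')):
        return True
    return os.path.isfile(ref)


def replace_image_tags(text: str, images: List[str]) -> Tuple[str, List[str]]:
    """Convert valid <img>ref</img> spans to <image> and keep images aligned."""
    images = list(images)  # do not mutate the caller's list
    pieces = []
    pos = 0
    image_idx = 0  # number of <image> placeholders emitted so far
    for match in TAG_PATTERN.finditer(text):
        pieces.append(text[pos:match.start()])
        pos = match.end()
        if match.group(0) == '<image>':
            # Pre-existing placeholder: it consumes the next original image.
            pieces.append('<image>')
            image_idx += 1
        elif looks_like_image(match.group(1)):
            # Valid reference: insert it at the index of this placeholder.
            pieces.append('<image>')
            images.insert(image_idx, match.group(1).strip())
            image_idx += 1
        else:
            # Not an image: leave the generated text exactly as it was.
            pieces.append(match.group(0))
    pieces.append(text[pos:])
    return ''.join(pieces), images
```

Walking the text once and counting placeholders as they are emitted is what keeps each extracted reference at the same index as its <image> tag, while non-image <img></img> spans pass through unchanged.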

Experiment results

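Testing shows that non-image content inside <img></img> is ignored, that the subsequent handling of <image> tags and image files is unaffected, and that training runs normally.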

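If the number of inputs.images matches the number of <image> tags in the original data, there is no problem. If there are more inputs.images than original <image> tags, the surplus is handled by the subsequent _add_default_tags function. If there are fewer inputs.images than original <image> tags, and an <image> tag with no corresponding image appears before a valid <img></img>, the images will be misaligned; in practice, though, the final mismatch between the number of <image> tags and inputs.images will probably raise an error later in the model (e.g. llava).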
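The three cases above can be told apart with a simple count comparison. The helper below is a hypothetical illustration (compare_counts is not part of the library) and only classifies the situation; it does not reproduce what _add_default_tags or the model itself does.

```python
from typing import List


def compare_counts(text: str, images: List[str]) -> str:
    """Classify how the original <image> tags relate to the supplied images."""
    n_tags = text.count('<image>')
    n_images = len(images)
    if n_images == n_tags:
        return 'match'  # tags and images line up one to one
    if n_images > n_tags:
        return 'extra_images'  # described above as left to _add_default_tags
    return 'missing_images'  # risk of misalignment or a later model error
```

Running such a check on the data before training would surface the third case early instead of relying on a later failure inside the model.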

Jintao-Huang · 2025-03-27T17:03:34Z

Thanks for the PR. I would like to fix this issue with a different approach ❤️

Jintao-Huang · 2025-03-28T02:53:56Z

This issue will be fixed in this PR: #3704

zsxm1998 · 2025-03-28T13:31:22Z

okok

zsxm1998 added 2 commits March 27, 2025 05:01

Improve _replace_image_tags so that <img></img> tags containing no actual image do not interrupt training

b0387d0

update _replace_image_tags pass precommit

1677457

Jintao-Huang closed this Apr 1, 2025
