
Conversation

@Sai-Suraj-27
Contributor

What does this PR do?

Fixes these failing tests in Glm4vMoeIntegrationTest.

[screenshot: output of the failing Glm4vMoeIntegrationTest tests]

As discussed here, the failure depends on how pytest collects the different tests. Similar to Lfm2MoeIntegrationTest and Qwen3MoeIntegrationTest (whose get_model() calls do not fail with this issue), it would be ideal to have a setUpClass that always initializes cls.model.
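
A minimal sketch of that setUpClass pattern, for illustration only (the checkpoint name is taken from the traceback later in this thread; the loading kwargs are assumptions, not the PR's exact code):

import unittest

from transformers import Glm4vMoeForConditionalGeneration


class Glm4vMoeIntegrationTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Initialize cls.model unconditionally, so the attribute exists for
        # every test method regardless of how pytest collects/orders them.
        cls.model = Glm4vMoeForConditionalGeneration.from_pretrained(
            "zai-org/GLM-4.5V",  # assumption: checkpoint from the traceback below
            device_map="auto",   # assumption: illustrative loading kwargs
        )

    @classmethod
    def tearDownClass(cls):
        # Reset instead of `del`, so the class attribute keeps existing.
        cls.model = None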

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ydshieh @Cyrilvallez

@ydshieh
Collaborator

ydshieh commented Dec 1, 2025

Thank you @Sai-Suraj-27, I will take a closer look.

@ydshieh
Collaborator

ydshieh commented Dec 2, 2025

It turns out the issue is caused by run_first from transformers.testing_utils.

The following script shows the issue; if we remove the @run_first, everything is ✅.

I have no extra time to dive into run_first at the moment, so I will approve the PR.

import unittest

from transformers.testing_utils import run_first

class Glm4vMoeModelTest(unittest.TestCase):

    def test_foo(self):
        assert 1 == 1


class Glm4vMoeIntegrationTest(unittest.TestCase):
    model = None

    @classmethod
    def get_model(cls):
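        # Lazy class-level cache: a dict stands in for the real model,
        # created once and then shared by all tests in the class.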
        if cls.model is None:
            cls.model = {"a": "b"}
        return cls.model

    @classmethod
    def tearDownClass(cls):
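        # NOTE: `del cls.model` removes the "model" class attribute itself
        # (the initial `model = None` was rebound by get_model), so any later
        # cls.model access raises AttributeError instead of returning None.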
        if hasattr(cls, "model"):
            del cls.model

    def setUp(self):
        pass

    def tearDown(self):
        pass

    def test_foo2(self):
        model = self.get_model()
        assert 1 == 1

    def test_foo3(self):
        model = self.get_model()
        assert 1 == 1

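    # run_first reorders this test to run before the others; as noted above,
    # the suite fails with this decorator present and passes once it is removed.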
    @run_first
    def test_foo4(self):
        model = self.get_model()
        assert 1 == 1

     @classmethod
     def get_model(cls):
-        if cls.model is None:
+        if not hasattr(cls, "model") or cls.model is None:
@ydshieh
Collaborator

We don't need the not hasattr(cls, "model") check. Any reason you added this?

@Sai-Suraj-27
Contributor Author

Sai-Suraj-27 commented Dec 2, 2025

I just added it as an extra safety condition. Since Python short-circuits if-conditions, when there is no model attribute it will not evaluate cls.model is None and will directly assign the appropriate value, so the "Glm4vMoeIntegrationTest has no attribute model" failure can never happen. But yes, I think we can remove it.
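
A minimal illustration of that short-circuit behavior (the Demo class is hypothetical, purely for the example):

class Demo:
    pass  # no "model" attribute defined anywhere


# `or` short-circuits: the left operand is True, so `Demo.model is None`
# is never evaluated and no AttributeError can be raised.
print(not hasattr(Demo, "model") or Demo.model is None)  # True

# Once the attribute exists, the left operand is False and the right
# operand is evaluated instead.
Demo.model = None
print(not hasattr(Demo, "model") or Demo.model is None)  # True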

@Sai-Suraj-27
Contributor Author

Removed. CI looks green.

@ydshieh
Collaborator

ydshieh commented Dec 2, 2025

Thanks. We still have some issues, but they are unrelated to this attribute and to run_first. So I am going to merge and discuss the following issue with the team:

    def test_small_model_integration_test_with_video(self):
        processor = AutoProcessor.from_pretrained("zai-org/GLM-4.5V", max_image_size={"longest_edge": 50176})
>       model = self.get_model()

tests/models/glm4v_moe/test_modeling_glm4v_moe.py:416:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/models/glm4v_moe/test_modeling_glm4v_moe.py:304: in get_model
    cls.model = Glm4vMoeForConditionalGeneration.from_pretrained(
src/transformers/modeling_utils.py:250: in _wrapper
    return func(*args, **kwargs)
src/transformers/modeling_utils.py:4001: in from_pretrained
    model, missing_keys, unexpected_keys, mismatched_keys, offload_index, error_msgs = cls._load_pretrained_model(
src/transformers/modeling_utils.py:4141: in _load_pretrained_model
    convert_and_load_state_dict_in_model(
src/transformers/core_model_loading.py:892: in convert_and_load_state_dict_in_model
    disk_offload_index = offload_and_maybe_resave_param(
src/transformers/core_model_loading.py:618: in offload_and_maybe_resave_param
    disk_offload_index = offload_weight(param, target_name, disk_offload_folder, disk_offload_index)      
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

weight = tensor([[[-2.2827e-02, -4.9438e-03,  8.5449e-03,  ...,  1.3000e-02,
           1.1597e-03, -1.2573e-02],
         [-6....[-8.4839e-03, -6.4392e-03, -2.1667e-03,  ..., -9.5215e-03,
          -3.1281e-03,  3.4485e-03]]], dtype=torch.bfloat16)
weight_name = 'model.language_model.layers.1.mlp.experts.down_proj', offload_folder = None
offload_index = {'lm_head.weight': {'dtype': 'bfloat16', 'safetensors_file': '/mnt/cache/hub/models--zai-org--GLM-4.5V/snapshots/ed474...ddceff371bca04/model-00001-of-00046.safetensors', 'weight_name': 'model.language_model.layers.1.mlp.gate.weight'}, ...}

    def offload_weight(weight: torch.Tensor, weight_name: str, offload_folder: str | None, offload_index: dict) -> dict:
        """Write `weight` to disk inside `offload_folder`, and update `offload_index` accordingly. Everything is
        saved in `safetensors` format."""

        if offload_folder is None:
>           raise ValueError(
                "The current `device_map` had weights offloaded to the disk, which needed to be re-saved. This is either "
                "because the weights are not in `safetensors` format, or because the model uses an internal weight format "
                "different than the one saved (i.e. most MoE models). Please provide an `offload_folder` for them in "
                "`from_pretrained`."
            )
E           ValueError: The current `device_map` had weights offloaded to the disk, which needed to be re-saved. This is either because the weights are not in `safetensors` format, or because the model uses an internal weight format different than the one saved (i.e. most MoE models). Please provide an `offload_folder` for them in `from_pretrained`.

@github-actions
Contributor

github-actions bot commented Dec 2, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: glm4v_moe

@ydshieh merged commit ac0769c into huggingface:main on Dec 2, 2025
15 checks passed