System Info

- `transformers` version: 5.0.0.dev0
- Platform: Linux-6.8.0-85-generic-x86_64-with-glibc2.35
- Python version: 3.10.17
- Huggingface_hub version: 1.0.0.rc6
- Safetensors version: 0.5.3
- Accelerate version: 1.10.1
- Accelerate config: not found
- DeepSpeed version: 0.18.0
- PyTorch version (accelerator?): 2.8.0+cu126 (CUDA)
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
- GPU type: NVIDIA RTX A6000
 
Who can help?
@ArthurZucker and @itazap
Information
- The official example scripts
- My own modified scripts
 
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
 
Reproduction
The setup is a simple custom head on top of Qwen-VL-4B.
```python
import json
import random

import torch

# `dims`, `processor`, and "prompts.json" come from the surrounding
# training script (not shown here).
with open("prompts.json", "r") as f:
    prompt_dict = json.load(f)

def format_data(sample):
    images = []
    dims_selected = []
    for image in range(len(sample['image'])):
        images.append(sample['image'][image])
        try:
            if random.random() > 0.5:
                # sample a dim with score >= 0
                dims_selected.append(random.choice([i for i in dims if sample['annotation'][image][i] >= 0]))
            else:
                # sample a dim with score < 0
                dims_selected.append(random.choice([i for i in dims if sample['annotation'][image][i] < 0]))
        except IndexError:
            # no dim matched (or annotation missing): fall back to any dim
            dims_selected.append(random.choice(dims))

    prompts = [prompt_dict[dim] for dim in dims_selected]
    images = list(sample['image'])
    n_images = len(images)
    # one single-turn conversation per (image, prompt) pair
    messages = [[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": images[i],
                },
                {"type": "text", "text": prompt},
            ],
        }
    ] for i, prompt in enumerate(prompts)]
    text = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
        padding=True,
    )
    inputs = processor(
        images=[[images[i]] for i in range(n_images)],
        text=text,
        return_tensors="pt",
        padding=True,
    )
    print(inputs['pixel_values'].shape)  # torch.Size([45312, 1536])
    inputs["text"] = text
    # label: 1 if the sampled dim's score is negative, 0 if positive, 0.5 if zero
    answers = [1 if i[dim] < 0 else (0 if i[dim] > 0 else 0.5) for i, dim in zip(sample["annotation"], dims_selected)]
    inputs['labels'] = torch.tensor(answers)
    inputs['dim'] = [dims.index(dim) for dim in dims_selected]
    return inputs
```

Expected behavior
The shape of `pixel_values` should be BxLxD, but I got (BxL)xD. I also tried putting the images inside `messages`; the result is the same.
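For reference, here is a minimal sketch of the per-image split I would expect to be recoverable, assuming the Qwen2-VL-style convention where `pixel_values` is a flat `(total_patches, dim)` tensor and `image_grid_thw` records each image's `(t, h, w)` patch grid. The grid values below are assumed for illustration only (12 images with 64x59 patch grids, which happens to sum to the 45312 rows printed above); they are not taken from my actual run:

```python
import torch

# Assumed flat pixel_values, matching the shape printed in the reproduction.
pixel_values = torch.randn(45312, 1536)

# Hypothetical per-image patch grids: 12 images, each a 1x64x59 grid.
image_grid_thw = torch.tensor([[1, 64, 59]] * 12)

# Patches per image = t * h * w, then split the flat tensor accordingly.
counts = image_grid_thw.prod(dim=-1)
per_image = torch.split(pixel_values, counts.tolist())

print(len(per_image), per_image[0].shape)
```

So even with a flat `(BxL)xD` tensor, per-image chunks can be recovered; the question is whether the flat layout is the intended output here or a regression.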