
Fuyu training fails with a padding error for the same inputs as the test case #27255

Closed
Carolinabanana opened this issue Nov 2, 2023 · 3 comments
System Info

transformers-4.36.0.dev0 (commit 552ff24)

Who can help?

@molbap

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import io
from datasets import Dataset
from PIL import Image
import requests
import torch
from transformers import AutoTokenizer, Trainer, TrainingArguments, FuyuImageProcessor, FuyuProcessor, FuyuForCausalLM

pretrained_path = "adept/fuyu-8b"
tokenizer = AutoTokenizer.from_pretrained(pretrained_path, pad_token_id=0, padding=True, truncation=True)
tokenizer.pad_token = tokenizer.eos_token
image_processor = FuyuImageProcessor()
processor = FuyuProcessor(image_processor=image_processor, tokenizer=tokenizer)

text_prompt = "Answer the following DocVQA question based on the image. \n Which is the metro in California that has a good job Outlook?"
jobs_image_url = "https://huggingface.co/datasets/hf-internal-testing/fixtures-captioning/resolve/main/jobs.png"
jobs_image_pil = Image.open(io.BytesIO(requests.get(jobs_image_url).content))
second_text_prompt = "Answer the following DocVQA question based on the image. \n What if the maximum male life expectancy?"
chart_image_url = "https://huggingface.co/datasets/hf-internal-testing/fixtures-captioning/resolve/main/chart.png"
chart_image_pil = Image.open(io.BytesIO(requests.get(chart_image_url).content))

model_inputs = processor(text=[text_prompt, second_text_prompt], images=[jobs_image_pil, chart_image_pil]).to("cuda:0")

model = FuyuForCausalLM.from_pretrained(pretrained_path, device_map='cuda:0', torch_dtype=torch.bfloat16)

generation = processor.tokenizer.batch_decode(model.generate(
    **model_inputs, max_new_tokens=10)[:, -10:], skip_special_tokens=True)

for batched_generation in generation:
    answer = batched_generation.split('\x04 ', 1)[1] if '\x04' in batched_generation else ''
    print(answer)

# Results: Los Angeles, 80.7

tokenized_dataset = Dataset.from_dict(model_inputs)

trainer = Trainer(
    model=model,
    train_dataset=tokenized_dataset,
    tokenizer=tokenizer
)

trainer.train()

#ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`image_patches` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
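One way to work around the batching failure (a minimal sketch, not a maintainer-confirmed fix): pass a custom data collator to Trainer that pads every ragged field to the longest example in the batch. The `pad_collator` name and the `pad_value=0` default are my own assumptions, not part of the transformers API.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_collator(features, pad_value=0):
    """Pad each field across the batch to the longest example.

    Hypothetical sketch: assumes every field is a (possibly nested)
    sequence whose trailing dimensions already match, e.g. 1-D token
    ids or 2-D (num_patches, patch_dim) image patches.
    """
    batch = {}
    for key in features[0]:
        # Convert each example's field to a tensor, then pad along
        # the first (variable-length) dimension.
        seqs = [torch.as_tensor(f[key]) for f in features]
        batch[key] = pad_sequence(seqs, batch_first=True, padding_value=pad_value)
    return batch
```

One could then pass `data_collator=pad_collator` to `Trainer(...)`; whether plain zero-padding of `image_patches` is semantically correct for Fuyu is untested here.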

Expected behavior

When running Fuyu model training, the same inputs that work correctly in model.generate (the ones from the test case) fail with ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (image_patches in this case) have excessive nesting

However, I have already enabled 'padding=True' and 'truncation=True' on the tokenizer, and I have used the exact output of FuyuProcessor as the inputs.

I provide example code that modifies the existing test case and test data to add a Trainer.
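For context, the tokenizer flags don't reach `image_patches`: after `Dataset.from_dict`, each example's patches are stored as plain nested Python lists of different lengths, and the default collator cannot stack ragged lists back into one tensor. A minimal illustration of that failure mode, using made-up values of the same shape mismatch:

```python
import torch

# Two examples whose patch counts differ, like two images that produce
# a different number of patches; they cannot be stacked into one tensor.
ragged = [
    [[1.0, 2.0], [3.0, 4.0]],  # example 1: two "patches"
    [[5.0, 6.0]],              # example 2: one "patch"
]
try:
    torch.tensor(ragged)
    raised = False
except ValueError:
    raised = True
print(raised)  # ragged input raises ValueError, as in the Trainer error above
```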

@molbap
Copy link
Contributor

molbap commented Nov 7, 2023

Hi @Carolinabanana, I answered another training script issue here: #26997 (comment) — can you check if it helps your case?

@leo4life2
Copy link

Hey, did you get your script working? I'm trying to fine-tune Fuyu as well and would appreciate an example script.

@BlackHandsomeLee
Copy link

Hey, I came across your fine-tuned language model repository and am very interested in learning about the fine-tuning process. Would you be willing to share any details about the techniques or code used to fine-tune the model? Understanding how others have approached fine-tuning would be really helpful, as I'm new to this area. Thank you for your time.
