
TypeError: forward() got an unexpected keyword argument "position_ids" #19

Closed
GZHU-DVL opened this issue Jun 28, 2023 · 6 comments
Comments

@GZHU-DVL

[screenshot of the error]
I followed the tutorial to run this project, but execution fails with the error shown above at this point.

@mmaaz60
Member

mmaaz60 commented Jun 28, 2023

Hi @GZHU-DVL,

Thank you for your interest in our work. Please make sure that you have followed the documented environment setup process and are using the correct versions of the libraries.

If the issue still exists, please share the script and the command you are running so we can look into it.

I hope this helps. Thanks

@GZHU-DVL
Author

The versions of the libraries are as follows:
torch~=2.0.0
tqdm~=4.65.0
transformers
numpy~=1.23
Pillow~=9.5.0
decord~=0.6.0
gradio~=3.23.0
markdown2~=2.4.8
einops~=0.6.1
requests~=2.30.0
sentencepiece~=0.1.99
protobuf~=4.23.2
accelerate~=0.20.3
accelerate==0.19.0
tokenizers>=0.13.3
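
Note that transformers is unpinned above. As a quick diagnostic (not part of the repo), the following sketch prints the installed transformers version and whether the LlamaAttention.forward in use accepts the position_ids keyword that the error complains about:

# diagnostic sketch: check the installed transformers build
import inspect

import transformers

print("transformers version:", transformers.__version__)

try:
    from transformers.models.llama.modeling_llama import LlamaAttention
except ImportError:
    raise SystemExit("This transformers build does not include a Llama implementation.")

# The TypeError means the forward() being called does not accept position_ids;
# this prints what the currently installed LlamaAttention.forward actually takes
# (including any monkey patch applied before this point).
params = inspect.signature(LlamaAttention.forward).parameters
print("forward() parameters:", list(params))
print("accepts position_ids:", "position_ids" in params)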

The command is as follows:
torchrun video_chatgpt/train/train_mem.py \
--model_name_or_path /gemini/data-2/7b/ \
--version v1 \
--data_path /gemini/code/Video-ChatGPT/scripts/video_chatgpt_training.json \
--video_folder /gemini/data-2/ActivityNet_Train_Video-ChatGPT_Clip-L14_Features/activity_clip-14L_spatio_temporal_356/ \
--tune_mm_mlp_adapter True \
--mm_use_vid_start_end \
--bf16 True \
--output_dir ./Video-ChatGPT_7B-1.1_Checkpoints \
--num_train_epochs 3 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 3000 \
--save_total_limit 3 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 100 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True

@GZHU-DVL
Author

The problem was solved after I changed the version of transformers.

@mmaaz60 mmaaz60 closed this as completed Jun 28, 2023
@whcpumpkin

The problem was solved after I changed the version of transformers.

Hi!
I ran into the same problem. Could you tell me which version of transformers you are using?
Thanks!

@zhanwenchen

The root cause can be seen in this issue: huggingface/transformers#24130

@zhanwenchen

zhanwenchen commented Nov 1, 2023

The root cause can be seen in this issue: huggingface/transformers#24130

Actually, I was wrong. The problem is that the flash_attn monkey patch has not been updated to reflect the breaking changes in transformers. To fix this, update llama_flash_attn_monkey_patch.py in this repository to match this one: https://github.com/lm-sys/FastChat/blob/dd84d166d7694f0cc0c766e5a811d995f5801c77/fastchat/train/llama_flash_attn_monkey_patch.py

The specific commit with this fix is this one: lm-sys/FastChat@daa9c11

But after that, you also need to add a kwarg, padding_mask: Optional[torch.LongTensor] = None, to the forward signature like this (in case the FastChat repo hasn't added it by the time you read this):

# ...video_chatgpt/train/llama_flash_attn_monkey_patch.py
from typing import Optional, Tuple

import torch

...
def forward(
    self,
    hidden_states: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    position_ids: Optional[torch.Tensor] = None,
    past_key_value: Optional[Tuple[torch.Tensor]] = None,
    output_attentions: bool = False,
    use_cache: bool = False,
    # Newer transformers releases pass padding_mask to the attention forward,
    # so the patched signature must accept it even if it goes unused here.
    padding_mask: Optional[torch.LongTensor] = None,
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
    if output_attentions:
...
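
For completeness: the patched forward only takes effect if the patch is applied before the model is constructed. In FastChat/LLaVA-style training entry points this is done roughly as in the sketch below; the exact import path for this repo is an assumption, so verify it against train_mem.py:

# Sketch of how the monkey patch is typically applied in train_mem.py
# (module path assumed; verify against this repo).
from video_chatgpt.train.llama_flash_attn_monkey_patch import (
    replace_llama_attn_with_flash_attn,
)

# Install the patched attention forward onto LlamaAttention before the
# Trainer builds the model; applying it afterwards has no effect on an
# already-constructed model.
replace_llama_attn_with_flash_attn()

from video_chatgpt.train.train import train

if __name__ == "__main__":
    train()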
