You should supply an instance of `transformers.BatchFeature` or list of `transformers.BatchFeature` to this method that includes input_values, but you provided ['file', 'audio', 'label'] #25748

c1ekrt · 2023-08-25T06:39:36Z

System Info

transformers version: 4.33.0.dev0
Platform: Windows-10-10.0.22621-SP0
Python version: 3.10.12
Huggingface_hub version: 0.16.4
Safetensors version: 0.3.2
Accelerate version: 0.21.0
Accelerate config: not found
PyTorch version (GPU?): 2.0.1+cu118 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?:
Using distributed or parallel set-up in script?:

Who can help?

@sanchit-gandhi

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Run audio_classification_CMD.py with the following arguments:

audio_classification_CMD.py
run_audio_classification.py --output_dir .\output --overwrite_output_dir --model_name_or_path facebook/wav2vec2-base --dataset_name superb  --dataset_config_name ks --hub_model_id Audio_Classification --do_train --do_eval --fp16 --train_split_name train --remove_unused_columns False --load_best_model_at_end --metric_for_best_model accuracy --gradient_accumulation_steps 4 --push_to_hub --push_to_hub_model_id Audio_Classification --save_safetensors --save_step 200 --save_strategy epoch --evaluation_strategy epoch --logging_strategy steps --logging_steps 10 --max_length_seconds 1 --seed 0 --num_train_epochs 5 --save_total_limit 3 --learning_rate 3e-5 --per_device_train_batch_size 16 --per_device_eval_batch_size 3 --warmup_ratio 0.1

However, it returns the error below:

Traceback (most recent call last):
  File "D:\Jhou's Workshop\transformers-main\examples\pytorch\audio-classification\run_audio_classification.py", line 443, in <module>
    main()
  File "D:\Jhou's Workshop\transformers-main\examples\pytorch\audio-classification\run_audio_classification.py", line 417, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\transformers\trainer.py", line 1546, in train
    return inner_training_loop(
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\transformers\trainer.py", line 1815, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\accelerate\data_loader.py", line 384, in __iter__
    current_batch = next(dataloader_iter)
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\torch\utils\data\dataloader.py", line 633, in __next__
    data = self._next_data()
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\torch\utils\data\dataloader.py", line 677, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\torch\utils\data\_utils\fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\transformers\data\data_collator.py", line 249, in __call__
    batch = self.tokenizer.pad(
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\transformers\feature_extraction_sequence_utils.py", line 132, in pad
    raise ValueError(
ValueError: You should supply an instance of `transformers.BatchFeature` or list of `transformers.BatchFeature` to this method that includes input_values, but you provided ['file', 'audio', 'label']
  0%|                                                                                                                                                                                  | 0/5055 [00:01<?, ?it/s]

Expected behavior

Training should start without errors.

ArthurZucker · 2023-08-25T07:21:41Z

cc @sanchit-gandhi

sanchit-gandhi · 2023-08-25T14:07:20Z

Hey @c1ekrt - thanks for the issue report. Unfortunately, I'm not able to reproduce the error you're facing with the given command. I launched training using the arguments you provided, and training was executed successfully. See logs at wandb. Could you confirm that you are using the latest version of the examples script without modifications? Thanks!

c1ekrt · 2023-08-26T02:44:30Z

Thanks for replying! I will reinstall the package and rerun the example after this weekend.

c1ekrt · 2023-08-28T00:24:23Z

I had modified two lines because the following error message appeared:

  File "D:\Jhou's Workshop\transformers-main\examples\pytorch\audio-classification\run_audio_classification.py", line 443, in <module>
    main()
  File "D:\Jhou's Workshop\transformers-main\examples\pytorch\audio-classification\run_audio_classification.py", line 417, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\transformers\trainer.py", line 1546, in train
    return inner_training_loop(
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\transformers\trainer.py", line 1837, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\transformers\trainer.py", line 2682, in training_step
    loss = self.compute_loss(model, inputs)
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\transformers\trainer.py", line 2707, in compute_loss
    outputs = model(**inputs)
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\accelerate\utils\operations.py", line 581, in forward
    return model_forward(*args, **kwargs)
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\accelerate\utils\operations.py", line 569, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\torch\amp\autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\transformers\models\wav2vec2\modeling_wav2vec2.py", line 2136, in forward
    loss = loss_fct(logits.view(-1, self.config.num_labels), labels.view(-1))
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\torch\nn\modules\loss.py", line 1174, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "C:\Users\jim\.conda\envs\diffhug\lib\site-packages\torch\nn\functional.py", line 3029, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: "nll_loss_forward_reduce_cuda_kernel_2d_index" not implemented for 'Int'

So I changed the code at line 400 to:

    # Initialize our trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=raw_datasets["train"].with_format("torch") if training_args.do_train else None,
        eval_dataset=raw_datasets["eval"].with_format("torch") if training_args.do_eval else None,
        compute_metrics=compute_metrics,
        tokenizer=feature_extractor,
    )

Then the transformers.BatchFeature error appeared.

I have reinstalled the transformers package, but the issue remains.

sanchit-gandhi · 2023-08-29T17:10:59Z

Hey @c1ekrt - you can't pass the raw dataset with {audio, text} to the trainer. You need to pass the pre-processed dataset with the features {normalised audio, token ids}:

# Initialize Trainer
trainer = Trainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=vectorized_datasets["train"] if training_args.do_train else None,
    eval_dataset=vectorized_datasets["eval"] if training_args.do_eval else None,
    tokenizer=processor,
)
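To make the distinction between raw and pre-processed examples concrete, here is a simplified sketch of the kind of check that produces the ValueError in the title. It is an illustration only, not the code in `feature_extraction_sequence_utils.py`; the function name `check_model_input` and the example dictionaries are made up for this sketch.

```python
def check_model_input(features, model_input_name="input_values"):
    """Simplified illustration of the failing check (not the library's code)."""
    if model_input_name not in features:
        raise ValueError(
            f"expected a batch that includes {model_input_name}, "
            f"but got {list(features.keys())}"
        )
    return features


# Hypothetical raw example: the columns the dataset ships with.
raw_example = {"file": "a.wav", "audio": {"array": [0.0, 0.1]}, "label": 3}
# Hypothetical pre-processed example: what the collator expects to pad.
processed_example = {"input_values": [0.0, 0.1], "labels": 3}

check_model_input(processed_example)  # passes silently

try:
    check_model_input(raw_example)
except ValueError as err:
    message = str(err)
    print(message)
```

The raw example fails because it has no `input_values` key, which matches the column list `['file', 'audio', 'label']` reported in the error.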

c1ekrt · 2023-08-31T08:11:14Z

I still can't get the example to work. The pre-processing part of the code, which is this section (lines 317-329):

def train_transforms(batch):
    """Apply train_transforms across a batch."""
    subsampled_wavs = []
    for audio in batch[data_args.audio_column_name]:
        wav = random_subsample(
            audio["array"], max_length=data_args.max_length_seconds, sample_rate=feature_extractor.sampling_rate
        )
        subsampled_wavs.append(wav)
    inputs = feature_extractor(subsampled_wavs, sampling_rate=feature_extractor.sampling_rate)
    output_batch = {model_input_name: inputs.get(model_input_name)}
    output_batch["labels"] = list(batch[data_args.label_column_name])
    return output_batch

never runs, despite set_transform being called at line 390:

raw_datasets["train"].set_transform(train_transforms, output_all_columns=False)

All of this code is unmodified.
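One possible explanation for the transform being skipped, offered as an assumption rather than something confirmed in this thread: `set_transform` attaches the transform in place, while `with_format("torch")` (added to the Trainer call earlier in this thread) returns a copy with a different format, so the copy would no longer apply `train_transforms`. The toy class below only models that hypothesis. `LazyDataset` and `toy_transform` are hypothetical names and are not the `datasets` API.

```python
class LazyDataset:
    """Toy stand-in with a single formatting slot (hypothetical, not `datasets`)."""

    def __init__(self, rows, formatter=None, format_name=None):
        self.rows = rows
        self.formatter = formatter
        self.format_name = format_name

    def set_transform(self, fn):
        # Modifies this object in place: rows are transformed on access.
        self.formatter = fn
        self.format_name = "custom"

    def with_format(self, name):
        # Returns a copy whose formatting slot holds the named format,
        # so the previously set transform is not carried over.
        return LazyDataset(self.rows, formatter=None, format_name=name)

    def __getitem__(self, index):
        row = self.rows[index]
        return self.formatter(row) if self.formatter else row


def toy_transform(row):
    # Stand-in for train_transforms: produce the keys the model expects.
    return {"input_values": row["audio"], "labels": row["label"]}


rows = [{"file": "a.wav", "audio": [0.0, 0.1], "label": 3}]
train = LazyDataset(rows)
train.set_transform(toy_transform)

keys_with_transform = list(train[0].keys())
keys_after_with_format = list(train.with_format("torch")[0].keys())
print(keys_with_transform)
print(keys_after_with_format)
```

If the real library behaves like this model, passing `raw_datasets["train"]` to the Trainer directly, as the unmodified script does, would keep the transform active.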

sanchit-gandhi · 2023-08-31T16:08:17Z

Indeed, the pre-processing function is defined here:

transformers/examples/pytorch/audio-classification/run_audio_classification.py

Line 317 in eaf5e98

def train_transforms(batch):

And the transformation is applied here:

transformers/examples/pytorch/audio-classification/run_audio_classification.py

Line 390 in eaf5e98

raw_datasets["train"].set_transform(train_transforms, output_all_columns=False)

Can you try running the default script unchanged? As mentioned above, I can do a training run using the command you provided without any issue.

It's worth trying updating the accelerate package:

pip install --upgrade accelerate

And checking that your PyTorch version is up to date (maybe even try the nightly install?)
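A quick way to see which versions are actually installed in the active environment is sketched below. It uses only the standard library; the helper name `installed_versions` is made up for this example, and the package list simply mirrors the libraries mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(["transformers", "datasets", "accelerate", "torch"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

Running it inside the same conda environment as the training script avoids comparing against a different Python installation.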

c1ekrt · 2023-09-01T03:32:30Z

OK. It seems that the 'label' column of the superb dataset, which is passed into the cross-entropy calculation, had the wrong dtype. Hence the error

"nll_loss_forward_reduce_cuda_kernel_2d_index" not implemented for 'Int'

occurred. After changing the dtype to torch.int64, the code started running without any errors.
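The cast described above can be sketched as follows. This is a minimal illustration, assuming the labels arrive as 32-bit integers; the tensors and shapes are invented for the example, and where exactly the cast belongs in run_audio_classification.py is not stated in this thread.

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: 4 examples, 12 classes (shapes chosen for illustration).
logits = torch.randn(4, 12)
labels_int32 = torch.tensor([0, 3, 7, 11], dtype=torch.int32)

# Cast the class indices to int64 (torch.long) before computing the loss,
# which is the change reported to resolve the error in this thread.
labels = labels_int32.to(torch.int64)

loss = F.cross_entropy(logits, labels)
print(labels.dtype, loss.item())
```

In a training script the same cast could live in the pre-processing function or in a custom collator; which place is appropriate depends on how the labels are produced.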

sanchit-gandhi · 2023-09-01T17:45:32Z

Interesting! I couldn't repro this on my side. Will leave as closed for now, but feel free to re-open if you see this phenomenon in the examples scripts again. Sorry we didn't find the complete fix this time!

Emmekea · 2023-11-30T14:14:04Z

I faced the exact same issue. For me, upgrading datasets (pip3 install --upgrade datasets) did the trick.

c1ekrt closed this as completed Sep 1, 2023
