System Info

transformers version: 4.42.2

Who can help?

No response

Information

Tasks

- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction

Running the following in a notebook raises an IndexError during word-level timestamp post-processing:
```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[8], line 1
----> 1 asr_out = transcribing_pipe(
      2     '../SampleData/Saba_interview_short.wav',
      3     return_timestamps="word",
      4     generate_kwargs={"language": "danish"}
      5 )
      7 asr_out

File c:\Users\User\miniconda3\envs\vva\lib\site-packages\transformers\pipelines\automatic_speech_recognition.py:284, in AutomaticSpeechRecognitionPipeline.__call__(self, inputs, **kwargs)
    221 def __call__(
    222     self,
    223     inputs: Union[np.ndarray, bytes, str],
    224     **kwargs,
    225 ):
    226     """
    227     Transcribe the audio sequence(s) given as inputs to text. See the [`AutomaticSpeechRecognitionPipeline`]
    228     documentation for more information.
    (...)
    282     `"".join(chunk["text"] for chunk in output["chunks"])`.
    283     """
--> 284     return super().__call__(inputs, **kwargs)

File c:\Users\User\miniconda3\envs\vva\lib\site-packages\transformers\pipelines\base.py:1246, in Pipeline.__call__(self, inputs, num_workers, batch_size, *args, **kwargs)
   1244     return self.iterate(inputs, preprocess_params, forward_params, postprocess_params)
   1245 elif self.framework == "pt" and isinstance(self, ChunkPipeline):
-> 1246     return next(
   1247         iter(
   1248             self.get_iterator(
   1249                 [inputs], num_workers, batch_size, preprocess_params, forward_params, postprocess_params
   1250             )
   1251         )
   1252     )
   1253 else:
   1254     return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)

File c:\Users\User\miniconda3\envs\vva\lib\site-packages\transformers\pipelines\pt_utils.py:125, in PipelineIterator.__next__(self)
    123 # We're out of items within a batch
    124 item = next(self.iterator)
--> 125 processed = self.infer(item, **self.params)
    126 # We now have a batch of "inferred things".
    127 if self.loader_batch_size is not None:
    128     # Try to infer the size of the batch

File c:\Users\User\miniconda3\envs\vva\lib\site-packages\transformers\pipelines\automatic_speech_recognition.py:587, in AutomaticSpeechRecognitionPipeline.postprocess(self, model_outputs, decoder_kwargs, return_timestamps, return_language)
    584     stride_right /= sampling_rate
    585     output["stride"] = chunk_len, stride_left, stride_right
--> 587     text, optional = self.tokenizer._decode_asr(
    588         model_outputs,
    589         return_timestamps=return_timestamps,
    590         return_language=return_language,
    591         time_precision=time_precision,
    592     )
    593 else:
    594     items = np.concatenate(final_items, axis=1)

File c:\Users\User\miniconda3\envs\vva\lib\site-packages\transformers\models\whisper\tokenization_whisper.py:832, in WhisperTokenizer._decode_asr(self, model_outputs, return_timestamps, return_language, time_precision)
    831 def _decode_asr(self, model_outputs, *, return_timestamps, return_language, time_precision):
--> 832     return _decode_asr(
    833         self,
    834         model_outputs,
    835         return_timestamps=return_timestamps,
    836         return_language=return_language,
    837         time_precision=time_precision,
    838     )

File c:\Users\User\miniconda3\envs\vva\lib\site-packages\transformers\models\whisper\tokenization_whisper.py:1032, in _decode_asr(tokenizer, model_outputs, return_timestamps, return_language, time_precision)
   1030     current_tokens.append(token)
   1031     if return_timestamps == "word":
-> 1032         start_time = round(token_timestamps[i] + time_offset, 2)
   1033         if i + 1 < len(token_timestamps):
   1034             end_time = round(token_timestamps[i + 1] + time_offset, 2)

IndexError: list index out of range
```
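For completeness, here is a self-contained version of the failing call. The pipeline construction is my reconstruction (the notebook cell only shows the call itself), so treat the checkpoint and chunking settings as assumptions:

```python
# Reconstruction of the failing setup. The checkpoint and chunk_length_s are
# assumptions: the notebook cell only shows the transcription call itself.
from transformers import pipeline

transcribing_pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # assumed multilingual Whisper checkpoint
    chunk_length_s=30,                # assumed long-form chunking setting
)

# This call raises IndexError inside _decode_asr when word-level
# timestamps are requested for the attached audio file.
asr_out = transcribing_pipe(
    "../SampleData/Saba_interview_short.wav",
    return_timestamps="word",
    generate_kwargs={"language": "danish"},
)
print(asr_out)
```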
I've uploaded the audio that I am trying to process here.

I have opened a discussion on Whisper's Hub page, where I have implemented a fix that seems to work (a rough sketch of the idea follows below): here.

Colab link: here.
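I have not copied the exact patch into this issue, but the crash site in `_decode_asr` (last traceback frame) indexes `token_timestamps[i]` without checking the list length. A bounds check along these lines avoids the error; this is a hypothetical sketch of the failing region, not the verbatim code from the discussion:

```python
# Hypothetical sketch of the failing region in
# transformers/models/whisper/tokenization_whisper.py (around line 1032).
# token_timestamps can end up shorter than the decoded token sequence, so the
# unguarded token_timestamps[i] lookup overruns the list on the last chunk.
current_tokens.append(token)
if return_timestamps == "word":
    if i < len(token_timestamps):  # guard added: skip out-of-range positions
        start_time = round(token_timestamps[i] + time_offset, 2)
        if i + 1 < len(token_timestamps):
            end_time = round(token_timestamps[i + 1] + time_offset, 2)
        else:
            end_time = None  # no timestamp available for the final boundary
```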
Expected behavior

Transcription completes and returns word-level timestamps for the whole file, rather than raising an IndexError.
cc @sanchit-gandhi @kamilakesbi