Hello! I'm currently reading in a .wav file and passing it into:

def audio_transcribe(audio: AudioSegment, audio_time_start: float, audio_time_end: float):
    trimmed_audio = audio[(audio_time_start * 1000): (audio_time_end * 1000)] # convert seconds to MS
    raw_audio = trimmed_audio.raw_data
    loaded_audio = load_audio(raw_audio)
    model = whisper.load_model("base")
    result = model.transcribe(loaded_audio)
    return result["text"]

I've taken the load_audio function from this discussion:

def load_audio(file: (str, bytes), sr: int = 16000):
    """
    Open an audio file and read as mono waveform, resampling as necessary

    Parameters
    ----------
    file: (str, bytes)
        The audio file to open or bytes of audio file
    sr: int
        The sample rate to resample the audio if necessary

    Returns
    -------
    A NumPy array containing the audio waveform, in float32 dtype.
    """
    if isinstance(file, bytes):
        inp = file
        file = 'pipe:'
    else:
        inp = None
    try:
        # This launches a subprocess to decode audio while down-mixing and resampling as necessary.
        # Requires the ffmpeg CLI and `ffmpeg-python` package to be installed.
        out, _ = (
            ffmpeg.input(file, threads=0)
            .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sr)
            .run(cmd="ffmpeg", capture_stdout=True, capture_stderr=True, input=inp)
        )
    except ffmpeg.Error as e:
        raise RuntimeError(f"Failed to load audio:\n {e.stderr.decode()}") from e

    return np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0

But I'm unable to get it to work. I always run into this exception when trying to use ffmpeg to preprocess the audio:

I know the audio is valid after slicing (I can export to a chunk file and listen to it to make sure). So it seems

Note:
Replies: 2 comments
Because your input is raw audio read from a pipe, you need to describe the input audio format to ffmpeg. For example:

ffmpeg.input('pipe:', format="s16le", acodec="pcm_s16le", ac=1, ar=48000)
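To illustrate why the description is needed: the bytes in a pydub segment's raw_data are bare samples with no header, so nothing in the stream says how many channels or what sample rate they use. The sketch below uses only the standard library and made-up silent audio to show what a WAV header adds on top of the same samples. The helper name wrap_pcm_as_wav and the example layout (stereo, 16-bit, 48 kHz) are assumptions for illustration, not part of the original code.

```python
import io
import wave

# Hypothetical example data: 0.1 s of silence, stereo, 16-bit, 48 kHz.
CHANNELS, SAMPLE_WIDTH, FRAME_RATE = 2, 2, 48000
raw_pcm = bytes(CHANNELS * SAMPLE_WIDTH * (FRAME_RATE // 10))


def wrap_pcm_as_wav(pcm: bytes, channels: int, sample_width: int, frame_rate: int) -> bytes:
    """Prepend a WAV header so a decoder can discover the audio layout itself."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(frame_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


wav_bytes = wrap_pcm_as_wav(raw_pcm, CHANNELS, SAMPLE_WIDTH, FRAME_RATE)

# The container starts with a RIFF/WAVE signature; raw PCM is samples only.
print(wav_bytes[:4], wav_bytes[8:12])

# Number of extra bytes the header contributes on top of the samples.
print(len(wav_bytes) - len(raw_pcm))

# Reading the header back recovers the layout that raw bytes cannot carry.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    print(wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
```

So there are two ways out: declare the layout on the input side, as in the example above, or hand ffmpeg bytes that still carry a container header so it can probe the layout on its own.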
Thank you so much! I didn't think about needing to specify the input audio format, as it doesn't look like the default whisper

Ended up solving it with the below:

out, _ = (
    ffmpeg.input(file, format="s16le", acodec="pcm_s16le", ac=2, ar=48000)
    .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sr)
    .run(cmd="ffmpeg", capture_stdout=True, capture_stderr=True, input=inp)
)

Importantly, I was getting gibberish translations until running
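The values ac=2 and ar=48000 match this particular file. A rough sketch of deriving them from the audio instead of hard-coding them is below. PcmLayout is a stand-in that mirrors the sample_width, channels and frame_rate attributes of a pydub AudioSegment, and raw_input_kwargs and pcm_duration_seconds are hypothetical helper names. The sketch assumes 16-bit little-endian samples and rejects anything else. The duration check shows one way a wrong declaration can distort audio, though the thread does not confirm that this was the cause of the gibberish here.

```python
from dataclasses import dataclass


@dataclass
class PcmLayout:
    """Stand-in for the pydub AudioSegment attributes of the same names."""
    sample_width: int  # bytes per sample
    channels: int
    frame_rate: int    # Hz


def raw_input_kwargs(layout) -> dict:
    """Build ffmpeg-python input options describing headerless PCM."""
    if layout.sample_width != 2:
        # Assumption: only 16-bit audio is handled in this sketch.
        raise ValueError("expected 16-bit samples; convert the audio first")
    return {
        "format": "s16le",
        "acodec": "pcm_s16le",
        "ac": layout.channels,
        "ar": layout.frame_rate,
    }


def pcm_duration_seconds(num_bytes: int, layout) -> float:
    """Duration implied by a byte count under a declared layout."""
    return num_bytes / (layout.sample_width * layout.channels * layout.frame_rate)


actual = PcmLayout(sample_width=2, channels=2, frame_rate=48000)
one_second = bytes(2 * 2 * 48000)  # one second of stereo 16-bit 48 kHz silence

print(raw_input_kwargs(actual))
print(pcm_duration_seconds(len(one_second), actual))  # 1.0

# Declaring the same bytes as mono doubles the implied duration.
wrong = PcmLayout(sample_width=2, channels=1, frame_rate=48000)
print(pcm_duration_seconds(len(one_second), wrong))   # 2.0
```

With a real segment, the same function could be applied as ffmpeg.input('pipe:', **raw_input_kwargs(trimmed_audio)), assuming trimmed_audio is the pydub segment whose raw_data is piped in.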