In-memory audio input mode #65

Open

amdrozdov wants to merge 1 commit into shashikg:main from amdrozdov:main
Conversation

@amdrozdov
Hello Whisper S2T team!

In our project we need to work with pre-loaded audio chunks, so I made a small PR that adds a file_io flag to the Whisper S2T model. This mode allows calling transcribe() with np.arrays directly (without going through file I/O). Usage example:

model = whisper_s2t.load_model(
    model_identifier="./models/faster-whisper-large-v3",
    backend='CTranslate2',
    n_mels=128,
    file_io=False
)

# some audio chunks
audio_chunks = [np.frombuffer(my_data, np.int16).flatten().astype(np.float32)/32768.0]

result = model.transcribe(
    audio_chunks,
    lang_codes=lang_codes,
    tasks=tasks,
    initial_prompts=initial_prompts,
    batch_size=32
)
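For context on the conversion in the example above: it reinterprets raw 16-bit PCM bytes as int16 samples and scales them by 1/32768 so the float32 values land in [-1.0, 1.0). A minimal, self-contained sketch (pcm_bytes here is hypothetical test data, not part of the PR):

```python
import numpy as np

# Hypothetical raw 16-bit PCM payload (e.g. read from a socket or ring buffer)
pcm_bytes = np.array([0, 16384, -32768, 32767], dtype=np.int16).tobytes()

# Reinterpret the bytes as int16 samples, then normalize to float32 in [-1.0, 1.0)
chunk = np.frombuffer(pcm_bytes, np.int16).flatten().astype(np.float32) / 32768.0
```

With file_io=False, a list of such chunks can be passed straight to transcribe() as shown above.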

Please let me know whether I also need to update the tests or benchmarks in order to merge the PR.

P.S. There is a ticket #25, and this PR can be a first step toward it (if we control the external VAD and hypothesis buffer outside of Whisper S2T).

Best regards,
Andrei
