# OpenVINO Speech-to-Text Audio Transcripts

The `OpenVINOSpeechToTextLoader` transcribes audio files with the [OpenVINO Speech-to-Text `OVModelForSpeechSeq2Seq` API](https://github.com/huggingface/optimum-intel/blob/main/docs/source/openvino/inference.mdx) and loads the transcribed text into documents.

To use it, you need the `optimum[openvino,nncf]` Python package and ffmpeg installed.
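
As a quick sanity check before running the notebook, the two prerequisites can be probed from Python. This is a minimal sketch using only the standard library; `missing_prerequisites` is a hypothetical helper name, and checking that the top-level `optimum` module is importable is only an approximation (it does not confirm that the `openvino` and `nncf` extras are installed).

```python
import importlib.util
import shutil


def missing_prerequisites(modules=("optimum",), executables=("ffmpeg",)):
    """Return a list describing every prerequisite that could not be found."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(f"python package: {name}")
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(exe) is None:
            missing.append(f"executable: {exe}")
    return missing


print(missing_prerequisites() or "all prerequisites found")
```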

- [Optimizing Whisper and Distil-Whisper for Speech Recognition with OpenVINO and NNCF](https://blog.openvino.ai/blog-posts/optimizing-whisper-and-distil-whisper-for-speech-recognition-with-openvino-and-nncf)

## Installation & setup

First, install the `optimum[openvino,nncf]` [Python package](https://github.com/huggingface/optimum-intel?tab=readme-ov-file#installation).


In [None]:
%pip install --upgrade-strategy eager "optimum[openvino,nncf]==1.23.3" langchain-huggingface --quiet

## Example

The `OpenVINOSpeechToTextLoader` requires the `model_id` and `file_path` arguments. Audio files can be specified as a local file path.
Optional arguments such as `device="GPU"`, `load_in_8bit=True`, and `batch_size=2` can improve performance.
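
One way to keep these optional settings in a single place is to collect them into a keyword dictionary and unpack it into the loader. The sketch below is illustrative only: `build_loader_kwargs` is a hypothetical helper, not part of the loader, and it uses just the three argument names mentioned above. Whether a given combination actually speeds things up depends on the hardware available.

```python
def build_loader_kwargs(device="CPU", load_in_8bit=False, batch_size=None):
    """Collect optional loader arguments, leaving out the ones not requested."""
    kwargs = {"device": device}
    if load_in_8bit:
        kwargs["load_in_8bit"] = True
    if batch_size is not None:
        if batch_size < 1:
            raise ValueError("batch_size must be a positive integer")
        kwargs["batch_size"] = batch_size
    return kwargs


gpu_kwargs = build_loader_kwargs(device="GPU", load_in_8bit=True, batch_size=2)
print(gpu_kwargs)

# Intended usage (requires the packages from the installation step):
# loader = OpenVINOSpeechToTextLoader(file_path, model_id, **gpu_kwargs)
```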

In [None]:
!apt-get update && apt-get install wget ffmpeg -y
!wget -O audio.wav "https://github.com/intel/intel-extension-for-transformers/raw/refs/heads/main/intel_extension_for_transformers/neural_chat/assets/audio/welcome.wav"

In [None]:
from langchain_community.document_loaders import OpenVINOSpeechToTextLoader

model_id = "distil-whisper/distil-small.en"
file_path = "./audio.wav"  # mp3 also supported

loader = OpenVINOSpeechToTextLoader(file_path, model_id, device="CPU")

docs = loader.load()

Note: Calling `loader.load()` blocks until the transcription is finished.
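
If the blocking call is inconvenient, one option is to submit it to a worker thread and collect the result later. This is a minimal sketch using the standard library's `concurrent.futures`; `load_in_background` is a hypothetical helper, and `_FakeLoader` is a stand-in so the example runs without a model. How much other work really overlaps with the transcription depends on the inference runtime, so treat this as a pattern rather than a guaranteed speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def load_in_background(loader, executor):
    """Submit loader.load() to the executor and return a Future immediately."""
    return executor.submit(loader.load)


class _FakeLoader:
    """Stand-in for a real loader; only mimics the load() method."""

    def load(self):
        return ["transcript placeholder"]


with ThreadPoolExecutor(max_workers=1) as executor:
    future = load_in_background(_FakeLoader(), executor)
    # Other work can happen here while the load runs in the worker thread.
    docs_demo = future.result()  # blocks until load() has finished

print(docs_demo)
```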

The transcribed text is available in the `page_content`:

In [None]:
docs[0].page_content

```
"Welcome to neural chat."
```

The `metadata` contains additional information about the transcription:

In [None]:
docs[0].metadata

```json
{
  'language': 'en-US',
  'timestamp': '(0.0, 13.0)',
  'result_total_latency': '2'
}
```
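
In the sample output above, every metadata value is a string, including the timestamp pair. The sketch below shows one way to turn such a dictionary into typed values. It assumes the fields always look like the sample; `parse_transcript_metadata` is a hypothetical helper, and the unit of `result_total_latency` is not stated in the output, so the value is only converted to a number.

```python
import ast


def parse_transcript_metadata(metadata):
    """Convert the string fields of a metadata dict into typed values."""
    # The timestamp is a string such as "(0.0, 13.0)"; literal_eval safely
    # parses it into a Python tuple without executing arbitrary code.
    start, end = ast.literal_eval(metadata["timestamp"])
    return {
        "language": metadata["language"],
        "start": float(start),
        "end": float(end),
        "latency": float(metadata["result_total_latency"]),
    }


example = {
    "language": "en-US",
    "timestamp": "(0.0, 13.0)",
    "result_total_latency": "2",
}
parsed = parse_transcript_metadata(example)
print(parsed["end"] - parsed["start"])  # 13.0
```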