<img src="../images/coefficient-pyconde.png" width=1200>

# Whispered Secrets: Building An Open-Source Tool To Live Transcribe & Summarize Conversations
## 1. Transcription
**Questions?** contact@coefficient.ai / [@CoefficientData](https://twitter.com/CoefficientData)

---

## 0. Imports 📦

In [None]:
from queue import Queue

import numpy as np
import speech_recognition as sr
import torch
import whisper

## 1. Listen 🎤

<img src="../images/speechrecognition.png" width=1200>

<img src="../images/sr-enginesupport.png" width=400>

### Configure the microphone

In [None]:
sr.Microphone.list_microphone_names()

In [None]:
print("Available microphone devices are: ")
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print(f'{index}: Microphone with name "{name}" found')

In [None]:
mic_index = int(input("Please enter the index of the microphone you want to use: "))

In [None]:
source = sr.Microphone(sample_rate=16000, device_index=mic_index)

### Listen & transcribe

In [None]:
recorder = sr.Recognizer()

In [None]:
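# Reuse the 16 kHz microphone configured above rather than opening a new
# default device, which would replace `source` and its sample rate.
with source: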
    print("Say something!")
    audio = recorder.listen(source)

In [None]:
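try:
    # Transcribe first: nesting double quotes inside an f-string is a
    # syntax error before Python 3.12.
    text = recorder.recognize_whisper(audio, language="english").strip()
    print(f"Whisper thinks you said: '{text}'")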
except sr.UnknownValueError:
    print("Whisper could not understand audio")
except sr.RequestError as e:
    print(f"Could not request results from Whisper; {e}")

### Live transcription

In [None]:
audio_model = whisper.load_model("tiny.en")

In [None]:
# The Recognizer detects when speech ends.
recorder = sr.Recognizer()

# Minimum audio energy the microphone input must exceed to count as speech.
recorder.energy_threshold = 300

In [None]:
# Dynamic energy compensation can lower the energy threshold so far that
# the Recognizer never stops recording, so turn it off.
recorder.dynamic_energy_threshold = False

In [None]:
with source:
    recorder.adjust_for_ambient_noise(source)
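
For intuition about what `energy_threshold` means: the recognizer compares the loudness (RMS energy) of each incoming audio chunk against the threshold to decide whether someone is speaking. The sketch below reproduces that comparison with NumPy on synthetic 16-bit samples. `rms_energy` and `is_speech` are hypothetical helper names used for illustration only; they are not part of SpeechRecognition.

```python
import numpy as np


def rms_energy(samples: np.ndarray) -> float:
    """Root-mean-square amplitude of a chunk of 16-bit PCM samples."""
    # Work in float64 so that squaring int16 values cannot overflow.
    return float(np.sqrt(np.mean(samples.astype(np.float64) ** 2)))


def is_speech(samples: np.ndarray, energy_threshold: float = 300.0) -> bool:
    """True when the chunk is louder than the threshold."""
    return rms_energy(samples) > energy_threshold


sample_rate = 16000
t = np.arange(sample_rate) / sample_rate

# A quiet background hum (amplitude 50) versus a louder tone (amplitude 10000).
quiet = (50 * np.sin(2 * np.pi * 220 * t)).astype(np.int16)
loud = (10000 * np.sin(2 * np.pi * 220 * t)).astype(np.int16)

print(is_speech(quiet), is_speech(loud))
```

With the threshold of 300 used above, the quiet hum is ignored and the louder tone counts as speech.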

In [None]:
# Thread safe Queue for passing data from the threaded recording callback.
data_queue = Queue()

In [None]:
def record_callback(_, audio: sr.AudioData) -> None:
    """
    Threaded callback function to receive audio data when recordings finish.

    audio: An AudioData containing the recorded bytes.
    """
    data_queue.put(audio.get_raw_data())

In [None]:
transcription = [""]

#### 👇 **START TALKING!**

In [None]:
# Maximum length of each recorded chunk, in seconds. Smaller values give
# more frequent (closer to real-time) updates.
record_timeout = 2.0

# Create a background thread that will pass us raw audio bytes.
# We could do this manually, but SpeechRecognition provides a nice helper.
# It returns a function that stops the background listener when called.
stop_listening = recorder.listen_in_background(
    source,
    record_callback,
    phrase_time_limit=record_timeout,
)

print("Model 'tiny.en' loaded & listening...\n")

In [None]:
data_queue.empty()

In [None]:
data_queue

In [None]:
# Combine audio data from queue
audio_data = b"".join(list(data_queue.queue))
data_queue.queue.clear()
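
Copying `data_queue.queue` and then clearing it are two separate steps, so a chunk that the recording thread enqueues in between could be dropped without ever being read. A minimal alternative sketch, assuming only the standard-library `Queue` API (`drain_queue` and `demo_queue` are hypothetical names, not part of the notebook):

```python
from queue import Empty, Queue


def drain_queue(q: Queue) -> bytes:
    """Pop every chunk currently in the queue and join them into one bytes object."""
    chunks = []
    while True:
        try:
            # get_nowait() removes exactly one item, so nothing is lost if the
            # recording thread adds more chunks while we are draining.
            chunks.append(q.get_nowait())
        except Empty:
            break
    return b"".join(chunks)


# Stand-in for the chunks that record_callback would enqueue.
demo_queue = Queue()
for chunk in (b"\x01\x00", b"\x02\x00", b"\x03\x00"):
    demo_queue.put(chunk)

audio_bytes = drain_queue(demo_queue)
print(audio_bytes, demo_queue.empty())
```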

In [None]:
audio_data

In [None]:
from IPython.display import Audio

In [None]:
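# Play the audio. The bytes are headerless PCM, so pass them to Audio() as an
# int16 array together with the microphone's actual sample rate.
sample_rate = source.SAMPLE_RATE
Audio(np.frombuffer(audio_data, dtype=np.int16), rate=sample_rate)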

In [None]:
# Convert the in-RAM buffer to something the model can use directly, without
# needing a temp file: reinterpret the bytes as 16-bit signed integers, convert
# them to 32-bit floats, and divide by 32768 (2**15) to scale the samples
# into the range [-1.0, 1.0).
audio_np = np.frombuffer(audio_data, dtype=np.int16).astype(np.float32) / 32768.0
audio_np
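
To see what the division by 32768 does, here is a small worked example on hand-picked samples rather than microphone input (`pcm` and `raw_bytes` are illustrative stand-ins for the recorded data):

```python
import numpy as np

# Four 16-bit samples: silence, half scale, and the two int16 extremes.
pcm = np.array([0, 16384, -32768, 32767], dtype=np.int16)
raw_bytes = pcm.tobytes()  # stands in for the bytes read from the microphone

audio = np.frombuffer(raw_bytes, dtype=np.int16).astype(np.float32) / 32768.0
print(audio.dtype, audio.min(), audio.max())
```

The most negative sample maps to exactly -1.0 and the most positive to just under 1.0, which is the float range the model works with.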

In [None]:
# Audio() accepts the float array directly; use the microphone's sample rate.
Audio(audio_np, rate=source.SAMPLE_RATE)

In [None]:
# Transcribe the audio. Use half precision (fp16) only when a GPU is available.
result = audio_model.transcribe(audio_np, fp16=torch.cuda.is_available())
result

In [None]:
text = result["text"].strip()
text

In [None]:
transcription.append(text)

# 2. Live transcription demo - run `python -m demo.transcribe` from repo root 🔊

<img src="../images/transcribe.gif" width=1200>

### Change #1: typer CLI

<img src="../images/typer.png" width=1000>

<img src="../images/typer2.png" width=1000>

### Change #2: Load tiny, small, medium models

<img src="../images/load-models.png" width=800>
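
The screenshot is not reproduced as text here. As a rough sketch of how a size choice might map to a Whisper model name, assuming an English-only option: `resolve_model_name`, `WHISPER_SIZES` and the `english_only` parameter are hypothetical and may differ from the demo's actual code.

```python
WHISPER_SIZES = ("tiny", "small", "medium")  # the sizes named in this section


def resolve_model_name(size: str, english_only: bool = True) -> str:
    """Map a size choice to a Whisper model name (hypothetical helper)."""
    if size not in WHISPER_SIZES:
        raise ValueError(f"Unknown model size {size!r}; expected one of {WHISPER_SIZES}")
    # Whisper publishes English-only variants of these sizes with a ".en" suffix.
    return f"{size}.en" if english_only else size


print(resolve_model_name("small"))
print(resolve_model_name("medium", english_only=False))
```

The returned string could then be passed to `whisper.load_model(...)`, as done with `"tiny.en"` earlier in the notebook.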

### Change #3: Infinite loop!

<img src="../images/loop.png" width=800>
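
The screenshot is not reproduced as text here. One way the single-shot cells from section 1 could be wrapped in a loop is sketched below. It is an assumption-laden illustration, not the demo's code: `transcribe_loop`, `poll_interval`, `max_iterations` and `fake_transcribe` are hypothetical, and the transcriber is passed in as a function so the sketch runs without a microphone or model.

```python
import time
from queue import Queue
from typing import Callable, List, Optional

import numpy as np


def transcribe_loop(
    data_queue: Queue,
    transcribe: Callable[[np.ndarray], str],
    transcription: List[str],
    poll_interval: float = 0.25,
    max_iterations: Optional[int] = None,
) -> None:
    """Poll the queue forever (or for max_iterations polls) and transcribe new audio."""
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        if data_queue.empty():
            time.sleep(poll_interval)  # avoid busy-waiting while nothing is queued
            continue

        # Same steps as the cells above: join the chunks, normalise, transcribe.
        chunks = []
        while not data_queue.empty():
            chunks.append(data_queue.get())
        audio_np = np.frombuffer(b"".join(chunks), dtype=np.int16).astype(np.float32) / 32768.0
        transcription.append(transcribe(audio_np).strip())


# Stand-in transcriber so the sketch runs without Whisper.
def fake_transcribe(audio: np.ndarray) -> str:
    return f" heard {audio.size} samples "


demo_queue = Queue()
demo_queue.put(np.zeros(4, dtype=np.int16).tobytes())
demo_queue.put(np.zeros(6, dtype=np.int16).tobytes())

lines = [""]
transcribe_loop(demo_queue, fake_transcribe, lines, poll_interval=0.0, max_iterations=3)
print(lines)
```

With the real model, the `transcribe` argument could be something like `lambda a: audio_model.transcribe(a, fp16=torch.cuda.is_available())["text"]`, and `max_iterations` left as `None` to loop until interrupted.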

### Change #4: Phrase detection

<img src="../images/phrase1.png" width=800>

<img src="../images/phrase2.png" width=800>
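
The screenshots are not reproduced as text here. A common way to detect phrases in this kind of loop is to treat a long enough pause between audio chunks as the end of a phrase: while the speaker keeps talking, the current line is replaced with the fuller transcription, and after a pause a new line is started. The sketch below illustrates that idea only; `PhraseTracker` and `phrase_timeout` are hypothetical names and the demo's implementation may differ.

```python
from datetime import datetime, timedelta
from typing import List, Optional


class PhraseTracker:
    """Decide whether new audio continues the current phrase or starts a new one."""

    def __init__(self, phrase_timeout: float = 3.0) -> None:
        self.phrase_timeout = timedelta(seconds=phrase_timeout)
        self.last_audio_at: Optional[datetime] = None
        self.lines: List[str] = [""]

    def update(self, text: str, now: datetime) -> None:
        # A pause longer than phrase_timeout marks the end of the previous phrase.
        phrase_complete = (
            self.last_audio_at is not None
            and now - self.last_audio_at > self.phrase_timeout
        )
        self.last_audio_at = now
        if phrase_complete:
            self.lines.append(text)  # start a new line
        else:
            self.lines[-1] = text  # same phrase: replace with the fuller text


tracker = PhraseTracker(phrase_timeout=3.0)
start = datetime(2024, 1, 1, 12, 0, 0)
tracker.update("Hello", start)
tracker.update("Hello world", start + timedelta(seconds=1))  # 1 s gap: same phrase
tracker.update("Next sentence", start + timedelta(seconds=6))  # 5 s gap: new phrase
print(tracker.lines)
```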