# 60-flask-UI-transcribe.ipynb

This notebook creates all necessary files to create a flask app that transcribes incoming audio using WHISPER and prints it to the screen.
It has minimal bells + whistles for now.

Relies on the SpeechRecognition(https://pypi.org/project/SpeechRecognition/) package, and borrows code snippets from: https://github.com/mallorbc/whisper_mic and edits to use the API instead of downloaded model.
The flask code was mostly generated with ChatGPT.

## Get OpenAI API Key
This code uses the OpenAI API. Insert your key to write into the app file here.

In [1]:
import openai
from getpass import getpass
openai_api_key = getpass()

········


## Write the index.html File
This file edits how the flask webpage looks. The index.html file must be inside a `templates` directory, which we'll also make here.

In [None]:
!mkdir templates

The index.html file includes HTML code to display the transcription, and adds a button to start and stop recording.

In [None]:
%%writefile ./templates/index.html

<!DOCTYPE html>
<html>
<head>
    <title>Transcriber</title>
    <style>
        .recording {
            background-color: yellow;
        }
    </style>
    <script type="text/javascript">
        var source = new EventSource("/updates");
        source.onmessage = function(event) {
            document.getElementById("updated-text").innerHTML = event.data;
        };
        function toggleVariable() {
            var xhr = new XMLHttpRequest();
            xhr.open('POST', '/toggle', true);
            xhr.onreadystatechange = function() {
                if (xhr.readyState === XMLHttpRequest.DONE && xhr.status === 200) {
                    console.log('Toggle success');
                    // Change button color and text
                    var button = document.getElementById("toggle-button");
                    button.classList.toggle("recording");
                    if (button.innerHTML === "Start Recording") {
                        button.innerHTML = "Stop Recording";
                    } else {
                        button.innerHTML = "Start Recording";
                    }
                }
            };
            xhr.send()
        }
    </script>
</head>
<body>
    <h1>Transcriber</h1>
    <button id="toggle-button" onclick="toggleVariable()">Start Recording</button>
    <p id="updated-text"></p>
</body>
</html>

## Write the Flask Application

This contains the entire code for the flask application.
This code:
Opens a thread that continuously records audio and adds audio to a queue.
Opens a thread that continuously checks the audio queue for sound, then writes it to a file and transcribes it.
Opens a thread that continuously checks for transcription snippets and prints to the screen.

In [2]:
%%writefile app.py

import speech_recognition as sr # https://pypi.org/project/SpeechRecognition/
import queue
import time
import threading
import sys
import openai
openai.api_key = open_api_key

'''
Set up queues and microphone for recording
'''

# Queues
# Audio queue stores audio data from mic/recorder
# Result queue stores transcribed output
audio_queue  = queue.Queue()
result_queue = queue.Queue()

# variables to control app
global text_all
text_all = "" # output result string
break_threads = False # quit out

# SETUP MICROPHONE RECORDING
# Reference: https://github.com/Uberi/speech_recognition/blob/master/examples/background_listening.py
# General Reference: https://github.com/Uberi/speech_recognition/blob/master/reference/library-reference.rst

# Init mic and recorder
# Recorder will use the mic and callback function
# to constantly listen for audio and add it to the audio_queue
print('Detected Microphones:')
print(sr.Microphone.list_microphone_names()) # list microphones
mic = sr.Microphone()
recorder = sr.Recognizer()
sample_rate = mic.SAMPLE_RATE # use default sample rate for mic, whisper API can handle it

# Mic/Recorder Settings
pause_threshold = .5
energy = 300
dynamic_energy = False

# Represents the energy level threshold for sounds. Values below this threshold are considered silence, and values above this threshold are considered speech
recorder.energy_threshold = energy

# Represents the minimum length of silence (in seconds) that will register as the end of a phrase
# Smaller values result in the recognition completing more quickly, but might result in slower speakers being cut off
recorder.pause_threshold  = pause_threshold

# Automatically increase/decrease energy to account for ambient noise
recorder.dynamic_energy_threshold = dynamic_energy

# Adjusts the energy threshold dynamically
with mic as source:
    recorder.adjust_for_ambient_noise(source)

# 1ST THREAD - This is called from the background thread
# takes data from mic and adds directly to audio queue
def record_callback(_, audio:sr.AudioData):
    global toggle_variable
    if toggle_variable:
        # only add data to audio queue if button has been pressed
        data = audio.get_raw_data()
        audio_queue.put_nowait(data)

# Start listening in another thread
# Spawns a thread to repeatedly record phrases from mic
# phrase time limit - maximum length of recorded phrases (seconds)
recorder.listen_in_background(mic, record_callback, phrase_time_limit=10)
print("Microphone ready!")

'''
Define some necessary functions
'''

# Transcribe audio with WHISPER
def transcribe(audio_file):
    transcript = openai.Audio.transcribe("whisper-1", audio_file, language='en')
    return transcript

# Open audio file and transcribe
def transcribe_audio(file_path):
    with open(file_path, "rb") as audio_file:
        transcript = transcribe(audio_file)

    return transcript["text"]

# Get audio data stored in audio_queue
# Pulls data from the queue that was put there with record_callback()
# Will pull data until queue is empty or (elapsed-time > min_time)
def get_all_audio(min_time=-1):
    audio = bytes()
    got_audio = False
    time_start = time.time()
    while not got_audio or time.time() - time_start < min_time: # min time unused right now
        # loops as long as there's something in the audio queue
        while not audio_queue.empty():
            audio += audio_queue.get() # pull data from audio queue
            got_audio = True

    data = sr.AudioData(audio,sample_rate,2)
    return data

# Get data from audio queue, save it as .wav, and transcribe .wav
# Adds transcribed .wav to result queue
def transcribe_data_from_queue():
    audio_data = get_all_audio() # get audio data from queue

    # Save audio data as .wav
    with open("latest.wav", "wb") as f:
        f.write(audio_data.get_wav_data())

    # Transcribe the saved audio file
    transcript_text = transcribe_audio('./latest.wav')

    # Add transcription to queue
    result_queue.put_nowait(transcript_text)

# Loop to run transcribe() in its own thread
# Continuosly grabs audio, transcribes it, and adds output to result queue
# status and break_threads need to be global?
def transcribe_loop():
    while True:
        if break_threads:
            break
        else:
            transcribe_data_from_queue()
    sys.exit()

# Loop to pull results and print it
# Continuosly grabs transcribed from result_queue and prints it
# result_queue, text_all, break_threads need to be global?
def print_result_loop():
    while True:
        result = result_queue.get() # get data from result queue
        print(result)
        global text_all
        text_all += result + '<br>'    # append it to output result string

        # If output result string too long, reset it
        #if len(text_all) > 2000:
        #    text_all = ""

        # Quit if 'stop' is said
        # need better way to quit threads?
        if result.lower().find('stop') > -1:
            #text_all += '. breaking...'
            break_threads = True
            break
    sys.exit()
    
'''
Run the flask app
'''
from flask import Flask, render_template, Response
import time

app = Flask(__name__)
app.debug = True

threading.Thread(target=print_result_loop).start()# print output thread
#thread1.daemon = True  # Set the thread to daemon so it ends when the main thread ends
#thread1.start()
threading.Thread(target=transcribe_loop).start()   # transcribe thread
#thread2.daemon = True  # Set the thread to daemon so it ends when the main thread ends
#thread2.start()

# Global variable
global toggle_variable
toggle_variable = False

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/toggle', methods=['POST'])
def toggle():
    global toggle_variable
    toggle_variable = not toggle_variable
    return 'Success'

@app.route('/updates')
def updates():
    def generate_updates():
        while True:
            # Generate the updated text here
            global text_all
            global toggle_variable
            updated_text = text_all

            # Yield the SSE-formatted response
            yield f"data: {updated_text}\n\n"
            
            # Wait before sending the next update
            time.sleep(0.2)

    return Response(generate_updates(), mimetype='text/event-stream')

if __name__ == '__main__':
    app.run(port=5001, use_reloader=False)


Overwriting app.py


## Run the Flask Application
This will open a new tab (follow the link below) in your browser. Push "start recording" to begin a transcription.

In [3]:
!python full_app.py

Detected Microphones:
['MacBook Pro Microphone', 'MacBook Pro Speakers', 'ZoomAudioDevice']
Microphone ready!
 * Serving Flask app 'full_app'
 * Debug mode: on
 * Running on http://127.0.0.1:8001
[33mPress CTRL+C to quit[0m
127.0.0.1 - - [17/Jul/2023 11:06:26] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [17/Jul/2023 11:06:26] "GET /updates HTTP/1.1" 200 -
127.0.0.1 - - [17/Jul/2023 11:06:28] "POST /toggle HTTP/1.1" 200 -
Thank you for watching!
Thank you.
^C
