<a href="https://colab.research.google.com/github/radhakrishnan-omotec/speechkraft-repo/blob/main/2_Speech_to_Text_Engine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Speech to Text Engine

This notebook implements a speech recognition engine in Python, intended to run on a Raspberry Pi single-board computer. Using the SpeechRecognition library, PyAudio, and pyttsx3, it captures user speech through a microphone, transcribes it to text with Google's Speech Recognition service, and generates synthesized speech responses. The code is meant as a foundation for integrating a large language model (LLM) that analyzes and enhances user interactions, for example through sentence completion, synonym suggestions, and corrections. The interaction alternates between spoken input and synthesized output, which is the basic pattern behind more advanced voice-based applications.

*Initialization*: The code defines a `SpeechRecognitionEngine` class that combines a speech recognizer, a microphone, a text-to-speech (TTS) engine (pyttsx3), and the PyAudio library for audio input and output.

In [None]:
!pip install pyttsx3



In [None]:
!pip install PyAudio



In [None]:
!pip install SpeechRecognition



In [None]:
import speech_recognition as sr
import pyttsx3
import pyaudio

### The SpeechRecognitionEngine class
This class is the interface for capturing user speech, converting it to text, and generating synthesized speech output.

1. *Initialization*: The constructor creates the key components: a speech recognizer (`recognizer`), a microphone (`microphone`), a text-to-speech engine (`speaker`), and an audio interface (`audio`). Together these let the class capture audio, transcribe speech, and generate synthesized speech.

2. *Capture Audio*: The `capture_audio` method opens the microphone, calibrates the recognizer for ambient noise with `adjust_for_ambient_noise`, listens for a phrase, and returns the audio data.

3. *Speech to Text*: The `speech_to_text` method transcribes the captured audio into text using the Google Speech Recognition API. It catches `sr.UnknownValueError` (the audio could not be understood) and `sr.RequestError` (the request to the service failed) and returns `None` in both cases.

4. *Text to Speech*: The `text_to_speech` method prints the LLM output, or any other text, and speaks it through the TTS engine.

In [None]:
class SpeechRecognitionEngine:
    """Capture speech, transcribe it, and speak text back to the user."""

    def __init__(self):
        self.recognizer = sr.Recognizer()  # transcribes captured audio
        self.microphone = sr.Microphone()  # default system input device
        self.speaker = pyttsx3.init()      # text-to-speech engine
        self.audio = pyaudio.PyAudio()     # low-level audio interface
        self.listening = True              # loop flag, cleared by the caller

    def announce(self, message):
        # Speak a prompt without printing it.
        self.speaker.say(message)
        self.speaker.runAndWait()

    def capture_audio(self):
        with self.microphone as source:
            print("Listening...")
            # Calibrate the energy threshold before listening for a phrase.
            self.recognizer.adjust_for_ambient_noise(source)
            audio_data = self.recognizer.listen(source)
        return audio_data

    def speech_to_text(self, audio_data):
        # Returns the transcript, or None if transcription fails.
        try:
            print("Transcribing...")
            text = self.recognizer.recognize_google(audio_data)
            print(f"User's speech: {text}")
            return text
        except sr.UnknownValueError:
            print("Speech recognition could not understand audio")
            return None
        except sr.RequestError as e:
            print(f"Could not request results from Google Speech Recognition service; {e}")
            return None

    def text_to_speech(self, output_text):
        # Print the text, then speak it.
        print(f"Output: {output_text}")
        self.speaker.say(output_text)
        self.speaker.runAndWait()

    def stop_listen(self, text):
        # True only when a transcript exists and contains "stop".
        return bool(text) and "stop" in text.lower()
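
The substring check in `stop_listen` also fires on words that merely contain "stop", such as "unstoppable". If that is not wanted, a whole-word check is one option. The following is a minimal sketch under that assumption; the helper name `contains_stop_word` and its `stop_word` parameter are hypothetical and not part of the original class.

```python
import re


def contains_stop_word(text, stop_word="stop"):
    """Return True if `stop_word` occurs in `text` as a whole word (hypothetical helper)."""
    if not text:
        # speech_to_text returns None when transcription fails.
        return False
    # \b anchors the match at word boundaries, so "unstoppable" does not match.
    pattern = rf"\b{re.escape(stop_word)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(contains_stop_word("please STOP now"))       # True
print(contains_stop_word("an unstoppable force"))  # False
print(contains_stop_word(None))                    # False
```

The trade-off is that a whole-word match will miss transcripts where the recognizer fuses the command into another word, so the simpler substring test may still be preferable for a noisy setup.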

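5. *Main Code*: The main loop repeatedly prompts the user, captures speech, and transcribes it until the transcript contains the word "stop". Passing each transcript to an LLM and speaking the LLM output back with `text_to_speech` is the intended next step; the loop in this notebook does not implement it yet.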

In [None]:
if __name__ == "__main__":
    speech_engine = SpeechRecognitionEngine()

    while speech_engine.listening:
        # Prompt the user; announce() returns nothing, so there is no value to keep.
        speech_engine.announce("Start Speaking")

        user_audio = speech_engine.capture_audio()
        user_text = speech_engine.speech_to_text(user_audio)

        # End the loop once the transcript contains "stop".
        if speech_engine.stop_listen(user_text):
            speech_engine.listening = False
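
The loop in this notebook ends at transcription. One way to wire in the LLM step is to pass each stage in as a callable, which also lets the loop be exercised without a microphone. This is only a sketch under stated assumptions: `run_conversation`, `respond`, and `max_turns` are hypothetical names, and `respond` stands in for whatever LLM call is eventually used.

```python
def run_conversation(capture, transcribe, respond, speak, max_turns=10):
    """Run capture -> transcribe -> respond -> speak until "stop" is heard.

    All four stages are injected callables (an assumption of this sketch).
    Returns the list of (user_text, reply) pairs that were spoken.
    """
    history = []
    for _ in range(max_turns):
        user_text = transcribe(capture())
        if user_text is None:
            # Transcription failed; listen again instead of calling the LLM.
            continue
        if "stop" in user_text.lower():
            break
        reply = respond(user_text)
        speak(reply)
        history.append((user_text, reply))
    return history


# Stand-ins so the sketch runs without audio hardware or a network call.
scripted = iter(["hello there", None, "please stop"])
spoken = []
history = run_conversation(
    capture=lambda: b"",                      # fake audio data
    transcribe=lambda audio: next(scripted),  # fake transcripts
    respond=lambda text: text.upper(),        # placeholder for an LLM call
    speak=spoken.append,
)
print(history)  # [('hello there', 'HELLO THERE')]
```

With the class from this notebook, the real stages would be `speech_engine.capture_audio`, `speech_engine.speech_to_text`, and `speech_engine.text_to_speech`. The `max_turns` bound is an added safeguard so that a run of failed transcriptions cannot loop forever.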


Listening...
Transcribing...
User's speech: text message
Listening...
Transcribing...
User's speech: my first text message
Listening...
Transcribing...
Speech recognition could not understand audio
Listening...
Transcribing...
Speech recognition could not understand audio
Listening...
Transcribing...
User's speech: cartoon aap Dal sakte hain vah Badal Sakte to understand whether we can improve
Listening...
Transcribing...
Speech recognition could not understand audio
Listening...
Transcribing...
User's speech: WhatsApp
Listening...
Transcribing...
Speech recognition could not understand audio
Listening...
Transcribing...
Speech recognition could not understand audio
Listening...
Transcribing...
Speech recognition could not understand audio
Listening...
Transcribing...
User's speech: Vijay Ganesha ne bola tha ki preparation of charge and presentation values that can but before that try to see what are the
Listening...
Transcribing...
Speech recognition could not understand audio
Listeni