# Introduction to Speech Recognition

## 1. Key Concepts

* **What is Speech Recognition?**
    - Speech recognition refers to the process of converting spoken words into text.
    - It involves recognizing audio signals, processing them, and translating them into meaningful text.
    - **Real-life applications**:
      - **Virtual Assistants**: Siri, Alexa, Google Assistant.
      - **Dictation Software**: Tools for converting speech into text for documents, emails, etc.
      - **Voice-controlled Systems**: Smart homes, in-car systems.
      - **Accessibility**: Assisting people with disabilities by converting spoken commands to digital actions.

* **Challenges in Speech Recognition**:
    - **Accents and Dialects**: Varying ways of pronouncing words can confuse systems.
    - **Background Noise**: Competing sounds (music, people talking) can degrade the accuracy.
    - **Multiple Speakers**: Difficulty distinguishing between voices when more than one person speaks.
    - **Language Variations**: Handling different languages or multilingual speakers.
    
* **Popular Speech Recognition Systems**:
    - **Google Speech API**: One of the most widely used; supports many languages (this is the engine used in this tutorial).
    - **Amazon Transcribe**: Offers scalable speech recognition for businesses.
    - **IBM Watson Speech to Text**: Known for accuracy and advanced language models.
    - **Microsoft Azure Speech**: Provides real-time transcription with support for multiple languages.

## 2. Speech Recognition in Python 

* **SpeechRecognition Library**:
    - The **SpeechRecognition** library is a simple Python package that allows speech-to-text conversion.
    - It supports several speech engines and APIs like Google Web Speech API, Microsoft Bing Voice Recognition, and others.
    - **Why use it**: It abstracts much of the complexity of handling audio input and communicating with APIs, making it easier to integrate speech recognition into Python applications.

* **Streamlit Library**:
    - **Streamlit** is a Python framework that helps you create web apps quickly.
    - It’s particularly useful for building interactive applications like the **Speech Recognition** app because it can display buttons, text, and real-time output easily.
    - **Why use Streamlit**: It simplifies the front-end work, allowing developers to focus on the functionality (like speech transcription), and is great for rapid prototyping.
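
As a rough illustration of the abstraction the SpeechRecognition library offers: its `Recognizer` exposes one method per engine (for example `recognize_google` and `recognize_sphinx`), and each takes the same audio object. The helper below, `transcribe_with`, is a hypothetical name introduced here for illustration and is not part of the library; it only looks up the engine method by name.

```python
def transcribe_with(recognizer, audio, engine="google"):
    """Call recognizer.recognize_<engine>(audio) and return the text.

    `recognizer` is expected to be an sr.Recognizer and `audio` an
    sr.AudioData. This wrapper is a hypothetical convenience function,
    not a SpeechRecognition API.
    """
    method = getattr(recognizer, f"recognize_{engine}", None)
    if method is None:
        raise ValueError(f"Unsupported engine: {engine!r}")
    return method(audio)


# Usage, assuming SpeechRecognition is installed and `audio` was captured:
#   import speech_recognition as sr
#   text = transcribe_with(sr.Recognizer(), audio, engine="google")
```

Switching engines then means changing one string rather than rewriting the capture code, which is the main convenience the library provides.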


## 3. Setting Up Speech Recognition in Python

1. **Install Required Libraries**:
    - **SpeechRecognition**: For converting spoken words into text.
      ```bash
      pip install SpeechRecognition
      ```
    - **Streamlit**: For building a simple web interface.
      ```bash
      pip install streamlit
      ```

2. **Install PyAudio (if using microphone input)**:
    - PyAudio is required for capturing real-time audio input from the microphone.
    - It can be tricky to install on some platforms, especially Windows. Install with:
      ```bash
      pip install pyaudio
      ```
    - **Note**: PyAudio is not required if you're transcribing pre-recorded audio files.
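
Before running the apps, it can help to confirm which of these packages are actually importable. The sketch below uses only the standard library; `check_dependencies` is a hypothetical helper name. Note that the import names differ from the pip names (`SpeechRecognition` is imported as `speech_recognition`, `PyAudio` as `pyaudio`).

```python
from importlib.util import find_spec


def check_dependencies(modules):
    """Return {module_name: bool} telling whether each top-level module is importable."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # pyaudio may legitimately be missing if you only transcribe audio files.
    status = check_dependencies(["speech_recognition", "streamlit", "pyaudio"])
    for name, installed in status.items():
        print(f"{name}: {'installed' if installed else 'missing'}")
```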

## 4. Code Breakdown

# Create the file speech_recognition_app.py in write mode
with open("speech_recognition_app.py", "w") as file:
    # Writing the Streamlit code into the file
    file.write('''
    
# We'll go through a simple app that records audio from the microphone and transcribes it into text using Google's Speech API.

# Step 1: Import Required Libraries
import streamlit as st
import speech_recognition as sr


# Step 2: Define the Speech Recognition Function
def transcribe_speech():
    # Initialize recognizer class
    r = sr.Recognizer()
    
    # Use the microphone as the audio source
    with sr.Microphone() as source:
        st.info("Speak now...")
        audio_text = r.listen(source)
        st.info("Transcribing...")

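        try:
            # Using Google Speech Recognition
            text = r.recognize_google(audio_text)
            return text
        except sr.UnknownValueError:
            # Audio was captured, but no speech could be recognized in it
            return "Sorry, I did not get that."
        except sr.RequestError:
            # The Google API could not be reached (for example, no network)
            return "Sorry, the speech service is unavailable."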

## Explanation:
#  - "sr.Recognizer()": Initializes the recognizer object.
#  - "sr.Microphone()": Captures real-time audio from the microphone.
#  - "r.listen()": Listens to the speech and stores it as an audio object.
#  - "r.recognize_google()": Uses Google's speech recognition engine to convert audio into text.

## Working with Audio Files
# If you want to transcribe pre-recorded audio files instead of using the microphone, PyAudio is not needed. Here's how you can modify the transcribe_speech() function:

def transcribe_audio_file(file_path):
    # Initialize recognizer class
    r = sr.Recognizer()

    # Use the pre-recorded audio file as the source
    with sr.AudioFile(file_path) as source:
        audio_text = r.record(source)  # Read the audio file
        st.info("Transcribing...")

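        try:
            # Using Google Speech Recognition
            text = r.recognize_google(audio_text)
            return text
        except sr.UnknownValueError:
            # The file was read, but no speech could be recognized in it
            return "Sorry, I did not get that."
        except sr.RequestError:
            # The Google API could not be reached (for example, no network)
            return "Sorry, the speech service is unavailable."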
            
## Explanation:
# sr.AudioFile(): Opens the audio file as a source.
# r.record(): Reads the audio from the file, instead of capturing it live.


# Step 3: Create the Main Function
def main():
    st.title("Speech Recognition App")
    st.write("Click the button to transcribe the audio file:")
    
    if st.button("Start Transcription"):
        file_path = r"C:\\Users\\pc\\Desktop\\B-older\\Data and Stuff\\GMC\\ML GMC\\harvard.wav"
        text = transcribe_audio_file(file_path)
        # To use the microphone instead, call: text = transcribe_speech()
        st.write("Transcription: ", text)

  
## Explanation:
#  - "st.button()": Creates an interactive button in the Streamlit interface.
#  - "transcribe_audio_file()": When the button is clicked, the function is called to transcribe the audio file into text.



# Step 4: Run the App:

if __name__ == "__main__":
    main()

# This guard calls main() when the script is executed, for example with: streamlit run speech_recognition_app.py
    ''')

print("speech_recognition_app.py creation executed successfully!")

speech_recognition_app.py creation executed successfully!


## 5. Working with Audio Files

If you want to transcribe pre-recorded audio files instead of using the microphone, PyAudio is **not** needed. Here's how you can modify the `transcribe_speech()` function:

```python
def transcribe_audio_file(file_path):
    # Initialize recognizer class
    r = sr.Recognizer()

    # Use the pre-recorded audio file as the source
    with sr.AudioFile(file_path) as source:
        audio_text = r.record(source)  # Read the audio file
        st.info("Transcribing...")

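        try:
            # Using Google Speech Recognition
            text = r.recognize_google(audio_text)
            return text
        except sr.UnknownValueError:
            # The file was read, but no speech could be recognized in it
            return "Sorry, I did not get that."
        except sr.RequestError:
            # The Google API could not be reached (for example, no network)
            return "Sorry, the speech service is unavailable."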
```

* **Explanation**:
  - `sr.AudioFile()`: Opens the audio file as a source.
  - `r.record()`: Reads the audio from the file, instead of capturing it live.
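
`sr.AudioFile` reads WAV, AIFF, and FLAC files, so it can be useful to inspect a file before passing it in. The following is a minimal sketch using only the standard-library `wave` module; `write_test_tone` and `describe_wav` are hypothetical helper names. The generated tone contains no speech, so it only exercises the file-handling path and is not expected to produce a transcript.

```python
import math
import struct
import wave


def write_test_tone(path, seconds=1.0, sample_rate=16000, freq=440.0):
    """Write a mono 16-bit PCM WAV file containing a sine tone."""
    n_frames = int(seconds * sample_rate)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * i / sample_rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def describe_wav(path):
    """Return channel count, sample rate, and duration (seconds) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration": wav.getnframes() / rate,
        }
```

Knowing the duration is handy because `r.record()` also accepts optional `offset` and `duration` arguments (in seconds) for transcribing only part of a long file.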

## 6. Troubleshooting and Common Issues

1. **PyAudio Installation Issues**:
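   - PyAudio is required only when you capture audio from a microphone. On Windows, installation can fail with errors such as `error: Microsoft Visual C++ Build Tools required`.
     - Solution: Upgrade `pip` and retry first, since recent PyAudio releases ship pre-built Windows wheels on PyPI. If that still fails, pre-built PyAudio binaries are available from [this site](https://www.lfd.uci.edu/~gohlke/pythonlibs/), although it may not cover newer Python versions.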

2. **Recognition Errors**:
   - Speech recognition sometimes fails due to poor microphone input or background noise.
     - Solution: Encourage the user to speak clearly or adjust the microphone sensitivity.
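
One way to adjust sensitivity in code is the recognizer's `adjust_for_ambient_noise()` method, which samples the background briefly and sets the energy threshold accordingly. The wrapper below is a sketch; `listen_with_calibration` and its default values are assumptions chosen for illustration, while `adjust_for_ambient_noise()` and `listen()` (with `timeout` and `phrase_time_limit`) are the library's own methods.

```python
def listen_with_calibration(recognizer, source, calibration_seconds=1.0,
                            timeout=5, phrase_time_limit=10):
    """Calibrate for background noise, then capture one spoken phrase.

    `recognizer` is expected to be an sr.Recognizer and `source` an open
    sr.Microphone (inside a `with` block).
    """
    # Listen to the room briefly and set recognizer.energy_threshold from it.
    recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
    # timeout: max seconds to wait for speech to start.
    # phrase_time_limit: max seconds of speech to record once it starts.
    return recognizer.listen(source, timeout=timeout,
                             phrase_time_limit=phrase_time_limit)


# Usage, assuming SpeechRecognition and PyAudio are installed:
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   with sr.Microphone() as source:
#       audio = listen_with_calibration(r, source)
```

With a `timeout` set, `listen()` raises `sr.WaitTimeoutError` if nobody starts speaking in time, so callers may want to catch it alongside the other errors.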

## Speech-Enabled Chatbots

Next, we will add speech recognition to the **chatbot** we created earlier, following these steps:

1. Add the **speech recognition** features so that the user can either speak or type their question.
2. Update the chatbot to **transcribe speech** using the microphone or an audio file, and pass the transcribed text to the chatbot system.
3. Ensure the chatbot processes both types of input (typed or transcribed).
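
Step 3 boils down to reducing several possible inputs to a single question string. A minimal sketch of that idea follows; `pick_question` is a hypothetical helper, and the priority order (uploaded file, then live speech, then typed text) is an assumption made for illustration.

```python
def pick_question(typed="", spoken=None, from_file=None):
    """Return the question text to send to the chatbot.

    Checks the file transcript first, then the live-speech transcript,
    then the typed text. Empty or whitespace-only values are skipped.
    """
    for candidate in (from_file, spoken, typed):
        if candidate and candidate.strip():
            return candidate.strip()
    return ""


print(pick_question(typed="Who is Alice?"))
print(pick_question(typed="Who is Alice?", spoken="How does Alice enter Wonderland?"))
```

Whatever the source, the chatbot then receives plain text, so the matching logic does not need to know how the question was entered.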

# Create the file wonderland_voice_chatbot.py in write mode
with open("wonderland_voice_chatbot.py", "w") as file:
    # Writing the Streamlit code into the file
    file.write('''
    
##### Chatbot with Speech Recognition #####

# Import necessary libraries
import nltk  # Natural Language Toolkit for text processing
import streamlit as st  # Streamlit for building the web app interface
from nltk.tokenize import word_tokenize, sent_tokenize  # Tokenizing sentences and words
from nltk.corpus import stopwords  # Stopwords list to remove common words like "the", "and"
from nltk.stem import WordNetLemmatizer  # Lemmatizer for converting words to their root form
import string  # String for handling punctuation
import speech_recognition as sr  # Speech recognition library for converting speech to text

# Initialize the speech recognition engine
recognizer = sr.Recognizer()

# Load stopwords (common words to be removed) and initialize the lemmatizer (for root word extraction)
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

# Step 1: Function to transcribe live speech input using the microphone
def transcribe_speech():
    try:
        with sr.Microphone() as source:  # Use the microphone as the audio source
            st.info("Speak now... Please speak clearly!")
            audio = recognizer.listen(source)  # Capture the speech from the microphone
            st.info("Transcribing speech...")
            return recognizer.recognize_google(audio)  # Transcribe speech to text using Google API
    except sr.UnknownValueError:  # Error handling if speech is not recognized
        return "Sorry, I couldn't understand that."
    except sr.RequestError:  # Error handling for API unavailability
        return "API unavailable or unresponsive."
    except Exception as e:  # Generic error handling
        return f"An error occurred: {str(e)}"

# Step 2: Function to transcribe an audio file input
def transcribe_audio_file(file_path):
    try:
        with sr.AudioFile(file_path) as source:  # Use the uploaded audio file as the source
            audio = recognizer.record(source)  # Record the entire file
            st.info("Transcribing audio file...")
            return recognizer.recognize_google(audio)  # Transcribe the audio to text using Google API
    except sr.UnknownValueError:  # Error handling for unrecognized audio
        return "Sorry, I couldn't understand the audio."
    except sr.RequestError:  # Error handling if the API is unresponsive
        return "API unavailable or unresponsive."
    except Exception as e:  # Generic error handling
        return f"An error occurred: {str(e)}"
    
# Text preprocessing function: tokenizes, removes stopwords, lemmatizes, and removes punctuation
def preprocess(sentence):
    words = word_tokenize(sentence.lower())  # Tokenize the sentence into words and convert to lowercase
    words = [word for word in words if word not in stop_words and word not in string.punctuation]  # Remove stopwords and punctuation
    words = [lemmatizer.lemmatize(word) for word in words]  # Lemmatize the words to their root forms
    return words

# Function to load and preprocess the text file (Alice in Wonderland)
def load_text():
    try:
        file_path = r'C:\\Users\\pc\\Desktop\\B-older\\Data and Stuff\\GMC\\ML GMC\\alice_in_wonderland.txt'  # Path to the text file
        with open(file_path, 'r', encoding='utf-8') as file:
            return file.read().replace('\\n', ' ')  # Load and return the text, replacing line breaks with spaces
    except FileNotFoundError:  # Error handling if the file is not found
        st.error("Text file not found.")
        return ""

# Tokenizes the text into sentences and preprocesses each sentence
def prepare_corpus(text):
    sentences = sent_tokenize(text)  # Split the text into individual sentences
    return [preprocess(sentence) for sentence in sentences]  # Preprocess each sentence

# Function to calculate Jaccard similarity between two sets of words
def jaccard_similarity(query, sentence):
    query_set = set(query)  # Convert the query to a set of unique words
    sentence_set = set(sentence)  # Convert the sentence to a set of unique words
    if len(query_set.union(sentence_set)) == 0:  # If both sets are empty
        return 0
    return len(query_set.intersection(sentence_set)) / len(query_set.union(sentence_set))  # Calculate Jaccard similarity

# Function to find the most relevant sentence based on Jaccard similarity
def get_most_relevant_sentence(query, corpus, original_sentences):
    query = preprocess(query)  # Preprocess the user query
    max_similarity = 0  # Initialize maximum similarity
    best_sentence = "I couldn't find a relevant answer."  # Default response if no relevant sentence is found
    for i, sentence in enumerate(corpus):  # Iterate through each sentence in the corpus
        similarity = jaccard_similarity(query, sentence)  # Calculate similarity between query and the current sentence
        if similarity > max_similarity:  # If the current sentence has higher similarity than the previous max
            max_similarity = similarity
            best_sentence = original_sentences[i]  # Set the best sentence to the original sentence with the highest similarity
    return best_sentence

# Main function to create the Streamlit interface for the chatbot
def main():
    st.title("Wonderland's Novice Chatbot with Voice Input")  # App title
    st.write("Ask me anything related to Alice in Wonderland! You can either speak, upload an audio file, or type your question.")  # App description

    # Suggestions for users to try asking
    with st.expander("Click me for suggestions"):
        st.write("""
        1. Who does Alice meet first in Wonderland?
        2. What is the Cheshire Cat's famous line?
        3. How does Alice enter Wonderland?
        4. What is the Queen of Hearts known for?
        """)

    # Load the text of Alice in Wonderland and prepare the corpus
    text = load_text()  # Load the text
    if text:
        corpus = prepare_corpus(text)  # Preprocess the corpus (tokenized and cleaned)
        original_sentences = sent_tokenize(text)  # Store the original sentences

        # User input options: live speech, file upload, or text input
        speech_input = st.button("Speak your question")  # Button for live speech input
        file_input = st.file_uploader("Upload an audio file:", type=['wav', 'aiff', 'flac'])  # Upload an audio file (sr.AudioFile reads WAV, AIFF, and FLAC, not MP3)
        user_input = st.text_input("Or type your question:")  # Text input option

        # Handle live speech input
        if speech_input:
            user_input = transcribe_speech()  # Transcribe the speech to text
            st.write(f"Transcribed Text: {user_input}")  # Display the transcribed text

        # Handle audio file input
        if file_input:
            user_input = transcribe_audio_file(file_input)  # Transcribe the audio file to text
            st.write(f"Transcribed Audio File: {user_input}")  # Display the transcribed text

        # Process the user input if available
        if user_input:
            response = get_most_relevant_sentence(user_input, corpus, original_sentences)  # Get the most relevant sentence
            st.write(f"Chatbot: {response}")  # Display the chatbot's response
        else:
            st.write("Please ask a question by speaking, uploading a file, or typing.")  # Prompt the user to ask a question

# Run the Streamlit app
if __name__ == "__main__":  # If the script is being run directly
    main()  # Run the main function
    ''')

print("wonderland_voice_chatbot.py creation executed successfully!")
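
For intuition about how the chatbot ranks sentences, here is a small worked example of the Jaccard similarity it relies on: the number of shared words divided by the number of distinct words across both inputs. The token lists are hand-written stand-ins for what `preprocess()` might return, not actual output from the book.

```python
def jaccard_similarity(query, sentence):
    """Size of the intersection over size of the union of two token lists; 0.0 if both are empty."""
    a, b = set(query), set(sentence)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


query = ["alice", "rabbit"]
sentence = ["alice", "follow", "white", "rabbit"]

# 2 shared words ("alice", "rabbit") out of 4 distinct words -> 0.5
print(jaccard_similarity(query, sentence))
```

Because the score depends only on word overlap, a long sentence that mentions the query words once can score lower than a short one, which is worth keeping in mind when a response looks off.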

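### Running the App

Start the app from a terminal with `streamlit run wonderland_voice_chatbot.py`, then use any of these input options:
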
- **Live Speech**: Press the "Speak your question" button to record audio and have it transcribed.
- **Audio File**: Upload an audio file using the file uploader.
- **Text Input**: Type your question in the text input field.

The chatbot will respond based on the processed input, and you can continue interacting with it through any of the available options.

---
_**Your Dataness**_,  
**`Obinna Oliseneku`** (_**Hybraid**_)  
**[LinkedIn](https://www.linkedin.com/in/obinnao/)** | **[GitHub](https://github.com/hybraid6)**  