<a href="https://colab.research.google.com/github/niteshmaurya-max/notebooks/blob/main/Untitled0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Create a Python program for Raspberry Pi 5 that uses the Gemini API to answer children's questions in simple language, taking voice input and providing voice output.

## Set up the Raspberry Pi environment

### Subtask:
Install necessary libraries for audio processing and the Gemini API.


**Reasoning**:
The first step is to update the system packages on the Raspberry Pi. This will ensure we have the latest versions of the software before installing new libraries.



In [None]:
# This command needs to be executed on the Raspberry Pi terminal
# !sudo apt update

**Reasoning**:
Now that the package list is updated, I need to upgrade the installed packages and then install the required libraries: `portaudio19-dev` and `python3-pyaudio` for audio capture, `google-generativeai` for the Gemini API, `SpeechRecognition` (which uses `pyaudio` for microphone access) for speech-to-text, and `gTTS` for text-to-speech. I can group these installation steps into a single code block for efficiency.



In [None]:
# These commands need to be executed on the Raspberry Pi terminal
# !sudo apt upgrade -y
# !sudo apt install portaudio19-dev python3-pyaudio -y
# If pip refuses to install system-wide, create and activate a virtual environment first.
# !pip install google-generativeai SpeechRecognition pyaudio gTTS
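
Once the commands above have run, a quick check can confirm that the libraries are importable before any audio code is attempted. The sketch below is one possible way to do that with the standard library only; the helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative, not part of the original notebook.

```python
import importlib.util

# Import names (not pip package names) that the assistant relies on.
REQUIRED_MODULES = ["speech_recognition", "pyaudio", "gtts", "google.generativeai"]

def missing_modules(module_names):
    """Return the subset of module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised for dotted names whose parent package is absent.
            found = False
        if not found:
            missing.append(name)
    return missing

if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    if absent:
        print("Missing modules:", ", ".join(absent))
    else:
        print("All required modules are importable.")
```

Running this in the same Python environment that will run the assistant shows which installs still need attention.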

## Implement speech-to-text

### Subtask:
Use a library to convert the child's voice input into text.


**Reasoning**:
Import the `speech_recognition` library and instantiate a `Recognizer` object. This addresses steps 1 and 2 of the instructions.



**Reasoning**:
The previous attempt to import `speech_recognition` failed because the library is not installed in this environment. Although the initial plan was to install libraries on the Raspberry Pi, the current execution environment is not that device. Since the subtask is to demonstrate the code for voice-to-text conversion, I need to install the required library in this environment to proceed.



## Implement speech-to-text

### Subtask:
Retry implementing speech-to-text. Use a library to convert the child's voice input into text.


**Reasoning**:
Implement the speech-to-text functionality using the `speech_recognition` and `pyaudio` libraries, handling potential errors.



**Reasoning**:
The previous attempt to import `speech_recognition` failed because the module was not found. The required libraries need to be installed before the import can succeed.
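
For reference, a minimal sketch of the speech-to-text step is shown here. It assumes a working microphone and the `SpeechRecognition` package on the Raspberry Pi; the function name `listen_once` and its `calibration_seconds` parameter are illustrative choices. The import is done lazily inside the function so the file can still be loaded in an environment where the library is missing.

```python
def listen_once(timeout=10, phrase_time_limit=10, calibration_seconds=1):
    """Record one utterance from the default microphone and return its text.

    Returns None when nothing was said or the speech was unintelligible.
    A network or service failure (sr.RequestError) is left to the caller.
    """
    # Imported lazily so this file loads even where SpeechRecognition is absent.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise briefly so quiet speech is still detected.
        recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
        try:
            audio = recognizer.listen(
                source, timeout=timeout, phrase_time_limit=phrase_time_limit
            )
        except sr.WaitTimeoutError:
            # Nobody started speaking within `timeout` seconds.
            return None
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # Audio was captured but could not be transcribed.
        return None
```

Returning `None` for both silence and unintelligible audio keeps the calling loop simple: it only has to distinguish "got text" from "try again".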



## Process the text input

### Subtask:
Use the Gemini API to understand the child's query and generate a simple, child-friendly response.


**Reasoning**:
Import the `google.generativeai` library and configure it with a placeholder API key. Then initialize the generative model and define a function that builds a child-friendly prompt from the query, sends it to the model, extracts the text of the response, and returns it.
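
The prompt-building part of that function can be sketched without calling the API at all. The helper below is a hypothetical `build_prompt` (not a function from the notebook) that keeps the conversation history as a plain list of strings; the prompt wording is illustrative.

```python
def build_prompt(history, question):
    """Return (prompt, updated_history) for the next Gemini request.

    `history` is a list of strings such as "Child: ..." and "AI: ...".
    The input list is not modified; a new list is returned instead.
    """
    updated_history = history + [f"Child: {question}"]
    prompt = (
        "Here is our conversation so far:\n"
        + "\n".join(updated_history)
        + "\n\nAnswer the child's question in simple language."
    )
    return prompt, updated_history

prompt, history = build_prompt([], "What do bees eat?")
print(prompt)
```

Keeping prompt construction separate from the network call makes it possible to inspect exactly what the model will be asked before spending an API request.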



## Implement text-to-speech

### Subtask:
Use a library to convert the Gemini API's text response into speech.


**Reasoning**:
Import the gTTS library and define a function to convert text to speech.



**Reasoning**:
The previous command failed because the `gtts` library was not found. Install the library using pip.



**Reasoning**:
Now that the `gTTS` library is installed, try importing it and defining the text-to-speech function again.
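
A minimal sketch of the text-to-speech step follows. It assumes the `gTTS` package and, for playback, the `mpg321` command-line player; the helper names `playback_command` and `speak` are illustrative. The `-q` flag is specific to `mpg321`, so a different player would need its own arguments.

```python
import shutil
import subprocess

def playback_command(audio_path, player="mpg321"):
    """Return the argument list that plays audio_path, or None if the player is absent."""
    if shutil.which(player) is None:
        return None
    return [player, "-q", audio_path]

def speak(text, output_file="output.mp3"):
    """Synthesize text with gTTS, save it, and play it if a player is available.

    Returns True if playback was attempted, False if only the file was saved.
    """
    # Imported lazily so this file loads even where gTTS is not installed.
    from gtts import gTTS

    gTTS(text=text, lang="en").save(output_file)
    command = playback_command(output_file)
    if command is None:
        print(f"Saved {output_file}; install mpg321 to play it automatically.")
        return False
    subprocess.run(command, check=False)
    return True
```

Note that gTTS needs an internet connection to synthesize speech, so `speak` can fail on an offline Raspberry Pi even when everything is installed.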



## Integrate the components

### Subtask:
Combine the Speech-to-Text, Gemini API processing, and Text-to-Speech components to create a conversational flow.


**Reasoning**:
Import the necessary libraries and functions from the previously completed subtasks: Speech-to-Text, Gemini API processing, and Text-to-Speech.



**Reasoning**:
The `speech_recognition` library is not installed. Install the missing library.



## Integrate the components

### Subtask:
Retry integrating the components. Combine the Speech-to-Text, Gemini API processing, and Text-to-Speech components to create a conversational flow.


**Reasoning**:
Attempt to install the `speech_recognition` library again, as it was not successfully installed in the previous attempt and is required for speech-to-text functionality.



**Reasoning**:
Import the necessary libraries and define the `listen_and_respond` function to integrate the components for the conversational flow. Include error handling and the loop mechanism as requested.
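
One way to make this loop testable without a microphone or network access is to pass the three components in as functions. The sketch below is an illustrative variant, not the notebook's `listen_and_respond`; the name `run_conversation` and its parameters are assumptions made for the example.

```python
def run_conversation(listen, answer, speak, quit_word="goodbye"):
    """Loop: listen -> answer -> speak, until the child says the quit word.

    listen() returns the recognized text, or None if nothing was understood.
    answer(text) returns the reply string; speak(text) voices it.
    Returns the number of questions answered.
    """
    answered = 0
    while True:
        try:
            text = listen()
        except Exception as error:
            # Keep the session alive on transient failures (e.g. network errors).
            print(f"Listening failed: {error}")
            speak("Sorry, an error occurred.")
            continue
        if text is None:
            speak("Sorry, I didn't understand that.")
            continue
        if text.strip().lower() == quit_word:
            speak("Goodbye!")
            return answered
        speak(answer(text))
        answered += 1

# Dry run with stand-in components instead of real audio and API calls.
heard = iter(["What do bees eat?", None, "Goodbye"])
spoken = []
count = run_conversation(lambda: next(heard), lambda q: f"Answer to: {q}", spoken.append)
print(count, spoken)
```

On the Raspberry Pi, the stand-ins would be replaced by the real speech-to-text, Gemini, and text-to-speech functions.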



**Reasoning**:
Acknowledge the installation issue and present the integrated code with comments and instructions as requested.



## Test and refine

### Subtask:
Test the program with sample inputs and refine the responses and voice interactions for a child audience.
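
As a rough aid for this step, sample answers can be screened for length and sentence complexity before listening to them. The sketch below is only a heuristic: the function names and the thresholds (`max_sentences=4`, `max_avg_words=12`) are arbitrary assumptions, not values from the notebook, and they are no substitute for testing with real children.

```python
import re

def readability_stats(answer):
    """Return sentence count and average words per sentence for an answer."""
    sentences = [s for s in re.split(r"[.!?]+", answer) if s.strip()]
    word_counts = [len(s.split()) for s in sentences]
    average = sum(word_counts) / len(word_counts) if word_counts else 0.0
    return {"sentences": len(sentences), "avg_words_per_sentence": average}

def looks_child_friendly(answer, max_sentences=4, max_avg_words=12):
    """Heuristic check: few sentences, each of them short."""
    stats = readability_stats(answer)
    return (
        0 < stats["sentences"] <= max_sentences
        and stats["avg_words_per_sentence"] <= max_avg_words
    )

sample = "Bees eat nectar and pollen. They find them in flowers!"
print(readability_stats(sample))
print(looks_child_friendly(sample))
```

Answers that fail the check are candidates for a tighter prompt, for example one that asks for at most two or three short sentences.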


## Summary:

### Key Findings

*   The primary obstacle throughout the process was that the `speech_recognition` library and its `pyaudio` dependency could not be installed in the notebook's execution environment. As a result, the speech-to-text component and the integrated conversational flow could be written but not run or tested.
*   The code for processing text input using the Gemini API (`get_child_friendly_answer` function) was successfully developed, including setting up the model and crafting a child-friendly prompt structure.
*   The code for text-to-speech conversion using `gTTS` (`text_to_speech` function) was also successfully developed and demonstrated. It saves the speech to an MP3 file; playing that file on a Raspberry Pi requires an external player such as `mpg321`.
*   Integrated Python code combining the three components was generated, but its execution failed due to the missing `speech_recognition` dependency.
*   Instructions and recommended installation commands for setting up the necessary libraries on the target Raspberry Pi environment were provided to the user.

### Insights or Next Steps

*   The user needs to execute the provided installation commands directly on their Raspberry Pi to install `speech_recognition`, `pyaudio`, and other required libraries before the integrated code can function correctly.
*   Once the libraries are installed on the Raspberry Pi, the user can execute the integrated Python script and proceed with testing and refining the program's performance with actual voice inputs and outputs.


In [None]:
# This code is intended to be run on a Raspberry Pi 5.
# Install the required libraries first by running these commands in the Raspberry Pi terminal:
# sudo apt update && sudo apt upgrade -y
# sudo apt install portaudio19-dev python3-pyaudio mpg321 -y
# pip install google-generativeai SpeechRecognition pyaudio gTTS
# If pip refuses to install system-wide, create and activate a virtual environment first.

import speech_recognition as sr
from gtts import gTTS
import os
import google.generativeai as gen

# Global variable to store conversation history
conversation_history = []

def get_child_friendly_answer(question):
  """
  Uses the Gemini API to answer a child's question in simple language,
  remembering previous turns in the conversation.

  Args:
    question: The child's question as a string.

  Returns:
    A string containing the child-friendly answer.
  """
  # Configure the Gemini API with your API key.
  # The key is read from the GEMINI_API_KEY environment variable (a name chosen
  # for this script); 'YOUR_API_KEY' is only a placeholder fallback.
  gen.configure(api_key=os.environ.get('GEMINI_API_KEY', 'YOUR_API_KEY'))

  # Initialize the generative model.
  # Available model names change over time; if 'gemini-pro' is rejected,
  # substitute a currently available model name.
  model = gen.GenerativeModel('gemini-pro')

  # Add the current question to the conversation history
  conversation_history.append(f"Child: {question}")

  # Create a prompt for the model including the conversation history
  prompt = "Here is our conversation so far:\n" + "\n".join(conversation_history) + \
           "\n\nAnswer the child's question in simple language. After answering, ask if they want to start a new session to clear the history."

  # Generate a response from the model
  try:
    response = model.generate_content(prompt)
    # Extract the text content from the response
    answer = response.text
    # Add the AI's response to the conversation history
    conversation_history.append(f"AI: {answer}")
    return answer
  except Exception as e:
    print(f"Error generating response from Gemini API: {e}")
    return "Sorry, I couldn't get an answer for you right now."


def text_to_speech(text, output_file="output.mp3"):
  """
  Converts text to speech using gTTS, saves it to an audio file,
  and plays the file with mpg321.

  Args:
    text: The input text string.
    output_file: The name of the output audio file (default: "output.mp3").
  """
  try:
    tts = gTTS(text=text, lang='en')
    tts.save(output_file)
    print(f"Speech saved to {output_file}")
    # Play the saved file so the answer is actually spoken aloud.
    # Requires mpg321: sudo apt-get install mpg321
    if os.system(f"mpg321 -q {output_file}") != 0:
      print("Audio playback failed; check that mpg321 is installed.")
  except Exception as e:
    print(f"Error during text-to-speech conversion: {e}")

def listen_and_respond():
  """
  Orchestrates the conversational flow: listen, process, and respond.
  Includes option to clear history.
  """
  global conversation_history # Declare that we are using the global variable
  r = sr.Recognizer()
  with sr.Microphone() as source:
    print("Adjusting for ambient noise, please wait...")
    r.adjust_for_ambient_noise(source, duration=5) # Adjust for 5 seconds of ambient noise
    print("Say something!")

    while True:
      try:
        print("Listening...")
        # Wait up to 10 seconds for speech to start; record at most 10 seconds of it
        audio = r.listen(source, timeout=10, phrase_time_limit=10)
        print("Processing...")
        text = r.recognize_google(audio)
        print(f"You said: {text}")

        if text.lower() == "goodbye":
          print("Goodbye!")
          text_to_speech("Goodbye!")
          break
        elif "start a new session" in text.lower() or "clear the history" in text.lower():
          print("Starting a new session...")
          conversation_history = [] # Clear the history
          text_to_speech("Okay, let's start a new session. What would you like to talk about?")
          continue # Skip to the next iteration of the loop

        answer = get_child_friendly_answer(text)
        print("Here is the answer...")
        print(answer)
        text_to_speech(answer)

      except sr.WaitTimeoutError:
        # Nobody spoke within the timeout; keep listening without an error message
        continue
      except sr.UnknownValueError:
        print("Could not understand audio")
        text_to_speech("Sorry, I didn't understand that.")
      except sr.RequestError as e:
        print(f"Could not request results from Google Speech Recognition service; {e}")
        text_to_speech("Sorry, there was an error with the speech recognition service.")
      except Exception as e:
        print(f"An unexpected error occurred: {e}")
        text_to_speech("Sorry, an error occurred.")

# To start the interaction on your Raspberry Pi, uncomment the line below and run the script:
# listen_and_respond()

# Example of how to get an answer without voice input (for testing the Gemini API part):
# test_question = "What do bees eat?"
# test_answer = get_child_friendly_answer(test_question)
# print(f"\nTest question: {test_question}")
# print(f"Test answer: {test_answer}")
# text_to_speech(test_answer, output_file="test_answer.mp3")