<a href="https://colab.research.google.com/github/kmpsaravanan/outskill/blob/main/Audio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome to Colab!

## Explore the Gemini API
The Gemini API gives you access to Gemini models created by Google DeepMind. Gemini models are built from the ground up to be multimodal, so you can reason seamlessly across text, images, code and audio.

**How to get started**
*  Go to <a href="https://aistudio.google.com/">Google AI Studio</a> and log in with your Google Account.
*  <a href="https://aistudio.google.com/app/apikey">Create an API key</a>.
* Use a quickstart for <a href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Get_started.ipynb">Python</a> or call the REST API using <a href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/rest/Prompting_REST.ipynb">curl</a>.

**Discover Gemini's advanced capabilities**
*  Play with Gemini <a href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Image-out.ipynb">multimodal outputs</a>, mixing text and images in an iterative way.
*  Discover the <a href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Get_started_LiveAPI.ipynb">multimodal Live API</a> &#40;demo <a href="https://aistudio.google.com/live">here</a>&#41;.
*  Learn how to <a href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Spatial_understanding.ipynb">analyse images and detect items in your pictures</a> using Gemini &#40;bonus, there's a <a href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Spatial_understanding_3d.ipynb">3D version</a> as well!&#41;.
*  Unlock the power of the <a href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Get_started_thinking.ipynb">Gemini thinking model</a>, capable of solving complex tasks with its inner thoughts.
      
**Explore complex use cases**
*  Use <a href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Search_grounding_for_research_report.ipynb">Gemini grounding capabilities</a> to create a report on a company based on what the model can find on the Internet.
*  Extract <a href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Pdf_structured_outputs_on_invoices_and_forms.ipynb">invoices and form data from PDFs</a> in a structured way.
*  Create <a href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Book_illustration.ipynb">illustrations based on a whole book</a> using Gemini large context window and Imagen.

To learn more, take a look at the <a href="https://github.com/google-gemini/cookbook">Gemini cookbook</a> or visit the <a href="https://ai.google.dev/docs/">Gemini API documentation</a>.


# Text summarization

In [7]:
#%pip install transformers
from transformers import pipeline

# summarizer = pipeline("summarization")
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

Device set to use cuda:0


In [8]:
sample_text = """
Transformers is a Python library from Hugging Face that provides APIs and tools to easily download and train state-of-the-art pretrained models. These models can be used for tasks like text classification, question answering, machine translation, and text summarization. The library is designed to be flexible and easy to use, with a focus on interoperability between different deep learning frameworks like PyTorch, TensorFlow, and JAX. It also includes tools for dataset loading, tokenization, and model training and evaluation.
"""

summary = summarizer(sample_text, max_length=50, min_length=25, do_sample=False)
print(summary[0]['summary_text'])

Transformers is a Python library from Hugging Face that provides APIs and tools to easily download and train state-of-the-art pretrained models. These models can be used for tasks like text classification, question answering, machine translation


In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [9]:
import gradio as gr
from transformers import pipeline
import datetime

# Ensure summarizer is defined (assuming LzSsLzRTxjEN was run)
try:
    # At the notebook's top level, checking globals() is sufficient
    if 'summarizer' not in globals():
        summarizer = pipeline("summarization", model="facebook/bart-large-cnn") # Re-initialize if not found
except Exception as e:
    print(f"Error initializing summarizer outside function: {e}")

def process_text(detailed_text, max_length, min_length):
  """Summarizes detailed_text; called when the "Process Text" button is clicked."""
  try:
    # Convert slider values to integers
    max_length = int(max_length)
    min_length = int(min_length)

    # The sliders allow min_length to exceed max_length, so validate first
    if min_length > max_length:
      return "Error: Min Length must not exceed Max Length."

    # Inside a function, the module-level summarizer is found via globals()
    if 'summarizer' in globals():
      summary = summarizer(detailed_text, max_length=max_length, min_length=min_length, do_sample=False)
      return summary[0]['summary_text']
    else:
      return "Error: Summarizer not initialized."
  except Exception as e:
    return f"An error occurred during summarization: {e}"

def export_summary(summary_text):
  """Exports the summary text to a file with a timestamp."""
  if summary_text:
    timestamp = datetime.datetime.now().strftime("%H%M%S")
    filename = f"Summary_{timestamp}.txt"
    with open(filename, "w") as f:
      f.write(summary_text)
    return f"Summary exported to {filename}"
  else:
    return "No summary to export."

with gr.Blocks() as demo:
    gr.Label("Text Processing Interface")
    detailed_text_input = gr.Textbox(label="Detailed text")
    max_length_slider = gr.Slider(minimum=10, maximum=200, value=50, label="Max Length")
    min_length_slider = gr.Slider(minimum=5, maximum=100, value=25, label="Min Length")
    process_button = gr.Button("Process Text")
    output_text = gr.Textbox(label="Summary")
    export_button = gr.Button("Export Summary")
    # Define a component to display the export message
    export_message_output = gr.Textbox(label="Export Status", interactive=False)


    process_button.click(
        fn=process_text,
        inputs=[detailed_text_input, max_length_slider, min_length_slider],
        outputs=output_text
    )

    export_button.click(
        fn=export_summary,
        inputs=output_text,
        outputs=export_message_output # Use the explicit export message component
    )

demo.launch(share=False)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Note: opening Chrome Inspector may crash demo inside Colab notebooks.
* To create a public link, set `share=True` in `launch()`.
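
The `export_summary` function above names its file from the time of day only (`%H%M%S`), so exports made at the same clock time on different days would share a name. A minimal sketch of one way to avoid that, assuming a hypothetical helper called `timestamped_filename` (it is not part of the notebook) that includes the date as well:

```python
import datetime


def timestamped_filename(prefix="Summary", now=None):
    """Build a name such as 'Summary_20250131_142530.txt' (hypothetical helper)."""
    # `now` can be passed in so the result is reproducible in tests;
    # by default the current local time is used.
    now = now or datetime.datetime.now()
    return f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}.txt"


print(timestamped_filename(now=datetime.datetime(2025, 1, 31, 14, 25, 30)))
# prints: Summary_20250131_142530.txt
```

Two exports within the same second would still collide; the format string is an illustrative choice, not a requirement.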





# Task
Integrate a text-to-audio conversion feature into the Gradio interface in cell `lQIPelZIzqCW`, allowing the user to listen to the summarized text.

## Install necessary libraries

### Subtask:
Install a Python library for text-to-speech conversion.


**Reasoning**:
Install the `gTTS` library using pip.



In [10]:
%pip install gTTS



## Import modules

### Subtask:
Import the required modules for text-to-speech.


**Reasoning**:
Import the necessary libraries for text-to-speech conversion as required by the instructions.



In [26]:
from gtts import gTTS
import os

## Add tts function

### Subtask:
Create a Python function that takes text as input and returns the path to an audio file or audio data.


**Reasoning**:
Define the `text_to_audio` function to convert text to speech and return the audio file path.



In [27]:
def text_to_audio(text_input):
  """Converts text to speech, saves it as summary_audio.mp3, and returns the file path (None on failure)."""
  try:
    tts = gTTS(text=text_input, lang='en')
    audio_filename = "summary_audio.mp3"  # Use a consistent filename
    tts.save(audio_filename)
    return audio_filename
  except Exception as e:
    print(f"Error during text-to-audio conversion: {e}")
    return None

## Modify gradio interface

### Subtask:
Update the Gradio interface in cell `lQIPelZIzqCW` to include:
- An input for the text to be converted to audio (likely linked to the summary output).
- A button to trigger the text-to-audio conversion.
- An audio output component to play the generated audio.


**Reasoning**:
Update the Gradio interface to include the text-to-audio button and audio output component.



In [29]:
import gradio as gr
from transformers import pipeline
import datetime
from gtts import gTTS
import os
import soundfile as sf  # Reads the generated MP3; MP3 decoding depends on the installed libsndfile version

# Ensure summarizer is defined (assuming LzSsLzRTxjEN was run)
try:
    # At the notebook's top level, checking globals() is sufficient
    if 'summarizer' not in globals():
        summarizer = pipeline("summarization", model="facebook/bart-large-cnn") # Re-initialize if not found
except Exception as e:
    print(f"Error initializing summarizer outside function: {e}")

def process_text(detailed_text, max_length, min_length):
  """Summarizes detailed_text; called when the "Process Text" button is clicked."""
  try:
    # Convert slider values to integers
    max_length = int(max_length)
    min_length = int(min_length)

    # The sliders allow min_length to exceed max_length, so validate first
    if min_length > max_length:
      return "Error: Min Length must not exceed Max Length."

    # Inside a function, the module-level summarizer is found via globals()
    if 'summarizer' in globals():
      summary = summarizer(detailed_text, max_length=max_length, min_length=min_length, do_sample=False)
      return summary[0]['summary_text']
    else:
      return "Error: Summarizer not initialized."
  except Exception as e:
    return f"An error occurred during summarization: {e}"

def export_summary(summary_text):
  """Exports the summary text to a file with a timestamp."""
  if summary_text:
    timestamp = datetime.datetime.now().strftime("%H%M%S")
    filename = f"Summary_{timestamp}.txt"
    with open(filename, "w") as f:
      f.write(summary_text)
    return f"Summary exported to {filename}"
  else:
    return "No summary to export."

def text_to_audio(text_input):
  """Converts text to speech and returns the audio data or an error message."""
  try:
    if not text_input:
        return None, "Error: No text provided for audio conversion."

    tts = gTTS(text=text_input, lang='en')
    audio_filename = "summary_audio.mp3"
    tts.save(audio_filename)

    try:
        audio_data, sample_rate = sf.read(audio_filename)
        os.remove(audio_filename) # Remove the temporary file
        # Return tuple of sample rate and audio data, and an empty error message
        return (sample_rate, audio_data), ""

    except Exception as e:
        # If soundfile fails, return None for audio and an error message
        error_message = f"Error reading audio file with soundfile: {e}"
        print(error_message) # Also print to console for debugging
        if os.path.exists(audio_filename):
            os.remove(audio_filename) # Clean up the temporary file if it exists
        return None, error_message

  except Exception as e:
    # Catch errors during gTTS conversion
    error_message = f"Error during text-to-audio conversion: {e}"
    print(error_message) # Also print to console for debugging
    return None, error_message


with gr.Blocks() as demo:
    gr.Label("Text Processing Interface")
    detailed_text_input = gr.Textbox(label="Detailed text")
    max_length_slider = gr.Slider(minimum=10, maximum=200, value=50, label="Max Length")
    min_length_slider = gr.Slider(minimum=5, maximum=100, value=25, label="Min Length")
    process_button = gr.Button("Process Text")
    output_text = gr.Textbox(label="Summary")
    export_button = gr.Button("Export Summary")
    export_message_output = gr.Textbox(label="Export Status", interactive=False)

    convert_audio_button = gr.Button("Convert Summary to Audio")
    summary_audio_output = gr.Audio(label="Summary Audio", interactive=True)
    # Add a textbox to display audio generation errors
    audio_error_output = gr.Textbox(label="Audio Error", interactive=False, visible=True)


    process_button.click(
        fn=process_text,
        inputs=[detailed_text_input, max_length_slider, min_length_slider],
        outputs=output_text
    )

    export_button.click(
        fn=export_summary,
        inputs=output_text,
        outputs=export_message_output
    )

    # Link the convert audio button to the text_to_audio function
    # Update outputs to include the audio error output
    convert_audio_button.click(
        fn=text_to_audio,
        inputs=output_text,
        outputs=[summary_audio_output, audio_error_output]
    )


demo.launch(share=False)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Note: opening Chrome Inspector may crash demo inside Colab notebooks.
* To create a public link, set `share=True` in `launch()`.





In [24]:
!apt-get update && apt-get install ffmpeg libsndfile1 -y

Hit:1 https://cli.github.com/packages stable InRelease
Get:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Get:4 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Hit:5 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:7 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Get:10 https://r2u.stat.illinois.edu/ubuntu jammy/main amd64 Packages [2,827 kB]
Hit:11 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubunt
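
Whether `sf.read` can decode the MP3 written by gTTS depends on the libsndfile build that `soundfile` loads. A small sketch of how this might be checked before launching the interface, assuming that `soundfile.available_formats()` lists an `MP3` key when MP3 is supported; `supports_mp3` is a hypothetical helper name, not part of the notebook:

```python
def supports_mp3(formats):
    """Return True if a mapping of format names contains MP3 (hypothetical helper)."""
    return any(name.upper() == "MP3" for name in formats)


try:
    import soundfile as sf

    # available_formats() returns a dict mapping format names to descriptions
    print("MP3 supported:", supports_mp3(sf.available_formats()))
except (ImportError, OSError) as exc:
    # soundfile raises OSError when the libsndfile shared library cannot be loaded
    print(f"soundfile is unavailable: {exc}")
```

If the check reports `False`, returning the saved file path to `gr.Audio`, as the first version of `text_to_audio` did, may be a simpler route than reading the file with `soundfile`.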

## Link components and functions

### Subtask:
Connect the new input, button, and output components to the TTS function within the `gr.Blocks` structure.


**Reasoning**:
Connect the convert audio button to the text-to-audio function within the gradio interface.



In [None]:
# The previous code block successfully modified the Gradio interface to include the necessary components and linked the process and export buttons.
# The subtask is to link the convert audio button to the text_to_audio function.
# This was already accomplished in the previous code block.

# Therefore, the task is completed as the connection is already established in the previous step.
pass

## Summary:

### Data Analysis Key Findings

* The `gTTS` library was successfully installed and imported for text-to-speech conversion.
* A Python function `text_to_audio` was created to convert text input into an MP3 audio file named "summary\_audio.mp3" and return the file path.
* The Gradio interface was successfully modified to include a "Convert Summary to Audio" button and a "Summary Audio" output component.
* The "Convert Summary to Audio" button was linked to the `text_to_audio` function, taking the summary text as input and displaying the resulting audio in the dedicated audio output component.

### Insights or Next Steps

* The `text_to_audio` function was modified to return audio data directly to Gradio for playback and download.
* The Gradio `gr.Audio` component was created with `interactive=True` and is used to play back the generated audio.
* Consider adding options for different languages or audio formats to the `text_to_audio` function.

* Ensure the "summary\_audio.mp3" filename is unique or handled appropriately for multiple conversions to prevent overwriting.
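
The fixed `summary_audio.mp3` filename noted above could be replaced with a unique path per conversion. A minimal sketch using only the standard library; `new_audio_path` is a hypothetical helper name, and passing the returned path to `tts.save(...)` is an assumption about how it would be used:

```python
import os
import tempfile


def new_audio_path(suffix=".mp3"):
    """Create an empty, uniquely named file and return its path (hypothetical helper)."""
    # delete=False keeps the file on disk after the handle is closed,
    # so a text-to-speech library can write to the path afterwards.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
        return handle.name


first, second = new_audio_path(), new_audio_path()
print(first != second)  # each call yields a distinct path

# The caller is responsible for removing the files when done.
os.remove(first)
os.remove(second)
```

Because each call reserves its own file, two conversions running at the same time would not overwrite each other's audio.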
