## Conversational Audio Restoration Agent using Gemini and Gradio

This notebook implements an interactive web application where users can upload audio files, ask questions about them (like "What is the noise floor?"), and request noise reduction (like "Remove noise by 5 dB"). The application uses the Google Gemini API for natural language understanding and function calling, `librosa` and `noisereduce` for audio processing, and `Gradio` for the user interface.

**Key Features:**

*   **Conversational Interface:** Interact with the system using natural language queries.
*   **Audio Upload:** Supports uploading audio files (WAV recommended).
*   **Noise Floor Analysis:** Estimates and reports the background noise level in dB using the `get_noise_floor` function.
*   **Noise Reduction:** Applies noise reduction using spectral gating via the `reduce_noise_by_db` function, controlled by the user's request (e.g., "by 5 dB").
*   **Gemini Function Calling:** Leverages Gemini's ability to understand the user's intent and automatically call the appropriate Python function (`get_noise_floor` or `reduce_noise_by_db`) with the correct parameters.
*   **Visual Feedback:** Displays spectrograms of the original and denoised audio for visual comparison.
*   **Audio Playback:** Allows playback of the original (via re-upload if needed) and denoised audio.

**Potential Future Enhancements:**

*   **Support for More Formats:** Improve audio loading robustness (e.g., ensure FFmpeg is reliably used) to handle MP3, M4A, etc.
*   **Advanced Noise Reduction:** Integrate more sophisticated denoising models (e.g., deep learning based) or offer different algorithm choices.
*   **Parameter Tuning:** Allow users to fine-tune noise reduction parameters via the UI (e.g., aggressiveness, frequency range).
*   **Streaming Audio:** Support processing real-time audio streams.
*   **Deployment:** Package the application for deployment (e.g., using Docker, Hugging Face Spaces).


In [1]:
# Remove unused conflicting packages
#!pip uninstall -qqy jupyterlab kfp 2>/dev/null
# Install specific google-genai version used in the original notebook
!pip install -U -q "google-genai==1.7.0"

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m144.7/144.7 kB[0m [31m7.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m100.9/100.9 kB[0m [31m6.3 MB/s[0m eta [36m0:00:00[0m
[?25h

## 1. Install Dependencies

*   This cell installs the Python libraries required by the application.
*   `google-genai`: The official Google Gemini SDK for Python.
*   `gradio`: Used to create the interactive web UI.
*   `librosa`: A powerful library for audio analysis (loading, spectrograms).
*   `matplotlib`: Used by librosa for plotting spectrograms.
*   `noisereduce`: Performs the noise reduction algorithm.
*   `soundfile`: Used by librosa (and directly) for reading/writing audio files (often needs system libraries like `libsndfile`).
*   `numpy`: Fundamental package for numerical operations.
*   `ffmpeg-python`: Python bindings for the FFmpeg command-line tool. The notebook never imports it, and installing it does not install FFmpeg itself. What matters is the FFmpeg program: when `soundfile` cannot decode a file (for example MP3 or some WAV encodings), `librosa` falls back to `audioread`, which typically relies on FFmpeg. Install FFmpeg with your system's package manager if needed (e.g., `sudo apt update && sudo apt install ffmpeg` on Debian/Ubuntu, `brew install ffmpeg` on macOS).
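
Because `ffmpeg-python` does not ship the FFmpeg executable, a quick check that the binary is on the `PATH` can save debugging time before uploading MP3 or M4A files. A minimal standard-library sketch; the helper name `ffmpeg_available` is hypothetical and not part of the notebook:

```python
import shutil
import subprocess


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is on PATH and runs successfully."""
    exe = shutil.which("ffmpeg")  # Full path to the binary, or None if not found
    if exe is None:
        return False
    try:
        # `ffmpeg -version` prints build information and exits with status 0
        result = subprocess.run([exe, "-version"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("FFmpeg found" if ffmpeg_available() else "FFmpeg not found; MP3/M4A loading may fail")
```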


In [2]:
# Install required packages if not already installed
# Uncomment the line below to run the installation if needed
!pip install  gradio librosa matplotlib noisereduce soundfile numpy ffmpeg-python --quiet

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m46.9/46.9 MB[0m [31m36.3 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m322.2/322.2 kB[0m [31m18.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m95.2/95.2 kB[0m [31m7.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m11.5/11.5 MB[0m [31m70.6 MB/s[0m eta [36m0:00:00[0m:00:01[0m0:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m72.0/72.0 kB[0m [31m4.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m62.5/62.5 kB[0m [31m4.1 MB/s[0m eta [36m0:00:00[0m
[?25h

## 2. Import Libraries

*   Imports all the necessary modules from the installed libraries for use in the script.
*   `os`: For interacting with the operating system (e.g., setting environment variables, path operations).
*   `gradio` as `gr`: For building the user interface.
*   `numpy` as `np`: For numerical calculations.
*   `librosa`, `librosa.display`: For audio loading and spectrogram visualization.
*   `matplotlib.pyplot` as `plt`: For finalizing and customizing plots.
*   `noisereduce` as `nr`: For the noise reduction function.
*   `soundfile` as `sf`: For writing audio files.
*   `google.genai` as `genai`, `google.genai.types`: For interacting with the Gemini API and its specific types.
*   `traceback`: For getting detailed error information in exception handlers.
*   `time`: For generating unique timestamps for filenames.

In [3]:
import os
import gradio as gr
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
import noisereduce as nr
import soundfile as sf
from google import genai
from google.genai import types
import traceback # Import traceback for better error details
import time

## 3. Initialize Gemini Client

*   Reads the Google API key from Kaggle Secrets (secret name `GOOGLE_API_KEY`). **Add your own key under that name before running this cell.**
*   Creates the Gemini API client instance (`genai.Client`).


In [4]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
api_key = user_secrets.get_secret("GOOGLE_API_KEY")
print("Running on Kaggle, API key loaded from Kaggle Secrets.")
client = genai.Client(api_key=api_key)

Running on Kaggle, API key loaded from Kaggle Secrets.
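
`kaggle_secrets` is only importable inside Kaggle notebooks, so the cell above raises `ImportError` anywhere else. A minimal sketch of a fallback that reads the key from an environment variable of the same name instead (the variable name is an assumption; any name works). The helper `load_google_api_key` is hypothetical:

```python
import os
from typing import Optional


def load_google_api_key(secret_name: str = "GOOGLE_API_KEY") -> Optional[str]:
    """Return the key from Kaggle Secrets when running on Kaggle, else from the environment.

    Returns None when no key is found outside Kaggle.
    """
    try:
        from kaggle_secrets import UserSecretsClient  # Only importable inside Kaggle notebooks
    except ImportError:
        # Not on Kaggle: fall back to an environment variable with the same name (assumption)
        return os.environ.get(secret_name)
    return UserSecretsClient().get_secret(secret_name)
```

With this helper, `client = genai.Client(api_key=api_key) if api_key else None` leaves `client` as `None` when no key is available, which is the case the `agent` function's `client is None` check reports.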


## 4. Define Audio Analysis Tool Functions

These Python functions are exposed to the Gemini model as tools. With the SDK's automatic function calling, the model decides when a function is needed and with which arguments; the SDK then runs the Python function and sends its return value back to the model.

#### `get_noise_floor(audio_path: str) -> dict`

*   **Purpose:** Estimates the background noise level (noise floor) of an audio file.
*   **Input:** `audio_path` (string) - The absolute path to the audio file.
*   **Process:**
    1.  Checks if the provided `audio_path` actually exists. Returns an error dictionary if not.
    2.  Loads the audio file using `librosa.load(audio_path, sr=None)`; `sr=None` preserves the original sample rate. Librosa reads the file with `soundfile` and may fall back to `audioread` (which may need FFmpeg).
    3.  Handles cases where the audio file is empty or contains only silence.
    4.  Estimates the noise floor amplitude: It calculates the 10th percentile of the *absolute* values of the audio samples. This is a simple heuristic assuming that the quietest 10% of the signal largely represents background noise.
    5.  Converts the noise amplitude to decibels relative to full scale (amplitude 1.0, i.e. 0 dBFS) with $\text{dB} = 10 \log_{10}(\text{amplitude}^2 + 10^{-10})$, which is approximately $20 \log_{10}(\text{amplitude})$ for amplitudes well above $10^{-5}$. The small constant $10^{-10}$ keeps the logarithm finite (avoiding $\log_{10}(0)$) and sets a floor of $-100$ dB.
*   **Output:** A dictionary containing either the calculated noise floor (`{'noise_floor_db': float_value}`) or an error message (`{'noise_floor_db': None, 'error': '...'}`).
*   **Gemini Integration:** The function's docstring and type hints allow the Gemini SDK to automatically create a schema. Gemini will call this function when the user asks a question like "What is the noise floor?".
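
The estimate in steps 4 and 5 can be checked on synthetic signals without any file I/O. A sketch of the same heuristic applied directly to a NumPy array; `estimate_noise_floor_db` is a hypothetical helper, not part of the notebook. For a signal whose samples all have magnitude 0.01, the 10th percentile is 0.01 and the result is about $20\log_{10}(0.01) = -40$ dB:

```python
import numpy as np


def estimate_noise_floor_db(y: np.ndarray, percentile: float = 10.0, eps: float = 1e-10) -> float:
    """Noise floor in dBFS as 10*log10(a**2 + eps), where a is the given percentile of |y|."""
    noise_amplitude = np.percentile(np.abs(y), percentile)
    return float(10 * np.log10(noise_amplitude**2 + eps))


# Every |sample| is 0.01, so the estimate is 20*log10(0.01), about -40 dBFS
print(round(estimate_noise_floor_db(np.full(1000, 0.01)), 2))  # → -40.0

# All-zero input hits the epsilon floor: 10*log10(1e-10) = -100 dBFS
# (get_noise_floor itself returns -inf for this case before reaching the formula)
print(round(estimate_noise_floor_db(np.zeros(1000)), 2))  # → -100.0

# Limitation: a clean full-scale tone has no quiet gaps, so its 10th-percentile
# magnitude is about sin(pi/20) ≈ 0.156, i.e. roughly -16 dBFS, although it contains no noise
sr = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
sine_floor = estimate_noise_floor_db(tone)
print(round(sine_floor, 1))
```

The last case shows why the heuristic works best on recordings that contain pauses or quiet passages.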

#### `reduce_noise_by_db(audio_path: str, reduction_db: int) -> dict`

*   **Purpose:** Reduces background noise in an audio file using a spectral gating algorithm provided by the `noisereduce` library.
*   **Inputs:**
    *   `audio_path` (string): The absolute path to the input audio file.
    *   `reduction_db` (integer): The desired amount of noise reduction in decibels (dB). A higher value means more aggressive reduction.
*   **Process:**
    1.  Checks if the input `audio_path` exists.
    2.  Loads the audio file using `librosa.load`.
    3.  Handles empty audio files.
    4.  Estimates the noise profile: It takes the first 0.5 seconds of the audio (`noise_clip`) as representative of the background noise. Handles cases where the audio is shorter than 0.5 seconds or if the noise clip contains non-finite values (e.g., `NaN`, `inf`).
    5.  Applies noise reduction using `noisereduce.reduce_noise`:
        *   `y`: The full audio signal array.
        *   `sr`: The audio sample rate.
        *   `y_noise`: The noise profile estimated from `noise_clip`.
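        *   `prop_decrease`: The proportion (0 to 1) by which `noisereduce` reduces the noise. The requested `reduction_db` is mapped linearly to `reduction_db / 20` and clamped to [0, 1], so 5 dB becomes 0.25 and any request of 20 dB or more applies full-strength reduction. This is a heuristic: the resulting attenuation is not calibrated to the requested number of decibels.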
    6.  Generates a unique output filename using a timestamp (`denoised_{timestamp}.wav`) and constructs its absolute path in the current working directory.
    7.  Saves the noise-reduced audio (`reduced`) to the new file using `soundfile.write`.
    8.  Verifies that the output file was created successfully and is not empty using `os.path.exists` and `os.path.getsize`.
*   **Output:** A dictionary containing either the absolute path to the denoised file (`{'denoised_path': '/path/to/denoised_....wav'}`) or an error message (`{'denoised_path': None, 'error': '...'}`).
*   **Gemini Integration:** Gemini calls this function when the user asks to "remove noise", "denoise", etc., inferring the `reduction_db` amount from the query.
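
The linear `reduction_db / 20` mapping is a heuristic. If, as in recent `noisereduce` releases, the spectral-gating mask is blended as `mask * prop_decrease + (1 - prop_decrease)`, then fully gated time-frequency bins keep a gain of $1 - p$, i.e. an attenuation of $-20\log_{10}(1-p)$ dB, and solving for $p$ gives an alternative mapping. A sketch under that assumption (worth verifying against the installed `noisereduce` version); both helper names are hypothetical:

```python
def linear_prop_decrease(reduction_db: float) -> float:
    """The notebook's heuristic: reduction_db / 20, clamped to [0, 1]."""
    return min(max(reduction_db / 20.0, 0.0), 1.0)


def db_to_prop_decrease(reduction_db: float) -> float:
    """Choose p so that the residual gain (1 - p) equals -reduction_db dB.

    Assumes the gating mask is blended as mask * p + (1 - p); result clamped to [0, 1].
    """
    p = 1.0 - 10.0 ** (-reduction_db / 20.0)
    return min(max(p, 0.0), 1.0)


for db in (3, 5, 10, 20):
    print(f"{db:>2} dB -> linear {linear_prop_decrease(db):.2f}, gain-based {db_to_prop_decrease(db):.3f}")
```

At 20 dB the linear mapping already reaches `prop_decrease = 1.0` (gated bins removed entirely), whereas the gain-based mapping gives 0.9, which corresponds to a 20 dB cut in those bins.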


In [5]:
# --- Audio Analysis Functions (for Gemini Function Calling) ---

def get_noise_floor(audio_path: str) -> dict:
    """
    Calculate the noise floor (in dB) of the given audio file.
    The noise floor is estimated based on the 10th percentile amplitude.

    Args:
        audio_path: Absolute path to the audio file (wav recommended).

    Returns:
        Dictionary with 'noise_floor_db' as float, or 'error' on failure.
    """
    try:
        # Check if the file exists before trying to load
        if not os.path.exists(audio_path):
            print(f"[get_noise_floor] Error: File not found at specified path: {audio_path}")
            return {"noise_floor_db": None, "error": f"File not found: {audio_path}"}

        # Load audio using librosa (sr=None preserves original sample rate)
        y, sr = librosa.load(audio_path, sr=None)

        # Handle empty or silent audio
        if y.size == 0 or np.all(y == 0):
            print(f"[get_noise_floor] Warning: Audio file is empty or silent: {audio_path}")
            return {"noise_floor_db": -np.inf} # Represent silence as negative infinity dB

        # Estimate noise floor amplitude (10th percentile of absolute signal)
        noise_amplitude = np.percentile(np.abs(y), 10)

        # Convert amplitude to dB (relative to 1.0)
        # Add epsilon (1e-10) for numerical stability to avoid log10(0)
        noise_floor_db = 10 * np.log10(noise_amplitude**2 + 1e-10)

        # print(f"[get_noise_floor] Calculated noise floor for {os.path.basename(audio_path)}: {noise_floor_db:.2f} dB") # Optional debug log
        return {"noise_floor_db": float(noise_floor_db)}
    except Exception as e:
        # Log and return error if any step fails
        print(f"[get_noise_floor] ERROR processing {os.path.basename(audio_path)}: {e}")
        return {"noise_floor_db": None, "error": str(e)}

def reduce_noise_by_db(audio_path: str, reduction_db: int) -> dict:
    """
    Reduce noise in the audio file by a specified dB amount using spectral gating.

    Args:
        audio_path: Absolute path to the input audio file (wav recommended).
        reduction_db: Amount of noise reduction desired in dB (e.g., 5, 10).

    Returns:
        Dictionary with 'denoised_path' (absolute path to output wav file), or 'error'.
    """
    try:
        # Check if input file exists
        if not os.path.exists(audio_path):
            print(f"[reduce_noise_by_db] Error: Input file not found: {audio_path}")
            return {"denoised_path": None, "error": f"Input file not found: {audio_path}"}

        # Load audio
        y, sr = librosa.load(audio_path, sr=None)

        # Handle empty audio
        if y.size == 0:
            print(f"[reduce_noise_by_db] Warning: Input audio is empty: {audio_path}")
            return {"denoised_path": None, "error": "Input audio is empty"}

        # Estimate noise profile from the beginning of the audio (e.g., first 0.5 seconds)
        noise_clip_len = min(len(y), int(sr * 0.5))  # Use min to handle short audio
        if noise_clip_len == 0:
            print(f"[reduce_noise_by_db] Warning: Audio too short for noise profiling: {audio_path}")
            # Report failure without a path so the caller does not mistake the input for a denoised file
            return {"denoised_path": None, "error": "Audio too short for noise profiling"}
        noise_clip = y[:noise_clip_len]

        # Ensure noise profile contains valid numbers
        if not np.all(np.isfinite(noise_clip)):
            print(f"[reduce_noise_by_db] Error: Non-finite values detected in noise clip for {audio_path}")
            return {"denoised_path": None, "error": "Non-finite values detected in noise clip"}

        # Perform noise reduction using noisereduce library
        # `prop_decrease` controls reduction amount (0-1 scale), map from dB and clamp
        prop_decrease = min(max(reduction_db / 20.0, 0.0), 1.0) # Clamp between 0 and 1
        print(f"[reduce_noise_by_db] Applying noise reduction with prop_decrease={prop_decrease:.2f} (from {reduction_db} dB)")
        reduced_audio = nr.reduce_noise(y=y, sr=sr, y_noise=noise_clip, prop_decrease=prop_decrease)

        # Generate unique output filename and absolute path in the current directory
        timestamp = time.time_ns()  # Nanosecond resolution avoids collisions between requests in the same second
        out_filename = f"denoised_{timestamp}.wav"
        out_path = os.path.join(os.getcwd(), out_filename) # Use absolute path

        # Save the denoised audio
        sf.write(out_path, reduced_audio, sr)

        # Verify file was saved successfully
        if os.path.exists(out_path) and os.path.getsize(out_path) > 0:
            print(f"[reduce_noise_by_db] Noise reduction successful. Saved to: {out_path} (Size: {os.path.getsize(out_path)} bytes)")
            return {"denoised_path": out_path} # Return the absolute path
        else:
            # Handle file saving errors
            error_msg = f"Failed to write or created empty denoised file: {out_path}"
            print(f"[reduce_noise_by_db] ERROR: {error_msg}")
            return {"denoised_path": None, "error": error_msg}
    except Exception as e:
        # Log and return error if any step fails
        print(f"[reduce_noise_by_db] ERROR processing {os.path.basename(audio_path)}: {e}")
        # traceback.print_exc() # Uncomment for full traceback during debugging
        return {"denoised_path": None, "error": str(e)}


## 5. Define Spectrogram Plotting Function

*   **`plot_spectrogram(audio_path, title)`**:
    *   Takes an audio file path and a title string.
    *   Checks if the path is valid and the file exists and is not empty.
    *   Loads the audio using `librosa.load`.
    *   Calculates the Short-Time Fourier Transform (STFT) using `librosa.stft`.
    *   Converts the STFT magnitude to decibels using `librosa.amplitude_to_db`.
    *   Uses `librosa.display.specshow` to create a spectrogram plot (frequency vs. time, color intensity represents dB level).
    *   Adds a title and color bar.
    *   Uses `plt.close(fig)` to prevent the plot from displaying directly in the notebook output (Gradio will handle displaying it).
    *   Returns the `matplotlib.figure.Figure` object for Gradio to display, or `None` if an error occurred.
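
To see what `librosa.amplitude_to_db(..., ref=np.max)` does to the STFT magnitudes, a simplified NumPy-only sketch may help. It is not librosa's implementation (no frame centering or padding, a fixed symmetric Hann window), but it reproduces the two properties that matter when reading the plots: the loudest bin sits at 0 dB and, mirroring librosa's default `top_db=80`, nothing is shown below -80 dB. The function `stft_db` and its parameter values are assumptions:

```python
import numpy as np


def stft_db(y: np.ndarray, n_fft: int = 512, hop: int = 128, top_db: float = 80.0, amin: float = 1e-5) -> np.ndarray:
    """Magnitude STFT in dB relative to its peak, clipped to top_db below the peak.

    Returns an array of shape (n_fft // 2 + 1, n_frames): frequency bins x frames.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop: i * hop + n_fft] * window for i in range(n_frames)], axis=1)
    mag = np.abs(np.fft.rfft(frames, axis=0))
    # 20*log10(mag / ref) with ref = peak magnitude; amin guards against log10(0)
    ref = max(mag.max(), amin)
    db = 20.0 * np.log10(np.maximum(mag, amin) / ref)
    return np.maximum(db, db.max() - top_db)  # Cap the displayed dynamic range


sr, n_fft = 8000, 512
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)  # 1 kHz tone, 1 second
D = stft_db(tone, n_fft=n_fft)
peak_bin = int(np.argmax(D.mean(axis=1)))
print(D.shape, D.max(), D.min())
print("peak frequency:", peak_bin * sr / n_fft, "Hz")  # Bin spacing is sr / n_fft
```

Because the scale is relative to each file's own peak, the original and denoised spectrograms are each normalized separately, so their colors are not directly comparable in absolute level.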


In [6]:
# --- Spectrogram Plotting Function ---

def plot_spectrogram(audio_path, title="Spectrogram"):
    """
    Generates a spectrogram plot for the given audio file.

    Args:
        audio_path: Absolute path to the audio file.
        title: Title for the plot.

    Returns:
        A matplotlib Figure object containing the plot, or None on error.
    """
    try:
        # Validate input path and file existence/size
        if not audio_path or not isinstance(audio_path, str) or not os.path.exists(audio_path):
            print(f"[plot_spectrogram] Skipped: File not found or path invalid: {audio_path}")
            return None
        # Check file size to avoid errors with empty files
        if os.path.getsize(audio_path) == 0:
            print(f"[plot_spectrogram] Skipped: Audio file is empty: {audio_path}")
            return None

        # print(f"[plot_spectrogram] Plotting: {audio_path}") # Optional debug log
        # Load audio
        y, sr = librosa.load(audio_path, sr=None)

        # Handle empty loaded audio data (should be caught by size check, but belt-and-suspenders)
        if y.size == 0:
            print(f"[plot_spectrogram] Skipped: Loaded audio data is empty after load: {audio_path}")
            return None

        # Compute STFT and convert to dB scale (log magnitude)
        D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)

        # Create the plot using matplotlib
        fig, ax = plt.subplots(figsize=(8, 3)) # Adjust figsize as needed
        # Display spectrogram with log frequency axis
        img = librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='log', ax=ax)
        ax.set(title=title) # Set plot title
        fig.colorbar(img, ax=ax, format="%+2.0f dB") # Add color bar showing dB scale
        plt.tight_layout() # Adjust layout

        # IMPORTANT: Close the plot figure object to prevent duplicate display
        plt.close(fig)

        # print(f"[plot_spectrogram] Success for: {audio_path}") # Optional debug log
        return fig # Return the figure object for Gradio
    except Exception as e:
        # Log errors during plotting
        print(f"[plot_spectrogram] ERROR plotting spectrogram for {os.path.basename(audio_path)}: {e}")
        # traceback.print_exc() # Uncomment for detailed traceback during debugging
        return None # Return None if plotting fails


## 6. Define Main Agent Logic

*   **`agent(audio, user_query)`**: This is the core function that Gradio calls when the user clicks "Submit".
    *   **Inputs:**
        *   `audio`: The audio data uploaded by the user (as a tuple `(sample_rate, numpy_array)` because `type="numpy"` is used in `gr.Audio`).
        *   `user_query`: The text entered by the user.
    *   **Process:**
        1.  Performs initial checks: verifies the Gemini client is initialized and the audio input is valid.
        2.  Generates a unique input filename and saves the uploaded audio data to an absolute path using `soundfile.write`. Checks if saving was successful.
        3.  Defines the list of available Python tool functions (`tools = [get_noise_floor, reduce_noise_by_db]`).
        4.  Defines the `system_instruction` to guide the Gemini model on how to behave and use the tools.
        5.  Constructs the prompt for Gemini, explicitly including the absolute path to the saved input audio file.
        6.  Prepares the `GenerateContentConfig` object, passing the `tools` list and the `system_instruction`.
        7.  Calls the Gemini API using `client.models.generate_content`, providing the model name, the user prompt (`contents`), and the `config`. This triggers the automatic function calling process if needed.
        8.  Initializes the output variables (response text, denoised audio path, spectrogram plots) before any processing; once the Gemini response arrives, it plots the original spectrogram from the absolute input path.
        9.  **Parses Function Call Results from History:** Iterates through the `response.automatic_function_calling_history`. This history contains the sequence of model turns and tool executions. It looks specifically for parts with a `function_response` added back by the SDK (usually under the `user` role in the history).
        10. If a `function_response` is found, it extracts the nested `result` dictionary (which contains the actual dictionary returned by your Python function, e.g., `{'denoised_path': '...'}`).
        11. Based on the function name (`func_name`), it updates the `output_text` by appending the function result, assigns the `denoised_audio_path` (using the absolute path returned by the function), and calls `plot_spectrogram` for the denoised file if applicable. Includes checks for valid results (e.g., path exists).
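        12. Retrieves the final text generated by the model (after any function calls) from `response.candidates`. In the code this happens before the history is parsed, and the function results from steps 10 and 11 are appended to this text.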
        13. Includes robust error handling using a `try...except` block around the entire process.
    *   **Outputs:** Returns a tuple containing the values needed to update the Gradio output components: `(output_text, orig_spec_plot, denoised_spec_plot, denoised_audio_path)`.
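
Steps 10 and 11 mix result parsing with UI text. One way to make that branching testable without calling the API is to factor it into a pure function that takes the tool name and the unwrapped `result` dictionary. This is a hypothetical refactor (the helper `summarize_tool_result` does not exist in the notebook); it checks the denoised path only for type and leaves the file-existence check to the caller:

```python
import math
from typing import Optional, Tuple


def summarize_tool_result(func_name: str, result: dict) -> Tuple[str, Optional[str]]:
    """Turn one tool result dict into (message_for_user, denoised_path_or_None)."""
    if func_name == "get_noise_floor":
        value = result.get("noise_floor_db")
        if isinstance(value, (int, float)) and math.isfinite(value):
            return f"[Function Result]: Noise floor is {value:.2f} dB", None
        return f"[Function Error]: Could not get noise floor - {result.get('error', 'invalid value')}", None

    if func_name == "reduce_noise_by_db":
        path = result.get("denoised_path")
        # Treat a result that carries an 'error' key as a failure even if a path is present
        if isinstance(path, str) and path and "error" not in result:
            return f"[Function Result]: Noise reduction successful. Output path: {path}", path
        return f"[Function Error]: Noise reduction failed - {result.get('error', 'no denoised_path returned')}", None

    return f"[Function Error]: Unknown tool '{func_name}'", None


print(summarize_tool_result("get_noise_floor", {"noise_floor_db": -42.137}))
# → ('[Function Result]: Noise floor is -42.14 dB', None)
```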


In [7]:
# --- Gradio + Gemini Conversational Agent Logic ---

def agent(audio, user_query):
    """
    Handles user interaction via Gradio: saves audio, calls Gemini with tools,
    parses results, generates plots, and returns outputs for the UI.
    """
    # --- 1. Input Validation & Setup ---
    if client is None:
        # Check if Gemini client failed to initialize
        return "ERROR: Gemini client not initialized. Please check API Key and restart.", None, None, None
    if audio is None or not isinstance(audio, tuple) or len(audio) != 2:
        return "ERROR: Invalid audio input. Please upload a valid audio file.", None, None, None

    sample_rate, audio_data = audio
    if audio_data is None or audio_data.size == 0:
        return "ERROR: Audio data is empty.", None, None, None

    # Generate unique absolute path for the input file
    timestamp = time.time_ns()  # Nanosecond resolution avoids collisions between requests in the same second
    input_audio_filename = f"input_{timestamp}.wav"
    input_audio_path = os.path.join(os.getcwd(), input_audio_filename) # Use absolute path

    # Initialize output variables
    output_text = "Processing..."
    orig_spec_plot = None
    denoised_spec_plot = None
    denoised_audio_path = None # Store path to the final denoised audio

    try:
        # --- 2. Save Uploaded Audio ---
        print(f"[agent] Saving uploaded audio to: {input_audio_path}")
        sf.write(input_audio_path, audio_data, sample_rate)
        # Verify save operation
        if not os.path.exists(input_audio_path) or os.path.getsize(input_audio_path) == 0:
            print(f"[agent] ERROR: Failed to save uploaded audio file to {input_audio_path}")
            return "ERROR: Failed to save uploaded audio file.", None, None, None
        print(f"[agent] Successfully saved input file.")

        # --- 3. Prepare Gemini API Call ---
        tools = [get_noise_floor, reduce_noise_by_db]
        system_instruction_text = (
            "You are an audio analysis assistant. Always use the provided tools to answer questions about noise floor or to denoise audio. "
            "Do not attempt to answer directly—always invoke the relevant function. The audio file path is specified in the user prompt."
        )
        config = types.GenerateContentConfig(
            tools=tools,
            system_instruction=system_instruction_text
        )
        # Pass the absolute path to Gemini in the prompt
        prompt_for_gemini = f"{user_query}\n\nAudio path: '{input_audio_path}'"
        contents = [{"role": "user", "parts": [{"text": prompt_for_gemini}]}]

        # --- 4. Call Gemini API ---
        print(f"[agent] Sending request to Gemini for file: {input_audio_path}")
        response = client.models.generate_content(
            model="gemini-2.0-flash",  # Must support function calling; the "gemini-1.5-flash-latest" alias used originally has been retired
            contents=contents,
            config=config
        )
        print(f"[agent] Received response from Gemini.")

        # --- 5. Process Gemini Response ---
        # Get the final text response generated by the model (after function calls).
        # Parts without text (e.g. function-call parts) have text=None, so skip them before joining.
        final_text_response = ""
        candidate = response.candidates[0] if response.candidates else None
        if candidate and candidate.content and candidate.content.parts:
            final_text_response = "".join(part.text for part in candidate.content.parts if getattr(part, "text", None))
        output_text = final_text_response if final_text_response else "Processing complete."

        # Plot original spectrogram immediately (using absolute path)
        orig_spec_plot = plot_spectrogram(input_audio_path, "Original Spectrogram")

        # Parse function call results from the automatic history
        # The SDK leaves this attribute as None when the model called no tool, so default to [].
        history = getattr(response, "automatic_function_calling_history", None) or []
        if history:
            print("[agent] Parsing automatic_function_calling_history...")
            # Iterate through history to find the SDK-added FunctionResponse parts
            for content in reversed(history):  # Look from the end
                if content.role != "user" or not content.parts:  # SDK adds results under the 'user' role
                    continue
                for part in content.parts:
                    if getattr(part, "function_response", None) is None:
                        continue
                    func_name = part.function_response.name
                    func_response_data = part.function_response.response  # Outer dict: {'result': {...}}
                    print(f"[agent]   Found FunctionResponse in history for: {func_name}")

                    # Expected nested structure: {'result': {actual_dict}}
                    if not isinstance(func_response_data, dict):
                        continue
                    actual_result = func_response_data.get("result")
                    if not isinstance(actual_result, dict):
                        continue

                    # --- Handle get_noise_floor result ---
                    if func_name == "get_noise_floor":
                        noise_db = actual_result.get("noise_floor_db")
                        if noise_db is not None and np.isfinite(noise_db):
                            output_text += f"\n\n[Function Result]: Noise floor is {noise_db:.2f} dB"
                        else:
                            error_msg = actual_result.get("error", "invalid value")
                            output_text += f"\n\n[Function Error]: Could not get noise floor - {error_msg}"

                    # --- Handle reduce_noise_by_db result ---
                    elif func_name == "reduce_noise_by_db":
                        path_value = actual_result.get("denoised_path")
                        # Accept the path only if it is a non-empty string AND the file exists
                        if path_value and isinstance(path_value, str) and os.path.exists(path_value):
                            denoised_audio_path = path_value  # Absolute path returned by the tool
                            print(f"[agent]     Assigned denoised path: {denoised_audio_path}")
                            if "Noise reduction successful" not in output_text:  # Append if not already said by the model
                                output_text += f"\n\n[Function Result]: Noise reduction successful. Output path: {os.path.basename(denoised_audio_path)}"
                            # Plot the denoised spectrogram
                            denoised_spec_plot = plot_spectrogram(denoised_audio_path, "Denoised Spectrogram")
                        else:
                            # Prefer the tool's own error message when it reported one
                            error_msg = actual_result.get("error") or f"Path '{path_value}' from function invalid or file missing."
                            print(f"[agent]     ERROR: {error_msg}")
                            if "Noise reduction failed" not in output_text:
                                output_text += f"\n\n[Function Error]: Noise reduction failed - {error_msg}"
        else:
            print("[agent] No automatic_function_calling_history found or parsed.")

        # --- 6. Final Output Preparation ---
        print("-" * 20)
        print("[agent] Values FINALIZED for Gradio:")
        print(f"[agent] Output Text: {output_text[:500]}...")
        print(f"[agent] Original Plot Type: {type(orig_spec_plot)}")
        print(f"[agent] Denoised Plot Type: {type(denoised_spec_plot)}")
        print(f"[agent] Denoised Audio Path: {denoised_audio_path}")
        print("-" * 20)

        # Return values in the order expected by Gradio outputs
        return output_text, orig_spec_plot, denoised_spec_plot, denoised_audio_path

    # --- 7. Error Handling ---
    except Exception as e:
        # Catch any unexpected errors during the agent execution
        error_details = traceback.format_exc()
        print(f"[agent] FATAL ERROR in agent function: {error_details}")
        error_message = f"An unexpected error occurred: {str(e)}"
        # Return error message to the user
        return f"ERROR: {error_message}\n\n(Details logged server-side)", None, None, None
    # --- 8. Cleanup (Optional) ---
    finally:
        # Example: Delete the temporary input file after processing
        try:
            if input_audio_path and os.path.exists(input_audio_path):
                # os.remove(input_audio_path)
                # print(f"[agent] Cleaned up input file: {input_audio_path}")
                pass # Keeping files for now for inspection
        except Exception as clean_e:
            print(f"[agent] Error during cleanup: {clean_e}")
        # Avoid deleting denoised_audio_path here as Gradio needs it
        pass


## 7. Define Gradio User Interface

*   Uses `gr.Blocks` for a custom UI layout.
*   `gr.Markdown`: Displays introductory text.
*   `gr.Row`, `gr.Column`: Organizes components horizontally and vertically.
*   `gr.Audio`:
    *   Input (`audio_input`): Allows users to upload or record audio. `type="numpy"` makes the callback function (`agent`) receive the audio as a tuple `(sample_rate, numpy_array)`.
    *   Output (`denoised_audio`): Displays the processed audio file. `type="filepath"` means it expects an absolute file path string from the `agent` function. `interactive=False` prevents user input on this output component.
*   `gr.Textbox`:
    *   Input (`user_query`): For the user's text request.
    *   Output (`output_text`): Displays the text response from the agent. `interactive=False`.
*   `gr.Plot`: Displays the spectrogram images (`orig_spec`, `denoised_spec`) generated by `plot_spectrogram`.
*   `gr.Button`: The "Submit" button.
*   `.click()`: Connects the button click event to the `agent` function, mapping UI inputs (`audio_input`, `user_query`) to the function's arguments and the function's return values to the UI outputs (`output_text`, `orig_spec`, `denoised_spec`, `denoised_audio`) in the specified order.
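
With `type="numpy"`, the callback receives a `(sample_rate, numpy_array)` tuple, which the agent writes to a WAV file before sending it to Gemini. The following sketch shows one way to do that conversion with the standard-library `wave` module. `save_gradio_audio` is a hypothetical name, and the sketch assumes the samples are either `int16` (what Gradio typically delivers for uploads) or floats in [-1, 1], shaped `(n_samples,)` or `(n_samples, n_channels)`. The notebook's own saving code may use a different library.

```python
import wave

import numpy as np


def save_gradio_audio(audio, path):
    """Write a Gradio `type="numpy"` value `(sample_rate, data)` to a 16-bit PCM WAV file.

    Sketch only: assumes `data` is shaped (n_samples,) or (n_samples, n_channels)
    and is either int16 or float in [-1, 1]; other dtypes raise ValueError.
    """
    sample_rate, data = audio
    data = np.asarray(data)
    if data.ndim == 1:
        data = data[:, np.newaxis]  # treat mono as a single channel
    if np.issubdtype(data.dtype, np.floating):
        # Scale float samples in [-1, 1] to the int16 range, clipping any overshoot
        data = (np.clip(data, -1.0, 1.0) * 32767).astype(np.int16)
    elif data.dtype != np.int16:
        raise ValueError(f"Unsupported sample dtype: {data.dtype}")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(data.shape[1])
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(int(sample_rate))
        # Little-endian, row-major (frames x channels) bytes give WAV's interleaved layout
        wf.writeframes(data.astype("<i2").tobytes())
    return path
```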


In [8]:
# --- Gradio User Interface Definition ---

# Use gr.Blocks for more layout control
with gr.Blocks() as demo:
    # Add a title and description using Markdown
    gr.Markdown("# Conversational Audio Restoration Agent 🎤")
    gr.Markdown("Upload audio, ask questions (e.g., 'What is the noise floor?', 'Remove noise by 5 dB'), and see the results.")

    # Arrange components in rows and columns
    with gr.Row():
        # Left column for user inputs
        with gr.Column(scale=1):
            # Component for audio upload/recording
            audio_input = gr.Audio(
                label="Upload Audio (WAV recommended)",
                type="numpy" # Provides (sample_rate, data_array) to the backend function
            )
            # Textbox for the user's natural language query
            user_query = gr.Textbox(
                label="Ask the agent",
                placeholder="e.g. 'What is the noise floor?', 'Remove noise by 5 dB'"
            )
            # Button to trigger the agent function
            btn = gr.Button("Submit", variant="primary") # 'primary' makes it stand out

        # Right column for displaying outputs
        with gr.Column(scale=2):
            # Textbox to show the agent's text response
            output_text = gr.Textbox(
                label="Agent Response",
                lines=5, # Allow multiple lines for longer responses
                interactive=False # Output only
            )
            # Row specifically for the two plots side-by-side
            with gr.Row():
                # Placeholder for the original audio spectrogram plot
                orig_spec = gr.Plot(label="Original Spectrogram")
                # Placeholder for the denoised audio spectrogram plot
                denoised_spec = gr.Plot(label="Denoised Spectrogram")
            # Component to play back the denoised audio file
            denoised_audio = gr.Audio(
                label="Denoised Audio Output",
                type="filepath", # Expects a file path from the backend function
                interactive=False # Output only
            )

    # Define the action when the button is clicked
    btn.click(
        fn=agent,                           # The Python function to execute
        inputs=[audio_input, user_query],   # Components providing input to the function
        outputs=[output_text, orig_spec, denoised_spec, denoised_audio] # Components to update with function's return values
    )
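
The `.click()` wiring relies on `agent` returning exactly four values in the order given by `outputs`. A quick way to check the layout without spending Gemini API calls is to swap in a stub with the same contract. The function below is hypothetical and only echoes basic information about the upload; returning `None` for a plot or audio output leaves that component empty, which matches what `agent` does when no denoising happens.

```python
def stub_agent(audio, query):
    """Hypothetical stand-in for `agent` with the same inputs and 4-value output contract.

    Returns (text, original_plot, denoised_plot, denoised_audio_path) in the same
    order as `outputs=[output_text, orig_spec, denoised_spec, denoised_audio]`.
    """
    if audio is None:
        return "Please upload an audio file first.", None, None, None
    if not query or not query.strip():
        return "Please type a question or request.", None, None, None
    sample_rate, data = audio  # Gradio type="numpy" delivers (sample_rate, samples)
    duration = len(data) / sample_rate
    text = f"Received {duration:.2f} s of audio at {sample_rate} Hz. Query: {query.strip()!r}"
    # None for both plots and the audio path leaves those components empty
    return text, None, None, None
```

To try it, temporarily pass `fn=stub_agent` to `btn.click(...)` and relaunch the app.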


## 8. Launch Gradio App

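*   Inside an `if __name__ == "__main__":` guard (always true when run as a notebook cell), checks whether the `client` object was successfully initialized in Step 3.
*   If the client is ready, it calls `demo.launch(debug=True)`.
    *   `demo.launch()` starts the Gradio web server and serves the UI at a local URL. With `share=True` it also creates a temporary public link; anyone with that link can use the app, and therefore your Gemini API quota. As the output below shows, Gradio enables `share=True` automatically in hosted notebooks such as Kaggle unless `share=False` is passed explicitly.
    *   `debug=True` blocks the cell (the main thread) while the server runs, so errors raised inside Gradio or the callback are printed to the cell output and server logs. In hosted notebooks this is the easiest way to see those errors during development.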
*   If the client initialization failed, it prints an error message instead of launching the app.
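
The launch logic above can also be wrapped in a small function, which makes the client check explicit and keeps the launch options in one place. `launch_if_ready` is a hypothetical helper, not part of Gradio; it accepts any object with a `launch(**kwargs)` method, such as the `demo` `gr.Blocks` instance, and forwards the keyword arguments unchanged.

```python
def launch_if_ready(app, client, **launch_kwargs):
    """Launch `app` only if `client` was created; return True if a launch was attempted.

    Hypothetical helper: `app` is any object with a `launch(**kwargs)` method
    (e.g. a gr.Blocks instance). Keyword arguments are passed through unchanged.
    """
    if not client:
        # Mirror the notebook's behavior: report the problem instead of starting the server
        print("ERROR: Gemini client not initialized; not launching the app.")
        return False
    app.launch(**launch_kwargs)
    return True
```

For instance, `launch_if_ready(demo, client, debug=True, share=True)` would reproduce the notebook's behavior on a hosted notebook while making the public link an explicit choice rather than an automatic one.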


In [None]:
# --- Launch Gradio App ---

# Check if the Gemini client was initialized successfully before launching the web UI
if __name__ == "__main__":
    # `client` is falsy (e.g. None) if initialization failed, so the server is not started
    if client:
        print("Gemini client initialized. Launching Gradio app...")
        # Start the Gradio web server interface
        # debug=True provides helpful error messages during development
        # share=True can create a temporary public link (use with caution regarding API keys/data)
        demo.launch(debug=True)
    else:
        # Inform the user if the app cannot launch due to client initialization failure
        print("ERROR: Gradio app cannot launch because Gemini client failed to initialize. Please check API key and restart.")


Gemini client initialized. Launching Gradio app...
* Running on local URL:  http://127.0.0.1:7860
It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

* Running on public URL: https://4995b3536ebf629ae2.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


[agent] Saving uploaded audio to: /kaggle/working/input_1745119294.wav
[agent] Successfully saved input file.
[agent] Sending request to Gemini for file: /kaggle/working/input_1745119294.wav
[agent] Received response from Gemini.
[agent] Parsing automatic_function_calling_history...
[agent]   Found FunctionResponse in history for: get_noise_floor
--------------------
[agent] Values FINALIZED for Gradio:
[agent] Output Text: The noise floor is -57.24 dB.


[Function Result]: Noise floor is -57.24 dB...
[agent] Original Plot Type: <class 'matplotlib.figure.Figure'>
[agent] Denoised Plot Type: <class 'NoneType'>
[agent] Denoised Audio Path: None
--------------------
[agent] Saving uploaded audio to: /kaggle/working/input_1745119320.wav
[agent] Successfully saved input file.
[agent] Sending request to Gemini for file: /kaggle/working/input_1745119320.wav
[reduce_noise_by_db] Applying noise reduction with prop_decrease=1.00 (from 20 dB)
[reduce_noise_by_db] Noise reduction successful. Saved