**Updated Code from 10/10/2024 to 12/04/24**

All Back End Code.
Takes one-hour original audio files, reduces their noise, and creates a new 'reduced audio file' for each. It then creates a new 1-4 second audio file for each noise detected above a certain threshold and puts all of this data into an Excel sheet so the information is easier to visualize. The user can then verify each detected noise by viewing its waveform and spectrogram and listening to its audio to confirm whether it is a SAW, a SAW CALL, or Neither. After the user verifies each detection, the Excel sheet is updated.



---


# Installing & Importing necessary libraries


In [None]:
!sudo apt-get install portaudio19-dev

!pip install auditok
#Library available on GitHub. Give it a duration range and the sound energy to look for, and it marks the regions that exceed that energy

!pip install noisereduce
#Reduces the noise in the audio file. Background noise causes issues during analysis

!pip install matplotlib numpy scipy pandas

!pip install ipywidgets
#To add buttons in our output for user to verify detected noises

!pip install pydub
#For calculating energy for each detected noise for testing purposes

In [None]:
# Import Libraries
import glob #A way to handle directories in python
import numpy as np
from scipy.io import wavfile
from scipy.io.wavfile import write
import noisereduce as nr
import auditok
import pandas as pd
import datetime
import os
import statistics

import ipywidgets as widgets
from ipywidgets import Button, Output, IntSlider #For Classification during the verification stage
from IPython.display import display, clear_output, Audio, HTML  # Import Audio and display for playback

import matplotlib.pyplot as plt

import asyncio



---


# Mounting Google Drive and finding folders
The Google Drive has all the audio files we will need to look at.
**Make sure you have added the "Cat Song Meter Recordings" Google Drive to "My Drive" as a shortcut.**

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


Google Drive Tree:
If you have created a shortcut for our Drive Folder then this will be the tree for it:

    \content
      \drive
        \MyDrive
          \Cat Song Meter Recordings
            \Desired Folder...
              \All original .wav audio files
              \output.xlsx
              \ReducedAudio
                \All reduced.wav files
              \DetectedSawCalls
                \One folder per reduced.wav file
                  \All detected .wav files for a reduced.wav
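
The folder layout above can be expressed as a small path-building helper. This is only a minimal sketch: `expected_layout` and its `recording_name` argument are hypothetical names introduced for illustration, while the folder and file names (`ReducedAudio`, `DetectedSawCalls`, `output.xlsx`, the `_reduced` suffix) follow the tree above.

```python
import posixpath  # Colab paths are POSIX-style, so join with '/' regardless of the local OS

def expected_layout(base_dir, recording_name):
    """Return the paths expected for one recording (hypothetical helper).

    recording_name is the original file name without the '.wav' extension.
    """
    reduced_name = recording_name + "_reduced"
    return {
        "original": posixpath.join(base_dir, recording_name + ".wav"),
        "excel": posixpath.join(base_dir, "output.xlsx"),
        "reduced": posixpath.join(base_dir, "ReducedAudio", reduced_name + ".wav"),
        "detections_dir": posixpath.join(base_dir, "DetectedSawCalls", reduced_name),
    }

layout = expected_layout(
    "/content/drive/MyDrive/Cat Song Meter Recordings/Desired Folder",
    "SMM07257_20230317_163102",
)
for label, path in layout.items():
    print(f"{label}: {path}")
```
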
              



---


**When you run the code below, an input prompt appears in the output. Enter the path to the folder that contains the one-hour audio files you want to look at.**

In [None]:
#Directory/Folder containing original audio files and more
base_dir = input("Please type the whole directory of your SHORTCUT FOLDER below\nExample: /content/drive/MyDrive/Cat Song Meter Recordings/Capstone Code Review Folder \n" )

def validate_base_dir(base_dir):
  # Define the expected base path for Google Drive
  expected_drive_prefix = "/content/drive/MyDrive"

  # Check if base_dir starts with the expected Google Drive path
  if not base_dir.startswith(expected_drive_prefix):
      raise ValueError(f"Error: The base directory '{base_dir}' is not within the Google Drive path.")

  # Check if the base directory exists
  if not os.path.exists(base_dir):
      raise FileNotFoundError(f"Error: The base directory '{base_dir}' does not exist. \n Please rerun and enter the correct path")

  print(f"\nBase directory '{base_dir}' is valid.")


def setup_directories(base_dir):
  # Validate the base directory
  validate_base_dir(base_dir)

  # Create directories for reduced audio and detected saw calls
  reduced_audio_dir = os.path.join(base_dir, 'ReducedAudio')  #  Directory/Folder containing the reduced audio files
  saw_calls_dir = os.path.join(base_dir, 'DetectedSawCalls')  # Directory/Folder containing folders that contain detected saw calls for each reduced audio file

  #Ensure directories exist in the drive
  os.makedirs(reduced_audio_dir, exist_ok=True)
  os.makedirs(saw_calls_dir, exist_ok=True)

  print(f"Directories 'ReducedAudio' and 'DetectedSawCalls' were created or already exist. \n")

  # Return both directories
  return reduced_audio_dir, saw_calls_dir

# Prints a separator line to make the output easier to read
def style_spacing():
  print("*" * 150 + "\n")

style_spacing()

# Attempt to set up directories
file_list = []  # Stays empty if setup fails, so the loop below is skipped instead of raising NameError
try:
  reduced_audio_dir, saw_calls_dir = setup_directories(base_dir)
except (ValueError, FileNotFoundError) as e:
  print(e)
else:
  # Collect the full path of every .wav file directly inside base_dir
  # ('*' is a wildcard, so '*.wav' matches any file name ending in .wav)
  file_list = glob.glob(os.path.join(base_dir, '*.wav'))

  #Print all the .wav files in that file_list
  print(f"All .wav files that will be looked at: {file_list}")


style_spacing()
for file in file_list:
  filename = file[-28:-4] # Slicing: drops the directory and the .wav extension (assumes a 24-character recording name)
  print("\nFile Name that will be Noise Reduced and Detect Noises: " + filename)

Please type the whole directory of your SHORTCUT FOLDER below
Example: /content/drive/MyDrive/Cat Song Meter Recordings/Capstone Code Review Folder 
/content/drive/MyDrive/Cat Song Meter Recordings/FelidetectV3PerformanceTest/11 22: HabibulsVerificationTest
******************************************************************************************************************************************************


Base directory '/content/drive/MyDrive/Cat Song Meter Recordings/FelidetectV3PerformanceTest/11 22: HabibulsVerificationTest' is valid.
Directories 'ReducedAudio' and 'DetectedSawCalls' were created or already exist. 

All .wav files that will be looked at: ['/content/drive/MyDrive/Cat Song Meter Recordings/FelidetectV3PerformanceTest/11 22: HabibulsVerificationTest/SMM07257_20230317_163102.wav']
******************************************************************************************************************************************************


File Name that will be Noise Reduced
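
A fixed-width slice such as `file[-28:-4]` recovers the recording name only while every name is exactly 24 characters long. A minimal sketch of a length-independent alternative using only the standard library is shown here; `recording_stem` is a hypothetical helper name, and the example path is copied from the output above.

```python
import posixpath  # Colab paths are POSIX-style

def recording_stem(path):
    """Return the file name without its directory or extension."""
    return posixpath.splitext(posixpath.basename(path))[0]

path = ("/content/drive/MyDrive/Cat Song Meter Recordings/FelidetectV3PerformanceTest/"
        "11 22: HabibulsVerificationTest/SMM07257_20230317_163102.wav")

stem = recording_stem(path)
print(stem)                    # SMM07257_20230317_163102
print(stem == path[-28:-4])    # True here, because this name is exactly 24 characters long
```
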



---


# Noise Reduction Phase:
Noise reduction is crucial for accurate sound detection, which is a core requirement in identifying the vocalizations of big cats. This code takes a one-hour RAW audio file and reduces the noise in it, creating a new noise-reduced audio file.

Background: Weekly, the backend will process daily recordings, which consist of 16 one-hour incremented raw audio files (recorded between 4 pm and 8 am), reducing background noise and creating 16 new noise-reduced audio files.
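
As a quick check of the arithmetic, a recording window from 4 pm to 8 am the next morning spans 16 hours, hence 16 one-hour files. The sketch below enumerates the hourly start times; the date is purely illustrative, and `hourly_starts` is a hypothetical helper, not part of the notebook.

```python
from datetime import datetime, timedelta

def hourly_starts(first_start, hours=16):
    """Return the start time of each one-hour recording in a nightly window."""
    return [first_start + timedelta(hours=h) for h in range(hours)]

# Illustrative date only: the window opens at 4 pm
starts = hourly_starts(datetime(2023, 3, 17, 16, 0, 0))
window_end = starts[-1] + timedelta(hours=1)

print(len(starts))                 # 16
print(starts[0], "->", window_end) # 2023-03-17 16:00:00 -> 2023-03-18 08:00:00
```
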

In [None]:
def noise_reduction(file, reduced_audio_dir):
  """ Reduces background noise from audio files and saves them."""
  rate, data = wavfile.read(file)  # Read the .wav file

  # Define new file name for the reduced file as filename_reduced.wav
  filename = os.path.basename(file)[:-4] + "_reduced.wav"

  # Check if the reduced file already exists
  reduced_file_path = os.path.join(reduced_audio_dir, filename)
  if os.path.exists(reduced_file_path):
      # Prompt the user for action
      while True:
        user_input = input(f"Reduced file already exists: {reduced_file_path}\nDo you want to overwrite it? Type Y or N): ").strip().upper()
        if user_input in ["Y", "N"]:
            break
        print("\nInvalid input. Please type 'Y' for Yes or 'N' for No.")
      if user_input != 'Y':
          print(f"Keeping existing reduced file: {filename}")
          return  # Exit the function if the user chooses not to overwrite
      print("Overwriting the existing file; creating a new reduced file is in progress")

  # Perform noise reduction
  reduced_noise = nr.reduce_noise(y=data, sr=rate)

  # Save the reduced audio to the output directory
  write(reduced_file_path, rate, reduced_noise)
  print(f"\nNoise reduction complete for {file}")


style_spacing()
# Iterate over all original .wav audio files and perform noise reduction(eliminating background noise) on each
for file in file_list:
  noise_reduction(file, reduced_audio_dir)

style_spacing()
# This extracts the path of each noise-reduced .wav file and prints it to the user
reduced_audio_list = glob.glob(os.path.join(reduced_audio_dir, '*_reduced.wav'))
print(f"File path for the new reduced noise audio file: {reduced_audio_list}")
print("Reduced Audio Directory Contents:", reduced_audio_dir)

******************************************************************************************************************************************************





Reduced file already exists: /content/drive/MyDrive/Cat Song Meter Recordings/FelidetectV3PerformanceTest/11 22: HabibulsVerificationTest/ReducedAudio/SMM07257_20230317_163102_reduced.wav
Do you want to overwrite it? Type Y or N): N
Keeping existing reduced file: SMM07257_20230317_163102_reduced.wav
******************************************************************************************************************************************************

File path for the new reduced noise audio file: ['/content/drive/MyDrive/Cat Song Meter Recordings/FelidetectV3PerformanceTest/11 22: HabibulsVerificationTest/ReducedAudio/SMM07257_20230317_163102_reduced.wav']
Reduced Audio Directory Contents: /content/drive/MyDrive/Cat Song Meter Recordings/FelidetectV3PerformanceTest/11 22: HabibulsVerificationTest/ReducedAudio




---
# Plotting and Verifying Functions Phase
After the Detected Noise Extraction phase has run, these functions plot every detected noise and let the user verify whether each one is a SAW CALL or just a SAW. After verification, the Excel sheet is updated with this information.
FYI, this won't output anything until you run the Detected Noise Extraction phase.



In [None]:
import datetime
# Helper function to convert seconds to HH:MM:SS.sss format
def convert_seconds_to_time_format(seconds):
    # Convert to float in case seconds is a string
    seconds = float(seconds)
    # Convert seconds to hours, minutes, seconds, and milliseconds
    time_obj = datetime.timedelta(seconds=seconds)
    total_seconds = time_obj.total_seconds()
    hours = int(total_seconds // 3600)
    minutes = int((total_seconds % 3600) // 60)
    seconds = total_seconds % 60
    return f"{hours}:{minutes:02}:{seconds:06.3f}"
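
One edge case worth knowing about: formatting the seconds remainder with `:06.3f` can round a value such as 59.9996 up to `60.000` without carrying into the minutes. The variant below is a sketch that rounds to whole milliseconds first and then splits with `divmod`; `format_hms` is a hypothetical name, not a function used elsewhere in this notebook.

```python
def format_hms(seconds):
    """Format seconds as H:MM:SS.mmm, rounding to milliseconds before splitting."""
    total_ms = round(float(seconds) * 1000)      # accepts numbers or numeric strings
    hours, rem = divmod(total_ms, 3_600_000)     # milliseconds per hour
    minutes, rem = divmod(rem, 60_000)           # milliseconds per minute
    secs, ms = divmod(rem, 1000)
    return f"{hours}:{minutes:02}:{secs:02}.{ms:03}"

print(format_hms(3725.5))     # 1:02:05.500
print(format_hms("59.9996"))  # 0:01:00.000 (carries into the minutes)
```
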

In [None]:
# This function creates a waveform and spectrogram for each detected noise file and displays it.
def plot_waveform_and_spectrogram(file, start_time, end_time, reduced_file, total_visuals):

  # Convert start and end times to HH:MM:SS.sss format for plots
  start_time_str = convert_seconds_to_time_format(start_time)
  end_time_str = convert_seconds_to_time_format(end_time)

  # Read the audio file
  rate, data = wavfile.read(file)

  # Create time axis for waveform
  time = np.linspace(0, len(data) / rate, num=len(data))

  # Create figure for waveform and spectrogram
  fig, axs = plt.subplots(total_visuals, 1, figsize=(12, 8))

  # plt.subplots returns a single Axes (not an array) when total_visuals == 1,
  # so wrap it in a list to allow axs[0] indexing in both cases
  if total_visuals == 1:
    axs = [axs]

  # File names without directory or extension, used in the plot titles
  reduced_name = os.path.splitext(os.path.basename(reduced_file))[0]
  detected_name = os.path.splitext(os.path.basename(file))[0]

  # Plot spectrogram (always the first panel)
  axs[0].specgram(data, Fs=rate, NFFT=512, noverlap=256, cmap='gray_r') # cmap sets the spectrogram colors, e.g. cmap='viridis'
  axs[0].set_title(f'Spectrogram\n Reduced File: {reduced_name}   Detected File: {detected_name},\n Region: {start_time_str} - {end_time_str}')
  axs[0].set_xlabel('Time (s)')
  axs[0].set_ylabel('Frequency (Hz)')

  # Plot waveform (second panel, only when both visuals are requested)
  if total_visuals == 2:
    axs[1].plot(time, data)
    axs[1].set_title(f'Waveform\n Reduced File: {reduced_name}   Detected File: {detected_name},\n Region: {start_time_str} - {end_time_str}')
    axs[1].set_xlabel('Time (s)')
    axs[1].set_ylabel('Amplitude')

  # Add space between plots
  plt.subplots_adjust(hspace=1.5)  # Increase hspace to add space between the two plots
  plt.tight_layout()
  plt.show()
  print("\n")
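
Two details of the plotting inputs are easy to get wrong. `wavfile.read` returns a 2-D array of shape `(samples, channels)` for multi-channel files, whereas `specgram` expects a 1-D signal, and sample `k` occurs at `k / rate` seconds. The sketch below shows one way to handle both with NumPy; whether these recordings are ever multi-channel is an assumption, and `to_mono` and `sample_times` are hypothetical helper names.

```python
import numpy as np

def to_mono(data):
    """Average the channels of a (samples, channels) array; 1-D input is returned unchanged."""
    data = np.asarray(data)
    return data if data.ndim == 1 else data.mean(axis=1)

def sample_times(n_samples, rate):
    """Time in seconds of each sample: sample k occurs at k / rate."""
    return np.arange(n_samples) / rate

# Tiny made-up stereo signal: 3 samples, 2 channels
stereo = np.array([[0, 2], [4, 6], [8, 10]], dtype=np.int16)
mono = to_mono(stereo)
times = sample_times(len(mono), rate=2)

print(mono)
print(times)
```
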

In [None]:
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, clear_output

# Function where user can then verify each detected noise by confirming if it's a SAW, Neither, or SAW CALL
async def classify_detected_soundoldd(filename, base_dir, excel_file_path):
    global current_row_index  # Track the current row index across function calls
    global user_input_future  # Use global so it can be set from within the inner function
    user_input_future = asyncio.Future()  # Reset future for each classification

    # Load the existing data from the Excel file into a DataFrame
    df = pd.read_excel(excel_file_path)

    # Convert the DataFrame to a dataset (list of lists)
    dataset = df.values.tolist()

    # Check if the current row index is within the bounds of the dataset
    if current_row_index >= len(dataset):
        # Check if all rows have been classified
        if all(row[7] in ["SAW", "Neither", "CALLS"] for row in dataset):
            print("All detected noises have been verified. All rows have been processed and classified.")
            return  # Exit if all rows have been processed
        else:
            print("This detected noise has not been classified yet.")
            return

    # Create an output widget to capture user interactions
    output = widgets.Output()

    # Function to clear and display text input for user to enter SAW and/or CALL counts
    async def clear_and_ask_input(sound_type, index):
        with output:
            # Clear previous output
            clear_output(wait=True)
            print(f"How many {sound_type} were detected?")
            # Create a text input widget for manual input of counts
            input_box = widgets.Text(description='Enter count:', placeholder='Enter number')
            input_box.value = ''  # Clear previous input if any
            display(input_box)  # Display input box

            # Function triggered when user presses Enter
            def on_submit(change):
                # Load the existing data from the Excel file into a DataFrame
                df = pd.read_excel(excel_file_path)
                dataset = df.values.tolist()

                try:
                    count = int(input_box.value)  # Get input from the text box
                    if sound_type == "SAW":
                        dataset[index][7] = "SAW"   # Sound Type Column
                        dataset[index][8] = count   # Total SAWS Column

                    elif sound_type == "CALLS":
                        dataset[index][7] = "CALLS" # Sound Type Column
                        dataset[index][9] = count   # Total CALLS Column
                    input_box.value = f'Submitted: {count}'

                    # Update the DataFrame with the new values
                    df.at[index, 'Sound Type'] = dataset[index][7]
                    df.at[index, 'Total SAWS'] = dataset[index][8]
                    df.at[index, 'Total CALLS'] = dataset[index][9]

                    # Save the updated DataFrame back to the Excel file
                    df.to_excel(excel_file_path, index=False)
                    clear_output(wait=True)

                except ValueError:
                    clear_output(wait=True)
                    print(f"Invalid input. Please enter a valid number for {sound_type}.")
                    display(input_box)

            input_box.on_submit(on_submit)  # Triggered when user presses Enter

    # Button click actions Functions
    def on_saw_clicked(b, index):  # If user clicked the SAW button
        asyncio.ensure_future(clear_and_ask_input("SAW", index))

    def on_neither_clicked(b, index):  # If user clicked the neither button
        with output:
            # Load the existing data from the Excel file into a DataFrame
            df = pd.read_excel(excel_file_path)

            # Convert the DataFrame to a dataset (list of lists)
            dataset = df.values.tolist()
            clear_output(wait=True)
            dataset[index][7] = "Neither" # Sound Type Column
            dataset[index][8] = 0  # Total SAWS Column
            dataset[index][9] = 0  # Total CALLS Column

            # Update the DataFrame with the new values
            df.at[index, 'Sound Type'] = dataset[index][7]
            df.at[index, 'Total SAWS'] = dataset[index][8]
            df.at[index, 'Total CALLS'] = dataset[index][9]

            # Save the updated DataFrame back to the Excel file
            df.to_excel(excel_file_path, index=False)
            clear_output(wait=True)

            display("Submitted: Neither")

    def on_calls_clicked(b, index):  # If user clicked the CALL button
        asyncio.ensure_future(clear_and_ask_input("CALLS", index))

    # Save Excel function
    def save_excel(b):
        # Load the existing data from the Excel file into a DataFrame
        df = pd.read_excel(excel_file_path)
        df.to_excel(excel_file_path, index=False)
        print("Excel file saved successfully!")

    # Display buttons for the current row (region)
    button_saw = widgets.Button(description=f"SAWs {1+current_row_index}")
    button_neither = widgets.Button(description=f"Neither {1+current_row_index}")
    button_calls = widgets.Button(description=f"CALLs {1+current_row_index}")
    save_button = widgets.Button(description="Save Excel")

    # Assign button click functions with the current row index
    button_saw.on_click(lambda b, idx=current_row_index: on_saw_clicked(b, idx))
    button_neither.on_click(lambda b, idx=current_row_index: on_neither_clicked(b, idx))
    button_calls.on_click(lambda b, idx=current_row_index: on_calls_clicked(b, idx))
    save_button.on_click(save_excel)  # Bind the save function to the save button

    # Display buttons for the current row
    display(button_saw, button_neither, button_calls, save_button)

    # Display output widget
    display(output)

    # After processing the current row, increment the row index for the next function call
    current_row_index += 1


In [None]:
#Function where user can then verify each detected noise by confirming if it's a SAW, Neither, or SAW CALL
async def classify_detected_sound(filename, base_dir, excel_file_path):
    global current_row_index  # Track the current row index across function calls
    global user_input_future  # Use global so it can be set from within the inner function
    user_input_future = asyncio.Future()  # Reset future for each classification

    # Path to your output Excel file
    #excel_file_path = os.path.join(base_dir, 'output.xlsx')  # Ensure base_dir is defined

    # Load the existing data from the Excel file into a DataFrame
    df = pd.read_excel(excel_file_path)

    # Convert the DataFrame to a dataset (list of lists)
    dataset = df.values.tolist()
    '''
    # Check if the current row index is within the bounds of the dataset
    if current_row_index >= len(dataset):
        print("All rows have been processed.")
        return  # Exit if all rows have been processed
    '''
    # Check if the current row index is within the bounds of the dataset
    if current_row_index >= len(dataset):
      # Check if all rows have been classified
      if all(row[7] in ["SAW", "Neither", "CALLS"] for row in dataset):
          print("All detected noises have been verified. All rows have been processed and classified.")
          return  # Exit if all rows have been processed
      else:
          print("This detected noise has not been classified yet.")
          #return  # Exit if all rows have been processed

    '''
    # Create buttons for user classification
    button_saw = widgets.Button(description="Just SAWssss")
    button_neither = widgets.Button(description="Neither")
    button_calls = widgets.Button(description="CALLs & SAWs")
    '''

    # Create an output widget to capture user interactions
    output = widgets.Output()


    # Function to clear and display text input for user to enter SAW and/or CALL counts
    async def clear_and_ask_input(sound_type, index):
        with output:

            #Creates the input box for user to add # of SAWs or CALLs to excel
            clear_output(wait=True)
            print(f"How many {sound_type} were detected?")
            # Create a text input widget for manual input of counts
            input_box = widgets.Text(description='Enter count:', placeholder='Enter number')
            #submit_button = widgets.Button(description="Submit Count")  # Define the submit button
            input_box.value = ''  # Clear previous input if any
            display(input_box)  # Display input box and submit button   display(input_box, submit_button)

            # Function triggered when user presses Enter
            def on_submit(change):
                # Load the existing data from the Excel file into a DataFrame
                df = pd.read_excel(excel_file_path)
                # Convert the DataFrame to a dataset (list of lists)
                dataset = df.values.tolist()

                try:
                    count = int(input_box.value)  # Get input from the text box
                    if sound_type == "SAW":
                        dataset[index][7] = "SAW"   # Sound Type Column
                        dataset[index][8] = count   # Total SAWS Column

                    elif sound_type == "CALLS":
                        dataset[index][7] = "CALLS" # Sound Type Column
                        dataset[index][9] = count   # Total CALLS Column
                    input_box.value = f'Submitted: {count}'


                    # Update the DataFrame with the new values
                    df.at[index, 'Sound Type'] = dataset[index][7]
                    df.at[index, 'Total SAWS'] = dataset[index][8]
                    df.at[index, 'Total CALLS'] = dataset[index][9]

                    # Save the updated DataFrame back to the Excel file
                    df.to_excel(excel_file_path, index=False)
                    clear_output(wait=True)

                    #print(f"Recorded {count} {sound_type}(s) for {filename[:-8]}, region {index}.")
                except ValueError:
                    clear_output(wait=True)
                    # Show the error and redisplay the input box so the user can try again
                    print(f"Invalid input. Please enter a valid number for {sound_type}.")
                    display(input_box)

            input_box.on_submit(on_submit)  # Triggered when user presses Enter
            '''await asyncio.sleep(.03)  # Wait for user interaction'''

    # Button click actions Functions
    def on_saw_clicked(b, index):  # If user clicked the SAW button
        asyncio.ensure_future(clear_and_ask_input("SAW", index))

    def on_neither_clicked(b, index):  # If user clicked the neither button
        with output:
          # Load the existing data from the Excel file into a DataFrame
          df = pd.read_excel(excel_file_path)

          # Convert the DataFrame to a dataset (list of lists)
          dataset = df.values.tolist()
          clear_output(wait=True)
          dataset[index][7] = "Neither" # Sound Type Column
          dataset[index][8] = 0  # Total SAWS Column
          dataset[index][9] = 0  # Total CALLS Column

          # Update the DataFrame with the new values
          df.at[index, 'Sound Type'] = dataset[index][7]
          df.at[index, 'Total SAWS'] = dataset[index][8]
          df.at[index, 'Total CALLS'] = dataset[index][9]

          # Save the updated DataFrame back to the Excel file
          df.to_excel(excel_file_path, index=False)
          clear_output(wait=True)

          display("Submitted: Neither")

          #print(f"Recorded 'Neither' for region {index}.")
          #user_input_future.set_result(None)  # Set the future to indicate input is done

    def on_calls_clicked(b, index):  # If user clicked the CALL button
        asyncio.ensure_future(clear_and_ask_input("CALLS", index))
        #asyncio.ensure_future(clear_and_ask_input("SAW", index))

    # Display buttons for the current row (region)
    button_saw = widgets.Button(description=f"SAWs {1+current_row_index}")
    button_neither = widgets.Button(description=f"Neither {1+current_row_index}")
    button_calls = widgets.Button(description=f"CALLs {1+current_row_index}")

    # Assign button click functions with the current row index
    button_saw.on_click(lambda b, idx=current_row_index: on_saw_clicked(b, idx))
    button_neither.on_click(lambda b, idx=current_row_index: on_neither_clicked(b, idx))
    button_calls.on_click(lambda b, idx=current_row_index: on_calls_clicked(b, idx))

    # Display buttons for the current row
    display(button_saw, button_neither, button_calls)

    # Display output widget
    display(output)

    # After processing the current row, increment the row index for the next function call
    current_row_index += 1
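
The button handlers above address columns by position (7, 8, 9), which silently breaks if the Excel columns are ever reordered. As a sketch of the same update rule keyed by column name, the pure-Python function below applies one verification result to a row. The column names come from the code above; `apply_classification`, `VALID_TYPES`, and the rejection of negative counts are assumptions added for illustration.

```python
VALID_TYPES = ("SAW", "CALLS", "Neither")

def apply_classification(row, sound_type, count=0):
    """Return a copy of row (a dict keyed by column name) with the verification result applied."""
    if sound_type not in VALID_TYPES:
        raise ValueError(f"Unknown sound type: {sound_type!r}")
    count = int(count)  # raises ValueError for non-numeric text, like the input box handler
    if count < 0:
        raise ValueError("count must not be negative")  # assumption: not checked in the notebook

    updated = dict(row)  # leave the caller's row untouched
    updated["Sound Type"] = sound_type
    if sound_type == "SAW":
        updated["Total SAWS"] = count
    elif sound_type == "CALLS":
        updated["Total CALLS"] = count
    else:  # "Neither" zeroes both counters, as the Neither button does
        updated["Total SAWS"] = 0
        updated["Total CALLS"] = 0
    return updated

row = {"Sound Type": None, "Total SAWS": None, "Total CALLS": None}
saw_row = apply_classification(row, "SAW", "3")
neither_row = apply_classification(row, "Neither")
print(saw_row)
print(neither_row)
```
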





---


# Detected Noise Extraction Phase:
Detection of Big Cat Sounds: The backend analyzes the noise-reduced audio files and extracts sections where the energy exceeds a certain threshold. These snippets, representing potential Big Cat vocalizations (primarily SAW CALLS and SAWs), are saved as new files. However, it sometimes detects and extracts other sounds that resemble them, so a user needs to double-check these detections later.
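
To illustrate the idea of energy-threshold detection, here is a simplified NumPy sketch. It is not auditok's implementation: the frame length, the dB scale (relative to an amplitude of 1.0), the threshold value, and the name `detect_regions` are all assumptions chosen for the example, and it omits auditok's minimum/maximum duration and silence handling.

```python
import numpy as np

def detect_regions(samples, rate, frame_s=0.05, threshold_db=-30.0):
    """Return (start_s, end_s) pairs for runs of frames whose mean energy exceeds threshold_db."""
    frame_len = int(rate * frame_s)
    n_frames = len(samples) // frame_len
    frames = np.asarray(samples[:n_frames * frame_len], dtype=np.float64)
    frames = frames.reshape(n_frames, frame_len)

    # Mean squared amplitude per frame, in dB; the small constant avoids log10(0) on silence
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    active = energy_db > threshold_db

    regions, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                                   # a loud run begins
        elif not is_active and start is not None:
            regions.append((start * frame_len / rate, i * frame_len / rate))
            start = None                                # the run ended at frame i
    if start is not None:                               # run continues to the end of the signal
        regions.append((start * frame_len / rate, n_frames * frame_len / rate))
    return regions

# Synthetic test signal: 1 s silence, 1 s of a 440 Hz tone, 1 s silence
rate = 8000
t = np.arange(rate * 3) / rate
signal = np.zeros_like(t)
signal[rate:2 * rate] = 0.5 * np.sin(2 * np.pi * 440 * t[rate:2 * rate])

regions = detect_regions(signal, rate)
print(regions)
```

On this synthetic signal the only detected region should span roughly seconds 1 to 2, where the tone is present.
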

In [None]:
# Function to load and plot detected noises from the first folder in the saw_calls_dir
async def load_and_plot_first_folder(saw_calls_dir, reduced_audio_dir, excel_file_path):
  # requesting user input to know what to display about the detected noise in the output
  while True:
      hear_audio = input("Do you want to hear detected noise to verify them? Type Y or N: ").strip().upper()
      if hear_audio in ["Y", "N"]:
          break
      print("Invalid input. Please type 'Y' for Yes or 'N' for No.")
  while True:
      try:
          total_visuals = int(input("Type 1 to view a spectrogram, type 2 to view both a spectrogram and waveform, or type 0 to see none: "))
          if total_visuals in [0, 1, 2]:
              break
          print("Invalid input. Please enter 0, 1, or 2.")
      except ValueError:
          print("Invalid input. Please enter a number: 0, 1, or 2.")
  while True:
      verify_classify_input = input("Do you want buttons to show up for each detected noise to verify them? Type Y or N: ").strip().upper()
      if verify_classify_input in ["Y", "N"]:
          break
      print("Invalid input. Please type 'Y' for Yes or 'N' for No.")

  # Get list of subfolders in saw_calls_dir
  subfolders = [f for f in os.listdir(saw_calls_dir) if os.path.isdir(os.path.join(saw_calls_dir, f))]

  if not subfolders:
      print(f"No folders found in {saw_calls_dir}. Please run detection first.")
      return

  # Select the first folder
  first_folder = subfolders[0]
  print(f"\nProcessing detected noises in the first folder: {first_folder}")

  # Path to the first folder in saw_calls_dir
  first_folder_path = os.path.join(saw_calls_dir, first_folder)

  # List all .wav files in the selected folder
  files_in_dir = [f for f in os.listdir(first_folder_path) if f.endswith('.wav')]

  if not files_in_dir:
      print(f"No detected .wav files found in {first_folder_path}.")
      return

  print(f"Loading {len(files_in_dir)} detected noises from {first_folder_path}...")
  style_spacing()


  # Path to the reduced audio file
  reduced_file_path = os.path.join(reduced_audio_dir, f"{first_folder}.wav")

  region_count= 0
  for file in files_in_dir:
      region_count = region_count + 1
      detected_file_path = os.path.join(first_folder_path, file)

      # Display the detected sound in the output
      if hear_audio == "y" or hear_audio == "Y":
        display(HTML(f"""
            <div style="text-align: center;">
                <h3>#{region_count} Detected File: {file}</h3>
                {Audio(detected_file_path)._repr_html_()}
            </div>
        """))

      # Extract start and end times from the filename (assuming region format 'region_start-end.wav')
      region_info = file.replace('region_', '').replace('.wav', '')
      start_time, end_time = region_info.split('-')

      # Plot waveform and spectrogram in the output
      if total_visuals == 1 or total_visuals == 2:
        plot_waveform_and_spectrogram(detected_file_path, start_time, end_time, reduced_file_path, total_visuals)

      #Display buttons for verifying and classifying detected sounds in the output
      if verify_classify_input == "y" or verify_classify_input == 'Y':
        #classify_sound(detected_file_path, dataset, filename, region_index,)
        await classify_detected_sound(filename, base_dir, excel_file_path)
        print("")
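
`os.listdir` returns entries in arbitrary order, and a plain alphabetical sort would place `region_100...` before `region_9...`. If the detected files need to be shown in time order, a sketch like the following could parse the start time out of each name and sort on it. It assumes the `region_<start>-<end>.wav` naming described in the comment above; `parse_region_filename` is a hypothetical helper, and the file names are made up.

```python
def parse_region_filename(name):
    """Split 'region_<start>-<end>.wav' into (start_seconds, end_seconds) as floats."""
    stem = name.replace("region_", "").replace(".wav", "")
    start, end = stem.split("-")
    return float(start), float(end)

# Made-up names in the assumed format
files = ["region_10.5-12.0.wav", "region_9.25-11.0.wav", "region_100.0-104.5.wav"]

# Sort numerically by start time rather than alphabetically by name
ordered = sorted(files, key=lambda f: parse_region_filename(f)[0])
print(ordered)
```
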




---


**This Function makes the output interactive allowing the user to verify if the detected noise found with the algorithm is a SAW, CALL, or Neither**





In [None]:
# Function to check if the saw_calls_dir (DetectedSawCalls Folder) contains detected audio files across subdirectories and report the count
async def check_and_prompt_for_deletion(saw_calls_dir, excel_file_path):

  print("\nChecking for old files...")
  await asyncio.sleep(.7)

  # Check if the base directory exists
  if os.path.exists(saw_calls_dir):
      # Initialize the file count
      total_file_count = 0

      # List all subdirectories inside saw_calls_dir
      subdirectories = [d for d in os.listdir(saw_calls_dir) if os.path.isdir(os.path.join(saw_calls_dir, d))]

      # Iterate over each subdirectory to count the detected files
      for subdir in subdirectories:
          subdir_path = os.path.join(saw_calls_dir, subdir)
          files_in_subdir = os.listdir(subdir_path)
          file_count = len([file for file in files_in_subdir if os.path.isfile(os.path.join(subdir_path, file))])  # Count only files
          total_file_count += file_count  # Add to the total count

      # If any detected files are found in the subdirectories
      if total_file_count > 0:
          print(f"\n{total_file_count} detected files were found across {len(subdirectories)} subdirectories in {saw_calls_dir}.")
          user_input = input("Do you want to delete them and rerun detection? (yes/no): ")

          if user_input.lower() == 'yes':
              # Delete all files in each subdirectory
              for subdir in subdirectories:
                  subdir_path = os.path.join(saw_calls_dir, subdir)
                  files_in_subdir = os.listdir(subdir_path)
                  for file in files_in_subdir:
                      file_path = os.path.join(subdir_path, file)
                      try:
                          if os.path.isfile(file_path):
                              os.remove(file_path)
                      except Exception as e:
                          print(f"Error deleting file {file}: {e}")
              print(f"All {total_file_count} old detected files deleted from {saw_calls_dir}.")

              # Check if the Excel file exists and clear it if so
              if os.path.exists(excel_file_path):
                  print("Clearing existing data in the Excel file.")
                  pd.DataFrame().to_excel(excel_file_path, index=False)  # Save an empty DataFrame to clear contents

              return True  # Proceed with the detection phase
          else:
              print("Skipping the detection phase.\n"); style_spacing()
              #load_and_plot_detected_noises(saw_calls_dir, reduced_audio_dir) # Load plots of already created detected noise files
              await load_and_plot_first_folder(saw_calls_dir, reduced_audio_dir, excel_file_path) # Load plots of already created detected noise files for the first folder
              return False  # Skip the detection phase

  return True  # If no files or subdirectories exist, proceed with the detection phase
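
The nested `os.listdir` loops above can also be written with `pathlib`. The following is a minimal sketch, not the notebook's implementation: `count_detected_files` and `delete_detected_files` are hypothetical helper names, and the sketch assumes (as the function above does) that detected clips sit exactly one level below `saw_calls_dir`.

```python
from pathlib import Path


def count_detected_files(saw_calls_dir):
    """Count regular files one level below saw_calls_dir (hypothetical helper)."""
    root = Path(saw_calls_dir)
    if not root.is_dir():
        return 0
    return sum(
        1
        for subdir in root.iterdir() if subdir.is_dir()
        for item in subdir.iterdir() if item.is_file()
    )


def delete_detected_files(saw_calls_dir):
    """Delete those files, keep the subdirectories, and return how many were removed."""
    deleted = 0
    for subdir in Path(saw_calls_dir).iterdir():
        if not subdir.is_dir():
            continue
        for item in subdir.iterdir():
            if item.is_file():
                try:
                    item.unlink()
                    deleted += 1
                except OSError as e:
                    print(f"Error deleting file {item.name}: {e}")
    return deleted
```

Returning the number of files actually removed lets the caller report a count that stays accurate even when some deletions fail.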

In [None]:
from pydub import AudioSegment
from datetime import timedelta

# For each .wav file, this function extracts regions of interest (audio segments where noise exceeds a given threshold) and saves them as audio clips in a specified directory.
async def audio_pipeline(file, dataset, saw_calls_dir, excel_file_path):

  print("Starting audio_pipeline...")
  await asyncio.sleep(.7)

  # Clear the dataset at the beginning of each run
  dataset.clear()

  # Check if the user wants to delete old files before starting detection
  if not await check_and_prompt_for_deletion(saw_calls_dir, excel_file_path):
      return  # If the user doesn't want to rerun detection, exit this function

  # Strip the directory path and the file extension, e.g.
  # path/to/file/SMM10131_20240320_143902_reduced.wav -> SMM10131_20240320_143902_reduced
  filename = os.path.splitext(os.path.basename(file))[0]

  # Check if the input file (the reduced audio file) exists
  if not os.path.exists(file):
      print(f"Reduced File not found: {file}")
      return

  # Create a directory for each reduced file inside saw_calls_dir
  file_specific_dir = os.path.join(saw_calls_dir, filename)
  os.makedirs(file_specific_dir, exist_ok=True)  # Create the directory if it doesn't exist
  # Get the current processing timestamp for debugging purposes
  processing_timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

  print(f"\nSplitting audio file where noise is detected: {file}")
  # Use auditok to split the audio into regions where the sound reaches a certain energy threshold (a possible SAW CALL noise).
  # list() materializes the result so the emptiness check below also works if split() returns a generator.
  audio_regions = list(auditok.split(file,
                                     min_dur=4,           # minimum duration of a valid audio event, in seconds
                                     max_dur=26,          # maximum duration of a detected sound region, in seconds
                                     max_silence=3.5,     # maximum silence allowed within a sound region, in seconds
                                     energy_threshold=63  # energy level that qualifies as a sound to be detected
                                     ))

  # If no audio regions were detected, print a message and return.
  if not audio_regions:
      print(f"No audio regions detected in {file}. Please, check the energy threshold or file content.\n")
      return

  # Initialize variables to track call counts in 15-second intervals
  average_n_calls = 0    # Reserved for the average number of calls (currently unused)
  curr = 15              # End time, in seconds, of the current 15-second window
  curr_count = 0         # Counts how many calls are detected in the current 15-second window
  count_list = []        # Stores counts of calls detected in each completed 15-second window

  # Loop through the detected audio regions from 1 reduced audio file
  for i, r in enumerate(audio_regions):
    #print("Region {i}: {r.start:.3f}s -- {r.end:.3f}s".format(i=i, r=r))
    region = i

    def format_timedelta(seconds):
        """Formats a duration in seconds to 'hh:mm:ss.sss' format."""
        duration = timedelta(seconds=seconds)
        total_seconds = int(duration.total_seconds())
        hours, remainder = divmod(total_seconds, 3600)
        minutes, seconds = divmod(remainder, 60)
        milliseconds = int((duration.microseconds / 1000))  # Convert microseconds to milliseconds

        # Format with leading zeros, ensuring a consistent hh:mm:ss.sss format
        return f"{hours:02}:{minutes:02}:{seconds:02}.{milliseconds:03}"

    # Format the region's start and end times as hh:mm:ss.sss
    start = format_timedelta(r.start)
    end = format_timedelta(r.end)
    '''start = str(datetime.timedelta(seconds=r.start))
    end = str(datetime.timedelta(seconds=r.end))'''

    # Calculate the duration of the region
    dur = r.end - r.start

    # Format duration to include ' seconds' in output spreadsheet
    formatted_dur = f"{dur:.2f} seconds"  # Convert duration to string with ' seconds'

    # Skip regions shorter than 1.40 seconds (to filter out very short noise events)
    if dur < 1.40:
      print(f"Skipped, duration = {dur:.2f} seconds")
      continue

    # Update the call count for the current 15-second interval.
    # Close every window that ended before this region starts, so a gap
    # longer than 15 seconds records empty windows instead of drifting.
    while r.start >= curr:
      count_list.append(curr_count)  # Store the count for the finished interval
      curr += 15                     # Move to the next 15-second interval
      curr_count = 0                 # Start a new count for the new interval
    curr_count += 1

    '''
    # Save detected noise region as a .wav audio file in the 'DetectedSawCalls' folder
    detected_file_path = os.path.join(saw_calls_dir, f"region_{r.start:.3f}-{r.end:.3f}.wav")
    r.save(detected_file_path)
    '''

    # Generate the region-based filename using start and end times
    region_name = f"region_{r.start:.3f}-{r.end:.3f}"

    # Save detected noise region as a .wav audio file in the file-specific folder
    detected_file_path = os.path.join(file_specific_dir, f"{region_name}.wav")
    r.save(detected_file_path)

    # Storing information about the detected region in the dataset for further analysis shown in the output.xlsx
    dataset.append([
        filename[:-8], # Base Reduced filename
        region_name,   # Detected noise filename
        i + 1,           # Region index
        start[:-4],    # Start time
        end[:-4],      # End time
        formatted_dur,         # Duration of the detected sound in sec
        processing_timestamp,  # Timestamp of processing
        None,          # Placeholder for sound type (SAW or CALLS)
        "",             # Placeholder for number of SAWS detected
        ""              # Placeholder for number of CALLS detected
        ])
        # filename[:-8]: Removes the last 8 characters from the filename(typically the "_reduced" suffix)

    print(f"Detected Noises: {region_name}")

  # Once all regions have been processed, print a summary
  print(f"\nFile processed: {file}")

  # Include the last, still-open 15-second window so its calls are counted
  if curr_count:
    count_list.append(curr_count)

  # If call counts were recorded, print the average number of calls per 15 seconds and the total number of calls
  if count_list:
    print(f"Avg/15sec: {statistics.fmean(count_list)}")
    print(f"Total number of Detected Saw Calls: {sum(count_list)}")
  else:
    print("No calls detected in any 15-second interval.")
    style_spacing()

  ''' Use this in the future To make each audio file have its own sheet (or "tab") in the Excel file,
  # Write data to an Excel file with each file's data in a separate sheet
  try:
      with pd.ExcelWriter(excel_file_path, engine="openpyxl", mode="a" if os.path.exists(excel_file_path) else "w") as writer:
          # Create a DataFrame and write it to a sheet with the filename as the sheet name
          df = pd.DataFrame(dataset, columns=['File', 'Detected File', 'Region', 'Start Time', 'End Time', 'Duration',
                                              'Processing Timestamp', 'Sound Type', 'Total SAWS', 'Total CALLS'])
          df.to_excel(writer, sheet_name=filename[:31], index=False)  # Limit sheet name to 31 chars if necessary
          print(f"Data for {filename} saved to sheet in Excel file at {excel_file_path}")
  except Exception as e:
      print(f"Failed to write to Excel file: {e}") '''

  # Convert the dataset to a pandas DataFrame with appropriate column names
  # This will save the DataFrame made from this function that has all the information we have collected about each detected noise to the excel sheet to view at any time.
  df = pd.DataFrame(dataset, columns=['File', 'Detected File', 'Region', 'Start Time', 'End Time', 'Duration', 'Processing Timestamp', 'Sound Type', 'Total SAWS', 'Total CALLS'])

  #excel_file_path = os.path.join(base_dir, 'output.xlsx')   # Change this path to the location where you want to save the file
  df.to_excel(excel_file_path, index=False)

  print(f"\nCreated an Excel file saving all data about each detected noise\nExcel file saved at: {excel_file_path}")
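
The 15-second tally inside the loop can be isolated as a small pure function, which makes it easier to check by hand. This is a sketch under assumptions rather than the notebook's own code: `count_calls_per_window` is a hypothetical name, region start times are assumed to arrive in ascending order, and windows with no regions are counted as zero.

```python
from statistics import fmean


def count_calls_per_window(start_times, window=15.0):
    """Return the number of regions starting in each consecutive window.

    Hypothetical helper: start_times are region start offsets in seconds,
    sorted ascending. Windows with no regions contribute a 0.
    """
    start_times = list(start_times)
    counts = []
    window_end = window
    current = 0
    for start in start_times:
        # Close every window that ends at or before this region's start.
        while start >= window_end:
            counts.append(current)
            current = 0
            window_end += window
        current += 1
    if start_times:
        counts.append(current)  # include the final, still-open window
    return counts


counts = count_calls_per_window([2.0, 9.5, 16.0, 47.3])
print(counts)         # [2, 1, 0, 1]
print(sum(counts))    # 4
print(fmean(counts))  # 1.0
```

Because the final window is appended explicitly, the total is simply the sum of the per-window counts.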


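
As a quick check of the timestamp handling, the sketch below produces the same `hh:mm:ss.sss` layout used for each region and shows what the `[:-4]` slice leaves for the spreadsheet. The name `format_offset` is hypothetical, and unlike `format_timedelta` it rounds to the nearest millisecond instead of truncating.

```python
def format_offset(seconds):
    """Format an offset in seconds as 'hh:mm:ss.sss' (hypothetical helper)."""
    total_ms = int(round(seconds * 1000))
    total_seconds, milliseconds = divmod(total_ms, 1000)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02}:{minutes:02}:{secs:02}.{milliseconds:03}"


stamp = format_offset(3725.5)
print(stamp)       # 01:02:05.500
print(stamp[:-4])  # 01:02:05
```

Dropping the last four characters removes the dot and the three millisecond digits, so the Start Time and End Time columns hold whole seconds only.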
In [None]:
dataset = []  # Holds all the data about the detected noises that will be written to an Excel file

# Global variable to track the current row index
current_row_index = 0

print(f"{reduced_audio_list} \n")

# Create the Excel file so the data can be viewed at any time
excel_name = input("What do you want to name the excel file holding the data: ")
excel_file_path = os.path.join(base_dir, excel_name + '.xlsx')

# Check if the Excel file already exists; if not, create it with an empty DataFrame
if not os.path.isfile(excel_file_path):
    # Create an empty DataFrame and save it as an Excel file
    df = pd.DataFrame(columns=['File', 'Detected File', 'Region', 'Start Time', 'End Time', 'Duration', 'Processing Timestamp', 'Sound Type', 'Total SAWS', 'Total CALLS'])
    df.to_excel(excel_file_path, index=False)
    print(f"Created new Excel file at {excel_file_path}")
else:
    print(f"Excel file already exists at {excel_file_path}")

# Iterate over all noise-reduced files and apply audio_pipeline to each
for file in reduced_audio_list:
  style_spacing()
  await audio_pipeline(file, dataset, saw_calls_dir, excel_file_path)

['/content/drive/MyDrive/Cat Song Meter Recordings/FelidetectV3PerformanceTest/11 22: HabibulsVerificationTest/ReducedAudio/SMM07257_20230317_163102_reduced.wav'] 

What do you want to name the excel file holding the data: CapstoneTest
Created new Excel file at /content/drive/MyDrive/Cat Song Meter Recordings/FelidetectV3PerformanceTest/11 22: HabibulsVerificationTest/CapstoneTest.xlsx
******************************************************************************************************************************************************

Starting audio_pipeline...

Checking for old files...

25 detected files were found across 1 subdirectories in /content/drive/MyDrive/Cat Song Meter Recordings/FelidetectV3PerformanceTest/11 22: HabibulsVerificationTest/DetectedSawCalls.
Do you want to delete them and rerun detection? (yes/no): yes
All 25 old detected files deleted from /content/drive/MyDrive/Cat Song Meter Recordings/FelidetectV3PerformanceTest/11 22: HabibulsVerificationTest/DetectedSaw

---

# Plotting and Verifying Phase


In [None]:
# Global variable to track the current row index
current_row_index = 0

# Region 1625, for the 60 threshold
# Iterate over all noise-reduced files and apply audio_pipeline to each
for file in reduced_audio_list:
  style_spacing()
  await audio_pipeline(file, dataset, saw_calls_dir, excel_file_path)