<a href="https://colab.research.google.com/github/kaindad/masters-thesis/blob/main/audio_dataset_curation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Audio Data Preprocessing
### Upload the clean Mendeley data samples and curate them


The audio samples should be mono `.wav` files, 1 second long. Bitrate and bit depth should not matter. Samples shorter than 1 second will be padded with zeros, and samples longer than 1 second will be truncated to 1 second. The exact name of each `.wav` file does not matter, because the files will be read, mixed with background noise, and saved to a separate file with an auto-generated name. The directory name does matter: it is used to determine the class name during neural network training.
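One way to apply the 1-second rule is sketched below with Python's standard-library `wave` module. This is a minimal sketch, not the code used to build this dataset: the helper name `fit_to_one_second` is hypothetical, it assumes uncompressed PCM input, and it pads with zero bytes, which is silence for 16-bit (signed) PCM but not for 8-bit (unsigned) PCM.

```python
import io
import wave


def fit_to_one_second(wav_bytes: bytes) -> bytes:
    """Return a copy of a WAV file padded with zeros or truncated to exactly 1 second.

    Hypothetical helper: assumes uncompressed PCM and pads with zero-valued bytes.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(params.nframes)

    # One second of audio is `framerate` frames; each frame holds one sample per channel
    bytes_per_frame = params.sampwidth * params.nchannels
    target_bytes = params.framerate * bytes_per_frame

    # Truncate long clips, then pad short clips with zero bytes up to the target length
    frames = frames[:target_bytes].ljust(target_bytes, b"\x00")

    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(params.nchannels)
        writer.setsampwidth(params.sampwidth)
        writer.setframerate(params.framerate)
        writer.writeframes(frames)
    return out.getvalue()
```

For a 16 kHz mono clip, the result holds exactly 16,000 frames whether the input was 0.5 s or 1.5 s long.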

Right-click in the Colab file browser and upload all of the samples to the top-level directory. The directory structure should look like this:

```
/
|- chicken-data-healthy-combined-clean.wav
|- chicken-data-noise-combined-clean.wav
|- chicken-data-unhealthy-combined-clean.wav
```
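Before running the splitter, a quick check can confirm that the uploads landed where the code expects them: in the notebook's working directory, which is normally `/content` in Colab. The helper `missing_uploads` below is a hypothetical convenience, not part of the original notebook.

```python
from pathlib import Path

# File names taken from the directory listing above
EXPECTED_FILES = [
    "chicken-data-healthy-combined-clean.wav",
    "chicken-data-noise-combined-clean.wav",
    "chicken-data-unhealthy-combined-clean.wav",
]


def missing_uploads(directory: str = ".") -> list[str]:
    """Return the expected recordings that are not present in `directory`."""
    root = Path(directory)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_uploads()` should return an empty list once all three recordings are uploaded.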

# Split the data

Use the `WavFileSplitter` class as follows to split each recording into 1-second clips that can be uploaded to Edge Impulse Studio:

```python
# Example usage:
unhealthy_filename = "chicken-data-unhealthy-combined-clean.wav"
audio_splitter = WavFileSplitter(unhealthy_filename)
audio_splitter.split_audio_into_intervals(1, "unhealthy")
```
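The number of clips follows from simple arithmetic: the splitter starts a new clip every `seconds_per_slice` seconds until the recording ends, so a recording that is not a whole number of seconds long produces a shorter final clip. The sketch below (hypothetical helper `clip_bounds_ms`) computes the clip boundaries in milliseconds, the unit pydub uses both for slicing and for `len(AudioSegment)`.

```python
def clip_bounds_ms(duration_ms: int, seconds_per_slice: int = 1) -> list[tuple[int, int]]:
    """Return (start_ms, end_ms) for each clip cut from a recording of `duration_ms`.

    The final clip keeps whatever audio is left over, so it can be shorter than
    `seconds_per_slice`; it is not padded here.
    """
    slice_ms = seconds_per_slice * 1000
    return [
        (start, min(start + slice_ms, duration_ms))
        for start in range(0, duration_ms, slice_ms)
    ]


bounds = clip_bounds_ms(2400)
print(len(bounds), bounds[-1])  # 3 (2000, 2400)
```

A 2.4 s recording therefore yields two full clips and a 0.4 s remainder.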



```python
%pip install pydub
```



```python
import math
import os

from pydub import AudioSegment


class WavFileSplitter:
    """Split a WAV file from the current working directory into fixed-length clips."""

    def __init__(self, source_filename):
        self.current_directory = os.getcwd()  # Uploaded samples live in the working directory
        self.source_filename = source_filename
        self.source_filepath = os.path.join(self.current_directory, source_filename)

        self.audio_segment = AudioSegment.from_wav(self.source_filepath)

    def _calculate_audio_duration(self):
        return self.audio_segment.duration_seconds

    def _export_audio_slice(self, start_second, end_second, output_filename, destination_directory):
        start_time = start_second * 1000  # Convert to milliseconds
        end_time = end_second * 1000  # Convert to milliseconds
        audio_slice = self.audio_segment[start_time:end_time]  # pydub clamps end_time to the audio length
        audio_slice.export(os.path.join(destination_directory, output_filename), format="wav")

    def split_audio_into_intervals(self, seconds_per_slice, output_prefix):
        # The output directory is named after the prefix, which later serves as the class label
        destination_directory = os.path.join(self.current_directory, output_prefix)
        os.makedirs(destination_directory, exist_ok=True)  # Create the directory if it doesn't exist

        total_seconds = math.ceil(self._calculate_audio_duration())
        slice_starts = range(0, total_seconds, seconds_per_slice)
        for index, start_second in enumerate(slice_starts, start=1):
            slice_filename = f"{output_prefix}_{index}.wav"  # prefix_1.wav, prefix_2.wav, ...
            self._export_audio_slice(start_second, start_second + seconds_per_slice,
                                     slice_filename, destination_directory)
            print(f"Exported: {slice_filename}")
        print(f"All {len(slice_starts)} slices exported to {destination_directory}")


# Split each recording into 1-second clips; each class gets its own output directory
recordings = {
    "unhealthy": "chicken-data-unhealthy-combined-clean.wav",
    "healthy": "chicken-data-healthy-combined-clean.wav",
    "noise": "chicken-data-noise-combined-clean.wav",
}
for label, filename in recordings.items():
    audio_splitter = WavFileSplitter(filename)
    audio_splitter.split_audio_into_intervals(1, label)  # Writes <label>/<label>_1.wav, <label>_2.wav, ...
```
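Because the last clip of each recording usually holds only the leftover fraction of a second, it can help to inspect the exported folders before uploading them. The minimal sketch below uses the standard-library `wave` module; the helper name `summarize_clips` is hypothetical, and it assumes the clips are uncompressed PCM WAV files, which is what pydub writes for `format="wav"`.

```python
import os
import wave


def summarize_clips(directory, target_seconds=1.0):
    """Count the WAV clips in `directory` and list those shorter than `target_seconds`."""
    short_clips = []
    total = 0
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".wav"):
            continue  # Ignore anything that is not a WAV clip
        total += 1
        with wave.open(os.path.join(directory, name), "rb") as clip:
            duration = clip.getnframes() / clip.getframerate()
        if duration < target_seconds:
            short_clips.append((name, round(duration, 3)))
    return total, short_clips
```

For example, `summarize_clips("unhealthy")` returns the number of clips in that folder together with the name and duration of any clip under 1 second.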

```python
import os
import shutil


def compress_directory(folder_name, output_file_name):
    """
    Compresses the specified directory into a zip file.

    Parameters:
    - folder_name (str): Name of the directory to be compressed, relative to the current directory.
    - output_file_name (str): Name of the output compressed file (".zip" is appended if missing).

    Returns:
    - str: Path to the compressed file.
    """
    source_directory = os.path.join(os.getcwd(), folder_name)
    if not os.path.isdir(source_directory):
        raise FileNotFoundError(f"Directory not found: {source_directory}")

    # shutil.make_archive adds the ".zip" extension itself, so pass the name without it
    if output_file_name.endswith(".zip"):
        output_file_name = output_file_name[:-4]
    base_name = os.path.join(os.getcwd(), output_file_name)

    # Compress the directory; make_archive returns the path of the archive it wrote
    return shutil.make_archive(base_name, "zip", source_directory)


# Example usage: compress each class folder so it can be downloaded
for label in ("unhealthy", "healthy", "noise"):
    compressed_file_path = compress_directory(label, f"{label}_compressed.zip")
    print(f"Directory compressed to: {compressed_file_path}")
```
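Before downloading an archive, its contents can be listed with the standard-library `zipfile` module. Because `compress_directory` passes the class folder as `root_dir`, the clips should sit at the top level of the archive (for example `unhealthy_1.wav` rather than `unhealthy/unhealthy_1.wav`). The helper name `list_archive_clips` is hypothetical.

```python
import zipfile


def list_archive_clips(archive_path):
    """Return the sorted names of the .wav entries stored in a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(name for name in archive.namelist() if name.lower().endswith(".wav"))
```

For example, `list_archive_clips("unhealthy_compressed.zip")` should list every exported `unhealthy_*.wav` clip.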

