<a href="https://colab.research.google.com/github/ozzmanmuhammad/Sound-Data-Stuff/blob/main/EDA_SoundDatasets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# EDA of Sound Data

We'll be using the Mutagen library. Other libraries can also be used, such as:


*   Audioread
*   SciPy
*   Librosa

These libraries can be used to analyse the following types of audio files:

* wav (Waveform Audio File) format
* mp3 (MPEG-1 Audio Layer 3) format
* WMA (Windows Media Audio) format
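A dataset folder may mix these formats, so it can help to group files by extension before choosing a reader. The sketch below uses only the standard library; the helper `group_by_format` and the extension set are illustrative assumptions, not part of Mutagen.

```python
from collections import defaultdict
from pathlib import Path

# Extensions for the three formats listed above
AUDIO_EXTENSIONS = {".wav", ".mp3", ".wma"}


def group_by_format(folder):
    """Hypothetical helper: map each audio extension to the sorted file names that use it."""
    groups = defaultdict(list)
    for path in Path(folder).iterdir():
        ext = path.suffix.lower()  # case-insensitive match, e.g. ".MP3" -> ".mp3"
        if path.is_file() and ext in AUDIO_EXTENSIONS:  # skip directories and other files
            groups[ext].append(path.name)
    return {ext: sorted(names) for ext, names in groups.items()}
```

Calling it on the extracted folder, e.g. `group_by_format("/content/Forest Recordings")`, would show at a glance which reader (`MP3`, `WAVE`, ...) each group needs.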



In [9]:
import os

In [2]:
# installation
!pip install mutagen

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting mutagen
  Downloading mutagen-1.45.1-py3-none-any.whl (218 kB)
     |████████████████████████████████| 218 kB 20.9 MB/s
Installing collected packages: mutagen
Successfully installed mutagen-1.45.1


## Downloading random sound data from Kaggle

Note: this step will not work for others as-is. You have to upload your own dataset, or download it from Kaggle with your own API token (`kaggle.json`).

In [None]:
!pip install kaggle

In [None]:
# !rm -r ~/.kaggle
# Put the Kaggle API token where the Kaggle CLI looks for it, readable only by you
!mkdir -p ~/.kaggle
!mv ./kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
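The same token setup can be written in plain Python, which is handy outside Colab's `!` shell escapes. A minimal sketch: `install_kaggle_token` is a hypothetical name, and the default `~/.kaggle` location is simply the one the shell cell above uses.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(token_path="kaggle.json", kaggle_dir=None):
    """Hypothetical helper mirroring the shell cell: mkdir -p, mv, chmod 600."""
    kaggle_dir = Path(kaggle_dir) if kaggle_dir is not None else Path.home() / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`: no error if it exists
    target = kaggle_dir / "kaggle.json"
    shutil.move(str(token_path), str(target))  # like `mv`, replaces an older token
    os.chmod(target, 0o600)  # owner read/write only, like `chmod 600`
    return target
```

Using `exist_ok=True` makes the cell safe to re-run, which is the reason the commented-out `rm -r` line exists in the original cell.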

In [7]:
!kaggle datasets download -d kenjee/z-by-hp-unlocked-challenge-3-signal-processing

Downloading z-by-hp-unlocked-challenge-3-signal-processing.zip to /content
100% 602M/604M [00:05<00:00, 181MB/s]
100% 604M/604M [00:05<00:00, 110MB/s]


In [10]:
os.rename("z-by-hp-unlocked-challenge-3-signal-processing.zip","data.zip")

In [11]:
import zipfile

# Extract the archive into the current directory (/content on Colab)
with zipfile.ZipFile('/content/data.zip', "r") as zip_ref:
    zip_ref.extractall()
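Before (or instead of) extracting a 604 MB archive, its contents can be inspected directly with `zipfile`. A minimal sketch, assuming a quick count of files per extension is what's wanted; `summarize_zip` is a hypothetical helper name.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_zip(zip_path):
    """Hypothetical helper: count archive members per file extension without extracting them."""
    with zipfile.ZipFile(zip_path) as zf:
        counts = Counter(
            # zip member names always use "/", so a POSIX path parses them correctly
            PurePosixPath(info.filename).suffix.lower() or "<none>"
            for info in zf.infolist()
            if not info.is_dir()  # skip directory entries
        )
    return dict(counts)
```

For this dataset, `summarize_zip('/content/data.zip')` should report how many `.mp3` files the archive holds alongside any other file types.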

## Sound Data Analysis

In [15]:
import mutagen
from mutagen.wave import WAVE
from mutagen.mp3 import MP3

In [14]:
# function to convert a duration in seconds
# into (hours, minutes, seconds)
def audio_duration(length):
    hours = length // 3600  # whole hours
    length %= 3600
    mins = length // 60  # whole minutes in the remainder
    length %= 60
    seconds = length  # leftover seconds

    return hours, mins, seconds  # returns the duration
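The same arithmetic can be written with `divmod`, which returns quotient and remainder in one step: for a length of L seconds, hours = ⌊L / 3600⌋, minutes = ⌊(L mod 3600) / 60⌋ and seconds = L mod 60. The sketch below also zero-pads minutes and seconds, so 180 seconds prints as `0:03:00` rather than `0:3:0`; `format_duration` is a hypothetical name, not part of the original notebook.

```python
def format_duration(total_seconds):
    """Hypothetical variant of audio_duration: returns a zero-padded H:MM:SS string."""
    total_seconds = int(total_seconds)  # Mutagen's info.length is a float; drop the fraction
    hours, rem = divmod(total_seconds, 3600)  # 3600 s per hour
    mins, seconds = divmod(rem, 60)  # 60 s per minute
    return f"{hours}:{mins:02d}:{seconds:02d}"


print(format_duration(180))  # 0:03:00
print(format_duration(3661.7))  # 1:01:01
```

Zero-padded strings also sort correctly as text, which helps if the durations end up in a CSV.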

In [16]:
# testing a single file
audio = MP3("/content/Forest Recordings/recording_00.mp3")

# `audio.info` holds the stream information of the MP3 file (length in seconds, bitrate, sample rate, ...)
audio_info = audio.info
length = int(audio_info.length)  # drop the fractional seconds
hours, mins, seconds = audio_duration(length)
print('Total Duration: {}:{}:{}'.format(hours, mins, seconds))

Total Duration: 0:3:0
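The notebook imports `WAVE` from `mutagen.wave` but only reads MP3 files. For uncompressed PCM WAV files the duration is simply the number of frames divided by the sample rate, which Python's standard `wave` module can compute directly. A minimal sketch, assuming PCM WAV input (the `wave` module rejects compressed WAV variants); `write_tone` is a hypothetical helper that only exists to produce a file to test on.

```python
import math
import struct
import wave


def wav_duration(path):
    """Duration in seconds of a PCM WAV file: number of frames / sample rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def write_tone(path, seconds=2.0, rate=16000, freq=440.0):
    """Hypothetical helper: write a mono 16-bit sine tone as a test file."""
    n_frames = int(seconds * rate)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate)) for i in range(n_frames)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"".join(struct.pack("<h", s) for s in samples))
```

After `write_tone("tone.wav")`, `wav_duration("tone.wav")` gives 2.0 (32000 frames at 16000 Hz), and the result can be passed through `audio_duration(int(...))` exactly like the MP3 length above.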


In [20]:
import glob
import pandas as pd

In [24]:
# Reading the entire folder: record the duration of every MP3 file
folder = '/content/Forest Recordings'
rows = []
for file in glob.iglob(os.path.join(folder, "*.mp3")):

    audio = MP3(file)

    # `audio.info` holds the stream information of the MP3 file (length, bitrate, ...)
    audio_info = audio.info
    length = int(audio_info.length)
    hours, mins, seconds = audio_duration(length)
    totalTime = f"{hours}:{mins}:{seconds}"
    filename = os.path.basename(file)
    print(f"File:{filename}, totalTime:{totalTime}")
    rows.append({'Filenames': filename, 'Total Time': totalTime})

# Write all rows once, with a header (re-running overwrites output.csv instead of appending duplicates)
df = pd.DataFrame(rows)
df.to_csv("output.csv", index=False)

File:recording_75.mp3, totalTime:0:3:0
File:recording_49.mp3, totalTime:0:3:0
File:recording_97.mp3, totalTime:0:3:0
File:recording_32.mp3, totalTime:0:3:0
File:recording_99.mp3, totalTime:0:3:0
File:recording_11.mp3, totalTime:0:3:0
File:recording_68.mp3, totalTime:0:3:0
File:recording_39.mp3, totalTime:0:3:0
File:recording_54.mp3, totalTime:0:3:0
File:recording_70.mp3, totalTime:0:3:0
File:recording_57.mp3, totalTime:0:3:0
File:recording_28.mp3, totalTime:0:3:0
File:recording_44.mp3, totalTime:0:3:0
File:recording_85.mp3, totalTime:0:3:0
File:recording_64.mp3, totalTime:0:3:0
File:recording_51.mp3, totalTime:0:3:0
File:recording_14.mp3, totalTime:0:3:0
File:recording_24.mp3, totalTime:0:3:0
File:recording_82.mp3, totalTime:0:3:0
File:recording_27.mp3, totalTime:0:3:0
File:recording_59.mp3, totalTime:0:3:0
File:recording_84.mp3, totalTime:0:3:0
File:recording_77.mp3, totalTime:0:3:0
File:recording_15.mp3, totalTime:0:3:0
File:recording_37.mp3, totalTime:0:3:0
File:recording_06.mp3, to