<a href="https://colab.research.google.com/github/meetAmarAtGithub/15_Reva_Speech_Analytics/blob/main/Session3_Gender_detection_using_speech_signals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Identifying the Gender of a Voice using Machine Learning

Source: https://www.primaryobjects.com/2016/06/22/identifying-the-gender-of-a-voice-using-machine-learning/

Determining whether a speaker is male or female from a sample of their voice initially seems like an easy task. The human ear can often tell a male voice from a female one within the first few spoken words. However, designing a computer program to do the same turns out to be trickier.



This article describes the design of experiments that use machine learning to model the acoustic analysis of voices and speech for determining gender. The model is built from 3,168 recorded samples of male and female voices, speech, and utterances. The samples are processed with acoustic analysis, and the resulting features are passed to a machine learning algorithm that learns gender-specific traits.

In [1]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [2]:
%cd "/content/gdrive/My Drive/Colab Notebooks/Reva/15_Speech_Analytics"

/content/gdrive/My Drive/Colab Notebooks/Reva/15_Speech_Analytics


In [3]:
import librosa
import numpy as np
from scipy.stats import entropy

In [4]:
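def get_audio_duration(file_path):
    """Return the duration of an audio file in seconds."""
    # librosa.load resamples to 22050 Hz by default; the duration is unaffected
    y, sr = librosa.load(file_path)
    duration = librosa.get_duration(y=y, sr=sr)
    return duration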

In [5]:
# Example usage
# Specify the path to the audio file
audio_path = "/content/gdrive/My Drive/Colab Notebooks/Reva/15_Speech_Analytics/session1_violin-origional.wav"
duration = get_audio_duration(audio_path)
print("Duration:", duration, "seconds")

Duration: 5.0 seconds


In [6]:
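def get_mean_frequency(file_path):
    """Return the magnitude-weighted mean frequency of the signal, in Hz."""
    audio_data, sample_rate = librosa.load(file_path)
    # Magnitude spectrogram, shape (1 + n_fft/2, n_frames), with the default n_fft=2048
    magnitudes = np.abs(librosa.stft(audio_data))
    # Centre frequency of each STFT bin; the default n_fft matches librosa.stft
    frequencies = librosa.fft_frequencies(sr=sample_rate)
    # Average the spectrum over time, then weight each bin frequency by it
    mean_spectrum = magnitudes.mean(axis=1)
    mean_frequency = np.sum(frequencies * mean_spectrum) / np.sum(mean_spectrum)
    return mean_frequency

In [7]:
# Example usage
mean_frequency = get_mean_frequency(audio_path)
print("Mean frequency:", mean_frequency, "Hz")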


In [8]:
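def get_audio_std(file_path):
    """Return the standard deviation of the waveform's sample amplitudes."""
    audio_data, _ = librosa.load(file_path)
    std = np.std(audio_data)
    return std

In [9]:
# Example usage
std = get_audio_std(audio_path)
# The value is a dimensionless amplitude, not a frequency
print("Standard deviation:", std)

Standard deviation: 0.085269086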


In [10]:
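def get_audio_median_frequency(file_path):
    """Return the frequency below which half of the spectral magnitude lies, in Hz."""
    audio_data, sample_rate = librosa.load(file_path)
    magnitudes = np.abs(librosa.stft(audio_data))
    frequencies = librosa.fft_frequencies(sr=sample_rate)
    # Time-averaged spectrum and its cumulative share of the total magnitude
    mean_spectrum = magnitudes.mean(axis=1)
    cumulative = np.cumsum(mean_spectrum) / np.sum(mean_spectrum)
    # First bin at which the cumulative share reaches 50%
    median_frequency = frequencies[np.searchsorted(cumulative, 0.5)]
    return median_frequency

In [11]:
# Example usage
median_frequency = get_audio_median_frequency(audio_path)
print("Median Frequency:", median_frequency, "Hz")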


In [12]:
def extract_first_quartile(file_path):
    """Return the 25th percentile of the waveform's sample amplitudes."""
    audio_data, _ = librosa.load(file_path)
    first_quartile = np.percentile(audio_data, 25)
    return first_quartile

In [13]:
first_quartile = extract_first_quartile(audio_path)
print("First Quartile:", first_quartile)

First Quartile: -0.042364017106592655


In [14]:
def get_third_quartile(file_path):
    """Return the 75th percentile of the waveform's sample amplitudes."""
    audio_data, _ = librosa.load(file_path)
    third_quartile = np.percentile(audio_data, 75)
    return third_quartile

In [15]:
third_quartile = get_third_quartile(audio_path)
print("Third Quartile:", third_quartile)

Third Quartile: 0.03800918720662594


In [16]:
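def get_interquartile_range(file_path):
    """Return the interquartile range (Q3 - Q1) of the waveform's sample amplitudes."""
    audio_data, _ = librosa.load(file_path)
    # Compute both percentiles in a single call
    q1, q3 = np.percentile(audio_data, [25, 75])
    iqr = q3 - q1
    return iqr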

In [17]:
interquartile_range = get_interquartile_range(audio_path)
print("Interquartile Range:", interquartile_range)

Interquartile Range: 0.0803732043132186
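
The quartile cells above operate on waveform amplitudes. If frequency-domain quartiles are wanted instead (the frequencies below which 25%, 50%, and 75% of the spectral magnitude lies), one possible sketch using NumPy alone is shown below. The function name `spectral_quantiles`, the single whole-signal FFT, and the synthetic three-tone test signal are all assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np

def spectral_quantiles(signal, sample_rate, quantiles=(0.25, 0.5, 0.75)):
    """Frequencies (Hz) below which the given shares of spectral magnitude lie."""
    # Magnitude spectrum of the whole signal (one FFT, no framing)
    magnitude = np.abs(np.fft.rfft(signal))
    frequencies = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Cumulative share of the total magnitude, rising towards 1
    cumulative = np.cumsum(magnitude) / np.sum(magnitude)
    # First bin at which each requested share is reached
    indices = np.searchsorted(cumulative, quantiles)
    indices = np.minimum(indices, len(frequencies) - 1)
    return frequencies[indices]

# Synthetic signal (illustrative only): equal tones at 100, 300 and 500 Hz
sr = 8000
t = np.arange(sr) / sr
signal = sum(np.sin(2 * np.pi * f * t) for f in (100, 300, 500))

q25, median, q75 = spectral_quantiles(signal, sr)
iqr = q75 - q25
print("Q25:", q25, "Median:", median, "Q75:", q75, "IQR:", iqr)
```

Because each tone carries one third of the total magnitude, the three quantiles fall on the three tone frequencies, which makes the sketch easy to sanity-check.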


In [18]:
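def compute_skewness(audio_file):
    """Return the skewness of the per-frame spectral centroid track."""
    audio_data, sample_rate = librosa.load(audio_file)
    spectral_centroids = librosa.feature.spectral_centroid(y=audio_data, sr=sample_rate)
    # Third standardized moment: E[(x - mean)^3] / std^3
    deviations = spectral_centroids - np.mean(spectral_centroids)
    skewness = np.mean(deviations**3) / np.std(spectral_centroids)**3
    return skewness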

In [19]:
skewness = compute_skewness(audio_path)
print("Skewness:", skewness)

Skewness: 0.18007278264779683


In [20]:
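def compute_kurtosis(audio_file):
    """Return the excess kurtosis of the per-frame spectral rolloff track."""
    audio_data, sample_rate = librosa.load(audio_file)
    spectral_rolloff = librosa.feature.spectral_rolloff(y=audio_data, sr=sample_rate)
    # Fourth standardized moment minus 3, so a normal distribution scores 0
    deviations = spectral_rolloff - np.mean(spectral_rolloff)
    kurtosis = np.mean(deviations**4) / np.std(spectral_rolloff)**4 - 3
    return kurtosis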

In [21]:
kurtosis = compute_kurtosis(audio_path)
print("Kurtosis:", kurtosis)

Kurtosis: 2.0147509033131925
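
The two cells above measure the skewness of the spectral centroid track and the kurtosis of the spectral rolloff track. An alternative is to treat the magnitude spectrum itself as a distribution over frequency and take its moments. The sketch below does this with NumPy only; the function name `spectral_shape`, the whole-signal FFT, and the synthetic test signal are assumptions for illustration, not the notebook's method.

```python
import numpy as np

def spectral_shape(signal, sample_rate):
    """Mean, standard deviation, skewness and excess kurtosis of the spectrum."""
    magnitude = np.abs(np.fft.rfft(signal))
    frequencies = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Normalize the magnitudes so they act as probability weights
    weights = magnitude / np.sum(magnitude)
    mean = np.sum(weights * frequencies)
    deviations = frequencies - mean
    std = np.sqrt(np.sum(weights * deviations**2))
    skewness = np.sum(weights * deviations**3) / std**3
    kurtosis = np.sum(weights * deviations**4) / std**4 - 3
    return mean, std, skewness, kurtosis

# Synthetic signal (illustrative only): equal tones placed symmetrically around 300 Hz
sr = 8000
t = np.arange(sr) / sr
signal = sum(np.sin(2 * np.pi * f * t) for f in (100, 300, 500))

mean, std, skewness, kurtosis = spectral_shape(signal, sr)
print("Mean:", mean, "Std:", std, "Skewness:", skewness, "Kurtosis:", kurtosis)
```

For this symmetric spectrum the weighted mean sits at the middle tone and the skewness is close to zero, which is a quick way to check the moment formulas.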


In [22]:
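def compute_spectral_entropy(audio_file):
    """Return the Shannon entropy (nats) of each frame's spectrum, averaged over frames."""
    audio_data, _ = librosa.load(audio_file)
    spectrogram = np.abs(librosa.stft(audio_data))
    # scipy.stats.entropy normalizes each column (frame) to sum to 1 before computing entropy
    spectral_entropy = entropy(spectrogram, axis=0).mean()
    return spectral_entropy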

In [23]:
spectral_entropy = compute_spectral_entropy(audio_path)
print("Spectral Entropy:", spectral_entropy)

Spectral Entropy: 4.7107263
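
To make the entropy value easier to interpret, the calculation can be written out by hand. `scipy.stats.entropy` normalizes its input to sum to 1 and uses the natural logarithm by default, so a frame with 1025 frequency bins can score at most ln(1025), about 6.93. The sketch below reproduces that calculation with NumPy on two hypothetical frames (a flat spectrum and a single-bin spectrum); the function name and the test frames are illustrative assumptions.

```python
import numpy as np

def spectral_entropy(magnitude):
    """Shannon entropy (nats) of a magnitude spectrum treated as a distribution."""
    p = magnitude / np.sum(magnitude)
    # Drop empty bins: 0 * log(0) is taken as 0
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Hypothetical frames with 1025 bins (the bin count of an STFT with n_fft=2048)
flat_frame = np.ones(1025)      # energy spread evenly: maximum entropy
peaked_frame = np.zeros(1025)   # all energy in one bin: zero entropy
peaked_frame[10] = 1.0

flat_entropy = spectral_entropy(flat_frame)
peaked_entropy = spectral_entropy(peaked_frame)
print("Flat spectrum entropy:", flat_entropy)
print("Single-bin spectrum entropy:", peaked_entropy)
```

Noise-like frames approach the upper bound, while strongly tonal frames move towards zero.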


In [24]:
def compute_spectral_features(audio_file):
    """Return summary spectral features computed from the magnitude spectrogram."""
    audio_data, _ = librosa.load(audio_file)
    spectrogram = np.abs(librosa.stft(audio_data))

    # Spectral flatness, averaged over frames
    flatness = librosa.feature.spectral_flatness(S=spectrogram).mean()

    # Spectral centroid per frame (Hz) and its average
    centroid_data = librosa.feature.spectral_centroid(S=spectrogram)
    centroid = centroid_data.mean()

    # Largest STFT magnitude; this is an amplitude, not a frequency
    peak_freq = spectrogram.max()

    # Spectral bandwidth (Hz), averaged over frames
    bandwidth = librosa.feature.spectral_bandwidth(S=spectrogram).mean()

    # Mean of the centroid track (same value as `centroid`)
    mean_freq = np.mean(centroid_data)

    # Smallest per-frame centroid
    min_freq = np.min(centroid_data)

    # Largest per-frame centroid
    max_freq = np.max(centroid_data)

    return flatness, centroid, peak_freq, bandwidth, mean_freq, min_freq, max_freq


In [25]:
flatness, centroid, peak_freq, bandwidth, mean_freq, min_freq, max_freq = compute_spectral_features(audio_path)

print("Spectral Flatness:", flatness)
print("Spectral Centroid (Hz):", centroid)
print("Spectral Peak Magnitude:", peak_freq)
print("Spectral Bandwidth (Hz):", bandwidth)
print("Mean Spectral Centroid (Hz):", mean_freq)
print("Minimum Spectral Centroid (Hz):", min_freq)
print("Maximum Spectral Centroid (Hz):", max_freq)

Spectral Flatness: 0.0023664844
Spectral Centroid (Hz): 2639.106878980175
Spectral Peak Magnitude: 116.83646
Spectral Bandwidth (Hz): 2084.0336184206276
Mean Spectral Centroid (Hz): 2639.106878980175
Minimum Spectral Centroid (Hz): 1617.0613328601214
Maximum Spectral Centroid (Hz): 3885.0565294740177
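
In `compute_spectral_features`, `spectrogram.max()` returns the largest STFT magnitude rather than a frequency. If the frequency at which the spectrum peaks is wanted, the index of the largest bin has to be mapped back to Hz. The sketch below shows one way to do that with NumPy; the function name `peak_frequency`, the whole-signal FFT, and the 440 Hz test tone are assumptions for illustration.

```python
import numpy as np

def peak_frequency(signal, sample_rate):
    """Frequency (Hz) of the bin with the largest spectral magnitude."""
    magnitude = np.abs(np.fft.rfft(signal))
    frequencies = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # argmax gives a bin index; the frequency array converts it to Hz
    return frequencies[np.argmax(magnitude)]

# Synthetic one-second 440 Hz tone (illustrative only) at librosa's default rate
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

peak = peak_frequency(tone, sr)
print("Peak frequency:", peak, "Hz")
```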


In [26]:
def compute_dominant_frequencies(audio_file):
    """Return the mean, minimum and maximum dominant frequency (Hz) over all frames."""
    audio_data, sample_rate = librosa.load(audio_file)

    # Magnitude spectrogram, shape (1 + n_fft/2, n_frames)
    spectrogram = np.abs(librosa.stft(audio_data))

    # Recover n_fft from the number of bins so each bin index maps to its true frequency
    n_fft = 2 * (spectrogram.shape[0] - 1)
    frequencies = librosa.fft_frequencies(sr=sample_rate, n_fft=n_fft)

    # Dominant frequency of each frame: the bin with the largest magnitude
    dominant_frequencies = frequencies[np.argmax(spectrogram, axis=0)]

    # Compute statistics of the dominant frequencies
    mean_frequency = np.mean(dominant_frequencies)
    min_frequency = np.min(dominant_frequencies)
    max_frequency = np.max(dominant_frequencies)

    return mean_frequency, min_frequency, max_frequency


In [27]:
# Example usage

mean_freq, min_freq, max_freq = compute_dominant_frequencies(audio_path)

# Print the computed values
print("Mean Dominant Frequency:", mean_freq)
print("Minimum Dominant Frequency:", min_freq)
print("Maximum Dominant Frequency:", max_freq)


In [28]:
def compute_modulation_index(audio_file):
    """Return the mean frame-to-frame change in STFT magnitude relative to the mean magnitude."""
    audio_data, sample_rate = librosa.load(audio_file)

    # Magnitude spectrogram, shape (1 + n_fft/2, n_frames)
    magnitude_spectrum = np.abs(librosa.stft(audio_data))

    # Signed change in magnitude between consecutive frames
    modulation_spectrum = np.diff(magnitude_spectrum, axis=1)

    # Average change divided by average magnitude (rises and falls partly cancel)
    modulation_index = np.mean(modulation_spectrum) / np.mean(magnitude_spectrum[:, :-1])

    return modulation_index

In [29]:
modulation_index = compute_modulation_index(audio_path)

# Print the computed modulation index
print("Modulation Index:", modulation_index)

Modulation Index: 0.0045333235
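
Because `compute_modulation_index` averages signed differences, increases and decreases between frames largely cancel, which is why the value is so close to zero. A different definition, which appears in descriptions of voice-feature datasets, is the accumulated absolute difference between adjacent frequency measurements divided by the frequency range. Whether that is the definition this notebook is aiming for is an assumption. A minimal sketch on a hypothetical frequency track:

```python
import numpy as np

def modulation_index(frequency_track):
    """Accumulated absolute change between adjacent values divided by their range."""
    track = np.asarray(frequency_track, dtype=float)
    frequency_range = track.max() - track.min()
    # A constant track has no modulation; this also avoids dividing by zero
    if frequency_range == 0:
        return 0.0
    return np.sum(np.abs(np.diff(track))) / frequency_range

# Hypothetical per-frame frequency measurements in Hz (illustrative only)
rising_track = [100, 150, 200]      # moves through its range once
wobbling_track = [100, 200, 100]    # moves through its range twice

rising_index = modulation_index(rising_track)
wobbling_index = modulation_index(wobbling_track)
print("Rising track:", rising_index)
print("Wobbling track:", wobbling_index)
```

A track that sweeps once through its range scores 1, and each additional sweep adds to the index, so the measure grows with how much the frequency wobbles.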


In [None]:
# Next step: modeling gender from the extracted features