Error while clustering the streaming audio data at higher hyperparameter values i.e. cluster_similarity_threshold, subcluster_similarity_threshold and pair_similarity_maximum. #2

gaushh · 2021-04-30T15:08:32Z

I am using Resemblyzer to encode streaming audio from the microphone and Links clustering to cluster the resulting audio embeddings. At low hyperparameter values the results are underwhelming: no new cluster is created even when the speaker changes. When the hyperparameters are set to high values, say (0.7, 0.7, 0.7), I get the following error:

```
Traceback (most recent call last):
  File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\nono2.py", line 156, in <module>
    main()
  File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\nono2.py", line 130, in main
    predicted_cluster = links_cluster.predict(vector)
  File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\links_clustering\links_cluster.py", line 96, in predict
    self.update_cluster(best_subcluster_cluster_id, best_subcluster_id)
  File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\links_clustering\links_cluster.py", line 180, in update_cluster
    raise ValueError(f"Connected subcluster of {sc_idx} "
ValueError: Connected subcluster of 0 was not found in cluster list of 0.
```

I have the following questions:

1. How can I resolve the error above?
2. How can I tune the hyperparameters efficiently? (I tried going through the paper but didn't understand much.)
3. Is there a better way to perform this whole operation?

Here is my code:

```python
import queue

import numpy as np
import pyaudio
from resemblyzer import preprocess_wav, VoiceEncoder
from links_clustering.links_cluster import LinksCluster

# Audio recording parameters. Resemblyzer works on 16 kHz mono audio,
# so capture at that rate directly instead of 44.1 kHz stereo.
RATE = 16000
CHANNELS = 1
CHUNK = int(RATE / 10)  # 100 ms per callback

encoder = VoiceEncoder("cpu")
links_cluster = LinksCluster(0.7, 0.7, 0.7)  # LinksCluster(0.8, 0.7, 0.85)


class MicrophoneStream(object):
    """Opens a recording stream as a generator yielding the audio chunks."""

    def __init__(self, rate, chunk):
        self._rate = rate
        self._chunk = chunk
        # Create a thread-safe buffer of audio data
        self._buff = queue.Queue()
        self.closed = True

    def __enter__(self):
        self._audio_interface = pyaudio.PyAudio()
        self._audio_stream = self._audio_interface.open(
            format=pyaudio.paInt16,
            # Record mono: with 2 channels, np.frombuffer() below would mix
            # the interleaved left/right samples into one signal.
            channels=CHANNELS,
            rate=self._rate,
            input=True,
            frames_per_buffer=self._chunk,
            # Run the audio stream asynchronously to fill the buffer object.
            # This is necessary so that the input device's buffer doesn't
            # overflow while the calling thread is busy embedding/clustering.
            stream_callback=self._fill_buffer,
        )
        self.closed = False
        return self

    def __exit__(self, type, value, traceback):
        self._audio_stream.stop_stream()
        self._audio_stream.close()
        self.closed = True
        # Signal the generator to terminate so that the consuming loop
        # will not block the process termination.
        self._buff.put(None)
        self._audio_interface.terminate()

    def _fill_buffer(self, in_data, frame_count, time_info, status_flags):
        """Continuously collect data from the audio stream, into the buffer."""
        self._buff.put(in_data)
        return None, pyaudio.paContinue

    def generator(self):
        while not self.closed:
            # Use a blocking get() to ensure there's at least one chunk of
            # data, and stop iteration if the chunk is None, indicating the
            # end of the audio stream.
            chunk = self._buff.get()
            if chunk is None:
                return
            data = [chunk]
            # Now consume whatever other data's still buffered.
            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break
            yield b"".join(data)


def main():
    with MicrophoneStream(RATE, CHUNK) as stream:
        for content in stream.generator():
            # Convert 16-bit PCM bytes to float32 in [-1, 1], the format
            # preprocess_wav() expects, and tell it the source sample rate.
            pcm = np.frombuffer(content, dtype=np.int16)
            wav = preprocess_wav(pcm.astype(np.float32) / 32768.0, source_sr=RATE)
            if len(wav) == 0:
                # preprocess_wav() trims long silences; skip all-silent chunks.
                continue
            _, cont_embeds, wav_splits = encoder.embed_utterance(
                wav, return_partials=True, rate=16)
            for vector in cont_embeds:
                predicted_cluster = links_cluster.predict(vector)
                print(predicted_cluster)


if __name__ == "__main__":
    main()
```

QEDan · 2021-05-02T00:16:47Z

Hi gaushh,

Thank you for opening an issue. I just pushed an update to the master branch that I hope fixes the bug that you ran into. Please try it again and let me know how it goes. If you continue to have problems, it is helpful to have a script that only depends on static data. In this case, it depends on data streaming from a microphone, which I can't reproduce exactly.
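One way to turn the microphone script into something that depends only on static data is to log every embedding passed to `links_cluster.predict()` and replay that log later. The sketch below is minimal and makes two assumptions: the embeddings are 1-D NumPy arrays, which is what Resemblyzer's partial embeddings are, and the helper names `save_embeddings` and `replay` are hypothetical.

```python
import numpy as np


def save_embeddings(vectors, path):
    """Stack the logged embeddings into one (n, dim) array and save it as .npy."""
    np.save(path, np.stack(vectors))


def replay(path, predict):
    """Feed saved embeddings, in their original order, to a predict() callable.

    In the script from this issue `predict` would be links_cluster.predict;
    any function taking one vector works, which keeps this sketch testable.
    """
    vectors = np.load(path)
    return [predict(vector) for vector in vectors]
```

A replay is deterministic: the same vectors reach the clusterer in the same order, so a failing sequence can be attached to the issue.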

For your questions:

1. I think this was caused by one of two bugs that I just fixed. Please pull from the master branch and try again.
2. Tuning hyperparameters is difficult in general. One way is to examine your data, understand what each hyperparameter means, and make careful choices based on theory. But, another method that might work is choosing randomly and measuring the outcomes to find the best configuration. The former is generally better, but don't be ashamed to try the latter if it gets you where you need to be.
3. I think you are doing online clustering of speaker embeddings. This is related to a task called 'diarization', i.e. 'who spoke when'. Are you familiar with the pyannote library? It has some easy-to-use building blocks for diarization models. Some of their tools might be helpful.
   https://github.com/pyannote/pyannote-audio
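The "choose randomly and measure the outcomes" approach from the second answer can be sketched as a plain random search over the three thresholds. This is a minimal sketch that assumes you have some labelled recordings and a hypothetical `evaluate(params)` function. That function would run the clustering with those thresholds and return a score where higher is better, for example agreement between the predicted clusters and the true speakers. The sampling range of 0.3 to 0.95 is also an assumption.

```python
import random


def random_search(evaluate, n_trials=50, low=0.3, high=0.95, seed=0):
    """Randomly sample (cluster_similarity_threshold,
    subcluster_similarity_threshold, pair_similarity_maximum) triples and
    return the best-scoring one together with its score."""
    rng = random.Random(seed)  # fixed seed so a search can be repeated
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = tuple(round(rng.uniform(low, high), 3) for _ in range(3))
        score = evaluate(params)  # hypothetical scoring callback
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

If the tuple order matches the constructor order implied by the issue title, the result could be used as `LinksCluster(*best_params)`. That order is an assumption, so check it against the class signature.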

gaushh · 2021-05-03T17:17:25Z

Thanks for the prompt response @QEDan

1. The same issue still persists (i.e. I'm still getting the same error).
2. I will try to understand the algorithm again by going through the paper.
3. To my knowledge, pyannote doesn't provide online speaker diarization (since they use t-SNE for clustering), which is why I planned on using Links clustering.

Suma3 · 2021-05-04T08:07:07Z

Hi @QEDan
I am facing the same issue while trying to run the Links clustering code you shared. Of course, I tried to understand the code while referring to the paper. I have a few doubts:

On line 175 you raise an error, but according to the paper (as far as I understood it) this should never happen, i.e. every cluster should contain all of its subclusters. Maybe I'm not understanding the actual logic, so it would be great if you could explain:

```python
if connected_sc_idx is None:
    raise ValueError(f"Connected subcluster of {sc_idx} "
                     f"was not found in cluster list of {cl_idx}.")
```

It would be a great help if you could help resolve this issue.
Thanks!!

QEDan · 2021-05-07T04:32:36Z

I've pushed another bug fix. In this case, during update_cluster(), two subclusters could be merged with the deleted subcluster being treated as a severed subcluster in the following logic. This allowed edges to exist that didn't make sense. I hope this was the cause of the problems. If not, I find this problem difficult to replicate reliably, so it would be helpful to have a test case that doesn't depend on streaming data.

@gaushh Online speaker diarization doesn't have too many tools available, unfortunately. pyannote only helps with the offline case. One thing that you might find helpful is this thesis on a new task called Low-Latency Speaker Spotting, identifying a target speaker in the lowest possible time: https://www.researchgate.net/publication/338935292_Efficient_speaker_diarization_and_low-latency_speaker_spotting.

The lack of online clustering algorithms is exactly what motivated me to code up this algorithm, and probably why the authors developed it in the first place.

@Suma3 The exception that you mention should never get raised when the algorithm is working correctly. It implies an edge between one subcluster and another subcluster that isn't in the same cluster as the first. So, any time it is raised, it means there is a bug. In the bug I just fixed, there was a way for that to happen because of improperly deleting a merged subcluster.
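The invariant described here is that every edge must point to a subcluster in the same cluster. It can be checked directly after each `predict()` call, which helps catch the exact step where the state goes bad when replaying a recording. This is a sketch over a hypothetical, simplified structure: `clusters` is a list of clusters, each a list of subcluster ids, and `edges` maps each subcluster id to the set of ids it is connected to. It does not use the real `LinksCluster` attributes.

```python
def find_bad_edges(clusters, edges):
    """Return (cluster_idx, sc_id, neighbour_id) for every edge that leaves
    its cluster or points at a subcluster that no longer exists."""
    # Map every live subcluster id to the index of the cluster holding it.
    owner = {}
    for cl_idx, subclusters in enumerate(clusters):
        for sc_id in subclusters:
            owner[sc_id] = cl_idx
    bad = []
    for cl_idx, subclusters in enumerate(clusters):
        for sc_id in subclusters:
            for neighbour in sorted(edges.get(sc_id, ())):
                # owner.get() is None for a deleted or merged ('ghost') id.
                if owner.get(neighbour) != cl_idx:
                    bad.append((cl_idx, sc_id, neighbour))
    return bad
```

An empty result means the state is consistent. Any entry corresponds to the situation in which `update_cluster()` raises the `ValueError`.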

QEDan · 2021-05-18T17:33:01Z

I will close this for now. Hopefully the fixes have worked for both of you. Please comment if there are still problems.

nguyenthienhy · 2021-10-08T03:51:22Z

Hi @QEDan
I am facing the same issue today with streaming microphone data. Can you help me? I am from Vietnam. Thank you!
Here is my error:

```
    raise ValueError(f"Connected subcluster of {sc_idx} "
ValueError: Connected subcluster of 1 was not found in cluster list of 0.
```

nguyenthienhy · 2021-10-08T04:10:25Z

I found something unreasonable in this block of code (see image on link).

After the call

```python
self.update_cluster(cl_idx, sc_idx1)
```

Visual Studio Code shows the following line blurred (greyed out), which suggests it will never run:

```python
self.clusters[cl_idx] = self.clusters[cl_idx][:sc_idx2] + self.clusters[cl_idx][sc_idx2 + 1:]
```

Can you explain it?

QEDan · 2021-10-09T05:15:00Z

Hi @nguyenthienhy ,
I think the line is blurred at the end of merge_subclusters() because the function does not make use of self.clusters after that assignment. This is usually helpful for IDEs to highlight in case you forgot to make use of a variable after assignment. But, in this case it's okay because the assigned data is used elsewhere in the class.
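The point about the blurred line can be checked with a toy class. The slice expression removes the element at `sc_idx2`, and the result is stored on `self`, so it is still visible after the method returns even if the editor marks it as unused. `ToyLinks` and `drop_subcluster` are hypothetical names used only for illustration.

```python
class ToyLinks:
    def __init__(self):
        self.clusters = [["a", "b", "c"]]

    def drop_subcluster(self, cl_idx, sc_idx2):
        # Same slicing pattern as in merge_subclusters(): keep everything
        # before and after position sc_idx2. The result is assigned to an
        # attribute, so the change outlives this method call.
        self.clusters[cl_idx] = (self.clusters[cl_idx][:sc_idx2]
                                 + self.clusters[cl_idx][sc_idx2 + 1:])
```

After `t = ToyLinks(); t.drop_subcluster(0, 1)`, `t.clusters` holds `[["a", "c"]]`.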

I just pushed an update that I hope solves the problem. It was possible for a 'ghost' subcluster to remain in the connected subclusters list after it was merged into a different subcluster. Please try it again and let me know if you are still having problems.
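The "ghost" case comes down to edge bookkeeping during a merge. When subcluster `b` is absorbed into `a`, every edge that pointed at `b` has to be redirected to `a`, and `b` must disappear from all neighbour sets. Below is a minimal sketch of that bookkeeping over a hypothetical undirected adjacency dict of sets. It is not the library's `merge_subclusters()`, and it omits the centroid update.

```python
def merge_edges(edges, a, b):
    """Merge subcluster b into a in an undirected adjacency dict (in place)."""
    b_neighbours = edges.pop(b, set())
    for n in b_neighbours:
        neighbours = edges.get(n)
        if neighbours is None:
            continue
        neighbours.discard(b)  # no 'ghost' reference to b remains
        if n != a:
            # Redirect the former n-b edge to n-a, in both directions.
            neighbours.add(a)
            edges.setdefault(a, set()).add(n)
    # Remove any leftover reference to b and the a-a self-loop an a-b edge
    # would otherwise become.
    edges.setdefault(a, set()).discard(b)
    edges[a].discard(a)
    return edges
```

For example, merging `1` into `0` in `{0: {1, 2}, 1: {0, 3}, 2: {0}, 3: {1}}` leaves `{0: {2, 3}, 2: {0}, 3: {0}}`, and no set still mentions `1`.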

It is very difficult for me to reproduce problems based on streaming microphone data since I can't reproduce the input data. If you continue to have problems and are able to record some input data that triggers the issue, that would be helpful.
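To capture input that triggers the error, the raw chunks coming out of `MicrophoneStream.generator()` can be written to a WAV file with the standard-library `wave` module and later read back in fixed-size chunks. This is a minimal sketch that assumes 16-bit mono PCM (the `paInt16` format in the script above). The helper names are hypothetical.

```python
import wave


def save_chunks(path, chunks, rate=16000, channels=1, sample_width=2):
    """Write raw 16-bit PCM byte chunks to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # 2 bytes per sample = paInt16
        wf.setframerate(rate)
        for chunk in chunks:
            wf.writeframes(chunk)


def load_chunks(path, frames_per_chunk):
    """Yield the recording back as byte chunks, like the live generator."""
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:  # empty bytes means end of file
                break
            yield data
```

Replaying the file through the same preprocessing and clustering loop makes the failure reproducible without a microphone.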

nguyenthienhy · 2021-10-11T08:52:27Z

Thank you very much, the problem seems to be gone!!!

QEDan self-assigned this May 2, 2021

QEDan added bug Something isn't working question Further information is requested labels May 2, 2021

QEDan closed this as completed May 18, 2021

QEDan reopened this Oct 9, 2021

QEDan closed this as completed Oct 11, 2021
