
Compute embeddings from stream & unsupervised diarization #10

Closed
shashankpr opened this issue Sep 23, 2019 · 12 comments

@shashankpr

Hi, great work and great repo really. Your code and examples helped me understand the flow very easily.
I am currently working on a speaker identification task where I want to detect "who spoke when" with low latency. There are two problems I need to solve, and I was wondering whether you have already worked on them or plan to in the future. If not, I would be glad to contribute to your repo as a PR. The tasks are as follows:

  1. How can I use the partial embeddings to identify speaker changes if I do not have pre-defined speaker embeddings (unlike the speaker diarization example that you gave)?
  2. Can the embeddings be computed from a streaming input, e.g. by reading wav bytes directly from a microphone and computing them on the fly?

I know that these can be done with a few tweaks, but I would like your insight if you have already worked on them or have ideas about them.
Thanks!

@CorentinJ
Contributor

I've investigated these areas but haven't implemented anything for them yet, though I am considering it.

  1. You would have to cluster the partial embeddings of the audio (generated at a moderately high rate; I'd use 4 partials per second). There must be papers out there on how to do this, but you could try some intuitive approaches too. For example, you could use a clustering algorithm that creates n + 1 clusters (where n is the known number of speakers) and hope that it assigns embeddings to the right clusters, keeping the extra cluster as a bin. You might be able to separate embeddings of clear speech from a single person from those computed over noise, silence, or overlapping speakers.

You might also be able to work with similarity. E.g. if you add these lines in demo 2 after having computed the continuous embedding:

import matplotlib.pyplot as plt
plt.imshow(cont_embeds @ cont_embeds.T)
plt.show()

You will get this:
[image: heatmap of the similarity matrix cont_embeds @ cont_embeds.T]

Clearly you can detect some speakers there by looking for patterns of high similarity:
[images: the same heatmap with the blocks of high similarity marked]

  2. This is definitely achievable. The sounddevice module can record audio and stream it in real time to numpy arrays, so you can work with that. You can then decompose the embed_utterance function to achieve your goal. Define a maximum duration for your audio (it can be an order of magnitude higher than necessary, that's not a problem) and compute the wav slices based on that length: https://github.com/resemble-ai/Resemblyzer/blob/master/resemblyzer/voice_encoder.py#L141. From the wav slices, you will know when you can grab a partial wav from the numpy array being streamed to. For each partial wav, compute a single spectrogram and forward it (with a batch size of 1), and you will have a partial embedding. Keep doing this while the audio is being recorded.
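A minimal sketch of the clustering idea in point 1, assuming the partial embeddings are already available as an (N, 256) array (synthetic data stands in here; with Resemblyzer they would be the cont_embeds from demo 2). Agglomerative clustering stands in for whatever algorithm you pick, and n + 1 clusters leaves one bin cluster for noise and overlap:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)

# Stand-in for cont_embeds: 60 partial embeddings of dim 256, drawn
# around two speaker centroids plus some scattered noise frames.
centroids = rng.normal(size=(2, 256))
partials = np.concatenate([
    centroids[0] + 0.05 * rng.normal(size=(25, 256)),
    centroids[1] + 0.05 * rng.normal(size=(25, 256)),
    rng.normal(size=(10, 256)),   # noise / silence frames
])
# Resemblyzer embeddings are L2-normalized; do the same here so that
# euclidean (ward) clustering behaves much like cosine clustering.
partials /= np.linalg.norm(partials, axis=1, keepdims=True)

n_speakers = 2
labels = AgglomerativeClustering(n_clusters=n_speakers + 1).fit_predict(partials)
# labels[i] is the cluster id of partial embedding i; ideally one cluster
# collects the noise frames and the others the two speakers.
```

On real audio the speaker clusters are far less separable than this toy data, so a bin cluster and some filtering of low-confidence partials will matter more.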

@CorentinJ
Contributor

This is a demo I meant to make too, but it's certainly more work than the other 5. Hope we'll get there.
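The streaming loop described above (precompute the wav slices for a generous maximum duration, then embed each partial as soon as its slice is fully buffered) can be sketched with numpy stand-ins. In real code, sounddevice's InputStream callback would supply the chunks and Resemblyzer's VoiceEncoder the spectrogram and forward pass; the slice arithmetic below is a simplified version of compute_partial_slices:

```python
import numpy as np

SAMPLING_RATE = 16000
PARTIAL_SAMPLES = 160 * 160   # 160 mel frames x 10 ms hop = 1.6 s per partial

def partial_wav_slices(max_duration_s, rate=4.0):
    """Wav slices for partial embeddings, `rate` partials per second."""
    hop = int(SAMPLING_RATE / rate)
    last_start = int(max_duration_s * SAMPLING_RATE) - PARTIAL_SAMPLES
    return [slice(s, s + PARTIAL_SAMPLES) for s in range(0, last_start + 1, hop)]

# Preallocate a buffer for the maximum duration (generously large is fine).
max_duration_s = 10.0
buffer = np.zeros(int(max_duration_s * SAMPLING_RATE), dtype=np.float32)
slices = partial_wav_slices(max_duration_s)

written = 0     # samples received so far
partials = []   # partial wavs ready to be embedded

def on_audio(chunk):
    """Called per chunk; sounddevice's InputStream callback would land here."""
    global written
    buffer[written:written + len(chunk)] = chunk
    written += len(chunk)
    # Embed every partial whose slice is now fully buffered.
    while len(partials) < len(slices) and slices[len(partials)].stop <= written:
        partial_wav = buffer[slices[len(partials)]]
        # Real code: mel spectrogram of partial_wav, then a forward pass
        # through VoiceEncoder with batch size 1 -> one partial embedding.
        partials.append(partial_wav)

# Simulate 3 s of microphone audio arriving in 0.5 s chunks.
rng = np.random.default_rng(0)
for _ in range(6):
    on_audio(rng.standard_normal(SAMPLING_RATE // 2).astype(np.float32))
```

With a 4 Hz rate, each 0.25 s of new audio completes at most one new slice, so the per-chunk work stays small and latency stays close to the 1.6 s partial window.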

@shashankpr
Author

Thanks for your detailed explanations.

  1. I agree with you. I have been reading about the spectral clustering method, which has been used in a couple of papers for similar diarization tasks. I will follow your suggestion and try it out.
  2. When you mention a batch size of 1, does that mean the partial embedding output will have a shape of (number_of_partials, embedding_size)?

@CorentinJ
Contributor

I mean that at this point in the function: https://github.com/resemble-ai/Resemblyzer/blob/master/resemblyzer/voice_encoder.py#L151, the variable mels has shape (N, 160, 40), where N is the batch size. You will probably end up with a mel of shape (160, 40), so you will have to add an extra dimension (e.g. by doing mels[None, ...]) before forwarding the mel.
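The shape fix above, illustrated with numpy (inside voice_encoder.py, mels is a torch tensor, but the indexing is identical):

```python
import numpy as np

mel = np.zeros((160, 40))   # one partial spectrogram: (mel frames, mel channels)
mels = mel[None, ...]       # prepend the batch axis -> shape (1, 160, 40)
```

The forward pass then returns a (1, embedding_size) array, so one partial embedding per call; stacking these over time gives the (number_of_partials, embedding_size) shape asked about above.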

@shashankpr
Author

Got it! Thank you very much for clearing these doubts up. I will close this and update here when I make significant progress with unsupervised and streaming diarization.
Great work once again!

@CorentinJ
Contributor

Sure, it's fine if you leave it open until we figure it out.

@nikitalpopov

Hi, @shashankpr
Any progress on this task?

@lonniehartley

lonniehartley commented Apr 8, 2020 via email

@shashankpr
Author

Hi @nikitalpopov ,
I have been doing some experiments around this but haven't really had the time to implement something solid. I am going to start working on it this week and will update you if I make any progress.

@nikitalpopov

@shashankpr
Could I help you with something?

@nikitalpopov

nikitalpopov commented May 4, 2020

@CorentinJ @shashankpr
I tried to implement it myself, but the results are horrible (DER never gets below 60%). Could you please check my test notebook? https://github.com/nikitalpopov/master/blob/dev/demo.ipynb
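For reference, the DER mentioned here can be approximated at the frame level as the fraction of frames whose hypothesis label, under the best one-to-one speaker mapping, disagrees with the reference. This is a simplified sketch (no forgiveness collar, no overlap handling, hypothesis labels assumed to be relabelings of the reference set), not the full metric used in diarization benchmarks:

```python
import numpy as np
from itertools import permutations

def frame_der(ref, hyp):
    """Frame-level diarization error rate: share of frames mislabeled under
    the best one-to-one mapping of hypothesis labels onto reference labels."""
    ref = np.asarray(ref)
    hyp = np.asarray(hyp)
    labels = sorted(set(hyp.tolist()))
    best = len(ref)
    # Brute-force the label mapping; fine for a handful of speakers.
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))
        mapped = np.array([mapping[h] for h in hyp.tolist()])
        best = min(best, int(np.sum(mapped != ref)))
    return best / len(ref)
```

Since the mapping is optimized, consistently swapped speaker labels cost nothing; a 60% DER therefore means genuine confusion between speakers, not just a label permutation.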

@RubenPants

RubenPants commented Oct 12, 2021

Writing my solution here, since I've been trying to implement a way of embedding during streaming. In my use case, streaming happens by pushing bytes of audio segments:

import io
import numpy as np
import soundfile as sf
from resemblyzer import VoiceEncoder

encoder = VoiceEncoder()

def embed(chunk_bytes: bytes) -> np.ndarray:
    """Embed a chunk of raw 16 kHz mono PCM-16 audio bytes."""
    # Decode the raw bytes into a float waveform the encoder accepts.
    data, _ = sf.read(
        io.BytesIO(chunk_bytes),
        samplerate=16000,
        channels=1,
        format='RAW',
        subtype='PCM_16',
        endian='FILE',
    )
    return encoder.embed_utterance(data)

An example of this code's result (after PCA) is shown below:
[image: 2-D PCA projection of the chunk embeddings]
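One way to put these chunk embeddings to work for "who spoke when": flag a speaker change whenever the cosine similarity between consecutive chunk embeddings drops below a threshold. The threshold (0.75) and the synthetic embeddings below are illustrative assumptions; the real vectors would come from embed() above:

```python
import numpy as np

rng = np.random.default_rng(2)
spk_a, spk_b = rng.normal(size=(2, 256))
# Fake per-chunk embeddings: three chunks of speaker A, then two of speaker B.
chunks = np.stack([spk_a, spk_a, spk_a, spk_b, spk_b])
chunks /= np.linalg.norm(chunks, axis=1, keepdims=True)

# Cosine similarity between each pair of consecutive chunk embeddings.
sims = np.sum(chunks[:-1] * chunks[1:], axis=1)
# Chunk indices at which a new speaker starts.
changes = np.flatnonzero(sims < 0.75) + 1
```

Short chunks give noisy embeddings, so in practice a moving average over a few chunks before thresholding tends to be more stable than comparing raw neighbors.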
