Add support for powerset segmentation models #198

hbredin · 2023-11-08T14:48:39Z

Addresses #186.

Note that this is a first (working) attempt that still needs some love. Hence the draft status...

As a bonus, you get the first (?) walrus operator of diart, yay!

hbredin · 2023-11-08T14:49:26Z

src/diart/models.py

-        return self.model(waveform)
+        predictions = self.model(waveform)
+
+        if (specs := self.model.specifications).powerset:


Look ma! A walrus operator!

juanmc2005 · 2023-11-08T20:10:39Z

@hbredin as you mentioned in #186, I would also prefer to have a single instantiation of Powerset that runs in the same device as SegmentationModel.

I think we have 2 options here:

We implement it as a kind of middleware or adapter. Essentially we could have a class PowersetAdapter so the PyannoteLoader can do something like return PowersetAdapter(Model.from_pretrained(model_info))
We implement it as a block. Here we could add a PowersetToMultilabel block that simply expects a powerset input and does the conversion. For this, we'd have to know from the model whether it is powerset or not, for example by adding a @property abstract method to SegmentationModel. This could simply default to False so that it isn't a concern for most users

I would prefer the first one for now because it's automatic and has minimal impact, but we may have to move to the second one if someone else (other than pyannote) releases a powerset model.

Example of (1)

class PowersetAdapter(nn.Module):
    def __init__(self, segmentation_model: nn.Module):
        self.model = segmentation_model
        self.powerset = Powerset(...)

    def __call__(self, waveform: torch.Tensor) -> torch.Tensor:
        return self.powerset.to_multilabel(self.model(waveform), soft=False)

class PyannoteLoader:
    ...
    def __call__(self) -> nn.Module:
        model = pyannote_loader.get_model(self.model_info, self.hf_token)
        specs = getattr(model, "specifications", None)
        if specs is not None and specs.powerset:
            model = PowersetAdapter(model)
        return model

hbredin · 2023-11-08T21:51:08Z

Trying this but now diart.stream complains that AttributeError: 'PyannoteSegmentationModel' object has no attribute 'duration' even though I added the following properties to PowersetAdapter:

    @property
    def sample_rate(self) -> int:
        return self.model.hparams.sample_rate

    @property
    def duration(self) -> float:
        return self.model.specifications.duration

A bit lost here but it's late :-) Sleep will most likely help!

juanmc2005 · 2023-11-09T09:37:56Z

@hbredin that's weird, can you push the code so I can take a look?
you probably need to forward specifications from PowersetAdapter to Model:

class PowersetAdapter(nn.Module):
    def __init__(self, segmentation_model: nn.Module):
        self.model = segmentation_model
        self.powerset = Powerset(...)
    
    @property
    def specifications(self):
        return getattr(self.model, "specifications", None)

    def __call__(self, waveform: torch.Tensor) -> torch.Tensor:
        return self.powerset.to_multilabel(self.model(waveform), soft=False)

Because PyannoteSegmentationModel will need the loaded model to have model.specifications.duration and model.specifications.sample_rate.
Again, this will disappear when I move the config to a yaml file. That way we won't need a default duration or sample rate, it will be expected in the config or CLI args

juanmc2005 · 2023-11-09T09:40:56Z

Thanks! I'll try to debug after work today or tomorrow and get back if it's not solved until then 😄

hbredin · 2023-11-09T12:37:27Z

Adding the specifications properties does not help.

juanmc2005 · 2023-11-09T12:42:16Z

@hbredin can you post the stacktrace?

hbredin · 2023-11-09T12:45:07Z

diart.stream --segmentation pyannote/segmentation-3.0 audio.wav

Traceback (most recent call last):
  File "REDACTED/bin/diart.stream", line 8, in <module>
    sys.exit(run())
  File "REDACTED/diart/src/diart/console/stream.py", line 107, in run
    pipeline = pipeline_class(config)
  File "REDACTED/diart/src/diart/blocks/diarization.py", line 97, in __init__
    msg = f"Latency should be in the range [{self._config.step}, {self._config.duration}]"
  File "REDACTED/diart/src/diart/blocks/diarization.py", line 74, in duration
    self._duration = self.segmentation.duration
  File "REDACTED/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'PyannoteSegmentationModel' object has no attribute 'duration'

juanmc2005 · 2023-11-09T13:32:54Z

@hbredin while we figure this out, you can override the duration with --duration 5. For the sample rate, which I imagine will be a similar problem, you can temporarily hard-code it in SpeakerDiarizationConfig. That should unblock you for now to try out the new model

hbredin · 2023-11-09T14:09:35Z

Ah! --duration=10 solves this first issue but it now complains about missing sample_rate :-)
And there does not seem to be a --sample-rate option :-/

hbredin · 2023-11-09T14:10:07Z

Ok. Misread your previous comment. You were already aware of it :)

src/diart/models.py

juanmc2005 · 2023-11-09T17:33:01Z

@hbredin I don't know if you branched from main but I highly recommend rebasing on top of develop now that #188 is merged

juanmc2005 · 2023-11-09T18:05:45Z

@hbredin we just broke a record here, performance on AMI using duration=10, step=0.5 and latency=5 (same as the paper except for the 10s context) gives DER=26.7. Previous best on AMI for that config was 27.3

This is without tuning rho_update and delta_new, which should squeeze a bit more performance. I would like to run the tuning myself but I fear my laptop will catch fire 😅 I'd really like to have a caching feature for that

hbredin · 2023-11-09T18:35:52Z

@hbredin we just broke a record here, performance on AMI using duration=10, step=0.5 and latency=5 (same as the paper except for the 10s context) gives DER=26.7. Previous best on AMI for that config was 27.3

This is without tuning rho_update and delta_new, which should squeeze a bit more performance. I would like to run the tuning myself but I fear my laptop will catch fire 😅 I'd really like to have a caching feature for that

Wait until I try with pyannote.premium ;-)
What's the command line I should run?

hbredin · 2023-11-09T20:16:27Z

All checks are failing but I don't think they are related to this PR.

juanmc2005 · 2023-11-09T21:19:47Z

yeah don't worry about the "Quick Runs" CI fails, it's unrelated. It needs a huggingface token to run, and it can't find it in your fork's secrets. This is actually why I want to host a pair of freely available ONNX models somewhere to run the CI, probably even quantized models.

However, please format with black so the lint passes.

You can run the following command for the AMI eval:

diart.benchmark /ami/audio/test --reference /ami/rttms/test --segmentation pyannote/segmentation-3.0 --duration 10 --latency 5 --step 0.5 --tau-active 0.507 --rho-update 0.006 --delta-new 1.057 --batch-size 32 --num-workers 3

Now you start to see why I want to put configs in a yml file 😅

sorgfresser · 2023-11-09T21:23:26Z

src/diart/models.py

-            return PretrainedSpeakerEmbedding(
-                self.model_info, use_auth_token=self.hf_token
-            )
+    def __call__(self) -> nn.Module:


I think the rebasing broke something here but the solution with the HTTPError was not very clean either. Is there anything better though?

I agree that the HTTPError wasn't the best solution but I'm not aware of anything better we can do. That said, something definitely went wrong during the rebase

juanmc2005 · 2023-11-09T21:23:49Z

@hbredin looks like something went wrong with your rebase. I'm missing changes from #188

src/diart/models.py

sorgfresser · 2023-11-09T21:39:03Z

If I am not mistaken @hbredin we should not need to tune Rho (e.g. the SpeakerThreshhold) for the Powerset model, as such it might be worth it to subclass the SpeakerDiarization pipeline with a custom hyper_parameters() function?

Edit: Rho should be tuned, see below, I confused rho and tau.

src/diart/models.py

juanmc2005 · 2023-11-09T21:43:21Z

@sorgfresser you may still want to tune rho_update, the powerset model doesn't relieve you of this parameter. It's removing the embedding model that would help you with that.

Keep in mind that rho_update can be interpreted as "what percentage of the chunk must this embedding represent in order to update the clustering centroids?"

sorgfresser · 2023-11-09T21:47:23Z

Sorry, I was referring to Tau, it's getting late...

juanmc2005 · 2023-11-09T21:56:47Z

Ok apart from the linting and the Inference import that we should remove, this is good to go from my side. I'll wait for those changes and merge

... though it has nothing to do with this PR...

hbredin · 2023-11-10T13:39:19Z

Quick pyannote.premium run without any hyperparameters tuning:

diart.benchmark /ami/audio/test --reference /ami/rttms/test \
                                --segmentation ... \ 
                                --latency ... --step 0.5 \
                                --tau-active 0.507 --rho-update 0.006 --delta-new 1.057

Segmentation	Embedding	Latency	FA	MD	SC
`pyannote/segmentation-3.0`	`pyannote/embedding`	5s	3.7	10.1	12.6
`pyannote/premium` 👀	`pyannote/embedding`	1s 👀	3.8	7.6 🎉	16.4 😠

FA = false alarm rate / MD = missed detection rate / SC = speaker confusion rate

Looks like pyannote/embedding default clustering hyper-parameters (degraded SC 😠) are not adapted to pyannote/premium segmentation (improved FA+MD 🎉).

Still needs a bit of hparams tuning but very promising!

juanmc2005 · 2023-11-10T13:48:30Z

@hbredin nice! I see you've been having fun with diart.benchmark then 😄

sorgfresser · 2023-11-10T15:51:10Z

Sidenote: this requires pyannote develop version as of now since pyannote/pyannote-audio#1516 is needed.

hbredin · 2023-11-10T16:01:09Z

Not sure when I'll release that so it would be safer to remove the use of soft=False which is anyway the default behavior in pyannote.audio 3.0.1

* feat: add support for powerset segmentation models * wip: trying this PowersetAdapter thing * fix: initialize nn.Module before setting attribute * Fix unresolved duration and sample rate * Apply suggestions from code review * fix: remove Inference import * fix: black embedding.py ... though it has nothing to do with this PR... --------- Co-authored-by: Juan Coria <juanmc2005@hotmail.com>

hbredin commented Nov 8, 2023

View reviewed changes

juanmc2005 added the feature New feature or request label Nov 8, 2023

juanmc2005 added this to the Version 0.9 milestone Nov 8, 2023

juanmc2005 marked this pull request as ready for review November 8, 2023 19:44

juanmc2005 self-requested a review November 9, 2023 16:58

juanmc2005 reviewed Nov 9, 2023

View reviewed changes

src/diart/models.py Outdated Show resolved Hide resolved

juanmc2005 changed the base branch from main to develop November 9, 2023 17:31

juanmc2005 changed the title ~~feat: add support for powerset segmentation models~~ Add support for powerset segmentation models Nov 9, 2023

hbredin and others added 4 commits November 9, 2023 20:57

feat: add support for powerset segmentation models

bde46ff

wip: trying this PowersetAdapter thing

b9e2c96

fix: initialize nn.Module before setting attribute

11181ea

Fix unresolved duration and sample rate

bd2313f

hbredin force-pushed the feat/186-powerset-segmentation-model branch from 3080cd1 to bd2313f Compare November 9, 2023 20:00

sorgfresser reviewed Nov 9, 2023

View reviewed changes

juanmc2005 reviewed Nov 9, 2023

View reviewed changes

src/diart/models.py Outdated Show resolved Hide resolved

src/diart/models.py Outdated Show resolved Hide resolved

src/diart/models.py Outdated Show resolved Hide resolved

Apply suggestions from code review

a1aabf7

juanmc2005 reviewed Nov 9, 2023

View reviewed changes

src/diart/models.py Outdated Show resolved Hide resolved

juanmc2005 requested changes Nov 9, 2023

View reviewed changes

src/diart/models.py Show resolved Hide resolved

hbredin added 2 commits November 10, 2023 10:04

fix: remove Inference import

dfd7f33

fix: black embedding.py

0dd713f

... though it has nothing to do with this PR...

juanmc2005 approved these changes Nov 10, 2023

View reviewed changes

juanmc2005 merged commit d5cbf72 into juanmc2005:develop Nov 10, 2023
1 of 3 checks passed

juanmc2005 mentioned this pull request Nov 10, 2023

Add compatibility with new pyannote segmentation model #186

Closed

juanmc2005 mentioned this pull request Nov 18, 2023

Version 0.9 #217

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for powerset segmentation models #198

Add support for powerset segmentation models #198

hbredin commented Nov 8, 2023

hbredin Nov 8, 2023

juanmc2005 commented Nov 8, 2023 •

edited

Loading

hbredin commented Nov 8, 2023 •

edited

Loading

juanmc2005 commented Nov 9, 2023

juanmc2005 commented Nov 9, 2023

hbredin commented Nov 9, 2023

juanmc2005 commented Nov 9, 2023

hbredin commented Nov 9, 2023 •

edited

Loading

juanmc2005 commented Nov 9, 2023

hbredin commented Nov 9, 2023

hbredin commented Nov 9, 2023 •

edited

Loading

juanmc2005 commented Nov 9, 2023

juanmc2005 commented Nov 9, 2023

hbredin commented Nov 9, 2023

hbredin commented Nov 9, 2023

juanmc2005 commented Nov 9, 2023

sorgfresser Nov 9, 2023

juanmc2005 Nov 9, 2023

juanmc2005 commented Nov 9, 2023

sorgfresser commented Nov 9, 2023 •

edited

Loading

juanmc2005 commented Nov 9, 2023

sorgfresser commented Nov 9, 2023

juanmc2005 commented Nov 9, 2023

hbredin commented Nov 10, 2023

juanmc2005 commented Nov 10, 2023

sorgfresser commented Nov 10, 2023

hbredin commented Nov 10, 2023

Add support for powerset segmentation models #198

Add support for powerset segmentation models #198

Conversation

hbredin commented Nov 8, 2023

hbredin Nov 8, 2023

Choose a reason for hiding this comment

juanmc2005 commented Nov 8, 2023 • edited Loading

Example of (1)

hbredin commented Nov 8, 2023 • edited Loading

juanmc2005 commented Nov 9, 2023

juanmc2005 commented Nov 9, 2023

hbredin commented Nov 9, 2023

juanmc2005 commented Nov 9, 2023

hbredin commented Nov 9, 2023 • edited Loading

juanmc2005 commented Nov 9, 2023

hbredin commented Nov 9, 2023

hbredin commented Nov 9, 2023 • edited Loading

juanmc2005 commented Nov 9, 2023

juanmc2005 commented Nov 9, 2023

hbredin commented Nov 9, 2023

hbredin commented Nov 9, 2023

juanmc2005 commented Nov 9, 2023

sorgfresser Nov 9, 2023

Choose a reason for hiding this comment

juanmc2005 Nov 9, 2023

Choose a reason for hiding this comment

juanmc2005 commented Nov 9, 2023

sorgfresser commented Nov 9, 2023 • edited Loading

juanmc2005 commented Nov 9, 2023

sorgfresser commented Nov 9, 2023

juanmc2005 commented Nov 9, 2023

hbredin commented Nov 10, 2023

juanmc2005 commented Nov 10, 2023

sorgfresser commented Nov 10, 2023

hbredin commented Nov 10, 2023

juanmc2005 commented Nov 8, 2023 •

edited

Loading

hbredin commented Nov 8, 2023 •

edited

Loading

hbredin commented Nov 9, 2023 •

edited

Loading

hbredin commented Nov 9, 2023 •

edited

Loading

sorgfresser commented Nov 9, 2023 •

edited

Loading