
Add support for powerset segmentation models #198

Merged

Conversation

hbredin
Collaborator

@hbredin hbredin commented Nov 8, 2023

Addresses #186.

Note that this is a first (working) attempt that still needs some love. Hence the draft status...

As a bonus, you get the first (?) walrus operator of diart, yay!

-        return self.model(waveform)
+        predictions = self.model(waveform)
+
+        if (specs := self.model.specifications).powerset:
Collaborator Author

Look ma! A walrus operator!

@juanmc2005 juanmc2005 added the feature New feature or request label Nov 8, 2023
@juanmc2005 juanmc2005 added this to the Version 0.9 milestone Nov 8, 2023
@juanmc2005 juanmc2005 marked this pull request as ready for review November 8, 2023 19:44
@juanmc2005
Owner

juanmc2005 commented Nov 8, 2023

@hbredin as you mentioned in #186, I would also prefer to have a single instantiation of Powerset that runs on the same device as SegmentationModel.

I think we have 2 options here:

  1. We implement it as a kind of middleware or adapter. Essentially, we could have a class PowersetAdapter so that PyannoteLoader can do something like return PowersetAdapter(Model.from_pretrained(model_info))
  2. We implement it as a block. Here we could add a PowersetToMultilabel block that simply expects a powerset input and does the conversion. For this, we'd have to know from the model whether it is powerset or not, for example by adding an abstract @property to SegmentationModel. This could simply default to False so that it isn't a concern for most users (see the sketch after the option (1) example below).

I would prefer the first one for now because it's automatic and has minimal impact, but we may have to move to the second one if someone else (other than pyannote) releases a powerset model.

Example of (1)

import torch
from torch import nn
from pyannote.audio.utils.powerset import Powerset

class PowersetAdapter(nn.Module):
    def __init__(self, segmentation_model: nn.Module):
        super().__init__()  # must be called before assigning submodules on an nn.Module
        self.model = segmentation_model
        self.powerset = Powerset(...)

    def __call__(self, waveform: torch.Tensor) -> torch.Tensor:
        # Convert powerset predictions to multi-label speaker activations
        return self.powerset.to_multilabel(self.model(waveform), soft=False)

class PyannoteLoader:
    ...
    def __call__(self) -> nn.Module:
        model = pyannote_loader.get_model(self.model_info, self.hf_token)
        specs = getattr(model, "specifications", None)
        if specs is not None and specs.powerset:
            model = PowersetAdapter(model)
        return model
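
For comparison, a rough sketch of option (2), assuming a hypothetical is_powerset property on SegmentationModel (the property name is illustrative, not existing diart API):

import torch
from torch import nn
from pyannote.audio.utils.powerset import Powerset

class SegmentationModel(nn.Module):
    ...

    @property
    def is_powerset(self) -> bool:
        # Defaults to False so it isn't a concern for most users
        return False

class PowersetToMultilabel:
    """Block that converts powerset predictions into multi-label speaker activations."""

    def __init__(self, powerset: Powerset):
        self.powerset = powerset

    def __call__(self, prediction: torch.Tensor) -> torch.Tensor:
        return self.powerset.to_multilabel(prediction)

The pipeline would then check model.is_powerset and insert this block right after segmentation only when needed.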

@hbredin
Collaborator Author

hbredin commented Nov 8, 2023

Trying this but now diart.stream complains that AttributeError: 'PyannoteSegmentationModel' object has no attribute 'duration' even though I added the following properties to PowersetAdapter:

    @property
    def sample_rate(self) -> int:
        return self.model.hparams.sample_rate

    @property
    def duration(self) -> float:
        return self.model.specifications.duration

A bit lost here but it's late :-) Sleep will most likely help!

@juanmc2005
Owner

@hbredin that's weird, can you push the code so I can take a look?
You probably need to forward specifications from PowersetAdapter to the wrapped Model:

class PowersetAdapter(nn.Module):
    def __init__(self, segmentation_model: nn.Module):
        super().__init__()  # must be called before assigning submodules on an nn.Module
        self.model = segmentation_model
        self.powerset = Powerset(...)

    @property
    def specifications(self):
        return getattr(self.model, "specifications", None)

    def __call__(self, waveform: torch.Tensor) -> torch.Tensor:
        return self.powerset.to_multilabel(self.model(waveform), soft=False)

Because PyannoteSegmentationModel will need the loaded model to have model.specifications.duration and model.specifications.sample_rate.
Again, this will disappear when I move the config to a YAML file. That way we won't need a default duration or sample rate; it will be expected in the config or CLI args.

@juanmc2005
Owner

Thanks! I'll try to debug after work today or tomorrow and get back if it's not solved by then 😄

@hbredin
Collaborator Author

hbredin commented Nov 9, 2023

Adding the specifications properties does not help.

@juanmc2005
Owner

@hbredin can you post the stacktrace?

@hbredin
Collaborator Author

hbredin commented Nov 9, 2023

diart.stream --segmentation pyannote/segmentation-3.0 audio.wav
Traceback (most recent call last):
  File "REDACTED/bin/diart.stream", line 8, in <module>
    sys.exit(run())
  File "REDACTED/diart/src/diart/console/stream.py", line 107, in run
    pipeline = pipeline_class(config)
  File "REDACTED/diart/src/diart/blocks/diarization.py", line 97, in __init__
    msg = f"Latency should be in the range [{self._config.step}, {self._config.duration}]"
  File "REDACTED/diart/src/diart/blocks/diarization.py", line 74, in duration
    self._duration = self.segmentation.duration
  File "REDACTED/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'PyannoteSegmentationModel' object has no attribute 'duration'

@juanmc2005
Owner

@hbredin while we figure this out, you can override the duration with --duration 5. For the sample rate, which I imagine will be a similar problem, you can temporarily hard-code it in SpeakerDiarizationConfig. That should unblock you for now to try out the new model
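
For the sample rate, the temporary hack could look roughly like this (a sketch, assuming SpeakerDiarizationConfig exposes a sample_rate property; 16 kHz is the rate pyannote segmentation models expect):

    @property
    def sample_rate(self) -> int:
        # Temporary hard-coded value until the loaded model exposes its sample rate
        return 16000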

@hbredin
Collaborator Author

hbredin commented Nov 9, 2023

Ah! --duration=10 solves this first issue but it now complains about missing sample_rate :-)
And there does not seem to be a --sample-rate option :-/

@hbredin
Collaborator Author

hbredin commented Nov 9, 2023

Ok. Misread your previous comment. You were already aware of it :)

@juanmc2005 juanmc2005 self-requested a review November 9, 2023 16:58
@juanmc2005 juanmc2005 changed the base branch from main to develop November 9, 2023 17:31
@juanmc2005 juanmc2005 changed the title feat: add support for powerset segmentation models Add support for powerset segmentation models Nov 9, 2023
@juanmc2005
Owner

@hbredin I don't know if you branched from main, but I highly recommend rebasing on top of develop now that #188 is merged.

@juanmc2005
Owner

@hbredin we just broke a record here: performance on AMI using duration=10, step=0.5, and latency=5 (same as the paper except for the 10s context) gives DER=26.7. The previous best on AMI for that config was 27.3.

This is without tuning rho_update and delta_new, which should squeeze a bit more performance. I would like to run the tuning myself but I fear my laptop will catch fire 😅 I'd really like to have a caching feature for that

@hbredin
Collaborator Author

hbredin commented Nov 9, 2023

> @hbredin we just broke a record here: performance on AMI using duration=10, step=0.5, and latency=5 (same as the paper except for the 10s context) gives DER=26.7. The previous best on AMI for that config was 27.3.
>
> This is without tuning rho_update and delta_new, which should squeeze a bit more performance. I would like to run the tuning myself but I fear my laptop will catch fire 😅 I'd really like to have a caching feature for that

Wait until I try with pyannote.premium ;-)
What's the command line I should run?

@hbredin hbredin force-pushed the feat/186-powerset-segmentation-model branch from 3080cd1 to bd2313f Compare November 9, 2023 20:00
@hbredin
Collaborator Author

hbredin commented Nov 9, 2023

All checks are failing but I don't think they are related to this PR.

@juanmc2005
Owner

Yeah, don't worry about the "Quick Runs" CI failures, they're unrelated. The job needs a Hugging Face token to run and can't find one in your fork's secrets. This is actually why I want to host a pair of freely available ONNX models somewhere to run the CI, probably even quantized models.

However, please format with black so the lint passes.

You can run the following command for the AMI eval:

diart.benchmark /ami/audio/test --reference /ami/rttms/test --segmentation pyannote/segmentation-3.0 --duration 10 --latency 5 --step 0.5 --tau-active 0.507 --rho-update 0.006 --delta-new 1.057 --batch-size 32 --num-workers 3

Now you start to see why I want to put configs in a YAML file 😅
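
Something like this, as a purely hypothetical YAML layout for the same settings (file name and keys are assumptions, not an existing diart config format):

# config.yml (hypothetical)
segmentation: pyannote/segmentation-3.0
duration: 10
step: 0.5
latency: 5
tau_active: 0.507
rho_update: 0.006
delta_new: 1.057
batch_size: 32
num_workers: 3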

        return PretrainedSpeakerEmbedding(
            self.model_info, use_auth_token=self.hf_token
        )

    def __call__(self) -> nn.Module:
Contributor

I think the rebasing broke something here but the solution with the HTTPError was not very clean either. Is there anything better though?

Owner

I agree that the HTTPError wasn't the best solution but I'm not aware of anything better we can do. That said, something definitely went wrong during the rebase

@juanmc2005
Owner

@hbredin looks like something went wrong with your rebase. I'm missing changes from #188

@sorgfresser
Contributor

sorgfresser commented Nov 9, 2023

If I am not mistaken, @hbredin, we should not need to tune rho (i.e. the speaker threshold) for the powerset model; as such, it might be worth subclassing the SpeakerDiarization pipeline with a custom hyper_parameters() function?

Edit: rho should be tuned, see below; I confused rho and tau.

@juanmc2005
Owner

@sorgfresser you may still want to tune rho_update; the powerset model doesn't relieve you of this parameter. It's removing the embedding model that would help you with that.

Keep in mind that rho_update can be interpreted as "what percentage of the chunk must this embedding represent in order to update the clustering centroids?"

@sorgfresser
Contributor

Sorry, I was referring to Tau, it's getting late...
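
So the idea would be to drop tau_active from the tuning search. A rough sketch of the subclassing, where the import paths and parameter names are assumptions about the diart API rather than verified code:

from diart import SpeakerDiarization
from diart.blocks.base import HyperParameter, RhoUpdate, DeltaNew  # assumed locations

class PowersetSpeakerDiarization(SpeakerDiarization):
    @staticmethod
    def hyper_parameters() -> list[HyperParameter]:
        # With hard powerset outputs there is arguably no activation threshold (tau)
        # left to tune, so only expose rho and delta to the tuner.
        return [RhoUpdate, DeltaNew]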

@juanmc2005
Owner

Ok, apart from the linting and the Inference import that we should remove, this is good to go from my side. I'll wait for those changes and merge.

@juanmc2005 juanmc2005 merged commit d5cbf72 into juanmc2005:develop Nov 10, 2023
1 of 3 checks passed
@hbredin
Collaborator Author

hbredin commented Nov 10, 2023

Quick pyannote.premium run without any hyperparameters tuning:

diart.benchmark /ami/audio/test --reference /ami/rttms/test \
                                --segmentation ... \ 
                                --latency ... --step 0.5 \
                                --tau-active 0.507 --rho-update 0.006 --delta-new 1.057 
| Segmentation | Embedding | Latency | FA | MD | SC |
|---|---|---|---|---|---|
| pyannote/segmentation-3.0 | pyannote/embedding | 5s | 3.7 | 10.1 | 12.6 |
| pyannote/premium 👀 | pyannote/embedding | 1s 👀 | 3.8 | 7.6 🎉 | 16.4 😠 |

FA = false alarm rate / MD = missed detection rate / SC = speaker confusion rate
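(DER is the sum of these three rates; for the first row, 3.7 + 10.1 + 12.6 = 26.4.)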

Looks like pyannote/embedding default clustering hyper-parameters (degraded SC 😠) are not adapted to pyannote/premium segmentation (improved FA+MD 🎉).

Still needs a bit of hparams tuning but very promising!

@juanmc2005
Owner

@hbredin nice! I see you've been having fun with diart.benchmark then 😄

@sorgfresser
Contributor

Side note: as of now this requires the pyannote.audio develop version, since pyannote/pyannote-audio#1516 is needed.

@hbredin
Collaborator Author

hbredin commented Nov 10, 2023

Not sure when I'll release that, so it would be safer to remove the use of soft=False, which is the default behavior in pyannote.audio 3.0.1 anyway.
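
i.e. the adapter call would simply become (a sketch relying on that default):

    def __call__(self, waveform: torch.Tensor) -> torch.Tensor:
        # Rely on to_multilabel's default (hard assignments in pyannote.audio 3.0.1)
        return self.powerset.to_multilabel(self.model(waveform))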

juanmc2005 added a commit that referenced this pull request Nov 11, 2023
* feat: add support for powerset segmentation models

* wip: trying this PowersetAdapter thing

* fix: initialize nn.Module before setting attribute

* Fix unresolved duration and sample rate

* Apply suggestions from code review

* fix: remove Inference import

* fix: black embedding.py

... though it has nothing to do with this PR...

---------

Co-authored-by: Juan Coria <juanmc2005@hotmail.com>
@juanmc2005 juanmc2005 mentioned this pull request Nov 18, 2023
juanmc2005 added a commit that referenced this pull request Nov 19, 2023