Add support for powerset segmentation models #198
src/diart/models.py (outdated)

```diff
-        return self.model(waveform)
+        predictions = self.model(waveform)
+
+        if (specs := self.model.specifications).powerset:
```
Look ma! A walrus operator!
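For readers new to it, the walrus operator (`:=`, Python 3.8+) assigns and tests in a single expression. Here is a minimal, self-contained sketch of the check in the diff above. It uses hypothetical `Specifications` and `FakeModel` stand-ins instead of a real pyannote model:

```python
from dataclasses import dataclass


@dataclass
class Specifications:
    # Hypothetical stand-in for a pyannote model's specifications
    powerset: bool
    duration: float


class FakeModel:
    # Hypothetical model exposing only the attribute the check needs
    def __init__(self, powerset: bool):
        self.specifications = Specifications(powerset=powerset, duration=10.0)


def output_kind(model) -> str:
    # `specs` is bound and tested in one expression, and stays in scope afterwards
    if (specs := model.specifications).powerset:
        return f"powerset ({specs.duration}s chunks): convert to multilabel"
    return f"multilabel ({specs.duration}s chunks): use as is"


print(output_kind(FakeModel(powerset=True)))
print(output_kind(FakeModel(powerset=False)))
```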
@hbredin, as you mentioned in #186, I would also prefer to have a single instantiation of … I think we have two options here:
I would prefer the first one for now because it's automatic and has minimal impact, but we may have to move to the second one if someone other than pyannote releases a powerset model. Example of (1):

```python
import torch
import torch.nn as nn
import pyannote.audio.pipelines.utils as pyannote_loader
from pyannote.audio.utils.powerset import Powerset


class PowersetAdapter(nn.Module):
    def __init__(self, segmentation_model: nn.Module):
        super().__init__()  # nn.Module must be initialized before assigning submodules
        self.model = segmentation_model
        self.powerset = Powerset(...)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # Convert powerset predictions into hard multilabel speaker activations
        return self.powerset.to_multilabel(self.model(waveform), soft=False)


class PyannoteLoader:
    ...

    def __call__(self) -> nn.Module:
        model = pyannote_loader.get_model(self.model_info, self.hf_token)
        specs = getattr(model, "specifications", None)
        if specs is not None and specs.powerset:
            model = PowersetAdapter(model)
        return model
```
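To make the adapter's job concrete, here is a dependency-free sketch of a hard powerset-to-multilabel conversion. It assumes the powerset classes are enumerated by increasing set size (empty set, then single speakers, then pairs) via `itertools.combinations`. The function names and scores are hypothetical; this illustrates the idea and is not pyannote's `Powerset` implementation:

```python
from itertools import combinations
from typing import List


def build_mapping(num_classes: int, max_set_size: int) -> List[List[int]]:
    """One row per powerset class, one column per speaker (1 = active)."""
    mapping = []
    for size in range(max_set_size + 1):
        for subset in combinations(range(num_classes), size):
            mapping.append([1 if k in subset else 0 for k in range(num_classes)])
    return mapping


def to_multilabel_hard(scores: List[List[float]], mapping: List[List[int]]) -> List[List[int]]:
    """Pick the most likely powerset class per frame, then look up its speakers."""
    multilabel = []
    for frame in scores:
        best = max(range(len(frame)), key=frame.__getitem__)
        multilabel.append(list(mapping[best]))
    return multilabel


mapping = build_mapping(num_classes=3, max_set_size=2)
print(len(mapping))  # 7 = 1 empty set + 3 single speakers + 3 pairs

# Hypothetical per-frame powerset scores (7 classes each)
scores = [
    [0.8, 0.1, 0.05, 0.05, 0.0, 0.0, 0.0],  # silence
    [0.1, 0.7, 0.1, 0.05, 0.05, 0.0, 0.0],  # speaker 0 alone
    [0.0, 0.1, 0.1, 0.0, 0.6, 0.1, 0.1],  # speakers 0 and 1 overlap
]
print(to_multilabel_hard(scores, mapping))  # [[0, 0, 0], [1, 0, 0], [1, 1, 0]]
```

The powerset head only needs one softmax over 7 classes, while downstream code keeps seeing one activation per speaker, which is why the adapter can sit transparently in front of the rest of the pipeline.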
Trying this, but now `sample_rate` and `duration` no longer resolve:

```python
@property
def sample_rate(self) -> int:
    return self.model.hparams.sample_rate

@property
def duration(self) -> float:
    return self.model.specifications.duration
```

A bit lost here, but it's late :-) Sleep will most likely help!
@hbredin that's weird, can you push the code so I can take a look?

```python
import torch
import torch.nn as nn
from pyannote.audio.utils.powerset import Powerset

class PowersetAdapter(nn.Module):
    def __init__(self, segmentation_model: nn.Module):
        super().__init__()  # required before assigning submodules
        self.model = segmentation_model
        self.powerset = Powerset(...)

    @property
    def specifications(self):
        return getattr(self.model, "specifications", None)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        return self.powerset.to_multilabel(self.model(waveform), soft=False)
```

Because …
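An alternative to writing one property per forwarded attribute, sketched here without torch so it runs standalone: a wrapper whose `__getattr__` falls back to the wrapped model, so both `specifications.duration` and `hparams.sample_rate` resolve through the adapter. `DelegatingAdapter` and the placeholder values are hypothetical:

```python
from types import SimpleNamespace


class DelegatingAdapter:
    """Hypothetical wrapper that forwards unknown attributes to the wrapped model."""

    def __init__(self, model):
        self.model = model

    def __getattr__(self, name):
        # Only reached when normal lookup fails (e.g. `specifications`, `hparams`).
        # Guard against infinite recursion if `model` itself is not set yet.
        if name == "model":
            raise AttributeError(name)
        return getattr(self.model, name)


# Placeholder values standing in for a real segmentation model
inner = SimpleNamespace(
    specifications=SimpleNamespace(duration=10.0, powerset=True),
    hparams=SimpleNamespace(sample_rate=16000),
)
adapter = DelegatingAdapter(inner)
print(adapter.specifications.duration, adapter.hparams.sample_rate)  # 10.0 16000
```

On an `nn.Module`, `__getattr__` is already used to look up registered parameters, buffers and submodules, so an override there would need to try `super().__getattr__(name)` first. The explicit `@property` in the comment above sidesteps that subtlety.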
Thanks! I'll try to debug after work today or tomorrow and get back to you if it's not solved by then 😄

Adding the …

@hbredin can you post the stack trace?
```shell
diart.stream --segmentation pyannote/segmentation-3.0 audio.wav
```
@hbredin while we figure this out, you can override the duration with …

Ah!

OK, I misread your previous comment. You were already aware of it :)
@hbredin we just broke a record here! Performance on AMI using duration=10, step=0.5 and latency=5 (same as the paper except for the 10 s context) gives DER=26.7. The previous best on AMI for that config was 27.3, and this is without tuning.
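For scale, the gain over the previous best works out as follows (plain arithmetic on the two DER values quoted above; lower DER is better):

```python
previous_der = 27.3  # previous best DER (%) on AMI for this configuration
new_der = 26.7  # DER (%) with the powerset segmentation model, no tuning

absolute_gain = previous_der - new_der  # in DER points
relative_gain = absolute_gain / previous_der  # fraction of the previous error removed

print(f"{absolute_gain:.1f} points, {relative_gain:.1%} relative")  # 0.6 points, 2.2% relative
```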
Wait until I try with …
Force-pushed the branch from 3080cd1 to bd2313f.
All checks are failing, but I don't think they are related to this PR.
Yeah, don't worry about the "Quick Runs" CI failures; they're unrelated. The job needs a Hugging Face token to run, and it can't find one in your fork's secrets. This is actually why I want to host a pair of freely available ONNX models somewhere to run the CI, probably even quantized models. However, please format with black so that the lint check passes.

You can run the following command for the AMI eval:

```shell
diart.benchmark /ami/audio/test --reference /ami/rttms/test --segmentation pyannote/segmentation-3.0 \
  --duration 10 --latency 5 --step 0.5 \
  --tau-active 0.507 --rho-update 0.006 --delta-new 1.057 \
  --batch-size 32 --num-workers 3
```

Now you start to see why I want to put configs in a YAML file 😅
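Along those lines, a minimal sketch of keeping the AMI settings in one mapping (which could just as well be loaded from a YAML file) and rebuilding the exact command line from it. The `ami_config` dict and `to_cli` helper are hypothetical, not part of diart:

```python
import shlex
from typing import Dict, List, Union

# Hypothetical config; in practice this could be parsed from a YAML file
ami_config: Dict[str, Union[str, int, float]] = {
    "reference": "/ami/rttms/test",
    "segmentation": "pyannote/segmentation-3.0",
    "duration": 10,
    "latency": 5,
    "step": 0.5,
    "tau_active": 0.507,
    "rho_update": 0.006,
    "delta_new": 1.057,
    "batch_size": 32,
    "num_workers": 3,
}


def to_cli(command: str, positional: str, config: Dict[str, Union[str, int, float]]) -> List[str]:
    """Turn config keys into CLI flags (underscores become dashes)."""
    args = [command, positional]
    for key, value in config.items():
        args += [f"--{key.replace('_', '-')}", str(value)]
    return args


argv = to_cli("diart.benchmark", "/ami/audio/test", ami_config)
print(shlex.join(argv))
```

Keeping the hyperparameters in data rather than in a long shell line makes it easy to version them per dataset and to diff two configurations.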
src/diart/models.py (outdated)

```python
        return PretrainedSpeakerEmbedding(
            self.model_info, use_auth_token=self.hf_token
        )

    def __call__(self) -> nn.Module:
```
I think the rebasing broke something here, but the solution with the `HTTPError` was not very clean either. Is there anything better, though?
I agree that the `HTTPError` wasn't the best solution, but I'm not aware of anything better we can do. That said, something definitely went wrong during the rebase.
If I am not mistaken, @hbredin, we should not need to tune rho (i.e. the speaker threshold) for the powerset model; as such, it might be worth it to subclass the …

Edit: rho should be tuned (see below); I confused rho and tau.
@sorgfresser you may still want to tune … Keep in mind that …
Sorry, I was referring to tau; it's getting late...
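The intuition behind not tuning tau can be sketched as follows: if the segmentation is binarized before clustering (as with a hard powerset-to-multilabel conversion), every threshold strictly between 0 and 1 selects the same active speakers, whereas soft scores depend on the threshold. The `active_speakers` rule below is a simplified assumption for illustration, not diart's exact implementation:

```python
from typing import List


def active_speakers(segmentation: List[List[float]], tau_active: float) -> List[int]:
    """Simplified rule (assumption): a speaker is active in a chunk if its
    maximum frame score reaches tau_active."""
    num_speakers = len(segmentation[0])
    return [
        k
        for k in range(num_speakers)
        if max(frame[k] for frame in segmentation) >= tau_active
    ]


# Hypothetical 2-frame chunk with 3 local speakers
hard = [[1, 0, 0], [1, 1, 0]]  # binary, e.g. after a hard powerset conversion
soft = [[0.9, 0.4, 0.1], [0.8, 0.6, 0.2]]  # soft multilabel scores

for tau in (0.3, 0.507, 0.7):
    print(tau, active_speakers(hard, tau), active_speakers(soft, tau))
```

With `hard`, all three thresholds give speakers 0 and 1; with `soft`, raising tau to 0.7 drops speaker 1.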
OK, apart from the linting and the …
... though it has nothing to do with this PR...
Quick …

```shell
diart.benchmark /ami/audio/test --reference /ami/rttms/test \
  --segmentation ... \
  --latency ... --step 0.5 \
  --tau-active 0.507 --rho-update 0.006 --delta-new 1.057
```

Looks like … Still needs a bit of hyperparameter tuning, but very promising!
@hbredin nice! I see you've been having fun with …
Side note: for now, this requires the pyannote develop version, since pyannote/pyannote-audio#1516 is needed.

Not sure when I'll release that, so it would be safer to remove the use of …
* feat: add support for powerset segmentation models
* wip: trying this PowersetAdapter thing
* fix: initialize nn.Module before setting attribute
* Fix unresolved duration and sample rate
* Apply suggestions from code review
* fix: remove Inference import
* fix: black embedding.py ... though it has nothing to do with this PR...

Co-authored-by: Juan Coria <juanmc2005@hotmail.com>
Addresses #186.
Note that this is a first (working) attempt that still needs some love. Hence the draft status...
As a bonus, you get the first (?) walrus operator of `diart`, yay!