[WIP] Feat/vtc #694

Closed
wants to merge 13 commits into from
Conversation

hadware
Contributor

@hadware hadware commented Jul 1, 2021

This is a work-in-progress PR for the future VTC (voice type classification) implementation, inspired by @MarvinLvn's work, to be merged into the next release of pyannote-audio.

Note: nothing has been done yet, this is just to get things started.

@hbredin hbredin mentioned this pull request Jul 2, 2021
3 tasks
@codecov

codecov bot commented Jul 11, 2021

Codecov Report

Merging #694 (5fe2153) into develop (fd0c42c) will increase coverage by 0.63%.
The diff coverage is 0.00%.

❗ Current head 5fe2153 differs from pull request most recent head a28bacb. Consider uploading reports for the commit a28bacb to get more accurate results

@@             Coverage Diff             @@
##           develop     #694      +/-   ##
===========================================
+ Coverage    37.35%   37.98%   +0.63%     
===========================================
  Files           50       50              
  Lines         3167     3046     -121     
===========================================
- Hits          1183     1157      -26     
+ Misses        1984     1889      -95     
| Impacted Files | Coverage Δ |
| --- | --- |
| pyannote/audio/pipelines/multilabel_detection.py | 0.00% <0.00%> (ø) |
| ...io/tasks/segmentation/voice_type_classification.py | 0.00% <0.00%> (ø) |
| pyannote/audio/utils/signal.py | 0.00% <0.00%> (-20.39%) ⬇️ |
| pyannote/audio/core/inference.py | 62.26% <0.00%> (ø) |
| pyannote/audio/pipelines/utils.py | 0.00% <0.00%> (ø) |
| pyannote/audio/pipelines/__init__.py | 0.00% <0.00%> (ø) |
| pyannote/audio/pipelines/clustering.py | 0.00% <0.00%> (ø) |
| pyannote/audio/pipelines/resegmentation.py | 0.00% <0.00%> (ø) |
| pyannote/audio/pipelines/speaker_diarization.py | 0.00% <0.00%> (ø) |
| pyannote/audio/utils/metric.py | |
| ... and 2 more | |

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd0c42c...a28bacb. Read the comment docs.

@hadware
Contributor Author

hadware commented Sep 30, 2021

All right, it's currently working on my test dataset. I'll try to test it some more on my fake data, and then on some real data (@MarvinLvn's and our clinical data) to see if it matches (or hopefully even beats) the former scores.

Comment on lines +114 to +227
class_name: ParamDict(
onset=Uniform(0., 1.),
offset=Uniform(0., 1.),
min_duration_on=Uniform(0., 2.),
min_duration_off=Uniform(0., 2.),
pad_onset=Uniform(-1., 1.),
pad_offset=Uniform(-1., 1.)
Member

In relation with pyannote/pyannote-pipeline#34, this is a good use case for freezing only parts of ParamDict. pad_onset and pad_offset are seldom useful and it makes sense to reduce the dimension of the hyperparameter search space by freezing them to 0.

Contributor Author

Yes! I wondered whether pad_{on,off}set were relevant or not, since you weren't using them in the VAD pipeline. I'll freeze them once that's possible. (Alternatively, I could simply not parameterize them?)
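
To illustrate why freezing the two padding parameters shrinks the search, here is a minimal standalone sketch. It does not use pyannote.pipeline: `SEARCH_SPACE`, `sample_params` and `free_dimensions` are hypothetical names made up for this example, and the ranges simply mirror the per-class `ParamDict` quoted above.

```python
import random

# Hypothetical search space mirroring the per-class parameters quoted above:
# each entry maps a parameter name to the (low, high) bounds of a uniform range.
SEARCH_SPACE = {
    "onset": (0.0, 1.0),
    "offset": (0.0, 1.0),
    "min_duration_on": (0.0, 2.0),
    "min_duration_off": (0.0, 2.0),
    "pad_onset": (-1.0, 1.0),
    "pad_offset": (-1.0, 1.0),
}


def sample_params(space, frozen=None, rng=random):
    """Draw one configuration; frozen parameters keep their fixed value."""
    frozen = frozen or {}
    unknown = set(frozen) - set(space)
    if unknown:
        raise KeyError(f"cannot freeze unknown parameters: {sorted(unknown)}")
    return {
        name: frozen[name] if name in frozen else rng.uniform(low, high)
        for name, (low, high) in space.items()
    }


def free_dimensions(space, frozen=None):
    """Number of parameters the optimizer still has to explore."""
    return len(space) - len(frozen or {})


# Freeze both padding parameters to 0, as suggested in the review comment.
frozen = {"pad_onset": 0.0, "pad_offset": 0.0}
params = sample_params(SEARCH_SPACE, frozen, random.Random(0))
```

With both paddings frozen, each class contributes four free dimensions instead of six, so the saving grows with the number of classes in the pipeline.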

@hadware
Contributor Author

hadware commented Oct 5, 2021

I actually thought a bit about things, and the "custom" "MultilabelFScore" metric I've implemented for the optimization part needs a bit more tweaking: it currently doesn't support intersections and unions of classes.
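
As a rough illustration of what supporting unions and intersections could mean, the sketch below derives extra classes from frame-level binary labels and scores them like any other class. This is not the PR's `MultilabelFScore`: the frame-based representation, the function names and the class names (`child`, `adult`, `speech`, `overlap`) are all assumptions made for the example.

```python
def derive_labels(frames, unions=None, intersections=None):
    """Add union/intersection classes to per-class frame-level labels.

    frames maps a class name to a list of 0/1 values (one per frame);
    unions and intersections map a derived class name to its member classes.
    """
    derived = {name: list(values) for name, values in frames.items()}
    for name, members in (unions or {}).items():
        columns = zip(*(frames[member] for member in members))
        derived[name] = [int(any(column)) for column in columns]
    for name, members in (intersections or {}).items():
        columns = zip(*(frames[member] for member in members))
        derived[name] = [int(all(column)) for column in columns]
    return derived


def fscore(reference, hypothesis):
    """Frame-level F-score of one class (1.0 when both are empty)."""
    pairs = list(zip(reference, hypothesis))
    true_pos = sum(1 for ref, hyp in pairs if ref and hyp)
    false_pos = sum(1 for ref, hyp in pairs if hyp and not ref)
    false_neg = sum(1 for ref, hyp in pairs if ref and not hyp)
    denominator = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denominator if denominator else 1.0


# Hypothetical class names and labels, four frames each.
unions = {"speech": ["child", "adult"]}
intersections = {"overlap": ["child", "adult"]}
reference = derive_labels(
    {"child": [1, 1, 0, 0], "adult": [0, 1, 1, 0]}, unions, intersections
)
hypothesis = derive_labels(
    {"child": [1, 0, 0, 0], "adult": [0, 1, 1, 1]}, unions, intersections
)
scores = {name: fscore(reference[name], hypothesis[name]) for name in reference}
```

Deriving the extra classes before scoring keeps the metric itself unchanged: it only ever sees a flat set of named classes, whether they are original or derived.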

@stale

stale bot commented Dec 7, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Dec 7, 2021
@hbredin
Member

hbredin commented Dec 8, 2021

Hey @hadware, the stale bot seems unhappy with the status of this PR... Any update on your side?

@stale stale bot removed the wontfix label Dec 8, 2021
@hadware
Contributor Author

hadware commented Dec 8, 2021

Ah! Well, I'm not planning on giving up on this, but I'm currently a bit overwhelmed by some other work. The dust should settle by mid-December, but I might need some input from you in the meantime to write a proper test of the new pipeline: how do you advise that I test it now that the pyannote-audio CLI endpoint is "deprecated"?

@hbredin
Member

hbredin commented Dec 9, 2021

> Ah! Well, I'm not planning on giving up on this, but I'm currently a bit overwhelmed by some other work. The dust should settle by mid-December, but I might need some input from you in the meantime to write a proper test of the new pipeline: how do you advise that I test it now that the pyannote-audio CLI endpoint is "deprecated"?

Tasks are currently tested by simply checking that training is not broken (i.e. does not raise any exception).
This is done by running the notebooks available in the /notebook/ directory. You could add your task to the example.ipynb notebook, for example.

As far as pipelines are concerned, they are simply not tested for now 👎

@hadware
Contributor Author

hadware commented Jan 7, 2022

Sorry for the "supremely concise" commit history, I tried to rebase on your develop branch but really bad things happened. Git is still an unforgiving ~~master~~ main for me, it seems.

@hadware
Contributor Author

hadware commented Jan 11, 2022

Small question needed for model testing: what would you advise that I use for checkpointing? PyTorch Lightning's way? I can't really figure out how you're "expecting" it to be done in pyannote's current iteration.

@hadware
Contributor Author

hadware commented Jan 11, 2022

Never mind, I figured I should use the pyannote-audio-train CLI endpoint.

@hadware
Contributor Author

hadware commented Jan 12, 2022

Sorry for the dumb-ish question, but I've never used Hydra on my own, so I'm not very familiar with it. Here's the problem:

  • from what I've understood, since I've added a VoiceTypeClassification.yaml "subconfig" to pyannote, I'm mostly good to go
  • from what I've understood, if users want to "override" some of these values, they can just do so using the CLI
  • however, some parameters are mandatory for this task, cannot have default values, and shouldn't be defined in the CLI: classes, class unions and class intersections
  • in my opinion, these should be defined in a user-specified YAML file given as input when running the pipeline via the CLI, and that YAML config should "overload" the one already specified for the VoiceTypeClassification task

So here's my question:

  • what's the structure of such a YAML config file?
  • how do you feed it to Hydra's CLI?
    (Sorry, that's actually two questions, but you get the point.)
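
The "overloading" described in the list above can be pictured as a recursive merge of a user-supplied mapping over the task's defaults. The sketch below is only an illustration of that intended behaviour in plain Python; it is not Hydra's API, and every key (`task`, `duration`, `batch_size`, `classes`, `unions`, `intersections`) is a hypothetical placeholder rather than the actual content of VoiceTypeClassification.yaml.

```python
import copy


def overlay(defaults, overrides):
    """Return a copy of defaults with overrides merged in recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = overlay(merged[key], value)
        else:
            # Scalars, lists, or new keys: the user value wins.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults shipped with the task; mandatory values are unset.
task_defaults = {
    "task": {
        "duration": 2.0,
        "batch_size": 32,
        "classes": None,
        "unions": {},
        "intersections": {},
    }
}

# Hypothetical content of the user-specified YAML file, once parsed.
user_config = {
    "task": {
        "classes": ["child", "adult"],
        "unions": {"speech": ["child", "adult"]},
    }
}

config = overlay(task_defaults, user_config)
```

Here the user file only has to state the mandatory, dataset-specific values, while everything else falls back to the defaults.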

@hbredin
Member

hbredin commented Jan 12, 2022

Hmmm. I am not sure why you need to use the CLI for testing the model.
This should be enough:

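from pytorch_lightning import Trainer
from pyannote.audio.models.segmentation import PyanNet
# VoiceTypeClassification is the task added by this PR; fill in its arguments
task = VoiceTypeClassification(...)
model = PyanNet(task=task)
trainer = Trainer(max_epochs=1)
trainer.fit(model)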

Also, what do you mean by "model testing"?

@hadware
Contributor Author

hadware commented Jan 13, 2022

Sorry, by testing I meant "using the new VTC task + pipeline on our data to see if it reproduces our well-known results, or even beats them". This means training, validation and testing. I mostly need help with the training part (which I intend to do using the CLI, cf. my previous post).

@hbredin
Member

hbredin commented Jan 13, 2022

Don't bother using the CLI -- try something like this:

from pyannote.database import get_protocol
# Replace with the name of your own pyannote.database protocol
dataset = get_protocol('YourDataset.SpeakerDiarization.YourProtocol')

from pyannote.audio.tasks import VoiceActivityDetection
vad = VoiceActivityDetection(dataset)

from pyannote.audio.models.segmentation import PyanNet
model = PyanNet(task=vad, sincnet={"stride": 10})

from pytorch_lightning.callbacks import EarlyStopping
from pytorch_lightning.callbacks.model_checkpoint import ModelCheckpoint
from pytorch_lightning.loggers import TensorBoardLogger

# The task tells us which validation metric to monitor and in which direction
value_to_monitor, min_or_max = vad.val_monitor

# Keep the 5 best checkpoints (plus the last one) in the current directory
model_checkpoint = ModelCheckpoint(
    monitor=value_to_monitor,
    mode=min_or_max,
    save_top_k=5,
    every_n_epochs=1,
    save_last=True,
    dirpath=".",
    filename=f"{{epoch}}-{{{value_to_monitor}:.6f}}",
    verbose=True)

# Stop training after 10 validation checks without improvement
early_stopping = EarlyStopping(
    monitor=value_to_monitor,
    mode=min_or_max,
    min_delta=0.0,
    patience=10,
    strict=True,
    verbose=False)

logger = TensorBoardLogger(".", name="", version="", log_graph=False)

from pytorch_lightning import Trainer
# Note: newer pytorch-lightning releases use accelerator="gpu", devices=1 instead of gpus=1
trainer = Trainer(gpus=1, callbacks=[model_checkpoint, early_stopping], logger=logger)
trainer.fit(model)
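
The doubled and tripled braces in the `filename` argument above are easy to misread. The short sketch below shows what the f-string evaluates to; `val_loss` is a hypothetical metric name standing in for whatever `vad.val_monitor` returns, and the final `str.format` call is used only to show which placeholders survive, since the checkpoint callback applies its own substitution.

```python
# Hypothetical metric name, standing in for the value returned by val_monitor.
value_to_monitor = "val_loss"

# "{{" and "}}" are escaped braces; "{value_to_monitor}" is interpolated now.
template = f"{{epoch}}-{{{value_to_monitor}:.6f}}"

# The resulting template still contains two placeholders to be filled later.
example = template.format(epoch=3, val_loss=0.123456789)

print(template)  # {epoch}-{val_loss:.6f}
print(example)   # 3-0.123457
```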

@hadware
Contributor Author

hadware commented Jan 29, 2022

Thanks a lot, this was very helpful!

I have some good news: the whole pipeline seems to be working and giving nice results. Here's a small table that I'm going to update as we run more tests and possibly fix some things:

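| Pyan. Vers. | Model | Dataset | Data Augment. | Best Epoch | Tuning Iterations | Tuning Metric | Fscore | IER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| V1 | Pyannet | Clinical Itws | Noise (MUSAN) | 100 | - | Fscore | 86.6 | 19.6 |
| V2 | Pyannet | Clinical Itws | None | 38 | 50 | IER | 86.80 | 20.69 |
| V2 | Pyannet | Clinical Itws | None | 38 | 50 | Fscore | 86.49 | 21.47 |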

Note: on V1, I used the cyclic learning rate scheduler, as per @MarvinLvn's worthy advice. On V2, nothing in the training part of the experiment has been grid-searched or otherwise tuned; I basically used what you've given and didn't question anything. In my opinion, these results are promising, since I haven't yet tried adding data augmentation, tweaking the learning rate, etc.

@hadware hadware mentioned this pull request Mar 8, 2022
3 tasks
@stale

stale bot commented Apr 4, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Apr 4, 2022
@stale stale bot closed this Apr 11, 2022
2 participants