Trying the diarization pipeline on random .wav files #114

Closed
saisumit opened this issue Jul 24, 2018 · 26 comments

@saisumit

saisumit commented Jul 24, 2018

Hey, following the detailed tutorials, I went through them and trained all the models required for the pipeline. The pipeline works on the AMI dataset, but when I try to reproduce the results on other .wav files (sampled at 16k, mono, 256bps), it fails to diarize the audio.
Here is a brief summary of what I actually did:

  1. Took a random meeting audio file, sampled at 16k, mono and 256bps.
  2. Renamed it to ES2003a and replaced the actual ES2003a with it (thought of it as a workaround to avoid creating another database).
  3. Ran all the pipelines (SAD, SCD, embedding, diarization).

Output :

  1. Speech activity detection works perfectly and is able to classify the speech regions.
  2. Speaker diarization doesn't work; everything is classified as speaker 0.

Can you please tell me whether the pipeline gives wrong diarization outputs because I replaced the actual file, and what a better way to test the pipeline on arbitrary audio would be?

@bml1g12
Contributor

bml1g12 commented Jul 25, 2018

I'm not the developer, but if you follow this method I would think you need to regenerate the precomputed MFCCs at each step.

That being said, I suspect there is a more elegant way of using the pipeline on a new .wav without making a new database, one which would instead use the underlying Python API; I've done this for embeddings but haven't used the pipeline yet.

@saisumit
Author

saisumit commented Jul 25, 2018

Actually, I renamed the .wav to ES2003a, which is the first file for which the MFCCs, raw SAD values and SCD values are generated, so you can simply break the loop after that and still get the required files. It would be great if you could tell me how to do it with the complete pipeline. That being said, am I actually doing anything wrong? The diarization results are simply useless compared to the LIUM / AALTO diarization libraries, which give much more robust results.

@hedonistrh

Hi, I have a similar problem with speaker change detection: it always says there is no change for the whole file (see #111).

I suggest you first check your speaker change detection training. Maybe you have the same kind of problem as me.

@saisumit
Author

As I said, it works perfectly on the AMI dataset, so I don't think the problem is there.
These are the two files I tried this on:
https://drive.google.com/open?id=15Stt_JjWT7rzypHP5v9FfFTU1NmMHcew
https://drive.google.com/open?id=1IP8v5_VMiQQnk426R6p-fZajyhER04Hj

@hedonistrh

hedonistrh commented Jul 25, 2018

Hi,
That is quite interesting. How do you check whether it works perfectly or not? Can you share pyannote.metrics results for the test files, using a script like this one?


# Setup (the precomputed path is a placeholder; point it at your own SCD scores)
from pyannote.database import get_protocol, get_annotated
from pyannote.audio.features import Precomputed
from pyannote.audio.signal import Peak
from pyannote.metrics.diarization import DiarizationPurityCoverageFMeasure

protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
precomputed = Precomputed('/path/to/scd')
peak = Peak(alpha=0.5, min_duration=1.0, log_scale=True)
metric = DiarizationPurityCoverageFMeasure()

# Loop on test files
for test_file in protocol.test():

    # load reference annotation and annotated (evaluated) regions
    reference = test_file['annotation']
    uem = get_annotated(test_file)

    # load precomputed change scores as pyannote.core.SlidingWindowFeature
    scd_scores = precomputed(test_file)

    # binarize scores to obtain the speaker change hypothesis as pyannote.core.Timeline
    hypothesis = peak.apply(scd_scores, dimension=1)

    # accumulate speaker change detection evaluation
    metric(reference, hypothesis.to_annotation(), uem=uem)

    # running values over the files processed so far
    purity, coverage, fmeasure = metric.compute_metrics()
    print(f'Purity = {100*purity:.1f}% / Coverage = {100*coverage:.1f}%')

# aggregate values over the whole test set
purity, coverage, fmeasure = metric.compute_metrics()
print(f'Purity = {100*purity:.1f}% / Coverage = {100*coverage:.1f}%')

I ask this because you said:

Speaker diarization doesn't work; everything is classified as speaker 0.

If you get a 100% coverage result from the script, that could be the reason for your problem.

@saisumit
Author

saisumit commented Jul 27, 2018

Hey, I tried out your suggestion; it seems I am getting 1% coverage. Any idea what can be done?

from pyannote.database import get_protocol
protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
from pyannote.audio.features import Precomputed
precomputed = Precomputed('/media/DataDriveA/Datasets/sumit/scd')
from pyannote.audio.signal import Peak
peak = Peak(alpha=0.5, min_duration=1.0, log_scale=True)
from pyannote.metrics.diarization import DiarizationPurityCoverageFMeasure
metric = DiarizationPurityCoverageFMeasure()
from pyannote.database import get_annotated
for test_file in protocol.test():
... reference = test_file['annotation']
... uem = get_annotated(test_file)
... scd_scores = precomputed(test_file)
... hypothesis = peak.apply(scd_scores, dimension=1)
... metric(reference, hypothesis.to_annotation(), uem=uem)
...
0.019149014550107722
0.014072230507041606
0.03635586201714968
0.018868535259038234
0.016567144528221323
0.018798456401816547
0.03245998534873909
0.01930586206686937
0.018597586570416588
0.013406219189916744
0.04511515526311587
0.02177338540954314
0.025974370018109923
0.022023084858175945
0.02715933469271844
0.0197217875965152
0.018951645548169263
0.014496894798537002
0.023111019879448098
0.014170020247782397
0.013125944790148964
0.013026454562392011
purity, coverage, fmeasure = metric.compute_metrics()
print(f'Purity = {100*purity:.1f}% / Coverage = {100*coverage:.1f}%')
Purity = 78.4% / Coverage = 1.0%

@hedonistrh

Thanks for the reply.

Could you take a look at your hypothesis.to_annotation() result? It will give us some idea.
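
For instance, something like this quick sketch (assuming hypothesis is the pyannote.core.Timeline returned by peak.apply, as in the scripts above):

annotation = hypothesis.to_annotation()
print(annotation)                          # readable listing of hypothesized segments and labels
for segment in annotation.get_timeline():  # or iterate over the segments yourself
    print(segment.start, segment.end)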

It is quite interesting. We are trying roughly the same thing, yet we get different results. I think the only difference comes from here: I use the weights from the 5th epoch.

!pyannote-change-detection apply tutorials/change-detection/train/AMI.SpeakerDiarization.MixHeadset.train/weights/0005.pt AMI.SpeakerDiarization.MixHeadset raw_scores

@yinruiqing
Contributor

yinruiqing commented Jul 27, 2018

In fact, if you choose different thresholds in Peak, you'll get different results. Usually the threshold is chosen by validation. If you don't want to run validation, you can try different thresholds yourself and plot the coverage and purity curves.
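
For example, something along these lines (a rough, untested sketch; the precomputed path and the alpha grid are placeholders, and it reuses the same objects as the scripts above):

from pyannote.database import get_protocol, get_annotated
from pyannote.audio.features import Precomputed
from pyannote.audio.signal import Peak
from pyannote.metrics.diarization import DiarizationPurityCoverageFMeasure

protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
precomputed = Precomputed('/path/to/scd')  # precomputed SCD scores

# sweep the Peak threshold and report aggregate purity/coverage for each value
for alpha in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]:
    peak = Peak(alpha=alpha, min_duration=1.0, log_scale=True)
    metric = DiarizationPurityCoverageFMeasure()
    for test_file in protocol.test():
        reference = test_file['annotation']
        uem = get_annotated(test_file)
        scd_scores = precomputed(test_file)
        hypothesis = peak.apply(scd_scores, dimension=1)
        metric(reference, hypothesis.to_annotation(), uem=uem)
    purity, coverage, fmeasure = metric.compute_metrics()
    print(f'alpha={alpha:.2f}: Purity = {100*purity:.1f}% / Coverage = {100*coverage:.1f}%')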

@hedonistrh

hedonistrh commented Jul 27, 2018

Thanks for the interest.

When I try this, the output shows that there are changes everywhere:

peak = Peak(alpha=0.1, min_duration=1.0, log_scale=True)

When I try this, it gives 100% coverage:

peak = Peak(alpha=0.13, min_duration=1.0, log_scale=True)

@saisumit
Author

saisumit commented Jul 27, 2018

Hey @hbredin @yinruiqing @hedonistrh, as I understand it, these hyperparameters are chosen at the end by the final pipeline module that we run. That being said, am I correct that "the SCD training module is not working"? Even if I use the trained model from the 500th epoch, it gives strange coverage results with the default parameters provided by the author:
peak = Peak(alpha=0.5, min_duration=1.0, log_scale=True)
I am assuming that both purity and coverage should lie in the [70, 90] range, going by the results described in this paper, though it is for a slightly different dataset: https://pdfs.semanticscholar.org/edff/b62b32ffcc2b5cc846e26375cb300fac9ecc.pdf

@hedonistrh

Yes, it is for the ETAPE dataset; however, I think the results we get are not in line with the paper.

Also, you can visualize your outputs; that can give some insight.

@bml1g12
Contributor

bml1g12 commented Jul 30, 2018

Here is the validation, computed every 100 iterations:

[screenshot]

And here are the results using the tutorial settings after the 1000th epoch, with min_duration=1.0, on the first test file in AMI:

[screenshot]

A very large minimum duration is required to get reasonable coverage. The results do seem quite different from the ETAPE results in the paper.

@hedonistrh

@bml1g12 Thanks for sharing these results. Yes, according to the paper both metrics should be around 90%. We use a different dataset, but I still think the results are not good.

@hbredin
Member

hbredin commented Jul 30, 2018

Taking a (short) break from my (long) summer break to comment on this issue.

You are actually comparing apples and oranges.

The original paper reports SegmentationPurity and SegmentationCoverage while the above script reports DiarizationPurity and DiarizationCoverage.

You should use SegmentationPurity and SegmentationCoverage to reproduce results in the paper.

More info about the different metrics can be found in the pyannote.metrics paper.
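
In practice the evaluation loop above only needs the metrics swapped out, roughly like this (an untested sketch; the precomputed path is a placeholder, and uem handling is omitted for brevity):

from pyannote.database import get_protocol
from pyannote.audio.features import Precomputed
from pyannote.audio.signal import Peak
from pyannote.metrics.segmentation import SegmentationPurity, SegmentationCoverage

protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
precomputed = Precomputed('/path/to/scd')  # precomputed SCD scores
peak = Peak(alpha=0.5, min_duration=1.0, log_scale=True)

purity, coverage = SegmentationPurity(), SegmentationCoverage()
for test_file in protocol.test():
    reference = test_file['annotation']
    scd_scores = precomputed(test_file)
    hypothesis = peak.apply(scd_scores, dimension=1).to_annotation()
    purity(reference, hypothesis)    # accumulate per-file components
    coverage(reference, hypothesis)

# abs(metric) returns the value aggregated over all processed files
print(f'Purity = {100 * abs(purity):.1f}% / Coverage = {100 * abs(coverage):.1f}%')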

@bml1g12
Contributor

bml1g12 commented Jul 31, 2018

Indeed, using SegmentationPurity and SegmentationCoverage I obtain:
[screenshot]

@hedonistrh

@hbredin Thanks for the comment.

@bml1g12 Could you share the hypothesis.to_annotation() result? Also, thanks for sharing these results.

@bml1g12
Contributor

bml1g12 commented Jul 31, 2018

Using alpha=0.2, min_duration=1.0 on the first file EN2002b.Mix-Headset, zoomed in on the first 200 seconds for clarity with notebook.crop = Segment(0, 200):

[screenshot]

@hedonistrh

Thanks for the reply.

I will try to train it again because, as I wrote, I always get 100% coverage.

@bml1g12
Contributor

bml1g12 commented Jul 31, 2018

Maybe you are using a bad epoch. How does the validation coverage for the epoch you are using compare, i.e. in the TensorBoard file?

@bml1g12
Contributor

bml1g12 commented Jul 31, 2018

Ah, I see you're using the 5th epoch, whereas I'm using the 1000th. I suspect that is the issue.

@hedonistrh

hedonistrh commented Jul 31, 2018

I agree with you. My computing resources are limited, so I have only used a few epochs. When I trained for 100 epochs, the loss stopped decreasing after the 10th epoch.

Edit: I have tried the weights from the 50th epoch with alpha set to 0.25. Now the results make sense.

[screenshot: JupyterLab output]

Thanks for the help. 🙏

@hedonistrh

@bml1g12 Hello, could you share the weights file? I cannot train up to the 1000th epoch. :)

@saisumit
Author

saisumit commented Aug 1, 2018

Here you go: https://drive.google.com/open?id=10kLHAOBcsvOUlnC_glYFjpuHV25QpmD7

@hedonistrh

@saisumit Thanks! 🙏 I have tried it; however, I got this error:
CUDA driver version is insufficient for CUDA runtime version

I have no access to an Nvidia GPU and I am using Ubuntu 16.04. The error probably occurs because of this.

@bml1g12
Contributor

bml1g12 commented Aug 2, 2018

I'm afraid mine was also run on a CUDA GPU, so I can't help you there either.

@hbredin
Member

hbredin commented Sep 3, 2018

Closing as it seems that this issue has diverged from the original one.

@hbredin hbredin closed this as completed Sep 3, 2018