Merge branch 'release/3.1.1'

pyannote · Dec 1, 2023 · 6a972c0
2 parents f45da71 + c657362
commit 6a972c0

Showing 7 changed files with 1,186 additions and 3,865 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,6 +1,14 @@
 # Changelog
 
-## `develop` branch
+## Version 3.1.1 (2023-12-01)
+
+### TL;DR
+
+Providing `num_speakers` to [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) now [works as expected](https://github.com/pyannote/pyannote-audio/issues/1567).
+
+### Fixes
+
+- fix(pipeline): fix support for setting `num_speakers` in [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) pipeline
 
 ## Version 3.1.0 (2023-11-16)
 

diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
@@ -0,0 +1,128 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+We as members, contributors, and leaders pledge to make participation in our
+community a harassment-free experience for everyone, regardless of age, body
+size, visible or invisible disability, ethnicity, sex characteristics, gender
+identity and expression, level of experience, education, socio-economic status,
+nationality, personal appearance, race, religion, or sexual identity
+and orientation.
+
+We pledge to act and interact in ways that contribute to an open, welcoming,
+diverse, inclusive, and healthy community.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment for our
+community include:
+
+* Demonstrating empathy and kindness toward other people
+* Being respectful of differing opinions, viewpoints, and experiences
+* Giving and gracefully accepting constructive feedback
+* Accepting responsibility and apologizing to those affected by our mistakes,
+  and learning from the experience
+* Focusing on what is best not just for us as individuals, but for the
+  overall community
+
+Examples of unacceptable behavior include:
+
+* The use of sexualized language or imagery, and sexual attention or
+  advances of any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or email
+  address, without their explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Enforcement Responsibilities
+
+Community leaders are responsible for clarifying and enforcing our standards of
+acceptable behavior and will take appropriate and fair corrective action in
+response to any behavior that they deem inappropriate, threatening, offensive,
+or harmful.
+
+Community leaders have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that are
+not aligned to this Code of Conduct, and will communicate reasons for moderation
+decisions when appropriate.
+
+## Scope
+
+This Code of Conduct applies within all community spaces, and also applies when
+an individual is officially representing the community in public spaces.
+Examples of representing our community include using an official e-mail address,
+posting via an official social media account, or acting as an appointed
+representative at an online or offline event.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the community leaders responsible for enforcement at
+herve.bredin@irit.fr.
+All complaints will be reviewed and investigated promptly and fairly.
+
+All community leaders are obligated to respect the privacy and security of the
+reporter of any incident.
+
+## Enforcement Guidelines
+
+Community leaders will follow these Community Impact Guidelines in determining
+the consequences for any action they deem in violation of this Code of Conduct:
+
+### 1. Correction
+
+**Community Impact**: Use of inappropriate language or other behavior deemed
+unprofessional or unwelcome in the community.
+
+**Consequence**: A private, written warning from community leaders, providing
+clarity around the nature of the violation and an explanation of why the
+behavior was inappropriate. A public apology may be requested.
+
+### 2. Warning
+
+**Community Impact**: A violation through a single incident or series
+of actions.
+
+**Consequence**: A warning with consequences for continued behavior. No
+interaction with the people involved, including unsolicited interaction with
+those enforcing the Code of Conduct, for a specified period of time. This
+includes avoiding interactions in community spaces as well as external channels
+like social media. Violating these terms may lead to a temporary or
+permanent ban.
+
+### 3. Temporary Ban
+
+**Community Impact**: A serious violation of community standards, including
+sustained inappropriate behavior.
+
+**Consequence**: A temporary ban from any sort of interaction or public
+communication with the community for a specified period of time. No public or
+private interaction with the people involved, including unsolicited interaction
+with those enforcing the Code of Conduct, is allowed during this period.
+Violating these terms may lead to a permanent ban.
+
+### 4. Permanent Ban
+
+**Community Impact**: Demonstrating a pattern of violation of community
+standards, including sustained inappropriate behavior,  harassment of an
+individual, or aggression toward or disparagement of classes of individuals.
+
+**Consequence**: A permanent ban from any sort of public interaction within
+the community.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+version 2.0, available at
+https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
+
+Community Impact Guidelines were inspired by [Mozilla's code of conduct
+enforcement ladder](https://github.com/mozilla/diversity).
+
+[homepage]: https://www.contributor-covenant.org
+
+For answers to common questions about this code of conduct, see the FAQ at
+https://www.contributor-covenant.org/faq. Translations are available at
+https://www.contributor-covenant.org/translations.
diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-Using `pyannote.audio` open-source toolkit in production?  
+Using `pyannote.audio` open-source toolkit in production?
 Make the most of it thanks to our [consulting services](https://herve.niderb.fr/consulting.html).
 
 # `pyannote.audio` speaker diarization toolkit
@@ -9,19 +9,17 @@ Make the most of it thanks to our [consulting services](https://herve.niderb.fr/
  <a href="https://www.youtube.com/watch?v=37R_R82lfwA"><img src="https://img.youtube.com/vi/37R_R82lfwA/0.jpg"></a>
 </p>
 
-
 ## TL;DR
 
-1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `3.0` with `pip install pyannote.audio`
+1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) with `pip install pyannote.audio`
 2. Accept [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
-3. Accept [`pyannote/speaker-diarization-3.0`](https://hf.co/pyannote/speaker-diarization-3.0) user conditions
+3. Accept [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions
 4. Create access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).
 
-
 ```python
 from pyannote.audio import Pipeline
 pipeline = Pipeline.from_pretrained(
-    "pyannote/speaker-diarization-3.0",
+    "pyannote/speaker-diarization-3.1",
     use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")
 
 # send pipeline to GPU (when available)
@@ -47,50 +45,53 @@ for turn, _, speaker in diarization.itertracks(yield_label=True):
 - :snake: Python-first API
 - :zap: multi-GPU training with [pytorch-lightning](https://pytorchlightning.ai/)
 
-
 ## Documentation
 
 - [Changelog](CHANGELOG.md)
 - [Frequently asked questions](FAQ.md)
 - Models
-    - Available tasks explained
-    - [Applying a pretrained model](tutorials/applying_a_model.ipynb)
-    - [Training, fine-tuning, and transfer learning](tutorials/training_a_model.ipynb)
+  - Available tasks explained
+  - [Applying a pretrained model](tutorials/applying_a_model.ipynb)
+  - [Training, fine-tuning, and transfer learning](tutorials/training_a_model.ipynb)
 - Pipelines
-    - Available pipelines explained
-    - [Applying a pretrained pipeline](tutorials/applying_a_pipeline.ipynb)
-    - [Adapting a pretrained pipeline to your own data](tutorials/adapting_pretrained_pipeline.ipynb)
-    - [Training a pipeline](tutorials/voice_activity_detection.ipynb)
+  - Available pipelines explained
+  - [Applying a pretrained pipeline](tutorials/applying_a_pipeline.ipynb)
+  - [Adapting a pretrained pipeline to your own data](tutorials/adapting_pretrained_pipeline.ipynb)
+  - [Training a pipeline](tutorials/voice_activity_detection.ipynb)
 - Contributing
-    - [Adding a new model](tutorials/add_your_own_model.ipynb)
-    - [Adding a new task](tutorials/add_your_own_task.ipynb)
-    - Adding a new pipeline
-    - Sharing pretrained models and pipelines
+  - [Adding a new model](tutorials/add_your_own_model.ipynb)
+  - [Adding a new task](tutorials/add_your_own_task.ipynb)
+  - Adding a new pipeline
+  - Sharing pretrained models and pipelines
 - Blog
-    - 2022-12-02 > ["How I reached 1st place at Ego4D 2022, 1st place at Albayzin 2022, and 6th place at VoxSRC 2022 speaker diarization challenges"](tutorials/adapting_pretrained_pipeline.ipynb)
-    - 2022-10-23 > ["One speaker segmentation model to rule them all"](https://herve.niderb.fr/fastpages/2022/10/23/One-speaker-segmentation-model-to-rule-them-all)
-    - 2021-08-05 > ["Streaming voice activity detection with pyannote.audio"](https://herve.niderb.fr/fastpages/2021/08/05/Streaming-voice-activity-detection-with-pyannote.html)
+  - 2022-12-02 > ["How I reached 1st place at Ego4D 2022, 1st place at Albayzin 2022, and 6th place at VoxSRC 2022 speaker diarization challenges"](tutorials/adapting_pretrained_pipeline.ipynb)
+  - 2022-10-23 > ["One speaker segmentation model to rule them all"](https://herve.niderb.fr/fastpages/2022/10/23/One-speaker-segmentation-model-to-rule-them-all)
+  - 2021-08-05 > ["Streaming voice activity detection with pyannote.audio"](https://herve.niderb.fr/fastpages/2021/08/05/Streaming-voice-activity-detection-with-pyannote.html)
 - Videos
   - [Introduction to speaker diarization](https://umotion.univ-lemans.fr/video/9513-speech-segmentation-and-speaker-diarization/) / JSALT 2023 summer school / 90 min
   - [Speaker segmentation model](https://www.youtube.com/watch?v=wDH2rvkjymY) / Interspeech 2021 / 3 min
-  - [First release of pyannote.audio](https://www.youtube.com/watch?v=37R_R82lfwA) / ICASSP 2020 /  8 min
+  - [First release of pyannote.audio](https://www.youtube.com/watch?v=37R_R82lfwA) / ICASSP 2020 / 8 min
 
 ## Benchmark
 
-Out of the box, `pyannote.audio` speaker diarization [pipeline](https://hf.co/pyannote/speaker-diarization-3.0) v3.0 is expected to be much better (and faster) than v2.x.  
+Out of the box, `pyannote.audio` speaker diarization [pipeline](https://hf.co/pyannote/speaker-diarization-3.1) v3.1 is expected to be much better (and faster) than v2.x.
 Those numbers are diarization error rates (in %):
 
-| Dataset \ Version      | v1.1 | v2.0 | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.0](https://hf.co/pyannote/speaker-diarization-3.0) |  <a href="mailto:herve-at-niderb-dot-fr?subject=Premium pyannote.audio pipeline&body=Looks like I got your attention! Drop me an email for more details. Hervé.">Premium</a>  |
-| ---------------------- | ---- | ---- | ------ | ------ | --------- |
-| AISHELL-4              | -    | 14.6 |  14.1  |  12.3  | 12.3      |
-| AliMeeting (channel 1) | -    | -    |  27.4  |  24.3  | 19.4      |
-| AMI (IHM)              | 29.7 | 18.2 |  18.9  |  19.0  | 16.7      |
-| AMI (SDM)              | -    | 29.0 |  27.1  |  22.2  | 20.1      |
-| AVA-AVD                | -    | -    |  -     |  49.1  | 42.7      |
-| DIHARD 3 (full)        | 29.2 | 21.0 |  26.9  |  21.7  | 17.0      |
-| MSDWild                | -    | -    |  -     |  24.6  | 20.4      |
-| REPERE (phase2)        | -    | 12.6 |   8.2  |   7.8  |  7.8      |
-| VoxConverse (v0.3)     | 21.5 | 12.6 |  11.2  |  11.3  |  9.5      |
+| Benchmark              | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [Premium](https://forms.gle/eKhn7H2zTa68sMMx8) |
+| ---------------------- | ------------------------------------------------------ | ------------------------------------------------------ | ---------------------------------------------- |
+| AISHELL-4              | 14.1                                                   | 12.3                                                   | 11.9                                           |
+| AliMeeting (channel 1) | 27.4                                                   | 24.5                                                   | 22.5                                           |
+| AMI (IHM)              | 18.9                                                   | 18.8                                                   | 16.6                                           |
+| AMI (SDM)              | 27.1                                                   | 22.6                                                   | 20.9                                           |
+| AVA-AVD                | 66.3                                                   | 50.0                                                   | 39.8                                           |
+| CALLHOME (part 2)      | 31.6                                                   | 28.4                                                   | 22.2                                           |
+| DIHARD 3 (full)        | 26.9                                                   | 21.4                                                   | 17.2                                           |
+| Ego4D (dev.)           | 61.5                                                   | 51.2                                                   | 43.8                                           |
+| MSDWild                | 32.8                                                   | 25.4                                                   | 19.8                                           |
+| REPERE (phase2)        | 8.2                                                    | 7.8                                                    | 7.6                                            |
+| VoxConverse (v0.3)     | 11.2                                                   | 11.2                                                   | 9.4                                            |
+
+[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %)
 
 ## Citations
 

diff --git a/doc/requirements.txt b/doc/requirements.txt
@@ -1,4 +1,4 @@
-ipython==7.16.3
+ipython==8.10.0
 recommonmark
-Sphinx==2.2.2
+Sphinx==3.0.4
 sphinx_rtd_theme==0.4.3
diff --git a/pyannote/audio/pipelines/clustering.py b/pyannote/audio/pipelines/clustering.py
@@ -97,7 +97,13 @@ def filter_embeddings(
         speaker_idx : (num_embeddings, ) array
         """
 
-        chunk_idx, speaker_idx = np.where(~np.any(np.isnan(embeddings), axis=2))
+        # whether speaker is active
+        active = np.sum(segmentations.data, axis=1) > 0
+        # whether speaker embedding extraction went fine
+        valid = ~np.any(np.isnan(embeddings), axis=2)
+
+        # indices of embeddings that are both active and valid
+        chunk_idx, speaker_idx = np.where(active * valid)
 
         # sample max_num_embeddings embeddings
         num_embeddings = len(chunk_idx)
@@ -240,6 +246,7 @@ def __call__(
         )
 
         num_embeddings, _ = train_embeddings.shape
+
         num_clusters, min_clusters, max_clusters = self.set_num_clusters(
             num_embeddings,
             num_clusters=num_clusters,
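
The `filter_embeddings` hunk in `pyannote/audio/pipelines/clustering.py` above can be illustrated with a small standalone NumPy sketch. It is not the library code: the toy arrays and their shapes (`(num_chunks, num_frames, num_speakers)` for segmentations, `(num_chunks, num_speakers, dimension)` for embeddings) are assumptions chosen to match the axes used in the diff, a plain array stands in for `segmentations.data`, and `&` is used where the diff writes `active * valid` (equivalent for boolean arrays).

```python
import numpy as np

# Toy input (assumed shapes): 2 chunks, 3 frames per chunk, 2 speakers.
# This plain array stands in for `segmentations.data` in the diff.
segmentations = np.array(
    [
        [[1, 0], [1, 0], [0, 0]],  # chunk 0: only speaker 0 is ever active
        [[0, 1], [1, 1], [0, 0]],  # chunk 1: both speakers are active
    ],
    dtype=float,
)

# One 4-dimensional embedding per (chunk, speaker) pair.
embeddings = np.ones((2, 2, 4))
embeddings[1, 0, 2] = np.nan  # chunk 1, speaker 0: extraction failed

# whether speaker is active (at least one active frame in the chunk)
active = np.sum(segmentations, axis=1) > 0
# whether speaker embedding extraction went fine (no NaN in the vector)
valid = ~np.any(np.isnan(embeddings), axis=2)

# previous behavior: keep every embedding without NaN
old_chunk_idx, old_speaker_idx = np.where(valid)
old_kept = list(zip(old_chunk_idx.tolist(), old_speaker_idx.tolist()))

# new behavior: keep embeddings that are both active and valid
chunk_idx, speaker_idx = np.where(active & valid)
kept = list(zip(chunk_idx.tolist(), speaker_idx.tolist()))

print(old_kept)  # [(0, 0), (0, 1), (1, 1)]
print(kept)      # [(0, 0), (1, 1)]
```

In this toy case the earlier filter keeps the embedding for chunk 0, speaker 1 even though that speaker is never active in the chunk, while the updated filter drops it.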