🎙️ Enhanced Speaker Diarisation 📒 with OSD, SS, and Advanced VAD 🗣️.

orelz890/Speaker_Diarization_Deep_learning

🎙 Enhanced Speaker Diarisation with OSD, SS, and Improved Methods 📊🗣️

Welcome to the Enhanced Speaker Diarisation project repository! This project is an evolution of the original VBx diarisation approach developed by Brno University of Technology.

In addition to incorporating Overlapped Speech Detection (OSD) and Source Separation (SS), this project explores enhancements such as the Linkage method, the Knee Locator, and Principal Component Analysis (PCA). Not every experiment improved the results, but the repository records what we tried and serves as a platform for future work.

🛠️ About Enhanced Methods and Techniques

Overlapped Speech Detection (OSD) 👥🔊

We integrated an Overlapped Speech Detection module into our diarisation pipeline. The OSD module, implemented in the osd_detection.py script, identifies the segments of a conversation in which multiple speakers are talking simultaneously. Marking these segments gives a more complete picture of how the speakers interact.
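
As an illustration of the marking step only, the sketch below turns per-frame counts of active speakers into overlap segments. It does not reproduce osd_detection.py: the function name `overlap_segments`, the `active_counts` input, and the 10 ms default frame step are all assumptions made for this example.

```python
from typing import List, Tuple


def overlap_segments(active_counts: List[int],
                     frame_step: float = 0.01) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of runs of frames with 2+ active speakers.

    `active_counts` is a hypothetical input: one integer per frame giving the
    number of speakers judged active in that frame.
    """
    segments = []
    start = None  # index of the first frame of the current overlap run, if any
    for i, count in enumerate(active_counts):
        if count >= 2 and start is None:
            start = i  # an overlap run begins at this frame
        elif count < 2 and start is not None:
            # the run ended just before frame i
            segments.append((round(start * frame_step, 3), round(i * frame_step, 3)))
            start = None
    if start is not None:
        # the overlap run reaches the end of the recording
        end = len(active_counts) * frame_step
        segments.append((round(start * frame_step, 3), round(end, 3)))
    return segments
```

For example, `overlap_segments([1, 1, 2, 2, 2, 1, 0, 2, 2], frame_step=0.5)` marks two overlap regions, one from 1.0 s to 2.5 s and one from 3.5 s to 4.5 s.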

Source Separation (SS) 🧩🎤

Another pivotal addition was the implementation of Source Separation using the source_separation.py script. This enhancement tackled the complexity of overlapping speech segments. By disentangling these segments and segregating them into individual speaker recordings, we laid the groundwork for more precise speaker identification.

Linkage Method (Average) 🧬🧮

Our project uses the Average Linkage method for hierarchical clustering. Applied to the x-vector data, it merges segments with similar characteristics into the same cluster, which improved the accuracy of our speaker identification process.
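
To make the idea concrete, here is a minimal pure-Python sketch of average-linkage agglomerative clustering. It is not the project's implementation: the use of cosine distance, the stopping `threshold`, and the function names are assumptions for illustration, and the input vectors are assumed to be non-zero. In practice a library routine such as `scipy.cluster.hierarchy.linkage` with `method="average"` would normally be used instead.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def average_linkage(vectors: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Greedily merge the two closest clusters until the closest pair is
    farther apart than `threshold`. Returns one cluster label per vector."""
    clusters = [[i] for i in range(len(vectors))]  # start with singletons
    dist = [[cosine_distance(u, v) for v in vectors] for u in vectors]

    def avg_dist(c1: List[int], c2: List[int]) -> float:
        # average linkage: mean of all pairwise distances between the clusters
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None  # (distance, index of first cluster, index of second cluster)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = avg_dist(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # remaining clusters are too dissimilar to merge
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

With four toy two-dimensional "embeddings", `average_linkage([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], threshold=0.5)` groups the first two vectors into one cluster and the last two into another.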

Knee Locator and PCA 🔍📈

As part of our exploration, we also experimented with the Knee Locator and Principal Component Analysis (PCA). Neither technique led to the anticipated improvements in this iteration.
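
This README does not describe how the Knee Locator was applied, so the following is only a generic sketch of what locating a knee means: it picks the point of a curve that lies farthest from the straight line joining its first and last points. This is a simple chord-distance heuristic, not the Kneedle algorithm, and the function name `find_knee` is hypothetical. Because the distance depends on the scale of both axes, curves are usually normalised first.

```python
from typing import Sequence


def find_knee(values: Sequence[float]) -> int:
    """Return the index of the point farthest from the chord that joins the
    first and last points of the curve (x is taken to be the index)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three points to locate a knee")
    x0, y0 = 0.0, float(values[0])
    x1, y1 = float(n - 1), float(values[-1])
    dx, dy = x1 - x0, y1 - y0
    best_index, best_gap = 0, -1.0
    for i, y in enumerate(values):
        # Perpendicular distance from (i, y) to the chord, without the constant
        # normalising factor (it does not change which point is farthest).
        gap = abs(dy * (i - x0) - dx * (y - y0))
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index
```

For the sharply decaying curve `[10, 4, 2, 1.5, 1.2, 1.0]`, `find_knee` returns index 2, the point after which the curve flattens out.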

🚀 Challenges Faced and Future Directions

Source Separation Enhancement 💡🎯

The project acknowledges the need for further refinement in the Source Separation step. Exploring advanced algorithms and techniques for separating overlapping speech segments could significantly enhance the accuracy of the diarisation process.

Accurate Voice Activity Detection (VAD) 🔊🎚️

Enhancing the VAD module remains a priority. Utilizing state-of-the-art VAD algorithms or fine-tuning existing ones can contribute to the accurate identification of speech segments, a fundamental component of speaker diarisation.

Real-time Diarisation ⏱️

As an exciting future prospect, the project envisions extending its capabilities beyond recorded files to real-time diarisation. This expansion could have applications in live meetings, conferences, and various interactive scenarios.

Additional Speaker Attributes 👤📊

Incorporating identification of additional speaker attributes such as gender, age, and emotional tone opens doors to rich statistical analyses and insights into conversations.

📸 Project Visuals

To provide you with a clearer understanding of our enhanced speaker diarisation project, we've included some visuals below:

Voice Activity Detection (VAD) in Action

[Image: VAD Example]

Before any diarisation process takes place, the VAD module identifies segments of speech within an audio recording, which are crucial for subsequent analysis.
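
To show the kind of output a VAD step produces, here is a deliberately simple energy-threshold sketch. It is not the VAD used in this project: the function name `energy_vad`, the 20 ms frame length, and the threshold value are assumptions chosen for illustration, and real VAD modules are considerably more robust.

```python
from typing import List, Sequence, Tuple


def energy_vad(samples: Sequence[float], sample_rate: int,
               frame_ms: int = 20, threshold: float = 0.01) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of runs of frames whose mean
    energy exceeds `threshold`. Trailing samples that do not fill a whole
    frame are ignored."""
    frame_len = sample_rate * frame_ms // 1000  # samples per frame
    n_frames = len(samples) // frame_len
    segments = []
    start = None  # start time of the current speech run, if any
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len  # mean squared amplitude
        is_speech = energy > threshold
        t = f * frame_len / sample_rate  # start time of this frame
        if is_speech and start is None:
            start = t
        elif not is_speech and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        # speech continues up to the last complete frame
        segments.append((start, n_frames * frame_len / sample_rate))
    return segments
```

On a toy signal sampled at 1000 Hz with 40 silent samples, 60 samples of amplitude 0.5, and 20 silent samples, the function reports a single speech segment from 0.04 s to 0.1 s.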

Overlapped Speech Detection (OSD) Example

[Image: OSD Example]

In this example, the OSD module identifies overlapping segments within a conversation, helping to pinpoint instances where multiple speakers are talking simultaneously.

Source Separation (SS) Demonstration

The SS enhancement untangles overlapping speech segments, separates them into distinct speaker recordings, and outputs a new .lab file, improving the accuracy of speaker identification.

📊 Results

We're excited to present the improvements achieved through our enhanced speaker diarisation approach. Below, you'll find a comparison between the RTTM files generated by our methods and the results of the previous project.

RTTM Comparison

[Image: Comparison]

Key Improvements:

  • Enhanced Speaker Boundaries: Our OSD and SS modules contributed to more accurate identification of speaker boundaries, leading to reduced errors in segment start and end times.
  • Reduced Overlaps: Overlapped speech segments are effectively separated, resulting in clearer and less cluttered speaker interactions.
  • Refined Speaker Labels: The clustering enhancements, including the Linkage method, led to more coherent and consistent speaker labels in the RTTM file.

We encourage you to explore the provided RTTM files to witness the impact of our enhancements on speaker diarisation accuracy and quality.
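
If you want to inspect the RTTM files programmatically, a small parser is enough to get started. The sketch below assumes the standard ten-field RTTM layout, in which each SPEAKER line carries the file ID, turn onset, turn duration, and speaker name in fields 2, 4, 5, and 8. The function names, the file ID `meeting1`, and the speaker names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Turn = Tuple[str, float, float, str]  # (file_id, onset, duration, speaker)


def parse_rttm(text: str) -> List[Turn]:
    """Extract one (file_id, onset, duration, speaker) tuple per SPEAKER line.
    Lines of other types, or with too few fields, are skipped."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        turns.append((fields[1], float(fields[3]), float(fields[4]), fields[7]))
    return turns


def speech_time_per_speaker(turns: List[Turn]) -> Dict[str, float]:
    """Sum the turn durations, in seconds, for each speaker label."""
    totals = defaultdict(float)
    for _file_id, _onset, duration, speaker in turns:
        totals[speaker] += duration
    return dict(totals)


EXAMPLE_RTTM = """\
SPEAKER meeting1 1 0.00 2.50 <NA> <NA> spk1 <NA> <NA>
SPEAKER meeting1 1 2.00 1.50 <NA> <NA> spk2 <NA> <NA>
SPEAKER meeting1 1 4.00 1.00 <NA> <NA> spk1 <NA> <NA>
"""

if __name__ == "__main__":
    print(speech_time_per_speaker(parse_rttm(EXAMPLE_RTTM)))
```

Running the same summary on two RTTM files for the same recording gives a quick first look at how the speaker labels and total speaking times differ between them.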

📜 License

This project is built upon the original VBx diarisation approach developed by Brno University of Technology. As such, it adheres to the Apache License, Version 2.0, which governs the usage of the original VBx project.

Licensed under the Apache License, Version 2.0 (the "License"). You may not use this project or its components except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, this project and its components are distributed under the License on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

📬 Contact Information

If you have any questions, suggestions, or would like to collaborate, please feel free to open an issue on this repository. We value your feedback and engagement, and we'll be more than happy to discuss any inquiries you may have.
