🎙 Enhanced Speaker Diarisation with OSD, SS, and Improved Methods 📊🗣️

Welcome to the Enhanced Speaker Diarisation project repository! This project is an evolution of the original VBx diarisation approach developed by Brno University of Technology.

In addition to incorporating Overlapped Speech Detection (OSD) and Source Separation (SS), this project explores further enhancements: the average Linkage method, the Knee Locator, and Principal Component Analysis (PCA). Not every avenue led to the desired improvements, but the repository documents these experiments and serves as a platform for future work.

🛠️ About Enhanced Methods and Techniques

Overlapped Speech Detection (OSD) 👥🔊

We integrated an Overlapped Speech Detection module into our diarisation pipeline. The OSD module, implemented in the osd_detection.py script, identifies segments of a conversation in which multiple speakers are talking simultaneously. Marking these segments gives a fuller picture of speaker interactions and tells the Source Separation step which regions it needs to process.
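The internals of osd_detection.py are not reproduced here. As a minimal sketch, assuming the detector produces one overlap score per fixed-length frame, turning those scores into marked segments comes down to thresholding and merging consecutive frames. The function name, `frame_step`, `threshold`, and the example scores are all hypothetical.

```python
from typing import List, Tuple


def frames_to_segments(
    probs: List[float], frame_step: float = 0.01, threshold: float = 0.5
) -> List[Tuple[float, float]]:
    """Merge consecutive frames scoring above `threshold` into (start, end) segments in seconds."""
    segments = []
    start = None  # index of the first frame of the segment currently open
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i  # a new overlapped region begins
        elif p <= threshold and start is not None:
            segments.append((start * frame_step, i * frame_step))
            start = None
    if start is not None:
        # the recording ended while a region was still open
        segments.append((start * frame_step, len(probs) * frame_step))
    return segments


# Hypothetical per-frame overlap scores, one frame every 0.5 s
overlap_probs = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.1, 0.6, 0.9]
overlap_segments = frames_to_segments(overlap_probs, frame_step=0.5)
```

Each resulting pair is a start and end time in seconds that a later stage could hand to source separation.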

Source Separation (SS) 🧩🎤

The second major addition is Source Separation, implemented in the source_separation.py script. It takes the overlapping speech segments and separates them into individual speaker recordings, which lays the groundwork for more precise speaker identification.

Linkage Method (Average) 🧬🧮

For hierarchical clustering, our project uses the Average Linkage method. Applied to the x-vector data, it merges the two clusters with the smallest average pairwise distance at each step, so segments with similar characteristics end up in the same cluster. This consolidation of similar segments improved the accuracy of our speaker identification process.
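To make the average linkage idea concrete, here is a naive pure-Python sketch, not the project's actual clustering code. It assumes cosine distance between x-vectors and a hypothetical `stop_distance` threshold that ends merging; the toy two-dimensional vectors stand in for real x-vectors. In practice, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would be the usual choice.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def average_linkage(vectors: List[Sequence[float]], stop_distance: float) -> List[int]:
    """Naive agglomerative clustering; returns one cluster label per input vector."""
    clusters = [[i] for i in range(len(vectors))]  # start with singletons
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs
                total = sum(
                    cosine_distance(vectors[p], vectors[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > stop_distance:
            break  # remaining clusters are too far apart to merge
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


# Toy stand-ins for x-vectors: two segments per speaker
xvectors = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
labels = average_linkage(xvectors, stop_distance=0.5)
```

The first two vectors point in nearly the same direction, as do the last two, so the sketch groups them into two clusters and stops before merging across speakers.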

Knee Locator and PCA 🔍📈

As part of our exploration, we also experimented with the Knee Locator and Principal Component Analysis (PCA). Neither technique produced the anticipated improvements in this iteration, so we document them here as directions we tried rather than as part of the final pipeline.
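For readers unfamiliar with knee detection, the sketch below shows one simple heuristic: pick the point on a curve that lies farthest from the straight line joining its endpoints. This is an illustration only. It is not the algorithm of any particular knee-locator library, and the curve values are invented; which curve the project actually fed to its Knee Locator is not stated here.

```python
from typing import Sequence


def find_knee(ys: Sequence[float]) -> int:
    """Return the index of the point farthest from the chord joining the first and last points."""
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three points to locate a knee")
    dx = n - 1           # horizontal extent of the chord (x values are 0..n-1)
    dy = ys[-1] - ys[0]  # vertical extent of the chord
    best_index, best_dist = 0, -1.0
    for i, y in enumerate(ys):
        # Perpendicular distance to the chord, up to a constant factor
        dist = abs(dy * i - dx * (y - ys[0]))
        if dist > best_dist:
            best_index, best_dist = i, dist
    return best_index


# Invented decreasing curve that flattens out after the third point
curve = [10.0, 5.0, 2.0, 1.5, 1.2, 1.0]
knee_index = find_knee(curve)
```

On this curve the heuristic selects index 2, the point where the steep drop gives way to the flat tail.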

🚀 Challenges Faced and Future Directions

Source Separation Enhancement 💡🎯

The Source Separation step needs further refinement. More advanced algorithms for separating overlapping speech segments could significantly improve the accuracy of the diarisation process.

Accurate Voice Activity Detection (VAD) 🔊🎚️

Enhancing the VAD module remains a priority. Utilizing state-of-the-art VAD algorithms or fine-tuning existing ones can contribute to the accurate identification of speech segments, a fundamental component of speaker diarisation.

Real-time Diarisation ⏱️🌐

As an exciting future prospect, the project envisions extending its capabilities beyond recorded files to real-time diarisation. This expansion could have applications in live meetings, conferences, and various interactive scenarios.

Additional Speaker Attributes 👤📊

Incorporating identification of additional speaker attributes such as gender, age, and emotional tone opens doors to rich statistical analyses and insights into conversations.

📸 Project Visuals

To provide you with a clearer understanding of our enhanced speaker diarisation project, we've included some visuals below:

Voice Activity Detection (VAD) in Action

Before any diarisation process takes place, the VAD module identifies segments of speech within an audio recording, which are crucial for subsequent analysis.

Overlapped Speech Detection (OSD) Example

In this example, the OSD module identifies overlapping segments within a conversation, helping to pinpoint instances where multiple speakers are talking simultaneously.

Source Separation (SS) Demonstration

The SS enhancement untangles overlapping speech segments, separates them into distinct speaker recordings, and outputs a new .lab file, improving the accuracy of speaker identification.

📊 Results

We're excited to present the tangible improvements achieved through our enhanced speaker diarisation approach. Below, you'll find a comparison of the new and improved RTTM files generated by our methods, showcasing the advancements over the previous project's results.

RTTM Comparison

Key Improvements:

Enhanced Speaker Boundaries: Our OSD and SS modules contributed to more accurate identification of speaker boundaries, leading to reduced errors in segment start and end times.
Reduced Overlaps: Overlapped speech segments are effectively separated, resulting in clearer and less cluttered speaker interactions.
Refined Speaker Labels: The clustering enhancements, including the Linkage method, led to more coherent and consistent speaker labels in the RTTM file.

We encourage you to explore the provided RTTM files to witness the impact of our enhancements on speaker diarisation accuracy and quality.
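When exploring the RTTM files, a small script can help summarise them. The sketch below assumes the standard RTTM layout, in which each `SPEAKER` line carries the file ID, channel, onset, duration, and speaker name in fields 2, 3, 4, 5, and 8. The function name and the example lines are invented for illustration and are not taken from the project's output.

```python
from collections import defaultdict
from typing import Dict


def speech_time_per_speaker(rttm_text: str) -> Dict[str, float]:
    """Sum the duration (in seconds) of all SPEAKER turns for each speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for line in rttm_text.splitlines():
        fields = line.split()
        # Skip blank lines and any non-SPEAKER record types
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        duration = float(fields[4])  # field 5: turn duration
        speaker = fields[7]          # field 8: speaker name
        totals[speaker] += duration
    return dict(totals)


# Invented example content, not real results
example_rttm = """\
SPEAKER ES2005a 1 0.00 2.50 <NA> <NA> spk1 <NA> <NA>
SPEAKER ES2005a 1 2.50 1.25 <NA> <NA> spk2 <NA> <NA>
SPEAKER ES2005a 1 4.00 0.50 <NA> <NA> spk1 <NA> <NA>
"""
totals = speech_time_per_speaker(example_rttm)
```

Running the same function on the old and new RTTM files gives a quick per-speaker comparison, although it is no substitute for a proper diarisation error rate.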

📜 License

This project is built upon the original VBx diarisation approach developed by Brno University of Technology. As such, it adheres to the Apache License, Version 2.0, which governs the usage of the original VBx project.

Licensed under the Apache License, Version 2.0 (the "License"). You may not use this project or its components except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, this project and its components are distributed under the License on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

📬 Contact Information

If you have any questions, suggestions, or would like to collaborate, please feel free to open an issue on this repository. We value your feedback and engagement, and we'll be more than happy to discuss any inquiries you may have.
