gsoc2018-redhen-saisumit

Mentors:

Mehul Bhatt ( http://www.mehulbhatt.org/ )
Jakob Suchan ( https://cosy.informatik.uni-bremen.de/staff/jakob-suchan.html )
Sri Krishna ( https://skrish13.github.io )

Organisation : Redhen Labs ( http://www.redhenlab.org/ )

Audio Analysis of Egocentric and Third-person videos using Red-Hen Labs :

This library provides the following functionality:

Speech Identification (whether or not a person is speaking)
Speaker Diarization using the Lium and Aalto diarization libraries
Scene Identification
Speech Recognition

Running the pipeline

Clone the repository
cd Audio-Analysis-RedHen-saisumit
Place the desired input file as input.mp4.
Run one of the pipelines described below.

Speech Identification

Speech identification addresses the problem of finding the regions of the media in which someone is speaking. It classifies each region into one of three categories:

Silent Regions
People-speaking Regions
Other Regions


 ./audio_runner.sh -f sound_event_detection
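
To illustrate what "regions" means here, the following is a minimal sketch of how per-frame labels could be collapsed into labelled time regions. It is not taken from this repository: the function name `merge_frames`, the label strings, and the one-second frame length are all assumptions made for the example.

```python
from itertools import groupby


def merge_frames(labels, frame_seconds=1.0):
    """Collapse runs of identical frame labels into (start, end, label) regions.

    `labels` holds one label per fixed-length frame; `frame_seconds` is the
    assumed frame length (hypothetical, not a value used by the pipeline).
    """
    regions = []
    position = 0  # index of the first frame in the current run
    for label, run in groupby(labels):
        length = len(list(run))
        start = position * frame_seconds
        end = (position + length) * frame_seconds
        regions.append((start, end, label))
        position += length
    return regions


# Hypothetical per-frame predictions for a six-second clip.
frames = ["silent", "silent", "speech", "speech", "speech", "other"]
regions = merge_frames(frames)
for start, end, label in regions:
    print(f"{start:5.1f}s - {end:5.1f}s  {label}")
```

Grouping consecutive frames keeps the output short and makes it easy to overlay on a video timeline.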

References :

Speaker Diarization

Speaker diarization is the task of determining who spoke when: the audio is split into segments and each segment is assigned to a speaker. This project supports two diarization libraries:

Aalto Diarization Tool
Lium Diarization Tool

The Aalto results are the ones displayed in the output video. The Lium results are also available in text format and can be accessed as described under RESULTS.

 ./audio_runner.sh  -f speaker_diarization
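
The text output of the Lium tool can be post-processed with a few lines of Python. The sketch below assumes the output follows the LIUM `.seg` convention (eight whitespace-separated fields per line, start and length counted in 10 ms frames, comment lines starting with `;;`). This README does not confirm the exact format written by this pipeline, so check a real output file first. `parse_seg` and the sample text are hypothetical.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "speaker start end")


def parse_seg(text, frames_per_second=100):
    """Parse LIUM-style .seg text into segments sorted by start time.

    Assumed field order: show, channel, start, length, gender, band,
    environment, speaker label.
    """
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;"):
            continue  # skip blank lines and cluster comments
        fields = line.split()
        start = int(fields[2]) / frames_per_second
        length = int(fields[3]) / frames_per_second
        segments.append(Segment(fields[7], start, start + length))
    return sorted(segments, key=lambda s: s.start)


# Made-up sample in the assumed format, not real pipeline output.
sample = """;; cluster S0
input 1 0 250 M S U S0
;; cluster S1
input 1 250 400 F S U S1
"""
segments = parse_seg(sample)
for seg in segments:
    print(f"{seg.speaker}: {seg.start:.2f}s to {seg.end:.2f}s")
```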

References :

Scene Identification

Scene identification is the task of identifying the scene in which the media was recorded. The media is classified into one of the following categories:

Bus
Car
City Center
Residential Area/Meeting Room
Home
Beach
Library
Metro Station
Office
Train
Tram
Park
Pub

 ./audio_runner.sh  -f scene_identification

References:

Speech Recognition

Speech recognition identifies what a person is saying. It uses Google Cloud Speech-to-Text for transcription. Running it executes the complete pipeline and gives you the combined results of all the previous pipelines.

References:

 ./audio_runner.sh
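
The four invocations shown in this README differ only in the `-f` argument, so they can be driven from Python if that is more convenient. This is a rough sketch: the helper names `build_command` and `run_pipeline` and the keys of `PIPELINES` are assumptions, while the `-f` values are the ones listed in the sections above.

```python
import subprocess

# Section name (hypothetical key) -> value passed to -f, as shown in this README.
PIPELINES = {
    "speech_identification": "sound_event_detection",
    "speaker_diarization": "speaker_diarization",
    "scene_identification": "scene_identification",
}


def build_command(pipeline=None):
    """Return the argument list for audio_runner.sh.

    With no pipeline name, the script is called without -f, which the README
    describes as the complete pipeline including speech recognition.
    """
    command = ["./audio_runner.sh"]
    if pipeline is not None:
        command += ["-f", PIPELINES[pipeline]]
    return command


def run_pipeline(pipeline=None, repo_dir="."):
    """Run the script from the repository directory; raises if it fails."""
    return subprocess.run(build_command(pipeline), cwd=repo_dir, check=True)


print(build_command("scene_identification"))
```

Calling `run_pipeline()` with no arguments would then correspond to the plain `./audio_runner.sh` command.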

RESULTS

There are two types of results available with this pipeline.

The final video, with the results overlaid, is available at final_output/result.mp4
The intermediate results of each individual pipeline are available in Output_Files/
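
After a run, a short script can confirm that both kinds of output are present. This is a sketch using only the two paths named above; the function name `collect_results` and the shape of the returned dictionary are assumptions, and nothing is assumed about the names of the files inside Output_Files/.

```python
from pathlib import Path


def collect_results(repo_dir="."):
    """Locate the final video and list the intermediate result files.

    Returns a dict with the video path (or None if it is missing) and the
    sorted names of whatever is found in Output_Files/.
    """
    root = Path(repo_dir)
    video = root / "final_output" / "result.mp4"
    intermediate_dir = root / "Output_Files"
    if intermediate_dir.is_dir():
        intermediate = sorted(p.name for p in intermediate_dir.iterdir())
    else:
        intermediate = []
    return {
        "video": video if video.is_file() else None,
        "intermediate": intermediate,
    }


results = collect_results(".")
print("final video:", results["video"] or "not found")
print("intermediate files:", results["intermediate"])
```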
