SpeechTextValidation

This project uses syllable counting and segment counting to compare a text prompt with a snippet of speech.

Requirements

Installable via pip:

Numpy
Pandas
Scipy
Matplotlib
Parselmouth (pip package name: praat-parselmouth)

Requires access:

Textmt
- G2PTool
- Tokenise

Scripts

These are the main scripts in this project. Any other scripts in this repository are for testing purposes. mfcc_extract.py must be run first for the other scripts to work.

mfcc_extract.py

This script extracts features from speech recordings in all of the folders of a given directory. It saves the extracted MFCC features for each .wav file as a .npy file. It also stores the file paths of each .wav and .npy file in a .csv file for later use. It is designed to work with the file structures of the TIMIT and WSJCAM0 corpora. Changes are needed to accommodate other file structures.

The directory path can be changed at the start of the main method on line 29.
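
The path bookkeeping described above can be sketched as follows. This is a minimal illustration, not the code in mfcc_extract.py: the function name `index_wav_files`, the CSV column names, and the case-insensitive `.wav` match are assumptions made for the example, and the MFCC extraction step itself is left out.

```python
import csv
from pathlib import Path


def index_wav_files(root, csv_path):
    """Pair every .wav file under root with the .npy path its features would use.

    Hypothetical helper: writes one CSV row per recording and returns the rows.
    """
    rows = []
    for path in sorted(Path(root).rglob("*")):
        # Match the extension case-insensitively (assumption: some corpora use .WAV).
        if path.is_file() and path.suffix.lower() == ".wav":
            rows.append((str(path), str(path.with_suffix(".npy"))))

    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_path", "npy_path"])  # assumed column names
        writer.writerows(rows)
    return rows
```

Later scripts could then read the CSV to find both the recording and its saved features without walking the corpus again.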

syllable_nuclei.py

This script was originally taken from a GitHub repository by David Feinberg (2019), and adapted for the needs of this project. A citeable link to this repository and its contents can be found here: http://doi.org/10.17605/OSF.IO/6DWR3

This script detects syllable nuclei and returns the number of detected syllables and the number of pauses, amongst other metrics which are not used in this project.
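
To give a feel for what counting syllable nuclei involves, here is a toy sketch that picks peaks in an intensity contour. It is not the adapted script, which works through Parselmouth and is not reproduced here. The function name, the parameters `threshold_db` and `min_dip_db`, and the exact peak rule are assumptions chosen for illustration, and steps such as voicing checks and pause detection are omitted.

```python
def find_syllable_nuclei(intensity_db, threshold_db, min_dip_db=2.0):
    """Return indices of local intensity peaks treated as syllable nuclei.

    Simplified rule (an assumption for this sketch): a local maximum counts
    if it exceeds threshold_db and lies at least min_dip_db above the lowest
    value seen since the previously accepted peak.
    """
    nuclei = []
    lowest = float("inf")  # lowest intensity since the last accepted peak
    for i in range(1, len(intensity_db) - 1):
        cur = intensity_db[i]
        lowest = min(lowest, intensity_db[i - 1])
        is_peak = intensity_db[i - 1] < cur >= intensity_db[i + 1]
        if is_peak and cur > threshold_db and cur - lowest >= min_dip_db:
            nuclei.append(i)
            lowest = cur  # restart dip tracking from this peak
    return nuclei


# Three clear peaks; the small bump at index 7 is rejected (dip of only 0.7 dB).
contour = [50, 60, 70, 60, 55, 69, 68.5, 69.2, 50, 72, 50]
nuclei = find_syllable_nuclei(contour, threshold_db=58)
syllable_count = len(nuclei)
```

The dip requirement is what stops small ripples on a single loud vowel from being counted as separate syllables.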

kernel_gram.py

This script uses Gaussian kernel similarity to generate a kernel-gram matrix for each speech file given. Using this matrix, it can roughly detect segment boundaries in speech. This approach is based on the system outlined by Bhati, Nayak and Sri Rama Murty (2018). The script has two main uses: it can write .csv files evaluating this system on the TIMIT and WSJCAM0 corpora, or it can display spectrogram and kernel-gram comparisons between the gold-standard segment boundaries and those generated by the script.

The root directory can be changed on line 309, towards the beginning of the main method. Hyperparameters may also be amended on lines 301-306. Certain sections of code may need to be commented or uncommented to switch between evaluation on the TIMIT and WSJCAM0 corpora and displaying comparison spectrograms.
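
As a rough illustration of the kernel-gram idea, the sketch below builds a Gaussian kernel similarity matrix from a sequence of feature frames with NumPy and marks a boundary wherever the similarity between neighbouring frames falls below a threshold. The function names, the `sigma` and `threshold` parameters, and the adjacent-frame boundary rule are assumptions made for this example. They are not taken from kernel_gram.py, and the boundary detection in Bhati et al. (2018) may differ.

```python
import numpy as np


def gaussian_kernel_gram(features, sigma=1.0):
    """Return the (n_frames, n_frames) matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    x = np.asarray(features, dtype=float)
    sq_norms = np.sum(x ** 2, axis=1)
    # Squared Euclidean distance between every pair of frames.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def rough_boundaries(kernel_gram, threshold=0.5):
    """Return frame indices that start a new segment.

    Simplified rule (an assumption): frame i + 1 starts a segment when its
    similarity to frame i, read off the first superdiagonal, is below threshold.
    """
    adjacent = np.diag(kernel_gram, k=1)
    return [int(i) + 1 for i in np.flatnonzero(adjacent < threshold)]


# Two blocks of identical frames: the only boundary falls at frame 3.
frames = [[0.0, 0.0]] * 3 + [[5.0, 5.0]] * 3
K = gaussian_kernel_gram(frames, sigma=1.0)
boundaries = rough_boundaries(K, threshold=0.5)
```

In a plot of such a matrix, frames within one segment show up as bright square blocks along the diagonal, and segment boundaries sit where one block ends and the next begins.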

sound_analysis.py

This script uses the Maltese G2P tool to transcribe prompts, then applies further rule-based transformations to make the transcribed prompts comparable with the output of the syllable and segment counting tools. It then calls functions from syllable_nuclei.py and kernel_gram.py to count syllables and segments, and returns the relevant information from each of these tools as a .csv file.

The root directory can be changed at the start of the main method on line 44.
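
One way the final comparison step might look is sketched below. It assumes that the expected counts (from the transcribed prompt) and the detected counts (from the audio) have already been computed, and it only shows how they could be placed side by side in a .csv file. The function name, the column names, and the signed-difference columns are hypothetical and are not taken from sound_analysis.py.

```python
import csv

# Assumed column layout for the output file.
FIELDS = [
    "file",
    "expected_syllables", "detected_syllables", "syllable_diff",
    "expected_segments", "detected_segments", "segment_diff",
]


def write_count_comparison(records, csv_path):
    """Write expected vs. detected counts, plus signed differences, to csv_path.

    Each record is a dict holding the file name and the four count fields.
    """
    rows = []
    for record in records:
        row = dict(record)
        # Positive values mean the audio contains more units than the prompt predicts.
        row["syllable_diff"] = row["detected_syllables"] - row["expected_syllables"]
        row["segment_diff"] = row["detected_segments"] - row["expected_segments"]
        rows.append(row)

    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

A large difference in either column would suggest that the recording does not match its prompt, though any cut-off for "large" would have to be chosen empirically.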

References

Bhati, S., Nayak, S., & Sri Rama Murty, K. (2018). Unsupervised segmentation of speech signals using kernel-gram matrices. In R. Rameshan, C. Arora, & S. Dutta Roy (Eds.), Computer vision, pattern recognition, image processing, and graphics (pp. 139–149). Singapore: Springer Singapore.

Feinberg, D. R. (2019, Oct). Parselmouth praat scripts in python. OSF. Retrieved from osf.io/6dwr3 doi: 10.17605/OSF.IO/6DWR3
