B.Sc. Dissertation Code. Compare Maltese speech to the prompt that was read out to ensure they match. Generate syllable distance and segment distance between the speech and text. Unsupervised and low-resource.

ianpadovani/SpeechTextValidation

SpeechTextValidation

This project makes use of syllable counting and segment counting to compare a text prompt to a snippet of speech.

Requirements

Installable via pip:

  • Numpy
  • Pandas
  • Scipy
  • Matplotlib
  • Parselmouth (published on PyPI as praat-parselmouth)

Requires access:

  • Textmt
    • G2PTool
    • Tokenise

Scripts

These are the main scripts in this project. Any other scripts in this repository are for testing purposes. Run mfcc_extract.py first, since the other scripts depend on its output.

mfcc_extract.py

This script extracts features from the speech recordings in every folder of a given directory. It saves the extracted MFCC features for each .wav file as a .npy file, and stores the file paths of each .wav and .npy file in a .csv file for later use. It is designed to work with the file structures of the TIMIT and WSJCAM0 corpora. Changes need to be made to accommodate other file structures.

The directory path can be changed at the start of the main method on line 29.
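The workflow described above (walk a directory tree, save one .npy file per .wav file, and record both paths in a .csv file) could look roughly like the sketch below. It is not the code in mfcc_extract.py: the function name `extract_all`, the manifest name `filepaths.csv`, the column names, and the `extract_features` callable are all hypothetical, and the MFCC computation itself is left as a placeholder supplied by the caller.

```python
import csv
from pathlib import Path

import numpy as np


def extract_all(root, extract_features, manifest_name="filepaths.csv"):
    """Save features for every .wav file under `root` and write a CSV manifest.

    `extract_features` is a placeholder callable (path -> 2-D array of
    frames x coefficients). The real script computes MFCCs at this step.
    """
    root = Path(root)
    rows = []
    # Note: the "*.wav" pattern is case-sensitive on most platforms, so a
    # corpus with upper-case extensions would need a different pattern.
    for wav_path in sorted(root.rglob("*.wav")):
        npy_path = wav_path.with_suffix(".npy")
        np.save(npy_path, np.asarray(extract_features(wav_path)))
        rows.append((str(wav_path), str(npy_path)))

    # Hypothetical manifest layout: one row per recording.
    manifest_path = root / manifest_name
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["wav_path", "npy_path"])
        writer.writerows(rows)
    return manifest_path
```

Passing the feature extractor in as an argument keeps the directory handling separate from the signal processing, which is one way to make the same loop work with other corpus layouts.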

syllable_nuclei.py

This script was originally taken from a GitHub repository by David Feinberg (2019), and adapted for the needs of this project. A citeable link to this repository and its contents can be found here: http://doi.org/10.17605/OSF.IO/6DWR3

This script detects syllable nuclei and returns the number of detected syllables and the number of pauses, amongst other metrics which are not used in this project.
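As a rough illustration of what detecting syllable nuclei involves, the sketch below counts peaks in an intensity contour (a plain list of dB values). It is a deliberately simplified stand-in, not the logic of syllable_nuclei.py or of Feinberg's original script: the function name, the parameters `threshold_db` and `min_dip_db`, and the default of 2 dB are assumptions, and the sketch does not check voicing or detect pauses.

```python
def count_syllable_nuclei(intensity_db, threshold_db, min_dip_db=2.0):
    """Count intensity peaks that look like syllable nuclei.

    A local maximum is counted when it lies above `threshold_db` and, unless
    it is the first counted peak, the contour has dipped at least
    `min_dip_db` below it since the previously counted peak.
    """
    count = 0
    last_peak = None      # intensity of the last counted peak
    lowest_since = None   # lowest intensity seen since that peak
    for i in range(1, len(intensity_db) - 1):
        value = intensity_db[i]
        if lowest_since is not None:
            lowest_since = min(lowest_since, value)
        is_peak = intensity_db[i - 1] < value >= intensity_db[i + 1]
        if not is_peak or value <= threshold_db:
            continue
        if last_peak is None or value - lowest_since >= min_dip_db:
            count += 1
            last_peak = value
            lowest_since = value
    return count
```

The dip requirement is what stops small ripples on top of a single loud vowel from being counted as separate syllables.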

kernel_gram.py

This script uses Gaussian kernel similarity to generate a kernel-gram matrix for each speech file given. Using this matrix, it can roughly detect segment boundaries in speech. This approach is based on the system outlined by Bhati, Nayak and Sri Rama Murty (2018). The script serves several purposes: it can write .csv files evaluating this system on the TIMIT and WSJCAM0 corpora, or show spectrogram and kernel-gram comparisons between the gold-standard segment boundaries and those generated by the script.

The root directory can be changed on line 309, towards the beginning of the main method. Hyperparameters may also be amended on lines 301-306. Certain sections of code may need to be commented or uncommented to switch between evaluation on the TIMIT and WSJCAM0 corpora, or to display comparison spectrograms.
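To make the idea of a kernel-gram matrix concrete, the sketch below computes pairwise Gaussian kernel similarities between feature frames, so that entry (i, j) is exp(-||x_i - x_j||² / (2σ²)). The function names, the `sigma` and `threshold` parameters, and in particular the `naive_boundaries` heuristic are assumptions made for illustration. The heuristic is not the boundary detection procedure of Bhati, Nayak and Sri Rama Murty (2018), nor necessarily what kernel_gram.py does.

```python
import numpy as np


def gaussian_kernel_gram(features, sigma=1.0):
    """Return K with K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    `features` is a (frames x coefficients) array, e.g. MFCC frames.
    """
    features = np.asarray(features, dtype=float)
    sq_norms = np.sum(features ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from rounding
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def naive_boundaries(kernel_gram, threshold=0.5):
    """Illustrative heuristic only: return each frame index i whose
    similarity to frame i - 1 falls below `threshold`."""
    adjacent = np.diag(kernel_gram, k=1)  # similarities K[i, i + 1]
    return [int(i) + 1 for i in np.flatnonzero(adjacent < threshold)]
```

Frames belonging to the same segment tend to form bright square blocks along the diagonal of the matrix, which is why a drop in similarity between neighbouring frames is a plausible first hint of a boundary.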

sound_analysis.py

This script uses the Maltese G2P tool to transcribe prompts, then applies further rule-based transformations to make these prompts comparable with the output of the syllable and segment counting tools. It then calls functions from syllable_nuclei.py and kernel_gram.py to count syllables and segments, and writes the relevant information from each of these tools to a .csv file.

The root directory can be changed at the start of the main method on line 44.
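The final comparison step can be pictured as follows: for each recording, the counts expected from the transcribed prompt are set against the counts detected in the audio, and the differences are written to a .csv file. This is a minimal sketch under stated assumptions. Defining "distance" as an absolute difference of counts, the function names, and the column names are all hypothetical, and the G2P transcription is not reproduced here.

```python
import csv


def count_distance(expected, detected):
    """Absolute difference between the count predicted from the prompt and
    the count detected in the audio (an assumed definition of distance)."""
    return abs(expected - detected)


def write_report(rows, handle):
    """Write one CSV row per recording to an open text file handle.

    Each item in `rows` is a tuple:
    (name, expected_syllables, detected_syllables,
     expected_segments, detected_segments)
    """
    writer = csv.writer(handle)
    writer.writerow(["file", "syllable_distance", "segment_distance"])
    for name, exp_syl, det_syl, exp_seg, det_seg in rows:
        writer.writerow([
            name,
            count_distance(exp_syl, det_syl),
            count_distance(exp_seg, det_seg),
        ])
```

Under this assumed definition, a recording that matches its prompt would be expected to show small distances, while a large distance suggests the speech and the text differ.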

References

Bhati, S., Nayak, S., & Sri Rama Murty, K. (2018). Unsupervised segmentation of speech signals using kernel-gram matrices. In R. Rameshan, C. Arora, & S. Dutta Roy (Eds.), Computer vision, pattern recognition, image processing, and graphics (pp. 139–149). Singapore: Springer Singapore.

Feinberg, D. R. (2019, October). Parselmouth Praat scripts in Python. OSF. Retrieved from osf.io/6dwr3. doi: 10.17605/OSF.IO/6DWR3
