@inproceedings{hough-schlangen-2017-joint,
    title = "Joint, Incremental Disfluency Detection and Utterance Segmentation from Speech",
    author = "Hough, Julian and Schlangen, David",
    booktitle = "Proceedings of the 15th Conference of the {E}uropean Chapter of the Association for Computational Linguistics: Volume 1, Long Papers",
    month = apr,
    year = "2017",
    address = "Valencia, Spain",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/E17-1031",
    pages = "326--336",
    abstract = "We present the joint task of incremental disfluency detection and utterance segmentation and a simple deep learning system which performs it on transcripts and ASR results. We show how the constraints of the two tasks interact. Our joint-task system outperforms the equivalent individual task systems, provides competitive results and is suitable for future use in conversation agents in the psychiatric domain.",
}

Source Repo Link: https://github.com/clp-research/deep_disfluency
For installation, follow the instructions in the source repository.
The script check_language_tagger.py produces clean text by removing disfluencies from the input. Before running it on a local machine, change the source and destination paths inside the script to point at your own files.
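The contents of check_language_tagger.py are not reproduced here. The hypothetical script below is a minimal sketch of the same idea that takes the source and destination paths as command-line arguments, so they do not have to be edited in the code. It rests on two assumptions that should be checked against the repository before use: (1) the tagger output has been saved as one `word<TAB>tag` pair per line, with blank lines between utterances; and (2) in the tag set, `<e/>` marks an edit term (e.g. "uh") and `<rm-N/>` marks a repair whose reparandum begins N words earlier. The function and file names are illustrative only.

```python
import argparse
import re
from pathlib import Path

# ASSUMPTION: "<rm-N/>" on a word marks a repair whose reparandum starts N words back.
REPAIR_ONSET = re.compile(r"<rm-(\d+)/>")


def remove_disfluencies(words, tags):
    """Drop edit terms ("<e/>") and reparanda (the N words before an "<rm-N/>" word)."""
    keep = [True] * len(words)
    for i, tag in enumerate(tags):
        if "<e/>" in tag:
            keep[i] = False  # filled pause or other edit term
        match = REPAIR_ONSET.search(tag)
        if match:
            start = max(0, i - int(match.group(1)))
            for j in range(start, i):
                keep[j] = False  # words replaced by the repair
    return [w for w, k in zip(words, keep) if k]


def clean_file(source: Path, destination: Path) -> None:
    """Read word<TAB>tag lines (ASSUMED format) and write one clean utterance per line."""
    utterances, words, tags = [], [], []
    for line in source.read_text(encoding="utf-8").splitlines():
        if not line.strip():  # blank line ends an utterance
            if words:
                utterances.append(" ".join(remove_disfluencies(words, tags)))
                words, tags = [], []
            continue
        word, tag = line.split("\t", 1)
        words.append(word)
        tags.append(tag)
    if words:  # flush the last utterance
        utterances.append(" ".join(remove_disfluencies(words, tags)))
    destination.write_text("\n".join(utterances) + "\n", encoding="utf-8")


def main(argv=None):
    parser = argparse.ArgumentParser(description="Strip disfluencies from tagged text.")
    parser.add_argument("source", type=Path, help="tagged input file")
    parser.add_argument("destination", type=Path, help="where to write the clean text")
    args = parser.parse_args(argv)
    clean_file(args.source, args.destination)


if __name__ == "__main__":
    main()
```

Saved as, say, `clean_tagged.py`, it would be run as `python clean_tagged.py tagged.txt clean.txt`. Under the assumed tags, the utterance "john likes uh loves mary" tagged `<f/> <f/> <e/> <rm-2/><rpEndSub/> <f/>` comes out as "john loves mary". Passing paths as arguments avoids editing the script for every machine.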