Speech Quefrency Transform (SQT)

This is an Automatic Speech Recognition (ASR) interface that converts speech audio to and from the cepstral domain. In other words, it extracts the locally stationary speech features from speech signals and reconstructs speech signals from those features. It represents the speech series by two features and has several applications.

Demo

Abstract

The efficient use of communication bandwidth starts with the data source. The features of speech signals can be extracted and reconstructed to lower the Internet traffic of acoustic artificial agents and to increase the quality of automatic speech recognition systems. The Speech Quefrency Transform (SQT) is introduced in this work to enrich the communication space between artificial agents and mankind. We describe the motivation, methodology, and deep learning approach in detail as we apply the SQT technology to several applications: sharp pitch track extraction, real-time speech communications, and emotion recognition. Combining multiple processes, attenuating background noise, and enabling distant-speech recognition, we introduce the SQT cepstrograms as well as multiple quefrency scales. SQT is a set of frequency transforms whose spectral leakages are controlled per a frequency-modulation model. SQT captures the stationarity of time series onto a hyperspace that resembles the cepstrogram when it is reduced for pitch track extraction. This library is an interface for Automatic Speech Recognition (ASR) for converting an audio series to and from the cepstral domain.

Speech Features

Pitch Tracks
Responsive-Band Spectrograms (harmonic series).

Quefrency Scales

Linear-Space
Reciprocal (this is similar to the MFCC scale)
Geometrical (this is based on the two-dimensional view of the pitch function)
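
The three scales can be pictured as three ways of spacing quefrency bins between a shortest and a longest pitch period. The sketch below is illustrative only: the function names, the bin count, and the 50 to 400 Hz pitch range are assumptions, and the exact definitions used by sqtpy may differ. Here "reciprocal" is assumed to mean quefrencies that are reciprocals of evenly spaced frequencies, and "geometrical" is assumed to mean a constant ratio between neighbouring bins.

```python
def linear_scale(q_min, q_max, n):
    """n quefrencies (in seconds) evenly spaced between q_min and q_max."""
    step = (q_max - q_min) / (n - 1)
    return [q_min + i * step for i in range(n)]


def reciprocal_scale(q_min, q_max, n):
    """Reciprocals of n frequencies evenly spaced between 1/q_max and 1/q_min."""
    f_lo, f_hi = 1.0 / q_max, 1.0 / q_min
    step = (f_hi - f_lo) / (n - 1)
    # Start from the highest frequency so the quefrencies come out ascending.
    return [1.0 / (f_hi - i * step) for i in range(n)]


def geometric_scale(q_min, q_max, n):
    """n quefrencies with a constant ratio between neighbouring bins."""
    ratio = (q_max / q_min) ** (1.0 / (n - 1))
    return [q_min * ratio ** i for i in range(n)]


# Hypothetical pitch range: 400 Hz down to 50 Hz, i.e. periods of 2.5 ms to 20 ms.
Q_MIN, Q_MAX, N_BINS = 1.0 / 400.0, 1.0 / 50.0, 5

for name, scale in [("linear", linear_scale),
                    ("reciprocal", reciprocal_scale),
                    ("geometric", geometric_scale)]:
    bins_ms = [round(1000.0 * q, 2) for q in scale(Q_MIN, Q_MAX, N_BINS)]
    print(name, bins_ms)
```

All three share the same end points; they differ only in where the interior bins fall, which determines whether resolution is concentrated at short or long pitch periods.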

Applications

Speech Analysis and Synthesis
Natural Language Processing (NLP)
Machine Learning (ML)
Telecommunications

Python Library

To get started, please follow the instructions below. There is also a Jupyter Notebook Demo in the package directory.

Install:

pip install sqtpy

Initialize a new instance

import sqtpy
sqt = sqtpy.SQT( N = 50 , Rs = 30 , Fs = 8000 )

Extract Pitch Track and speech features

[F0, Hm, Et] = sqt.encode( I0 )

Estimate the original speech signal from the extracted speech features

I = sqt.decode( F0, Hm )
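
To try the encode/decode round trip without a recording, a synthetic voiced-like signal can stand in for I0. The sketch below uses only the standard library; the helper names harmonic_tone and snr_db are hypothetical and not part of sqtpy. Whether encode expects a list, a NumPy array, or a specific dtype is not stated here, so the sqtpy calls are left as comments.

```python
import math

FS = 8000  # sampling rate in Hz, matching Fs in the constructor call above


def harmonic_tone(f0, duration, fs=FS, n_harmonics=5):
    """Sum of harmonics of f0 with 1/k amplitudes, peak-normalised to [-1, 1]."""
    n = int(duration * fs)
    x = [sum(math.sin(2.0 * math.pi * k * f0 * t / fs) / k
             for k in range(1, n_harmonics + 1))
         for t in range(n)]
    peak = max(abs(v) for v in x)
    return [v / peak for v in x]


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB over the common length of the two signals."""
    n = min(len(reference), len(estimate))
    signal = sum(r * r for r in reference[:n])
    noise = sum((r - e) ** 2 for r, e in zip(reference[:n], estimate[:n]))
    return float("inf") if noise == 0 else 10.0 * math.log10(signal / noise)


# Half a second of a 120 Hz harmonic series as a stand-in for speech.
I0 = harmonic_tone(f0=120.0, duration=0.5)

# With sqtpy installed, the round trip from the text would then be:
#   F0, Hm, Et = sqt.encode(I0)
#   I = sqt.decode(F0, Hm)
#   print(snr_db(I0, I))
```

A waveform SNR is a crude measure: if the decoder does not preserve phase, the reconstruction can sound close to the original while scoring poorly, and a spectral comparison would be more informative.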

The Int16 Class

We prepared this version of the sqt library for Raspberry Pi compatibility. This version prioritizes space over speed and quality. In this version, 256-level variables are multiplied while they are in a 16-bit integer data type, and its temporal and cepstral filtering is done sequentially inside for-loops, so low-order filtering is strictly recommended in this version.

sqt = sqtpy.SQTint16( N = 50 , Rs = 30 , Fs = 8000 )
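
The arithmetic behind pairing 256-level values with 16-bit integers can be checked directly: the product of two signed 8-bit values always fits in a signed 16-bit integer, while the product of two unsigned 8-bit values may not. The sketch below emulates 16-bit wraparound with ctypes and adds a first-order integer filter run in a plain for-loop. How SQTint16 actually quantizes and filters is not documented here, so quantize_256, mul_int16, and smooth_first_order are hypothetical helpers.

```python
import ctypes


def quantize_256(x):
    """Map a float in [-1, 1] to an integer level, clamped to the signed 8-bit range."""
    return max(-128, min(127, round(x * 127)))


def mul_int16(a, b):
    """Multiply two integers and wrap the result as a signed 16-bit register would."""
    return ctypes.c_int16(a * b).value


def smooth_first_order(samples, shift=2):
    """First-order low-pass computed sample by sample with integer operations only."""
    y, out = 0, []
    for x in samples:
        y += (x - y) >> shift  # move a fraction 1/2**shift of the way toward x
        out.append(y)
    return out


# Signed 8-bit extremes: the products stay inside the int16 range.
print(mul_int16(-128, -128))  # 16384
print(mul_int16(-128, 127))   # -16256
# Unsigned 8-bit extremes overflow int16: 255 * 255 = 65025 wraps to -511.
print(mul_int16(255, 255))    # -511

print(smooth_first_order([64, 64, 64, 64], shift=1))  # [32, 48, 56, 60]
```

Each extra filter order adds another pass of this kind per sample, which is one way to read the recommendation to keep filter orders low on small devices.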

JavaScript Library

No JS version has been released yet. Please check again later.

For Developers & Researchers

Repository:

https://github.com/hasanainaz/sqt

BibTeX Citations

Engineering Scientists:

@article{hasanain2022sqtp,
  title={Multi-dimensional spectral process for cepstral feature engineering \& Formant Coding},
  author={Hasanain, Ahmad Z. and Syed, Muntaser M. and Kepuska, Veton Z and Silaghi, Marius C},
  journal={J Electrical Electron Eng},
  publisher = {Opast Group {LLC}},
  url={https://opastpublishers.com/open-access/multi-dimensional-spectral-process-for-cepstral-feature-engineering-rnformant-coding.pdf},
  volume={1},
  number={1},
  pages={1--20},
  year={2022},
  month={aug},
}

Linguistics Scientists:

@article{hasanain2022sqtm,
  title={A Robust Speech Features Extractor \& Reconstructor For Artificial Intelligence Frontends},
  author={Hasanain, A.Z. and Strother, J.B. and Silaghi, M.C. and Kepuska, V.Z. and Kostanic, I.N. and Anagnostopoulos, G.C.},
  journal = {Journal of Humanities \& Social Sciences},
  publisher = {Opast Group {LLC}},
  doi={10.33140/JHSS.05.03.06},
  url={http://doi.org/10.33140/JHSS.05.03.06},
  volume={5},
  number={3},
  pages={230--247},
  year={2022},
  month={aug},
}

License:

The SQT programs are free software. They are provided for personal use in the hope that they will be useful but without any warranties, including, without limitation, warranties of fitness for a particular purpose and merchantability. For commercial use, please see the open-source license.

GNU Affero General Public License v3
